[spark] branch master updated: [SPARK-39152][CORE] Deregistering disk persisted local blocks in case of IO related errors

2022-06-20 Thread wuyi
This is an automated email from the ASF dual-hosted git repository.

wuyi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 282c7ae7b5a [SPARK-39152][CORE] Deregistering disk persisted local 
blocks in case of IO related errors
282c7ae7b5a is described below

commit 282c7ae7b5adbd88466681bc986a7d914080f08a
Author: attilapiros 
AuthorDate: Tue Jun 21 12:06:56 2022 +0800

[SPARK-39152][CORE] Deregistering disk persisted local blocks in case of IO 
related errors

### What changes were proposed in this pull request?

Deregistering disk persisted local blocks from the block manager in case of 
IO related errors.

### Why are the changes needed?

In case of a disk corruption, a disk-persisted block will lead to a job
failure, as the block registration always points to the same file. So even
when the task is rescheduled on a different executor, the job will fail.

Example:

First failure (the block is locally available):
```
22/04/25 07:15:28 ERROR executor.Executor: Exception in task 17024.0 in 
stage 12.0 (TID 51853)
java.io.StreamCorruptedException: invalid stream header: 
  at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:943)
  at java.io.ObjectInputStream.<init>(ObjectInputStream.java:401)
  at 
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:63)
  at 
org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:63)
  at 
org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:122)
  at 
org.apache.spark.serializer.SerializerManager.dataDeserializeStream(SerializerManager.scala:209)
  at 
org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:617)
  at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:897)
  at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
  at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
```

Then the task might be rescheduled on a different executor, but as the block
is registered to the first block manager, the error will be the same:
```
java.io.StreamCorruptedException: invalid stream header: 
  at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:943)
  at java.io.ObjectInputStream.<init>(ObjectInputStream.java:401)
  at 
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:63)
  at 
org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:63)
  at 
org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:122)
  at 
org.apache.spark.serializer.SerializerManager.dataDeserializeStream(SerializerManager.scala:209)
  at 
org.apache.spark.storage.BlockManager$$anonfun$getRemoteValues$1.apply(BlockManager.scala:698)
  at 
org.apache.spark.storage.BlockManager$$anonfun$getRemoteValues$1.apply(BlockManager.scala:696)
  at scala.Option.map(Option.scala:146)
  at 
org.apache.spark.storage.BlockManager.getRemoteValues(BlockManager.scala:696)
  at org.apache.spark.storage.BlockManager.get(BlockManager.scala:831)
  at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:886)
  at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
 ```

My idea is to deregister the block when the IO error occurs and let the
following task recompute it.

This PR targets only local blocks. In a follow-up PR, `getRemoteValues`
can be extended with the same block removal.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

1) An existing unit test was extended.
2) Manually.

 Manual testing

Start Spark:
```
$ ./bin/spark-shell --master "local-cluster[3,1,1200]" --conf 
spark.serializer=org.apache.spark.serializer.JavaSerializer
```

Create a persisted RDD (here via a DF):
```
scala> val df = sc.parallelize(1 to 20, 4).toDF
...
scala> df.persist(org.apache.spark.storage.StorageLevel.DISK_ONLY)
...
scala> df.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
|    4|
|    5|
|    6|
|    7|
|    8|
|    9|
|   10|
|   11|
|   12|
|   13|
|   14|
|   15|
|   16|
|   17|
|   18|
|   19|
|   20|
+-----+
```

Now that the blocks are persisted, let's corrupt one of the files. For this we 
have to find the directory where the blocks are stored:
```
$ grep "DiskBlockManager: Created local directory" 
work/app-2022052820-/*/stdout
work/app-2022052820-/0/stdout:22/05/11 11:28:21 

[spark] branch master updated: [SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

2022-06-20 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ca5f7e6c35d [SPARK-39263][SQL] Make GetTable, TableExists and 
DatabaseExists be compatible with 3 layer namespace
ca5f7e6c35d is described below

commit ca5f7e6c35d49e9599c39fcd0828b3e557848d11
Author: Rui Wang 
AuthorDate: Tue Jun 21 11:18:36 2022 +0800

[SPARK-39263][SQL] Make GetTable, TableExists and DatabaseExists be 
compatible with 3 layer namespace

### What changes were proposed in this pull request?

Make GetTable, TableExists and DatabaseExists compatible with the 3-layer 
namespace.

### Why are the changes needed?

This is part of the effort to make the catalog API compatible with the 3-layer 
namespace.

### Does this PR introduce _any_ user-facing change?

Yes. The API change here is backward compatible and it extends the API to 
further support the 3-layer namespace (e.g. catalog.database.table).
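
For example, a hedged usage sketch in spark-shell (the catalog name `testcat`,
namespace `ns` and table `my_table` are hypothetical and assume a configured v2
catalog):
```
scala> spark.catalog.databaseExists("testcat.ns")
scala> spark.catalog.tableExists("testcat.ns.my_table")
scala> spark.catalog.getTable("testcat.ns.my_table")
```
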
### How was this patch tested?

UT

Closes #36641 from amaliujia/catalogapi2.

Authored-by: Rui Wang 
Signed-off-by: Wenchen Fan 
---
 .../sql/catalyst/catalog/SessionCatalog.scala  |  2 +-
 .../apache/spark/sql/internal/CatalogImpl.scala| 55 ---
 .../apache/spark/sql/internal/CatalogSuite.scala   | 64 +-
 3 files changed, 112 insertions(+), 9 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
index 0152f49c798..54959b523c9 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
@@ -966,7 +966,7 @@ class SessionCatalog(
   }
 
   def isGlobalTempViewDB(dbName: String): Boolean = {
-globalTempViewManager.database.equals(dbName)
+globalTempViewManager.database.equalsIgnoreCase(dbName)
   }
 
   def lookupTempView(name: TableIdentifier): Option[View] = {
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
index 4b6ea33f3e6..f89a87c3011 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
@@ -23,12 +23,13 @@ import scala.util.control.NonFatal
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalog.{Catalog, Column, Database, Function, 
Table}
 import org.apache.spark.sql.catalyst.{DefinedByConstructorParams, 
FunctionIdentifier, TableIdentifier}
-import org.apache.spark.sql.catalyst.analysis.{ResolvedTable, ResolvedView, 
UnresolvedDBObjectName, UnresolvedNamespace, UnresolvedTable, 
UnresolvedTableOrView}
+import org.apache.spark.sql.catalyst.analysis.{ResolvedNamespace, 
ResolvedTable, ResolvedView, UnresolvedDBObjectName, UnresolvedNamespace, 
UnresolvedTable, UnresolvedTableOrView}
 import org.apache.spark.sql.catalyst.catalog._
 import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
 import org.apache.spark.sql.catalyst.plans.logical.{CreateTable, 
LocalRelation, RecoverPartitions, ShowTables, SubqueryAlias, TableSpec, View}
 import org.apache.spark.sql.catalyst.util.CharVarcharUtils
-import org.apache.spark.sql.connector.catalog.{CatalogManager, TableCatalog}
+import org.apache.spark.sql.connector.catalog.{CatalogManager, Identifier, 
SupportsNamespaces, TableCatalog}
+import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.CatalogHelper
 import org.apache.spark.sql.errors.QueryCompilationErrors
 import org.apache.spark.sql.execution.datasources.DataSource
 import org.apache.spark.sql.types.StructType
@@ -250,8 +251,26 @@ class CatalogImpl(sparkSession: SparkSession) extends 
Catalog {
* table/view. This throws an `AnalysisException` when no `Table` can be 
found.
*/
   override def getTable(tableName: String): Table = {
-val tableIdent = 
sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
-getTable(tableIdent.database.orNull, tableIdent.table)
+// calling `sqlParser.parseTableIdentifier` to parse tableName. If it 
contains only table name
+// and optionally contains a database name(thus a TableIdentifier), then 
we look up the table in
+// sessionCatalog. Otherwise we try `sqlParser.parseMultipartIdentifier` 
to have a sequence of
+// string as the qualified identifier and resolve the table through SQL 
analyzer.
+try {
+  val ident = 
sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
+  if (tableExists(ident.database.orNull, ident.table)) {
+makeTable(ident)
+  } else {
+

[spark] branch master updated: [SPARK-39521][INFRA][FOLLOW-UP] Update step name to "Run / Check changes" to detect the workflow

2022-06-20 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new aeedf145563 [SPARK-39521][INFRA][FOLLOW-UP] Update step name to "Run / 
Check changes" to detect the workflow
aeedf145563 is described below

commit aeedf1455630f1257e64fa4278d4de737574fc09
Author: Hyukjin Kwon 
AuthorDate: Tue Jun 21 11:44:06 2022 +0900

[SPARK-39521][INFRA][FOLLOW-UP] Update step name to "Run / Check changes" 
to detect the workflow

### What changes were proposed in this pull request?

This PR changes the step name that was renamed in the previous PR.

### Why are the changes needed?
To recover the build.

### Does this PR introduce _any_ user-facing change?
No, dev-only.

### How was this patch tested?

Tested at 
https://github.com/HyukjinKwon/spark/pull/52/checks?check_run_id=6977082541.

Closes #36933 from HyukjinKwon/SPARK-39521-followup2.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 .github/workflows/notify_test_workflow.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/notify_test_workflow.yml 
b/.github/workflows/notify_test_workflow.yml
index 55511346a9a..f09d541609c 100644
--- a/.github/workflows/notify_test_workflow.yml
+++ b/.github/workflows/notify_test_workflow.yml
@@ -113,7 +113,7 @@ jobs:
 
   // Here we get check run ID to provide Check run view instead of 
Actions view, see also SPARK-37879.
   const check_runs = await github.request(check_run_endpoint, 
check_run_params)
-  const check_run_head = check_runs.data.check_runs.filter(r => 
r.name === "Configure jobs")[0]
+  const check_run_head = check_runs.data.check_runs.filter(r => 
r.name === "Run / Check changes")[0]
 
   if (check_run_head.head_sha != 
context.payload.pull_request.head.sha) {
 throw new Error('There was a new unsynced commit pushed. 
Please retrigger the workflow.');


[spark] branch master updated: [SPARK-39521][INFRA][FOLLOW-UP] Fix notify workload to detect the main build, and readme link

2022-06-20 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c842b4b6d28 [SPARK-39521][INFRA][FOLLOW-UP] Fix notify workload to 
detect the main build, and readme link
c842b4b6d28 is described below

commit c842b4b6d28226e4e05c20740ddf69b70c5f8135
Author: Hyukjin Kwon 
AuthorDate: Tue Jun 21 11:19:30 2022 +0900

[SPARK-39521][INFRA][FOLLOW-UP] Fix notify workload to detect the main 
build, and readme link

### What changes were proposed in this pull request?

This PR fixes the notify_test_workflow.yml to detect build_main.yml that is 
for PR builds.

In addition, this PR fixes the link of build status in README.md.

### Why are the changes needed?

To fix the build.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

N/A. It is better to merge this first and verify afterwards, since the build is 
already broken.

Closes #36932 from HyukjinKwon/SPARK-39521-followup.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 .github/workflows/notify_test_workflow.yml | 2 +-
 README.md  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/notify_test_workflow.yml 
b/.github/workflows/notify_test_workflow.yml
index 4c84f5f25e6..55511346a9a 100644
--- a/.github/workflows/notify_test_workflow.yml
+++ b/.github/workflows/notify_test_workflow.yml
@@ -46,7 +46,7 @@ jobs:
 const params = {
   owner: context.payload.pull_request.head.repo.owner.login,
   repo: context.payload.pull_request.head.repo.name,
-  id: 'build_and_test.yml',
+  id: 'build_main.yml',
   branch: context.payload.pull_request.head.ref,
 }
 const check_run_params = {
diff --git a/README.md b/README.md
index dbc0f2ba87e..f7bc1994fc8 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ and Structured Streaming for stream processing.
 
 
 
-[![GitHub Action 
Build](https://github.com/apache/spark/actions/workflows/build_and_test.yml/badge.svg?branch=master&event=push)](https://github.com/apache/spark/actions/workflows/build_and_test.yml?query=branch%3Amaster+event%3Apush)
+[![GitHub Actions 
Build](https://github.com/apache/spark/actions/workflows/build_main.yml/badge.svg)](https://github.com/apache/spark/actions/workflows/build_main.yml)
 [![AppVeyor 
Build](https://img.shields.io/appveyor/ci/ApacheSoftwareFoundation/spark/master.svg?style=plastic&logo=appveyor)](https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark)
 [![PySpark 
Coverage](https://codecov.io/gh/apache/spark/branch/master/graph/badge.svg)](https://codecov.io/gh/apache/spark)
 


[spark] branch master updated: [SPARK-39534][PS] Series.argmax only needs single pass

2022-06-20 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 95fba869169 [SPARK-39534][PS] Series.argmax only needs single pass
95fba869169 is described below

commit 95fba8691696f6c4c00927cbcd8fde81765f0252
Author: Ruifeng Zheng 
AuthorDate: Tue Jun 21 08:57:27 2022 +0900

[SPARK-39534][PS] Series.argmax only needs single pass

### What changes were proposed in this pull request?
compute `Series.argmax` with one pass

### Why are the changes needed?
The existing implementation of `Series.argmax` needs two passes over the 
dataset: the first one computes the maximum value, and the second one gets the 
index.
However, both can be computed in one pass.
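
A minimal Scala sketch of the single-pass idea (the actual change is in
`python/pyspark/pandas/series.py`, and the column names here are made up):
order by the value with nulls last and by the original position, then read
both the maximum and its index from the first row.

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object ArgMaxSinglePassSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("argmax-sketch").getOrCreate()
    import spark.implicits._

    // "pos" plays the role of the natural order column.
    val df = Seq((0L, 3.0), (1L, 9.0), (2L, 1.0), (3L, 9.0)).toDF("pos", "value")
    val first = df.orderBy(col("value").desc_nulls_last, col("pos")).take(1)

    val argmax =
      if (first.isEmpty) sys.error("attempt to get argmax of an empty sequence")
      else if (first(0).isNullAt(1)) -1L   // all values are null
      else first(0).getLong(0)             // first position where the maximum occurs

    println(s"argmax = $argmax")           // 1
    spark.stop()
  }
}
```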

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing UT

Closes #36927 from zhengruifeng/ps_series_argmax_opt.

Authored-by: Ruifeng Zheng 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/series.py | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/python/pyspark/pandas/series.py b/python/pyspark/pandas/series.py
index 813d27709e4..352e7dd750b 100644
--- a/python/pyspark/pandas/series.py
+++ b/python/pyspark/pandas/series.py
@@ -6301,22 +6301,18 @@ class Series(Frame, IndexOpsMixin, Generic[T]):
 scol = scol_for(sdf, self._internal.data_spark_column_names[0])
 
 if skipna:
-sdf = sdf.orderBy(scol.desc_nulls_last(), 
NATURAL_ORDER_COLUMN_NAME)
+sdf = sdf.orderBy(scol.desc_nulls_last(), 
NATURAL_ORDER_COLUMN_NAME, seq_col_name)
 else:
-sdf = sdf.orderBy(scol.desc_nulls_first(), 
NATURAL_ORDER_COLUMN_NAME)
+sdf = sdf.orderBy(scol.desc_nulls_first(), 
NATURAL_ORDER_COLUMN_NAME, seq_col_name)
 
-max_value = sdf.select(
-F.first(scol),
-F.first(NATURAL_ORDER_COLUMN_NAME),
-).head()
+results = sdf.select(scol, seq_col_name).take(1)
 
-if max_value[1] is None:
+if len(results) == 0:
 raise ValueError("attempt to get argmax of an empty sequence")
-elif max_value[0] is None:
-return -1
-
-# If the maximum is achieved in multiple locations, the first row 
position is returned.
-return sdf.filter(scol == max_value[0]).head()[0]
+else:
+max_value = results[0]
+# If the maximum is achieved in multiple locations, the first row 
position is returned.
+return -1 if max_value[0] is None else max_value[1]
 
 def argmin(self) -> int:
 """


[spark] branch master updated: [SPARK-39521][INFRA] Separate scheduled jobs to each workflow

2022-06-20 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 9e468cf010f [SPARK-39521][INFRA] Separate scheduled jobs to each 
workflow
9e468cf010f is described below

commit 9e468cf010f7381c1e85b02c0b3b043db7ffc07d
Author: Hyukjin Kwon 
AuthorDate: Tue Jun 21 08:46:18 2022 +0900

[SPARK-39521][INFRA] Separate scheduled jobs to each workflow

### What changes were proposed in this pull request?

This PR proposes to separate workflows for each scheduled jobs. After this 
PR, each scheduled build can be easily searched at 
https://github.com/apache/spark/actions. For example, as below:

![Screen Shot 2022-06-20 at 6 01 04 
PM](https://user-images.githubusercontent.com/6477701/174565779-ab54eb69-49f9-4746-b714-902741e1d554.png)

In addition, this PR switches the ANSI build to a scheduled build too, because 
it's too expensive to run it for each commit.

### Why are the changes needed?

Currently it is very inconvenient to navigate the scheduled jobs. We have to 
use the 
https://github.com/apache/spark/actions/workflows/build_and_test.yml?query=event%3Aschedule
 link and manually search one by one.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Tested in my fork (https://github.com/HyukjinKwon/spark/actions).

Closes #36922 from HyukjinKwon/SPARK-39521.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 .github/workflows/build_and_test.yml   | 230 +++--
 .../{build_and_test_ansi.yml => build_ansi.yml}|  21 +-
 ...{build_and_test_ansi.yml => build_branch32.yml} |  21 +-
 ...{build_and_test_ansi.yml => build_branch33.yml} |  21 +-
 ...{build_and_test_ansi.yml => build_coverage.yml} |  21 +-
 .../{build_and_test_ansi.yml => build_hadoop2.yml} |  17 +-
 .../{build_and_test_ansi.yml => build_java11.yml}  |  22 +-
 .../{build_and_test_ansi.yml => build_java17.yml}  |  22 +-
 .../{build_and_test_ansi.yml => build_main.yml}|  10 +-
 ...{build_and_test_ansi.yml => build_scala213.yml} |  21 +-
 .../workflows/cancel_duplicate_workflow_runs.yml   |   2 +-
 .github/workflows/notify_test_workflow.yml |   2 +-
 .github/workflows/test_report.yml  |   2 +-
 .github/workflows/update_build_status.yml  |   2 +-
 14 files changed, 187 insertions(+), 227 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 81381eb16d4..084cbb95b07 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -20,105 +20,32 @@
 name: Build and test
 
 on:
-  push:
-branches:
-- '**'
-  schedule:
-# Note that the scheduled jobs are only for master branch.
-# master, Hadoop 2
-- cron: '0 1 * * *'
-# master
-- cron: '0 4 * * *'
-# branch-3.2
-- cron: '0 7 * * *'
-# PySpark coverage for master branch
-- cron: '0 10 * * *'
-# Java 11
-- cron: '0 13 * * *'
-# Java 17
-- cron: '0 16 * * *'
-# branch-3.3
-- cron: '0 19 * * *'
   workflow_call:
 inputs:
-  ansi_enabled:
+  java:
 required: false
-type: boolean
-default: false
-
+type: string
+default: 8
+  branch:
+required: false
+type: string
+default: master
+  hadoop:
+required: false
+type: string
+default: hadoop3
+  type:
+required: false
+type: string
+default: regular
+  envs:
+required: false
+type: string
+default: "{}"
 jobs:
-  configure-jobs:
-name: Configure jobs
-runs-on: ubuntu-20.04
-# All other jobs in this workflow depend on this job,
-# so the entire workflow is skipped when these conditions evaluate to 
false:
-# Run all jobs for Apache Spark repository
-# Run only non-scheduled jobs for forked repositories
-if: github.repository == 'apache/spark' || github.event_name != 'schedule'
-outputs:
-  java: ${{ steps.set-outputs.outputs.java }}
-  branch: ${{ steps.set-outputs.outputs.branch }}
-  hadoop: ${{ steps.set-outputs.outputs.hadoop }}
-  type: ${{ steps.set-outputs.outputs.type }}
-  envs: ${{ steps.set-outputs.outputs.envs }}
-steps:
-- name: Configure branch and additional environment variables
-  id: set-outputs
-  run: |
-if [ "${{ github.event.schedule }}" = "0 1 * * *" ]; then
-  echo '::set-output name=java::8'
-  echo '::set-output name=branch::master'
-  echo '::set-output name=type::scheduled'
-  echo '::set-output name=envs::{}'
-  echo '::set-output name=hadoop::hadoop2'
-elif [ "${{ github.event.schedule }}" = "0 4 * * 

[spark] branch dependabot/maven/mysql-mysql-connector-java-8.0.28 created (now 70827677908)

2022-06-20 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch 
dependabot/maven/mysql-mysql-connector-java-8.0.28
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 70827677908 Bump mysql-connector-java from 8.0.27 to 8.0.28

No new revisions were added by this update.


[spark] branch master updated: [SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType

2022-06-20 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e31d0726a7b [SPARK-38846][SQL] Add explicit data mapping between 
Teradata Numeric Type and Spark DecimalType
e31d0726a7b is described below

commit e31d0726a7baae3ff030ace25d9e2e1bfb1a7da6
Author: Eugene-Mark 
AuthorDate: Mon Jun 20 18:10:44 2022 -0500

[SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type 
and Spark DecimalType

### What changes were proposed in this pull request?
 - Implemented getCatalystType method in TeradataDialect
 - Handle Types.NUMERIC explicitly

### Why are the changes needed?
When loading a table from Teradata, if the type of a column is `Number`, it 
will be converted to `DecimalType(38,0)`, which loses the fractional part of 
the original data.

### Does this PR introduce _any_ user-facing change?
Yes, it will convert the Number type to DecimalType(38,18) if the scale is 0, 
so that the fractional part is kept.
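
A stand-alone sketch of the same mapping expressed through the public
`JdbcDialect` extension point (this is an illustration, not the committed
`TeradataDialect` change, and it ignores the other NUMERIC cases):

```
import java.sql.Types

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, DecimalType, MetadataBuilder}

// Map NUMBER columns that JDBC reports with scale 0 to the system default
// DecimalType(38, 18) so the fractional part is not truncated.
object NumberAsDecimalDialectSketch extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:teradata")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    sqlType match {
      case Types.NUMERIC if md == null || md.build().getLong("scale") == 0 =>
        Some(DecimalType.SYSTEM_DEFAULT)
      case _ => None
    }
  }
}

// JdbcDialects.registerDialect(NumberAsDecimalDialectSketch)
```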

### How was this patch tested?
UT is added to JDBCSuite.scala.

Closes #36499 from Eugene-Mark/teradata-loading.

Lead-authored-by: Eugene-Mark 
Co-authored-by: Eugene 
Co-authored-by: Eugene 
Signed-off-by: Sean Owen 
---
 docs/sql-migration-guide.md|  4 
 .../org/apache/spark/sql/types/DecimalType.scala   |  3 ++-
 .../apache/spark/sql/jdbc/TeradataDialect.scala| 28 ++
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala  | 27 -
 4 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 43ad780db08..ab0a7af7bf1 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -22,6 +22,10 @@ license: |
 * Table of contents
 {:toc}
 
+## Upgrading from Spark SQL 3.3 to 3.4
+  
+  - Since Spark 3.4, Number or Number(\*) from Teradata will be treated as 
Decimal(38,18). In Spark 3.3 or earlier, Number or Number(\*) from Teradata 
will be treated as Decimal(38, 0), in which case the fractional part will be 
removed.
+
 ## Upgrading from Spark SQL 3.2 to 3.3
 
   - Since Spark 3.3, the `histogram_numeric` function in Spark SQL returns an 
output type of an array of structs (x, y), where the type of the 'x' field in 
the return value is propagated from the input values consumed in the aggregate 
function. In Spark 3.2 or earlier, 'x' always had double type. Optionally, use 
the configuration `spark.sql.legacy.histogramNumericPropagateInputType` since 
Spark 3.3 to revert back to the previous behavior. 
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala
index 08ddd12ef7d..ce325024c3f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala
@@ -126,7 +126,8 @@ object DecimalType extends AbstractDataType {
 
   val MAX_PRECISION = 38
   val MAX_SCALE = 38
-  val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, 18)
+  val DEFAULT_SCALE = 18
+  val SYSTEM_DEFAULT: DecimalType = DecimalType(MAX_PRECISION, DEFAULT_SCALE)
   val USER_DEFAULT: DecimalType = DecimalType(10, 0)
   val MINIMUM_ADJUSTED_SCALE = 6
 
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala
index 79fb710cf03..2b2d1fb7e86 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/TeradataDialect.scala
@@ -17,6 +17,7 @@
 
 package org.apache.spark.sql.jdbc
 
+import java.sql.Types
 import java.util.Locale
 
 import org.apache.spark.sql.connector.expressions.aggregate.{AggregateFunc, 
GeneralAggregateFunc}
@@ -96,4 +97,31 @@ private case object TeradataDialect extends JdbcDialect {
   override def getLimitClause(limit: Integer): String = {
 ""
   }
+
+  override def getCatalystType(
+  sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): 
Option[DataType] = {
+sqlType match {
+  case Types.NUMERIC =>
+if (md == null) {
+  Some(DecimalType.SYSTEM_DEFAULT)
+} else {
+  val scale = md.build().getLong("scale")
+  // In Teradata, define Number without parameter means precision and 
scale is flexible.
+  // However, in this case, the scale returned from JDBC is 0, which 
will lead to
+  // fractional part loss. And the precision returned from JDBC is 40, 
which conflicts to
+  // DecimalType.MAX_PRECISION.
+  // Handle this special case by adding explicit conversion to system 
default decimal type.
+   

[spark] branch master updated (1f1d7964902 -> e5001219af1)

2022-06-20 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 1f1d7964902 [SPARK-39497][SQL] Improve the analysis exception of 
missing map key column
 add e5001219af1 [SPARK-39520][SQL] Override `--` method for 
`ExpressionSet` in Scala 2.13

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/expressions/ExpressionSet.scala   | 6 ++
 1 file changed, 6 insertions(+)


[spark] branch master updated: [SPARK-39497][SQL] Improve the analysis exception of missing map key column

2022-06-20 Thread gengliang
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1f1d7964902 [SPARK-39497][SQL] Improve the analysis exception of 
missing map key column
1f1d7964902 is described below

commit 1f1d79649027a4c03e48dea2bcef280dca53767a
Author: Gengliang Wang 
AuthorDate: Mon Jun 20 10:35:22 2022 -0700

[SPARK-39497][SQL] Improve the analysis exception of missing map key column

### What changes were proposed in this pull request?

Sometimes users forget to add single quotes around the map key string literal, 
for example `map_col[a]`. In such a case, the Analyzer will throw an exception:
```
[MISSING_COLUMN] Column 'struct.a' does not exist. Did you mean one of the 
following? ...
```
We can improve this message by saying that the user should append single 
quotes if the map key is a string literal.

```
[UNRESOLVED_MAP_KEY] Cannot resolve column 'a' as a map key. If the key is 
a string literal, please add single quotes around it. Otherwise, did you mean 
one of the following column(s)? ...
```
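
A minimal sketch (hypothetical view and column names) of the query shape the
new message targets: `m[a]` makes the analyzer look for a column named `a`,
while `m['a']` is the intended string-literal map key.

```
import org.apache.spark.sql.SparkSession

object MapKeyQuoteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("map-key-sketch").getOrCreate()
    spark.sql("SELECT map('a', 1) AS m").createOrReplaceTempView("t")
    spark.sql("SELECT m['a'] FROM t").show()   // resolves: the key is a string literal
    // spark.sql("SELECT m[a] FROM t").show()  // fails: 'a' is treated as a column reference
    spark.stop()
  }
}
```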

### Why are the changes needed?

Error message improvement

### Does this PR introduce _any_ user-facing change?

Yes, but trivial: an improvement to the error message for an unresolved map 
key column.

### How was this patch tested?

New UT

Closes #36896 from gengliangwang/unreslovedMapKey.

Authored-by: Gengliang Wang 
Signed-off-by: Gengliang Wang 
---
 core/src/main/resources/error/error-classes.json   |  8 -
 .../spark/sql/catalyst/analysis/Analyzer.scala |  4 +--
 .../sql/catalyst/analysis/CheckAnalysis.scala  | 39 +-
 .../expressions/complexTypeExtractors.scala|  2 +-
 .../spark/sql/errors/QueryCompilationErrors.scala  |  9 +++--
 .../sql/errors/QueryCompilationErrorsSuite.scala   | 12 +++
 6 files changed, 58 insertions(+), 16 deletions(-)

diff --git a/core/src/main/resources/error/error-classes.json 
b/core/src/main/resources/error/error-classes.json
index d4c0910c5ad..f9257b6c21b 100644
--- a/core/src/main/resources/error/error-classes.json
+++ b/core/src/main/resources/error/error-classes.json
@@ -352,6 +352,12 @@
 ],
 "sqlState" : "42000"
   },
+  "UNRESOLVED_MAP_KEY" : {
+"message" : [
+  "Cannot resolve column  as a map key. If the key is a string 
literal, please add single quotes around it. Otherwise, did you mean one of the 
following column(s)? []"
+],
+"sqlState" : "42000"
+  },
   "UNSUPPORTED_DATATYPE" : {
 "message" : [
   "Unsupported data type "
@@ -556,4 +562,4 @@
 ],
 "sqlState" : "4"
   }
-}
\ No newline at end of file
+}
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 931a0fcf77f..4d2dd175260 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -3420,8 +3420,8 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 
   i.userSpecifiedCols.map { col =>
 i.table.resolve(Seq(col), resolver).getOrElse(
-  throw QueryCompilationErrors.unresolvedColumnError(
-col, i.table.output.map(_.name), i.origin))
+  throw QueryCompilationErrors.unresolvedAttributeError(
+"UNRESOLVED_COLUMN", col, i.table.output.map(_.name), i.origin))
   }
 }
 
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index f9f8b590a31..759683b8c00 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -91,6 +91,26 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog {
 }
   }
 
+  private def isMapWithStringKey(e: Expression): Boolean = if (e.resolved) {
+e.dataType match {
+  case m: MapType => m.keyType.isInstanceOf[StringType]
+  case _ => false
+}
+  } else {
+false
+  }
+
+  private def failUnresolvedAttribute(
+  operator: LogicalPlan,
+  a: Attribute,
+  errorClass: String): Nothing = {
+val missingCol = a.sql
+val candidates = operator.inputSet.toSeq.map(_.qualifiedName)
+val orderedCandidates = StringUtils.orderStringsBySimilarity(missingCol, 
candidates)
+throw QueryCompilationErrors.unresolvedAttributeError(
+  errorClass, missingCol, orderedCandidates, a.origin)
+  }
+
   def checkAnalysis(plan: LogicalPlan): Unit = {
 // We transform up and 

[spark] branch master updated (550f5fe42dc -> 0a68435e1fb)

2022-06-20 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 550f5fe42dc [SPARK-39530][SS][TESTS] Fix `KafkaTestUtils` to support 
IPv6
 add 0a68435e1fb [SPARK-39445][SQL] Remove the window if windowExpressions 
is empty in column pruning

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/Optimizer.scala   |  5 +++--
 .../sql/catalyst/optimizer/ColumnPruningSuite.scala| 18 ++
 2 files changed, 21 insertions(+), 2 deletions(-)


[spark] branch master updated: [SPARK-39530][SS][TESTS] Fix `KafkaTestUtils` to support IPv6

2022-06-20 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 550f5fe42dc [SPARK-39530][SS][TESTS] Fix `KafkaTestUtils` to support 
IPv6
550f5fe42dc is described below

commit 550f5fe42dcb078e1cf9460ac7a9a7689918e92b
Author: Dongjoon Hyun 
AuthorDate: Mon Jun 20 22:07:12 2022 +0800

[SPARK-39530][SS][TESTS] Fix `KafkaTestUtils` to support IPv6

### What changes were proposed in this pull request?

This PR aims to fix `KafkaTestUtils` to support IPv6.

### Why are the changes needed?

Currently, the test suite is using a hard-coded `127.0.0.1` like the 
following.
```
props.put("listeners", "SASL_PLAINTEXT://127.0.0.1:0")
props.put("advertised.listeners", "SASL_PLAINTEXT://127.0.0.1:0")
```
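
A rough sketch of the underlying idea (this is not Spark's internal
`Utils.localHostNameForURI`): a host string embedded in a URI must bracket
IPv6 literals, e.g. `[::1]`, while IPv4 addresses and host names can be used
as-is.

```
import java.net.{Inet6Address, InetAddress}

object LocalHostForUriSketch {
  // Produce a URI-safe host string for the given address.
  def hostForUri(addr: InetAddress): String = addr match {
    case v6: Inet6Address => "[" + v6.getHostAddress + "]"
    case other => other.getHostAddress
  }

  def main(args: Array[String]): Unit = {
    val host = hostForUri(InetAddress.getLoopbackAddress)
    println(s"listeners=PLAINTEXT://$host:0")
  }
}
```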

### Does this PR introduce _any_ user-facing change?

No. This is a test-only change.

### How was this patch tested?

Pass the CIs.

Closes #36923 from dongjoon-hyun/SPARK-39530.

Authored-by: Dongjoon Hyun 
Signed-off-by: Yuming Wang 
---
 .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 27 +++---
 .../spark/streaming/kafka010/KafkaTestUtils.scala  | 12 ++
 2 files changed, 21 insertions(+), 18 deletions(-)

diff --git 
a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
 
b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
index c5d2a99d156..58b8778c963 100644
--- 
a/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
+++ 
b/connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
@@ -18,7 +18,7 @@
 package org.apache.spark.sql.kafka010
 
 import java.io.{File, IOException}
-import java.net.{InetAddress, InetSocketAddress}
+import java.net.InetSocketAddress
 import java.nio.charset.StandardCharsets
 import java.util.{Collections, Properties, UUID}
 import java.util.concurrent.TimeUnit
@@ -68,13 +68,13 @@ class KafkaTestUtils(
 
   private val JAVA_AUTH_CONFIG = "java.security.auth.login.config"
 
-  private val localCanonicalHostName = 
InetAddress.getLoopbackAddress().getCanonicalHostName()
-  logInfo(s"Local host name is $localCanonicalHostName")
+  private val localHostNameForURI = Utils.localHostNameForURI()
+  logInfo(s"Local host name is $localHostNameForURI")
 
   private var kdc: MiniKdc = _
 
   // Zookeeper related configurations
-  private val zkHost = localCanonicalHostName
+  private val zkHost = localHostNameForURI
   private var zkPort: Int = 0
   private val zkConnectionTimeout = 6
   private val zkSessionTimeout = 1
@@ -83,12 +83,12 @@ class KafkaTestUtils(
   private var zkClient: KafkaZkClient = _
 
   // Kafka broker related configurations
-  private val brokerHost = localCanonicalHostName
+  private val brokerHost = localHostNameForURI
   private var brokerPort = 0
   private var brokerConf: KafkaConfig = _
 
   private val brokerServiceName = "kafka"
-  private val clientUser = s"client/$localCanonicalHostName"
+  private val clientUser = s"client/$localHostNameForURI"
   private var clientKeytabFile: File = _
 
   // Kafka broker server
@@ -202,17 +202,17 @@ class KafkaTestUtils(
 assert(kdcReady, "KDC should be set up beforehand")
 val baseDir = Utils.createTempDir()
 
-val zkServerUser = s"zookeeper/$localCanonicalHostName"
+val zkServerUser = s"zookeeper/$localHostNameForURI"
 val zkServerKeytabFile = new File(baseDir, "zookeeper.keytab")
 kdc.createPrincipal(zkServerKeytabFile, zkServerUser)
 logDebug(s"Created keytab file: ${zkServerKeytabFile.getAbsolutePath()}")
 
-val zkClientUser = s"zkclient/$localCanonicalHostName"
+val zkClientUser = s"zkclient/$localHostNameForURI"
 val zkClientKeytabFile = new File(baseDir, "zkclient.keytab")
 kdc.createPrincipal(zkClientKeytabFile, zkClientUser)
 logDebug(s"Created keytab file: ${zkClientKeytabFile.getAbsolutePath()}")
 
-val kafkaServerUser = s"kafka/$localCanonicalHostName"
+val kafkaServerUser = s"kafka/$localHostNameForURI"
 val kafkaServerKeytabFile = new File(baseDir, "kafka.keytab")
 kdc.createPrincipal(kafkaServerKeytabFile, kafkaServerUser)
 logDebug(s"Created keytab file: 
${kafkaServerKeytabFile.getAbsolutePath()}")
@@ -489,7 +489,7 @@ class KafkaTestUtils(
   protected def brokerConfiguration: Properties = {
 val props = new Properties()
 props.put("broker.id", "0")
-props.put("listeners", s"PLAINTEXT://127.0.0.1:$brokerPort")
+props.put("listeners", s"PLAINTEXT://$localHostNameForURI:$brokerPort")
 props.put("log.dir", Utils.createTempDir().getAbsolutePath)
 props.put("zookeeper.connect", zkAddress)
 props.put("zookeeper.connection.timeout.ms", "6")
@@ -505,8 

[spark] branch master updated: [SPARK-39464][CORE][TESTS][FOLLOWUP] Use Utils.localHostNameForURI instead of Utils.localCanonicalHostName in tests

2022-06-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 540e695e70c [SPARK-39464][CORE][TESTS][FOLLOWUP] Use 
Utils.localHostNameForURI instead of Utils.localCanonicalHostName in tests
540e695e70c is described below

commit 540e695e70c8d53d70f1b74234877ac5733fae4b
Author: yangjie01 
AuthorDate: Mon Jun 20 01:41:04 2022 -0700

[SPARK-39464][CORE][TESTS][FOLLOWUP] Use Utils.localHostNameForURI instead 
of Utils.localCanonicalHostName in tests

### What changes were proposed in this pull request?
This PR aims to use `Utils.localHostNameForURI` instead of 
`Utils.localCanonicalHostName` in the following suites, which were changed in 
https://github.com/apache/spark/pull/36866:

- `MasterSuite`
- `MasterWebUISuite`
- `RocksDBBackendHistoryServerSuite`

### Why are the changes needed?
These test cases fail when we run with `SPARK_LOCAL_IP=::1` and 
`-Djava.net.preferIPv6Addresses=true`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass GA
- Manual test:

1.  `export SPARK_LOCAL_IP=::1`
```
echo $SPARK_LOCAL_IP
::1
```

2. add `-Djava.net.preferIPv6Addresses=true` to MAVEN_OPTS, for example:
```
diff --git a/pom.xml b/pom.xml
index 1ce3b43faf..3356622985 100644
--- a/pom.xml
+++ b/pom.xml
@@ -2943,7 +2943,7 @@
   **/*Suite.java
 
 
${project.build.directory}/surefire-reports
--ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g 
-XX:ReservedCodeCacheSize=${CodeCacheSize} ${extraJavaTestArgs} 
-Dio.netty.tryReflectionSetAccessible=true
+-ea -Xmx4g -Xss4m -XX:MaxMetaspaceSize=2g 
-XX:ReservedCodeCacheSize=${CodeCacheSize} ${extraJavaTestArgs} 
-Dio.netty.tryReflectionSetAccessible=true 
-Djava.net.preferIPv6Addresses=true
 
   

[spark] branch master updated: [SPARK-39491][YARN] Fix yarn module compilation error with `-Phadoop-2` profile

2022-06-20 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1128cc08132 [SPARK-39491][YARN] Fix yarn module compilation error with 
`-Phadoop-2` profile
1128cc08132 is described below

commit 1128cc08132383ad35e383644f3b378d930b6c4c
Author: yangjie01 
AuthorDate: Mon Jun 20 17:30:34 2022 +0900

[SPARK-39491][YARN] Fix yarn module compilation error with `-Phadoop-2` 
profile

### What changes were proposed in this pull request?
Building the `yarn` module with the `-Phadoop-2` profile currently fails as 
follows:

```
[ERROR] [Error] 
/basedir/spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala:454:
 value DECOMMISSIONING is not a member of object 
org.apache.hadoop.yarn.api.records.NodeState
```

The above compilation error occurs because `hadoop-2.7` does not support 
`NodeState.DECOMMISSIONING`, so this PR changes the code to use a string 
comparison instead, and the test suite `Test YARN container decommissioning` in 
`YarnAllocatorSuite` only runs when `VersionUtils.isHadoop3` is true.
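
A minimal sketch of the workaround (a hypothetical helper, not the exact
`YarnAllocator` change): comparing the state's name as a string compiles
against Hadoop 2.7, where the `NodeState.DECOMMISSIONING` constant does not
exist.

```
import org.apache.hadoop.yarn.api.records.NodeState

object DecommissioningCheckSketch {
  def isDecommissioning(state: NodeState): Boolean =
    state.toString == "DECOMMISSIONING"
}
```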

### Why are the changes needed?
Fix yarn module compilation error with `-Phadoop-2` profile

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GA
- Manual test

run `mvn clean install -DskipTests -pl resource-managers/yarn -am -Pyarn 
-Phadoop-2`

**Before**

```
[ERROR] [Error] 
/basedir/spark-source/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala:454:
 value DECOMMISSIONING is not a member of object 
org.apache.hadoop.yarn.api.records.NodeState
[ERROR] one error found
[INFO] 

[INFO] Reactor Summary for Spark Project Parent POM 3.4.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS [  
3.252 s]
[INFO] Spark Project Tags . SUCCESS [  
5.735 s]
[INFO] Spark Project Local DB . SUCCESS [  
5.492 s]
[INFO] Spark Project Networking ... SUCCESS [  
8.251 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  
6.334 s]
[INFO] Spark Project Unsafe ... SUCCESS [ 
15.326 s]
[INFO] Spark Project Launcher . SUCCESS [  
4.905 s]
[INFO] Spark Project Core . SUCCESS [02:07 
min]
[INFO] Spark Project YARN Shuffle Service . SUCCESS [ 
17.382 s]
[INFO] Spark Project YARN . FAILURE [  
7.718 s]
[INFO] 

[INFO] BUILD FAILURE
[INFO] 

[INFO] Total time:  03:22 min
[INFO] Finished at: 2022-06-20T11:57:54+08:00
[INFO] 


```

**After**

```
[INFO] Reactor Summary for Spark Project Parent POM 3.4.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS [  
5.451 s]
[INFO] Spark Project Tags . SUCCESS [  
5.739 s]
[INFO] Spark Project Local DB . SUCCESS [  
5.908 s]
[INFO] Spark Project Networking ... SUCCESS [  
8.310 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  
5.857 s]
[INFO] Spark Project Unsafe ... SUCCESS [  
8.439 s]
[INFO] Spark Project Launcher . SUCCESS [  
4.795 s]
[INFO] Spark Project Core . SUCCESS [02:36 
min]
[INFO] Spark Project YARN Shuffle Service . SUCCESS [ 
15.044 s]
[INFO] Spark Project YARN . SUCCESS [ 
32.517 s]
[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time:  04:09 min
[INFO] Finished at: 2022-06-20T13:10:04+08:00
[INFO] 

```

run `mvn clean install  -pl resource-managers/yarn -Pyarn -Phadoop-2 
-Dtest=none -DwildcardSuites=org.apache.spark.deploy.yarn.YarnAllocatorSuite`

```
- Test YARN container decommissioning !!! CANCELED !!!
  org.apache.spark.util.VersionUtils.isHadoop3 was false 
(YarnAllocatorSuite.scala:749)
Run completed in 2 seconds, 140 milliseconds.

[spark] branch master updated: [SPARK-39516][INFRA] Set a scheduled build for branch-3.3

2022-06-20 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 38e175074e7 [SPARK-39516][INFRA] Set a scheduled build for branch-3.3
38e175074e7 is described below

commit 38e175074e77f07b6de4ce372eb64e8e22196061
Author: Hyukjin Kwon 
AuthorDate: Sun Jun 19 23:25:25 2022 -0700

[SPARK-39516][INFRA] Set a scheduled build for branch-3.3

### What changes were proposed in this pull request?

This PR creates a scheduled job for branch-3.3.

### Why are the changes needed?

To make sure branch-3.3 builds fine.

### Does this PR introduce _any_ user-facing change?
No, dev-only.

### How was this patch tested?

This is a copy of the branch-3.2 setup, so it should work. Also, the scheduled 
jobs are already broken now; I will fix them in parallel to recover them.

Closes #36914 from HyukjinKwon/SPARK-39516.

Lead-authored-by: Hyukjin Kwon 
Co-authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 .github/workflows/build_and_test.yml | 8 
 1 file changed, 8 insertions(+)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 844fd130d50..81381eb16d4 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -37,6 +37,8 @@ on:
 - cron: '0 13 * * *'
 # Java 17
 - cron: '0 16 * * *'
+# branch-3.3
+- cron: '0 19 * * *'
   workflow_call:
 inputs:
   ansi_enabled:
@@ -99,6 +101,12 @@ jobs:
   echo '::set-output name=type::scheduled'
   echo '::set-output name=envs::{"SKIP_MIMA": "true", "SKIP_UNIDOC": 
"true"}'
   echo '::set-output name=hadoop::hadoop3'
+elif [ "${{ github.event.schedule }}" = "0 19 * * *" ]; then
+  echo '::set-output name=java::8'
+  echo '::set-output name=branch::branch-3.3'
+  echo '::set-output name=type::scheduled'
+  echo '::set-output name=envs::{"SCALA_PROFILE": "scala2.13"}'
+  echo '::set-output name=hadoop::hadoop3'
 else
   echo '::set-output name=java::8'
   echo '::set-output name=branch::master'  # NOTE: UPDATE THIS WHEN 
CUTTING BRANCH

