[spark] branch master updated (b2e38e16bfc -> fd0498f81df)

2022-09-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from b2e38e16bfc [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`
 add fd0498f81df [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

No new revisions were added by this update.

Summary of changes:
 .../deploy/k8s/integrationtest/DecommissionSuite.scala  | 13 +++--
 .../spark/deploy/k8s/integrationtest/KubernetesSuite.scala  |  1 +
 2 files changed, 8 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.3 updated: [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

2022-09-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 399c397e703 [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test
399c397e703 is described below

commit 399c397e7035665c928b1d439a860f9e7b1ce3b3
Author: Dongjoon Hyun 
AuthorDate: Thu Sep 1 09:34:55 2022 -0700

[SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

### What changes were proposed in this pull request?

This PR aims to add a new test tag, `decomTestTag`, to K8s Integration Test.

### Why are the changes needed?

Decommission-related tests took over 6 minutes (`363s`) in total. It would be helpful if we could run them selectively.
```
[info] - Test basic decommissioning (44 seconds, 51 milliseconds)
[info] - Test basic decommissioning with shuffle cleanup (44 seconds, 450 milliseconds)
[info] - Test decommissioning with dynamic allocation & shuffle cleanups (2 minutes, 43 seconds)
[info] - Test decommissioning timeouts (44 seconds, 389 milliseconds)
[info] - SPARK-37576: Rolling decommissioning (1 minute, 8 seconds)
```
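
For context, the sketch below (illustrative only, not part of this patch; `DecomTag` and `ExampleSuite` are made-up names) shows the ScalaTest mechanism behind `k8sTestTag` and the new `decomTestTag`: a `Tag` object passed as an extra argument to `test(...)` lets the runner include or exclude that test by the tag's string name, which is how `-Dtest.exclude.tags=minikube,local,decom` skips the decommission tests.
```
import org.scalatest.Tag
import org.scalatest.funsuite.AnyFunSuite

// Tag object analogous to decomTestTag; "decom" is the string the build's
// include/exclude properties match against.
object DecomTag extends Tag("decom")

class ExampleSuite extends AnyFunSuite {
  test("fast test") {
    assert(1 + 1 == 2)
  }

  // Extra tags are appended after the test name, exactly as this PR does with
  // test("Test basic decommissioning", k8sTestTag, decomTestTag).
  test("slow decommission-style test", DecomTag) {
    assert(true)
  }
}
```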

### Does this PR introduce _any_ user-facing change?

No, this is a test-only change.

### How was this patch tested?

Pass the CIs and test manually.
```
$ build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests \
-Dspark.kubernetes.test.deployMode=docker-desktop "kubernetes-integration-tests/test" \
-Dtest.exclude.tags=minikube,local,decom
...
[info] KubernetesSuite:
[info] - Run SparkPi with no resources (12 seconds, 441 milliseconds)
[info] - Run SparkPi with no resources & statefulset allocation (11 seconds, 949 milliseconds)
[info] - Run SparkPi with a very long application name. (11 seconds, 999 milliseconds)
[info] - Use SparkLauncher.NO_RESOURCE (11 seconds, 846 milliseconds)
[info] - Run SparkPi with a master URL without a scheme. (11 seconds, 176 milliseconds)
[info] - Run SparkPi with an argument. (11 seconds, 868 milliseconds)
[info] - Run SparkPi with custom labels, annotations, and environment variables. (11 seconds, 858 milliseconds)
[info] - All pods have the same service account by default (11 seconds, 5 milliseconds)
[info] - Run extraJVMOptions check on driver (5 seconds, 757 milliseconds)
[info] - Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j2.properties (12 seconds, 467 milliseconds)
[info] - Run SparkPi with env and mount secrets. (21 seconds, 119 milliseconds)
[info] - Run PySpark on simple pi.py example (13 seconds, 129 milliseconds)
[info] - Run PySpark to test a pyfiles example (14 seconds, 937 milliseconds)
[info] - Run PySpark with memory customization (12 seconds, 195 milliseconds)
[info] - Run in client mode. (11 seconds, 343 milliseconds)
[info] - Start pod creation from template (11 seconds, 975 milliseconds)
[info] - SPARK-38398: Schedule pod creation from template (11 seconds, 901 milliseconds)
[info] - Run SparkR on simple dataframe.R example (14 seconds, 305 milliseconds)
...
```

Closes #37755 from dongjoon-hyun/SPARK-40304.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit fd0498f81df72c196f19a5b26053660f6f3f4d70)
Signed-off-by: Dongjoon Hyun 
---
 .../deploy/k8s/integrationtest/DecommissionSuite.scala  | 13 +++--
 .../spark/deploy/k8s/integrationtest/KubernetesSuite.scala  |  1 +
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala
index 5d1a57fb46e..81f4660afe9 100644
--- a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala
+++ b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala
@@ -34,7 +34,7 @@ import org.apache.spark.internal.config.PLUGINS
 private[spark] trait DecommissionSuite { k8sSuite: KubernetesSuite =>
 
   import DecommissionSuite._
-  import KubernetesSuite.k8sTestTag
+  import KubernetesSuite.{decomTestTag, k8sTestTag}
 
   def runDecommissionTest(f: () => Unit): Unit = {
 val logConfFilePath = s"${sparkHomeDir.toFile}/conf/log4j2.properties"
@@ -61,7 +61,7 @@ private[spark] trait DecommissionSuite { k8sSuite: KubernetesSuite =>
 }
   }
 
-  test("Test basic decommissioning", k8sTestTag) {
+  test("Test basic decommissioning", k8sTestTag, decomTestTag) {
 runDecommissionTest(() => {
 

[spark] branch branch-3.3 updated: [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

2022-09-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 7c19df6fb2f [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`
7c19df6fb2f is described below

commit 7c19df6fb2f684a80ea3366fd365a6bbc13421b3
Author: Dongjoon Hyun 
AuthorDate: Thu Sep 1 09:26:45 2022 -0700

[SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

This PR aims to do the following:
1. Add the `YuniKornSuite` integration test suite, which extends `KubernetesSuite`, to run on the Apache YuniKorn scheduler.
2. Support a `--default-exclude-tags` option to override `test.default.exclude.tags`.

To improve test coverage.

No. This is a test suite addition.

Since this requires an Apache YuniKorn installation, the test suite is disabled by default,
so the CI K8s integration tests should pass without running this suite.

In order to run the tests, we need to override `test.default.exclude.tags` as follows.

**SBT**
```
$ build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests \
-Dspark.kubernetes.test.deployMode=docker-desktop "kubernetes-integration-tests/test" \
-Dtest.exclude.tags=minikube,local \
-Dtest.default.exclude.tags=
```

**MAVEN**
```
$ dev/dev-run-integration-tests.sh --deploy-mode docker-desktop \
--exclude-tag minikube,local \
--default-exclude-tags ''
```

Closes #37753 from dongjoon-hyun/SPARK-40302.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit b2e38e16bfc547a62957e0a67085985b3c65d525)
Signed-off-by: Dongjoon Hyun 
---
 project/SparkBuild.scala   |  3 ++-
 .../dev/dev-run-integration-tests.sh   | 10 
 .../kubernetes/integration-tests/pom.xml   |  5 ++--
 .../deploy/k8s/integrationtest/YuniKornTag.java| 27 
 .../deploy/k8s/integrationtest/YuniKornSuite.scala | 29 ++
 5 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 934fa4a1fdd..f830e64edfc 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -1135,7 +1135,8 @@ object CopyDependencies {
 
 object TestSettings {
   import BuildCommons._
-  private val defaultExcludedTags = Seq("org.apache.spark.tags.ChromeUITest")
+  private val defaultExcludedTags = Seq("org.apache.spark.tags.ChromeUITest",
+"org.apache.spark.deploy.k8s.integrationtest.YuniKornTag")
 
   lazy val settings = Seq (
 // Fork new JVMs for tests and set Java options for those
diff --git a/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh b/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh
index 5f94203c0e2..f5f93adeddf 100755
--- a/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh
+++ b/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh
@@ -37,6 +37,7 @@ SERVICE_ACCOUNT=
 CONTEXT=
 INCLUDE_TAGS="k8s"
 EXCLUDE_TAGS=
+DEFAULT_EXCLUDE_TAGS="N/A"
 JAVA_VERSION="8"
 BUILD_DEPENDENCIES_MVN_FLAG="-am"
 HADOOP_PROFILE="hadoop-3"
@@ -101,6 +102,10 @@ while (( "$#" )); do
   EXCLUDE_TAGS="$2"
   shift
   ;;
+--default-exclude-tags)
+  DEFAULT_EXCLUDE_TAGS="$2"
+  shift
+  ;;
 --base-image-name)
   BASE_IMAGE_NAME="$2"
   shift
@@ -180,6 +185,11 @@ then
   properties=( ${properties[@]} -Dtest.exclude.tags=$EXCLUDE_TAGS )
 fi
 
+if [ "$DEFAULT_EXCLUDE_TAGS" != "N/A" ];
+then
+  properties=( ${properties[@]} -Dtest.default.exclude.tags=$DEFAULT_EXCLUDE_TAGS )
+fi
+
 BASE_IMAGE_NAME=${BASE_IMAGE_NAME:-spark}
 JVM_IMAGE_NAME=${JVM_IMAGE_NAME:-${BASE_IMAGE_NAME}}
 PYTHON_IMAGE_NAME=${PYTHON_IMAGE_NAME:-${BASE_IMAGE_NAME}-py}
diff --git a/resource-managers/kubernetes/integration-tests/pom.xml b/resource-managers/kubernetes/integration-tests/pom.xml
index 40e578f9a7e..516c92b1df6 100644
--- a/resource-managers/kubernetes/integration-tests/pom.xml
+++ b/resource-managers/kubernetes/integration-tests/pom.xml
@@ -46,6 +46,7 @@
 
Dockerfile.java17
 
 
+org.apache.spark.deploy.k8s.integrationtest.YuniKornTag
 
 **/*Volcano*.scala
   
@@ -137,7 +138,7 @@
 ${spark.kubernetes.test.dockerFile}
 
 --test-exclude-tags
-"${test.exclude.tags}"
+"${test.exclude.tags},${test.default.exclude.tags}"
   
 
   
@@ -179,7 +180,7 @@
 
${spark.kubernetes.test.pythonImage}
 
${spark.kubernetes.test.rImage}
   
-  ${test.exclude.tags}
+  ${test.exclude.tags},${test.default.exclude.tags}
   ${test.include.tags}
 
 
diff --git 
a/resource-managers/kubernete

[spark] branch master updated (f4ff2d16483 -> b2e38e16bfc)

2022-09-01 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from f4ff2d16483 [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved
 add b2e38e16bfc [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

No new revisions were added by this update.

Summary of changes:
 project/SparkBuild.scala   |  1 +
 .../integration-tests/dev/dev-run-integration-tests.sh | 10 ++
 resource-managers/kubernetes/integration-tests/pom.xml |  5 +++--
 .../spark/deploy/k8s/integrationtest/YuniKornTag.java  | 13 +
 .../{VolcanoSuite.scala => YuniKornSuite.scala}| 14 +-
 5 files changed, 24 insertions(+), 19 deletions(-)
 copy common/tags/src/test/java/org/apache/spark/tags/SlowHiveTest.java => resource-managers/kubernetes/integration-tests/src/test/java/org/apache/spark/deploy/k8s/integrationtest/YuniKornTag.java (89%)
 copy resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/{VolcanoSuite.scala => YuniKornSuite.scala} (76%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved

2022-09-01 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f4ff2d16483 [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved
f4ff2d16483 is described below

commit f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d
Author: Maryann Xue 
AuthorDate: Thu Sep 1 22:03:58 2022 +0800

[SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved

### What changes were proposed in this pull request?

This PR fixes a bug where a CTE reference cannot be resolved if the reference occurs in an inner CTE definition nested in the FROM clause of the outer CTE's main body. E.g.,
```
WITH cte_outer AS (
  SELECT 1
)
SELECT * FROM (
  WITH cte_inner AS (
SELECT * FROM cte_outer
  )
  SELECT * FROM cte_inner
)
```

The fix changes `CTESubstitution`'s traversal order from `resolveOperatorsUpWithPruning` to `resolveOperatorsDownWithPruning` and also recursively calls `traverseAndSubstituteCTE` on the CTE main body.
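
To see why the traversal order matters, here is a toy model (plain Scala, not Catalyst; `With`, `Ref`, `Table`, and `substitute` are invented for illustration): handling the outer `WITH` first puts its definitions in scope before the nested `WITH` in the main body is substituted, which is exactly what a purely bottom-up pass misses.
```
sealed trait Plan
case class With(defs: Map[String, Plan], child: Plan) extends Plan
case class Ref(name: String) extends Plan
case class Table(name: String) extends Plan

object CteScopeDemo {
  // Top-down substitution: definitions collected at an outer WITH are passed
  // down, so a reference inside a nested WITH's definitions can resolve to them.
  def substitute(p: Plan, scope: Map[String, Plan]): Plan = p match {
    case With(defs, child) =>
      val resolved = defs.map { case (n, d) => n -> substitute(d, scope) }
      substitute(child, scope ++ resolved)
    case Ref(name) => scope.getOrElse(name, sys.error(s"unresolved CTE $name"))
    case other     => other
  }

  def main(args: Array[String]): Unit = {
    // cte_outer is defined at the top level; cte_inner, nested in the main
    // body, refers to it: the shape of the failing query above.
    val plan = With(Map("cte_outer" -> Table("t")),
      With(Map("cte_inner" -> Ref("cte_outer")), Ref("cte_inner")))
    assert(substitute(plan, Map.empty) == Table("t"))
  }
}
```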

### Why are the changes needed?

Bug fix. Without the fix, an `AnalysisException` would be thrown for CTE queries like the one mentioned above.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added UTs.

Closes #37751 from maryannxue/spark-40297.

Authored-by: Maryann Xue 
Signed-off-by: Wenchen Fan 
---
 .../sql/catalyst/analysis/CTESubstitution.scala|  30 ++--
 .../test/resources/sql-tests/inputs/cte-nested.sql |  59 +++-
 .../resources/sql-tests/results/cte-legacy.sql.out |  80 +++
 .../resources/sql-tests/results/cte-nested.sql.out |  79 ++
 .../sql-tests/results/cte-nonlegacy.sql.out|  79 ++
 .../org/apache/spark/sql/CTEInlineSuite.scala  | 160 -
 6 files changed, 476 insertions(+), 11 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
index 62ebfa83431..6a4562450b9 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
@@ -56,7 +56,7 @@ object CTESubstitution extends Rule[LogicalPlan] {
   case _ => false
 }
 val cteDefs = ArrayBuffer.empty[CTERelationDef]
-val (substituted, lastSubstituted) =
+val (substituted, firstSubstituted) =
  LegacyBehaviorPolicy.withName(conf.getConf(LEGACY_CTE_PRECEDENCE_POLICY)) match {
 case LegacyBehaviorPolicy.EXCEPTION =>
   assertNoNameConflictsInCTE(plan)
@@ -68,12 +68,17 @@ object CTESubstitution extends Rule[LogicalPlan] {
 }
 if (cteDefs.isEmpty) {
   substituted
-} else if (substituted eq lastSubstituted.get) {
+} else if (substituted eq firstSubstituted.get) {
   WithCTE(substituted, cteDefs.toSeq)
 } else {
   var done = false
   substituted.resolveOperatorsWithPruning(_ => !done) {
-case p if p eq lastSubstituted.get =>
+case p if p eq firstSubstituted.get =>
+  // `firstSubstituted` is the parent of all other CTEs (if any).
+  done = true
+  WithCTE(p, cteDefs.toSeq)
+case p if p.children.count(_.containsPattern(CTE)) > 1 =>
+  // This is the first common parent of all CTEs.
   done = true
   WithCTE(p, cteDefs.toSeq)
   }
@@ -181,21 +186,28 @@ object CTESubstitution extends Rule[LogicalPlan] {
   isCommand: Boolean,
   outerCTEDefs: Seq[(String, CTERelationDef)],
cteDefs: ArrayBuffer[CTERelationDef]): (LogicalPlan, Option[LogicalPlan]) = {
-var lastSubstituted: Option[LogicalPlan] = None
-val newPlan = plan.resolveOperatorsUpWithPruning(
+var firstSubstituted: Option[LogicalPlan] = None
+val newPlan = plan.resolveOperatorsDownWithPruning(
 _.containsAnyPattern(UNRESOLVED_WITH, PLAN_EXPRESSION)) {
   case UnresolvedWith(child: LogicalPlan, relations) =>
 val resolvedCTERelations =
-  resolveCTERelations(relations, isLegacy = false, isCommand, outerCTEDefs, cteDefs)
-lastSubstituted = Some(substituteCTE(child, isCommand, resolvedCTERelations))
-lastSubstituted.get
+  resolveCTERelations(relations, isLegacy = false, isCommand, outerCTEDefs, cteDefs) ++
+outerCTEDefs
+val substituted = substituteCTE(
+  traverseAndSubstituteCTE(child, isCommand, resolvedCTERelations, cteDefs)._1,
+  isCommand,
+  resolvedCTERelations)
+if (firstSubstituted.isEmpty) {
+  firstSubstituted = Some(substituted)
+}
+   

[spark] branch master updated: [SPARK-40279][DOC] Document spark.yarn.report.interval

2022-09-01 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 25b51aa51fb [SPARK-40279][DOC] Document spark.yarn.report.interval
25b51aa51fb is described below

commit 25b51aa51fb83af69bef481d228efc7912f1a2c0
Author: Luca Canali 
AuthorDate: Thu Sep 1 08:30:10 2022 -0500

[SPARK-40279][DOC] Document spark.yarn.report.interval

### What changes were proposed in this pull request?

This proposes to document the configuration parameter `spark.yarn.report.interval`: the interval between reports of the current Spark job status in cluster mode.
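
For reference (an illustrative invocation, not part of this patch), such a property would typically be supplied at submit time, e.g. `spark-submit --master yarn --deploy-mode cluster --conf spark.yarn.report.interval=5s ...`, so that, per the documented description, the job status is reported every 5 seconds instead of the default `1s`.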

### Why are the changes needed?
Document a configuration parameter for Spark on YARN.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Not relevant.

Closes #37731 from LucaCanali/docReportInterval.

Authored-by: Luca Canali 
Signed-off-by: Sean Owen 
---
 docs/running-on-yarn.md | 8 
 1 file changed, 8 insertions(+)

diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 48b0c7dc315..4c85bc3ceeb 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -624,6 +624,14 @@ To use a custom metrics.properties for the application master and executors, upd
   
   2.4.0
 
+<tr>
+  <td><code>spark.yarn.report.interval</code></td>
+  <td><code>1s</code></td>
+  <td>
+    Interval between reports of the current Spark job status in cluster mode.
+  </td>
+  <td>0.9.0</td>
+</tr>
 
 
  Available patterns for SHS custom executor log URL


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org