[spark] branch master updated (b2e38e16bfc -> fd0498f81df)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

  from b2e38e16bfc [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`
   add fd0498f81df [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

No new revisions were added by this update.

Summary of changes:
 .../deploy/k8s/integrationtest/DecommissionSuite.scala     | 13 +++++++------
 .../spark/deploy/k8s/integrationtest/KubernetesSuite.scala |  1 +
 2 files changed, 8 insertions(+), 6 deletions(-)
[spark] branch branch-3.3 updated: [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 399c397e703 [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

399c397e703 is described below

commit 399c397e7035665c928b1d439a860f9e7b1ce3b3
Author: Dongjoon Hyun
AuthorDate: Thu Sep 1 09:34:55 2022 -0700

    [SPARK-40304][K8S][TESTS] Add `decomTestTag` to K8s Integration Test

    ### What changes were proposed in this pull request?

    This PR aims to add a new test tag, `decomTestTag`, to the K8s Integration Test.

    ### Why are the changes needed?

    Decommission-related tests take over 6 minutes (`363s`) in total, so it would be helpful if we could run them selectively.
    ```
    [info] - Test basic decommissioning (44 seconds, 51 milliseconds)
    [info] - Test basic decommissioning with shuffle cleanup (44 seconds, 450 milliseconds)
    [info] - Test decommissioning with dynamic allocation & shuffle cleanups (2 minutes, 43 seconds)
    [info] - Test decommissioning timeouts (44 seconds, 389 milliseconds)
    [info] - SPARK-37576: Rolling decommissioning (1 minute, 8 seconds)
    ```

    ### Does this PR introduce _any_ user-facing change?

    No, this is a test-only change.

    ### How was this patch tested?

    Pass the CIs and test manually.
    ```
    $ build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests \
        -Dspark.kubernetes.test.deployMode=docker-desktop "kubernetes-integration-tests/test" \
        -Dtest.exclude.tags=minikube,local,decom
    ...
    [info] KubernetesSuite:
    [info] - Run SparkPi with no resources (12 seconds, 441 milliseconds)
    [info] - Run SparkPi with no resources & statefulset allocation (11 seconds, 949 milliseconds)
    [info] - Run SparkPi with a very long application name. (11 seconds, 999 milliseconds)
    [info] - Use SparkLauncher.NO_RESOURCE (11 seconds, 846 milliseconds)
    [info] - Run SparkPi with a master URL without a scheme. (11 seconds, 176 milliseconds)
    [info] - Run SparkPi with an argument. (11 seconds, 868 milliseconds)
    [info] - Run SparkPi with custom labels, annotations, and environment variables. (11 seconds, 858 milliseconds)
    [info] - All pods have the same service account by default (11 seconds, 5 milliseconds)
    [info] - Run extraJVMOptions check on driver (5 seconds, 757 milliseconds)
    [info] - Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j2.properties (12 seconds, 467 milliseconds)
    [info] - Run SparkPi with env and mount secrets. (21 seconds, 119 milliseconds)
    [info] - Run PySpark on simple pi.py example (13 seconds, 129 milliseconds)
    [info] - Run PySpark to test a pyfiles example (14 seconds, 937 milliseconds)
    [info] - Run PySpark with memory customization (12 seconds, 195 milliseconds)
    [info] - Run in client mode. (11 seconds, 343 milliseconds)
    [info] - Start pod creation from template (11 seconds, 975 milliseconds)
    [info] - SPARK-38398: Schedule pod creation from template (11 seconds, 901 milliseconds)
    [info] - Run SparkR on simple dataframe.R example (14 seconds, 305 milliseconds)
    ...
    ```

    Closes #37755 from dongjoon-hyun/SPARK-40304.
    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit fd0498f81df72c196f19a5b26053660f6f3f4d70)
    Signed-off-by: Dongjoon Hyun
---
 .../deploy/k8s/integrationtest/DecommissionSuite.scala     | 13 +++++++------
 .../spark/deploy/k8s/integrationtest/KubernetesSuite.scala |  1 +
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala
index 5d1a57fb46e..81f4660afe9 100644
--- a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala
+++ b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala
@@ -34,7 +34,7 @@ import org.apache.spark.internal.config.PLUGINS
 private[spark] trait DecommissionSuite { k8sSuite: KubernetesSuite =>
 
   import DecommissionSuite._
-  import KubernetesSuite.k8sTestTag
+  import KubernetesSuite.{decomTestTag, k8sTestTag}
 
   def runDecommissionTest(f: () => Unit): Unit = {
     val logConfFilePath = s"${sparkHomeDir.toFile}/conf/log4j2.properties"
@@ -61,7 +61,7 @@ private[spark] trait DecommissionSuite { k8sSuite: KubernetesSuite =>
     }
   }
 
-  test("Test basic decommissioning", k8sTestTag) {
+  test("Test basic decommissioning", k8sTestTag, decomTestTag) {
     runDecommissionTest(() => {
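[Editor's note] The single line added to `KubernetesSuite.scala` is not shown in this excerpt. A minimal sketch of what such a ScalaTest tag definition usually looks like follows; the tag name `decom` matches the `-Dtest.exclude.tags=minikube,local,decom` invocation above, but the exact definition here is an assumption, not the committed code:

```scala
import org.scalatest.Tag

object KubernetesSuite {
  // Tag shared by all K8s integration tests (pre-existing).
  val k8sTestTag: Tag = Tag("k8s")
  // Sketch of the new tag for decommission-related tests; tests carrying it
  // can be skipped with -Dtest.exclude.tags=decom.
  val decomTestTag: Tag = Tag("decom")
}
```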
[spark] branch branch-3.3 updated: [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 7c19df6fb2f [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

7c19df6fb2f is described below

commit 7c19df6fb2f684a80ea3366fd365a6bbc13421b3
Author: Dongjoon Hyun
AuthorDate: Thu Sep 1 09:26:45 2022 -0700

    [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

    ### What changes were proposed in this pull request?

    This PR aims at the following:
    1. Add a `YuniKornSuite` integration test suite which extends `KubernetesSuite` to run on the Apache YuniKorn scheduler.
    2. Support a `--default-exclude-tags` command-line option to override `test.default.exclude.tags`.

    ### Why are the changes needed?

    To improve test coverage.

    ### Does this PR introduce _any_ user-facing change?

    No. This is a test suite addition.

    ### How was this patch tested?

    Since this requires an `Apache YuniKorn` installation, the test suite is disabled by default, so the CI K8s integration test should pass without running this suite.

    In order to run the tests, we need to override `test.default.exclude.tags` like the following.

    **SBT**
    ```
    $ build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests \
        -Dspark.kubernetes.test.deployMode=docker-desktop "kubernetes-integration-tests/test" \
        -Dtest.exclude.tags=minikube,local \
        -Dtest.default.exclude.tags=
    ```

    **MAVEN**
    ```
    $ dev/dev-run-integration-tests.sh --deploy-mode docker-desktop \
        --exclude-tag minikube,local \
        --default-exclude-tags ''
    ```

    Closes #37753 from dongjoon-hyun/SPARK-40302.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit b2e38e16bfc547a62957e0a67085985b3c65d525)
    Signed-off-by: Dongjoon Hyun
---
 project/SparkBuild.scala                               |  3 ++-
 .../dev/dev-run-integration-tests.sh                   | 10 ++++++++++
 .../kubernetes/integration-tests/pom.xml               |  5 +++--
 .../deploy/k8s/integrationtest/YuniKornTag.java        | 27 ++++++++++++++++++
 .../deploy/k8s/integrationtest/YuniKornSuite.scala     | 29 ++++++++++++++++++
 5 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 934fa4a1fdd..f830e64edfc 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -1135,7 +1135,8 @@ object CopyDependencies {
 object TestSettings {
   import BuildCommons._
 
-  private val defaultExcludedTags = Seq("org.apache.spark.tags.ChromeUITest")
+  private val defaultExcludedTags = Seq("org.apache.spark.tags.ChromeUITest",
+    "org.apache.spark.deploy.k8s.integrationtest.YuniKornTag")
 
   lazy val settings = Seq (
     // Fork new JVMs for tests and set Java options for those
diff --git a/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh b/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh
index 5f94203c0e2..f5f93adeddf 100755
--- a/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh
+++ b/resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh
@@ -37,6 +37,7 @@ SERVICE_ACCOUNT=
 CONTEXT=
 INCLUDE_TAGS="k8s"
 EXCLUDE_TAGS=
+DEFAULT_EXCLUDE_TAGS="N/A"
 JAVA_VERSION="8"
 BUILD_DEPENDENCIES_MVN_FLAG="-am"
 HADOOP_PROFILE="hadoop-3"
@@ -101,6 +102,10 @@ while (( "$#" )); do
       EXCLUDE_TAGS="$2"
       shift
       ;;
+    --default-exclude-tags)
+      DEFAULT_EXCLUDE_TAGS="$2"
+      shift
+      ;;
     --base-image-name)
       BASE_IMAGE_NAME="$2"
       shift
@@ -180,6 +185,11 @@
 then
   properties=( ${properties[@]} -Dtest.exclude.tags=$EXCLUDE_TAGS )
 fi
 
+if [ "$DEFAULT_EXCLUDE_TAGS" != "N/A" ];
+then
+  properties=( ${properties[@]} -Dtest.default.exclude.tags=$DEFAULT_EXCLUDE_TAGS )
+fi
+
 BASE_IMAGE_NAME=${BASE_IMAGE_NAME:-spark}
 JVM_IMAGE_NAME=${JVM_IMAGE_NAME:-${BASE_IMAGE_NAME}}
 PYTHON_IMAGE_NAME=${PYTHON_IMAGE_NAME:-${BASE_IMAGE_NAME}-py}
diff --git a/resource-managers/kubernetes/integration-tests/pom.xml b/resource-managers/kubernetes/integration-tests/pom.xml
index 40e578f9a7e..516c92b1df6 100644
--- a/resource-managers/kubernetes/integration-tests/pom.xml
+++ b/resource-managers/kubernetes/integration-tests/pom.xml
@@ -46,6 +46,7 @@
     Dockerfile.java17
+    org.apache.spark.deploy.k8s.integrationtest.YuniKornTag
     **/*Volcano*.scala
@@ -137,7 +138,7 @@
     ${spark.kubernetes.test.dockerFile}
     --test-exclude-tags
-    "${test.exclude.tags}"
+    "${test.exclude.tags},${test.default.exclude.tags}"
@@ -179,7 +180,7 @@
     ${spark.kubernetes.test.pythonImage}
     ${spark.kubernetes.test.rImage}
-    ${test.exclude.tags}
+    ${test.exclude.tags},${test.default.exclude.tags}
     ${test.include.tags}
diff --git a/resource-managers/kubernete
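[Editor's note] The bodies of the two new test files are truncated in this digest. Based on the master-branch summary below (`YuniKornSuite.scala` is a ~76% copy of `VolcanoSuite.scala`, and `YuniKornTag.java` a ~89% copy of `SlowHiveTest.java`), a hypothetical sketch of the Scala suite might look like this; the `setUpTest` hook name and the configuration key are assumptions, not the committed code:

```scala
package org.apache.spark.deploy.k8s.integrationtest

// Hypothetical sketch of YuniKornSuite: the @YuniKornTag annotation keeps the
// suite out of default runs (the tag is listed in test.default.exclude.tags),
// and the suite itself only redirects scheduling to YuniKorn.
@YuniKornTag
private[spark] class YuniKornSuite extends KubernetesSuite {

  // Assumed per-test hook: point each test's Spark app at the YuniKorn scheduler.
  override protected def setUpTest(): Unit = {
    super.setUpTest()
    sparkAppConf.set("spark.kubernetes.scheduler.name", "yunikorn")
  }
}
```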
[spark] branch master updated (f4ff2d16483 -> b2e38e16bfc)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

  from f4ff2d16483 [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved
   add b2e38e16bfc [SPARK-40302][K8S][TESTS] Add `YuniKornSuite`

No new revisions were added by this update.

Summary of changes:
 project/SparkBuild.scala                                |  1 +
 .../integration-tests/dev/dev-run-integration-tests.sh  | 10 ++++++++++
 resource-managers/kubernetes/integration-tests/pom.xml  |  5 +++--
 .../spark/deploy/k8s/integrationtest/YuniKornTag.java   | 13 +
 .../{VolcanoSuite.scala => YuniKornSuite.scala}         | 14 +-
 5 files changed, 24 insertions(+), 19 deletions(-)
 copy common/tags/src/test/java/org/apache/spark/tags/SlowHiveTest.java => resource-managers/kubernetes/integration-tests/src/test/java/org/apache/spark/deploy/k8s/integrationtest/YuniKornTag.java (89%)
 copy resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/{VolcanoSuite.scala => YuniKornSuite.scala} (76%)
[spark] branch master updated: [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f4ff2d16483 [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved

f4ff2d16483 is described below

commit f4ff2d16483f7da2c7ab73c7cfec75bb9e91064d
Author: Maryann Xue
AuthorDate: Thu Sep 1 22:03:58 2022 +0800

    [SPARK-40297][SQL] CTE outer reference nested in CTE main body cannot be resolved

    ### What changes were proposed in this pull request?

    This PR fixes a bug where a CTE reference cannot be resolved if the reference occurs in an inner CTE definition nested in the FROM clause of the outer CTE's main body. E.g.,
    ```
    WITH cte_outer AS (
      SELECT 1
    )
    SELECT * FROM (
      WITH cte_inner AS (
        SELECT * FROM cte_outer
      )
      SELECT * FROM cte_inner
    )
    ```

    The fix changes `CTESubstitution`'s traversal order from `resolveOperatorsUpWithPruning` to `resolveOperatorsDownWithPruning` and also calls `traverseAndSubstituteCTE` recursively for the CTE main body.

    ### Why are the changes needed?

    Bug fix. Without the fix, an `AnalysisException` would be thrown for CTE queries like the one above.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Added UTs.

    Closes #37751 from maryannxue/spark-40297.

    Authored-by: Maryann Xue
    Signed-off-by: Wenchen Fan
---
 .../sql/catalyst/analysis/CTESubstitution.scala    |  30 ++-
 .../test/resources/sql-tests/inputs/cte-nested.sql |  59 +++-
 .../resources/sql-tests/results/cte-legacy.sql.out |  80 +++
 .../resources/sql-tests/results/cte-nested.sql.out |  79 ++
 .../sql-tests/results/cte-nonlegacy.sql.out        |  79 ++
 .../org/apache/spark/sql/CTEInlineSuite.scala      | 160 -
 6 files changed, 476 insertions(+), 11 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
index 62ebfa83431..6a4562450b9 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala
@@ -56,7 +56,7 @@ object CTESubstitution extends Rule[LogicalPlan] {
       case _ => false
     }
     val cteDefs = ArrayBuffer.empty[CTERelationDef]
-    val (substituted, lastSubstituted) =
+    val (substituted, firstSubstituted) =
       LegacyBehaviorPolicy.withName(conf.getConf(LEGACY_CTE_PRECEDENCE_POLICY)) match {
         case LegacyBehaviorPolicy.EXCEPTION =>
           assertNoNameConflictsInCTE(plan)
@@ -68,12 +68,17 @@ object CTESubstitution extends Rule[LogicalPlan] {
     }
     if (cteDefs.isEmpty) {
       substituted
-    } else if (substituted eq lastSubstituted.get) {
+    } else if (substituted eq firstSubstituted.get) {
       WithCTE(substituted, cteDefs.toSeq)
     } else {
       var done = false
       substituted.resolveOperatorsWithPruning(_ => !done) {
-        case p if p eq lastSubstituted.get =>
+        case p if p eq firstSubstituted.get =>
+          // `firstSubstituted` is the parent of all other CTEs (if any).
+          done = true
+          WithCTE(p, cteDefs.toSeq)
+        case p if p.children.count(_.containsPattern(CTE)) > 1 =>
+          // This is the first common parent of all CTEs.
           done = true
           WithCTE(p, cteDefs.toSeq)
       }
@@ -181,21 +186,28 @@
       isCommand: Boolean,
       outerCTEDefs: Seq[(String, CTERelationDef)],
       cteDefs: ArrayBuffer[CTERelationDef]): (LogicalPlan, Option[LogicalPlan]) = {
-    var lastSubstituted: Option[LogicalPlan] = None
-    val newPlan = plan.resolveOperatorsUpWithPruning(
+    var firstSubstituted: Option[LogicalPlan] = None
+    val newPlan = plan.resolveOperatorsDownWithPruning(
       _.containsAnyPattern(UNRESOLVED_WITH, PLAN_EXPRESSION)) {
       case UnresolvedWith(child: LogicalPlan, relations) =>
         val resolvedCTERelations =
-          resolveCTERelations(relations, isLegacy = false, isCommand, outerCTEDefs, cteDefs)
-        lastSubstituted = Some(substituteCTE(child, isCommand, resolvedCTERelations))
-        lastSubstituted.get
+          resolveCTERelations(relations, isLegacy = false, isCommand, outerCTEDefs, cteDefs) ++
+            outerCTEDefs
+        val substituted = substituteCTE(
+          traverseAndSubstituteCTE(child, isCommand, resolvedCTERelations, cteDefs)._1,
+          isCommand,
+          resolvedCTERelations)
+        if (firstSubstituted.isEmpty) {
+          firstSubstituted = Some(substituted)
+        }
+
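[Editor's note] As a quick way to see the effect of the fix, the query from the description can be run directly; a hypothetical spark-shell check (not part of the commit):

```scala
// Hypothetical verification in spark-shell: before SPARK-40297 this raised
// an AnalysisException because cte_outer could not be resolved from inside
// cte_inner; with the fix it returns a single row containing 1.
spark.sql("""
  WITH cte_outer AS (
    SELECT 1
  )
  SELECT * FROM (
    WITH cte_inner AS (
      SELECT * FROM cte_outer
    )
    SELECT * FROM cte_inner
  )
""").show()
```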
[spark] branch master updated: [SPARK-40279][DOC] Document spark.yarn.report.interval
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 25b51aa51fb [SPARK-40279][DOC] Document spark.yarn.report.interval

25b51aa51fb is described below

commit 25b51aa51fb83af69bef481d228efc7912f1a2c0
Author: Luca Canali
AuthorDate: Thu Sep 1 08:30:10 2022 -0500

    [SPARK-40279][DOC] Document spark.yarn.report.interval

    ### What changes were proposed in this pull request?

    This proposes to document the configuration parameter `spark.yarn.report.interval`: the interval between reports of the current Spark job status in cluster mode.

    ### Why are the changes needed?

    To document a configuration parameter for Spark on YARN.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Not relevant.

    Closes #37731 from LucaCanali/docReportInterval.

    Authored-by: Luca Canali
    Signed-off-by: Sean Owen
---
 docs/running-on-yarn.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 48b0c7dc315..4c85bc3ceeb 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -624,6 +624,14 @@ To use a custom metrics.properties for the application master and executors, upd
   <td>2.4.0</td>
 </tr>
+<tr>
+  <td><code>spark.yarn.report.interval</code></td>
+  <td><code>1s</code></td>
+  <td>
+    Interval between reports of the current Spark job status in cluster mode.
+  </td>
+  <td>0.9.0</td>
+</tr>
 </table>

 # Available patterns for SHS custom executor log URL
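[Editor's note] For readers who want to try the newly documented setting: since the report interval governs how often the client polls YARN in cluster mode, it should be set at submission time. A minimal sketch using `SparkLauncher`; the app resource, main class, and the 5s value are placeholders, not from the commit:

```scala
import org.apache.spark.launcher.SparkLauncher

// Sketch: submit in YARN cluster mode, raising the status-report interval
// from the documented default of 1s to 5s to reduce client-side polling.
val handle = new SparkLauncher()
  .setMaster("yarn")
  .setDeployMode("cluster")
  .setAppResource("/path/to/app.jar")          // placeholder
  .setMainClass("com.example.Main")            // placeholder
  .setConf("spark.yarn.report.interval", "5s") // the documented setting
  .startApplication()
```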