[spark] branch master updated: [SPARK-38423][K8S] Reuse driver pod's `priorityClassName` for `PodGroup`
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new f36d1bf  [SPARK-38423][K8S] Reuse driver pod's `priorityClassName` for `PodGroup`
f36d1bf is described below

commit f36d1bfba47f6f6ff0f4375a1eb74bb606f8a0b7
Author: Yikun Jiang
AuthorDate: Sun Mar 6 23:54:18 2022 -0800

    [SPARK-38423][K8S] Reuse driver pod's `priorityClassName` for `PodGroup`

    ### What changes were proposed in this pull request?
    This patch sets the PodGroup's `priorityClassName` to `driver.pod.spec.priorityClassName`.

    ### Why are the changes needed?
    To support priority scheduling with the Volcano implementation.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    - New UT to make sure the feature step sets the PodGroup priority as expected.
    - Two new integration tests:
      1. Submit 3 jobs (spark pi) with different priorities to make sure the jobs complete as expected.
      2. Submit 3 jobs (driver submission) with different priorities to make sure the scheduling order is as expected.
    - All existing UT and IT

    Closes #35639 from Yikun/SPARK-38189.
    Authored-by: Yikun Jiang
    Signed-off-by: Dongjoon Hyun
---
 .../deploy/k8s/features/VolcanoFeatureStep.scala   |   6 +
 .../k8s/features/VolcanoFeatureStepSuite.scala     |  30
 .../src/test/resources/volcano/disable-queue.yml   |  24 +++
 .../src/test/resources/volcano/enable-queue.yml    |  24 +++
 .../volcano/high-priority-driver-template.yml      |  26
 .../volcano/low-priority-driver-template.yml       |  26
 .../volcano/medium-priority-driver-template.yml    |  26
 .../src/test/resources/volcano/priorityClasses.yml |  33 +
 .../k8s/integrationtest/VolcanoTestsSuite.scala    | 163 ++---
 9 files changed, 340 insertions(+), 18 deletions(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala
index c6efe4d..48303c8 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala
@@ -32,6 +32,7 @@ private[spark] class VolcanoFeatureStep extends KubernetesDriverCustomFeatureCon
   private lazy val podGroupName = s"${kubernetesConf.appId}-podgroup"
   private lazy val namespace = kubernetesConf.namespace
   private lazy val queue = kubernetesConf.get(KUBERNETES_JOB_QUEUE)
+  private var priorityClassName: Option[String] = None

   override def init(config: KubernetesDriverConf): Unit = {
     kubernetesConf = config
@@ -50,10 +51,15 @@ private[spark] class VolcanoFeatureStep extends KubernetesDriverCustomFeatureCon
     queue.foreach(podGroup.editOrNewSpec().withQueue(_).endSpec())

+    priorityClassName.foreach(podGroup.editOrNewSpec().withPriorityClassName(_).endSpec())
+
     Seq(podGroup.build())
   }

   override def configurePod(pod: SparkPod): SparkPod = {
+    priorityClassName = Some(pod.pod.getSpec.getPriorityClassName)
+
     val k8sPodBuilder = new PodBuilder(pod.pod)
       .editMetadata()
         .addToAnnotations(POD_GROUP_ANNOTATION, podGroupName)
diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStepSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStepSuite.scala
index eda1ccc..350df77 100644
--- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStepSuite.scala
+++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/VolcanoFeatureStepSuite.scala
@@ -16,6 +16,7 @@
  */
 package org.apache.spark.deploy.k8s.features

+import io.fabric8.kubernetes.api.model.{ContainerBuilder, PodBuilder}
 import io.fabric8.volcano.scheduling.v1beta1.PodGroup

 import org.apache.spark.{SparkConf, SparkFunSuite}
@@ -57,4 +58,33 @@ class VolcanoFeatureStepSuite extends SparkFunSuite {
     val annotations = configuredPod.pod.getMetadata.getAnnotations
     assert(annotations.get("scheduling.k8s.io/group-name") === s"${kubernetesConf.appId}-podgroup")
   }
+
+  test("SPARK-38423: Support priorityClassName") {
+    // test null priority
+    val podWithNullPriority = SparkPod.initialPod()
+    assert(podWithNullPriority.pod.getSpec.getPriorityClassName === null)
+    verifyPriority(SparkPod.initialPod())
+    // test normal priority
+    val podWithPriority = SparkPod(
+      new PodBuilder()
+        .withNewMetadata()
+        .endMetadata()
+        .withNewSpec()
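The effect of the change can be sketched outside of Spark as a minimal fabric8 snippet. This is an illustrative sketch only, assuming fabric8's generated `PodGroupBuilder` for the Volcano `PodGroup` model referenced in the diff; names like `"high"` and `"spark-app-podgroup"` are made up for the example:

```scala
import io.fabric8.kubernetes.api.model.PodBuilder
import io.fabric8.volcano.scheduling.v1beta1.PodGroupBuilder

// A driver pod whose spec carries a priorityClassName (e.g. from a pod template).
val driverPod = new PodBuilder()
  .withNewSpec()
    .withPriorityClassName("high")
  .endSpec()
  .build()

// The feature step reads the class name off the driver pod...
val priorityClassName = Option(driverPod.getSpec.getPriorityClassName)

// ...and copies it onto the PodGroup, so Volcano schedules the whole
// group (driver + executors) at the driver's priority.
val podGroup = new PodGroupBuilder()
  .withNewMetadata()
    .withName("spark-app-podgroup")
  .endMetadata()
priorityClassName.foreach(podGroup.editOrNewSpec().withPriorityClassName(_).endSpec())
val built = podGroup.build()
```

The same builder calls (`editOrNewSpec().withPriorityClassName(_).endSpec()`) appear verbatim in the diff above; only the surrounding scaffolding here is hypothetical.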
[spark] branch branch-3.2 updated: [SPARK-38430][K8S][DOCS] Add `SBT` commands to K8s IT README
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 7eafadb  [SPARK-38430][K8S][DOCS] Add `SBT` commands to K8s IT README
7eafadb is described below

commit 7eafadbbd962f28bac54cbc45eef9a37fc785966
Author: William Hyun
AuthorDate: Sun Mar 6 22:04:20 2022 -0800

    [SPARK-38430][K8S][DOCS] Add `SBT` commands to K8s IT README

    ### What changes were proposed in this pull request?
    This PR aims to add SBT commands to the K8s IT README.

    ### Why are the changes needed?
    This will introduce the new SBT commands to developers.

    ### Does this PR introduce _any_ user-facing change?
    No, this is a dev-only change.

    ### How was this patch tested?
    Manual.

    Closes #35745 from williamhyun/sbtdoc.

    Authored-by: William Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 3bbc43d662ccfff6bd93a351fcbf96179289f58f)
    Signed-off-by: Dongjoon Hyun
---
 .../kubernetes/integration-tests/README.md | 25 ++
 1 file changed, 25 insertions(+)

diff --git a/resource-managers/kubernetes/integration-tests/README.md b/resource-managers/kubernetes/integration-tests/README.md
index 3a81033..a7edcf4 100644
--- a/resource-managers/kubernetes/integration-tests/README.md
+++ b/resource-managers/kubernetes/integration-tests/README.md
@@ -255,3 +255,28 @@ to the wrapper scripts and using the wrapper scripts will simply set these appro
+
+# Running the Kubernetes Integration Tests with SBT
+
+You can use SBT in the same way to build the image and run all K8s integration tests except the Minikube-only ones.
+
+    build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests \
+      -Dtest.exclude.tags=minikube \
+      -Dspark.kubernetes.test.deployMode=docker-desktop \
+      -Dspark.kubernetes.test.imageTag=2022-03-06 \
+      'kubernetes-integration-tests/test'
+
+The following is an example of rerunning tests with the pre-built image.
+
+    build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests \
+      -Dtest.exclude.tags=minikube \
+      -Dspark.kubernetes.test.deployMode=docker-desktop \
+      -Dspark.kubernetes.test.imageTag=2022-03-06 \
+      'kubernetes-integration-tests/runIts'
+
+In addition, you can run a single test selectively.
+
+    build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests \
+      -Dspark.kubernetes.test.deployMode=docker-desktop \
+      -Dspark.kubernetes.test.imageTag=2022-03-06 \
+      'kubernetes-integration-tests/testOnly -- -z "Run SparkPi with a very long application name"'

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-38430][K8S][DOCS] Add `SBT` commands to K8s IT README
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3bbc43d  [SPARK-38430][K8S][DOCS] Add `SBT` commands to K8s IT README
3bbc43d is described below

commit 3bbc43d662ccfff6bd93a351fcbf96179289f58f
Author: William Hyun
AuthorDate: Sun Mar 6 22:04:20 2022 -0800

    [SPARK-38430][K8S][DOCS] Add `SBT` commands to K8s IT README

    ### What changes were proposed in this pull request?
    This PR aims to add SBT commands to the K8s IT README.

    ### Why are the changes needed?
    This will introduce the new SBT commands to developers.

    ### Does this PR introduce _any_ user-facing change?
    No, this is a dev-only change.

    ### How was this patch tested?
    Manual.

    Closes #35745 from williamhyun/sbtdoc.

    Authored-by: William Hyun
    Signed-off-by: Dongjoon Hyun
---
 .../kubernetes/integration-tests/README.md | 25 ++
 1 file changed, 25 insertions(+)

diff --git a/resource-managers/kubernetes/integration-tests/README.md b/resource-managers/kubernetes/integration-tests/README.md
index edd3bf5..2151b7f 100644
--- a/resource-managers/kubernetes/integration-tests/README.md
+++ b/resource-managers/kubernetes/integration-tests/README.md
@@ -269,3 +269,28 @@ to the wrapper scripts and using the wrapper scripts will simply set these appro
+
+# Running the Kubernetes Integration Tests with SBT
+
+You can use SBT in the same way to build the image and run all K8s integration tests except the Minikube-only ones.
+
+    build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests \
+      -Dtest.exclude.tags=minikube \
+      -Dspark.kubernetes.test.deployMode=docker-desktop \
+      -Dspark.kubernetes.test.imageTag=2022-03-06 \
+      'kubernetes-integration-tests/test'
+
+The following is an example of rerunning tests with the pre-built image.
+
+    build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests \
+      -Dtest.exclude.tags=minikube \
+      -Dspark.kubernetes.test.deployMode=docker-desktop \
+      -Dspark.kubernetes.test.imageTag=2022-03-06 \
+      'kubernetes-integration-tests/runIts'
+
+In addition, you can run a single test selectively.
+
+    build/sbt -Psparkr -Pkubernetes -Pkubernetes-integration-tests \
+      -Dspark.kubernetes.test.deployMode=docker-desktop \
+      -Dspark.kubernetes.test.imageTag=2022-03-06 \
+      'kubernetes-integration-tests/testOnly -- -z "Run SparkPi with a very long application name"'
[spark] branch master updated (d83ab94 -> fc6b5e5)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from d83ab94  [SPARK-38419][BUILD] Replace tabs that exist in the script with spaces
     add fc6b5e5  [SPARK-38188][K8S][TESTS][FOLLOWUP] Cleanup resources in `afterEach`

No new revisions were added by this update.

Summary of changes:
 .../k8s/integrationtest/VolcanoTestsSuite.scala | 61 ++
 1 file changed, 50 insertions(+), 11 deletions(-)
[spark] branch master updated (b99f58a -> d83ab94)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from b99f58a  [SPARK-38267][CORE][SQL][SS] Replace pattern matches on boolean expressions with conditional statements
     add d83ab94  [SPARK-38419][BUILD] Replace tabs that exist in the script with spaces

No new revisions were added by this update.

Summary of changes:
 .../docker/src/main/dockerfiles/spark/entrypoint.sh |  4 ++--
 sbin/spark-daemon.sh                                | 12 ++--
 sbin/start-master.sh                                |  8
 sbin/start-mesos-dispatcher.sh                      |  8
 4 files changed, 16 insertions(+), 16 deletions(-)
[spark] branch master updated: [SPARK-38267][CORE][SQL][SS] Replace pattern matches on boolean expressions with conditional statements
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new b99f58a  [SPARK-38267][CORE][SQL][SS] Replace pattern matches on boolean expressions with conditional statements
b99f58a is described below

commit b99f58a57c880ed9cdec3d37ac8683c31daa4c10
Author: yangjie01
AuthorDate: Sun Mar 6 19:26:45 2022 -0600

    [SPARK-38267][CORE][SQL][SS] Replace pattern matches on boolean expressions with conditional statements

    ### What changes were proposed in this pull request?
    This PR uses conditional statements to simplify pattern matches on booleans:

    **Before**
    ```scala
    val bool: Boolean
    bool match {
      case true => // do something when bool is true
      case false => // do something when bool is false
    }
    ```

    **After**
    ```scala
    val bool: Boolean
    if (bool) {
      // do something when bool is true
    } else {
      // do something when bool is false
    }
    ```

    ### Why are the changes needed?
    Simplify unnecessary pattern matches.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass GA

    Closes #35589 from LuciferYang/trivial-match.

    Authored-by: yangjie01
    Signed-off-by: Sean Owen
---
 .../BlockManagerDecommissionIntegrationSuite.scala |  7 +--
 .../catalyst/expressions/datetimeExpressions.scala | 50 +++---
 .../spark/sql/catalyst/parser/AstBuilder.scala     | 14 +++---
 .../sql/internal/ExecutorSideSQLConfSuite.scala    |  7 +--
 .../streaming/FlatMapGroupsWithStateSuite.scala    |  7 +--
 5 files changed, 43 insertions(+), 42 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionIntegrationSuite.scala b/core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionIntegrationSuite.scala
index 8999a12..e004c33 100644
--- a/core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionIntegrationSuite.scala
+++ b/core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionIntegrationSuite.scala
@@ -165,9 +165,10 @@ class BlockManagerDecommissionIntegrationSuite extends SparkFunSuite with LocalS
       }
       x.map(y => (y, y))
     }
-    val testRdd = shuffle match {
-      case true => baseRdd.reduceByKey(_ + _)
-      case false => baseRdd
+    val testRdd = if (shuffle) {
+      baseRdd.reduceByKey(_ + _)
+    } else {
+      baseRdd
     }

     // Listen for the job & block updates
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
index 8b5a387..d8cf474 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
@@ -2903,25 +2903,25 @@ case class SubtractTimestamps(
   @transient private lazy val zoneIdInEval: ZoneId = zoneIdForType(left.dataType)

   @transient
-  private lazy val evalFunc: (Long, Long) => Any = legacyInterval match {
-    case false => (leftMicros, rightMicros) =>
-      subtractTimestamps(leftMicros, rightMicros, zoneIdInEval)
-    case true => (leftMicros, rightMicros) =>
-      new CalendarInterval(0, 0, leftMicros - rightMicros)
+  private lazy val evalFunc: (Long, Long) => Any = if (legacyInterval) {
+    (leftMicros, rightMicros) =>
+      new CalendarInterval(0, 0, leftMicros - rightMicros)
+  } else {
+    (leftMicros, rightMicros) =>
+      subtractTimestamps(leftMicros, rightMicros, zoneIdInEval)
   }

   override def nullSafeEval(leftMicros: Any, rightMicros: Any): Any = {
     evalFunc(leftMicros.asInstanceOf[Long], rightMicros.asInstanceOf[Long])
   }

-  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = legacyInterval match {
-    case false =>
-      val zid = ctx.addReferenceObj("zoneId", zoneIdInEval, classOf[ZoneId].getName)
-      val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-      defineCodeGen(ctx, ev, (l, r) => s"""$dtu.subtractTimestamps($l, $r, $zid)""")
-    case true =>
-      defineCodeGen(ctx, ev, (end, start) =>
-        s"new org.apache.spark.unsafe.types.CalendarInterval(0, 0, $end - $start)")
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = if (legacyInterval) {
+    defineCodeGen(ctx, ev, (end, start) =>
+      s"new org.apache.spark.unsafe.types.CalendarInterval(0, 0, $end - $start)")
+  } else {
+    val zid = ctx.addReferenceObj("zoneId", zoneIdInEval, classOf[ZoneId].getName)
+    val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
+    defineCodeGen(ctx, ev, (l, r) => s"""$dtu.subtractTimestamps($l, $r, $zid)""")
   }

   override def toString: String = s"($left - $right)"
@@ -2961,26
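The refactoring pattern applied across this PR can be shown as a tiny standalone sketch (illustrative only, not code from the PR; the function names are made up):

```scala
// Before: a two-case match on a Boolean, which compiles but reads as
// an unnecessarily verbose conditional.
def pickRddBefore(shuffle: Boolean): String = shuffle match {
  case true => "reduceByKey"
  case false => "baseRdd"
}

// After: the equivalent if-expression, which is the form the PR adopts.
def pickRddAfter(shuffle: Boolean): String =
  if (shuffle) "reduceByKey" else "baseRdd"
```

Both functions are equivalent; the conditional form avoids a pattern match that can never be exhaustive in any interesting way and reads more directly.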
[spark] branch master updated: [SPARK-38394][BUILD] Upgrade `scala-maven-plugin` to 4.4.0 for Hadoop 3 profile
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 3175d83  [SPARK-38394][BUILD] Upgrade `scala-maven-plugin` to 4.4.0 for Hadoop 3 profile
3175d83 is described below

commit 3175d830cb029d41909de8960aa790d4272aa188
Author: Steve Loughran
AuthorDate: Sun Mar 6 19:23:31 2022 -0600

    [SPARK-38394][BUILD] Upgrade `scala-maven-plugin` to 4.4.0 for Hadoop 3 profile

    ### What changes were proposed in this pull request?
    This sets `scala-maven-plugin.version` to 4.4.0, except when the hadoop-2.7 profile is used, because SPARK-36547 shows that only 4.3.0 works there.

    ### Why are the changes needed?
    1. If you try to build against a local snapshot of Hadoop trunk with `-Dhadoop.version=3.4.0-SNAPSHOT`, the build fails with the error shown in the JIRA.
    2. Upgrading the scala plugin version fixes this; it is a plugin issue.
    3. The version is made configurable so the hadoop-2.7 profile can switch back to the one that works there.

    As to why this only surfaces when compiling Hadoop trunk, or why hadoop-2.7 requires the older one, who knows. They both look certificate related, which is interesting; maybe something related to signed JARs?

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    By successfully building Spark against a local build of Hadoop 3.4.0-SNAPSHOT.

    Closes #35725 from steveloughran/SPARK-38394-compiler-version.
    Authored-by: Steve Loughran
    Signed-off-by: Sean Owen
---
 pom.xml | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/pom.xml b/pom.xml
index 176d3af..8e03167 100644
--- a/pom.xml
+++ b/pom.xml
@@ -163,6 +163,10 @@
     2.12.15
     2.12
     2.0.2
+
+    <scala-maven-plugin.version>4.4.0</scala-maven-plugin.version>
 --test true
@@ -2775,8 +2779,7 @@
       net.alchim31.maven
       scala-maven-plugin
-      4.3.0
+      ${scala-maven-plugin.version}
       eclipse-add-source
@@ -3430,6 +3433,7 @@
     hadoop-client
     hadoop-yarn-api
     hadoop-client
+    <scala-maven-plugin.version>4.3.0</scala-maven-plugin.version>
[spark] branch branch-3.2 updated: [SPARK-38416][PYTHON][TESTS] Change day to month
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 1406d0c  [SPARK-38416][PYTHON][TESTS] Change day to month
1406d0c is described below

commit 1406d0cc744ede2a2beb58f22040d0e05582e776
Author: bjornjorgensen
AuthorDate: Mon Mar 7 09:00:06 2022 +0900

    [SPARK-38416][PYTHON][TESTS] Change day to month

    ### What changes were proposed in this pull request?
    Right now we have two functions that are testing the same thing.

    ### Why are the changes needed?
    To test both day and month.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Got the green light.

    Closes #35741 from bjornjorgensen/change-day-to-month.

    Authored-by: bjornjorgensen
    Signed-off-by: Hyukjin Kwon
    (cherry picked from commit b6516174a84d849bd620417dca9e0a81e0d3b5dc)
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/pandas/tests/indexes/test_datetime.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/pandas/tests/indexes/test_datetime.py b/python/pyspark/pandas/tests/indexes/test_datetime.py
index e3bf14e..85a2b21 100644
--- a/python/pyspark/pandas/tests/indexes/test_datetime.py
+++ b/python/pyspark/pandas/tests/indexes/test_datetime.py
@@ -120,7 +120,7 @@ class DatetimeIndexTest(PandasOnSparkTestCase, TestUtils):

     def test_month_name(self):
         for psidx, pidx in self.idx_pairs:
-            self.assert_eq(psidx.day_name(), pidx.day_name())
+            self.assert_eq(psidx.month_name(), pidx.month_name())

     def test_normalize(self):
         for psidx, pidx in self.idx_pairs:
[spark] branch master updated (135841f -> b651617)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 135841f  [SPARK-38411][CORE] Use `UTF-8` when `doMergeApplicationListingInternal` reads event logs
     add b651617  [SPARK-38416][PYTHON][TESTS] Change day to month

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/tests/indexes/test_datetime.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
[spark] branch branch-3.0 updated: [SPARK-38411][CORE] Use `UTF-8` when `doMergeApplicationListingInternal` reads event logs
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new e036de3  [SPARK-38411][CORE] Use `UTF-8` when `doMergeApplicationListingInternal` reads event logs
e036de3 is described below

commit e036de326bdc6bc828eee910861851d52c81f6d5
Author: Cheng Pan
AuthorDate: Sun Mar 6 15:41:20 2022 -0800

    [SPARK-38411][CORE] Use `UTF-8` when `doMergeApplicationListingInternal` reads event logs

    ### What changes were proposed in this pull request?
    Use UTF-8 instead of the system default encoding to read event logs.

    ### Why are the changes needed?
    After SPARK-29160, we should always use UTF-8 to read event logs; otherwise, if the Spark History Server runs with a default charset other than "UTF-8", it will encounter an error such as:
    ```
    2022-03-04 12:16:00,143 [3752440] - INFO [log-replay-executor-19:Logging57] - Parsing hdfs://hz-cluster11/spark2-history/application_1640597251469_2453817_1.lz4 for listing data...
    2022-03-04 12:16:00,145 [3752442] - ERROR [log-replay-executor-18:Logging94] - Exception while merging application listings
    java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.io.BufferedReader.readLine(BufferedReader.java:324)
        at java.io.BufferedReader.readLine(BufferedReader.java:389)
        at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74)
        at scala.collection.Iterator$$anon$20.hasNext(Iterator.scala:884)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:511)
        at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:82)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$doMergeApplicationListing$4(FsHistoryProvider.scala:819)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$doMergeApplicationListing$4$adapted(FsHistoryProvider.scala:801)
        at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2626)
        at org.apache.spark.deploy.history.FsHistoryProvider.doMergeApplicationListing(FsHistoryProvider.scala:801)
        at org.apache.spark.deploy.history.FsHistoryProvider.mergeApplicationListing(FsHistoryProvider.scala:715)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$15(FsHistoryProvider.scala:581)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    ```

    ### Does this PR introduce _any_ user-facing change?
    Yes, bug fix.

    ### How was this patch tested?
    Verification steps in ubuntu:20.04:
    1. Build `spark-3.3.0-SNAPSHOT-bin-master.tgz` on commit `34618a7ef6` using `dev/make-distribution.sh --tgz --name master`.
    2. Build `spark-3.3.0-SNAPSHOT-bin-SPARK-38411.tgz` on commit `2a8f56038b` using `dev/make-distribution.sh --tgz --name SPARK-38411`.
    3. Switch to UTF-8 using `export LC_ALL=C.UTF-8 && bash`.
    4. Generate an event log containing non-ASCII chars:
    ```
    bin/spark-submit \
        --master local[*] \
        --class org.apache.spark.examples.SparkPi \
        --conf spark.eventLog.enabled=true \
        --conf spark.user.key='计算圆周率' \
        examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar
    ```
    5. Switch to POSIX using `export LC_ALL=POSIX && bash`.
    6. Run `spark-3.3.0-SNAPSHOT-bin-master/sbin/start-history-server.sh` and watch the logs:
    ```
    Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /spark-3.3.0-SNAPSHOT-bin-master/conf/:/spark-3.3.0-SNAPSHOT-bin-master/jars/* -Xmx1g org.apache.spark.deploy.history.HistoryServer
    Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
    22/03/06 13:37:19 INFO HistoryServer: Started daemon with process name: 48729c3ffc10aa9
    22/03/06 13:37:19 INFO SignalUtils: Registering signal handler for TERM
    22/03/06 13:37:19 INFO SignalUtils: Registering signal handler for HUP
    22/03/06 13:37:19 INFO SignalUtils: Registering signal handler for INT
    22/03/06 13:37:21 WARN NativeCodeLoader:
    ```
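The core of the fix, replacing the platform-default decoder with an explicit UTF-8 codec, can be illustrated with a minimal sketch. This is a hypothetical snippet rather than the actual `FsHistoryProvider` code; it only demonstrates the charset behavior the commit message describes:

```scala
import java.io.ByteArrayInputStream
import scala.io.{Codec, Source}

// An event-log line containing non-ASCII characters, as produced by the
// reproduction steps above.
val bytes = "spark.user.key=计算圆周率".getBytes("UTF-8")

// Reading with the platform default charset: under LC_ALL=POSIX the JVM
// default is US-ASCII, and decoding these bytes can throw
// java.nio.charset.MalformedInputException, as in the stack trace above.
//   Source.fromInputStream(new ByteArrayInputStream(bytes)).getLines()

// Reading with an explicit UTF-8 codec is locale-independent and always
// decodes the log correctly.
val lines = Source
  .fromInputStream(new ByteArrayInputStream(bytes))(Codec.UTF8)
  .getLines()
  .toList
```

Passing `Codec.UTF8` explicitly (rather than relying on the implicit default) is the essence of what the patch changes in the event-log reading path.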
[spark] branch branch-3.1 updated: [SPARK-38411][CORE] Use `UTF-8` when `doMergeApplicationListingInternal` reads event logs
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 8d70d5d  [SPARK-38411][CORE] Use `UTF-8` when `doMergeApplicationListingInternal` reads event logs
8d70d5d is described below

commit 8d70d5da3d74ebdd612b2cdc38201e121b88b24f
Author: Cheng Pan
AuthorDate: Sun Mar 6 15:41:20 2022 -0800

    [SPARK-38411][CORE] Use `UTF-8` when `doMergeApplicationListingInternal` reads event logs

    ### What changes were proposed in this pull request?
    Use UTF-8 instead of the system default encoding to read event logs.

    ### Why are the changes needed?
    After SPARK-29160, we should always use UTF-8 to read event logs; otherwise, if the Spark History Server runs with a default charset other than "UTF-8", it will encounter an error such as:
    ```
    2022-03-04 12:16:00,143 [3752440] - INFO [log-replay-executor-19:Logging57] - Parsing hdfs://hz-cluster11/spark2-history/application_1640597251469_2453817_1.lz4 for listing data...
    2022-03-04 12:16:00,145 [3752442] - ERROR [log-replay-executor-18:Logging94] - Exception while merging application listings
    java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.io.BufferedReader.readLine(BufferedReader.java:324)
        at java.io.BufferedReader.readLine(BufferedReader.java:389)
        at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74)
        at scala.collection.Iterator$$anon$20.hasNext(Iterator.scala:884)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:511)
        at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:82)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$doMergeApplicationListing$4(FsHistoryProvider.scala:819)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$doMergeApplicationListing$4$adapted(FsHistoryProvider.scala:801)
        at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2626)
        at org.apache.spark.deploy.history.FsHistoryProvider.doMergeApplicationListing(FsHistoryProvider.scala:801)
        at org.apache.spark.deploy.history.FsHistoryProvider.mergeApplicationListing(FsHistoryProvider.scala:715)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$15(FsHistoryProvider.scala:581)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    ```

    ### Does this PR introduce _any_ user-facing change?
    Yes, bug fix.

    ### How was this patch tested?
    Verification steps in ubuntu:20.04:
    1. Build `spark-3.3.0-SNAPSHOT-bin-master.tgz` on commit `34618a7ef6` using `dev/make-distribution.sh --tgz --name master`.
    2. Build `spark-3.3.0-SNAPSHOT-bin-SPARK-38411.tgz` on commit `2a8f56038b` using `dev/make-distribution.sh --tgz --name SPARK-38411`.
    3. Switch to UTF-8 using `export LC_ALL=C.UTF-8 && bash`.
    4. Generate an event log containing non-ASCII chars:
    ```
    bin/spark-submit \
        --master local[*] \
        --class org.apache.spark.examples.SparkPi \
        --conf spark.eventLog.enabled=true \
        --conf spark.user.key='计算圆周率' \
        examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar
    ```
    5. Switch to POSIX using `export LC_ALL=POSIX && bash`.
    6. Run `spark-3.3.0-SNAPSHOT-bin-master/sbin/start-history-server.sh` and watch the logs:
    ```
    Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /spark-3.3.0-SNAPSHOT-bin-master/conf/:/spark-3.3.0-SNAPSHOT-bin-master/jars/* -Xmx1g org.apache.spark.deploy.history.HistoryServer
    Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
    22/03/06 13:37:19 INFO HistoryServer: Started daemon with process name: 48729c3ffc10aa9
    22/03/06 13:37:19 INFO SignalUtils: Registering signal handler for TERM
    22/03/06 13:37:19 INFO SignalUtils: Registering signal handler for HUP
    22/03/06 13:37:19 INFO SignalUtils: Registering signal handler for INT
    22/03/06 13:37:21 WARN NativeCodeLoader:
    ```
[spark] branch branch-3.2 updated: [SPARK-38411][CORE] Use `UTF-8` when `doMergeApplicationListingInternal` reads event logs
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 56ddf50  [SPARK-38411][CORE] Use `UTF-8` when `doMergeApplicationListingInternal` reads event logs
56ddf50 is described below

commit 56ddf50e20bd38f37d6d037b97c1b1d59100116b
Author: Cheng Pan
AuthorDate: Sun Mar 6 15:41:20 2022 -0800

    [SPARK-38411][CORE] Use `UTF-8` when `doMergeApplicationListingInternal` reads event logs

    ### What changes were proposed in this pull request?
    Use UTF-8 instead of the system default encoding to read event logs.

    ### Why are the changes needed?
    After SPARK-29160, we should always use UTF-8 to read event logs; otherwise, if the Spark History Server runs with a default charset other than "UTF-8", it will encounter an error such as:
    ```
    2022-03-04 12:16:00,143 [3752440] - INFO [log-replay-executor-19:Logging57] - Parsing hdfs://hz-cluster11/spark2-history/application_1640597251469_2453817_1.lz4 for listing data...
    2022-03-04 12:16:00,145 [3752442] - ERROR [log-replay-executor-18:Logging94] - Exception while merging application listings
    java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.io.BufferedReader.readLine(BufferedReader.java:324)
        at java.io.BufferedReader.readLine(BufferedReader.java:389)
        at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74)
        at scala.collection.Iterator$$anon$20.hasNext(Iterator.scala:884)
        at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:511)
        at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:82)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$doMergeApplicationListing$4(FsHistoryProvider.scala:819)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$doMergeApplicationListing$4$adapted(FsHistoryProvider.scala:801)
        at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2626)
        at org.apache.spark.deploy.history.FsHistoryProvider.doMergeApplicationListing(FsHistoryProvider.scala:801)
        at org.apache.spark.deploy.history.FsHistoryProvider.mergeApplicationListing(FsHistoryProvider.scala:715)
        at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$15(FsHistoryProvider.scala:581)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
    ```

    ### Does this PR introduce _any_ user-facing change?
    Yes, bug fix.

    ### How was this patch tested?
    Verification steps in ubuntu:20.04:
    1. Build `spark-3.3.0-SNAPSHOT-bin-master.tgz` on commit `34618a7ef6` using `dev/make-distribution.sh --tgz --name master`.
    2. Build `spark-3.3.0-SNAPSHOT-bin-SPARK-38411.tgz` on commit `2a8f56038b` using `dev/make-distribution.sh --tgz --name SPARK-38411`.
    3. Switch to UTF-8 using `export LC_ALL=C.UTF-8 && bash`.
    4. Generate an event log containing non-ASCII chars:
    ```
    bin/spark-submit \
        --master local[*] \
        --class org.apache.spark.examples.SparkPi \
        --conf spark.eventLog.enabled=true \
        --conf spark.user.key='计算圆周率' \
        examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar
    ```
    5. Switch to POSIX using `export LC_ALL=POSIX && bash`.
    6. Run `spark-3.3.0-SNAPSHOT-bin-master/sbin/start-history-server.sh` and watch the logs:
    ```
    Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /spark-3.3.0-SNAPSHOT-bin-master/conf/:/spark-3.3.0-SNAPSHOT-bin-master/jars/* -Xmx1g org.apache.spark.deploy.history.HistoryServer
    Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
    22/03/06 13:37:19 INFO HistoryServer: Started daemon with process name: 48729c3ffc10aa9
    22/03/06 13:37:19 INFO SignalUtils: Registering signal handler for TERM
    22/03/06 13:37:19 INFO SignalUtils: Registering signal handler for HUP
    22/03/06 13:37:19 INFO SignalUtils: Registering signal handler for INT
    22/03/06 13:37:21 WARN NativeCodeLoader:
    ```
[spark] branch master updated: [SPARK-38411][CORE] Use `UTF-8` when `doMergeApplicationListingInternal` reads event logs
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 135841f  [SPARK-38411][CORE] Use `UTF-8` when `doMergeApplicationListingInternal` reads event logs
135841f is described below

commit 135841f257fbb008aef211a5e38222940849cb26
Author: Cheng Pan
AuthorDate: Sun Mar 6 15:41:20 2022 -0800

[SPARK-38411][CORE] Use `UTF-8` when `doMergeApplicationListingInternal` reads event logs

### What changes were proposed in this pull request?

Use UTF-8 instead of the system default encoding to read event logs.

### Why are the changes needed?

After SPARK-29160, we should always use UTF-8 to read event logs; otherwise, if the Spark History Server runs with a default charset other than UTF-8, it will encounter an error such as:

```
2022-03-04 12:16:00,143 [3752440] - INFO [log-replay-executor-19:Logging57] - Parsing hdfs://hz-cluster11/spark2-history/application_1640597251469_2453817_1.lz4 for listing data...
2022-03-04 12:16:00,145 [3752442] - ERROR [log-replay-executor-18:Logging94] - Exception while merging application listings
java.nio.charset.MalformedInputException: Input length = 1
	at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
	at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
	at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
	at java.io.InputStreamReader.read(InputStreamReader.java:184)
	at java.io.BufferedReader.fill(BufferedReader.java:161)
	at java.io.BufferedReader.readLine(BufferedReader.java:324)
	at java.io.BufferedReader.readLine(BufferedReader.java:389)
	at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74)
	at scala.collection.Iterator$$anon$20.hasNext(Iterator.scala:884)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:511)
	at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:82)
	at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$doMergeApplicationListing$4(FsHistoryProvider.scala:819)
	at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$doMergeApplicationListing$4$adapted(FsHistoryProvider.scala:801)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2626)
	at org.apache.spark.deploy.history.FsHistoryProvider.doMergeApplicationListing(FsHistoryProvider.scala:801)
	at org.apache.spark.deploy.history.FsHistoryProvider.mergeApplicationListing(FsHistoryProvider.scala:715)
	at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$15(FsHistoryProvider.scala:581)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

Verification steps on ubuntu:20.04:

1. Build `spark-3.3.0-SNAPSHOT-bin-master.tgz` on commit `34618a7ef6` using `dev/make-distribution.sh --tgz --name master`.
2. Build `spark-3.3.0-SNAPSHOT-bin-SPARK-38411.tgz` on commit `2a8f56038b` using `dev/make-distribution.sh --tgz --name SPARK-38411`.
3. Switch to UTF-8 using `export LC_ALL=C.UTF-8 && bash`.
4. Generate an event log containing non-ASCII characters:
   ```
   bin/spark-submit \
     --master local[*] \
     --class org.apache.spark.examples.SparkPi \
     --conf spark.eventLog.enabled=true \
     --conf spark.user.key='计算圆周率' \
     examples/jars/spark-examples_2.12-3.3.0-SNAPSHOT.jar
   ```
5. Switch to POSIX using `export LC_ALL=POSIX && bash`.
6. Run `spark-3.3.0-SNAPSHOT-bin-master/sbin/start-history-server.sh` and watch the logs:
   ```
   Spark Command: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -cp /spark-3.3.0-SNAPSHOT-bin-master/conf/:/spark-3.3.0-SNAPSHOT-bin-master/jars/* -Xmx1g org.apache.spark.deploy.history.HistoryServer
   Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
   22/03/06 13:37:19 INFO HistoryServer: Started daemon with process name: 48729c3ffc10aa9
   22/03/06 13:37:19 INFO SignalUtils: Registering signal handler for TERM
   22/03/06 13:37:19 INFO SignalUtils: Registering signal handler for HUP
   22/03/06 13:37:19 INFO SignalUtils: Registering signal handler for INT
   22/03/06 13:37:21 WARN NativeCodeLoader: Unable to
   ```
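The failure above is a generic charset-mismatch bug: the event log bytes are UTF-8, but the reader decodes them with the JVM's locale-default charset (ASCII-like under `LC_ALL=POSIX`). The actual fix is in Scala inside `FsHistoryProvider`; the following is only a minimal Python sketch of the same mismatch, using the non-ASCII config value from the verification steps:

```python
# Sketch of the charset mismatch behind SPARK-38411 (illustrative only;
# the real fix is a one-line Scala change, not this code).

# An event log line is written as UTF-8 bytes:
data = "spark.user.key=计算圆周率".encode("utf-8")

# Decoding with a locale-default charset such as ASCII (what LC_ALL=POSIX
# implies) fails, mirroring java.nio.charset.MalformedInputException:
try:
    data.decode("ascii")
except UnicodeDecodeError as e:
    print("decode failed:", e.reason)

# Decoding explicitly as UTF-8, as the patch does, round-trips cleanly:
print(data.decode("utf-8"))
```

The takeaway is the same in both languages: never rely on the platform default charset when the writer's encoding is known.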
[spark] branch master updated (18219d4 -> 69bc9d1)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 18219d4  [SPARK-37400][SPARK-37426][PYTHON][MLLIB] Inline type hints for pyspark.mllib classification and regression
     add 69bc9d1  [SPARK-38239][PYTHON][MLLIB] Fix pyspark.mllib.LogisticRegressionModel.__repr__

No new revisions were added by this update.

Summary of changes:
 python/pyspark/mllib/classification.py | 5 +++-
 1 file changed, 4 insertions(+), 1 deletion(-)

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
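The SPARK-38239 change above fixes the `__repr__` of `pyspark.mllib`'s `LogisticRegressionModel`; the summary does not show the diff itself. As a hedged illustration of the kind of defensive `__repr__` a model class like this needs, here is a sketch in which the class name and fields are assumptions for demonstration, not the actual pyspark code:

```python
# Hypothetical sketch of a defensive __repr__ for a classification model.
# ClassificationModelSketch and its fields are illustrative only; they are
# not the real pyspark.mllib.classification.LogisticRegressionModel.
class ClassificationModelSketch:
    def __init__(self, weights, intercept, num_features, num_classes,
                 threshold=None):
        self.weights = weights
        self.intercept = intercept
        self.numFeatures = num_features
        self.numClasses = num_classes
        self.threshold = threshold  # may be None once thresholding is cleared

    def __repr__(self):
        # Render the optional field explicitly so repr() never raises
        # (e.g. no numeric format specifier applied to None).
        thr = "None" if self.threshold is None else repr(self.threshold)
        return (f"{self.__class__.__name__}(weights={self.weights}, "
                f"intercept={self.intercept}, numFeatures={self.numFeatures}, "
                f"numClasses={self.numClasses}, threshold={thr})")

print(repr(ClassificationModelSketch([0.5, -0.2], 0.1, 2, 2)))
```

The design point: `__repr__` is called by debuggers, REPLs, and logging, so it must succeed for every reachable object state, including optional fields left unset.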