(spark) branch master updated: [SPARK-48628][CORE] Add task peak on/off heap memory metrics
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 717a6da11d0b [SPARK-48628][CORE] Add task peak on/off heap memory metrics 717a6da11d0b is described below commit 717a6da11d0b3d0383141f9682c14201aff821c0 Author: Ziqi Liu AuthorDate: Wed Aug 7 11:52:13 2024 -0700 [SPARK-48628][CORE] Add task peak on/off heap memory metrics ### What changes were proposed in this pull request? Add task on/off heap execution memory in `TaskMetrics`, tracked in `TaskMemoryManager`, **assuming `acquireExecutionMemory` is the only one narrow waist for acquiring execution memory.** ### Why are the changes needed? Currently there is no task on/off heap execution memory metrics. There is a [peakExecutionMemory](https://github.com/apache/spark/blob/3cd35f8cb6462051c621cf49de54b9c5692aae1d/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala#L114) metrics, however, the semantic is a confusing: it only cover the execution memory used by shuffle/join/aggregate/sort, which is accumulated in specific operators and thus not really reflect the real execution memory. Therefore it's necessary to add these two metrics. Also I created two followup sub tickets: - https://issues.apache.org/jira/browse/SPARK-48788 : accumulate task metrics in stage, and display in Spark UI - https://issues.apache.org/jira/browse/SPARK-48789 : deprecate `peakExecutionMemory` once we have replacement for it. The ultimate goal is to have these two metrics ready (as accumulated stage metrics in Spark UI as well) and deprecate `peakExecutionMemory`. ### Does this PR introduce _any_ user-facing change? Supposedly no. But two followup sub tickets will have user-facing change: new metrics exposed to Spark UI, and old metrics deprecation. ### How was this patch tested? new test ### Was this patch authored or co-authored using generative AI tooling? NO Closes #47192 from liuzqt/SPARK-48628. Authored-by: Ziqi Liu Signed-off-by: Xingbo Jiang --- .../org/apache/spark/memory/TaskMemoryManager.java | 36 +++ .../org/apache/spark/InternalAccumulator.scala | 2 + .../scala/org/apache/spark/executor/Executor.scala | 2 + .../org/apache/spark/executor/TaskMetrics.scala| 21 +++ .../scala/org/apache/spark/util/JsonProtocol.scala | 6 ++ .../apache/spark/memory/MemoryManagerSuite.scala | 18 ++ .../org/apache/spark/util/JsonProtocolSuite.scala | 72 ++ 7 files changed, 132 insertions(+), 25 deletions(-) diff --git a/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java b/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java index fe798e40a6ad..541ed7e2350b 100644 --- a/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java +++ b/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java @@ -122,6 +122,16 @@ public class TaskMemoryManager { */ private volatile long acquiredButNotUsed = 0L; + /** + * Peak off heap memory usage by this task. + */ + private volatile long peakOffHeapMemory = 0L; + + /** + * Peak on heap memory usage by this task. + */ + private volatile long peakOnHeapMemory = 0L; + /** * Construct a new TaskMemoryManager. */ @@ -202,6 +212,17 @@ public class TaskMemoryManager { logger.debug("Task {} acquired {} for {}", taskAttemptId, Utils.bytesToString(got), requestingConsumer); } + + // Consumer will update its used memory after acquireExecutionMemory, so we need to add `got` + // to compute current peak + long currentPeak = consumers.stream().filter(c -> c.getMode() == mode) +.mapToLong(MemoryConsumer::getUsed).sum() + got; + if (mode == MemoryMode.OFF_HEAP) { +peakOffHeapMemory = Math.max(peakOffHeapMemory, currentPeak); + } else { +peakOnHeapMemory = Math.max(peakOnHeapMemory, currentPeak); + } + return got; } } @@ -507,4 +528,19 @@ public class TaskMemoryManager { public MemoryMode getTungstenMemoryMode() { return tungstenMemoryMode; } + + /** + * Returns peak task-level off-heap memory usage in bytes. + * + */ + public long getPeakOnHeapExecutionMemory() { +return peakOnHeapMemory; + } + + /** + * Returns peak task-level on-heap memory usage in bytes. + */ + public long getPeakOffHeapExecutionMemory() { +return peakOffHeapMemory; + } } diff --git a/core/src/main/scala/org/apache/spark/InternalAccumulator.scala b/core/src/main/scala/org/apache/spark/InternalAccumulator.scala index ef4609e6d645..505634d5bb04 100644 --- a/core/src/main/scala/org/apache/spark/InternalAccumulator.scala +++ b/core/src/main/scal
[spark] branch branch-3.4 updated: [SPARK-43043][CORE] Improve the performance of MapOutputTracker.updateMapOutput
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.4 by this push: new f68ece9e607 [SPARK-43043][CORE] Improve the performance of MapOutputTracker.updateMapOutput f68ece9e607 is described below commit f68ece9e6074cecdaf74ad9b39eae3c7dc2cfaf1 Author: Xingbo Jiang AuthorDate: Tue May 16 11:34:30 2023 -0700 [SPARK-43043][CORE] Improve the performance of MapOutputTracker.updateMapOutput ### What changes were proposed in this pull request? The PR changes the implementation of MapOutputTracker.updateMapOutput() to search for the MapStatus under the help of a mapping from mapId to mapIndex, previously it was performing a linear search, which would become performance bottleneck if a large proportion of all blocks in the map are migrated. ### Why are the changes needed? To avoid performance bottleneck when block decommission is enabled and a lot of blocks are migrated within a short time window. ### Does this PR introduce _any_ user-facing change? No, it's pure performance improvement. ### How was this patch tested? Manually test. Closes #40690 from jiangxb1987/SPARK-43043. Lead-authored-by: Xingbo Jiang Co-authored-by: Jiang Xingbo Signed-off-by: Xingbo Jiang (cherry picked from commit 66a2eb8f8957c22c69519b39be59beaaf931822b) Signed-off-by: Xingbo Jiang --- .../scala/org/apache/spark/MapOutputTracker.scala | 26 +- .../apache/spark/util/collection/OpenHashMap.scala | 18 +++ .../spark/util/collection/OpenHashMapSuite.scala | 18 +++ 3 files changed, 56 insertions(+), 6 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala index fade0b86dd8..2dd3a903ee2 100644 --- a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala +++ b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala @@ -42,6 +42,7 @@ import org.apache.spark.scheduler.{MapStatus, MergeStatus, ShuffleOutputStatus} import org.apache.spark.shuffle.MetadataFetchFailedException import org.apache.spark.storage.{BlockId, BlockManagerId, ShuffleBlockId, ShuffleMergedBlockId} import org.apache.spark.util._ +import org.apache.spark.util.collection.OpenHashMap import org.apache.spark.util.io.{ChunkedByteBuffer, ChunkedByteBufferOutputStream} /** @@ -147,6 +148,12 @@ private class ShuffleStatus( private[this] var shufflePushMergerLocations: Seq[BlockManagerId] = Seq.empty + /** + * Mapping from a mapId to the mapIndex, this is required to reduce the searching overhead within + * the function updateMapOutput(mapId, bmAddress). + */ + private[this] val mapIdToMapIndex = new OpenHashMap[Long, Int]() + /** * Register a map output. If there is already a registered location for the map output then it * will be replaced by the new location. @@ -157,6 +164,14 @@ private class ShuffleStatus( invalidateSerializedMapOutputStatusCache() } mapStatuses(mapIndex) = status +mapIdToMapIndex(status.mapId) = mapIndex + } + + /** + * Get the map output that corresponding to a given mapId. + */ + def getMapStatus(mapId: Long): Option[MapStatus] = withReadLock { +mapIdToMapIndex.get(mapId).map(mapStatuses(_)) } /** @@ -164,15 +179,16 @@ private class ShuffleStatus( */ def updateMapOutput(mapId: Long, bmAddress: BlockManagerId): Unit = withWriteLock { try { - val mapStatusOpt = mapStatuses.find(x => x != null && x.mapId == mapId) + val mapIndex = mapIdToMapIndex.get(mapId) + val mapStatusOpt = mapIndex.map(mapStatuses(_)).flatMap(Option(_)) mapStatusOpt match { case Some(mapStatus) => logInfo(s"Updating map output for ${mapId} to ${bmAddress}") mapStatus.updateLocation(bmAddress) invalidateSerializedMapOutputStatusCache() case None => - val index = mapStatusesDeleted.indexWhere(x => x != null && x.mapId == mapId) - if (index >= 0 && mapStatuses(index) == null) { + if (mapIndex.map(mapStatusesDeleted).exists(_.mapId == mapId)) { +val index = mapIndex.get val mapStatus = mapStatusesDeleted(index) mapStatus.updateLocation(bmAddress) mapStatuses(index) = mapStatus @@ -1133,9 +1149,7 @@ private[spark] class MapOutputTrackerMaster( */ def getMapOutputLocation(shuffleId: Int, mapId: Long): Option[BlockManagerId] = { shuffleStatuses.get(shuffleId).flatMap { shuffleStatus => - shuffleStatus.withMapStatuses { mapStatues => -mapStatues.filter(_ != null).find(_.mapId == mapId).map(_.location) -
[spark] branch master updated: [SPARK-43043][CORE] Improve the performance of MapOutputTracker.updateMapOutput
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 66a2eb8f895 [SPARK-43043][CORE] Improve the performance of MapOutputTracker.updateMapOutput 66a2eb8f895 is described below commit 66a2eb8f8957c22c69519b39be59beaaf931822b Author: Xingbo Jiang AuthorDate: Tue May 16 11:34:30 2023 -0700 [SPARK-43043][CORE] Improve the performance of MapOutputTracker.updateMapOutput ### What changes were proposed in this pull request? The PR changes the implementation of MapOutputTracker.updateMapOutput() to search for the MapStatus under the help of a mapping from mapId to mapIndex, previously it was performing a linear search, which would become performance bottleneck if a large proportion of all blocks in the map are migrated. ### Why are the changes needed? To avoid performance bottleneck when block decommission is enabled and a lot of blocks are migrated within a short time window. ### Does this PR introduce _any_ user-facing change? No, it's pure performance improvement. ### How was this patch tested? Manually test. Closes #40690 from jiangxb1987/SPARK-43043. Lead-authored-by: Xingbo Jiang Co-authored-by: Jiang Xingbo Signed-off-by: Xingbo Jiang --- .../scala/org/apache/spark/MapOutputTracker.scala | 26 +- .../apache/spark/util/collection/OpenHashMap.scala | 18 +++ .../spark/util/collection/OpenHashMapSuite.scala | 18 +++ 3 files changed, 56 insertions(+), 6 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala index 5ad62159d24..9a5cf1da9e4 100644 --- a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala +++ b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala @@ -42,6 +42,7 @@ import org.apache.spark.scheduler.{MapStatus, MergeStatus, ShuffleOutputStatus} import org.apache.spark.shuffle.MetadataFetchFailedException import org.apache.spark.storage.{BlockId, BlockManagerId, ShuffleBlockId, ShuffleMergedBlockId} import org.apache.spark.util._ +import org.apache.spark.util.collection.OpenHashMap import org.apache.spark.util.io.{ChunkedByteBuffer, ChunkedByteBufferOutputStream} /** @@ -147,6 +148,12 @@ private class ShuffleStatus( private[this] var shufflePushMergerLocations: Seq[BlockManagerId] = Seq.empty + /** + * Mapping from a mapId to the mapIndex, this is required to reduce the searching overhead within + * the function updateMapOutput(mapId, bmAddress). + */ + private[this] val mapIdToMapIndex = new OpenHashMap[Long, Int]() + /** * Register a map output. If there is already a registered location for the map output then it * will be replaced by the new location. @@ -157,6 +164,14 @@ private class ShuffleStatus( invalidateSerializedMapOutputStatusCache() } mapStatuses(mapIndex) = status +mapIdToMapIndex(status.mapId) = mapIndex + } + + /** + * Get the map output that corresponding to a given mapId. + */ + def getMapStatus(mapId: Long): Option[MapStatus] = withReadLock { +mapIdToMapIndex.get(mapId).map(mapStatuses(_)) } /** @@ -164,15 +179,16 @@ private class ShuffleStatus( */ def updateMapOutput(mapId: Long, bmAddress: BlockManagerId): Unit = withWriteLock { try { - val mapStatusOpt = mapStatuses.find(x => x != null && x.mapId == mapId) + val mapIndex = mapIdToMapIndex.get(mapId) + val mapStatusOpt = mapIndex.map(mapStatuses(_)).flatMap(Option(_)) mapStatusOpt match { case Some(mapStatus) => logInfo(s"Updating map output for ${mapId} to ${bmAddress}") mapStatus.updateLocation(bmAddress) invalidateSerializedMapOutputStatusCache() case None => - val index = mapStatusesDeleted.indexWhere(x => x != null && x.mapId == mapId) - if (index >= 0 && mapStatuses(index) == null) { + if (mapIndex.map(mapStatusesDeleted).exists(_.mapId == mapId)) { +val index = mapIndex.get val mapStatus = mapStatusesDeleted(index) mapStatus.updateLocation(bmAddress) mapStatuses(index) = mapStatus @@ -1137,9 +1153,7 @@ private[spark] class MapOutputTrackerMaster( */ def getMapOutputLocation(shuffleId: Int, mapId: Long): Option[BlockManagerId] = { shuffleStatuses.get(shuffleId).flatMap { shuffleStatus => - shuffleStatus.withMapStatuses { mapStatues => -mapStatues.filter(_ != null).find(_.mapId == mapId).map(_.location) - } + shuffleStatus.getMapStatus(mapId).map(_.location) } } diff --git a/core/src/main/scala/org/apache/spark/
[spark] branch master updated (8fef5bb -> bcaab62)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 8fef5bb [SPARK-37979][SQL] Switch to more generic error classes in AES functions add bcaab62 [SPARK-37891][CORE] Add scalastyle check to disable scala.concurrent.ExecutionContext.Implicits.global No new revisions were added by this update. Summary of changes: .../src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala | 2 ++ .../scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala | 2 ++ core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala | 2 ++ core/src/test/scala/org/apache/spark/JobCancellationSuite.scala | 2 ++ core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala | 2 ++ .../org/apache/spark/scheduler/SchedulerIntegrationSuite.scala | 4 .../scala/org/apache/spark/serializer/KryoSerializerBenchmark.scala | 2 ++ .../src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala | 2 ++ .../org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala | 2 ++ scalastyle-config.xml | 6 ++ .../org/apache/spark/sql/streaming/StreamingQueryManagerSuite.scala | 4 ++-- .../scala/org/apache/spark/streaming/StreamingListenerSuite.scala | 2 ++ 12 files changed, 30 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (678592a -> 5c8a141)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 678592a [SPARK-35559][TEST] Speed up one test in AdaptiveQueryExecSuite add 5c8a141 [SPARK-35538][SQL] Migrate transformAllExpressions call sites to use transformAllExpressionsWithPruning No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/expressions/DynamicPruning.scala| 3 ++- .../spark/sql/catalyst/expressions/complexTypeCreator.scala | 4 +++- .../spark/sql/catalyst/expressions/datetimeExpressions.scala | 2 ++ .../org/apache/spark/sql/catalyst/expressions/subquery.scala | 4 ++-- .../org/apache/spark/sql/catalyst/optimizer/UpdateFields.scala| 7 --- .../scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala | 4 +++- .../scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala | 8 .../main/scala/org/apache/spark/sql/execution/CacheManager.scala | 3 ++- .../execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala| 4 +++- .../spark/sql/execution/adaptive/ReuseAdaptiveSubquery.scala | 3 ++- .../scala/org/apache/spark/sql/execution/exchange/Exchange.scala | 8 +--- .../spark/sql/execution/streaming/IncrementalExecution.scala | 4 +++- .../spark/sql/execution/streaming/MicroBatchExecution.scala | 4 +++- .../sql/execution/streaming/continuous/ContinuousExecution.scala | 3 ++- .../src/main/scala/org/apache/spark/sql/execution/subquery.scala | 5 +++-- 15 files changed, 47 insertions(+), 19 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a7d0d35 -> 5e89fbe)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a7d0d35 [SPARK-31994][K8S] Docker image should use `https` urls for only deb.debian.org mirrors add 5e89fbe [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and reuse completeShuffleMapStageSuccessfully No new revisions were added by this update. Summary of changes: .../apache/spark/scheduler/DAGSchedulerSuite.scala | 197 - 1 file changed, 77 insertions(+), 120 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a7d0d35 -> 5e89fbe)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a7d0d35 [SPARK-31994][K8S] Docker image should use `https` urls for only deb.debian.org mirrors add 5e89fbe [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and reuse completeShuffleMapStageSuccessfully No new revisions were added by this update. Summary of changes: .../apache/spark/scheduler/DAGSchedulerSuite.scala | 197 - 1 file changed, 77 insertions(+), 120 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a7d0d35 -> 5e89fbe)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a7d0d35 [SPARK-31994][K8S] Docker image should use `https` urls for only deb.debian.org mirrors add 5e89fbe [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and reuse completeShuffleMapStageSuccessfully No new revisions were added by this update. Summary of changes: .../apache/spark/scheduler/DAGSchedulerSuite.scala | 197 - 1 file changed, 77 insertions(+), 120 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31764][CORE][3.0] JsonProtocol doesn't write RDDInfo#isBarrier
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new b6c8366 [SPARK-31764][CORE][3.0] JsonProtocol doesn't write RDDInfo#isBarrier b6c8366 is described below commit b6c8366cc58b7d5e35ec2a31532ae7ee22275454 Author: Kousuke Saruta AuthorDate: Mon Jun 1 14:33:35 2020 -0700 [SPARK-31764][CORE][3.0] JsonProtocol doesn't write RDDInfo#isBarrier ### What changes were proposed in this pull request? This PR backports the change of #28583 (SPARK-31764) to branch-3.0, which changes JsonProtocol to write RDDInfos#isBarrier. ### Why are the changes needed? JsonProtocol reads RDDInfos#isBarrier but not write it so it's a bug. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I added a testcase. Closes #28660 from sarutak/SPARK-31764-branch-3.0. Authored-by: Kousuke Saruta Signed-off-by: Xingbo Jiang --- .../scala/org/apache/spark/util/JsonProtocol.scala | 1 + .../scheduler/EventLoggingListenerSuite.scala | 44 ++ .../org/apache/spark/util/JsonProtocolSuite.scala | 11 ++ 3 files changed, 56 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala index aee9862..f445fd4 100644 --- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala +++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala @@ -475,6 +475,7 @@ private[spark] object JsonProtocol { ("Callsite" -> rddInfo.callSite) ~ ("Parent IDs" -> parentIds) ~ ("Storage Level" -> storageLevel) ~ +("Barrier" -> rddInfo.isBarrier) ~ ("Number of Partitions" -> rddInfo.numPartitions) ~ ("Number of Cached Partitions" -> rddInfo.numCachedPartitions) ~ ("Memory Size" -> rddInfo.memSize) ~ diff --git a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala index 2869240..046564d 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala @@ -36,6 +36,7 @@ import org.apache.spark.deploy.history.{EventLogFileReader, SingleEventLogFileWr import org.apache.spark.deploy.history.EventLogTestHelper._ import org.apache.spark.executor.{ExecutorMetrics, TaskMetrics} import org.apache.spark.internal.Logging +import org.apache.spark.internal.config.{EVENT_LOG_DIR, EVENT_LOG_ENABLED} import org.apache.spark.io._ import org.apache.spark.metrics.{ExecutorMetricType, MetricsSystem} import org.apache.spark.scheduler.cluster.ExecutorInfo @@ -99,6 +100,49 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit testStageExecutorMetricsEventLogging() } + test("SPARK-31764: isBarrier should be logged in event log") { +val conf = new SparkConf() +conf.set(EVENT_LOG_ENABLED, true) +conf.set(EVENT_LOG_DIR, testDirPath.toString) +val sc = new SparkContext("local", "test-SPARK-31764", conf) +val appId = sc.applicationId + +sc.parallelize(1 to 10) + .barrier() + .mapPartitions(_.map(elem => (elem, elem))) + .filter(elem => elem._1 % 2 == 0) + .reduceByKey(_ + _) + .collect +sc.stop() + +val eventLogStream = EventLogFileReader.openEventLog(new Path(testDirPath, appId), fileSystem) +val events = readLines(eventLogStream).map(line => JsonProtocol.sparkEventFromJson(parse(line))) +val jobStartEvents = events + .filter(event => event.isInstanceOf[SparkListenerJobStart]) + .map(_.asInstanceOf[SparkListenerJobStart]) + +assert(jobStartEvents.size === 1) +val stageInfos = jobStartEvents.head.stageInfos +assert(stageInfos.size === 2) + +val stage0 = stageInfos(0) +val rddInfosInStage0 = stage0.rddInfos +assert(rddInfosInStage0.size === 3) +val sortedRddInfosInStage0 = rddInfosInStage0.sortBy(_.scope.get.name) +assert(sortedRddInfosInStage0(0).scope.get.name === "filter") +assert(sortedRddInfosInStage0(0).isBarrier === true) +assert(sortedRddInfosInStage0(1).scope.get.name === "mapPartitions") +assert(sortedRddInfosInStage0(1).isBarrier === true) +assert(sortedRddInfosInStage0(2).scope.get.name === "parallelize") +assert(sortedRddInfosInStage0(2).isBarrier === false) + +val stage1 = stageInfos(1) +val rddInfosInStage1 = stage1.rddInfos +assert(rddInfosInStage1.size === 1) +assert(
[spark] branch branch-3.0 updated: [SPARK-31764][CORE][3.0] JsonProtocol doesn't write RDDInfo#isBarrier
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new b6c8366 [SPARK-31764][CORE][3.0] JsonProtocol doesn't write RDDInfo#isBarrier b6c8366 is described below commit b6c8366cc58b7d5e35ec2a31532ae7ee22275454 Author: Kousuke Saruta AuthorDate: Mon Jun 1 14:33:35 2020 -0700 [SPARK-31764][CORE][3.0] JsonProtocol doesn't write RDDInfo#isBarrier ### What changes were proposed in this pull request? This PR backports the change of #28583 (SPARK-31764) to branch-3.0, which changes JsonProtocol to write RDDInfos#isBarrier. ### Why are the changes needed? JsonProtocol reads RDDInfos#isBarrier but not write it so it's a bug. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I added a testcase. Closes #28660 from sarutak/SPARK-31764-branch-3.0. Authored-by: Kousuke Saruta Signed-off-by: Xingbo Jiang --- .../scala/org/apache/spark/util/JsonProtocol.scala | 1 + .../scheduler/EventLoggingListenerSuite.scala | 44 ++ .../org/apache/spark/util/JsonProtocolSuite.scala | 11 ++ 3 files changed, 56 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala index aee9862..f445fd4 100644 --- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala +++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala @@ -475,6 +475,7 @@ private[spark] object JsonProtocol { ("Callsite" -> rddInfo.callSite) ~ ("Parent IDs" -> parentIds) ~ ("Storage Level" -> storageLevel) ~ +("Barrier" -> rddInfo.isBarrier) ~ ("Number of Partitions" -> rddInfo.numPartitions) ~ ("Number of Cached Partitions" -> rddInfo.numCachedPartitions) ~ ("Memory Size" -> rddInfo.memSize) ~ diff --git a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala index 2869240..046564d 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala @@ -36,6 +36,7 @@ import org.apache.spark.deploy.history.{EventLogFileReader, SingleEventLogFileWr import org.apache.spark.deploy.history.EventLogTestHelper._ import org.apache.spark.executor.{ExecutorMetrics, TaskMetrics} import org.apache.spark.internal.Logging +import org.apache.spark.internal.config.{EVENT_LOG_DIR, EVENT_LOG_ENABLED} import org.apache.spark.io._ import org.apache.spark.metrics.{ExecutorMetricType, MetricsSystem} import org.apache.spark.scheduler.cluster.ExecutorInfo @@ -99,6 +100,49 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit testStageExecutorMetricsEventLogging() } + test("SPARK-31764: isBarrier should be logged in event log") { +val conf = new SparkConf() +conf.set(EVENT_LOG_ENABLED, true) +conf.set(EVENT_LOG_DIR, testDirPath.toString) +val sc = new SparkContext("local", "test-SPARK-31764", conf) +val appId = sc.applicationId + +sc.parallelize(1 to 10) + .barrier() + .mapPartitions(_.map(elem => (elem, elem))) + .filter(elem => elem._1 % 2 == 0) + .reduceByKey(_ + _) + .collect +sc.stop() + +val eventLogStream = EventLogFileReader.openEventLog(new Path(testDirPath, appId), fileSystem) +val events = readLines(eventLogStream).map(line => JsonProtocol.sparkEventFromJson(parse(line))) +val jobStartEvents = events + .filter(event => event.isInstanceOf[SparkListenerJobStart]) + .map(_.asInstanceOf[SparkListenerJobStart]) + +assert(jobStartEvents.size === 1) +val stageInfos = jobStartEvents.head.stageInfos +assert(stageInfos.size === 2) + +val stage0 = stageInfos(0) +val rddInfosInStage0 = stage0.rddInfos +assert(rddInfosInStage0.size === 3) +val sortedRddInfosInStage0 = rddInfosInStage0.sortBy(_.scope.get.name) +assert(sortedRddInfosInStage0(0).scope.get.name === "filter") +assert(sortedRddInfosInStage0(0).isBarrier === true) +assert(sortedRddInfosInStage0(1).scope.get.name === "mapPartitions") +assert(sortedRddInfosInStage0(1).isBarrier === true) +assert(sortedRddInfosInStage0(2).scope.get.name === "parallelize") +assert(sortedRddInfosInStage0(2).isBarrier === false) + +val stage1 = stageInfos(1) +val rddInfosInStage1 = stage1.rddInfos +assert(rddInfosInStage1.size === 1) +assert(
[spark] branch branch-3.0 updated (805884a -> 76a0418)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 805884a [SPARK-31885][SQL] Fix filter push down for old millis timestamps to Parquet add 76a0418 Revert "[SPARK-31885][SQL] Fix filter push down for old millis timestamps to Parquet" No new revisions were added by this update. Summary of changes: .../datasources/parquet/ParquetFilters.scala | 27 +++--- .../datasources/parquet/ParquetFilterSuite.scala | 16 ++--- .../datasources/parquet/ParquetTest.scala | 4 +--- 3 files changed, 22 insertions(+), 25 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (805884a -> 76a0418)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 805884a [SPARK-31885][SQL] Fix filter push down for old millis timestamps to Parquet add 76a0418 Revert "[SPARK-31885][SQL] Fix filter push down for old millis timestamps to Parquet" No new revisions were added by this update. Summary of changes: .../datasources/parquet/ParquetFilters.scala | 27 +++--- .../datasources/parquet/ParquetFilterSuite.scala | 16 ++--- .../datasources/parquet/ParquetTest.scala | 4 +--- 3 files changed, 22 insertions(+), 25 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: Revert "[SPARK-31885][SQL] Fix filter push down for old millis timestamps to Parquet"
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 76a0418 Revert "[SPARK-31885][SQL] Fix filter push down for old millis timestamps to Parquet" 76a0418 is described below commit 76a041804d9ba856c221e944b9898c8c62854474 Author: Xingbo Jiang AuthorDate: Mon Jun 1 10:20:17 2020 -0700 Revert "[SPARK-31885][SQL] Fix filter push down for old millis timestamps to Parquet" This reverts commit 805884ad10aca3a532314284b0f2c007d4ca1045. --- .../datasources/parquet/ParquetFilters.scala | 27 +++--- .../datasources/parquet/ParquetFilterSuite.scala | 16 ++--- .../datasources/parquet/ParquetTest.scala | 4 +--- 3 files changed, 22 insertions(+), 25 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala index 7900693..d89186a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala @@ -148,13 +148,6 @@ class ParquetFilters( Binary.fromConstantByteArray(fixedLengthBytes, 0, numBytes) } - private def timestampToMillis(v: Any): JLong = { -val timestamp = v.asInstanceOf[Timestamp] -val micros = DateTimeUtils.fromJavaTimestamp(timestamp) -val millis = DateTimeUtils.microsToMillis(micros) -millis.asInstanceOf[JLong] - } - private val makeEq: PartialFunction[ParquetSchemaType, (Array[String], Any) => FilterPredicate] = { case ParquetBooleanType => @@ -191,7 +184,7 @@ class ParquetFilters( case ParquetTimestampMillisType if pushDownTimestamp => (n: Array[String], v: Any) => FilterApi.eq( longColumn(n), -Option(v).map(timestampToMillis).orNull) + Option(v).map(_.asInstanceOf[Timestamp].getTime.asInstanceOf[JLong]).orNull) case ParquetSchemaType(DECIMAL, INT32, _, _) if pushDownDecimal => (n: Array[String], v: Any) => FilterApi.eq( @@ -242,7 +235,7 @@ class ParquetFilters( case ParquetTimestampMillisType if pushDownTimestamp => (n: Array[String], v: Any) => FilterApi.notEq( longColumn(n), -Option(v).map(timestampToMillis).orNull) + Option(v).map(_.asInstanceOf[Timestamp].getTime.asInstanceOf[JLong]).orNull) case ParquetSchemaType(DECIMAL, INT32, _, _) if pushDownDecimal => (n: Array[String], v: Any) => FilterApi.notEq( @@ -284,7 +277,9 @@ class ParquetFilters( longColumn(n), DateTimeUtils.fromJavaTimestamp(v.asInstanceOf[Timestamp]).asInstanceOf[JLong]) case ParquetTimestampMillisType if pushDownTimestamp => - (n: Array[String], v: Any) => FilterApi.lt(longColumn(n), timestampToMillis(v)) + (n: Array[String], v: Any) => FilterApi.lt( +longColumn(n), +v.asInstanceOf[Timestamp].getTime.asInstanceOf[JLong]) case ParquetSchemaType(DECIMAL, INT32, _, _) if pushDownDecimal => (n: Array[String], v: Any) => @@ -323,7 +318,9 @@ class ParquetFilters( longColumn(n), DateTimeUtils.fromJavaTimestamp(v.asInstanceOf[Timestamp]).asInstanceOf[JLong]) case ParquetTimestampMillisType if pushDownTimestamp => - (n: Array[String], v: Any) => FilterApi.ltEq(longColumn(n), timestampToMillis(v)) + (n: Array[String], v: Any) => FilterApi.ltEq( +longColumn(n), +v.asInstanceOf[Timestamp].getTime.asInstanceOf[JLong]) case ParquetSchemaType(DECIMAL, INT32, _, _) if pushDownDecimal => (n: Array[String], v: Any) => @@ -362,7 +359,9 @@ class ParquetFilters( longColumn(n), DateTimeUtils.fromJavaTimestamp(v.asInstanceOf[Timestamp]).asInstanceOf[JLong]) case ParquetTimestampMillisType if pushDownTimestamp => - (n: Array[String], v: Any) => FilterApi.gt(longColumn(n), timestampToMillis(v)) + (n: Array[String], v: Any) => FilterApi.gt( +longColumn(n), +v.asInstanceOf[Timestamp].getTime.asInstanceOf[JLong]) case ParquetSchemaType(DECIMAL, INT32, _, _) if pushDownDecimal => (n: Array[String], v: Any) => @@ -401,7 +400,9 @@ class ParquetFilters( longColumn(n), DateTimeUtils.fromJavaTimestamp(v.asInstanceOf[Timestamp]).asInstanceOf[JLong]) case ParquetTimestampMillisType if pushDownTimestamp => - (n: Array[String], v: Any) => FilterApi.gtEq(longColumn(n), timestampToMillis(v)) + (n: Array[String], v: Any) => FilterApi.gtEq( +longColumn(n), +v.asInstanceOf[Timestamp].getTime.asInstanceOf[JLong])
[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST][3.0] Fix flaky tests in BarrierTaskContextSuite
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 3ff4db9 [SPARK-31730][CORE][TEST][3.0] Fix flaky tests in BarrierTaskContextSuite 3ff4db9 is described below commit 3ff4db97fb966e35c0b7450a7210cebf5d331be6 Author: Xingbo Jiang AuthorDate: Thu May 28 16:29:40 2020 -0700 [SPARK-31730][CORE][TEST][3.0] Fix flaky tests in BarrierTaskContextSuite ### What changes were proposed in this pull request? To wait until all the executors have started before submitting any job. This could avoid the flakiness caused by waiting for executors coming up. ### How was this patch tested? Existing tests. Closes #28658 from jiangxb1987/barrierTest. Authored-by: Xingbo Jiang Signed-off-by: Xingbo Jiang --- .../spark/scheduler/BarrierTaskContextSuite.scala | 22 ++ 1 file changed, 6 insertions(+), 16 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index 6191e41..764b4b7 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -37,10 +37,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with .setAppName("test-cluster") .set(TEST_NO_STAGE_RETRY, true) sc = new SparkContext(conf) +TestUtils.waitUntilExecutorsUp(sc, numWorker, 6) } - // TODO (SPARK-31730): re-enable it - ignore("global sync by barrier() call") { + test("global sync by barrier() call") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -57,10 +57,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("share messages with allGather() call") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -78,10 +75,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("throw exception if we attempt to synchronize with different blocking calls") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -100,10 +94,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("successively sync with allGather and barrier") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -129,8 +120,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with assert(times2.max - times2.min <= 1000) } - // TODO (SPARK-31730): re-enable it - ignore("support multiple barrier() call within a single task") { + test("support multiple barrier() call within a single task") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST][3.0] Fix flaky tests in BarrierTaskContextSuite
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 3ff4db9 [SPARK-31730][CORE][TEST][3.0] Fix flaky tests in BarrierTaskContextSuite 3ff4db9 is described below commit 3ff4db97fb966e35c0b7450a7210cebf5d331be6 Author: Xingbo Jiang AuthorDate: Thu May 28 16:29:40 2020 -0700 [SPARK-31730][CORE][TEST][3.0] Fix flaky tests in BarrierTaskContextSuite ### What changes were proposed in this pull request? To wait until all the executors have started before submitting any job. This could avoid the flakiness caused by waiting for executors coming up. ### How was this patch tested? Existing tests. Closes #28658 from jiangxb1987/barrierTest. Authored-by: Xingbo Jiang Signed-off-by: Xingbo Jiang --- .../spark/scheduler/BarrierTaskContextSuite.scala | 22 ++ 1 file changed, 6 insertions(+), 16 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index 6191e41..764b4b7 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -37,10 +37,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with .setAppName("test-cluster") .set(TEST_NO_STAGE_RETRY, true) sc = new SparkContext(conf) +TestUtils.waitUntilExecutorsUp(sc, numWorker, 6) } - // TODO (SPARK-31730): re-enable it - ignore("global sync by barrier() call") { + test("global sync by barrier() call") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -57,10 +57,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("share messages with allGather() call") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -78,10 +75,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("throw exception if we attempt to synchronize with different blocking calls") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -100,10 +94,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("successively sync with allGather and barrier") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -129,8 +120,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with assert(times2.max - times2.min <= 1000) } - // TODO (SPARK-31730): re-enable it - ignore("support multiple barrier() call within a single task") { + test("support multiple barrier() call within a single task") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: Revert "[SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite"
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new e7b88e8 Revert "[SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite" e7b88e8 is described below commit e7b88e82ec5cee0a738f96127b106358cc97cb4f Author: Xingbo Jiang AuthorDate: Wed May 27 17:21:10 2020 -0700 Revert "[SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite" This reverts commit cb817bb1cf6b074e075c02880001ec96f2f39de7. --- .../spark/scheduler/BarrierTaskContextSuite.scala | 26 +- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index 54899bf..6191e41 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -25,7 +25,6 @@ import org.scalatest.concurrent.Eventually import org.scalatest.time.SpanSugar._ import org.apache.spark._ -import org.apache.spark.internal.config import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with Eventually { @@ -38,10 +37,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with .setAppName("test-cluster") .set(TEST_NO_STAGE_RETRY, true) sc = new SparkContext(conf) -TestUtils.waitUntilExecutorsUp(sc, numWorker, 6) } - test("global sync by barrier() call") { + // TODO (SPARK-31730): re-enable it + ignore("global sync by barrier() call") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -58,7 +57,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("share messages with allGather() call") { -initLocalClusterSparkContext() +val conf = new SparkConf() + .setMaster("local-cluster[4, 1, 1024]") + .setAppName("test-cluster") +sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -76,7 +78,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("throw exception if we attempt to synchronize with different blocking calls") { -initLocalClusterSparkContext() +val conf = new SparkConf() + .setMaster("local-cluster[4, 1, 1024]") + .setAppName("test-cluster") +sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -95,7 +100,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("successively sync with allGather and barrier") { -initLocalClusterSparkContext() +val conf = new SparkConf() + .setMaster("local-cluster[4, 1, 1024]") + .setAppName("test-cluster") +sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -121,7 +129,8 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with assert(times2.max - times2.min <= 1000) } - test("support multiple barrier() call within a single task") { + // TODO (SPARK-31730): re-enable it + ignore("support multiple barrier() call within a single task") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -276,9 +285,6 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with test("SPARK-31485: barrier stage should fail if only partial tasks are launched") { initLocalClusterSparkContext(2) -// It's required to reset the delay timer when a task is scheduled, otherwise all the tasks -// could get scheduled at ANY level. -sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true) val rdd0 = sc.parallelize(Seq(0, 1, 2, 3), 2) val dep = new OneToOneDependency[Int](rdd0) // set up a barrier stage with 2 tasks and both tasks prefer executor 0 (only 1 core) for - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new cb817bb [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite cb817bb is described below commit cb817bb1cf6b074e075c02880001ec96f2f39de7 Author: Xingbo Jiang AuthorDate: Wed May 27 16:37:02 2020 -0700 [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite ### What changes were proposed in this pull request? To wait until all the executors have started before submitting any job. This could avoid the flakiness caused by waiting for executors coming up. ### How was this patch tested? Existing tests. Closes #28584 from jiangxb1987/barrierTest. Authored-by: Xingbo Jiang Signed-off-by: Xingbo Jiang (cherry picked from commit efe7fd2b6bea4a945ed7f3f486ab279c505378b4) Signed-off-by: Xingbo Jiang --- .../spark/scheduler/BarrierTaskContextSuite.scala | 26 +- 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index 6191e41..54899bf 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -25,6 +25,7 @@ import org.scalatest.concurrent.Eventually import org.scalatest.time.SpanSugar._ import org.apache.spark._ +import org.apache.spark.internal.config import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with Eventually { @@ -37,10 +38,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with .setAppName("test-cluster") .set(TEST_NO_STAGE_RETRY, true) sc = new SparkContext(conf) +TestUtils.waitUntilExecutorsUp(sc, numWorker, 6) } - // TODO (SPARK-31730): re-enable it - ignore("global sync by barrier() call") { + test("global sync by barrier() call") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -57,10 +58,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("share messages with allGather() call") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("throw exception if we attempt to synchronize with different blocking calls") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("successively sync with allGather and barrier") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with assert(times2.max - times2.min <= 1000) } - // TODO (SPARK-31730): re-enable it - ignore("support multiple barrier() call within a single task") { + test("support multiple barrier() call within a single task") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with test("SPARK-31485: barrier stage should fail if only partial tasks are launched") { initLocalClusterSparkContext(2) +// It's required to reset the delay timer when a task is scheduled, otherwise all the tasks +// could get scheduled at ANY level. +sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true) val rdd0 = sc.par
[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new cb817bb [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite cb817bb is described below commit cb817bb1cf6b074e075c02880001ec96f2f39de7 Author: Xingbo Jiang AuthorDate: Wed May 27 16:37:02 2020 -0700 [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite ### What changes were proposed in this pull request? To wait until all the executors have started before submitting any job. This could avoid the flakiness caused by waiting for executors coming up. ### How was this patch tested? Existing tests. Closes #28584 from jiangxb1987/barrierTest. Authored-by: Xingbo Jiang Signed-off-by: Xingbo Jiang (cherry picked from commit efe7fd2b6bea4a945ed7f3f486ab279c505378b4) Signed-off-by: Xingbo Jiang --- .../spark/scheduler/BarrierTaskContextSuite.scala | 26 +- 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index 6191e41..54899bf 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -25,6 +25,7 @@ import org.scalatest.concurrent.Eventually import org.scalatest.time.SpanSugar._ import org.apache.spark._ +import org.apache.spark.internal.config import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with Eventually { @@ -37,10 +38,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with .setAppName("test-cluster") .set(TEST_NO_STAGE_RETRY, true) sc = new SparkContext(conf) +TestUtils.waitUntilExecutorsUp(sc, numWorker, 6) } - // TODO (SPARK-31730): re-enable it - ignore("global sync by barrier() call") { + test("global sync by barrier() call") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -57,10 +58,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("share messages with allGather() call") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("throw exception if we attempt to synchronize with different blocking calls") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("successively sync with allGather and barrier") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with assert(times2.max - times2.min <= 1000) } - // TODO (SPARK-31730): re-enable it - ignore("support multiple barrier() call within a single task") { + test("support multiple barrier() call within a single task") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with test("SPARK-31485: barrier stage should fail if only partial tasks are launched") { initLocalClusterSparkContext(2) +// It's required to reset the delay timer when a task is scheduled, otherwise all the tasks +// could get scheduled at ANY level. +sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true) val rdd0 = sc.par
[spark] branch master updated (d19b173 -> efe7fd2)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d19b173 [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier add efe7fd2 [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite No new revisions were added by this update. Summary of changes: .../spark/scheduler/BarrierTaskContextSuite.scala | 26 +- 1 file changed, 10 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new cb817bb [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite cb817bb is described below commit cb817bb1cf6b074e075c02880001ec96f2f39de7 Author: Xingbo Jiang AuthorDate: Wed May 27 16:37:02 2020 -0700 [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite ### What changes were proposed in this pull request? To wait until all the executors have started before submitting any job. This could avoid the flakiness caused by waiting for executors coming up. ### How was this patch tested? Existing tests. Closes #28584 from jiangxb1987/barrierTest. Authored-by: Xingbo Jiang Signed-off-by: Xingbo Jiang (cherry picked from commit efe7fd2b6bea4a945ed7f3f486ab279c505378b4) Signed-off-by: Xingbo Jiang --- .../spark/scheduler/BarrierTaskContextSuite.scala | 26 +- 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index 6191e41..54899bf 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -25,6 +25,7 @@ import org.scalatest.concurrent.Eventually import org.scalatest.time.SpanSugar._ import org.apache.spark._ +import org.apache.spark.internal.config import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with Eventually { @@ -37,10 +38,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with .setAppName("test-cluster") .set(TEST_NO_STAGE_RETRY, true) sc = new SparkContext(conf) +TestUtils.waitUntilExecutorsUp(sc, numWorker, 6) } - // TODO (SPARK-31730): re-enable it - ignore("global sync by barrier() call") { + test("global sync by barrier() call") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -57,10 +58,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("share messages with allGather() call") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("throw exception if we attempt to synchronize with different blocking calls") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("successively sync with allGather and barrier") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with assert(times2.max - times2.min <= 1000) } - // TODO (SPARK-31730): re-enable it - ignore("support multiple barrier() call within a single task") { + test("support multiple barrier() call within a single task") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with test("SPARK-31485: barrier stage should fail if only partial tasks are launched") { initLocalClusterSparkContext(2) +// It's required to reset the delay timer when a task is scheduled, otherwise all the tasks +// could get scheduled at ANY level. +sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true) val rdd0 = sc.par
[spark] branch master updated (d19b173 -> efe7fd2)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d19b173 [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier add efe7fd2 [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite No new revisions were added by this update. Summary of changes: .../spark/scheduler/BarrierTaskContextSuite.scala | 26 +- 1 file changed, 10 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new cb817bb [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite cb817bb is described below commit cb817bb1cf6b074e075c02880001ec96f2f39de7 Author: Xingbo Jiang AuthorDate: Wed May 27 16:37:02 2020 -0700 [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite ### What changes were proposed in this pull request? To wait until all the executors have started before submitting any job. This could avoid the flakiness caused by waiting for executors coming up. ### How was this patch tested? Existing tests. Closes #28584 from jiangxb1987/barrierTest. Authored-by: Xingbo Jiang Signed-off-by: Xingbo Jiang (cherry picked from commit efe7fd2b6bea4a945ed7f3f486ab279c505378b4) Signed-off-by: Xingbo Jiang --- .../spark/scheduler/BarrierTaskContextSuite.scala | 26 +- 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index 6191e41..54899bf 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -25,6 +25,7 @@ import org.scalatest.concurrent.Eventually import org.scalatest.time.SpanSugar._ import org.apache.spark._ +import org.apache.spark.internal.config import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with Eventually { @@ -37,10 +38,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with .setAppName("test-cluster") .set(TEST_NO_STAGE_RETRY, true) sc = new SparkContext(conf) +TestUtils.waitUntilExecutorsUp(sc, numWorker, 6) } - // TODO (SPARK-31730): re-enable it - ignore("global sync by barrier() call") { + test("global sync by barrier() call") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -57,10 +58,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("share messages with allGather() call") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("throw exception if we attempt to synchronize with different blocking calls") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("successively sync with allGather and barrier") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with assert(times2.max - times2.min <= 1000) } - // TODO (SPARK-31730): re-enable it - ignore("support multiple barrier() call within a single task") { + test("support multiple barrier() call within a single task") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with test("SPARK-31485: barrier stage should fail if only partial tasks are launched") { initLocalClusterSparkContext(2) +// It's required to reset the delay timer when a task is scheduled, otherwise all the tasks +// could get scheduled at ANY level. +sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true) val rdd0 = sc.par
[spark] branch master updated (d19b173 -> efe7fd2)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d19b173 [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier add efe7fd2 [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite No new revisions were added by this update. Summary of changes: .../spark/scheduler/BarrierTaskContextSuite.scala | 26 +- 1 file changed, 10 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new cb817bb [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite cb817bb is described below commit cb817bb1cf6b074e075c02880001ec96f2f39de7 Author: Xingbo Jiang AuthorDate: Wed May 27 16:37:02 2020 -0700 [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite ### What changes were proposed in this pull request? To wait until all the executors have started before submitting any job. This could avoid the flakiness caused by waiting for executors coming up. ### How was this patch tested? Existing tests. Closes #28584 from jiangxb1987/barrierTest. Authored-by: Xingbo Jiang Signed-off-by: Xingbo Jiang (cherry picked from commit efe7fd2b6bea4a945ed7f3f486ab279c505378b4) Signed-off-by: Xingbo Jiang --- .../spark/scheduler/BarrierTaskContextSuite.scala | 26 +- 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index 6191e41..54899bf 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -25,6 +25,7 @@ import org.scalatest.concurrent.Eventually import org.scalatest.time.SpanSugar._ import org.apache.spark._ +import org.apache.spark.internal.config import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with Eventually { @@ -37,10 +38,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with .setAppName("test-cluster") .set(TEST_NO_STAGE_RETRY, true) sc = new SparkContext(conf) +TestUtils.waitUntilExecutorsUp(sc, numWorker, 6) } - // TODO (SPARK-31730): re-enable it - ignore("global sync by barrier() call") { + test("global sync by barrier() call") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -57,10 +58,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("share messages with allGather() call") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("throw exception if we attempt to synchronize with different blocking calls") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("successively sync with allGather and barrier") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with assert(times2.max - times2.min <= 1000) } - // TODO (SPARK-31730): re-enable it - ignore("support multiple barrier() call within a single task") { + test("support multiple barrier() call within a single task") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with test("SPARK-31485: barrier stage should fail if only partial tasks are launched") { initLocalClusterSparkContext(2) +// It's required to reset the delay timer when a task is scheduled, otherwise all the tasks +// could get scheduled at ANY level. +sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true) val rdd0 = sc.par
[spark] branch master updated (d19b173 -> efe7fd2)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d19b173 [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier add efe7fd2 [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite No new revisions were added by this update. Summary of changes: .../spark/scheduler/BarrierTaskContextSuite.scala | 26 +- 1 file changed, 10 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new efe7fd2 [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite efe7fd2 is described below commit efe7fd2b6bea4a945ed7f3f486ab279c505378b4 Author: Xingbo Jiang AuthorDate: Wed May 27 16:37:02 2020 -0700 [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite ### What changes were proposed in this pull request? To wait until all the executors have started before submitting any job. This could avoid the flakiness caused by waiting for executors coming up. ### How was this patch tested? Existing tests. Closes #28584 from jiangxb1987/barrierTest. Authored-by: Xingbo Jiang Signed-off-by: Xingbo Jiang --- .../spark/scheduler/BarrierTaskContextSuite.scala | 26 +- 1 file changed, 10 insertions(+), 16 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index 6191e41..54899bf 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -25,6 +25,7 @@ import org.scalatest.concurrent.Eventually import org.scalatest.time.SpanSugar._ import org.apache.spark._ +import org.apache.spark.internal.config import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with Eventually { @@ -37,10 +38,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with .setAppName("test-cluster") .set(TEST_NO_STAGE_RETRY, true) sc = new SparkContext(conf) +TestUtils.waitUntilExecutorsUp(sc, numWorker, 6) } - // TODO (SPARK-31730): re-enable it - ignore("global sync by barrier() call") { + test("global sync by barrier() call") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -57,10 +58,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("share messages with allGather() call") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("throw exception if we attempt to synchronize with different blocking calls") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with } test("successively sync with allGather and barrier") { -val conf = new SparkConf() - .setMaster("local-cluster[4, 1, 1024]") - .setAppName("test-cluster") -sc = new SparkContext(conf) +initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() @@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with assert(times2.max - times2.min <= 1000) } - // TODO (SPARK-31730): re-enable it - ignore("support multiple barrier() call within a single task") { + test("support multiple barrier() call within a single task") { initLocalClusterSparkContext() val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => @@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with test("SPARK-31485: barrier stage should fail if only partial tasks are launched") { initLocalClusterSparkContext(2) +// It's required to reset the delay timer when a task is scheduled, otherwise all the tasks +// could get scheduled at ANY level. +sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true) val rdd0 = sc.parallelize(Seq(0, 1, 2, 3), 2) val dep = new OneToOneDependency[Int](rdd0) // set up a barrier stage with
[spark] branch master updated (1528fbc -> d19b173)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 1528fbc [SPARK-31827][SQL] fail datetime parsing/formatting if detect the Java 8 bug of stand-alone form add d19b173 [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/util/JsonProtocol.scala | 1 + .../scheduler/EventLoggingListenerSuite.scala | 44 ++ .../org/apache/spark/util/JsonProtocolSuite.scala | 11 ++ 3 files changed, 56 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (1528fbc -> d19b173)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 1528fbc [SPARK-31827][SQL] fail datetime parsing/formatting if detect the Java 8 bug of stand-alone form add d19b173 [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/util/JsonProtocol.scala | 1 + .../scheduler/EventLoggingListenerSuite.scala | 44 ++ .../org/apache/spark/util/JsonProtocolSuite.scala | 11 ++ 3 files changed, 56 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d19b173 [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier d19b173 is described below commit d19b173b47af04fe6f03e2b21b60eb317aeaae4f Author: Kousuke Saruta AuthorDate: Wed May 27 14:36:12 2020 -0700 [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier ### What changes were proposed in this pull request? This PR changes JsonProtocol to write RDDInfos#isBarrier. ### Why are the changes needed? JsonProtocol reads RDDInfos#isBarrier but not write it so it's a bug. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? I added a testcase. Closes #28583 from sarutak/SPARK-31764. Authored-by: Kousuke Saruta Signed-off-by: Xingbo Jiang --- .../scala/org/apache/spark/util/JsonProtocol.scala | 1 + .../scheduler/EventLoggingListenerSuite.scala | 44 ++ .../org/apache/spark/util/JsonProtocolSuite.scala | 11 ++ 3 files changed, 56 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala index 26bbff5..844d9b7 100644 --- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala +++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala @@ -487,6 +487,7 @@ private[spark] object JsonProtocol { ("Callsite" -> rddInfo.callSite) ~ ("Parent IDs" -> parentIds) ~ ("Storage Level" -> storageLevel) ~ +("Barrier" -> rddInfo.isBarrier) ~ ("Number of Partitions" -> rddInfo.numPartitions) ~ ("Number of Cached Partitions" -> rddInfo.numCachedPartitions) ~ ("Memory Size" -> rddInfo.memSize) ~ diff --git a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala index 61ea21f..7c23e44 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala @@ -36,6 +36,7 @@ import org.apache.spark.deploy.history.{EventLogFileReader, SingleEventLogFileWr import org.apache.spark.deploy.history.EventLogTestHelper._ import org.apache.spark.executor.{ExecutorMetrics, TaskMetrics} import org.apache.spark.internal.Logging +import org.apache.spark.internal.config.{EVENT_LOG_DIR, EVENT_LOG_ENABLED} import org.apache.spark.io._ import org.apache.spark.metrics.{ExecutorMetricType, MetricsSystem} import org.apache.spark.resource.ResourceProfile @@ -100,6 +101,49 @@ class EventLoggingListenerSuite extends SparkFunSuite with LocalSparkContext wit testStageExecutorMetricsEventLogging() } + test("SPARK-31764: isBarrier should be logged in event log") { +val conf = new SparkConf() +conf.set(EVENT_LOG_ENABLED, true) +conf.set(EVENT_LOG_DIR, testDirPath.toString) +val sc = new SparkContext("local", "test-SPARK-31764", conf) +val appId = sc.applicationId + +sc.parallelize(1 to 10) + .barrier() + .mapPartitions(_.map(elem => (elem, elem))) + .filter(elem => elem._1 % 2 == 0) + .reduceByKey(_ + _) + .collect +sc.stop() + +val eventLogStream = EventLogFileReader.openEventLog(new Path(testDirPath, appId), fileSystem) +val events = readLines(eventLogStream).map(line => JsonProtocol.sparkEventFromJson(parse(line))) +val jobStartEvents = events + .filter(event => event.isInstanceOf[SparkListenerJobStart]) + .map(_.asInstanceOf[SparkListenerJobStart]) + +assert(jobStartEvents.size === 1) +val stageInfos = jobStartEvents.head.stageInfos +assert(stageInfos.size === 2) + +val stage0 = stageInfos(0) +val rddInfosInStage0 = stage0.rddInfos +assert(rddInfosInStage0.size === 3) +val sortedRddInfosInStage0 = rddInfosInStage0.sortBy(_.scope.get.name) +assert(sortedRddInfosInStage0(0).scope.get.name === "filter") +assert(sortedRddInfosInStage0(0).isBarrier === true) +assert(sortedRddInfosInStage0(1).scope.get.name === "mapPartitions") +assert(sortedRddInfosInStage0(1).isBarrier === true) +assert(sortedRddInfosInStage0(2).scope.get.name === "parallelize") +assert(sortedRddInfosInStage0(2).isBarrier === false) + +val stage1 = stageInfos(1) +val rddInfosInStage1 = stage1.rddInfos +assert(rddInfosInStage1.size === 1) +assert(rddInfosInStage1(0).scope.get.name === "reduceByKey") +assert(rddInfosInStage1(0).isBarrier === false) // reduc
[spark] branch master updated (83d0967 -> 245aee9)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 83d0967 [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" add 245aee9 [SPARK-31757][CORE] Improve HistoryServerDiskManager.updateAccessTime() No new revisions were added by this update. Summary of changes: .../spark/deploy/history/HistoryServerDiskManager.scala | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (83d0967 -> 245aee9)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 83d0967 [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" add 245aee9 [SPARK-31757][CORE] Improve HistoryServerDiskManager.updateAccessTime() No new revisions were added by this update. Summary of changes: .../spark/deploy/history/HistoryServerDiskManager.scala | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ec80e4b [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" ec80e4b is described below commit ec80e4b5f80876378765544e0c4c6af1a704 Author: yi.wu AuthorDate: Thu May 21 23:34:11 2020 -0700 [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" ### What changes were proposed in this pull request? Change from `messages.toList.iterator` to `Iterator.single(messages.toList)`. ### Why are the changes needed? In this test, the expected result of `rdd2.collect().head` should actually be `List("0", "1", "2", "3")` but is `"0"` now. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Updated test. Thanks WeichenXu123 reported this problem. Closes #28596 from Ngone51/fix_allgather_test. Authored-by: yi.wu Signed-off-by: Xingbo Jiang (cherry picked from commit 83d0967dcc6b205a3fd2003e051f49733f63cb30) Signed-off-by: Xingbo Jiang --- .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index b5614b2..6191e41 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -69,12 +69,12 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with // Pass partitionId message in val message: String = context.partitionId().toString val messages: Array[String] = context.allGather(message) - messages.toList.iterator + Iterator.single(messages.toList) } -// Take a sorted list of all the partitionId messages -val messages = rdd2.collect().head -// All the task partitionIds are shared -for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) +val messages = rdd2.collect() +// All the task partitionIds are shared across all tasks +assert(messages.length === 4) +assert(messages.forall(_ == List("0", "1", "2", "3"))) } test("throw exception if we attempt to synchronize with different blocking calls") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (83d0967 -> 245aee9)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 83d0967 [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" add 245aee9 [SPARK-31757][CORE] Improve HistoryServerDiskManager.updateAccessTime() No new revisions were added by this update. Summary of changes: .../spark/deploy/history/HistoryServerDiskManager.scala | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ec80e4b [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" ec80e4b is described below commit ec80e4b5f80876378765544e0c4c6af1a704 Author: yi.wu AuthorDate: Thu May 21 23:34:11 2020 -0700 [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" ### What changes were proposed in this pull request? Change from `messages.toList.iterator` to `Iterator.single(messages.toList)`. ### Why are the changes needed? In this test, the expected result of `rdd2.collect().head` should actually be `List("0", "1", "2", "3")` but is `"0"` now. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Updated test. Thanks WeichenXu123 reported this problem. Closes #28596 from Ngone51/fix_allgather_test. Authored-by: yi.wu Signed-off-by: Xingbo Jiang (cherry picked from commit 83d0967dcc6b205a3fd2003e051f49733f63cb30) Signed-off-by: Xingbo Jiang --- .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index b5614b2..6191e41 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -69,12 +69,12 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with // Pass partitionId message in val message: String = context.partitionId().toString val messages: Array[String] = context.allGather(message) - messages.toList.iterator + Iterator.single(messages.toList) } -// Take a sorted list of all the partitionId messages -val messages = rdd2.collect().head -// All the task partitionIds are shared -for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) +val messages = rdd2.collect() +// All the task partitionIds are shared across all tasks +assert(messages.length === 4) +assert(messages.forall(_ == List("0", "1", "2", "3"))) } test("throw exception if we attempt to synchronize with different blocking calls") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (60118a2 -> 83d0967)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 60118a2 [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers add 83d0967 [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" No new revisions were added by this update. Summary of changes: .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new ec80e4b [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" ec80e4b is described below commit ec80e4b5f80876378765544e0c4c6af1a704 Author: yi.wu AuthorDate: Thu May 21 23:34:11 2020 -0700 [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" ### What changes were proposed in this pull request? Change from `messages.toList.iterator` to `Iterator.single(messages.toList)`. ### Why are the changes needed? In this test, the expected result of `rdd2.collect().head` should actually be `List("0", "1", "2", "3")` but is `"0"` now. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Updated test. Thanks WeichenXu123 reported this problem. Closes #28596 from Ngone51/fix_allgather_test. Authored-by: yi.wu Signed-off-by: Xingbo Jiang (cherry picked from commit 83d0967dcc6b205a3fd2003e051f49733f63cb30) Signed-off-by: Xingbo Jiang --- .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index b5614b2..6191e41 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -69,12 +69,12 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with // Pass partitionId message in val message: String = context.partitionId().toString val messages: Array[String] = context.allGather(message) - messages.toList.iterator + Iterator.single(messages.toList) } -// Take a sorted list of all the partitionId messages -val messages = rdd2.collect().head -// All the task partitionIds are shared -for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) +val messages = rdd2.collect() +// All the task partitionIds are shared across all tasks +assert(messages.length === 4) +assert(messages.forall(_ == List("0", "1", "2", "3"))) } test("throw exception if we attempt to synchronize with different blocking calls") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (60118a2 -> 83d0967)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 60118a2 [SPARK-31785][SQL][TESTS] Add a helper function to test all parquet readers add 83d0967 [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" No new revisions were added by this update. Summary of changes: .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 83d0967 [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" 83d0967 is described below commit 83d0967dcc6b205a3fd2003e051f49733f63cb30 Author: yi.wu AuthorDate: Thu May 21 23:34:11 2020 -0700 [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call" ### What changes were proposed in this pull request? Change from `messages.toList.iterator` to `Iterator.single(messages.toList)`. ### Why are the changes needed? In this test, the expected result of `rdd2.collect().head` should actually be `List("0", "1", "2", "3")` but is `"0"` now. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Updated test. Thanks WeichenXu123 reported this problem. Closes #28596 from Ngone51/fix_allgather_test. Authored-by: yi.wu Signed-off-by: Xingbo Jiang --- .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala index b5614b2..6191e41 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala @@ -69,12 +69,12 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with // Pass partitionId message in val message: String = context.partitionId().toString val messages: Array[String] = context.allGather(message) - messages.toList.iterator + Iterator.single(messages.toList) } -// Take a sorted list of all the partitionId messages -val messages = rdd2.collect().head -// All the task partitionIds are shared -for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) +val messages = rdd2.collect() +// All the task partitionIds are shared across all tasks +assert(messages.length === 4) +assert(messages.forall(_ == List("0", "1", "2", "3"))) } test("throw exception if we attempt to synchronize with different blocking calls") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31651][CORE] Improve handling the case where different barrier sync types in a single sync
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 9f34cf5 [SPARK-31651][CORE] Improve handling the case where different barrier sync types in a single sync 9f34cf5 is described below commit 9f34cf56e551c8bb678221cb8671432d4cb02ac0 Author: yi.wu AuthorDate: Mon May 18 23:54:41 2020 -0700 [SPARK-31651][CORE] Improve handling the case where different barrier sync types in a single sync ### What changes were proposed in this pull request? This PR improves handling the case where different barrier sync types in a single sync: - use `clear` instead of `cleanupBarrierStage ` - make sure all requesters are failed because of "different barrier sync types" ### Why are the changes needed? Currently, we use `cleanupBarrierStage` to clean up a barrier stage when we detecting the case of "different barrier sync types". But this leads to a problem that we could create new a `ContextBarrierState` for the same stage again if there're on-way requests from tasks. As a result, those task will fail because of killing instead of "different barrier sync types". Besides, we don't handle the current request which is being handling properly as it will fail due to epoch mismatch instead of "different barrier sync types". ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Updated a existed test. Closes #28462 from Ngone51/impr_barrier_req. Authored-by: yi.wu Signed-off-by: Xingbo Jiang (cherry picked from commit 653ca19b1f26155c2b7656e2ebbe227b18383308) Signed-off-by: Xingbo Jiang --- .../org/apache/spark/BarrierCoordinator.scala | 30 +++--- .../spark/scheduler/BarrierTaskContextSuite.scala | 5 +--- 2 files changed, 16 insertions(+), 19 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala index 5663055..04faf7f 100644 --- a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala +++ b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala @@ -21,7 +21,7 @@ import java.util.{Timer, TimerTask} import java.util.concurrent.ConcurrentHashMap import java.util.function.Consumer -import scala.collection.mutable.ArrayBuffer +import scala.collection.mutable.{ArrayBuffer, HashSet} import org.apache.spark.internal.Logging import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint} @@ -106,9 +106,11 @@ private[spark] class BarrierCoordinator( // The messages will be replied to all tasks once sync finished. private val messages = Array.ofDim[String](numTasks) -// The request method which is called inside this barrier sync. All tasks should make sure -// that they're calling the same method within the same barrier sync phase. -private var requestMethod: RequestMethod.Value = _ +// Request methods collected from tasks inside this barrier sync. All tasks should make sure +// that they're calling the same method within the same barrier sync phase. In other words, +// the size of requestMethods should always be 1 for a legitimate barrier sync. Otherwise, +// the barrier sync would fail if the size of requestMethods becomes greater than 1. +private val requestMethods = new HashSet[RequestMethod.Value] // A timer task that ensures we may timeout for a barrier() call. private var timerTask: TimerTask = null @@ -141,17 +143,14 @@ private[spark] class BarrierCoordinator( val taskId = request.taskAttemptId val epoch = request.barrierEpoch val curReqMethod = request.requestMethod - - if (requesters.isEmpty) { -requestMethod = curReqMethod - } else if (requestMethod != curReqMethod) { -requesters.foreach( - _.sendFailure(new SparkException(s"$barrierId tried to use requestMethod " + -s"`$curReqMethod` during barrier epoch $barrierEpoch, which does not match " + -s"the current synchronized requestMethod `$requestMethod`" - )) -) -cleanupBarrierStage(barrierId) + requestMethods.add(curReqMethod) + if (requestMethods.size > 1) { +val error = new SparkException(s"Different barrier sync types found for the " + + s"sync $barrierId: ${requestMethods.mkString(", ")}. Please use the " + + s"same barrier sync type within a single sync.") +(requesters :+ requester).foreach(_.sendFailure(error)) +clear() +return } // Require the number of t
[spark] branch master updated: [SPARK-31651][CORE] Improve handling the case where different barrier sync types in a single sync
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 653ca19 [SPARK-31651][CORE] Improve handling the case where different barrier sync types in a single sync 653ca19 is described below commit 653ca19b1f26155c2b7656e2ebbe227b18383308 Author: yi.wu AuthorDate: Mon May 18 23:54:41 2020 -0700 [SPARK-31651][CORE] Improve handling the case where different barrier sync types in a single sync ### What changes were proposed in this pull request? This PR improves handling the case where different barrier sync types in a single sync: - use `clear` instead of `cleanupBarrierStage ` - make sure all requesters are failed because of "different barrier sync types" ### Why are the changes needed? Currently, we use `cleanupBarrierStage` to clean up a barrier stage when we detecting the case of "different barrier sync types". But this leads to a problem that we could create new a `ContextBarrierState` for the same stage again if there're on-way requests from tasks. As a result, those task will fail because of killing instead of "different barrier sync types". Besides, we don't handle the current request which is being handling properly as it will fail due to epoch mismatch instead of "different barrier sync types". ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Updated a existed test. Closes #28462 from Ngone51/impr_barrier_req. Authored-by: yi.wu Signed-off-by: Xingbo Jiang --- .../org/apache/spark/BarrierCoordinator.scala | 30 +++--- .../spark/scheduler/BarrierTaskContextSuite.scala | 5 +--- 2 files changed, 16 insertions(+), 19 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala index 5663055..04faf7f 100644 --- a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala +++ b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala @@ -21,7 +21,7 @@ import java.util.{Timer, TimerTask} import java.util.concurrent.ConcurrentHashMap import java.util.function.Consumer -import scala.collection.mutable.ArrayBuffer +import scala.collection.mutable.{ArrayBuffer, HashSet} import org.apache.spark.internal.Logging import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint} @@ -106,9 +106,11 @@ private[spark] class BarrierCoordinator( // The messages will be replied to all tasks once sync finished. private val messages = Array.ofDim[String](numTasks) -// The request method which is called inside this barrier sync. All tasks should make sure -// that they're calling the same method within the same barrier sync phase. -private var requestMethod: RequestMethod.Value = _ +// Request methods collected from tasks inside this barrier sync. All tasks should make sure +// that they're calling the same method within the same barrier sync phase. In other words, +// the size of requestMethods should always be 1 for a legitimate barrier sync. Otherwise, +// the barrier sync would fail if the size of requestMethods becomes greater than 1. +private val requestMethods = new HashSet[RequestMethod.Value] // A timer task that ensures we may timeout for a barrier() call. private var timerTask: TimerTask = null @@ -141,17 +143,14 @@ private[spark] class BarrierCoordinator( val taskId = request.taskAttemptId val epoch = request.barrierEpoch val curReqMethod = request.requestMethod - - if (requesters.isEmpty) { -requestMethod = curReqMethod - } else if (requestMethod != curReqMethod) { -requesters.foreach( - _.sendFailure(new SparkException(s"$barrierId tried to use requestMethod " + -s"`$curReqMethod` during barrier epoch $barrierEpoch, which does not match " + -s"the current synchronized requestMethod `$requestMethod`" - )) -) -cleanupBarrierStage(barrierId) + requestMethods.add(curReqMethod) + if (requestMethods.size > 1) { +val error = new SparkException(s"Different barrier sync types found for the " + + s"sync $barrierId: ${requestMethods.mkString(", ")}. Please use the " + + s"same barrier sync type within a single sync.") +(requesters :+ requester).foreach(_.sendFailure(error)) +clear() +return } // Require the number of tasks is correctly set from the BarrierTaskContext. @@ -184,6 +183,7 @@ private[spark] class BarrierCoordinator(
[spark] branch master updated: [SPARK-31253][SQL][FOLLOW-UP] simplify the code of calculating size metrics of AQE shuffle
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new db7b865 [SPARK-31253][SQL][FOLLOW-UP] simplify the code of calculating size metrics of AQE shuffle db7b865 is described below commit db7b8651a19d5a749a9f0b4e8fb517e6994921c2 Author: Wenchen Fan AuthorDate: Fri Apr 17 13:20:34 2020 -0700 [SPARK-31253][SQL][FOLLOW-UP] simplify the code of calculating size metrics of AQE shuffle ### What changes were proposed in this pull request? A followup of https://github.com/apache/spark/pull/28175: 1. use mutable collection to store the driver metrics 2. don't send size metrics if there is no map stats, as UI will display size as 0 if there is no data 3. calculate partition data size separately, to make the code easier to read. ### Why are the changes needed? code simplification ### Does this PR introduce any user-facing change? no ### How was this patch tested? existing tests Closes #28240 from cloud-fan/refactor. Authored-by: Wenchen Fan Signed-off-by: Xingbo Jiang --- .../adaptive/CustomShuffleReaderExec.scala | 50 ++ 1 file changed, 22 insertions(+), 28 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala index 68f20bc..6450d49 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala @@ -97,14 +97,27 @@ case class CustomShuffleReaderExec private( case _ => None } + @transient private lazy val partitionDataSizes: Option[Seq[Long]] = { +if (!isLocalReader && shuffleStage.get.mapStats.isDefined) { + val bytesByPartitionId = shuffleStage.get.mapStats.get.bytesByPartitionId + Some(partitionSpecs.map { +case CoalescedPartitionSpec(startReducerIndex, endReducerIndex) => + startReducerIndex.until(endReducerIndex).map(bytesByPartitionId).sum +case p: PartialReducerPartitionSpec => p.dataSize +case p => throw new IllegalStateException("unexpected " + p) + }) +} else { + None +} + } + private def sendDriverMetrics(): Unit = { val executionId = sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY) -var driverAccumUpdates: Seq[(Long, Long)] = Seq.empty +val driverAccumUpdates = ArrayBuffer.empty[(Long, Long)] val numPartitionsMetric = metrics("numPartitions") numPartitionsMetric.set(partitionSpecs.length) -driverAccumUpdates = driverAccumUpdates :+ - (numPartitionsMetric.id, partitionSpecs.length.toLong) +driverAccumUpdates += (numPartitionsMetric.id -> partitionSpecs.length.toLong) if (hasSkewedPartition) { val skewedMetric = metrics("numSkewedPartitions") @@ -112,33 +125,14 @@ case class CustomShuffleReaderExec private( case p: PartialReducerPartitionSpec => p.reducerIndex }.distinct.length skewedMetric.set(numSkewedPartitions) - driverAccumUpdates = driverAccumUpdates :+ (skewedMetric.id, numSkewedPartitions.toLong) + driverAccumUpdates += (skewedMetric.id -> numSkewedPartitions.toLong) } -if(!isLocalReader) { - val partitionMetrics = metrics("partitionDataSize") - val mapStats = shuffleStage.get.mapStats - - if (mapStats.isEmpty) { -partitionMetrics.set(0) -driverAccumUpdates = driverAccumUpdates :+ (partitionMetrics.id, 0L) - } else { -var sum = 0L -partitionSpecs.foreach { - case CoalescedPartitionSpec(startReducerIndex, endReducerIndex) => -val dataSize = startReducerIndex.until(endReducerIndex).map( - mapStats.get.bytesByPartitionId(_)).sum -driverAccumUpdates = driverAccumUpdates :+ (partitionMetrics.id, dataSize) -sum += dataSize - case p: PartialReducerPartitionSpec => -driverAccumUpdates = driverAccumUpdates :+ (partitionMetrics.id, p.dataSize) -sum += p.dataSize - case p => throw new IllegalStateException("unexpected " + p) -} - -// Set sum value to "partitionDataSize" metric. -partitionMetrics.set(sum) - } +partitionDataSizes.foreach { dataSizes => + val partitionDataSizeMetrics = metrics("partitionDataSize") + driverAccumUpdates ++= dataSizes.map(partitionDataSizeMetrics.id -> _) + // Set sum value to "partitionDataSize" metric. +
[spark] branch master updated (fab4ca5 -> b2e9e17)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from fab4ca5 [SPARK-31450][SQL] Make ExpressionEncoder thread-safe add b2e9e17 [SPARK-31344][CORE] Polish implementation of barrier() and allGather() No new revisions were added by this update. Summary of changes: .../org/apache/spark/BarrierCoordinator.scala | 114 + .../org/apache/spark/BarrierTaskContext.scala | 59 ++- .../org/apache/spark/api/python/PythonRunner.scala | 17 +-- .../spark/scheduler/BarrierTaskContextSuite.scala | 1 - python/pyspark/taskcontext.py | 15 ++- 5 files changed, 46 insertions(+), 160 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (fab4ca5 -> b2e9e17)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from fab4ca5 [SPARK-31450][SQL] Make ExpressionEncoder thread-safe add b2e9e17 [SPARK-31344][CORE] Polish implementation of barrier() and allGather() No new revisions were added by this update. Summary of changes: .../org/apache/spark/BarrierCoordinator.scala | 114 + .../org/apache/spark/BarrierTaskContext.scala | 59 ++- .../org/apache/spark/api/python/PythonRunner.scala | 17 +-- .../spark/scheduler/BarrierTaskContextSuite.scala | 1 - python/pyspark/taskcontext.py | 15 ++- 5 files changed, 46 insertions(+), 160 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new d286db1 [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone d286db1 is described below commit d286db145433d6d7610c69980512369f389930ca Author: yi.wu AuthorDate: Wed Apr 15 11:29:55 2020 -0700 [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone ### What changes were proposed in this pull request? Update the document and shell script to warn user about the deprecation of multiple workers on the same host support. ### Why are the changes needed? This is a sub-task of [SPARK-30978](https://issues.apache.org/jira/browse/SPARK-30978), which plans to totally remove support of multiple workers in Spark 3.1. This PR makes the first step to deprecate it firstly in Spark 3.0. ### Does this PR introduce any user-facing change? Yeah, user see warning when they run start worker script. ### How was this patch tested? Tested manually. Closes #27768 from Ngone51/deprecate_spark_worker_instances. Authored-by: yi.wu Signed-off-by: Xingbo Jiang (cherry picked from commit 0d4e4df06105cf2985dde17c1af76093b3ae8c13) Signed-off-by: Xingbo Jiang --- docs/core-migration-guide.md | 2 ++ docs/hardware-provisioning.md | 8 sbin/start-slave.sh | 2 +- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md index 66a489b..cde6e07 100644 --- a/docs/core-migration-guide.md +++ b/docs/core-migration-guide.md @@ -38,3 +38,5 @@ license: | - Event log file will be written as UTF-8 encoding, and Spark History Server will replay event log files as UTF-8 encoding. Previously Spark wrote the event log file as default charset of driver JVM process, so Spark History Server of Spark 2.x is needed to read the old event log files in case of incompatible encoding. - A new protocol for fetching shuffle blocks is used. It's recommended that external shuffle services be upgraded when running Spark 3.0 apps. You can still use old external shuffle services by setting the configuration `spark.shuffle.useOldFetchProtocol` to `true`. Otherwise, Spark may run into errors with messages like `IllegalArgumentException: Unexpected message type: `. + +- `SPARK_WORKER_INSTANCES` is deprecated in Standalone mode. It's recommended to launch multiple executors in one worker and launch one worker per node instead of launching multiple workers per node and launching one executor per worker. diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md index 4e5d681..fc87995f 100644 --- a/docs/hardware-provisioning.md +++ b/docs/hardware-provisioning.md @@ -63,10 +63,10 @@ Note that memory usage is greatly affected by storage level and serialization fo the [tuning guide](tuning.html) for tips on how to reduce it. Finally, note that the Java VM does not always behave well with more than 200 GiB of RAM. If you -purchase machines with more RAM than this, you can run _multiple worker JVMs per node_. In -Spark's [standalone mode](spark-standalone.html), you can set the number of workers per node -with the `SPARK_WORKER_INSTANCES` variable in `conf/spark-env.sh`, and the number of cores -per worker with `SPARK_WORKER_CORES`. +purchase machines with more RAM than this, you can launch multiple executors in a single node. In +Spark's [standalone mode](spark-standalone.html), a worker is responsible for launching multiple +executors according to its available memory and cores, and each executor will be launched in a +separate Java VM. # Network diff --git a/sbin/start-slave.sh b/sbin/start-slave.sh index 2cb17a0..9b3b26b 100755 --- a/sbin/start-slave.sh +++ b/sbin/start-slave.sh @@ -22,7 +22,7 @@ # Environment Variables # # SPARK_WORKER_INSTANCES The number of worker instances to run on this -# slave. Default is 1. +# slave. Default is 1. Note it has been deprecate since Spark 3.0. # SPARK_WORKER_PORT The base port number for the first worker. If set, # subsequent workers will increment this number. If # unset, Spark will find a valid port number, but - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new d286db1 [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone d286db1 is described below commit d286db145433d6d7610c69980512369f389930ca Author: yi.wu AuthorDate: Wed Apr 15 11:29:55 2020 -0700 [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone ### What changes were proposed in this pull request? Update the document and shell script to warn user about the deprecation of multiple workers on the same host support. ### Why are the changes needed? This is a sub-task of [SPARK-30978](https://issues.apache.org/jira/browse/SPARK-30978), which plans to totally remove support of multiple workers in Spark 3.1. This PR makes the first step to deprecate it firstly in Spark 3.0. ### Does this PR introduce any user-facing change? Yeah, user see warning when they run start worker script. ### How was this patch tested? Tested manually. Closes #27768 from Ngone51/deprecate_spark_worker_instances. Authored-by: yi.wu Signed-off-by: Xingbo Jiang (cherry picked from commit 0d4e4df06105cf2985dde17c1af76093b3ae8c13) Signed-off-by: Xingbo Jiang --- docs/core-migration-guide.md | 2 ++ docs/hardware-provisioning.md | 8 sbin/start-slave.sh | 2 +- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md index 66a489b..cde6e07 100644 --- a/docs/core-migration-guide.md +++ b/docs/core-migration-guide.md @@ -38,3 +38,5 @@ license: | - Event log file will be written as UTF-8 encoding, and Spark History Server will replay event log files as UTF-8 encoding. Previously Spark wrote the event log file as default charset of driver JVM process, so Spark History Server of Spark 2.x is needed to read the old event log files in case of incompatible encoding. - A new protocol for fetching shuffle blocks is used. It's recommended that external shuffle services be upgraded when running Spark 3.0 apps. You can still use old external shuffle services by setting the configuration `spark.shuffle.useOldFetchProtocol` to `true`. Otherwise, Spark may run into errors with messages like `IllegalArgumentException: Unexpected message type: `. + +- `SPARK_WORKER_INSTANCES` is deprecated in Standalone mode. It's recommended to launch multiple executors in one worker and launch one worker per node instead of launching multiple workers per node and launching one executor per worker. diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md index 4e5d681..fc87995f 100644 --- a/docs/hardware-provisioning.md +++ b/docs/hardware-provisioning.md @@ -63,10 +63,10 @@ Note that memory usage is greatly affected by storage level and serialization fo the [tuning guide](tuning.html) for tips on how to reduce it. Finally, note that the Java VM does not always behave well with more than 200 GiB of RAM. If you -purchase machines with more RAM than this, you can run _multiple worker JVMs per node_. In -Spark's [standalone mode](spark-standalone.html), you can set the number of workers per node -with the `SPARK_WORKER_INSTANCES` variable in `conf/spark-env.sh`, and the number of cores -per worker with `SPARK_WORKER_CORES`. +purchase machines with more RAM than this, you can launch multiple executors in a single node. In +Spark's [standalone mode](spark-standalone.html), a worker is responsible for launching multiple +executors according to its available memory and cores, and each executor will be launched in a +separate Java VM. # Network diff --git a/sbin/start-slave.sh b/sbin/start-slave.sh index 2cb17a0..9b3b26b 100755 --- a/sbin/start-slave.sh +++ b/sbin/start-slave.sh @@ -22,7 +22,7 @@ # Environment Variables # # SPARK_WORKER_INSTANCES The number of worker instances to run on this -# slave. Default is 1. +# slave. Default is 1. Note it has been deprecate since Spark 3.0. # SPARK_WORKER_PORT The base port number for the first worker. If set, # subsequent workers will increment this number. If # unset, Spark will find a valid port number, but - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (2b10d70 -> 0d4e4df)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 2b10d70 [SPARK-31423][SQL] Fix rebasing of not-existed dates add 0d4e4df [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone No new revisions were added by this update. Summary of changes: docs/core-migration-guide.md | 2 ++ docs/hardware-provisioning.md | 8 sbin/start-slave.sh | 2 +- 3 files changed, 7 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0d4e4df [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone 0d4e4df is described below commit 0d4e4df06105cf2985dde17c1af76093b3ae8c13 Author: yi.wu AuthorDate: Wed Apr 15 11:29:55 2020 -0700 [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone ### What changes were proposed in this pull request? Update the document and shell script to warn user about the deprecation of multiple workers on the same host support. ### Why are the changes needed? This is a sub-task of [SPARK-30978](https://issues.apache.org/jira/browse/SPARK-30978), which plans to totally remove support of multiple workers in Spark 3.1. This PR makes the first step to deprecate it firstly in Spark 3.0. ### Does this PR introduce any user-facing change? Yeah, user see warning when they run start worker script. ### How was this patch tested? Tested manually. Closes #27768 from Ngone51/deprecate_spark_worker_instances. Authored-by: yi.wu Signed-off-by: Xingbo Jiang --- docs/core-migration-guide.md | 2 ++ docs/hardware-provisioning.md | 8 sbin/start-slave.sh | 2 +- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md index 66a489b..cde6e07 100644 --- a/docs/core-migration-guide.md +++ b/docs/core-migration-guide.md @@ -38,3 +38,5 @@ license: | - Event log file will be written as UTF-8 encoding, and Spark History Server will replay event log files as UTF-8 encoding. Previously Spark wrote the event log file as default charset of driver JVM process, so Spark History Server of Spark 2.x is needed to read the old event log files in case of incompatible encoding. - A new protocol for fetching shuffle blocks is used. It's recommended that external shuffle services be upgraded when running Spark 3.0 apps. You can still use old external shuffle services by setting the configuration `spark.shuffle.useOldFetchProtocol` to `true`. Otherwise, Spark may run into errors with messages like `IllegalArgumentException: Unexpected message type: `. + +- `SPARK_WORKER_INSTANCES` is deprecated in Standalone mode. It's recommended to launch multiple executors in one worker and launch one worker per node instead of launching multiple workers per node and launching one executor per worker. diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md index 4e5d681..fc87995f 100644 --- a/docs/hardware-provisioning.md +++ b/docs/hardware-provisioning.md @@ -63,10 +63,10 @@ Note that memory usage is greatly affected by storage level and serialization fo the [tuning guide](tuning.html) for tips on how to reduce it. Finally, note that the Java VM does not always behave well with more than 200 GiB of RAM. If you -purchase machines with more RAM than this, you can run _multiple worker JVMs per node_. In -Spark's [standalone mode](spark-standalone.html), you can set the number of workers per node -with the `SPARK_WORKER_INSTANCES` variable in `conf/spark-env.sh`, and the number of cores -per worker with `SPARK_WORKER_CORES`. +purchase machines with more RAM than this, you can launch multiple executors in a single node. In +Spark's [standalone mode](spark-standalone.html), a worker is responsible for launching multiple +executors according to its available memory and cores, and each executor will be launched in a +separate Java VM. # Network diff --git a/sbin/start-slave.sh b/sbin/start-slave.sh index 2cb17a0..9b3b26b 100755 --- a/sbin/start-slave.sh +++ b/sbin/start-slave.sh @@ -22,7 +22,7 @@ # Environment Variables # # SPARK_WORKER_INSTANCES The number of worker instances to run on this -# slave. Default is 1. +# slave. Default is 1. Note it has been deprecate since Spark 3.0. # SPARK_WORKER_PORT The base port number for the first worker. If set, # subsequent workers will increment this number. If # unset, Spark will find a valid port number, but - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31052][TEST][CORE] Fix flaky test "DAGSchedulerSuite.shuffle fetch failed on speculative task, but original task succeed"
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 80a8947 [SPARK-31052][TEST][CORE] Fix flaky test "DAGSchedulerSuite.shuffle fetch failed on speculative task, but original task succeed" 80a8947 is described below commit 80a894785121777aedfe340d0acacbbd17630cb3 Author: yi.wu AuthorDate: Thu Mar 5 10:56:49 2020 -0800 [SPARK-31052][TEST][CORE] Fix flaky test "DAGSchedulerSuite.shuffle fetch failed on speculative task, but original task succeed" ### What changes were proposed in this pull request? This PR fix the flaky test in #27050. ### Why are the changes needed? `SparkListenerStageCompleted` is posted by `listenerBus` asynchronously. So, we should make sure listener has consumed the event before asserting completed stages. See [error message](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119308/testReport/org.apache.spark.scheduler/DAGSchedulerSuite/shuffle_fetch_failed_on_speculative_task__but_original_task_succeed__SPARK_30388_/): ``` sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: List(0, 1, 1) did not equal List(0, 1, 1, 0) at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) at org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$88(DAGSchedulerSuite.scala:1976) ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Update test and test locally by no failure after running hundreds of times. Note, the failure is easy to reproduce when loop running the test for hundreds of times(e.g 200) Closes #27809 from Ngone51/fix_flaky_spark_30388. Authored-by: yi.wu Signed-off-by: Xingbo Jiang (cherry picked from commit 8d5ef2f766166cce3cc7a15a98ec016050ede4d8) Signed-off-by: Xingbo Jiang --- .../test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala| 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala index 72a2e4c..02bc216 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala @@ -1931,7 +1931,7 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi assertDataStructuresEmpty() } - test("shuffle fetch failed on speculative task, but original task succeed (SPARK-30388)") { + test("SPARK-30388: shuffle fetch failed on speculative task, but original task succeed") { var completedStage: List[Int] = Nil val listener = new SparkListener() { override def onStageCompleted(event: SparkListenerStageCompleted): Unit = { @@ -1945,6 +1945,7 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi val reduceRdd = new MyRDD(sc, 2, List(shuffleDep)) submit(reduceRdd, Array(0, 1)) completeShuffleMapStageSuccessfully(0, 0, 2) +sc.listenerBus.waitUntilEmpty() assert(completedStage === List(0)) // result task 0.0 succeed @@ -1960,6 +1961,7 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi info ) ) +sc.listenerBus.waitUntilEmpty() assert(completedStage === List(0, 1)) Thread.sleep(DAGScheduler.RESUBMIT_TIMEOUT * 2) @@ -1971,6 +1973,7 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi // original result task 1.0 succeed runEvent(makeCompletionEvent(taskSets(1).tasks(1), Success, 42)) +sc.listenerBus.waitUntilEmpty() assert(completedStage === List(0, 1, 1, 0)) assert(scheduler.activeJobs.isEmpty) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-31052][TEST][CORE] Fix flaky test "DAGSchedulerSuite.shuffle fetch failed on speculative task, but original task succeed"
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8d5ef2f [SPARK-31052][TEST][CORE] Fix flaky test "DAGSchedulerSuite.shuffle fetch failed on speculative task, but original task succeed" 8d5ef2f is described below commit 8d5ef2f766166cce3cc7a15a98ec016050ede4d8 Author: yi.wu AuthorDate: Thu Mar 5 10:56:49 2020 -0800 [SPARK-31052][TEST][CORE] Fix flaky test "DAGSchedulerSuite.shuffle fetch failed on speculative task, but original task succeed" ### What changes were proposed in this pull request? This PR fix the flaky test in #27050. ### Why are the changes needed? `SparkListenerStageCompleted` is posted by `listenerBus` asynchronously. So, we should make sure listener has consumed the event before asserting completed stages. See [error message](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119308/testReport/org.apache.spark.scheduler/DAGSchedulerSuite/shuffle_fetch_failed_on_speculative_task__but_original_task_succeed__SPARK_30388_/): ``` sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: List(0, 1, 1) did not equal List(0, 1, 1, 0) at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) at org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503) at org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$88(DAGSchedulerSuite.scala:1976) ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Update test and test locally by no failure after running hundreds of times. Note, the failure is easy to reproduce when loop running the test for hundreds of times(e.g 200) Closes #27809 from Ngone51/fix_flaky_spark_30388. Authored-by: yi.wu Signed-off-by: Xingbo Jiang --- .../test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala| 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala index 2b2fd32..4486389 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala @@ -1933,7 +1933,7 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi assertDataStructuresEmpty() } - test("shuffle fetch failed on speculative task, but original task succeed (SPARK-30388)") { + test("SPARK-30388: shuffle fetch failed on speculative task, but original task succeed") { var completedStage: List[Int] = Nil val listener = new SparkListener() { override def onStageCompleted(event: SparkListenerStageCompleted): Unit = { @@ -1947,6 +1947,7 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi val reduceRdd = new MyRDD(sc, 2, List(shuffleDep)) submit(reduceRdd, Array(0, 1)) completeShuffleMapStageSuccessfully(0, 0, 2) +sc.listenerBus.waitUntilEmpty() assert(completedStage === List(0)) // result task 0.0 succeed @@ -1962,6 +1963,7 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi info ) ) +sc.listenerBus.waitUntilEmpty() assert(completedStage === List(0, 1)) Thread.sleep(DAGScheduler.RESUBMIT_TIMEOUT * 2) @@ -1973,6 +1975,7 @@ class DAGSchedulerSuite extends SparkFunSuite with LocalSparkContext with TimeLi // original result task 1.0 succeed runEvent(makeCompletionEvent(taskSets(1).tasks(1), Success, 42)) +sc.listenerBus.waitUntilEmpty() assert(completedStage === List(0, 1, 1, 0)) assert(scheduler.activeJobs.isEmpty) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30969][CORE] Remove resource coordination support from Standalone
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 148262f [SPARK-30969][CORE] Remove resource coordination support from Standalone 148262f is described below commit 148262f3fbdf7c5da7cd147cf43bf5ebab5f5244 Author: yi.wu AuthorDate: Mon Mar 2 11:23:07 2020 -0800 [SPARK-30969][CORE] Remove resource coordination support from Standalone ### What changes were proposed in this pull request? Remove automatically resource coordination support from Standalone. ### Why are the changes needed? Resource coordination is mainly designed for the scenario where multiple workers launched on the same host. However, it's, actually, a non-existed scenario for today's Spark. Because, Spark now can start multiple executors in a single Worker, while it only allow one executor per Worker at very beginning. So, now, it really help nothing for user to launch multiple workers on the same host. Thus, it's not worth for us to bring over complicated implementation and potential high maintain [...] ### Does this PR introduce any user-facing change? No, it's Spark 3.0 feature. ### How was this patch tested? Pass Jenkins. Closes #27722 from Ngone51/abandon_coordination. Authored-by: yi.wu Signed-off-by: Xingbo Jiang (cherry picked from commit b517f991fe0c95a186872d38be6a2091d9326195) Signed-off-by: Xingbo Jiang --- .gitignore | 1 - .../main/scala/org/apache/spark/SparkContext.scala | 25 +- .../spark/deploy/StandaloneResourceUtils.scala | 263 + .../org/apache/spark/deploy/worker/Worker.scala| 27 +-- .../org/apache/spark/internal/config/package.scala | 17 -- .../main/scala/org/apache/spark/util/Utils.scala | 22 -- .../scala/org/apache/spark/SparkContextSuite.scala | 6 +- .../apache/spark/deploy/worker/WorkerSuite.scala | 74 +- .../resource/ResourceDiscoveryPluginSuite.scala| 4 - docs/configuration.md | 27 +-- docs/spark-standalone.md | 10 +- 11 files changed, 17 insertions(+), 459 deletions(-) diff --git a/.gitignore b/.gitignore index 798e8ac..198fdee 100644 --- a/.gitignore +++ b/.gitignore @@ -72,7 +72,6 @@ scalastyle-on-compile.generated.xml scalastyle-output.xml scalastyle.txt spark-*-bin-*.tgz -spark-resources/ spark-tests.log src_managed/ streaming-tests.log diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index 91188d5..bcbb7e4 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -41,7 +41,6 @@ import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat => NewFileInputFor import org.apache.spark.annotation.DeveloperApi import org.apache.spark.broadcast.Broadcast import org.apache.spark.deploy.{LocalSparkCluster, SparkHadoopUtil} -import org.apache.spark.deploy.StandaloneResourceUtils._ import org.apache.spark.executor.{ExecutorMetrics, ExecutorMetricsSource} import org.apache.spark.input.{FixedLengthBinaryInputFormat, PortableDataStream, StreamInputFormat, WholeTextFileInputFormat} import org.apache.spark.internal.Logging @@ -250,15 +249,6 @@ class SparkContext(config: SparkConf) extends Logging { def isLocal: Boolean = Utils.isLocalMaster(_conf) - private def isClientStandalone: Boolean = { -val isSparkCluster = master match { - case SparkMasterRegex.SPARK_REGEX(_) => true - case SparkMasterRegex.LOCAL_CLUSTER_REGEX(_, _, _) => true - case _ => false -} -deployMode == "client" && isSparkCluster - } - /** * @return true if context is stopped or in the midst of stopping. */ @@ -396,17 +386,7 @@ class SparkContext(config: SparkConf) extends Logging { _driverLogger = DriverLogger(_conf) val resourcesFileOpt = conf.get(DRIVER_RESOURCES_FILE) -val allResources = getOrDiscoverAllResources(_conf, SPARK_DRIVER_PREFIX, resourcesFileOpt) -_resources = { - // driver submitted in client mode under Standalone may have conflicting resources with - // other drivers/workers on this host. We should sync driver's resources info into - // SPARK_RESOURCES/SPARK_RESOURCES_COORDINATE_DIR/ to avoid collision. - if (isClientStandalone) { -acquireResources(_conf, SPARK_DRIVER_PREFIX, allResources, Utils.getProcessId) - } else { -allResources - } -} +_resources = getOrDiscoverAllResources(_conf, SPARK_DRIVER_PREFIX, resourcesFileOpt) logResourceInfo(SPARK_DRIVER_PREFIX, _resources) // log out spark.app.na
[spark] branch branch-3.0 updated: [SPARK-30969][CORE] Remove resource coordination support from Standalone
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 148262f [SPARK-30969][CORE] Remove resource coordination support from Standalone 148262f is described below commit 148262f3fbdf7c5da7cd147cf43bf5ebab5f5244 Author: yi.wu AuthorDate: Mon Mar 2 11:23:07 2020 -0800 [SPARK-30969][CORE] Remove resource coordination support from Standalone ### What changes were proposed in this pull request? Remove automatically resource coordination support from Standalone. ### Why are the changes needed? Resource coordination is mainly designed for the scenario where multiple workers launched on the same host. However, it's, actually, a non-existed scenario for today's Spark. Because, Spark now can start multiple executors in a single Worker, while it only allow one executor per Worker at very beginning. So, now, it really help nothing for user to launch multiple workers on the same host. Thus, it's not worth for us to bring over complicated implementation and potential high maintain [...] ### Does this PR introduce any user-facing change? No, it's Spark 3.0 feature. ### How was this patch tested? Pass Jenkins. Closes #27722 from Ngone51/abandon_coordination. Authored-by: yi.wu Signed-off-by: Xingbo Jiang (cherry picked from commit b517f991fe0c95a186872d38be6a2091d9326195) Signed-off-by: Xingbo Jiang --- .gitignore | 1 - .../main/scala/org/apache/spark/SparkContext.scala | 25 +- .../spark/deploy/StandaloneResourceUtils.scala | 263 + .../org/apache/spark/deploy/worker/Worker.scala| 27 +-- .../org/apache/spark/internal/config/package.scala | 17 -- .../main/scala/org/apache/spark/util/Utils.scala | 22 -- .../scala/org/apache/spark/SparkContextSuite.scala | 6 +- .../apache/spark/deploy/worker/WorkerSuite.scala | 74 +- .../resource/ResourceDiscoveryPluginSuite.scala| 4 - docs/configuration.md | 27 +-- docs/spark-standalone.md | 10 +- 11 files changed, 17 insertions(+), 459 deletions(-) diff --git a/.gitignore b/.gitignore index 798e8ac..198fdee 100644 --- a/.gitignore +++ b/.gitignore @@ -72,7 +72,6 @@ scalastyle-on-compile.generated.xml scalastyle-output.xml scalastyle.txt spark-*-bin-*.tgz -spark-resources/ spark-tests.log src_managed/ streaming-tests.log diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala index 91188d5..bcbb7e4 100644 --- a/core/src/main/scala/org/apache/spark/SparkContext.scala +++ b/core/src/main/scala/org/apache/spark/SparkContext.scala @@ -41,7 +41,6 @@ import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat => NewFileInputFor import org.apache.spark.annotation.DeveloperApi import org.apache.spark.broadcast.Broadcast import org.apache.spark.deploy.{LocalSparkCluster, SparkHadoopUtil} -import org.apache.spark.deploy.StandaloneResourceUtils._ import org.apache.spark.executor.{ExecutorMetrics, ExecutorMetricsSource} import org.apache.spark.input.{FixedLengthBinaryInputFormat, PortableDataStream, StreamInputFormat, WholeTextFileInputFormat} import org.apache.spark.internal.Logging @@ -250,15 +249,6 @@ class SparkContext(config: SparkConf) extends Logging { def isLocal: Boolean = Utils.isLocalMaster(_conf) - private def isClientStandalone: Boolean = { -val isSparkCluster = master match { - case SparkMasterRegex.SPARK_REGEX(_) => true - case SparkMasterRegex.LOCAL_CLUSTER_REGEX(_, _, _) => true - case _ => false -} -deployMode == "client" && isSparkCluster - } - /** * @return true if context is stopped or in the midst of stopping. */ @@ -396,17 +386,7 @@ class SparkContext(config: SparkConf) extends Logging { _driverLogger = DriverLogger(_conf) val resourcesFileOpt = conf.get(DRIVER_RESOURCES_FILE) -val allResources = getOrDiscoverAllResources(_conf, SPARK_DRIVER_PREFIX, resourcesFileOpt) -_resources = { - // driver submitted in client mode under Standalone may have conflicting resources with - // other drivers/workers on this host. We should sync driver's resources info into - // SPARK_RESOURCES/SPARK_RESOURCES_COORDINATE_DIR/ to avoid collision. - if (isClientStandalone) { -acquireResources(_conf, SPARK_DRIVER_PREFIX, allResources, Utils.getProcessId) - } else { -allResources - } -} +_resources = getOrDiscoverAllResources(_conf, SPARK_DRIVER_PREFIX, resourcesFileOpt) logResourceInfo(SPARK_DRIVER_PREFIX, _resources) // log out spark.app.na
[spark] branch master updated (ac12276 -> b517f99)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ac12276 [SPARK-30813][ML] Fix Matrices.sprand comments add b517f99 [SPARK-30969][CORE] Remove resource coordination support from Standalone No new revisions were added by this update. Summary of changes: .gitignore | 1 - .../main/scala/org/apache/spark/SparkContext.scala | 25 +- .../spark/deploy/StandaloneResourceUtils.scala | 263 + .../org/apache/spark/deploy/worker/Worker.scala| 27 +-- .../org/apache/spark/internal/config/package.scala | 17 -- .../main/scala/org/apache/spark/util/Utils.scala | 22 -- .../scala/org/apache/spark/SparkContextSuite.scala | 6 +- .../apache/spark/deploy/worker/WorkerSuite.scala | 74 +- .../resource/ResourceDiscoveryPluginSuite.scala| 4 - docs/configuration.md | 27 +-- docs/spark-standalone.md | 10 +- python/pyspark/tests/test_context.py | 2 - python/pyspark/tests/test_taskcontext.py | 2 - 13 files changed, 17 insertions(+), 463 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (ac12276 -> b517f99)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ac12276 [SPARK-30813][ML] Fix Matrices.sprand comments add b517f99 [SPARK-30969][CORE] Remove resource coordination support from Standalone No new revisions were added by this update. Summary of changes: .gitignore | 1 - .../main/scala/org/apache/spark/SparkContext.scala | 25 +- .../spark/deploy/StandaloneResourceUtils.scala | 263 + .../org/apache/spark/deploy/worker/Worker.scala| 27 +-- .../org/apache/spark/internal/config/package.scala | 17 -- .../main/scala/org/apache/spark/util/Utils.scala | 22 -- .../scala/org/apache/spark/SparkContextSuite.scala | 6 +- .../apache/spark/deploy/worker/WorkerSuite.scala | 74 +- .../resource/ResourceDiscoveryPluginSuite.scala| 4 - docs/configuration.md | 27 +-- docs/spark-standalone.md | 10 +- python/pyspark/tests/test_context.py | 2 - python/pyspark/tests/test_taskcontext.py | 2 - 13 files changed, 17 insertions(+), 463 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30987][CORE] Increase the timeout on local-cluster waitUntilExecutorsUp calls
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new 00e2bf8 [SPARK-30987][CORE] Increase the timeout on local-cluster waitUntilExecutorsUp calls 00e2bf8 is described below commit 00e2bf8a9a96cf421fa5257f72d46334306b92fa Author: Thomas Graves AuthorDate: Fri Feb 28 11:43:05 2020 -0800 [SPARK-30987][CORE] Increase the timeout on local-cluster waitUntilExecutorsUp calls ### What changes were proposed in this pull request? The ResourceDiscoveryPlugin tests intermittently timeout. They are timing out on just bringing up the local-cluster. I am not able to reproduce locally. I suspect the jenkins boxes are overloaded and taking longer then 10 seconds. There was another jira SPARK-29139 that increased timeout for some other of these as well. So try increasing the timeout to 60 seconds. Examples of timeouts: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119030/testReport/ https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119005/testReport/ https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119029/testReport/ ### Why are the changes needed? tests should no longer intermittently fail. ### Does this PR introduce any user-facing change? no ### How was this patch tested? unit tests ran. Closes #27738 from tgravescs/SPARK-30987. Authored-by: Thomas Graves Signed-off-by: Xingbo Jiang (cherry picked from commit 6c0c41fa0d1e119b16980405b5dc69b953380d7d) Signed-off-by: Xingbo Jiang --- core/src/test/scala/org/apache/spark/DistributedSuite.scala | 2 +- .../org/apache/spark/internal/plugin/PluginContainerSuite.scala | 4 ++-- .../org/apache/spark/resource/ResourceDiscoveryPluginSuite.scala | 8 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/core/src/test/scala/org/apache/spark/DistributedSuite.scala b/core/src/test/scala/org/apache/spark/DistributedSuite.scala index 3f30981..4d157b9 100644 --- a/core/src/test/scala/org/apache/spark/DistributedSuite.scala +++ b/core/src/test/scala/org/apache/spark/DistributedSuite.scala @@ -174,7 +174,7 @@ class DistributedSuite extends SparkFunSuite with Matchers with LocalSparkContex private def testCaching(conf: SparkConf, storageLevel: StorageLevel): Unit = { sc = new SparkContext(conf.setMaster(clusterUrl).setAppName("test")) -TestUtils.waitUntilExecutorsUp(sc, 2, 3) +TestUtils.waitUntilExecutorsUp(sc, 2, 6) val data = sc.parallelize(1 to 1000, 10) val cachedData = data.persist(storageLevel) assert(cachedData.count === 1000) diff --git a/core/src/test/scala/org/apache/spark/internal/plugin/PluginContainerSuite.scala b/core/src/test/scala/org/apache/spark/internal/plugin/PluginContainerSuite.scala index cf2d929..7888796 100644 --- a/core/src/test/scala/org/apache/spark/internal/plugin/PluginContainerSuite.scala +++ b/core/src/test/scala/org/apache/spark/internal/plugin/PluginContainerSuite.scala @@ -139,7 +139,7 @@ class PluginContainerSuite extends SparkFunSuite with BeforeAndAfterEach with Lo .set(NonLocalModeSparkPlugin.TEST_PATH_CONF, path.getAbsolutePath()) sc = new SparkContext(conf) -TestUtils.waitUntilExecutorsUp(sc, 2, 1) +TestUtils.waitUntilExecutorsUp(sc, 2, 6) eventually(timeout(10.seconds), interval(100.millis)) { val children = path.listFiles() @@ -169,7 +169,7 @@ class PluginContainerSuite extends SparkFunSuite with BeforeAndAfterEach with Lo sc = new SparkContext(conf) // Ensure all executors has started - TestUtils.waitUntilExecutorsUp(sc, 1, 1) + TestUtils.waitUntilExecutorsUp(sc, 1, 6) var children = Array.empty[File] eventually(timeout(10.seconds), interval(100.millis)) { diff --git a/core/src/test/scala/org/apache/spark/resource/ResourceDiscoveryPluginSuite.scala b/core/src/test/scala/org/apache/spark/resource/ResourceDiscoveryPluginSuite.scala index 7a05daa..437c903 100644 --- a/core/src/test/scala/org/apache/spark/resource/ResourceDiscoveryPluginSuite.scala +++ b/core/src/test/scala/org/apache/spark/resource/ResourceDiscoveryPluginSuite.scala @@ -56,7 +56,7 @@ class ResourceDiscoveryPluginSuite extends SparkFunSuite with LocalSparkContext .set(EXECUTOR_FPGA_ID.amountConf, "1") sc = new SparkContext(conf) - TestUtils.waitUntilExecutorsUp(sc, 2, 1) + TestUtils.waitUntilExecutorsUp(sc, 2, 6) eventually(timeout(10.seconds), interval(100.millis)) { val children = dir.listFiles() @@ -84,7 +84,7 @@ class ResourceDiscoveryPluginSuite extends SparkFunSuite with LocalSparkContext .set(SPARK_RESOURCES_DIR, dir.ge
[spark] branch master updated (961c539 -> 6c0c41f)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 961c539 [SPARK-28998][SQL][FOLLOW-UP] Remove unnecessary MiMa excludes add 6c0c41f [SPARK-30987][CORE] Increase the timeout on local-cluster waitUntilExecutorsUp calls No new revisions were added by this update. Summary of changes: core/src/test/scala/org/apache/spark/DistributedSuite.scala | 2 +- .../org/apache/spark/internal/plugin/PluginContainerSuite.scala | 4 ++-- .../org/apache/spark/resource/ResourceDiscoveryPluginSuite.scala | 8 3 files changed, 7 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a40a2f8 -> 3b69796)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a40a2f8 [SPARK-27619][SQL][FOLLOWUP] Rename 'spark.sql.legacy.useHashOnMapType' to 'spark.sql.legacy.allowHashOnMapType' add 3b69796 [SPARK-30947][CORE] Log better message when accelerate resource is empty No new revisions were added by this update. Summary of changes: .../src/main/scala/org/apache/spark/resource/ResourceProfile.scala | 4 +++- core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala | 7 ++- 2 files changed, 9 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (a40a2f8 -> 3b69796)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a40a2f8 [SPARK-27619][SQL][FOLLOWUP] Rename 'spark.sql.legacy.useHashOnMapType' to 'spark.sql.legacy.allowHashOnMapType' add 3b69796 [SPARK-30947][CORE] Log better message when accelerate resource is empty No new revisions were added by this update. Summary of changes: .../src/main/scala/org/apache/spark/resource/ResourceProfile.scala | 4 +++- core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala | 7 ++- 2 files changed, 9 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new a5b3377 [SPARK-30667][CORE] Add all gather method to BarrierTaskContext a5b3377 is described below commit a5b3377f795a76fe218e7c9c0dafc0cbcf9a80a6 Author: sarthfrey-db AuthorDate: Fri Feb 21 11:40:28 2020 -0800 [SPARK-30667][CORE] Add all gather method to BarrierTaskContext Fix for #27395 ### What changes were proposed in this pull request? The `allGather` method is added to the `BarrierTaskContext`. This method contains the same functionality as the `BarrierTaskContext.barrier` method; it blocks the task until all tasks make the call, at which time they may continue execution. In addition, the `allGather` method takes an input message. Upon returning from the `allGather` the task receives a list of all the messages sent by all the tasks that made the `allGather` call. ### Why are the changes needed? There are many situations where having the tasks communicate in a synchronized way is useful. One simple example is if each task needs to start a server to serve requests from one another; first the tasks must find a free port (the result of which is undetermined beforehand) and then start making requests, but to do so they each must know the port chosen by the other task. An `allGather` method would allow them to inform each other of the port they will run on. ### Does this PR introduce any user-facing change? Yes, an `BarrierTaskContext.allGather` method will be available through the Scala, Java, and Python APIs. ### How was this patch tested? Most of the code path is already covered by tests to the `barrier` method, since this PR includes a refactor so that much code is shared by the `barrier` and `allGather` methods. However, a test is added to assert that an all gather on each tasks partition ID will return a list of every partition ID. An example through the Python API: ```python >>> from pyspark import BarrierTaskContext >>> >>> def f(iterator): ... context = BarrierTaskContext.get() ... return [context.allGather('{}'.format(context.partitionId()))] ... >>> sc.parallelize(range(4), 4).barrier().mapPartitions(f).collect()[0] [u'3', u'1', u'0', u'2'] ``` Closes #27640 from sarthfrey/master. Lead-authored-by: sarthfrey-db Co-authored-by: sarthfrey Signed-off-by: Xingbo Jiang (cherry picked from commit 274b328f57a626bb770c9831ae81034d060a7e1e) Signed-off-by: Xingbo Jiang --- .../org/apache/spark/BarrierCoordinator.scala | 111 +-- .../org/apache/spark/BarrierTaskContext.scala | 153 ++--- .../org/apache/spark/api/python/PythonRunner.scala | 51 +-- .../spark/scheduler/BarrierTaskContextSuite.scala | 77 +++ project/MimaExcludes.scala | 5 +- python/pyspark/taskcontext.py | 51 ++- python/pyspark/tests/test_taskcontext.py | 20 +++ 7 files changed, 388 insertions(+), 80 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala index 4e41767..be5036e 100644 --- a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala +++ b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala @@ -17,12 +17,17 @@ package org.apache.spark +import java.nio.charset.StandardCharsets.UTF_8 import java.util.{Timer, TimerTask} import java.util.concurrent.ConcurrentHashMap import java.util.function.Consumer import scala.collection.mutable.ArrayBuffer +import org.json4s.JsonAST._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods.{compact, render} + import org.apache.spark.internal.Logging import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint} import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, SparkListenerStageCompleted} @@ -99,10 +104,15 @@ private[spark] class BarrierCoordinator( // reset when a barrier() call fails due to timeout. private var barrierEpoch: Int = 0 -// An array of RPCCallContexts for barrier tasks that are waiting for reply of a barrier() -// call. +// An Array of RPCCallContexts for barrier tasks that have made a blocking runBarrier() call private val requesters: ArrayBuffer[RpcCallContext] = new ArrayBuffer[RpcCallContext](numTasks) +// An Array of allGather messages for barrier tasks that have made a blocking runBarrier() call +private val allGatherMessages: ArrayBuffer[String] = new Array[String](numTasks).to[ArrayBuffer] + +// The
[spark] branch branch-3.0 updated: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new a5b3377 [SPARK-30667][CORE] Add all gather method to BarrierTaskContext a5b3377 is described below commit a5b3377f795a76fe218e7c9c0dafc0cbcf9a80a6 Author: sarthfrey-db AuthorDate: Fri Feb 21 11:40:28 2020 -0800 [SPARK-30667][CORE] Add all gather method to BarrierTaskContext Fix for #27395 ### What changes were proposed in this pull request? The `allGather` method is added to the `BarrierTaskContext`. This method contains the same functionality as the `BarrierTaskContext.barrier` method; it blocks the task until all tasks make the call, at which time they may continue execution. In addition, the `allGather` method takes an input message. Upon returning from the `allGather` the task receives a list of all the messages sent by all the tasks that made the `allGather` call. ### Why are the changes needed? There are many situations where having the tasks communicate in a synchronized way is useful. One simple example is if each task needs to start a server to serve requests from one another; first the tasks must find a free port (the result of which is undetermined beforehand) and then start making requests, but to do so they each must know the port chosen by the other task. An `allGather` method would allow them to inform each other of the port they will run on. ### Does this PR introduce any user-facing change? Yes, an `BarrierTaskContext.allGather` method will be available through the Scala, Java, and Python APIs. ### How was this patch tested? Most of the code path is already covered by tests to the `barrier` method, since this PR includes a refactor so that much code is shared by the `barrier` and `allGather` methods. However, a test is added to assert that an all gather on each tasks partition ID will return a list of every partition ID. An example through the Python API: ```python >>> from pyspark import BarrierTaskContext >>> >>> def f(iterator): ... context = BarrierTaskContext.get() ... return [context.allGather('{}'.format(context.partitionId()))] ... >>> sc.parallelize(range(4), 4).barrier().mapPartitions(f).collect()[0] [u'3', u'1', u'0', u'2'] ``` Closes #27640 from sarthfrey/master. Lead-authored-by: sarthfrey-db Co-authored-by: sarthfrey Signed-off-by: Xingbo Jiang (cherry picked from commit 274b328f57a626bb770c9831ae81034d060a7e1e) Signed-off-by: Xingbo Jiang --- .../org/apache/spark/BarrierCoordinator.scala | 111 +-- .../org/apache/spark/BarrierTaskContext.scala | 153 ++--- .../org/apache/spark/api/python/PythonRunner.scala | 51 +-- .../spark/scheduler/BarrierTaskContextSuite.scala | 77 +++ project/MimaExcludes.scala | 5 +- python/pyspark/taskcontext.py | 51 ++- python/pyspark/tests/test_taskcontext.py | 20 +++ 7 files changed, 388 insertions(+), 80 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala index 4e41767..be5036e 100644 --- a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala +++ b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala @@ -17,12 +17,17 @@ package org.apache.spark +import java.nio.charset.StandardCharsets.UTF_8 import java.util.{Timer, TimerTask} import java.util.concurrent.ConcurrentHashMap import java.util.function.Consumer import scala.collection.mutable.ArrayBuffer +import org.json4s.JsonAST._ +import org.json4s.JsonDSL._ +import org.json4s.jackson.JsonMethods.{compact, render} + import org.apache.spark.internal.Logging import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint} import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, SparkListenerStageCompleted} @@ -99,10 +104,15 @@ private[spark] class BarrierCoordinator( // reset when a barrier() call fails due to timeout. private var barrierEpoch: Int = 0 -// An array of RPCCallContexts for barrier tasks that are waiting for reply of a barrier() -// call. +// An Array of RPCCallContexts for barrier tasks that have made a blocking runBarrier() call private val requesters: ArrayBuffer[RpcCallContext] = new ArrayBuffer[RpcCallContext](numTasks) +// An Array of allGather messages for barrier tasks that have made a blocking runBarrier() call +private val allGatherMessages: ArrayBuffer[String] = new Array[String](numTasks).to[ArrayBuffer] + +// The
[spark] branch master updated (1f0300f -> 274b328)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 1f0300f [SPARK-30764][SQL] Improve the readability of EXPLAIN FORMATTED style add 274b328 [SPARK-30667][CORE] Add all gather method to BarrierTaskContext No new revisions were added by this update. Summary of changes: .../org/apache/spark/BarrierCoordinator.scala | 111 +-- .../org/apache/spark/BarrierTaskContext.scala | 153 ++--- .../org/apache/spark/api/python/PythonRunner.scala | 51 +-- .../spark/scheduler/BarrierTaskContextSuite.scala | 77 +++ project/MimaExcludes.scala | 5 +- python/pyspark/taskcontext.py | 51 ++- python/pyspark/tests/test_taskcontext.py | 20 +++ 7 files changed, 388 insertions(+), 80 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (1f0300f -> 274b328)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 1f0300f [SPARK-30764][SQL] Improve the readability of EXPLAIN FORMATTED style add 274b328 [SPARK-30667][CORE] Add all gather method to BarrierTaskContext No new revisions were added by this update. Summary of changes: .../org/apache/spark/BarrierCoordinator.scala | 111 +-- .../org/apache/spark/BarrierTaskContext.scala | 153 ++--- .../org/apache/spark/api/python/PythonRunner.scala | 51 +-- .../spark/scheduler/BarrierTaskContextSuite.scala | 77 +++ project/MimaExcludes.scala | 5 +- python/pyspark/taskcontext.py | 51 ++- python/pyspark/tests/test_taskcontext.py | 20 +++ 7 files changed, 388 insertions(+), 80 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: Revert "[SPARK-30667][CORE] Add allGather method to BarrierTaskContext"
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new cadec3d Revert "[SPARK-30667][CORE] Add allGather method to BarrierTaskContext" cadec3d is described below commit cadec3d239186e3abc104b67f7d1aa09d4d7516c Author: Xingbo Jiang AuthorDate: Wed Feb 19 17:06:20 2020 -0800 Revert "[SPARK-30667][CORE] Add allGather method to BarrierTaskContext" This reverts commit f482187c127418d2ea538ac2551ae0fce1ddbc31. --- .../org/apache/spark/BarrierCoordinator.scala | 113 ++- .../org/apache/spark/BarrierTaskContext.scala | 153 +++-- .../org/apache/spark/api/python/PythonRunner.scala | 51 ++- .../spark/scheduler/BarrierTaskContextSuite.scala | 74 -- python/pyspark/taskcontext.py | 49 +-- python/pyspark/tests/test_taskcontext.py | 20 --- 6 files changed, 79 insertions(+), 381 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala index 042a266..4e41767 100644 --- a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala +++ b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala @@ -17,17 +17,12 @@ package org.apache.spark -import java.nio.charset.StandardCharsets.UTF_8 import java.util.{Timer, TimerTask} import java.util.concurrent.ConcurrentHashMap import java.util.function.Consumer import scala.collection.mutable.ArrayBuffer -import org.json4s.JsonAST._ -import org.json4s.JsonDSL._ -import org.json4s.jackson.JsonMethods.{compact, render} - import org.apache.spark.internal.Logging import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint} import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, SparkListenerStageCompleted} @@ -104,15 +99,10 @@ private[spark] class BarrierCoordinator( // reset when a barrier() call fails due to timeout. private var barrierEpoch: Int = 0 -// An Array of RPCCallContexts for barrier tasks that have made a blocking runBarrier() call +// An array of RPCCallContexts for barrier tasks that are waiting for reply of a barrier() +// call. private val requesters: ArrayBuffer[RpcCallContext] = new ArrayBuffer[RpcCallContext](numTasks) -// An Array of allGather messages for barrier tasks that have made a blocking runBarrier() call -private val allGatherMessages: ArrayBuffer[String] = new Array[String](numTasks).to[ArrayBuffer] - -// The blocking requestMethod called by tasks to sync up for this stage attempt -private var requestMethodToSync: RequestMethod.Value = RequestMethod.BARRIER - // A timer task that ensures we may timeout for a barrier() call. private var timerTask: TimerTask = null @@ -140,32 +130,9 @@ private[spark] class BarrierCoordinator( // Process the global sync request. The barrier() call succeed if collected enough requests // within a configured time, otherwise fail all the pending requests. -def handleRequest( - requester: RpcCallContext, - request: RequestToSync -): Unit = synchronized { +def handleRequest(requester: RpcCallContext, request: RequestToSync): Unit = synchronized { val taskId = request.taskAttemptId val epoch = request.barrierEpoch - val requestMethod = request.requestMethod - val partitionId = request.partitionId - val allGatherMessage = request match { -case ag: AllGatherRequestToSync => ag.allGatherMessage -case _ => "" - } - - if (requesters.size == 0) { -requestMethodToSync = requestMethod - } - - if (requestMethodToSync != requestMethod) { -requesters.foreach( - _.sendFailure(new SparkException(s"$barrierId tried to use requestMethod " + -s"`$requestMethod` during barrier epoch $barrierEpoch, which does not match " + -s"the current synchronized requestMethod `$requestMethodToSync`" - )) -) -cleanupBarrierStage(barrierId) - } // Require the number of tasks is correctly set from the BarrierTaskContext. require(request.numTasks == numTasks, s"Number of tasks of $barrierId is " + @@ -186,7 +153,6 @@ private[spark] class BarrierCoordinator( } // Add the requester to array of RPCCallContexts pending for reply. requesters += requester -allGatherMessages(partitionId) = allGatherMessage logInfo(s"Barrier sync epoch $barrierEpoch from $barrierId received update from Task " + s"$taskId, current progress: ${requesters.size}/$numTasks.") if (maybeFinishAllRequesters(requesters
[spark] branch master updated (6936799 -> e32411e)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 6936799 [SPARK-30802][ML] Use Summarizer instead of MultivariateOnlineSummarizer in Aggregator test suite add e32411e Revert "[SPARK-30667][CORE] Add allGather method to BarrierTaskContext" No new revisions were added by this update. Summary of changes: .../org/apache/spark/BarrierCoordinator.scala | 113 ++- .../org/apache/spark/BarrierTaskContext.scala | 153 +++-- .../org/apache/spark/api/python/PythonRunner.scala | 51 ++- .../spark/scheduler/BarrierTaskContextSuite.scala | 74 -- python/pyspark/taskcontext.py | 49 +-- python/pyspark/tests/test_taskcontext.py | 20 --- 6 files changed, 79 insertions(+), 381 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: Revert "[SPARK-30667][CORE] Add allGather method to BarrierTaskContext"
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new eb37aa5 Revert "[SPARK-30667][CORE] Add allGather method to BarrierTaskContext" eb37aa5 is described below commit eb37aa5595badd79becf4d3d332404cbcdb1b12d Author: Xingbo Jiang AuthorDate: Thu Feb 13 17:48:19 2020 -0800 Revert "[SPARK-30667][CORE] Add allGather method to BarrierTaskContext" This reverts commit 6001866cea1216da421c5acd71d6fc74228222ac. --- .../org/apache/spark/BarrierCoordinator.scala | 113 ++- .../org/apache/spark/BarrierTaskContext.scala | 153 +++-- .../org/apache/spark/api/python/PythonRunner.scala | 51 ++- .../spark/scheduler/BarrierTaskContextSuite.scala | 74 -- python/pyspark/taskcontext.py | 49 +-- python/pyspark/tests/test_taskcontext.py | 20 --- 6 files changed, 79 insertions(+), 381 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala index 042a266..4e41767 100644 --- a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala +++ b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala @@ -17,17 +17,12 @@ package org.apache.spark -import java.nio.charset.StandardCharsets.UTF_8 import java.util.{Timer, TimerTask} import java.util.concurrent.ConcurrentHashMap import java.util.function.Consumer import scala.collection.mutable.ArrayBuffer -import org.json4s.JsonAST._ -import org.json4s.JsonDSL._ -import org.json4s.jackson.JsonMethods.{compact, render} - import org.apache.spark.internal.Logging import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint} import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, SparkListenerStageCompleted} @@ -104,15 +99,10 @@ private[spark] class BarrierCoordinator( // reset when a barrier() call fails due to timeout. private var barrierEpoch: Int = 0 -// An Array of RPCCallContexts for barrier tasks that have made a blocking runBarrier() call +// An array of RPCCallContexts for barrier tasks that are waiting for reply of a barrier() +// call. private val requesters: ArrayBuffer[RpcCallContext] = new ArrayBuffer[RpcCallContext](numTasks) -// An Array of allGather messages for barrier tasks that have made a blocking runBarrier() call -private val allGatherMessages: ArrayBuffer[String] = new Array[String](numTasks).to[ArrayBuffer] - -// The blocking requestMethod called by tasks to sync up for this stage attempt -private var requestMethodToSync: RequestMethod.Value = RequestMethod.BARRIER - // A timer task that ensures we may timeout for a barrier() call. private var timerTask: TimerTask = null @@ -140,32 +130,9 @@ private[spark] class BarrierCoordinator( // Process the global sync request. The barrier() call succeed if collected enough requests // within a configured time, otherwise fail all the pending requests. -def handleRequest( - requester: RpcCallContext, - request: RequestToSync -): Unit = synchronized { +def handleRequest(requester: RpcCallContext, request: RequestToSync): Unit = synchronized { val taskId = request.taskAttemptId val epoch = request.barrierEpoch - val requestMethod = request.requestMethod - val partitionId = request.partitionId - val allGatherMessage = request match { -case ag: AllGatherRequestToSync => ag.allGatherMessage -case _ => "" - } - - if (requesters.size == 0) { -requestMethodToSync = requestMethod - } - - if (requestMethodToSync != requestMethod) { -requesters.foreach( - _.sendFailure(new SparkException(s"$barrierId tried to use requestMethod " + -s"`$requestMethod` during barrier epoch $barrierEpoch, which does not match " + -s"the current synchronized requestMethod `$requestMethodToSync`" - )) -) -cleanupBarrierStage(barrierId) - } // Require the number of tasks is correctly set from the BarrierTaskContext. require(request.numTasks == numTasks, s"Number of tasks of $barrierId is " + @@ -186,7 +153,6 @@ private[spark] class BarrierCoordinator( } // Add the requester to array of RPCCallContexts pending for reply. requesters += requester -allGatherMessages(partitionId) = allGatherMessage logInfo(s"Barrier sync epoch $barrierEpoch from $barrierId received update from Task " + s"$taskId, current progress: ${requesters.size}/$numTasks.") if (maybeFinishAllRequesters(requesters
[spark] branch master updated (57254c9 -> fa3517c)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 57254c9 [SPARK-30667][CORE] Add allGather method to BarrierTaskContext add fa3517c Revert "[SPARK-30667][CORE] Add allGather method to BarrierTaskContext" No new revisions were added by this update. Summary of changes: .../org/apache/spark/BarrierCoordinator.scala | 113 ++- .../org/apache/spark/BarrierTaskContext.scala | 153 +++-- .../org/apache/spark/api/python/PythonRunner.scala | 51 ++- .../spark/scheduler/BarrierTaskContextSuite.scala | 74 -- python/pyspark/taskcontext.py | 49 +-- python/pyspark/tests/test_taskcontext.py | 20 --- 6 files changed, 79 insertions(+), 381 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (57254c9 -> fa3517c)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 57254c9 [SPARK-30667][CORE] Add allGather method to BarrierTaskContext add fa3517c Revert "[SPARK-30667][CORE] Add allGather method to BarrierTaskContext" No new revisions were added by this update. Summary of changes: .../org/apache/spark/BarrierCoordinator.scala | 113 ++- .../org/apache/spark/BarrierTaskContext.scala | 153 +++-- .../org/apache/spark/api/python/PythonRunner.scala | 51 ++- .../spark/scheduler/BarrierTaskContextSuite.scala | 74 -- python/pyspark/taskcontext.py | 49 +-- python/pyspark/tests/test_taskcontext.py | 20 --- 6 files changed, 79 insertions(+), 381 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (bd7510b -> c49abf8)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from bd7510b [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource add c49abf8 [SPARK-30417][CORE] Task speculation numTaskThreshold should be greater than 0 even EXECUTOR_CORES is not set under Standalone mode No new revisions were added by this update. Summary of changes: .../org/apache/spark/internal/config/package.scala | 3 +- .../apache/spark/scheduler/TaskSetManager.scala| 8 - .../spark/scheduler/TaskSetManagerSuite.scala | 37 ++ docs/configuration.md | 3 +- 4 files changed, 41 insertions(+), 10 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (3d45779 -> 2e71a6e)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 3d45779 [SPARK-29728][SQL] Datasource V2: Support ALTER TABLE RENAME TO add 2e71a6e [SPARK-27558][CORE] Gracefully cleanup task when it fails with OOM exception No new revisions were added by this update. Summary of changes: .../spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java | 4 1 file changed, 4 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (3d45779 -> 2e71a6e)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 3d45779 [SPARK-29728][SQL] Datasource V2: Support ALTER TABLE RENAME TO add 2e71a6e [SPARK-27558][CORE] Gracefully cleanup task when it fails with OOM exception No new revisions were added by this update. Summary of changes: .../spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java | 4 1 file changed, 4 insertions(+) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (39b502a -> 15a72f3)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 39b502a [SPARK-29778][SQL] pass writer options to saveAsTable in append mode add 15a72f3 [SPARK-29287][CORE] Add LaunchedExecutor message to tell driver which executor is ready for making offers No new revisions were added by this update. Summary of changes: .../org/apache/spark/executor/CoarseGrainedExecutorBackend.scala | 1 + .../spark/scheduler/cluster/CoarseGrainedClusterMessage.scala| 2 ++ .../spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala | 9 +++-- .../apache/spark/deploy/StandaloneDynamicAllocationSuite.scala | 3 ++- 4 files changed, 12 insertions(+), 3 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-site updated: fix download page (#229)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/spark-website.git The following commit(s) were added to refs/heads/asf-site by this push: new ce0a7e9 fix download page (#229) ce0a7e9 is described below commit ce0a7e93441787d27fdfdee9eccc27503ed7fdc0 Author: Jiang Xingbo AuthorDate: Thu Nov 7 12:33:14 2019 -0800 fix download page (#229) The download page is not able to show `3.0.0-preview` release, because `hadoop3p2` was not recognized. This PR fixes the issue. --- js/downloads.js | 2 +- site/js/downloads.js | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/js/downloads.js b/js/downloads.js index 47103f0..f7429e1 100644 --- a/js/downloads.js +++ b/js/downloads.js @@ -15,7 +15,7 @@ var sources = {pretty: "Source Code", tag: "sources"}; var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"}; var hadoop2p6 = {pretty: "Pre-built for Apache Hadoop 2.6", tag: "hadoop2.6"}; var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"}; -var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"} +var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"}; var scala2p12_hadoopFree = {pretty: "Pre-built with Scala 2.12 and user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"}; // 2.2.0+ diff --git a/site/js/downloads.js b/site/js/downloads.js index 47103f0..f7429e1 100644 --- a/site/js/downloads.js +++ b/site/js/downloads.js @@ -15,7 +15,7 @@ var sources = {pretty: "Source Code", tag: "sources"}; var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: "without-hadoop"}; var hadoop2p6 = {pretty: "Pre-built for Apache Hadoop 2.6", tag: "hadoop2.6"}; var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"}; -var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"} +var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: "hadoop3.2"}; var scala2p12_hadoopFree = {pretty: "Pre-built with Scala 2.12 and user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"}; // 2.2.0+ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] annotated tag v3.0.0-preview updated (007c873 -> b23e21e)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to annotated tag v3.0.0-preview in repository https://gitbox.apache.org/repos/asf/spark.git. *** WARNING: tag v3.0.0-preview was modified! *** from 007c873 (commit) to b23e21e (tag) tagging 007c873ae34f58651481ccba30e8e2ba38a692c4 (commit) replaces v3.0.0-preview-rc1 by Xingbo Jiang on Wed Oct 30 18:03:22 2019 -0700 - Log - v3.0.0-preview-rc2 --- No new revisions were added by this update. Summary of changes: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r36658 - /dev/spark/v3.0.0-preview-rc2-docs/
Author: jiangxb1987 Date: Wed Nov 6 23:28:58 2019 New Revision: 36658 Log: Removing RC artifacts. Removed: dev/spark/v3.0.0-preview-rc2-docs/ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r36657 - in /dev/spark: v3.0.0-preview-rc1-bin/ v3.0.0-preview-rc1-docs/
Author: jiangxb1987 Date: Wed Nov 6 23:27:47 2019 New Revision: 36657 Log: Removing RC artifacts. Removed: dev/spark/v3.0.0-preview-rc1-bin/ dev/spark/v3.0.0-preview-rc1-docs/ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r36563 - in /dev/spark/v3.0.0-preview-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apac
Author: jiangxb1987 Date: Thu Oct 31 02:38:18 2019 New Revision: 36563 Log: Apache Spark v3.0.0-preview-rc2 docs [This commit notification would consist of 1890 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r36562 - /dev/spark/v3.0.0-preview-rc2-bin/
Author: jiangxb1987 Date: Thu Oct 31 02:22:19 2019 New Revision: 36562 Log: Apache Spark v3.0.0-preview-rc2 Added: dev/spark/v3.0.0-preview-rc2-bin/ dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz (with props) dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512 dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz (with props) dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.sha512 dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz (with props) dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.asc dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.sha512 dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz (with props) dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz.asc dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz.sha512 dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-without-hadoop.tgz (with props) dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-without-hadoop.tgz.asc dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-without-hadoop.tgz.sha512 dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview.tgz (with props) dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview.tgz.asc dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview.tgz.sha512 Added: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc == --- dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc (added) +++ dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc Thu Oct 31 02:22:19 2019 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl26PZIACgkQ4bfg8l5L +9Wv1YRAAjyL1er+AJEdWJg8EkTq0Yq1anqthY+6ARmc3gKUm5MiyxB5SXh/5xNrG +k4Pob3aQ4cQdsMeXv1SX1+zHqVDa2VEbk5GEf4YDeS5OYB0f3SeG/C4rrGVktoq6 +r50VEg0b2ur+2iYup5VG6hvGQmiF4UlICVGvPA8HvyIbvMt6pxxWSW2musIz32Tq +pnR7fqa4d5/WtLKk+m3W/n+LIk6P04cG++thsr2Xl8UcPB17g0utuEH6l2qFDIT1 +GJqCFvptQRHxJIhBG1fC0wIuXDm4uBSv/yMlS7MlLn+PQ5PB2NLcc30VkRzepWKp +efuhy9rPGEVMNxcpPQf64OkibnQXNpOkvD7N7RNaDh+sVWhiPtUbmAGPKtNjgpvw +yWUJO40SbFAyA7UY7gFBI9LmYqE1iQCCpLhsIzT/Q8pQrQ9sssgbO3Oih2Dhj+QA +K8xVBDmlP0M/9K+Xk8BITAg2zvsOmtEd18dL/te+vaxTHlMsWSGDq0ZcDpbU8S0p +FXSifMUGb1/wdthg2V+bqi+/VPIRvdOqUj9gfrooXEcUx9XShI39CQxVhJbb1k2+ +SrFNqC3zY8DrXaf/Imx1C5sQPAMY4T2dE22NkrFLhwsiNRVMk3IyfRVtI+8iVxZm +KuZNI1t5gMvaVfBIRqSZ4teT1YnktapbuDadeiC3zOk2pgB7mQk= +=ViZ1 +-END PGP SIGNATURE- Added: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512 == --- dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512 (added) +++ dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512 Thu Oct 31 02:22:19 2019 @@ -0,0 +1,4 @@ +SparkR_3.0.0-preview.tar.gz: 2817FF08 5338F9BA B889CA98 8F8F1961 03747AC9 + 8ED4F57D AAA87F0A DDA2A145 089F9C85 C5B6EFC9 + 4CDC0CCF 1BA48017 F712F4DC 50B534E7 6134F43E + BA87D79A Added: dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc == --- dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc (added) +++ dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc Thu Oct 31 02:22:19 2019 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl26PZMACgkQ4bfg8l5L +9Wttuw/7Bpkeag7Haalshy0Atn0SBfGHhyn/KD7ri4GvV5DmyOwEwk7fV51uHrOR +X2SmgmZ068G1/rXYmhiZGyZExg7CWLeqiG2VxOnfnvkgldg2mAs0vbx6pNEZ4b7k +DdahU3jMBIDtNSc8ivN8jJSswUJVoHb8bJN+GSMZqoOsy5DEIFRVh1yd/eqHABk0 +zNduitnQW8Vdg6Mav0fjetbzP6F/YXEg9mt4B4vV2Mwt2ChTUAGL+SaVrU1oTsUQ +NY8HVMOf8V8uCshEcsSP+Ga46N9a0StociUkiUHkzHC7PEM0PVrq+qqcmZC1/2HA +ST55bbZIaGon2D1Zy9mPvgQpY9XXrvcl9itDw/IxwIk4/lddrfoFXo4QERrZoP1m +DZpE
[spark] annotated tag v3.0.0-preview-rc2 updated (007c873 -> b23e21e)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to annotated tag v3.0.0-preview-rc2 in repository https://gitbox.apache.org/repos/asf/spark.git. *** WARNING: tag v3.0.0-preview-rc2 was modified! *** from 007c873 (commit) to b23e21e (tag) tagging 007c873ae34f58651481ccba30e8e2ba38a692c4 (commit) replaces v3.0.0-preview-rc1 by Xingbo Jiang on Wed Oct 30 18:03:22 2019 -0700 - Log - v3.0.0-preview-rc2 --- No new revisions were added by this update. Summary of changes: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (155a67d -> 8207c83)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 155a67d [SPARK-29666][BUILD] Fix the publish release failure under dry-run mode add 007c873 Prepare Spark release v3.0.0-preview-rc2 add 8207c83 Revert "Prepare Spark release v3.0.0-preview-rc2" No new revisions were added by this update. Summary of changes: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r36558 - in /dev/spark/v3.0.0-preview-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apac
Author: jiangxb1987 Date: Thu Oct 31 00:39:27 2019 New Revision: 36558 Log: Apache Spark v3.0.0-preview-rc2 docs [This commit notification would consist of 1890 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r36557 - /dev/spark/v3.0.0-preview-rc2-bin/
Author: jiangxb1987 Date: Thu Oct 31 00:23:13 2019 New Revision: 36557 Log: Apache Spark v3.0.0-preview-rc2 Added: dev/spark/v3.0.0-preview-rc2-bin/ dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz (with props) dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512 dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz (with props) dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.sha512 dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz (with props) dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.asc dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.sha512 dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz (with props) dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz.asc dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz.sha512 dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-without-hadoop.tgz (with props) dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-without-hadoop.tgz.asc dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-without-hadoop.tgz.sha512 dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview.tgz (with props) dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview.tgz.asc dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview.tgz.sha512 Added: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc == --- dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc (added) +++ dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc Thu Oct 31 00:23:13 2019 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl26IasACgkQ4bfg8l5L +9WvrQhAAi115Cm+BH93sVUM0jH9iWq0JN4gwb6ueNdpYArtQLZfMFA7y+d3ssnfM +44qv9gvb2KSGkSBf9JJRSk9j1pSMLnjYFRfm89p0w2DD0I7CoNqIIMgoup9elMjy +0SKUQ1CGSGWgZRlnAW5g6ZV0D4mnHzhZiakUipQHglSMLzsTQakTbKycXQPcRfTe +KhyO/B4UPxSEihFemR89DcqvKN2zuNiPdXF6QVk2/XyqrKndObSllOzgYzBixDuD +rFtmBguTdLIXv+EVE0U6b3SlW/LUxDDI/uDnwZGoycUizwWevGUcjAVc5hD1DWQv +2CysbOJAYoOMH0UR1OKIxeUoE2Akpp52byD+pfzfVkC+krqA4G8SIhSrLX1lCuvV +bcK9WFhlSw7CzMLAbNhdR0UqsO/W/vQXvh7FhEw5DpLSagQkSecqGFHifE+RCw8G +vRuqLRmNVfGO/FBjuyhpiAaLnHEJnSOcZLP2m73fVKYghhOMrsmK4p6Grl3pFvnA +pcb+S74AtQoVXmBYSvpx4q853d9RvuZr9g2p8bhywHnERQeJygC00yJbzAtaPe5D +KJRdN6wzRQw7/I659UvWRHRbCO4eM9O6ecuSTZJH9IjQmYa/7O07QVtTsDkla29d +LZJ9z17yi/+oAQtVlh4vpw41eLb1HRra4Hy9NRAafVUzKSUu7ss= +=+nkw +-END PGP SIGNATURE- Added: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512 == --- dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512 (added) +++ dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512 Thu Oct 31 00:23:13 2019 @@ -0,0 +1,4 @@ +SparkR_3.0.0-preview.tar.gz: EFCE3700 A013B5B6 383EE803 3DECB569 48E28BD6 + 0694077A 22D813C3 A535CA04 9303381F 5C44D018 + 214042A4 457734DA 87FBC417 3CDB774D 2A1CF133 + AF9EBB45 Added: dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc == --- dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc (added) +++ dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc Thu Oct 31 00:23:13 2019 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl26Ia0ACgkQ4bfg8l5L +9Wsq9BAAj5Dn6Ofgq0CwxNw6YnQUn8TCBf521ky+zVGS0oVwxKcUvvZ8ixLk6GkK +wIslxkkLFqa5WMLP/VQfnQnv38UpzYYQjKyATqLvh7YM65/mwRfsmExWbd7GTEGS +ReGdVwl2Vj3bHLiZMPW4up3eNhkulQP86K3yfuNhwFHXJedLPH/da3MqBI+mizNx +FVHDdvoPkmDF2OcVsZR+db/mfNnUt9Zd61nYb+aq4MWOiXduNwV77F8jDr2AZyJ9 +9Pq7RS+KAlX+3Ce+SzoVdFOMulilE6wB9WBRXBfyK1WX1+/QqVhnI8PofEmMyezl +vRlN1BZioki1zwB13J1BCmLAhRXXsLj3yQi3KAEz9BhBfqxJcae1b0UoA2MRLYRN
[spark] annotated tag v3.0.0-preview-rc2 updated (5eddbb5 -> 9e3baa3)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to annotated tag v3.0.0-preview-rc2 in repository https://gitbox.apache.org/repos/asf/spark.git. *** WARNING: tag v3.0.0-preview-rc2 was modified! *** from 5eddbb5 (commit) to 9e3baa3 (tag) tagging 5eddbb5f1d9789696927f435c55df887e50a1389 (commit) replaces 3.0.0-preview by Xingbo Jiang on Wed Oct 30 16:23:08 2019 -0700 - Log - v3.0.0-preview-rc2 --- No new revisions were added by this update. Summary of changes: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] annotated tag v3.0.0-preview-rc2 updated (155a67d -> f86eed8)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to annotated tag v3.0.0-preview-rc2 in repository https://gitbox.apache.org/repos/asf/spark.git. *** WARNING: tag v3.0.0-preview-rc2 was modified! *** from 155a67d (commit) to f86eed8 (tag) tagging 155a67d00cb2f12aad179f6df2d992feca8e003e (commit) replaces v3.0.0-preview-rc1 by Xingbo Jiang on Wed Oct 30 16:16:56 2019 -0700 - Log - v3.0.0-preview-rc2 --- No new revisions were added by this update. Summary of changes: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r36548 - in /dev/spark/v3.0.0-preview-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apac
Author: jiangxb1987 Date: Tue Oct 29 21:25:58 2019 New Revision: 36548 Log: Apache Spark v3.0.0-preview-rc1 docs [This commit notification would consist of 1890 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r36547 - /dev/spark/v3.0.0-preview-rc1-bin/
Author: jiangxb1987 Date: Tue Oct 29 21:09:40 2019 New Revision: 36547 Log: Apache Spark v3.0.0-preview-rc1 Added: dev/spark/v3.0.0-preview-rc1-bin/ dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz (with props) dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.asc dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.sha512 dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.preview.tar.gz.sha512 dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz (with props) dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.asc dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.sha512 dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz (with props) dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz.asc dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz.sha512 dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-without-hadoop.tgz (with props) dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-without-hadoop.tgz.asc dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-without-hadoop.tgz.sha512 dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview.tgz (with props) dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview.tgz.asc dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview.tgz.sha512 Added: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.asc == --- dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.asc (added) +++ dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.asc Tue Oct 29 21:09:40 2019 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl24osoACgkQ4bfg8l5L +9Wthuw/9Hzrv5SIBFq7PjSn3IaZ9k9HRWuARKFgCX1O4wOY1giLqv/8gVkXyh139 +cT7fgq4lPRc/EwF6DdQqqp5cGGkZeATOVchDWru8/uPOpgjBJtuaM5ZQvHYla5CP +mYCEz0Y0aDLvb+7QjcBjfXQnj3Jwv92uzdTH9J1upaUJP1UX6KN4AmkC/utHa0BQ +FM8e/yLk7k0o7PSdXl418wzV88Q/sQZA+C4h7bJjSF8Zgfnnwy99yRmVwOaLNx2o +gpYo6+0kvX3dh8/jRb1zCgGgidtWHpnuGJZU7teyKpUoZefiE5FL4TG13HdekSjh +pZGhWlk2iWbca0rRxABmIy7Xvcm8RC8SueOzorTa0CpvlmtOHYpDeEv7mQNSOa19 +alHCkdMIKgCTTDffYBv+baePSo3MKFXSHPQ0xAD0TKUN/Jb3w6UUSTIMlS6wfz47 +KLWIq61EcuW+/irOOQ7ZRUK/e2DDpT0jtM3k6oKtFlbSqWLqCtZZTAzvAhU17GzZ +SVX+yBDax+n6ZfL9NOEL1geUKqKlhOVChqUPaL/VVZ0KXKphPhqvS0/EHz8/0l52 +R8JwPfZn5u192OfU8Qqtc5oMLcpX2hm7aVgU8q5XNRrF1u9egytOdrnKyRbaHlds +nWyVHzx1dySMiULY9TO5bRkjKFr+DY828QgsgiHb7VBRfuzpng4= +=ANBW +-END PGP SIGNATURE- Added: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.sha512 == --- dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.sha512 (added) +++ dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.sha512 Tue Oct 29 21:09:40 2019 @@ -0,0 +1,4 @@ +SparkR_3.0.0-preview.tar.gz: A5317BEC BA7BD226 0971EDCC FEF0F3CB 1BD21180 + DA41E5DC 3D37CF0F CB873512 B7FDF14A 453F2FFC + 7E1B88C0 5E98FEC0 0BC2BA21 FE611726 218C7500 + 44D44610 Added: dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.preview.tar.gz.sha512 == (empty) Added: dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz == Binary file - no diff available. Propchange: dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.asc == --- dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.asc (added) +++ dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.asc Tue Oct 29 21:09:40 2019 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl24oswACgkQ4bfg8l5L +9WuV/A//UTaApVOBvsRuw7xMsr0SYVmrjZQEBdAGXiPaDyXUmw6qTNk4q0RTLzMQ +rXxP90Jmu4T8qPM6gQnLVDGmq7+5GBs9Bvs4eZ9ZMKnrZ/pIYw1+Q4sym5uzPI13 +lAvmnYCnaAQuHHBSN6YJv9O2WAUUX1WnOIo/LOThgabr2j6tWxJDF2u5ZIk5aDni +iDQ+SJVICzTKcqSS0+IPJr+rR/WJXEd34RrYklCs54pSHgbb8ZbEFY/mErXLXIP4 +4We7oRGwW9Qb6WY0AjShjgD1ru2fjXZz5jJFnTWt9uZw6q7hFsnjvGSNRNbiGNdD
svn commit: r36546 - /dev/spark/KEYS
Author: jiangxb1987 Date: Tue Oct 29 21:05:34 2019 New Revision: 36546 Log: Update KEYS Modified: dev/spark/KEYS Modified: dev/spark/KEYS == --- dev/spark/KEYS (original) +++ dev/spark/KEYS Tue Oct 29 21:05:34 2019 @@ -1050,3 +1050,60 @@ XJ5Dp1pqv9DC6cl9vLSHctRrM2kG =mQLW -END PGP PUBLIC KEY BLOCK- +pub rsa4096 2019-10-22 [SC] [expires: 2029-10-19] + 26959C22EE5D4682049D6526E1B7E0F25E4BF56B +uid [ultimate] Xingbo Jiang (CODE SIGNING KEY) +sub rsa4096 2019-10-22 [E] [expires: 2029-10-19] + +-BEGIN PGP PUBLIC KEY BLOCK- + +mQINBF2uvsQBEAC2Nv1IChLuRTlq/fADcp2Q0dJ4dwsdEZuuGqtius6hiQuJX/6b +1EoVkns+8EVF88mBLGyw5RtYLqVmfPjuObhw0yHX5wO9ilSjrhyVPf1oSAg7TdZg +Lm2Sr4Lc1tzMAS+smtkrsNdARzTHYE5s4gFh5Kq65Y4+aNjbNOWomlNIAJumXO62 +5RGYn0cMaqx2DyFmavUYp2ofAf1OnBlTOhZnVFxkl+EIbbZfuynRcWG2dV1IYbPk ++felLi9EDtps0H0MWxZrwqOdHC5gTBoA10XTGmF0mYiTaT60+FclnBTiGipzhrP8 +O4lIWfzokYe9UjE5MX454tsPPNAHQ1b24AYqZJfBOzOn62yVayVzSy2LvRUinbPp +O7fMmfmMhxQw+TpG5M9wlrxyfNrOwc7kZP8xn0g2MIEPtCEBkkgEB3vrGwvPnIdi +ZikgUi3zCTWRXhODzdSt/spbtQrAJRaB63KbfdIYRO/wJOQ/I7+ytg/UO+SX1+X1 +nbjjoCUkbShaUPCdxUWM/f9D60Kz9yWG8ue0n911H0s2kBSOUNmerhMt9EJbvW04 +YuLZ3Vtt7ju1b+Ol0Nar2y4H8mVh8DAelc9ZXE+T4CaGWaC0ynqAZbg4y4bvdMvm +hsdPP0jGHJYh8fxjGbiNGx513i5kshZ+HX9KxX09lsehfVbgPgayk3XQoQARAQAB +tDhYaW5nYm8gSmlhbmcgKENPREUgU0lHTklORyBLRVkpIDxqaWFuZ3hiMTk4N0Bh +cGFjaGUub3JnPokCVAQTAQgAPhYhBCaVnCLuXUaCBJ1lJuG34PJeS/VrBQJdrr7E +AhsDBQkSzAMABQsJCAcCBhUKCQgLAgQWAgMBAh4BAheAAAoJEOG34PJeS/VraCsQ +AIKYFXrprDCsl/rIc4MhvzLMvibLPWzGgjmkVZ5uJVS9zhGGU6gwatzllbpFeKkV +uXQrPt8kn2ygx/jhGvRr8ku0eKK2xAoEii89Ta2QVxkDMszKUB31OUZKJDN0ytmM +i1Y3NeWFwiKWNh9bCh1voamKHtiVHeB3LgxoKPYfYxTodr+lZFZBjo8ZAZEcNAzQ +iP46XPXCTu0J0jJhdfiP+haofOR49nP7VcD+ALS2AkEwD+2zpkcJLDB/CTEgBxco +xCCAHL7WKSXqU/526R84JXO8zLVOLIDG3//m4cagQknPvux8eTyu+OZTqwFz1gLc +Vtvv5tIH3gotmohbTNzzn+9LQhd18YawW+D+1ie29qqwq/K48K8EXpy9JAnBTJv6 +z1VcQ5HS2Va7U/fqQf9TbHLlZiK1u7XVmhvuVzqgLruAAl5zz0AybjNo9iEa+pSB +7L1MFHx7oEI3aoIiuI39dyrE0JFwUelbfHrpuBgfI/VfFW5vqkBbJrNxb4tSgzcY ++q37nNp72pZkeL5Kn3m9R4M5GDU6saaGsCpqr802f3AGdDfSI4rKfPFA7Y62S/gI +rs6da1i15sCprQvuYKMIPeUq0YIUoJ0WuHuBX4gF8XfmtGAxriYU0moaGzoCPMuT +dT4Nc4uYWaTVx5SvIo40w88h0rHg9g6ty65ur4hn7G0iuQINBF2uvsQBEADj0Cb4 +EJKqvV2SxysiW2mdDnRN0c+/aX1zB64U5VMJJl2h4crlYd71hDTXwu0Zt0MMAg+8 +HqkjAAp8szsYTSr0kz/9xSnJzuMuTWnaNe9eCii/HTzHSbkN25JJTThNX5hZ0And +wmtFfhelgZczIt9EnblCUqsyPiwL1w4vF1x7a2ftbS+k1n2Q+UIYAMZfsoYIlZzZ +cPKNJdHI0itBPaKBEoOQ3KebCyl91xdXb2elzS1NiID4Rx0dHB6C1w3v8wnBmgaC +IEv7Z7sS1Fr6NHf3pTnsBiyu/xcwE8eznhH0Bqb/i4qSCA8H6DMzwj5JiZ7yMvGd +S2PbUCLKSFaeZOcXY9LN9K9rqztxgCSVBqSZ6IB41HnjKd97hO4yMJqvrelg+KUN +31F3Jp0VBQB1a+j2LLNMJoFcrfG2Tv6R8sx1pwJWySRkdHKX6bB42Dzpd66ReZ0e +3RrcoOWe0jtXr0WwuxP94EZoyRN/iRQ6k3jR6B3n6RBg8CL0qNc2yRGsuXR9MPfE +GHUW/cYvZZbfkORYXSPt8pfdziWnr18a6oagcTGdeNyjVWWsQE39psYNM6HnhBSs +IxkMb7lXTz43MfFk55nrJqE/3LIH3AkJxKr3HOzGbUM832Iiqvg0o0nBRd9B4vLz +mkdtD+K9hCW6G7g+1hzx1Tqaf6n2Q69gv/zq+wARAQABiQI8BBgBCAAmFiEEJpWc +Iu5dRoIEnWUm4bfg8l5L9WsFAl2uvsQCGwwFCRLMAwAACgkQ4bfg8l5L9Wv7fw/+ +LDXns0Bph1X/ph2+kUBWL6KC6Fq+f636HvZfaRQMxs9vPXEBWpcTNFAmjPckmNJ+ +y/VbgeUSKR5ylol5jyTIamAR3J9xI0kQdaN7URnBmn2WLIxiRR3houLBMaJFWOnr +5/4+LJ42R4c1Le3GJOXVfiCustJf0eZTMoAyN4bQHPd2vXb9p/BwQfctTNrvv+ZT +v5D0rUAZO+kmIO9hzfVnQ/RCDPzMZRXImxWFOKczbwUq3By+ZsoPtG0IHjtbSErt +/Rl+CK0vzD008Mq2vEP/OJ1IW32+QrokoWRAam87RAe7YATOvWYsMib3m8Hnh0JY +AwDlgPzhvvymA9eVOscs608ZOtuNzDRTCKTRTv2T6Ktsh7ZBUSU0aXN2nafC+8xA +/SBAeUqH+C9+aHiP8gdAG9xq2MXs8V5oQRVoYriQZuSlM2r/u0DNq8BGcXbguHjn +opBNCzchUWR5X19V0KDN8cW9f22RcrenREJhrrn2WfVT/lvBF1E0X70H8ziMHrrf +fWOysS0CT3qM2hNCa15MH+xsu6Lq56lqzCC5UhR+vTo4SNXSScv2Z5Verf30xCnN +/JuPHYL8Y48/pX+9dYSAali4oDNQ0mI38j+A44ik8C8o6qOjk7qQwUlluCmV62ut +kR7loYvuYi9fxvlaW0kc3Bd10JCHCEmobHno5Gflr9I= +=l4ux +-END PGP PUBLIC KEY BLOCK- - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] annotated tag v3.0.0-preview-rc1 updated (5eddbb5 -> a480a27)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to annotated tag v3.0.0-preview-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git. *** WARNING: tag v3.0.0-preview-rc1 was modified! *** from 5eddbb5 (commit) to a480a27 (tag) tagging 5eddbb5f1d9789696927f435c55df887e50a1389 (commit) replaces 3.0.0-preview by Xingbo Jiang on Mon Oct 28 22:36:57 2019 -0700 - Log - v3.0.0-preview-rc1 --- No new revisions were added by this update. Summary of changes: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (5eddbb5 -> b33a58c)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 5eddbb5 Prepare Spark release v3.0.0-preview-rc1 add b33a58c Revert "Prepare Spark release v3.0.0-preview-rc1" No new revisions were added by this update. Summary of changes: R/pkg/R/sparkR.R | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10-token-provider/pom.xml | 2 +- external/kafka-0-10/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graph/api/pom.xml | 2 +- graph/cypher/pom.xml | 2 +- graph/graph/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (fb80dfe -> 5eddbb5)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from fb80dfe [SPARK-28158][SQL][FOLLOWUP] HiveUserDefinedTypeSuite: don't use RandomDataGenerator to create row for UDT backed by ArrayType add 5eddbb5 Prepare Spark release v3.0.0-preview-rc1 No new revisions were added by this update. Summary of changes: R/pkg/R/sparkR.R | 2 +- assembly/pom.xml | 2 +- common/kvstore/pom.xml | 2 +- common/network-common/pom.xml | 2 +- common/network-shuffle/pom.xml | 2 +- common/network-yarn/pom.xml| 2 +- common/sketch/pom.xml | 2 +- common/tags/pom.xml| 2 +- common/unsafe/pom.xml | 2 +- core/pom.xml | 2 +- docs/_config.yml | 4 ++-- examples/pom.xml | 2 +- external/avro/pom.xml | 2 +- external/docker-integration-tests/pom.xml | 2 +- external/kafka-0-10-assembly/pom.xml | 2 +- external/kafka-0-10-sql/pom.xml| 2 +- external/kafka-0-10-token-provider/pom.xml | 2 +- external/kafka-0-10/pom.xml| 2 +- external/kinesis-asl-assembly/pom.xml | 2 +- external/kinesis-asl/pom.xml | 2 +- external/spark-ganglia-lgpl/pom.xml| 2 +- graph/api/pom.xml | 2 +- graph/cypher/pom.xml | 2 +- graph/graph/pom.xml| 2 +- graphx/pom.xml | 2 +- hadoop-cloud/pom.xml | 2 +- launcher/pom.xml | 2 +- mllib-local/pom.xml| 2 +- mllib/pom.xml | 2 +- pom.xml| 2 +- repl/pom.xml | 2 +- resource-managers/kubernetes/core/pom.xml | 2 +- resource-managers/kubernetes/integration-tests/pom.xml | 2 +- resource-managers/mesos/pom.xml| 2 +- resource-managers/yarn/pom.xml | 2 +- sql/catalyst/pom.xml | 2 +- sql/core/pom.xml | 2 +- sql/hive-thriftserver/pom.xml | 2 +- sql/hive/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/pom.xml | 2 +- 41 files changed, 42 insertions(+), 42 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r36458 - in /dev/spark/v3.0.0-preview-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apac
Author: jiangxb1987 Date: Thu Oct 24 09:29:16 2019 New Revision: 36458 Log: Apache Spark v3.0.0-preview-rc1 docs [This commit notification would consist of 1877 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r36457 - /dev/spark/v3.0.0-preview-rc1-bin/
Author: jiangxb1987 Date: Thu Oct 24 09:12:17 2019 New Revision: 36457 Log: Apache Spark v3.0.0-preview-rc1 Added: dev/spark/v3.0.0-preview-rc1-bin/ dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz (with props) dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.asc dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.sha512 dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz (with props) dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz.asc dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz.sha512 dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT-bin-hadoop2.7.tgz (with props) dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT-bin-hadoop2.7.tgz.asc dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT-bin-hadoop2.7.tgz.sha512 dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT-bin-without-hadoop.tgz (with props) dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT-bin-without-hadoop.tgz.asc dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT-bin-without-hadoop.tgz.sha512 dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT.tgz (with props) dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT.tgz.asc dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT.tgz.sha512 Added: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.asc == --- dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.asc (added) +++ dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.asc Thu Oct 24 09:12:17 2019 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl2xZxwACgkQ4bfg8l5L +9WtJng/+Kv6m1B4O1P3FYmC5x3Eq3r1zBRrp0hZdykmDDXUwstAOoZVfL6sGBdMX +2fyo7lu5GFJOuwNuRgAb4oWNyShAvtRSQsHeDLdhbNUxxiWwY3x9IZH5pI3Uk1eD +Pzm2cYzjOinGYHEeWRNLUnA5I1pIeZjmhLvEvE9u0k/oxu570pk0MaCwK7aeeXC4 +BbBa2q6BgLOy2+TPbxd88aIl4oZpTBXO+73bEGpW1K4eP3fPLKNgCDmPjYqpRC6z +1OEMdinzveKZetPNyVP25BNJWDqEalkwJEmC/QJEr3SlPPT8lN12eHomKv5DHyHJ +gCw9msdb4/RX5oM8/5n3WDjuwhuWVWiEMWlzrWTLtI9wcY4DuzPrXsHdsZPzQzgU +Jfoh5w8cH7gC0KnrGsKPAGuABh13xXra3+verH1LJmXy5xg3nUnFN6vv8qGJcBVW +r512rO2XmNgBmidQzuY+7JkdZCnDDg9Du77elzAG3QIVQDDgWGU1osV7ynmsMbiu +XR8y6XJUIcmfaU3HpjqiIn4U3XTHx676JIPs8RNvMwfzlhoG+NgqfdjGu9SpVMCK +HqrRfAY2e9XogGCPy4kx7V1Wug49kmIsELIl0dRGLQkfT4M6X21itkWo9nMM9V1M ++287rzDRzcHb6lvcx3l2Sk1DXHkcKDpn7VvpOdxPRFMlj924ZYU= +=I+VG +-END PGP SIGNATURE- Added: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.sha512 == --- dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.sha512 (added) +++ dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.sha512 Thu Oct 24 09:12:17 2019 @@ -0,0 +1,4 @@ +SparkR_3.0.0-SNAPSHOT.tar.gz: F8DBA96E 17909428 BD7A4B9B 2CC3EDDF C5C64440 + 4093CCFC F082154F 4C852AC8 35CE7218 CC855377 + FBF55297 61F4E098 573892F4 F0EC97D3 708FD39C + EE0BA7AD Added: dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz == Binary file - no diff available. Propchange: dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz.asc == --- dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz.asc (added) +++ dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz.asc Thu Oct 24 09:12:17 2019 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl2xZx0ACgkQ4bfg8l5L +9WuFSxAArhO2UwczkBcIztFyEe5dDze7wnD072OQhFvoXKf+Zkn7VOW0ae1tEWcP +wnqIei2KZK+Ro4hu0Iu8JeI5H/WAZhdkS3pxX/Cl7y9Yg2RFIfMqOtPQFdsFEPWt +uwGePyvOM4Q/jL/+GaJCwyVsf4sWfDMj0qnuOHjMNPw7zrjPGas3vQYYsFNbiblf +l0tt/kaW9Ic8PrScrn4liZxULZNZ/b5hEpCRFeyYZt1+FHUPXHI8Lvh9Rc5XcpLE +ZxNUbrSkcQnMDoHL3P5+lnQWxJabEsXnI5AnVb3fwM72EoLb2jcOGQAzBvY1cs70 +d9N6dzn+Mn9UDMD6+yXOG5ruavdfpEb7kyfa040r3EufZVOXCjdHVWJWOtEU2C9R +yimAjeq5w/s8rApYO4jMPJU7P/O3iMP3RdOIxM/MXNM1W6uCTFmi0w88eHLqWLgu +4zdbS5g4UmCo9N55l7+SsBQMIlS2HPQIRwSz7spisM5Pp1Um/W7rDxJwNEcafuZ3 +b/MezuY5k2w5dIzyeHLcf9VCIzouvKnbLHvaFtIB3QoLkCqtZ+N244we3k+T8t1o +C6McEJyqqdpTyk4nzdqf/W49MbLRWWEBldCcc1wQC0dLUFvrVHeS9zj/U71hpY53
[spark] branch master updated (70dd9c0 -> 0a70951)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 70dd9c0 [SPARK-29542][SQL][DOC] Make the descriptions of spark.sql.files.* be clearly add 0a70951 [SPARK-29499][CORE][PYSPARK] Add mapPartitionsWithIndex for RDDBarrier No new revisions were added by this update. Summary of changes: .../scala/org/apache/spark/rdd/RDDBarrier.scala| 22 +++ .../org/apache/spark/rdd/RDDBarrierSuite.scala | 9 ++ dev/sparktestsupport/modules.py| 1 + python/pyspark/rdd.py | 14 ++ .../test_group.py => tests/test_rddbarrier.py} | 32 -- 5 files changed, 64 insertions(+), 14 deletions(-) copy python/pyspark/{sql/tests/test_group.py => tests/test_rddbarrier.py} (56%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] annotated tag v3.0.0-preview-rc1 updated (f23c5d7 -> 7a3dad9)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to annotated tag v3.0.0-preview-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git. *** WARNING: tag v3.0.0-preview-rc1 was modified! *** from f23c5d7 (commit) to 7a3dad9 (tag) tagging f23c5d7f6705348ddeac0b714b29374cba3a4efe (commit) replaces 3.0.0-preview by Xingbo Jiang on Wed Oct 23 09:41:49 2019 +0200 - Log - v3.0.0-preview-rc1 --- No new revisions were added by this update. Summary of changes: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] annotated tag v3.0.0-preview-rc1 updated (322ec0b -> 7c11604)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to annotated tag v3.0.0-preview-rc1 in repository https://gitbox.apache.org/repos/asf/spark.git. *** WARNING: tag v3.0.0-preview-rc1 was modified! *** from 322ec0b (commit) to 7c11604 (tag) tagging 322ec0ba9ba75708cfe679368a43655de7b0e4f9 (commit) by Xingbo Jiang on Mon Oct 21 11:46:00 2019 +0200 - Log - v3.0.0-preview-rc1 --- No new revisions were added by this update. Summary of changes: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] annotated tag 3.0.0-preview updated (322ec0b -> 72c391e)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to annotated tag 3.0.0-preview in repository https://gitbox.apache.org/repos/asf/spark.git. *** WARNING: tag 3.0.0-preview was modified! *** from 322ec0b (commit) to 72c391e (tag) tagging 322ec0ba9ba75708cfe679368a43655de7b0e4f9 (commit) by Xingbo Jiang on Thu Oct 17 19:04:08 2019 +0200 - Log - 3.0.0-preview --- No new revisions were added by this update. Summary of changes: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] annotated tag 3.0.0-preview updated (322ec0b -> 72c391e)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to annotated tag 3.0.0-preview in repository https://gitbox.apache.org/repos/asf/spark.git. *** WARNING: tag 3.0.0-preview was modified! *** from 322ec0b (commit) to 72c391e (tag) tagging 322ec0ba9ba75708cfe679368a43655de7b0e4f9 (commit) by Xingbo Jiang on Thu Oct 17 19:04:08 2019 +0200 - Log - 3.0.0-preview --- No new revisions were added by this update. Summary of changes: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0-preview created (now 02c5b4f)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch branch-3.0-preview in repository https://gitbox.apache.org/repos/asf/spark.git. at 02c5b4f [SPARK-28947][K8S] Status logging not happens at an interval for liveness No new revisions were added by this update. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (d2f21b0 -> 56a3beb)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d2f21b0 [SPARK-27468][CORE] Track correct storage level of RDDs and partitions add 56a3beb [SPARK-27492][DOC][FOLLOWUP] Update resource scheduling user docs No new revisions were added by this update. Summary of changes: docs/configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-29263][SCHEDULER] Update `availableSlots` in `resourceOffers()` before checking available slots for barrier taskSet
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 99e503c [SPARK-29263][SCHEDULER] Update `availableSlots` in `resourceOffers()` before checking available slots for barrier taskSet 99e503c is described below commit 99e503cebfd9cb19372c88b0dd70c6743f864454 Author: Juliusz Sompolski AuthorDate: Fri Sep 27 11:18:32 2019 -0700 [SPARK-29263][SCHEDULER] Update `availableSlots` in `resourceOffers()` before checking available slots for barrier taskSet ### What changes were proposed in this pull request? availableSlots are computed before the for loop looping over all TaskSets in resourceOffers. But the number of slots changes in every iteration, as in every iteration these slots are taken. The number of available slots checked by a barrier task set has therefore to be recomputed in every iteration from availableCpus. ### Why are the changes needed? Bugfix. This could make resourceOffer attempt to start a barrier task set, even though it has not enough slots available. That would then be caught by the `require` in line 519, which will throw an exception, which will get caught and ignored by Dispatcher's MessageLoop, so nothing terrible would happen, but the exception would prevent resourceOffers from considering further TaskSets. Note that launching the barrier TaskSet can still fail if other requirements are not satisfied, and still can be rolled-back by throwing exception in this `require`. Handling it more gracefully remains a TODO in SPARK-24818, but this fix at least should resolve the situation when it's unable to launch because of insufficient slots. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Added UT Closes #23375 Closes #25946 from juliuszsompolski/SPARK-29263. Authored-by: Juliusz Sompolski Signed-off-by: Xingbo Jiang (cherry picked from commit 420abb457df0f422f73bab19a6ed6d7c6bab3173) Signed-off-by: Xingbo Jiang --- .../apache/spark/scheduler/TaskSchedulerImpl.scala | 2 +- .../org/apache/spark/scheduler/FakeTask.scala | 36 +++ .../spark/scheduler/TaskSchedulerImplSuite.scala | 51 -- 3 files changed, 65 insertions(+), 24 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala index e194b79..38dbbe7 100644 --- a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala +++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala @@ -391,7 +391,6 @@ private[spark] class TaskSchedulerImpl( // Build a list of tasks to assign to each worker. val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores / CPUS_PER_TASK)) val availableCpus = shuffledOffers.map(o => o.cores).toArray -val availableSlots = shuffledOffers.map(o => o.cores / CPUS_PER_TASK).sum val sortedTaskSets = rootPool.getSortedTaskSetQueue for (taskSet <- sortedTaskSets) { logDebug("parentName: %s, name: %s, runningTasks: %s".format( @@ -405,6 +404,7 @@ private[spark] class TaskSchedulerImpl( // of locality levels so that it gets a chance to launch local tasks on all of them. // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY for (taskSet <- sortedTaskSets) { + val availableSlots = availableCpus.map(c => c / CPUS_PER_TASK).sum // Skip the barrier taskSet if the available slots are less than the number of pending tasks. if (taskSet.isBarrier && availableSlots < taskSet.numTasks) { // Skip the launch process. diff --git a/core/src/test/scala/org/apache/spark/scheduler/FakeTask.scala b/core/src/test/scala/org/apache/spark/scheduler/FakeTask.scala index b29d32f..abc8841 100644 --- a/core/src/test/scala/org/apache/spark/scheduler/FakeTask.scala +++ b/core/src/test/scala/org/apache/spark/scheduler/FakeTask.scala @@ -42,15 +42,23 @@ object FakeTask { * locations for each task (given as varargs) if this sequence is not empty. */ def createTaskSet(numTasks: Int, prefLocs: Seq[TaskLocation]*): TaskSet = { -createTaskSet(numTasks, stageAttemptId = 0, prefLocs: _*) +createTaskSet(numTasks, stageId = 0, stageAttemptId = 0, priority = 0, prefLocs: _*) } - def createTaskSet(numTasks: Int, stageAttemptId: Int, prefLocs: Seq[TaskLocation]*): TaskSet = { -createTaskSet(numTasks, stageId = 0, stageAttemptId, prefLocs: _*) + def createTaskSet( + numTasks: Int, + stageId: Int, + stageAttemptId: Int, + prefLocs: Seq[TaskLocation]*): TaskSet = { +
[spark] branch master updated (fda0e6e -> 420abb4)
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from fda0e6e [SPARK-29240][PYTHON] Pass Py4J column instance to support PySpark column in element_at function add 420abb4 [SPARK-29263][SCHEDULER] Update `availableSlots` in `resourceOffers()` before checking available slots for barrier taskSet No new revisions were added by this update. Summary of changes: .../apache/spark/scheduler/TaskSchedulerImpl.scala | 2 +- .../org/apache/spark/scheduler/FakeTask.scala | 36 +++ .../spark/scheduler/TaskSchedulerImplSuite.scala | 51 -- 3 files changed, 65 insertions(+), 24 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-28561][WEBUI] DAG viz for barrier-execution mode
This is an automated email from the ASF dual-hosted git repository. jiangxb1987 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 247bebc [SPARK-28561][WEBUI] DAG viz for barrier-execution mode 247bebc is described below commit 247bebcf94df77883ac245aea63e7e871fc7aa44 Author: Kousuke Saruta AuthorDate: Mon Aug 12 22:38:10 2019 -0700 [SPARK-28561][WEBUI] DAG viz for barrier-execution mode ## What changes were proposed in this pull request? In the current UI, we cannot identify which RDDs are barrier. Visualizing it will make easy to debug. Following images are shown after this change. ![Screenshot from 2019-07-30 16-30-35](https://user-images.githubusercontent.com/4736016/62110508-83cec100-b2e9-11e9-83b9-bc2e485a4cbe.png) ![Screenshot from 2019-07-30 16-31-09](https://user-images.githubusercontent.com/4736016/62110509-83cec100-b2e9-11e9-9e2e-47c4dae23a52.png) The boxes in pale green mean barrier (We might need to discuss which color is proper). ## How was this patch tested? Tested manually. The images above are shown by following operations. ``` val rdd1 = sc.parallelize(1 to 10) val rdd2 = sc.parallelize(1 to 10) val rdd3 = rdd1.zip(rdd2).barrier.mapPartitions(identity(_)) val rdd4 = rdd3.map(identity(_)) val rdd5 = rdd4.reduceByKey(_+_) rdd5.collect ``` Closes #25296 from sarutak/barrierexec-dagviz. Authored-by: Kousuke Saruta Signed-off-by: Xingbo Jiang --- .../org/apache/spark/ui/static/spark-dag-viz.css | 12 + .../org/apache/spark/ui/static/spark-dag-viz.js| 6 + .../scala/org/apache/spark/status/storeTypes.scala | 4 ++- .../scala/org/apache/spark/storage/RDDInfo.scala | 3 ++- .../main/scala/org/apache/spark/ui/UIUtils.scala | 3 +++ .../apache/spark/ui/scope/RDDOperationGraph.scala | 30 +- .../scala/org/apache/spark/util/JsonProtocol.scala | 4 ++- .../spark/status/AppStatusListenerSuite.scala | 6 ++--- .../org/apache/spark/storage/StorageSuite.scala| 4 +-- .../spark/ui/scope/RDDOperationGraphSuite.scala| 10 .../org/apache/spark/util/JsonProtocolSuite.scala | 7 ++--- 11 files changed, 67 insertions(+), 22 deletions(-) diff --git a/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.css b/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.css index 9cc5c79..1fbc90b 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.css +++ b/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.css @@ -89,6 +89,12 @@ stroke-width: 2px; } +#dag-viz-graph svg.job g.cluster.barrier rect { + fill: #B4E9E2; + stroke: #32DBC6; + stroke-width: 2px; +} + /* Stage page specific styles */ #dag-viz-graph svg.stage g.cluster rect { @@ -123,6 +129,12 @@ stroke-width: 2px; } +#dag-viz-graph svg.stage g.cluster.barrier rect { + fill: #84E9E2; + stroke: #32DBC6; + stroke-width: 2px; +} + .tooltip-inner { white-space: pre-wrap; } diff --git a/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.js b/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.js index cf508ac..035d72f 100644 --- a/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.js +++ b/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.js @@ -172,6 +172,12 @@ function renderDagViz(forJob) { svg.selectAll("g." + nodeId).classed("cached", true); }); + metadataContainer().selectAll(".barrier-rdd").each(function() { +var rddId = d3.select(this).text().trim() +var clusterId = VizConstants.clusterPrefix + rddId +svg.selectAll("g." + clusterId).classed("barrier", true) + }); + resizeSvg(svg); interpretLineBreak(svg); } diff --git a/core/src/main/scala/org/apache/spark/status/storeTypes.scala b/core/src/main/scala/org/apache/spark/status/storeTypes.scala index eea47b3..9da5bea 100644 --- a/core/src/main/scala/org/apache/spark/status/storeTypes.scala +++ b/core/src/main/scala/org/apache/spark/status/storeTypes.scala @@ -402,7 +402,9 @@ private[spark] class RDDOperationClusterWrapper( val childClusters: Seq[RDDOperationClusterWrapper]) { def toRDDOperationCluster(): RDDOperationCluster = { -val cluster = new RDDOperationCluster(id, name) +val isBarrier = childNodes.exists(_.barrier) +val name = if (isBarrier) this.name + "\n(barrier mode)" else this.name +val cluster = new RDDOperationCluster(id, isBarrier, name) childNodes.foreach(cluster.attachChildNode) childClusters.foreach { child => cluster.attachChildCluster(child.toRDDOperationCluster()) diff --git a/core/src/main/scala/org/apache/spark/storage/RDDInfo.scala b/core/src/main/scala/org/apache/spark/sto