from:"jiangxb1987"

(spark) branch master updated: [SPARK-48628][CORE] Add task peak on/off heap memory metrics

2024-08-07 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 717a6da11d0b [SPARK-48628][CORE] Add task peak on/off heap memory 
metrics
717a6da11d0b is described below

commit 717a6da11d0b3d0383141f9682c14201aff821c0
Author: Ziqi Liu 
AuthorDate: Wed Aug 7 11:52:13 2024 -0700

[SPARK-48628][CORE] Add task peak on/off heap memory metrics

### What changes were proposed in this pull request?

Add task on/off heap execution memory in `TaskMetrics`, tracked in 
`TaskMemoryManager`, **assuming `acquireExecutionMemory` is the only one narrow 
waist for acquiring execution memory.**

### Why are the changes needed?

Currently there is no task on/off heap execution memory metrics.

There is a 
[peakExecutionMemory](https://github.com/apache/spark/blob/3cd35f8cb6462051c621cf49de54b9c5692aae1d/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala#L114)
  metrics, however, the semantic is a confusing: it only cover the execution 
memory used by shuffle/join/aggregate/sort, which is accumulated in specific 
operators and thus not really reflect the real execution memory.

Therefore it's necessary to add these two metrics.

Also I created two followup sub tickets:

- https://issues.apache.org/jira/browse/SPARK-48788 : accumulate task 
metrics in stage, and display in Spark UI
- https://issues.apache.org/jira/browse/SPARK-48789 : deprecate 
`peakExecutionMemory` once we have replacement for it.

The ultimate goal is to have these two metrics ready (as accumulated stage 
metrics in Spark UI as well) and deprecate `peakExecutionMemory`.

### Does this PR introduce _any_ user-facing change?

Supposedly no. But two followup sub tickets will have user-facing change: 
new metrics exposed to Spark UI, and old metrics deprecation.

### How was this patch tested?
new test

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes #47192 from liuzqt/SPARK-48628.

Authored-by: Ziqi Liu 
Signed-off-by: Xingbo Jiang 
---
 .../org/apache/spark/memory/TaskMemoryManager.java | 36 +++
 .../org/apache/spark/InternalAccumulator.scala |  2 +
 .../scala/org/apache/spark/executor/Executor.scala |  2 +
 .../org/apache/spark/executor/TaskMetrics.scala| 21 +++
 .../scala/org/apache/spark/util/JsonProtocol.scala |  6 ++
 .../apache/spark/memory/MemoryManagerSuite.scala   | 18 ++
 .../org/apache/spark/util/JsonProtocolSuite.scala  | 72 ++
 7 files changed, 132 insertions(+), 25 deletions(-)

diff --git a/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java 
b/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
index fe798e40a6ad..541ed7e2350b 100644
--- a/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
+++ b/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java
@@ -122,6 +122,16 @@ public class TaskMemoryManager {
*/
   private volatile long acquiredButNotUsed = 0L;
 
+  /**
+   * Peak off heap memory usage by this task.
+   */
+  private volatile long peakOffHeapMemory = 0L;
+
+  /**
+   * Peak on heap memory usage by this task.
+   */
+  private volatile long peakOnHeapMemory = 0L;
+
   /**
* Construct a new TaskMemoryManager.
*/
@@ -202,6 +212,17 @@ public class TaskMemoryManager {
 logger.debug("Task {} acquired {} for {}", taskAttemptId, 
Utils.bytesToString(got),
   requestingConsumer);
   }
+
+  // Consumer will update its used memory after acquireExecutionMemory, so 
we need to add `got`
+  // to compute current peak
+  long currentPeak = consumers.stream().filter(c -> c.getMode() == mode)
+.mapToLong(MemoryConsumer::getUsed).sum() + got;
+  if (mode == MemoryMode.OFF_HEAP) {
+peakOffHeapMemory = Math.max(peakOffHeapMemory, currentPeak);
+  } else {
+peakOnHeapMemory = Math.max(peakOnHeapMemory, currentPeak);
+  }
+
   return got;
 }
   }
@@ -507,4 +528,19 @@ public class TaskMemoryManager {
   public MemoryMode getTungstenMemoryMode() {
 return tungstenMemoryMode;
   }
+
+  /**
+   * Returns peak task-level off-heap memory usage in bytes.
+   *
+   */
+  public long getPeakOnHeapExecutionMemory() {
+return peakOnHeapMemory;
+  }
+
+  /**
+   * Returns peak task-level on-heap memory usage in bytes.
+   */
+  public long getPeakOffHeapExecutionMemory() {
+return peakOffHeapMemory;
+  }
 }
diff --git a/core/src/main/scala/org/apache/spark/InternalAccumulator.scala 
b/core/src/main/scala/org/apache/spark/InternalAccumulator.scala
index ef4609e6d645..505634d5bb04 100644
--- a/core/src/main/scala/org/apache/spark/InternalAccumulator.scala
+++ b/core/src/main/scal

[spark] branch branch-3.4 updated: [SPARK-43043][CORE] Improve the performance of MapOutputTracker.updateMapOutput

2023-05-16 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new f68ece9e607 [SPARK-43043][CORE] Improve the performance of 
MapOutputTracker.updateMapOutput
f68ece9e607 is described below

commit f68ece9e6074cecdaf74ad9b39eae3c7dc2cfaf1
Author: Xingbo Jiang 
AuthorDate: Tue May 16 11:34:30 2023 -0700

[SPARK-43043][CORE] Improve the performance of 
MapOutputTracker.updateMapOutput

### What changes were proposed in this pull request?

The PR changes the implementation of MapOutputTracker.updateMapOutput() to 
search for the MapStatus under the help of a mapping from mapId to mapIndex, 
previously it was performing a linear search, which would become performance 
bottleneck if a large proportion of all blocks in the map are migrated.

### Why are the changes needed?

To avoid performance bottleneck when block decommission is enabled and a 
lot of blocks are migrated within a short time window.

### Does this PR introduce _any_ user-facing change?

No, it's pure performance improvement.

### How was this patch tested?

Manually test.

Closes #40690 from jiangxb1987/SPARK-43043.

Lead-authored-by: Xingbo Jiang 
Co-authored-by: Jiang Xingbo 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit 66a2eb8f8957c22c69519b39be59beaaf931822b)
Signed-off-by: Xingbo Jiang 
---
 .../scala/org/apache/spark/MapOutputTracker.scala  | 26 +-
 .../apache/spark/util/collection/OpenHashMap.scala | 18 +++
 .../spark/util/collection/OpenHashMapSuite.scala   | 18 +++
 3 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala 
b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala
index fade0b86dd8..2dd3a903ee2 100644
--- a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala
+++ b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala
@@ -42,6 +42,7 @@ import org.apache.spark.scheduler.{MapStatus, MergeStatus, 
ShuffleOutputStatus}
 import org.apache.spark.shuffle.MetadataFetchFailedException
 import org.apache.spark.storage.{BlockId, BlockManagerId, ShuffleBlockId, 
ShuffleMergedBlockId}
 import org.apache.spark.util._
+import org.apache.spark.util.collection.OpenHashMap
 import org.apache.spark.util.io.{ChunkedByteBuffer, 
ChunkedByteBufferOutputStream}
 
 /**
@@ -147,6 +148,12 @@ private class ShuffleStatus(
 
   private[this] var shufflePushMergerLocations: Seq[BlockManagerId] = Seq.empty
 
+  /**
+   * Mapping from a mapId to the mapIndex, this is required to reduce the 
searching overhead within
+   * the function updateMapOutput(mapId, bmAddress).
+   */
+  private[this] val mapIdToMapIndex = new OpenHashMap[Long, Int]()
+
   /**
* Register a map output. If there is already a registered location for the 
map output then it
* will be replaced by the new location.
@@ -157,6 +164,14 @@ private class ShuffleStatus(
   invalidateSerializedMapOutputStatusCache()
 }
 mapStatuses(mapIndex) = status
+mapIdToMapIndex(status.mapId) = mapIndex
+  }
+
+  /**
+   * Get the map output that corresponding to a given mapId.
+   */
+  def getMapStatus(mapId: Long): Option[MapStatus] = withReadLock {
+mapIdToMapIndex.get(mapId).map(mapStatuses(_))
   }
 
   /**
@@ -164,15 +179,16 @@ private class ShuffleStatus(
*/
   def updateMapOutput(mapId: Long, bmAddress: BlockManagerId): Unit = 
withWriteLock {
 try {
-  val mapStatusOpt = mapStatuses.find(x => x != null && x.mapId == mapId)
+  val mapIndex = mapIdToMapIndex.get(mapId)
+  val mapStatusOpt = mapIndex.map(mapStatuses(_)).flatMap(Option(_))
   mapStatusOpt match {
 case Some(mapStatus) =>
   logInfo(s"Updating map output for ${mapId} to ${bmAddress}")
   mapStatus.updateLocation(bmAddress)
   invalidateSerializedMapOutputStatusCache()
 case None =>
-  val index = mapStatusesDeleted.indexWhere(x => x != null && x.mapId 
== mapId)
-  if (index >= 0 && mapStatuses(index) == null) {
+  if (mapIndex.map(mapStatusesDeleted).exists(_.mapId == mapId)) {
+val index = mapIndex.get
 val mapStatus = mapStatusesDeleted(index)
 mapStatus.updateLocation(bmAddress)
 mapStatuses(index) = mapStatus
@@ -1133,9 +1149,7 @@ private[spark] class MapOutputTrackerMaster(
*/
   def getMapOutputLocation(shuffleId: Int, mapId: Long): 
Option[BlockManagerId] = {
 shuffleStatuses.get(shuffleId).flatMap { shuffleStatus =>
-  shuffleStatus.withMapStatuses { mapStatues =>
-mapStatues.filter(_ != null).find(_.mapId == mapId).map(_.location)
-

[spark] branch master updated: [SPARK-43043][CORE] Improve the performance of MapOutputTracker.updateMapOutput

2023-05-16 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 66a2eb8f895 [SPARK-43043][CORE] Improve the performance of 
MapOutputTracker.updateMapOutput
66a2eb8f895 is described below

commit 66a2eb8f8957c22c69519b39be59beaaf931822b
Author: Xingbo Jiang 
AuthorDate: Tue May 16 11:34:30 2023 -0700

[SPARK-43043][CORE] Improve the performance of 
MapOutputTracker.updateMapOutput

### What changes were proposed in this pull request?

The PR changes the implementation of MapOutputTracker.updateMapOutput() to 
search for the MapStatus under the help of a mapping from mapId to mapIndex, 
previously it was performing a linear search, which would become performance 
bottleneck if a large proportion of all blocks in the map are migrated.

### Why are the changes needed?

To avoid performance bottleneck when block decommission is enabled and a 
lot of blocks are migrated within a short time window.

### Does this PR introduce _any_ user-facing change?

No, it's pure performance improvement.

### How was this patch tested?

Manually test.

Closes #40690 from jiangxb1987/SPARK-43043.

Lead-authored-by: Xingbo Jiang 
Co-authored-by: Jiang Xingbo 
Signed-off-by: Xingbo Jiang 
---
 .../scala/org/apache/spark/MapOutputTracker.scala  | 26 +-
 .../apache/spark/util/collection/OpenHashMap.scala | 18 +++
 .../spark/util/collection/OpenHashMapSuite.scala   | 18 +++
 3 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala 
b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala
index 5ad62159d24..9a5cf1da9e4 100644
--- a/core/src/main/scala/org/apache/spark/MapOutputTracker.scala
+++ b/core/src/main/scala/org/apache/spark/MapOutputTracker.scala
@@ -42,6 +42,7 @@ import org.apache.spark.scheduler.{MapStatus, MergeStatus, 
ShuffleOutputStatus}
 import org.apache.spark.shuffle.MetadataFetchFailedException
 import org.apache.spark.storage.{BlockId, BlockManagerId, ShuffleBlockId, 
ShuffleMergedBlockId}
 import org.apache.spark.util._
+import org.apache.spark.util.collection.OpenHashMap
 import org.apache.spark.util.io.{ChunkedByteBuffer, 
ChunkedByteBufferOutputStream}
 
 /**
@@ -147,6 +148,12 @@ private class ShuffleStatus(
 
   private[this] var shufflePushMergerLocations: Seq[BlockManagerId] = Seq.empty
 
+  /**
+   * Mapping from a mapId to the mapIndex, this is required to reduce the 
searching overhead within
+   * the function updateMapOutput(mapId, bmAddress).
+   */
+  private[this] val mapIdToMapIndex = new OpenHashMap[Long, Int]()
+
   /**
* Register a map output. If there is already a registered location for the 
map output then it
* will be replaced by the new location.
@@ -157,6 +164,14 @@ private class ShuffleStatus(
   invalidateSerializedMapOutputStatusCache()
 }
 mapStatuses(mapIndex) = status
+mapIdToMapIndex(status.mapId) = mapIndex
+  }
+
+  /**
+   * Get the map output that corresponding to a given mapId.
+   */
+  def getMapStatus(mapId: Long): Option[MapStatus] = withReadLock {
+mapIdToMapIndex.get(mapId).map(mapStatuses(_))
   }
 
   /**
@@ -164,15 +179,16 @@ private class ShuffleStatus(
*/
   def updateMapOutput(mapId: Long, bmAddress: BlockManagerId): Unit = 
withWriteLock {
 try {
-  val mapStatusOpt = mapStatuses.find(x => x != null && x.mapId == mapId)
+  val mapIndex = mapIdToMapIndex.get(mapId)
+  val mapStatusOpt = mapIndex.map(mapStatuses(_)).flatMap(Option(_))
   mapStatusOpt match {
 case Some(mapStatus) =>
   logInfo(s"Updating map output for ${mapId} to ${bmAddress}")
   mapStatus.updateLocation(bmAddress)
   invalidateSerializedMapOutputStatusCache()
 case None =>
-  val index = mapStatusesDeleted.indexWhere(x => x != null && x.mapId 
== mapId)
-  if (index >= 0 && mapStatuses(index) == null) {
+  if (mapIndex.map(mapStatusesDeleted).exists(_.mapId == mapId)) {
+val index = mapIndex.get
 val mapStatus = mapStatusesDeleted(index)
 mapStatus.updateLocation(bmAddress)
 mapStatuses(index) = mapStatus
@@ -1137,9 +1153,7 @@ private[spark] class MapOutputTrackerMaster(
*/
   def getMapOutputLocation(shuffleId: Int, mapId: Long): 
Option[BlockManagerId] = {
 shuffleStatuses.get(shuffleId).flatMap { shuffleStatus =>
-  shuffleStatus.withMapStatuses { mapStatues =>
-mapStatues.filter(_ != null).find(_.mapId == mapId).map(_.location)
-  }
+  shuffleStatus.getMapStatus(mapId).map(_.location)
 }
   }
 
diff --git 
a/core/src/main/scala/org/apache/spark/

[spark] branch master updated (8fef5bb -> bcaab62)

2022-01-24 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 8fef5bb  [SPARK-37979][SQL] Switch to more generic error classes in 
AES functions
 add bcaab62  [SPARK-37891][CORE] Add scalastyle check to disable 
scala.concurrent.ExecutionContext.Implicits.global

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala | 2 ++
 .../scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala   | 2 ++
 core/src/main/scala/org/apache/spark/storage/FallbackStorage.scala  | 2 ++
 core/src/test/scala/org/apache/spark/JobCancellationSuite.scala | 2 ++
 core/src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala | 2 ++
 .../org/apache/spark/scheduler/SchedulerIntegrationSuite.scala  | 4 
 .../scala/org/apache/spark/serializer/KryoSerializerBenchmark.scala | 2 ++
 .../src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala | 2 ++
 .../org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala | 2 ++
 scalastyle-config.xml   | 6 ++
 .../org/apache/spark/sql/streaming/StreamingQueryManagerSuite.scala | 4 ++--
 .../scala/org/apache/spark/streaming/StreamingListenerSuite.scala   | 2 ++
 12 files changed, 30 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (678592a -> 5c8a141)

2021-05-28 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 678592a  [SPARK-35559][TEST] Speed up one test in 
AdaptiveQueryExecSuite
 add 5c8a141  [SPARK-35538][SQL] Migrate transformAllExpressions call sites 
to use transformAllExpressionsWithPruning

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/expressions/DynamicPruning.scala| 3 ++-
 .../spark/sql/catalyst/expressions/complexTypeCreator.scala   | 4 +++-
 .../spark/sql/catalyst/expressions/datetimeExpressions.scala  | 2 ++
 .../org/apache/spark/sql/catalyst/expressions/subquery.scala  | 4 ++--
 .../org/apache/spark/sql/catalyst/optimizer/UpdateFields.scala| 7 ---
 .../scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala | 4 +++-
 .../scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala  | 8 
 .../main/scala/org/apache/spark/sql/execution/CacheManager.scala  | 3 ++-
 .../execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala| 4 +++-
 .../spark/sql/execution/adaptive/ReuseAdaptiveSubquery.scala  | 3 ++-
 .../scala/org/apache/spark/sql/execution/exchange/Exchange.scala  | 8 +---
 .../spark/sql/execution/streaming/IncrementalExecution.scala  | 4 +++-
 .../spark/sql/execution/streaming/MicroBatchExecution.scala   | 4 +++-
 .../sql/execution/streaming/continuous/ContinuousExecution.scala  | 3 ++-
 .../src/main/scala/org/apache/spark/sql/execution/subquery.scala  | 5 +++--
 15 files changed, 47 insertions(+), 19 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (a7d0d35 -> 5e89fbe)

2020-06-15 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a7d0d35  [SPARK-31994][K8S] Docker image should use `https` urls for 
only deb.debian.org mirrors
 add 5e89fbe  [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and 
reuse completeShuffleMapStageSuccessfully

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/scheduler/DAGSchedulerSuite.scala | 197 -
 1 file changed, 77 insertions(+), 120 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (a7d0d35 -> 5e89fbe)

2020-06-15 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a7d0d35  [SPARK-31994][K8S] Docker image should use `https` urls for 
only deb.debian.org mirrors
 add 5e89fbe  [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and 
reuse completeShuffleMapStageSuccessfully

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/scheduler/DAGSchedulerSuite.scala | 197 -
 1 file changed, 77 insertions(+), 120 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (a7d0d35 -> 5e89fbe)

2020-06-15 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a7d0d35  [SPARK-31994][K8S] Docker image should use `https` urls for 
only deb.debian.org mirrors
 add 5e89fbe  [SPARK-31824][CORE][TESTS] DAGSchedulerSuite: Improve and 
reuse completeShuffleMapStageSuccessfully

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/scheduler/DAGSchedulerSuite.scala | 197 -
 1 file changed, 77 insertions(+), 120 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-31764][CORE][3.0] JsonProtocol doesn't write RDDInfo#isBarrier

2020-06-01 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new b6c8366  [SPARK-31764][CORE][3.0] JsonProtocol doesn't write 
RDDInfo#isBarrier
b6c8366 is described below

commit b6c8366cc58b7d5e35ec2a31532ae7ee22275454
Author: Kousuke Saruta 
AuthorDate: Mon Jun 1 14:33:35 2020 -0700

[SPARK-31764][CORE][3.0] JsonProtocol doesn't write RDDInfo#isBarrier

### What changes were proposed in this pull request?

This PR backports the change of #28583 (SPARK-31764) to branch-3.0, which 
changes JsonProtocol to write RDDInfos#isBarrier.

### Why are the changes needed?

JsonProtocol reads RDDInfos#isBarrier but not write it so it's a bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I added a testcase.

Closes #28660 from sarutak/SPARK-31764-branch-3.0.

Authored-by: Kousuke Saruta 
Signed-off-by: Xingbo Jiang 
---
 .../scala/org/apache/spark/util/JsonProtocol.scala |  1 +
 .../scheduler/EventLoggingListenerSuite.scala  | 44 ++
 .../org/apache/spark/util/JsonProtocolSuite.scala  | 11 ++
 3 files changed, 56 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala 
b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
index aee9862..f445fd4 100644
--- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
@@ -475,6 +475,7 @@ private[spark] object JsonProtocol {
 ("Callsite" -> rddInfo.callSite) ~
 ("Parent IDs" -> parentIds) ~
 ("Storage Level" -> storageLevel) ~
+("Barrier" -> rddInfo.isBarrier) ~
 ("Number of Partitions" -> rddInfo.numPartitions) ~
 ("Number of Cached Partitions" -> rddInfo.numCachedPartitions) ~
 ("Memory Size" -> rddInfo.memSize) ~
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
 
b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
index 2869240..046564d 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
@@ -36,6 +36,7 @@ import org.apache.spark.deploy.history.{EventLogFileReader, 
SingleEventLogFileWr
 import org.apache.spark.deploy.history.EventLogTestHelper._
 import org.apache.spark.executor.{ExecutorMetrics, TaskMetrics}
 import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.{EVENT_LOG_DIR, EVENT_LOG_ENABLED}
 import org.apache.spark.io._
 import org.apache.spark.metrics.{ExecutorMetricType, MetricsSystem}
 import org.apache.spark.scheduler.cluster.ExecutorInfo
@@ -99,6 +100,49 @@ class EventLoggingListenerSuite extends SparkFunSuite with 
LocalSparkContext wit
 testStageExecutorMetricsEventLogging()
   }
 
+  test("SPARK-31764: isBarrier should be logged in event log") {
+val conf = new SparkConf()
+conf.set(EVENT_LOG_ENABLED, true)
+conf.set(EVENT_LOG_DIR, testDirPath.toString)
+val sc = new SparkContext("local", "test-SPARK-31764", conf)
+val appId = sc.applicationId
+
+sc.parallelize(1 to 10)
+  .barrier()
+  .mapPartitions(_.map(elem => (elem, elem)))
+  .filter(elem => elem._1 % 2 == 0)
+  .reduceByKey(_ + _)
+  .collect
+sc.stop()
+
+val eventLogStream = EventLogFileReader.openEventLog(new Path(testDirPath, 
appId), fileSystem)
+val events = readLines(eventLogStream).map(line => 
JsonProtocol.sparkEventFromJson(parse(line)))
+val jobStartEvents = events
+  .filter(event => event.isInstanceOf[SparkListenerJobStart])
+  .map(_.asInstanceOf[SparkListenerJobStart])
+
+assert(jobStartEvents.size === 1)
+val stageInfos = jobStartEvents.head.stageInfos
+assert(stageInfos.size === 2)
+
+val stage0 = stageInfos(0)
+val rddInfosInStage0 = stage0.rddInfos
+assert(rddInfosInStage0.size === 3)
+val sortedRddInfosInStage0 = rddInfosInStage0.sortBy(_.scope.get.name)
+assert(sortedRddInfosInStage0(0).scope.get.name === "filter")
+assert(sortedRddInfosInStage0(0).isBarrier === true)
+assert(sortedRddInfosInStage0(1).scope.get.name === "mapPartitions")
+assert(sortedRddInfosInStage0(1).isBarrier === true)
+assert(sortedRddInfosInStage0(2).scope.get.name === "parallelize")
+assert(sortedRddInfosInStage0(2).isBarrier === false)
+
+val stage1 = stageInfos(1)
+val rddInfosInStage1 = stage1.rddInfos
+assert(rddInfosInStage1.size === 1)
+assert(

[spark] branch branch-3.0 updated: [SPARK-31764][CORE][3.0] JsonProtocol doesn't write RDDInfo#isBarrier

2020-06-01 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new b6c8366  [SPARK-31764][CORE][3.0] JsonProtocol doesn't write 
RDDInfo#isBarrier
b6c8366 is described below

commit b6c8366cc58b7d5e35ec2a31532ae7ee22275454
Author: Kousuke Saruta 
AuthorDate: Mon Jun 1 14:33:35 2020 -0700

[SPARK-31764][CORE][3.0] JsonProtocol doesn't write RDDInfo#isBarrier

### What changes were proposed in this pull request?

This PR backports the change of #28583 (SPARK-31764) to branch-3.0, which 
changes JsonProtocol to write RDDInfos#isBarrier.

### Why are the changes needed?

JsonProtocol reads RDDInfos#isBarrier but not write it so it's a bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I added a testcase.

Closes #28660 from sarutak/SPARK-31764-branch-3.0.

Authored-by: Kousuke Saruta 
Signed-off-by: Xingbo Jiang 
---
 .../scala/org/apache/spark/util/JsonProtocol.scala |  1 +
 .../scheduler/EventLoggingListenerSuite.scala  | 44 ++
 .../org/apache/spark/util/JsonProtocolSuite.scala  | 11 ++
 3 files changed, 56 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala 
b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
index aee9862..f445fd4 100644
--- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
@@ -475,6 +475,7 @@ private[spark] object JsonProtocol {
 ("Callsite" -> rddInfo.callSite) ~
 ("Parent IDs" -> parentIds) ~
 ("Storage Level" -> storageLevel) ~
+("Barrier" -> rddInfo.isBarrier) ~
 ("Number of Partitions" -> rddInfo.numPartitions) ~
 ("Number of Cached Partitions" -> rddInfo.numCachedPartitions) ~
 ("Memory Size" -> rddInfo.memSize) ~
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
 
b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
index 2869240..046564d 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
@@ -36,6 +36,7 @@ import org.apache.spark.deploy.history.{EventLogFileReader, 
SingleEventLogFileWr
 import org.apache.spark.deploy.history.EventLogTestHelper._
 import org.apache.spark.executor.{ExecutorMetrics, TaskMetrics}
 import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.{EVENT_LOG_DIR, EVENT_LOG_ENABLED}
 import org.apache.spark.io._
 import org.apache.spark.metrics.{ExecutorMetricType, MetricsSystem}
 import org.apache.spark.scheduler.cluster.ExecutorInfo
@@ -99,6 +100,49 @@ class EventLoggingListenerSuite extends SparkFunSuite with 
LocalSparkContext wit
 testStageExecutorMetricsEventLogging()
   }
 
+  test("SPARK-31764: isBarrier should be logged in event log") {
+val conf = new SparkConf()
+conf.set(EVENT_LOG_ENABLED, true)
+conf.set(EVENT_LOG_DIR, testDirPath.toString)
+val sc = new SparkContext("local", "test-SPARK-31764", conf)
+val appId = sc.applicationId
+
+sc.parallelize(1 to 10)
+  .barrier()
+  .mapPartitions(_.map(elem => (elem, elem)))
+  .filter(elem => elem._1 % 2 == 0)
+  .reduceByKey(_ + _)
+  .collect
+sc.stop()
+
+val eventLogStream = EventLogFileReader.openEventLog(new Path(testDirPath, 
appId), fileSystem)
+val events = readLines(eventLogStream).map(line => 
JsonProtocol.sparkEventFromJson(parse(line)))
+val jobStartEvents = events
+  .filter(event => event.isInstanceOf[SparkListenerJobStart])
+  .map(_.asInstanceOf[SparkListenerJobStart])
+
+assert(jobStartEvents.size === 1)
+val stageInfos = jobStartEvents.head.stageInfos
+assert(stageInfos.size === 2)
+
+val stage0 = stageInfos(0)
+val rddInfosInStage0 = stage0.rddInfos
+assert(rddInfosInStage0.size === 3)
+val sortedRddInfosInStage0 = rddInfosInStage0.sortBy(_.scope.get.name)
+assert(sortedRddInfosInStage0(0).scope.get.name === "filter")
+assert(sortedRddInfosInStage0(0).isBarrier === true)
+assert(sortedRddInfosInStage0(1).scope.get.name === "mapPartitions")
+assert(sortedRddInfosInStage0(1).isBarrier === true)
+assert(sortedRddInfosInStage0(2).scope.get.name === "parallelize")
+assert(sortedRddInfosInStage0(2).isBarrier === false)
+
+val stage1 = stageInfos(1)
+val rddInfosInStage1 = stage1.rddInfos
+assert(rddInfosInStage1.size === 1)
+assert(

[spark] branch branch-3.0 updated (805884a -> 76a0418)

2020-06-01 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 805884a  [SPARK-31885][SQL] Fix filter push down for old millis 
timestamps to Parquet
 add 76a0418  Revert "[SPARK-31885][SQL] Fix filter push down for old 
millis timestamps to Parquet"

No new revisions were added by this update.

Summary of changes:
 .../datasources/parquet/ParquetFilters.scala   | 27 +++---
 .../datasources/parquet/ParquetFilterSuite.scala   | 16 ++---
 .../datasources/parquet/ParquetTest.scala  |  4 +---
 3 files changed, 22 insertions(+), 25 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated (805884a -> 76a0418)

2020-06-01 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 805884a  [SPARK-31885][SQL] Fix filter push down for old millis 
timestamps to Parquet
 add 76a0418  Revert "[SPARK-31885][SQL] Fix filter push down for old 
millis timestamps to Parquet"

No new revisions were added by this update.

Summary of changes:
 .../datasources/parquet/ParquetFilters.scala   | 27 +++---
 .../datasources/parquet/ParquetFilterSuite.scala   | 16 ++---
 .../datasources/parquet/ParquetTest.scala  |  4 +---
 3 files changed, 22 insertions(+), 25 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: Revert "[SPARK-31885][SQL] Fix filter push down for old millis timestamps to Parquet"

2020-06-01 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 76a0418  Revert "[SPARK-31885][SQL] Fix filter push down for old 
millis timestamps to Parquet"
76a0418 is described below

commit 76a041804d9ba856c221e944b9898c8c62854474
Author: Xingbo Jiang 
AuthorDate: Mon Jun 1 10:20:17 2020 -0700

Revert "[SPARK-31885][SQL] Fix filter push down for old millis timestamps 
to Parquet"

This reverts commit 805884ad10aca3a532314284b0f2c007d4ca1045.
---
 .../datasources/parquet/ParquetFilters.scala   | 27 +++---
 .../datasources/parquet/ParquetFilterSuite.scala   | 16 ++---
 .../datasources/parquet/ParquetTest.scala  |  4 +---
 3 files changed, 22 insertions(+), 25 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
index 7900693..d89186a 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
@@ -148,13 +148,6 @@ class ParquetFilters(
 Binary.fromConstantByteArray(fixedLengthBytes, 0, numBytes)
   }
 
-  private def timestampToMillis(v: Any): JLong = {
-val timestamp = v.asInstanceOf[Timestamp]
-val micros = DateTimeUtils.fromJavaTimestamp(timestamp)
-val millis = DateTimeUtils.microsToMillis(micros)
-millis.asInstanceOf[JLong]
-  }
-
   private val makeEq:
 PartialFunction[ParquetSchemaType, (Array[String], Any) => 
FilterPredicate] = {
 case ParquetBooleanType =>
@@ -191,7 +184,7 @@ class ParquetFilters(
 case ParquetTimestampMillisType if pushDownTimestamp =>
   (n: Array[String], v: Any) => FilterApi.eq(
 longColumn(n),
-Option(v).map(timestampToMillis).orNull)
+
Option(v).map(_.asInstanceOf[Timestamp].getTime.asInstanceOf[JLong]).orNull)
 
 case ParquetSchemaType(DECIMAL, INT32, _, _) if pushDownDecimal =>
   (n: Array[String], v: Any) => FilterApi.eq(
@@ -242,7 +235,7 @@ class ParquetFilters(
 case ParquetTimestampMillisType if pushDownTimestamp =>
   (n: Array[String], v: Any) => FilterApi.notEq(
 longColumn(n),
-Option(v).map(timestampToMillis).orNull)
+
Option(v).map(_.asInstanceOf[Timestamp].getTime.asInstanceOf[JLong]).orNull)
 
 case ParquetSchemaType(DECIMAL, INT32, _, _) if pushDownDecimal =>
   (n: Array[String], v: Any) => FilterApi.notEq(
@@ -284,7 +277,9 @@ class ParquetFilters(
 longColumn(n),
 
DateTimeUtils.fromJavaTimestamp(v.asInstanceOf[Timestamp]).asInstanceOf[JLong])
 case ParquetTimestampMillisType if pushDownTimestamp =>
-  (n: Array[String], v: Any) => FilterApi.lt(longColumn(n), 
timestampToMillis(v))
+  (n: Array[String], v: Any) => FilterApi.lt(
+longColumn(n),
+v.asInstanceOf[Timestamp].getTime.asInstanceOf[JLong])
 
 case ParquetSchemaType(DECIMAL, INT32, _, _) if pushDownDecimal =>
   (n: Array[String], v: Any) =>
@@ -323,7 +318,9 @@ class ParquetFilters(
 longColumn(n),
 
DateTimeUtils.fromJavaTimestamp(v.asInstanceOf[Timestamp]).asInstanceOf[JLong])
 case ParquetTimestampMillisType if pushDownTimestamp =>
-  (n: Array[String], v: Any) => FilterApi.ltEq(longColumn(n), 
timestampToMillis(v))
+  (n: Array[String], v: Any) => FilterApi.ltEq(
+longColumn(n),
+v.asInstanceOf[Timestamp].getTime.asInstanceOf[JLong])
 
 case ParquetSchemaType(DECIMAL, INT32, _, _) if pushDownDecimal =>
   (n: Array[String], v: Any) =>
@@ -362,7 +359,9 @@ class ParquetFilters(
 longColumn(n),
 
DateTimeUtils.fromJavaTimestamp(v.asInstanceOf[Timestamp]).asInstanceOf[JLong])
 case ParquetTimestampMillisType if pushDownTimestamp =>
-  (n: Array[String], v: Any) => FilterApi.gt(longColumn(n), 
timestampToMillis(v))
+  (n: Array[String], v: Any) => FilterApi.gt(
+longColumn(n),
+v.asInstanceOf[Timestamp].getTime.asInstanceOf[JLong])
 
 case ParquetSchemaType(DECIMAL, INT32, _, _) if pushDownDecimal =>
   (n: Array[String], v: Any) =>
@@ -401,7 +400,9 @@ class ParquetFilters(
 longColumn(n),
 
DateTimeUtils.fromJavaTimestamp(v.asInstanceOf[Timestamp]).asInstanceOf[JLong])
 case ParquetTimestampMillisType if pushDownTimestamp =>
-  (n: Array[String], v: Any) => FilterApi.gtEq(longColumn(n), 
timestampToMillis(v))
+  (n: Array[String], v: Any) => FilterApi.gtEq(
+longColumn(n),
+v.asInstanceOf[Timestamp].getTime.asInstanceOf[JLong])

[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST][3.0] Fix flaky tests in BarrierTaskContextSuite

2020-05-28 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 3ff4db9  [SPARK-31730][CORE][TEST][3.0] Fix flaky tests in 
BarrierTaskContextSuite
3ff4db9 is described below

commit 3ff4db97fb966e35c0b7450a7210cebf5d331be6
Author: Xingbo Jiang 
AuthorDate: Thu May 28 16:29:40 2020 -0700

[SPARK-31730][CORE][TEST][3.0] Fix flaky tests in BarrierTaskContextSuite

### What changes were proposed in this pull request?

To wait until all the executors have started before submitting any job. 
This could avoid the flakiness caused by waiting for executors coming up.

### How was this patch tested?

Existing tests.

Closes #28658 from jiangxb1987/barrierTest.

Authored-by: Xingbo Jiang 
Signed-off-by: Xingbo Jiang 
---
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 6191e41..764b4b7 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -37,10 +37,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   .setAppName("test-cluster")
   .set(TEST_NO_STAGE_RETRY, true)
 sc = new SparkContext(conf)
+TestUtils.waitUntilExecutorsUp(sc, numWorker, 6)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("global sync by barrier() call") {
+  test("global sync by barrier() call") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -57,10 +57,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("share messages with allGather() call") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -78,10 +75,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -100,10 +94,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("successively sync with allGather and barrier") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -129,8 +120,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 assert(times2.max - times2.min <= 1000)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("support multiple barrier() call within a single task") {
+  test("support multiple barrier() call within a single task") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST][3.0] Fix flaky tests in BarrierTaskContextSuite

2020-05-28 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 3ff4db9  [SPARK-31730][CORE][TEST][3.0] Fix flaky tests in 
BarrierTaskContextSuite
3ff4db9 is described below

commit 3ff4db97fb966e35c0b7450a7210cebf5d331be6
Author: Xingbo Jiang 
AuthorDate: Thu May 28 16:29:40 2020 -0700

[SPARK-31730][CORE][TEST][3.0] Fix flaky tests in BarrierTaskContextSuite

### What changes were proposed in this pull request?

To wait until all the executors have started before submitting any job. 
This could avoid the flakiness caused by waiting for executors coming up.

### How was this patch tested?

Existing tests.

Closes #28658 from jiangxb1987/barrierTest.

Authored-by: Xingbo Jiang 
Signed-off-by: Xingbo Jiang 
---
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 6191e41..764b4b7 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -37,10 +37,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   .setAppName("test-cluster")
   .set(TEST_NO_STAGE_RETRY, true)
 sc = new SparkContext(conf)
+TestUtils.waitUntilExecutorsUp(sc, numWorker, 6)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("global sync by barrier() call") {
+  test("global sync by barrier() call") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -57,10 +57,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("share messages with allGather() call") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -78,10 +75,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -100,10 +94,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("successively sync with allGather and barrier") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -129,8 +120,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 assert(times2.max - times2.min <= 1000)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("support multiple barrier() call within a single task") {
+  test("support multiple barrier() call within a single task") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: Revert "[SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite"

2020-05-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new e7b88e8  Revert "[SPARK-31730][CORE][TEST] Fix flaky tests in 
BarrierTaskContextSuite"
e7b88e8 is described below

commit e7b88e82ec5cee0a738f96127b106358cc97cb4f
Author: Xingbo Jiang 
AuthorDate: Wed May 27 17:21:10 2020 -0700

Revert "[SPARK-31730][CORE][TEST] Fix flaky tests in 
BarrierTaskContextSuite"

This reverts commit cb817bb1cf6b074e075c02880001ec96f2f39de7.
---
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 54899bf..6191e41 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -25,7 +25,6 @@ import org.scalatest.concurrent.Eventually
 import org.scalatest.time.SpanSugar._
 
 import org.apache.spark._
-import org.apache.spark.internal.config
 import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY
 
 class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext 
with Eventually {
@@ -38,10 +37,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   .setAppName("test-cluster")
   .set(TEST_NO_STAGE_RETRY, true)
 sc = new SparkContext(conf)
-TestUtils.waitUntilExecutorsUp(sc, numWorker, 6)
   }
 
-  test("global sync by barrier() call") {
+  // TODO (SPARK-31730): re-enable it
+  ignore("global sync by barrier() call") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -58,7 +57,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("share messages with allGather() call") {
-initLocalClusterSparkContext()
+val conf = new SparkConf()
+  .setMaster("local-cluster[4, 1, 1024]")
+  .setAppName("test-cluster")
+sc = new SparkContext(conf)
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -76,7 +78,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {
-initLocalClusterSparkContext()
+val conf = new SparkConf()
+  .setMaster("local-cluster[4, 1, 1024]")
+  .setAppName("test-cluster")
+sc = new SparkContext(conf)
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -95,7 +100,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("successively sync with allGather and barrier") {
-initLocalClusterSparkContext()
+val conf = new SparkConf()
+  .setMaster("local-cluster[4, 1, 1024]")
+  .setAppName("test-cluster")
+sc = new SparkContext(conf)
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -121,7 +129,8 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 assert(times2.max - times2.min <= 1000)
   }
 
-  test("support multiple barrier() call within a single task") {
+  // TODO (SPARK-31730): re-enable it
+  ignore("support multiple barrier() call within a single task") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -276,9 +285,6 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 
   test("SPARK-31485: barrier stage should fail if only partial tasks are 
launched") {
 initLocalClusterSparkContext(2)
-// It's required to reset the delay timer when a task is scheduled, 
otherwise all the tasks
-// could get scheduled at ANY level.
-sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true)
 val rdd0 = sc.parallelize(Seq(0, 1, 2, 3), 2)
 val dep = new OneToOneDependency[Int](rdd0)
 // set up a barrier stage with 2 tasks and both tasks prefer executor 0 
(only 1 core) for


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

2020-05-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new cb817bb  [SPARK-31730][CORE][TEST] Fix flaky tests in 
BarrierTaskContextSuite
cb817bb is described below

commit cb817bb1cf6b074e075c02880001ec96f2f39de7
Author: Xingbo Jiang 
AuthorDate: Wed May 27 16:37:02 2020 -0700

[SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

### What changes were proposed in this pull request?

To wait until all the executors have started before submitting any job. 
This could avoid the flakiness caused by waiting for executors coming up.

### How was this patch tested?

Existing tests.

Closes #28584 from jiangxb1987/barrierTest.

Authored-by: Xingbo Jiang 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit efe7fd2b6bea4a945ed7f3f486ab279c505378b4)
Signed-off-by: Xingbo Jiang 
---
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 6191e41..54899bf 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -25,6 +25,7 @@ import org.scalatest.concurrent.Eventually
 import org.scalatest.time.SpanSugar._
 
 import org.apache.spark._
+import org.apache.spark.internal.config
 import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY
 
 class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext 
with Eventually {
@@ -37,10 +38,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   .setAppName("test-cluster")
   .set(TEST_NO_STAGE_RETRY, true)
 sc = new SparkContext(conf)
+TestUtils.waitUntilExecutorsUp(sc, numWorker, 6)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("global sync by barrier() call") {
+  test("global sync by barrier() call") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -57,10 +58,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("share messages with allGather() call") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("successively sync with allGather and barrier") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 assert(times2.max - times2.min <= 1000)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("support multiple barrier() call within a single task") {
+  test("support multiple barrier() call within a single task") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 
   test("SPARK-31485: barrier stage should fail if only partial tasks are 
launched") {
 initLocalClusterSparkContext(2)
+// It's required to reset the delay timer when a task is scheduled, 
otherwise all the tasks
+// could get scheduled at ANY level.
+sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true)
 val rdd0 = sc.par

[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

2020-05-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new cb817bb  [SPARK-31730][CORE][TEST] Fix flaky tests in 
BarrierTaskContextSuite
cb817bb is described below

commit cb817bb1cf6b074e075c02880001ec96f2f39de7
Author: Xingbo Jiang 
AuthorDate: Wed May 27 16:37:02 2020 -0700

[SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

### What changes were proposed in this pull request?

To wait until all the executors have started before submitting any job. 
This could avoid the flakiness caused by waiting for executors coming up.

### How was this patch tested?

Existing tests.

Closes #28584 from jiangxb1987/barrierTest.

Authored-by: Xingbo Jiang 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit efe7fd2b6bea4a945ed7f3f486ab279c505378b4)
Signed-off-by: Xingbo Jiang 
---
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 6191e41..54899bf 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -25,6 +25,7 @@ import org.scalatest.concurrent.Eventually
 import org.scalatest.time.SpanSugar._
 
 import org.apache.spark._
+import org.apache.spark.internal.config
 import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY
 
 class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext 
with Eventually {
@@ -37,10 +38,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   .setAppName("test-cluster")
   .set(TEST_NO_STAGE_RETRY, true)
 sc = new SparkContext(conf)
+TestUtils.waitUntilExecutorsUp(sc, numWorker, 6)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("global sync by barrier() call") {
+  test("global sync by barrier() call") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -57,10 +58,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("share messages with allGather() call") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("successively sync with allGather and barrier") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 assert(times2.max - times2.min <= 1000)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("support multiple barrier() call within a single task") {
+  test("support multiple barrier() call within a single task") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 
   test("SPARK-31485: barrier stage should fail if only partial tasks are 
launched") {
 initLocalClusterSparkContext(2)
+// It's required to reset the delay timer when a task is scheduled, 
otherwise all the tasks
+// could get scheduled at ANY level.
+sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true)
 val rdd0 = sc.par

[spark] branch master updated (d19b173 -> efe7fd2)

2020-05-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d19b173  [SPARK-31764][CORE] JsonProtocol doesn't write 
RDDInfo#isBarrier
 add efe7fd2  [SPARK-31730][CORE][TEST] Fix flaky tests in 
BarrierTaskContextSuite

No new revisions were added by this update.

Summary of changes:
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 10 insertions(+), 16 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

2020-05-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new cb817bb  [SPARK-31730][CORE][TEST] Fix flaky tests in 
BarrierTaskContextSuite
cb817bb is described below

commit cb817bb1cf6b074e075c02880001ec96f2f39de7
Author: Xingbo Jiang 
AuthorDate: Wed May 27 16:37:02 2020 -0700

[SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

### What changes were proposed in this pull request?

To wait until all the executors have started before submitting any job. 
This could avoid the flakiness caused by waiting for executors coming up.

### How was this patch tested?

Existing tests.

Closes #28584 from jiangxb1987/barrierTest.

Authored-by: Xingbo Jiang 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit efe7fd2b6bea4a945ed7f3f486ab279c505378b4)
Signed-off-by: Xingbo Jiang 
---
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 6191e41..54899bf 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -25,6 +25,7 @@ import org.scalatest.concurrent.Eventually
 import org.scalatest.time.SpanSugar._
 
 import org.apache.spark._
+import org.apache.spark.internal.config
 import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY
 
 class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext 
with Eventually {
@@ -37,10 +38,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   .setAppName("test-cluster")
   .set(TEST_NO_STAGE_RETRY, true)
 sc = new SparkContext(conf)
+TestUtils.waitUntilExecutorsUp(sc, numWorker, 6)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("global sync by barrier() call") {
+  test("global sync by barrier() call") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -57,10 +58,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("share messages with allGather() call") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("successively sync with allGather and barrier") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 assert(times2.max - times2.min <= 1000)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("support multiple barrier() call within a single task") {
+  test("support multiple barrier() call within a single task") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 
   test("SPARK-31485: barrier stage should fail if only partial tasks are 
launched") {
 initLocalClusterSparkContext(2)
+// It's required to reset the delay timer when a task is scheduled, 
otherwise all the tasks
+// could get scheduled at ANY level.
+sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true)
 val rdd0 = sc.par

[spark] branch master updated (d19b173 -> efe7fd2)

2020-05-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d19b173  [SPARK-31764][CORE] JsonProtocol doesn't write 
RDDInfo#isBarrier
 add efe7fd2  [SPARK-31730][CORE][TEST] Fix flaky tests in 
BarrierTaskContextSuite

No new revisions were added by this update.

Summary of changes:
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 10 insertions(+), 16 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

2020-05-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new cb817bb  [SPARK-31730][CORE][TEST] Fix flaky tests in 
BarrierTaskContextSuite
cb817bb is described below

commit cb817bb1cf6b074e075c02880001ec96f2f39de7
Author: Xingbo Jiang 
AuthorDate: Wed May 27 16:37:02 2020 -0700

[SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

### What changes were proposed in this pull request?

To wait until all the executors have started before submitting any job. 
This could avoid the flakiness caused by waiting for executors coming up.

### How was this patch tested?

Existing tests.

Closes #28584 from jiangxb1987/barrierTest.

Authored-by: Xingbo Jiang 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit efe7fd2b6bea4a945ed7f3f486ab279c505378b4)
Signed-off-by: Xingbo Jiang 
---
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 6191e41..54899bf 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -25,6 +25,7 @@ import org.scalatest.concurrent.Eventually
 import org.scalatest.time.SpanSugar._
 
 import org.apache.spark._
+import org.apache.spark.internal.config
 import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY
 
 class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext 
with Eventually {
@@ -37,10 +38,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   .setAppName("test-cluster")
   .set(TEST_NO_STAGE_RETRY, true)
 sc = new SparkContext(conf)
+TestUtils.waitUntilExecutorsUp(sc, numWorker, 6)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("global sync by barrier() call") {
+  test("global sync by barrier() call") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -57,10 +58,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("share messages with allGather() call") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("successively sync with allGather and barrier") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 assert(times2.max - times2.min <= 1000)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("support multiple barrier() call within a single task") {
+  test("support multiple barrier() call within a single task") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 
   test("SPARK-31485: barrier stage should fail if only partial tasks are 
launched") {
 initLocalClusterSparkContext(2)
+// It's required to reset the delay timer when a task is scheduled, 
otherwise all the tasks
+// could get scheduled at ANY level.
+sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true)
 val rdd0 = sc.par

[spark] branch master updated (d19b173 -> efe7fd2)

2020-05-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d19b173  [SPARK-31764][CORE] JsonProtocol doesn't write 
RDDInfo#isBarrier
 add efe7fd2  [SPARK-31730][CORE][TEST] Fix flaky tests in 
BarrierTaskContextSuite

No new revisions were added by this update.

Summary of changes:
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 10 insertions(+), 16 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

2020-05-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new cb817bb  [SPARK-31730][CORE][TEST] Fix flaky tests in 
BarrierTaskContextSuite
cb817bb is described below

commit cb817bb1cf6b074e075c02880001ec96f2f39de7
Author: Xingbo Jiang 
AuthorDate: Wed May 27 16:37:02 2020 -0700

[SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

### What changes were proposed in this pull request?

To wait until all the executors have started before submitting any job. 
This could avoid the flakiness caused by waiting for executors coming up.

### How was this patch tested?

Existing tests.

Closes #28584 from jiangxb1987/barrierTest.

Authored-by: Xingbo Jiang 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit efe7fd2b6bea4a945ed7f3f486ab279c505378b4)
Signed-off-by: Xingbo Jiang 
---
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 6191e41..54899bf 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -25,6 +25,7 @@ import org.scalatest.concurrent.Eventually
 import org.scalatest.time.SpanSugar._
 
 import org.apache.spark._
+import org.apache.spark.internal.config
 import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY
 
 class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext 
with Eventually {
@@ -37,10 +38,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   .setAppName("test-cluster")
   .set(TEST_NO_STAGE_RETRY, true)
 sc = new SparkContext(conf)
+TestUtils.waitUntilExecutorsUp(sc, numWorker, 6)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("global sync by barrier() call") {
+  test("global sync by barrier() call") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -57,10 +58,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("share messages with allGather() call") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("successively sync with allGather and barrier") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 assert(times2.max - times2.min <= 1000)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("support multiple barrier() call within a single task") {
+  test("support multiple barrier() call within a single task") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 
   test("SPARK-31485: barrier stage should fail if only partial tasks are 
launched") {
 initLocalClusterSparkContext(2)
+// It's required to reset the delay timer when a task is scheduled, 
otherwise all the tasks
+// could get scheduled at ANY level.
+sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true)
 val rdd0 = sc.par

[spark] branch master updated (d19b173 -> efe7fd2)

2020-05-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d19b173  [SPARK-31764][CORE] JsonProtocol doesn't write 
RDDInfo#isBarrier
 add efe7fd2  [SPARK-31730][CORE][TEST] Fix flaky tests in 
BarrierTaskContextSuite

No new revisions were added by this update.

Summary of changes:
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 10 insertions(+), 16 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

2020-05-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new efe7fd2  [SPARK-31730][CORE][TEST] Fix flaky tests in 
BarrierTaskContextSuite
efe7fd2 is described below

commit efe7fd2b6bea4a945ed7f3f486ab279c505378b4
Author: Xingbo Jiang 
AuthorDate: Wed May 27 16:37:02 2020 -0700

[SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

### What changes were proposed in this pull request?

To wait until all the executors have started before submitting any job. 
This could avoid the flakiness caused by waiting for executors coming up.

### How was this patch tested?

Existing tests.

Closes #28584 from jiangxb1987/barrierTest.

Authored-by: Xingbo Jiang 
Signed-off-by: Xingbo Jiang 
---
 .../spark/scheduler/BarrierTaskContextSuite.scala  | 26 +-
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 6191e41..54899bf 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -25,6 +25,7 @@ import org.scalatest.concurrent.Eventually
 import org.scalatest.time.SpanSugar._
 
 import org.apache.spark._
+import org.apache.spark.internal.config
 import org.apache.spark.internal.config.Tests.TEST_NO_STAGE_RETRY
 
 class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext 
with Eventually {
@@ -37,10 +38,10 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   .setAppName("test-cluster")
   .set(TEST_NO_STAGE_RETRY, true)
 sc = new SparkContext(conf)
+TestUtils.waitUntilExecutorsUp(sc, numWorker, 6)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("global sync by barrier() call") {
+  test("global sync by barrier() call") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -57,10 +58,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("share messages with allGather() call") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -78,10 +76,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -100,10 +95,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   }
 
   test("successively sync with allGather and barrier") {
-val conf = new SparkConf()
-  .setMaster("local-cluster[4, 1, 1024]")
-  .setAppName("test-cluster")
-sc = new SparkContext(conf)
+initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
   val context = BarrierTaskContext.get()
@@ -129,8 +121,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 assert(times2.max - times2.min <= 1000)
   }
 
-  // TODO (SPARK-31730): re-enable it
-  ignore("support multiple barrier() call within a single task") {
+  test("support multiple barrier() call within a single task") {
 initLocalClusterSparkContext()
 val rdd = sc.makeRDD(1 to 10, 4)
 val rdd2 = rdd.barrier().mapPartitions { it =>
@@ -285,6 +276,9 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
 
   test("SPARK-31485: barrier stage should fail if only partial tasks are 
launched") {
 initLocalClusterSparkContext(2)
+// It's required to reset the delay timer when a task is scheduled, 
otherwise all the tasks
+// could get scheduled at ANY level.
+sc.conf.set(config.LEGACY_LOCALITY_WAIT_RESET, true)
 val rdd0 = sc.parallelize(Seq(0, 1, 2, 3), 2)
 val dep = new OneToOneDependency[Int](rdd0)
 // set up a barrier stage with

[spark] branch master updated (1528fbc -> d19b173)

2020-05-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1528fbc  [SPARK-31827][SQL] fail datetime parsing/formatting if detect 
the Java 8 bug of stand-alone form
 add d19b173  [SPARK-31764][CORE] JsonProtocol doesn't write 
RDDInfo#isBarrier

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/util/JsonProtocol.scala |  1 +
 .../scheduler/EventLoggingListenerSuite.scala  | 44 ++
 .../org/apache/spark/util/JsonProtocolSuite.scala  | 11 ++
 3 files changed, 56 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (1528fbc -> d19b173)

2020-05-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1528fbc  [SPARK-31827][SQL] fail datetime parsing/formatting if detect 
the Java 8 bug of stand-alone form
 add d19b173  [SPARK-31764][CORE] JsonProtocol doesn't write 
RDDInfo#isBarrier

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/util/JsonProtocol.scala |  1 +
 .../scheduler/EventLoggingListenerSuite.scala  | 44 ++
 .../org/apache/spark/util/JsonProtocolSuite.scala  | 11 ++
 3 files changed, 56 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier

2020-05-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d19b173  [SPARK-31764][CORE] JsonProtocol doesn't write 
RDDInfo#isBarrier
d19b173 is described below

commit d19b173b47af04fe6f03e2b21b60eb317aeaae4f
Author: Kousuke Saruta 
AuthorDate: Wed May 27 14:36:12 2020 -0700

[SPARK-31764][CORE] JsonProtocol doesn't write RDDInfo#isBarrier

### What changes were proposed in this pull request?

This PR changes JsonProtocol to write RDDInfos#isBarrier.

### Why are the changes needed?

JsonProtocol reads RDDInfos#isBarrier but not write it so it's a bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I added a testcase.

Closes #28583 from sarutak/SPARK-31764.

Authored-by: Kousuke Saruta 
Signed-off-by: Xingbo Jiang 
---
 .../scala/org/apache/spark/util/JsonProtocol.scala |  1 +
 .../scheduler/EventLoggingListenerSuite.scala  | 44 ++
 .../org/apache/spark/util/JsonProtocolSuite.scala  | 11 ++
 3 files changed, 56 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala 
b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
index 26bbff5..844d9b7 100644
--- a/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
+++ b/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala
@@ -487,6 +487,7 @@ private[spark] object JsonProtocol {
 ("Callsite" -> rddInfo.callSite) ~
 ("Parent IDs" -> parentIds) ~
 ("Storage Level" -> storageLevel) ~
+("Barrier" -> rddInfo.isBarrier) ~
 ("Number of Partitions" -> rddInfo.numPartitions) ~
 ("Number of Cached Partitions" -> rddInfo.numCachedPartitions) ~
 ("Memory Size" -> rddInfo.memSize) ~
diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
 
b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
index 61ea21f..7c23e44 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
@@ -36,6 +36,7 @@ import org.apache.spark.deploy.history.{EventLogFileReader, 
SingleEventLogFileWr
 import org.apache.spark.deploy.history.EventLogTestHelper._
 import org.apache.spark.executor.{ExecutorMetrics, TaskMetrics}
 import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.{EVENT_LOG_DIR, EVENT_LOG_ENABLED}
 import org.apache.spark.io._
 import org.apache.spark.metrics.{ExecutorMetricType, MetricsSystem}
 import org.apache.spark.resource.ResourceProfile
@@ -100,6 +101,49 @@ class EventLoggingListenerSuite extends SparkFunSuite with 
LocalSparkContext wit
 testStageExecutorMetricsEventLogging()
   }
 
+  test("SPARK-31764: isBarrier should be logged in event log") {
+val conf = new SparkConf()
+conf.set(EVENT_LOG_ENABLED, true)
+conf.set(EVENT_LOG_DIR, testDirPath.toString)
+val sc = new SparkContext("local", "test-SPARK-31764", conf)
+val appId = sc.applicationId
+
+sc.parallelize(1 to 10)
+  .barrier()
+  .mapPartitions(_.map(elem => (elem, elem)))
+  .filter(elem => elem._1 % 2 == 0)
+  .reduceByKey(_ + _)
+  .collect
+sc.stop()
+
+val eventLogStream = EventLogFileReader.openEventLog(new Path(testDirPath, 
appId), fileSystem)
+val events = readLines(eventLogStream).map(line => 
JsonProtocol.sparkEventFromJson(parse(line)))
+val jobStartEvents = events
+  .filter(event => event.isInstanceOf[SparkListenerJobStart])
+  .map(_.asInstanceOf[SparkListenerJobStart])
+
+assert(jobStartEvents.size === 1)
+val stageInfos = jobStartEvents.head.stageInfos
+assert(stageInfos.size === 2)
+
+val stage0 = stageInfos(0)
+val rddInfosInStage0 = stage0.rddInfos
+assert(rddInfosInStage0.size === 3)
+val sortedRddInfosInStage0 = rddInfosInStage0.sortBy(_.scope.get.name)
+assert(sortedRddInfosInStage0(0).scope.get.name === "filter")
+assert(sortedRddInfosInStage0(0).isBarrier === true)
+assert(sortedRddInfosInStage0(1).scope.get.name === "mapPartitions")
+assert(sortedRddInfosInStage0(1).isBarrier === true)
+assert(sortedRddInfosInStage0(2).scope.get.name === "parallelize")
+assert(sortedRddInfosInStage0(2).isBarrier === false)
+
+val stage1 = stageInfos(1)
+val rddInfosInStage1 = stage1.rddInfos
+assert(rddInfosInStage1.size === 1)
+assert(rddInfosInStage1(0).scope.get.name === "reduceByKey")
+assert(rddInfosInStage1(0).isBarrier === false) // reduc

[spark] branch master updated (83d0967 -> 245aee9)

2020-05-21 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 83d0967  [SPARK-31784][CORE][TEST] Fix test 
BarrierTaskContextSuite."share messages with allGather() call"
 add 245aee9  [SPARK-31757][CORE] Improve 
HistoryServerDiskManager.updateAccessTime()

No new revisions were added by this update.

Summary of changes:
 .../spark/deploy/history/HistoryServerDiskManager.scala   | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (83d0967 -> 245aee9)

2020-05-21 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 83d0967  [SPARK-31784][CORE][TEST] Fix test 
BarrierTaskContextSuite."share messages with allGather() call"
 add 245aee9  [SPARK-31757][CORE] Improve 
HistoryServerDiskManager.updateAccessTime()

No new revisions were added by this update.

Summary of changes:
 .../spark/deploy/history/HistoryServerDiskManager.scala   | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

2020-05-21 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ec80e4b  [SPARK-31784][CORE][TEST] Fix test 
BarrierTaskContextSuite."share messages with allGather() call"
ec80e4b is described below

commit ec80e4b5f80876378765544e0c4c6af1a704
Author: yi.wu 
AuthorDate: Thu May 21 23:34:11 2020 -0700

[SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages 
with allGather() call"

### What changes were proposed in this pull request?

Change from `messages.toList.iterator` to 
`Iterator.single(messages.toList)`.

### Why are the changes needed?

In this test, the expected result of `rdd2.collect().head` should actually 
be `List("0", "1", "2", "3")` but is `"0"` now.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated test.

Thanks WeichenXu123 reported this problem.

Closes #28596 from Ngone51/fix_allgather_test.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit 83d0967dcc6b205a3fd2003e051f49733f63cb30)
Signed-off-by: Xingbo Jiang 
---
 .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala   | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index b5614b2..6191e41 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -69,12 +69,12 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   // Pass partitionId message in
   val message: String = context.partitionId().toString
   val messages: Array[String] = context.allGather(message)
-  messages.toList.iterator
+  Iterator.single(messages.toList)
 }
-// Take a sorted list of all the partitionId messages
-val messages = rdd2.collect().head
-// All the task partitionIds are shared
-for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString)
+val messages = rdd2.collect()
+// All the task partitionIds are shared across all tasks
+assert(messages.length === 4)
+assert(messages.forall(_ == List("0", "1", "2", "3")))
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (83d0967 -> 245aee9)

2020-05-21 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 83d0967  [SPARK-31784][CORE][TEST] Fix test 
BarrierTaskContextSuite."share messages with allGather() call"
 add 245aee9  [SPARK-31757][CORE] Improve 
HistoryServerDiskManager.updateAccessTime()

No new revisions were added by this update.

Summary of changes:
 .../spark/deploy/history/HistoryServerDiskManager.scala   | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

2020-05-21 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ec80e4b  [SPARK-31784][CORE][TEST] Fix test 
BarrierTaskContextSuite."share messages with allGather() call"
ec80e4b is described below

commit ec80e4b5f80876378765544e0c4c6af1a704
Author: yi.wu 
AuthorDate: Thu May 21 23:34:11 2020 -0700

[SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages 
with allGather() call"

### What changes were proposed in this pull request?

Change from `messages.toList.iterator` to 
`Iterator.single(messages.toList)`.

### Why are the changes needed?

In this test, the expected result of `rdd2.collect().head` should actually 
be `List("0", "1", "2", "3")` but is `"0"` now.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated test.

Thanks WeichenXu123 reported this problem.

Closes #28596 from Ngone51/fix_allgather_test.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit 83d0967dcc6b205a3fd2003e051f49733f63cb30)
Signed-off-by: Xingbo Jiang 
---
 .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala   | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index b5614b2..6191e41 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -69,12 +69,12 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   // Pass partitionId message in
   val message: String = context.partitionId().toString
   val messages: Array[String] = context.allGather(message)
-  messages.toList.iterator
+  Iterator.single(messages.toList)
 }
-// Take a sorted list of all the partitionId messages
-val messages = rdd2.collect().head
-// All the task partitionIds are shared
-for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString)
+val messages = rdd2.collect()
+// All the task partitionIds are shared across all tasks
+assert(messages.length === 4)
+assert(messages.forall(_ == List("0", "1", "2", "3")))
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (60118a2 -> 83d0967)

2020-05-21 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 60118a2  [SPARK-31785][SQL][TESTS] Add a helper function to test all 
parquet readers
 add 83d0967  [SPARK-31784][CORE][TEST] Fix test 
BarrierTaskContextSuite."share messages with allGather() call"

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala   | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

2020-05-21 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new ec80e4b  [SPARK-31784][CORE][TEST] Fix test 
BarrierTaskContextSuite."share messages with allGather() call"
ec80e4b is described below

commit ec80e4b5f80876378765544e0c4c6af1a704
Author: yi.wu 
AuthorDate: Thu May 21 23:34:11 2020 -0700

[SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages 
with allGather() call"

### What changes were proposed in this pull request?

Change from `messages.toList.iterator` to 
`Iterator.single(messages.toList)`.

### Why are the changes needed?

In this test, the expected result of `rdd2.collect().head` should actually 
be `List("0", "1", "2", "3")` but is `"0"` now.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated test.

Thanks WeichenXu123 reported this problem.

Closes #28596 from Ngone51/fix_allgather_test.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit 83d0967dcc6b205a3fd2003e051f49733f63cb30)
Signed-off-by: Xingbo Jiang 
---
 .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala   | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index b5614b2..6191e41 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -69,12 +69,12 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   // Pass partitionId message in
   val message: String = context.partitionId().toString
   val messages: Array[String] = context.allGather(message)
-  messages.toList.iterator
+  Iterator.single(messages.toList)
 }
-// Take a sorted list of all the partitionId messages
-val messages = rdd2.collect().head
-// All the task partitionIds are shared
-for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString)
+val messages = rdd2.collect()
+// All the task partitionIds are shared across all tasks
+assert(messages.length === 4)
+assert(messages.forall(_ == List("0", "1", "2", "3")))
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (60118a2 -> 83d0967)

2020-05-21 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 60118a2  [SPARK-31785][SQL][TESTS] Add a helper function to test all 
parquet readers
 add 83d0967  [SPARK-31784][CORE][TEST] Fix test 
BarrierTaskContextSuite."share messages with allGather() call"

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala   | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages with allGather() call"

2020-05-21 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 83d0967  [SPARK-31784][CORE][TEST] Fix test 
BarrierTaskContextSuite."share messages with allGather() call"
83d0967 is described below

commit 83d0967dcc6b205a3fd2003e051f49733f63cb30
Author: yi.wu 
AuthorDate: Thu May 21 23:34:11 2020 -0700

[SPARK-31784][CORE][TEST] Fix test BarrierTaskContextSuite."share messages 
with allGather() call"

### What changes were proposed in this pull request?

Change from `messages.toList.iterator` to 
`Iterator.single(messages.toList)`.

### Why are the changes needed?

In this test, the expected result of `rdd2.collect().head` should actually 
be `List("0", "1", "2", "3")` but is `"0"` now.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated test.

Thanks WeichenXu123 reported this problem.

Closes #28596 from Ngone51/fix_allgather_test.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
---
 .../org/apache/spark/scheduler/BarrierTaskContextSuite.scala   | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index b5614b2..6191e41 100644
--- 
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -69,12 +69,12 @@ class BarrierTaskContextSuite extends SparkFunSuite with 
LocalSparkContext with
   // Pass partitionId message in
   val message: String = context.partitionId().toString
   val messages: Array[String] = context.allGather(message)
-  messages.toList.iterator
+  Iterator.single(messages.toList)
 }
-// Take a sorted list of all the partitionId messages
-val messages = rdd2.collect().head
-// All the task partitionIds are shared
-for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString)
+val messages = rdd2.collect()
+// All the task partitionIds are shared across all tasks
+assert(messages.length === 4)
+assert(messages.forall(_ == List("0", "1", "2", "3")))
   }
 
   test("throw exception if we attempt to synchronize with different blocking 
calls") {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-31651][CORE] Improve handling the case where different barrier sync types in a single sync

2020-05-18 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 9f34cf5  [SPARK-31651][CORE] Improve handling the case where different 
barrier sync types in a single sync
9f34cf5 is described below

commit 9f34cf56e551c8bb678221cb8671432d4cb02ac0
Author: yi.wu 
AuthorDate: Mon May 18 23:54:41 2020 -0700

[SPARK-31651][CORE] Improve handling the case where different barrier sync 
types in a single sync

### What changes were proposed in this pull request?

This PR improves handling the case where different barrier sync types in a 
single sync:

- use `clear` instead of `cleanupBarrierStage `

- make sure all requesters are failed because of "different barrier sync 
types"

### Why are the changes needed?

Currently, we use `cleanupBarrierStage` to clean up a barrier stage when we 
detecting the case of "different barrier sync types". But this leads to a 
problem that we could create new a `ContextBarrierState` for the same stage 
again if there're on-way requests from tasks. As a result, those task will fail 
because of killing instead of "different barrier sync types".

Besides, we don't handle the current request which is being handling 
properly as it will fail due to epoch mismatch instead of "different barrier 
sync types".

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated a existed test.

Closes #28462 from Ngone51/impr_barrier_req.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit 653ca19b1f26155c2b7656e2ebbe227b18383308)
Signed-off-by: Xingbo Jiang 
---
 .../org/apache/spark/BarrierCoordinator.scala  | 30 +++---
 .../spark/scheduler/BarrierTaskContextSuite.scala  |  5 +---
 2 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala 
b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
index 5663055..04faf7f 100644
--- a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
+++ b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
@@ -21,7 +21,7 @@ import java.util.{Timer, TimerTask}
 import java.util.concurrent.ConcurrentHashMap
 import java.util.function.Consumer
 
-import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.{ArrayBuffer, HashSet}
 
 import org.apache.spark.internal.Logging
 import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
@@ -106,9 +106,11 @@ private[spark] class BarrierCoordinator(
 // The messages will be replied to all tasks once sync finished.
 private val messages = Array.ofDim[String](numTasks)
 
-// The request method which is called inside this barrier sync. All tasks 
should make sure
-// that they're calling the same method within the same barrier sync phase.
-private var requestMethod: RequestMethod.Value = _
+// Request methods collected from tasks inside this barrier sync. All 
tasks should make sure
+// that they're calling the same method within the same barrier sync 
phase. In other words,
+// the size of requestMethods should always be 1 for a legitimate barrier 
sync. Otherwise,
+// the barrier sync would fail if the size of requestMethods becomes 
greater than 1.
+private val requestMethods = new HashSet[RequestMethod.Value]
 
 // A timer task that ensures we may timeout for a barrier() call.
 private var timerTask: TimerTask = null
@@ -141,17 +143,14 @@ private[spark] class BarrierCoordinator(
   val taskId = request.taskAttemptId
   val epoch = request.barrierEpoch
   val curReqMethod = request.requestMethod
-
-  if (requesters.isEmpty) {
-requestMethod = curReqMethod
-  } else if (requestMethod != curReqMethod) {
-requesters.foreach(
-  _.sendFailure(new SparkException(s"$barrierId tried to use 
requestMethod " +
-s"`$curReqMethod` during barrier epoch $barrierEpoch, which does 
not match " +
-s"the current synchronized requestMethod `$requestMethod`"
-  ))
-)
-cleanupBarrierStage(barrierId)
+  requestMethods.add(curReqMethod)
+  if (requestMethods.size > 1) {
+val error = new SparkException(s"Different barrier sync types found 
for the " +
+  s"sync $barrierId: ${requestMethods.mkString(", ")}. Please use the 
" +
+  s"same barrier sync type within a single sync.")
+(requesters :+ requester).foreach(_.sendFailure(error))
+clear()
+return
   }
 
   // Require the number of t

[spark] branch master updated: [SPARK-31651][CORE] Improve handling the case where different barrier sync types in a single sync

2020-05-18 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 653ca19  [SPARK-31651][CORE] Improve handling the case where different 
barrier sync types in a single sync
653ca19 is described below

commit 653ca19b1f26155c2b7656e2ebbe227b18383308
Author: yi.wu 
AuthorDate: Mon May 18 23:54:41 2020 -0700

[SPARK-31651][CORE] Improve handling the case where different barrier sync 
types in a single sync

### What changes were proposed in this pull request?

This PR improves handling the case where different barrier sync types in a 
single sync:

- use `clear` instead of `cleanupBarrierStage `

- make sure all requesters are failed because of "different barrier sync 
types"

### Why are the changes needed?

Currently, we use `cleanupBarrierStage` to clean up a barrier stage when we 
detecting the case of "different barrier sync types". But this leads to a 
problem that we could create new a `ContextBarrierState` for the same stage 
again if there're on-way requests from tasks. As a result, those task will fail 
because of killing instead of "different barrier sync types".

Besides, we don't handle the current request which is being handling 
properly as it will fail due to epoch mismatch instead of "different barrier 
sync types".

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated a existed test.

Closes #28462 from Ngone51/impr_barrier_req.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
---
 .../org/apache/spark/BarrierCoordinator.scala  | 30 +++---
 .../spark/scheduler/BarrierTaskContextSuite.scala  |  5 +---
 2 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala 
b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
index 5663055..04faf7f 100644
--- a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
+++ b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
@@ -21,7 +21,7 @@ import java.util.{Timer, TimerTask}
 import java.util.concurrent.ConcurrentHashMap
 import java.util.function.Consumer
 
-import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.{ArrayBuffer, HashSet}
 
 import org.apache.spark.internal.Logging
 import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
@@ -106,9 +106,11 @@ private[spark] class BarrierCoordinator(
 // The messages will be replied to all tasks once sync finished.
 private val messages = Array.ofDim[String](numTasks)
 
-// The request method which is called inside this barrier sync. All tasks 
should make sure
-// that they're calling the same method within the same barrier sync phase.
-private var requestMethod: RequestMethod.Value = _
+// Request methods collected from tasks inside this barrier sync. All 
tasks should make sure
+// that they're calling the same method within the same barrier sync 
phase. In other words,
+// the size of requestMethods should always be 1 for a legitimate barrier 
sync. Otherwise,
+// the barrier sync would fail if the size of requestMethods becomes 
greater than 1.
+private val requestMethods = new HashSet[RequestMethod.Value]
 
 // A timer task that ensures we may timeout for a barrier() call.
 private var timerTask: TimerTask = null
@@ -141,17 +143,14 @@ private[spark] class BarrierCoordinator(
   val taskId = request.taskAttemptId
   val epoch = request.barrierEpoch
   val curReqMethod = request.requestMethod
-
-  if (requesters.isEmpty) {
-requestMethod = curReqMethod
-  } else if (requestMethod != curReqMethod) {
-requesters.foreach(
-  _.sendFailure(new SparkException(s"$barrierId tried to use 
requestMethod " +
-s"`$curReqMethod` during barrier epoch $barrierEpoch, which does 
not match " +
-s"the current synchronized requestMethod `$requestMethod`"
-  ))
-)
-cleanupBarrierStage(barrierId)
+  requestMethods.add(curReqMethod)
+  if (requestMethods.size > 1) {
+val error = new SparkException(s"Different barrier sync types found 
for the " +
+  s"sync $barrierId: ${requestMethods.mkString(", ")}. Please use the 
" +
+  s"same barrier sync type within a single sync.")
+(requesters :+ requester).foreach(_.sendFailure(error))
+clear()
+return
   }
 
   // Require the number of tasks is correctly set from the 
BarrierTaskContext.
@@ -184,6 +183,7 @@ private[spark] class BarrierCoordinator(

[spark] branch master updated: [SPARK-31253][SQL][FOLLOW-UP] simplify the code of calculating size metrics of AQE shuffle

2020-04-17 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new db7b865  [SPARK-31253][SQL][FOLLOW-UP] simplify the code of 
calculating size metrics of AQE shuffle
db7b865 is described below

commit db7b8651a19d5a749a9f0b4e8fb517e6994921c2
Author: Wenchen Fan 
AuthorDate: Fri Apr 17 13:20:34 2020 -0700

[SPARK-31253][SQL][FOLLOW-UP] simplify the code of calculating size metrics 
of AQE shuffle

### What changes were proposed in this pull request?

A followup of https://github.com/apache/spark/pull/28175:
1. use mutable collection to store the driver metrics
2. don't send size metrics if there is no map stats, as UI will display 
size as 0 if there is no data
3. calculate partition data size separately, to make the code easier to 
read.

### Why are the changes needed?

code simplification

### Does this PR introduce any user-facing change?

no

### How was this patch tested?

existing tests

Closes #28240 from cloud-fan/refactor.

Authored-by: Wenchen Fan 
Signed-off-by: Xingbo Jiang 
---
 .../adaptive/CustomShuffleReaderExec.scala | 50 ++
 1 file changed, 22 insertions(+), 28 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
index 68f20bc..6450d49 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala
@@ -97,14 +97,27 @@ case class CustomShuffleReaderExec private(
 case _ => None
   }
 
+  @transient private lazy val partitionDataSizes: Option[Seq[Long]] = {
+if (!isLocalReader && shuffleStage.get.mapStats.isDefined) {
+  val bytesByPartitionId = shuffleStage.get.mapStats.get.bytesByPartitionId
+  Some(partitionSpecs.map {
+case CoalescedPartitionSpec(startReducerIndex, endReducerIndex) =>
+  startReducerIndex.until(endReducerIndex).map(bytesByPartitionId).sum
+case p: PartialReducerPartitionSpec => p.dataSize
+case p => throw new IllegalStateException("unexpected " + p)
+  })
+} else {
+  None
+}
+  }
+
   private def sendDriverMetrics(): Unit = {
 val executionId = 
sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)
-var driverAccumUpdates: Seq[(Long, Long)] = Seq.empty
+val driverAccumUpdates = ArrayBuffer.empty[(Long, Long)]
 
 val numPartitionsMetric = metrics("numPartitions")
 numPartitionsMetric.set(partitionSpecs.length)
-driverAccumUpdates = driverAccumUpdates :+
-  (numPartitionsMetric.id, partitionSpecs.length.toLong)
+driverAccumUpdates += (numPartitionsMetric.id -> 
partitionSpecs.length.toLong)
 
 if (hasSkewedPartition) {
   val skewedMetric = metrics("numSkewedPartitions")
@@ -112,33 +125,14 @@ case class CustomShuffleReaderExec private(
 case p: PartialReducerPartitionSpec => p.reducerIndex
   }.distinct.length
   skewedMetric.set(numSkewedPartitions)
-  driverAccumUpdates = driverAccumUpdates :+ (skewedMetric.id, 
numSkewedPartitions.toLong)
+  driverAccumUpdates += (skewedMetric.id -> numSkewedPartitions.toLong)
 }
 
-if(!isLocalReader) {
-  val partitionMetrics = metrics("partitionDataSize")
-  val mapStats = shuffleStage.get.mapStats
-
-  if (mapStats.isEmpty) {
-partitionMetrics.set(0)
-driverAccumUpdates = driverAccumUpdates :+ (partitionMetrics.id, 0L)
-  } else {
-var sum = 0L
-partitionSpecs.foreach {
-  case CoalescedPartitionSpec(startReducerIndex, endReducerIndex) =>
-val dataSize = startReducerIndex.until(endReducerIndex).map(
-  mapStats.get.bytesByPartitionId(_)).sum
-driverAccumUpdates = driverAccumUpdates :+ (partitionMetrics.id, 
dataSize)
-sum += dataSize
-  case p: PartialReducerPartitionSpec =>
-driverAccumUpdates = driverAccumUpdates :+ (partitionMetrics.id, 
p.dataSize)
-sum += p.dataSize
-  case p => throw new IllegalStateException("unexpected " + p)
-}
-
-// Set sum value to "partitionDataSize" metric.
-partitionMetrics.set(sum)
-  }
+partitionDataSizes.foreach { dataSizes =>
+  val partitionDataSizeMetrics = metrics("partitionDataSize")
+  driverAccumUpdates ++= dataSizes.map(partitionDataSizeMetrics.id -> _)
+  // Set sum value to "partitionDataSize" metric.
+

[spark] branch master updated (fab4ca5 -> b2e9e17)

2020-04-16 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fab4ca5  [SPARK-31450][SQL] Make ExpressionEncoder thread-safe
 add b2e9e17  [SPARK-31344][CORE] Polish implementation of barrier() and 
allGather()

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/BarrierCoordinator.scala  | 114 +
 .../org/apache/spark/BarrierTaskContext.scala  |  59 ++-
 .../org/apache/spark/api/python/PythonRunner.scala |  17 +--
 .../spark/scheduler/BarrierTaskContextSuite.scala  |   1 -
 python/pyspark/taskcontext.py  |  15 ++-
 5 files changed, 46 insertions(+), 160 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (fab4ca5 -> b2e9e17)

2020-04-16 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fab4ca5  [SPARK-31450][SQL] Make ExpressionEncoder thread-safe
 add b2e9e17  [SPARK-31344][CORE] Polish implementation of barrier() and 
allGather()

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/BarrierCoordinator.scala  | 114 +
 .../org/apache/spark/BarrierTaskContext.scala  |  59 ++-
 .../org/apache/spark/api/python/PythonRunner.scala |  17 +--
 .../spark/scheduler/BarrierTaskContextSuite.scala  |   1 -
 python/pyspark/taskcontext.py  |  15 ++-
 5 files changed, 46 insertions(+), 160 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone

2020-04-15 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new d286db1  [SPARK-31018][CORE][DOCS] Deprecate support of multiple 
workers on the same host in Standalone
d286db1 is described below

commit d286db145433d6d7610c69980512369f389930ca
Author: yi.wu 
AuthorDate: Wed Apr 15 11:29:55 2020 -0700

[SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same 
host in Standalone

### What changes were proposed in this pull request?

Update the document and shell script to warn user about the deprecation of 
multiple workers on the same host support.

### Why are the changes needed?

This is a sub-task of 
[SPARK-30978](https://issues.apache.org/jira/browse/SPARK-30978), which plans 
to totally remove support of multiple workers in Spark 3.1. This PR makes the 
first step to deprecate it firstly in Spark 3.0.

### Does this PR introduce any user-facing change?

Yeah, user see warning when they run start worker script.

### How was this patch tested?

Tested manually.

Closes #27768 from Ngone51/deprecate_spark_worker_instances.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit 0d4e4df06105cf2985dde17c1af76093b3ae8c13)
Signed-off-by: Xingbo Jiang 
---
 docs/core-migration-guide.md  | 2 ++
 docs/hardware-provisioning.md | 8 
 sbin/start-slave.sh   | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 66a489b..cde6e07 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -38,3 +38,5 @@ license: |
 - Event log file will be written as UTF-8 encoding, and Spark History Server 
will replay event log files as UTF-8 encoding. Previously Spark wrote the event 
log file as default charset of driver JVM process, so Spark History Server of 
Spark 2.x is needed to read the old event log files in case of incompatible 
encoding.
 
 - A new protocol for fetching shuffle blocks is used. It's recommended that 
external shuffle services be upgraded when running Spark 3.0 apps. You can 
still use old external shuffle services by setting the configuration 
`spark.shuffle.useOldFetchProtocol` to `true`. Otherwise, Spark may run into 
errors with messages like `IllegalArgumentException: Unexpected message type: 
`.
+
+- `SPARK_WORKER_INSTANCES` is deprecated in Standalone mode. It's recommended 
to launch multiple executors in one worker and launch one worker per node 
instead of launching multiple workers per node and launching one executor per 
worker.
diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md
index 4e5d681..fc87995f 100644
--- a/docs/hardware-provisioning.md
+++ b/docs/hardware-provisioning.md
@@ -63,10 +63,10 @@ Note that memory usage is greatly affected by storage level 
and serialization fo
 the [tuning guide](tuning.html) for tips on how to reduce it.
 
 Finally, note that the Java VM does not always behave well with more than 200 
GiB of RAM. If you
-purchase machines with more RAM than this, you can run _multiple worker JVMs 
per node_. In
-Spark's [standalone mode](spark-standalone.html), you can set the number of 
workers per node
-with the `SPARK_WORKER_INSTANCES` variable in `conf/spark-env.sh`, and the 
number of cores
-per worker with `SPARK_WORKER_CORES`.
+purchase machines with more RAM than this, you can launch multiple executors 
in a single node. In
+Spark's [standalone mode](spark-standalone.html), a worker is responsible for 
launching multiple
+executors according to its available memory and cores, and each executor will 
be launched in a
+separate Java VM.
 
 # Network
 
diff --git a/sbin/start-slave.sh b/sbin/start-slave.sh
index 2cb17a0..9b3b26b 100755
--- a/sbin/start-slave.sh
+++ b/sbin/start-slave.sh
@@ -22,7 +22,7 @@
 # Environment Variables
 #
 #   SPARK_WORKER_INSTANCES  The number of worker instances to run on this
-#   slave.  Default is 1.
+#   slave.  Default is 1. Note it has been deprecate 
since Spark 3.0.
 #   SPARK_WORKER_PORT   The base port number for the first worker. If set,
 #   subsequent workers will increment this number.  If
 #   unset, Spark will find a valid port number, but


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone

2020-04-15 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new d286db1  [SPARK-31018][CORE][DOCS] Deprecate support of multiple 
workers on the same host in Standalone
d286db1 is described below

commit d286db145433d6d7610c69980512369f389930ca
Author: yi.wu 
AuthorDate: Wed Apr 15 11:29:55 2020 -0700

[SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same 
host in Standalone

### What changes were proposed in this pull request?

Update the document and shell script to warn user about the deprecation of 
multiple workers on the same host support.

### Why are the changes needed?

This is a sub-task of 
[SPARK-30978](https://issues.apache.org/jira/browse/SPARK-30978), which plans 
to totally remove support of multiple workers in Spark 3.1. This PR makes the 
first step to deprecate it firstly in Spark 3.0.

### Does this PR introduce any user-facing change?

Yeah, user see warning when they run start worker script.

### How was this patch tested?

Tested manually.

Closes #27768 from Ngone51/deprecate_spark_worker_instances.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit 0d4e4df06105cf2985dde17c1af76093b3ae8c13)
Signed-off-by: Xingbo Jiang 
---
 docs/core-migration-guide.md  | 2 ++
 docs/hardware-provisioning.md | 8 
 sbin/start-slave.sh   | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 66a489b..cde6e07 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -38,3 +38,5 @@ license: |
 - Event log file will be written as UTF-8 encoding, and Spark History Server 
will replay event log files as UTF-8 encoding. Previously Spark wrote the event 
log file as default charset of driver JVM process, so Spark History Server of 
Spark 2.x is needed to read the old event log files in case of incompatible 
encoding.
 
 - A new protocol for fetching shuffle blocks is used. It's recommended that 
external shuffle services be upgraded when running Spark 3.0 apps. You can 
still use old external shuffle services by setting the configuration 
`spark.shuffle.useOldFetchProtocol` to `true`. Otherwise, Spark may run into 
errors with messages like `IllegalArgumentException: Unexpected message type: 
`.
+
+- `SPARK_WORKER_INSTANCES` is deprecated in Standalone mode. It's recommended 
to launch multiple executors in one worker and launch one worker per node 
instead of launching multiple workers per node and launching one executor per 
worker.
diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md
index 4e5d681..fc87995f 100644
--- a/docs/hardware-provisioning.md
+++ b/docs/hardware-provisioning.md
@@ -63,10 +63,10 @@ Note that memory usage is greatly affected by storage level 
and serialization fo
 the [tuning guide](tuning.html) for tips on how to reduce it.
 
 Finally, note that the Java VM does not always behave well with more than 200 
GiB of RAM. If you
-purchase machines with more RAM than this, you can run _multiple worker JVMs 
per node_. In
-Spark's [standalone mode](spark-standalone.html), you can set the number of 
workers per node
-with the `SPARK_WORKER_INSTANCES` variable in `conf/spark-env.sh`, and the 
number of cores
-per worker with `SPARK_WORKER_CORES`.
+purchase machines with more RAM than this, you can launch multiple executors 
in a single node. In
+Spark's [standalone mode](spark-standalone.html), a worker is responsible for 
launching multiple
+executors according to its available memory and cores, and each executor will 
be launched in a
+separate Java VM.
 
 # Network
 
diff --git a/sbin/start-slave.sh b/sbin/start-slave.sh
index 2cb17a0..9b3b26b 100755
--- a/sbin/start-slave.sh
+++ b/sbin/start-slave.sh
@@ -22,7 +22,7 @@
 # Environment Variables
 #
 #   SPARK_WORKER_INSTANCES  The number of worker instances to run on this
-#   slave.  Default is 1.
+#   slave.  Default is 1. Note it has been deprecate 
since Spark 3.0.
 #   SPARK_WORKER_PORT   The base port number for the first worker. If set,
 #   subsequent workers will increment this number.  If
 #   unset, Spark will find a valid port number, but


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (2b10d70 -> 0d4e4df)

2020-04-15 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2b10d70  [SPARK-31423][SQL] Fix rebasing of not-existed dates
 add 0d4e4df  [SPARK-31018][CORE][DOCS] Deprecate support of multiple 
workers on the same host in Standalone

No new revisions were added by this update.

Summary of changes:
 docs/core-migration-guide.md  | 2 ++
 docs/hardware-provisioning.md | 8 
 sbin/start-slave.sh   | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same host in Standalone

2020-04-15 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0d4e4df  [SPARK-31018][CORE][DOCS] Deprecate support of multiple 
workers on the same host in Standalone
0d4e4df is described below

commit 0d4e4df06105cf2985dde17c1af76093b3ae8c13
Author: yi.wu 
AuthorDate: Wed Apr 15 11:29:55 2020 -0700

[SPARK-31018][CORE][DOCS] Deprecate support of multiple workers on the same 
host in Standalone

### What changes were proposed in this pull request?

Update the document and shell script to warn user about the deprecation of 
multiple workers on the same host support.

### Why are the changes needed?

This is a sub-task of 
[SPARK-30978](https://issues.apache.org/jira/browse/SPARK-30978), which plans 
to totally remove support of multiple workers in Spark 3.1. This PR makes the 
first step to deprecate it firstly in Spark 3.0.

### Does this PR introduce any user-facing change?

Yeah, user see warning when they run start worker script.

### How was this patch tested?

Tested manually.

Closes #27768 from Ngone51/deprecate_spark_worker_instances.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
---
 docs/core-migration-guide.md  | 2 ++
 docs/hardware-provisioning.md | 8 
 sbin/start-slave.sh   | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/core-migration-guide.md b/docs/core-migration-guide.md
index 66a489b..cde6e07 100644
--- a/docs/core-migration-guide.md
+++ b/docs/core-migration-guide.md
@@ -38,3 +38,5 @@ license: |
 - Event log file will be written as UTF-8 encoding, and Spark History Server 
will replay event log files as UTF-8 encoding. Previously Spark wrote the event 
log file as default charset of driver JVM process, so Spark History Server of 
Spark 2.x is needed to read the old event log files in case of incompatible 
encoding.
 
 - A new protocol for fetching shuffle blocks is used. It's recommended that 
external shuffle services be upgraded when running Spark 3.0 apps. You can 
still use old external shuffle services by setting the configuration 
`spark.shuffle.useOldFetchProtocol` to `true`. Otherwise, Spark may run into 
errors with messages like `IllegalArgumentException: Unexpected message type: 
`.
+
+- `SPARK_WORKER_INSTANCES` is deprecated in Standalone mode. It's recommended 
to launch multiple executors in one worker and launch one worker per node 
instead of launching multiple workers per node and launching one executor per 
worker.
diff --git a/docs/hardware-provisioning.md b/docs/hardware-provisioning.md
index 4e5d681..fc87995f 100644
--- a/docs/hardware-provisioning.md
+++ b/docs/hardware-provisioning.md
@@ -63,10 +63,10 @@ Note that memory usage is greatly affected by storage level 
and serialization fo
 the [tuning guide](tuning.html) for tips on how to reduce it.
 
 Finally, note that the Java VM does not always behave well with more than 200 
GiB of RAM. If you
-purchase machines with more RAM than this, you can run _multiple worker JVMs 
per node_. In
-Spark's [standalone mode](spark-standalone.html), you can set the number of 
workers per node
-with the `SPARK_WORKER_INSTANCES` variable in `conf/spark-env.sh`, and the 
number of cores
-per worker with `SPARK_WORKER_CORES`.
+purchase machines with more RAM than this, you can launch multiple executors 
in a single node. In
+Spark's [standalone mode](spark-standalone.html), a worker is responsible for 
launching multiple
+executors according to its available memory and cores, and each executor will 
be launched in a
+separate Java VM.
 
 # Network
 
diff --git a/sbin/start-slave.sh b/sbin/start-slave.sh
index 2cb17a0..9b3b26b 100755
--- a/sbin/start-slave.sh
+++ b/sbin/start-slave.sh
@@ -22,7 +22,7 @@
 # Environment Variables
 #
 #   SPARK_WORKER_INSTANCES  The number of worker instances to run on this
-#   slave.  Default is 1.
+#   slave.  Default is 1. Note it has been deprecate 
since Spark 3.0.
 #   SPARK_WORKER_PORT   The base port number for the first worker. If set,
 #   subsequent workers will increment this number.  If
 #   unset, Spark will find a valid port number, but


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-31052][TEST][CORE] Fix flaky test "DAGSchedulerSuite.shuffle fetch failed on speculative task, but original task succeed"

2020-03-05 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 80a8947  [SPARK-31052][TEST][CORE] Fix flaky test 
"DAGSchedulerSuite.shuffle fetch failed on speculative task, but original task 
succeed"
80a8947 is described below

commit 80a894785121777aedfe340d0acacbbd17630cb3
Author: yi.wu 
AuthorDate: Thu Mar 5 10:56:49 2020 -0800

[SPARK-31052][TEST][CORE] Fix flaky test "DAGSchedulerSuite.shuffle fetch 
failed on speculative task, but original task succeed"

### What changes were proposed in this pull request?

This PR fix the flaky test in #27050.

### Why are the changes needed?

`SparkListenerStageCompleted` is posted by `listenerBus` asynchronously. 
So, we should make sure listener has consumed the event before asserting 
completed stages.

See [error 
message](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119308/testReport/org.apache.spark.scheduler/DAGSchedulerSuite/shuffle_fetch_failed_on_speculative_task__but_original_task_succeed__SPARK_30388_/):

```
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 
List(0, 1, 1) did not equal List(0, 1, 1, 0)
at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
at 
org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
at 
org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$88(DAGSchedulerSuite.scala:1976)
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Update test and test locally by no failure after running hundreds of times. 
Note, the failure is easy to reproduce when loop running the test for hundreds 
of times(e.g 200)

Closes #27809 from Ngone51/fix_flaky_spark_30388.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit 8d5ef2f766166cce3cc7a15a98ec016050ede4d8)
Signed-off-by: Xingbo Jiang 
---
 .../test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala| 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
index 72a2e4c..02bc216 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
@@ -1931,7 +1931,7 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 assertDataStructuresEmpty()
   }
 
-  test("shuffle fetch failed on speculative task, but original task succeed 
(SPARK-30388)") {
+  test("SPARK-30388: shuffle fetch failed on speculative task, but original 
task succeed") {
 var completedStage: List[Int] = Nil
 val listener = new SparkListener() {
   override def onStageCompleted(event: SparkListenerStageCompleted): Unit 
= {
@@ -1945,6 +1945,7 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 val reduceRdd = new MyRDD(sc, 2, List(shuffleDep))
 submit(reduceRdd, Array(0, 1))
 completeShuffleMapStageSuccessfully(0, 0, 2)
+sc.listenerBus.waitUntilEmpty()
 assert(completedStage === List(0))
 
 // result task 0.0 succeed
@@ -1960,6 +1961,7 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 info
   )
 )
+sc.listenerBus.waitUntilEmpty()
 assert(completedStage === List(0, 1))
 
 Thread.sleep(DAGScheduler.RESUBMIT_TIMEOUT * 2)
@@ -1971,6 +1973,7 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 
 // original result task 1.0 succeed
 runEvent(makeCompletionEvent(taskSets(1).tasks(1), Success, 42))
+sc.listenerBus.waitUntilEmpty()
 assert(completedStage === List(0, 1, 1, 0))
 assert(scheduler.activeJobs.isEmpty)
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-31052][TEST][CORE] Fix flaky test "DAGSchedulerSuite.shuffle fetch failed on speculative task, but original task succeed"

2020-03-05 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 8d5ef2f  [SPARK-31052][TEST][CORE] Fix flaky test 
"DAGSchedulerSuite.shuffle fetch failed on speculative task, but original task 
succeed"
8d5ef2f is described below

commit 8d5ef2f766166cce3cc7a15a98ec016050ede4d8
Author: yi.wu 
AuthorDate: Thu Mar 5 10:56:49 2020 -0800

[SPARK-31052][TEST][CORE] Fix flaky test "DAGSchedulerSuite.shuffle fetch 
failed on speculative task, but original task succeed"

### What changes were proposed in this pull request?

This PR fix the flaky test in #27050.

### Why are the changes needed?

`SparkListenerStageCompleted` is posted by `listenerBus` asynchronously. 
So, we should make sure listener has consumed the event before asserting 
completed stages.

See [error 
message](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119308/testReport/org.apache.spark.scheduler/DAGSchedulerSuite/shuffle_fetch_failed_on_speculative_task__but_original_task_succeed__SPARK_30388_/):

```
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 
List(0, 1, 1) did not equal List(0, 1, 1, 0)
at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
at 
org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
at 
org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$88(DAGSchedulerSuite.scala:1976)
```

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Update test and test locally by no failure after running hundreds of times. 
Note, the failure is easy to reproduce when loop running the test for hundreds 
of times(e.g 200)

Closes #27809 from Ngone51/fix_flaky_spark_30388.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
---
 .../test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala| 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git 
a/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala 
b/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
index 2b2fd32..4486389 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
@@ -1933,7 +1933,7 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 assertDataStructuresEmpty()
   }
 
-  test("shuffle fetch failed on speculative task, but original task succeed 
(SPARK-30388)") {
+  test("SPARK-30388: shuffle fetch failed on speculative task, but original 
task succeed") {
 var completedStage: List[Int] = Nil
 val listener = new SparkListener() {
   override def onStageCompleted(event: SparkListenerStageCompleted): Unit 
= {
@@ -1947,6 +1947,7 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 val reduceRdd = new MyRDD(sc, 2, List(shuffleDep))
 submit(reduceRdd, Array(0, 1))
 completeShuffleMapStageSuccessfully(0, 0, 2)
+sc.listenerBus.waitUntilEmpty()
 assert(completedStage === List(0))
 
 // result task 0.0 succeed
@@ -1962,6 +1963,7 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 info
   )
 )
+sc.listenerBus.waitUntilEmpty()
 assert(completedStage === List(0, 1))
 
 Thread.sleep(DAGScheduler.RESUBMIT_TIMEOUT * 2)
@@ -1973,6 +1975,7 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 
 // original result task 1.0 succeed
 runEvent(makeCompletionEvent(taskSets(1).tasks(1), Success, 42))
+sc.listenerBus.waitUntilEmpty()
 assert(completedStage === List(0, 1, 1, 0))
 assert(scheduler.activeJobs.isEmpty)
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-30969][CORE] Remove resource coordination support from Standalone

2020-03-02 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 148262f  [SPARK-30969][CORE] Remove resource coordination support from 
Standalone
148262f is described below

commit 148262f3fbdf7c5da7cd147cf43bf5ebab5f5244
Author: yi.wu 
AuthorDate: Mon Mar 2 11:23:07 2020 -0800

[SPARK-30969][CORE] Remove resource coordination support from Standalone

### What changes were proposed in this pull request?

Remove automatically resource coordination support from Standalone.

### Why are the changes needed?

Resource coordination is mainly designed for the scenario where multiple 
workers launched on the same host. However, it's, actually, a non-existed  
scenario for today's Spark. Because, Spark now can start multiple executors in 
a single Worker, while it only allow one executor per Worker at very beginning. 
So, now, it really help nothing for user to launch multiple workers on the same 
host. Thus, it's not worth for us to bring over complicated implementation and 
potential high maintain [...]

### Does this PR introduce any user-facing change?

No, it's Spark 3.0 feature.

### How was this patch tested?

Pass Jenkins.

Closes #27722 from Ngone51/abandon_coordination.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit b517f991fe0c95a186872d38be6a2091d9326195)
Signed-off-by: Xingbo Jiang 
---
 .gitignore |   1 -
 .../main/scala/org/apache/spark/SparkContext.scala |  25 +-
 .../spark/deploy/StandaloneResourceUtils.scala | 263 +
 .../org/apache/spark/deploy/worker/Worker.scala|  27 +--
 .../org/apache/spark/internal/config/package.scala |  17 --
 .../main/scala/org/apache/spark/util/Utils.scala   |  22 --
 .../scala/org/apache/spark/SparkContextSuite.scala |   6 +-
 .../apache/spark/deploy/worker/WorkerSuite.scala   |  74 +-
 .../resource/ResourceDiscoveryPluginSuite.scala|   4 -
 docs/configuration.md  |  27 +--
 docs/spark-standalone.md   |  10 +-
 11 files changed, 17 insertions(+), 459 deletions(-)

diff --git a/.gitignore b/.gitignore
index 798e8ac..198fdee 100644
--- a/.gitignore
+++ b/.gitignore
@@ -72,7 +72,6 @@ scalastyle-on-compile.generated.xml
 scalastyle-output.xml
 scalastyle.txt
 spark-*-bin-*.tgz
-spark-resources/
 spark-tests.log
 src_managed/
 streaming-tests.log
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 91188d5..bcbb7e4 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -41,7 +41,6 @@ import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat 
=> NewFileInputFor
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.deploy.{LocalSparkCluster, SparkHadoopUtil}
-import org.apache.spark.deploy.StandaloneResourceUtils._
 import org.apache.spark.executor.{ExecutorMetrics, ExecutorMetricsSource}
 import org.apache.spark.input.{FixedLengthBinaryInputFormat, 
PortableDataStream, StreamInputFormat, WholeTextFileInputFormat}
 import org.apache.spark.internal.Logging
@@ -250,15 +249,6 @@ class SparkContext(config: SparkConf) extends Logging {
 
   def isLocal: Boolean = Utils.isLocalMaster(_conf)
 
-  private def isClientStandalone: Boolean = {
-val isSparkCluster = master match {
-  case SparkMasterRegex.SPARK_REGEX(_) => true
-  case SparkMasterRegex.LOCAL_CLUSTER_REGEX(_, _, _) => true
-  case _ => false
-}
-deployMode == "client" && isSparkCluster
-  }
-
   /**
* @return true if context is stopped or in the midst of stopping.
*/
@@ -396,17 +386,7 @@ class SparkContext(config: SparkConf) extends Logging {
 _driverLogger = DriverLogger(_conf)
 
 val resourcesFileOpt = conf.get(DRIVER_RESOURCES_FILE)
-val allResources = getOrDiscoverAllResources(_conf, SPARK_DRIVER_PREFIX, 
resourcesFileOpt)
-_resources = {
-  // driver submitted in client mode under Standalone may have conflicting 
resources with
-  // other drivers/workers on this host. We should sync driver's resources 
info into
-  // SPARK_RESOURCES/SPARK_RESOURCES_COORDINATE_DIR/ to avoid collision.
-  if (isClientStandalone) {
-acquireResources(_conf, SPARK_DRIVER_PREFIX, allResources, 
Utils.getProcessId)
-  } else {
-allResources
-  }
-}
+_resources = getOrDiscoverAllResources(_conf, SPARK_DRIVER_PREFIX, 
resourcesFileOpt)
 logResourceInfo(SPARK_DRIVER_PREFIX, _resources)
 
 // log out spark.app.na

[spark] branch branch-3.0 updated: [SPARK-30969][CORE] Remove resource coordination support from Standalone

2020-03-02 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 148262f  [SPARK-30969][CORE] Remove resource coordination support from 
Standalone
148262f is described below

commit 148262f3fbdf7c5da7cd147cf43bf5ebab5f5244
Author: yi.wu 
AuthorDate: Mon Mar 2 11:23:07 2020 -0800

[SPARK-30969][CORE] Remove resource coordination support from Standalone

### What changes were proposed in this pull request?

Remove automatically resource coordination support from Standalone.

### Why are the changes needed?

Resource coordination is mainly designed for the scenario where multiple 
workers launched on the same host. However, it's, actually, a non-existed  
scenario for today's Spark. Because, Spark now can start multiple executors in 
a single Worker, while it only allow one executor per Worker at very beginning. 
So, now, it really help nothing for user to launch multiple workers on the same 
host. Thus, it's not worth for us to bring over complicated implementation and 
potential high maintain [...]

### Does this PR introduce any user-facing change?

No, it's Spark 3.0 feature.

### How was this patch tested?

Pass Jenkins.

Closes #27722 from Ngone51/abandon_coordination.

Authored-by: yi.wu 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit b517f991fe0c95a186872d38be6a2091d9326195)
Signed-off-by: Xingbo Jiang 
---
 .gitignore |   1 -
 .../main/scala/org/apache/spark/SparkContext.scala |  25 +-
 .../spark/deploy/StandaloneResourceUtils.scala | 263 +
 .../org/apache/spark/deploy/worker/Worker.scala|  27 +--
 .../org/apache/spark/internal/config/package.scala |  17 --
 .../main/scala/org/apache/spark/util/Utils.scala   |  22 --
 .../scala/org/apache/spark/SparkContextSuite.scala |   6 +-
 .../apache/spark/deploy/worker/WorkerSuite.scala   |  74 +-
 .../resource/ResourceDiscoveryPluginSuite.scala|   4 -
 docs/configuration.md  |  27 +--
 docs/spark-standalone.md   |  10 +-
 11 files changed, 17 insertions(+), 459 deletions(-)

diff --git a/.gitignore b/.gitignore
index 798e8ac..198fdee 100644
--- a/.gitignore
+++ b/.gitignore
@@ -72,7 +72,6 @@ scalastyle-on-compile.generated.xml
 scalastyle-output.xml
 scalastyle.txt
 spark-*-bin-*.tgz
-spark-resources/
 spark-tests.log
 src_managed/
 streaming-tests.log
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala 
b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 91188d5..bcbb7e4 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -41,7 +41,6 @@ import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat 
=> NewFileInputFor
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.deploy.{LocalSparkCluster, SparkHadoopUtil}
-import org.apache.spark.deploy.StandaloneResourceUtils._
 import org.apache.spark.executor.{ExecutorMetrics, ExecutorMetricsSource}
 import org.apache.spark.input.{FixedLengthBinaryInputFormat, 
PortableDataStream, StreamInputFormat, WholeTextFileInputFormat}
 import org.apache.spark.internal.Logging
@@ -250,15 +249,6 @@ class SparkContext(config: SparkConf) extends Logging {
 
   def isLocal: Boolean = Utils.isLocalMaster(_conf)
 
-  private def isClientStandalone: Boolean = {
-val isSparkCluster = master match {
-  case SparkMasterRegex.SPARK_REGEX(_) => true
-  case SparkMasterRegex.LOCAL_CLUSTER_REGEX(_, _, _) => true
-  case _ => false
-}
-deployMode == "client" && isSparkCluster
-  }
-
   /**
* @return true if context is stopped or in the midst of stopping.
*/
@@ -396,17 +386,7 @@ class SparkContext(config: SparkConf) extends Logging {
 _driverLogger = DriverLogger(_conf)
 
 val resourcesFileOpt = conf.get(DRIVER_RESOURCES_FILE)
-val allResources = getOrDiscoverAllResources(_conf, SPARK_DRIVER_PREFIX, 
resourcesFileOpt)
-_resources = {
-  // driver submitted in client mode under Standalone may have conflicting 
resources with
-  // other drivers/workers on this host. We should sync driver's resources 
info into
-  // SPARK_RESOURCES/SPARK_RESOURCES_COORDINATE_DIR/ to avoid collision.
-  if (isClientStandalone) {
-acquireResources(_conf, SPARK_DRIVER_PREFIX, allResources, 
Utils.getProcessId)
-  } else {
-allResources
-  }
-}
+_resources = getOrDiscoverAllResources(_conf, SPARK_DRIVER_PREFIX, 
resourcesFileOpt)
 logResourceInfo(SPARK_DRIVER_PREFIX, _resources)
 
 // log out spark.app.na

[spark] branch master updated (ac12276 -> b517f99)

2020-03-02 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ac12276  [SPARK-30813][ML] Fix Matrices.sprand comments
 add b517f99  [SPARK-30969][CORE] Remove resource coordination support from 
Standalone

No new revisions were added by this update.

Summary of changes:
 .gitignore |   1 -
 .../main/scala/org/apache/spark/SparkContext.scala |  25 +-
 .../spark/deploy/StandaloneResourceUtils.scala | 263 +
 .../org/apache/spark/deploy/worker/Worker.scala|  27 +--
 .../org/apache/spark/internal/config/package.scala |  17 --
 .../main/scala/org/apache/spark/util/Utils.scala   |  22 --
 .../scala/org/apache/spark/SparkContextSuite.scala |   6 +-
 .../apache/spark/deploy/worker/WorkerSuite.scala   |  74 +-
 .../resource/ResourceDiscoveryPluginSuite.scala|   4 -
 docs/configuration.md  |  27 +--
 docs/spark-standalone.md   |  10 +-
 python/pyspark/tests/test_context.py   |   2 -
 python/pyspark/tests/test_taskcontext.py   |   2 -
 13 files changed, 17 insertions(+), 463 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (ac12276 -> b517f99)

2020-03-02 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ac12276  [SPARK-30813][ML] Fix Matrices.sprand comments
 add b517f99  [SPARK-30969][CORE] Remove resource coordination support from 
Standalone

No new revisions were added by this update.

Summary of changes:
 .gitignore |   1 -
 .../main/scala/org/apache/spark/SparkContext.scala |  25 +-
 .../spark/deploy/StandaloneResourceUtils.scala | 263 +
 .../org/apache/spark/deploy/worker/Worker.scala|  27 +--
 .../org/apache/spark/internal/config/package.scala |  17 --
 .../main/scala/org/apache/spark/util/Utils.scala   |  22 --
 .../scala/org/apache/spark/SparkContextSuite.scala |   6 +-
 .../apache/spark/deploy/worker/WorkerSuite.scala   |  74 +-
 .../resource/ResourceDiscoveryPluginSuite.scala|   4 -
 docs/configuration.md  |  27 +--
 docs/spark-standalone.md   |  10 +-
 python/pyspark/tests/test_context.py   |   2 -
 python/pyspark/tests/test_taskcontext.py   |   2 -
 13 files changed, 17 insertions(+), 463 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-30987][CORE] Increase the timeout on local-cluster waitUntilExecutorsUp calls

2020-02-28 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 00e2bf8  [SPARK-30987][CORE] Increase the timeout on local-cluster 
waitUntilExecutorsUp calls
00e2bf8 is described below

commit 00e2bf8a9a96cf421fa5257f72d46334306b92fa
Author: Thomas Graves 
AuthorDate: Fri Feb 28 11:43:05 2020 -0800

[SPARK-30987][CORE] Increase the timeout on local-cluster 
waitUntilExecutorsUp calls

### What changes were proposed in this pull request?

The ResourceDiscoveryPlugin tests intermittently timeout. They are timing 
out on just bringing up the local-cluster. I am not able to reproduce locally.  
I suspect the jenkins boxes are overloaded and taking longer then 10 seconds. 
There was another jira SPARK-29139 that increased timeout for some other of 
these as well. So try increasing the timeout to 60 seconds.

Examples of timeouts:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119030/testReport/

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119005/testReport/

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119029/testReport/

### Why are the changes needed?

tests should no longer intermittently fail.

### Does this PR introduce any user-facing change?

no
### How was this patch tested?

unit tests ran.

Closes #27738 from tgravescs/SPARK-30987.

Authored-by: Thomas Graves 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit 6c0c41fa0d1e119b16980405b5dc69b953380d7d)
Signed-off-by: Xingbo Jiang 
---
 core/src/test/scala/org/apache/spark/DistributedSuite.scala   | 2 +-
 .../org/apache/spark/internal/plugin/PluginContainerSuite.scala   | 4 ++--
 .../org/apache/spark/resource/ResourceDiscoveryPluginSuite.scala  | 8 
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/DistributedSuite.scala 
b/core/src/test/scala/org/apache/spark/DistributedSuite.scala
index 3f30981..4d157b9 100644
--- a/core/src/test/scala/org/apache/spark/DistributedSuite.scala
+++ b/core/src/test/scala/org/apache/spark/DistributedSuite.scala
@@ -174,7 +174,7 @@ class DistributedSuite extends SparkFunSuite with Matchers 
with LocalSparkContex
 
   private def testCaching(conf: SparkConf, storageLevel: StorageLevel): Unit = 
{
 sc = new SparkContext(conf.setMaster(clusterUrl).setAppName("test"))
-TestUtils.waitUntilExecutorsUp(sc, 2, 3)
+TestUtils.waitUntilExecutorsUp(sc, 2, 6)
 val data = sc.parallelize(1 to 1000, 10)
 val cachedData = data.persist(storageLevel)
 assert(cachedData.count === 1000)
diff --git 
a/core/src/test/scala/org/apache/spark/internal/plugin/PluginContainerSuite.scala
 
b/core/src/test/scala/org/apache/spark/internal/plugin/PluginContainerSuite.scala
index cf2d929..7888796 100644
--- 
a/core/src/test/scala/org/apache/spark/internal/plugin/PluginContainerSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/internal/plugin/PluginContainerSuite.scala
@@ -139,7 +139,7 @@ class PluginContainerSuite extends SparkFunSuite with 
BeforeAndAfterEach with Lo
   .set(NonLocalModeSparkPlugin.TEST_PATH_CONF, path.getAbsolutePath())
 
 sc = new SparkContext(conf)
-TestUtils.waitUntilExecutorsUp(sc, 2, 1)
+TestUtils.waitUntilExecutorsUp(sc, 2, 6)
 
 eventually(timeout(10.seconds), interval(100.millis)) {
   val children = path.listFiles()
@@ -169,7 +169,7 @@ class PluginContainerSuite extends SparkFunSuite with 
BeforeAndAfterEach with Lo
   sc = new SparkContext(conf)
 
   // Ensure all executors has started
-  TestUtils.waitUntilExecutorsUp(sc, 1, 1)
+  TestUtils.waitUntilExecutorsUp(sc, 1, 6)
 
   var children = Array.empty[File]
   eventually(timeout(10.seconds), interval(100.millis)) {
diff --git 
a/core/src/test/scala/org/apache/spark/resource/ResourceDiscoveryPluginSuite.scala
 
b/core/src/test/scala/org/apache/spark/resource/ResourceDiscoveryPluginSuite.scala
index 7a05daa..437c903 100644
--- 
a/core/src/test/scala/org/apache/spark/resource/ResourceDiscoveryPluginSuite.scala
+++ 
b/core/src/test/scala/org/apache/spark/resource/ResourceDiscoveryPluginSuite.scala
@@ -56,7 +56,7 @@ class ResourceDiscoveryPluginSuite extends SparkFunSuite with 
LocalSparkContext
 .set(EXECUTOR_FPGA_ID.amountConf, "1")
 
   sc = new SparkContext(conf)
-  TestUtils.waitUntilExecutorsUp(sc, 2, 1)
+  TestUtils.waitUntilExecutorsUp(sc, 2, 6)
 
   eventually(timeout(10.seconds), interval(100.millis)) {
 val children = dir.listFiles()
@@ -84,7 +84,7 @@ class ResourceDiscoveryPluginSuite extends SparkFunSuite with 
LocalSparkContext
 .set(SPARK_RESOURCES_DIR, dir.ge

[spark] branch master updated (961c539 -> 6c0c41f)

2020-02-28 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 961c539  [SPARK-28998][SQL][FOLLOW-UP] Remove unnecessary MiMa excludes
 add 6c0c41f  [SPARK-30987][CORE] Increase the timeout on local-cluster 
waitUntilExecutorsUp calls

No new revisions were added by this update.

Summary of changes:
 core/src/test/scala/org/apache/spark/DistributedSuite.scala   | 2 +-
 .../org/apache/spark/internal/plugin/PluginContainerSuite.scala   | 4 ++--
 .../org/apache/spark/resource/ResourceDiscoveryPluginSuite.scala  | 8 
 3 files changed, 7 insertions(+), 7 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (a40a2f8 -> 3b69796)

2020-02-28 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a40a2f8  [SPARK-27619][SQL][FOLLOWUP] Rename 
'spark.sql.legacy.useHashOnMapType' to 'spark.sql.legacy.allowHashOnMapType'
 add 3b69796  [SPARK-30947][CORE] Log better message when accelerate 
resource is empty

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/resource/ResourceProfile.scala | 4 +++-
 core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala  | 7 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (a40a2f8 -> 3b69796)

2020-02-28 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a40a2f8  [SPARK-27619][SQL][FOLLOWUP] Rename 
'spark.sql.legacy.useHashOnMapType' to 'spark.sql.legacy.allowHashOnMapType'
 add 3b69796  [SPARK-30947][CORE] Log better message when accelerate 
resource is empty

No new revisions were added by this update.

Summary of changes:
 .../src/main/scala/org/apache/spark/resource/ResourceProfile.scala | 4 +++-
 core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala  | 7 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

2020-02-21 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new a5b3377  [SPARK-30667][CORE] Add all gather method to 
BarrierTaskContext
a5b3377 is described below

commit a5b3377f795a76fe218e7c9c0dafc0cbcf9a80a6
Author: sarthfrey-db 
AuthorDate: Fri Feb 21 11:40:28 2020 -0800

[SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Fix for #27395

### What changes were proposed in this pull request?

The `allGather` method is added to the `BarrierTaskContext`. This method 
contains the same functionality as the `BarrierTaskContext.barrier` method; it 
blocks the task until all tasks make the call, at which time they may continue 
execution. In addition, the `allGather` method takes an input message. Upon 
returning from the `allGather` the task receives a list of all the messages 
sent by all the tasks that made the `allGather` call.

### Why are the changes needed?

There are many situations where having the tasks communicate in a 
synchronized way is useful. One simple example is if each task needs to start a 
server to serve requests from one another; first the tasks must find a free 
port (the result of which is undetermined beforehand) and then start making 
requests, but to do so they each must know the port chosen by the other task. 
An `allGather` method would allow them to inform each other of the port they 
will run on.

### Does this PR introduce any user-facing change?

Yes, an `BarrierTaskContext.allGather` method will be available through the 
Scala, Java, and Python APIs.

### How was this patch tested?

Most of the code path is already covered by tests to the `barrier` method, 
since this PR includes a refactor so that much code is shared by the `barrier` 
and `allGather` methods. However, a test is added to assert that an all gather 
on each tasks partition ID will return a list of every partition ID.

An example through the Python API:
```python
>>> from pyspark import BarrierTaskContext
>>>
>>> def f(iterator):
... context = BarrierTaskContext.get()
... return [context.allGather('{}'.format(context.partitionId()))]
...
>>> sc.parallelize(range(4), 4).barrier().mapPartitions(f).collect()[0]
[u'3', u'1', u'0', u'2']
```

Closes #27640 from sarthfrey/master.

Lead-authored-by: sarthfrey-db 
Co-authored-by: sarthfrey 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit 274b328f57a626bb770c9831ae81034d060a7e1e)
Signed-off-by: Xingbo Jiang 
---
 .../org/apache/spark/BarrierCoordinator.scala  | 111 +--
 .../org/apache/spark/BarrierTaskContext.scala  | 153 ++---
 .../org/apache/spark/api/python/PythonRunner.scala |  51 +--
 .../spark/scheduler/BarrierTaskContextSuite.scala  |  77 +++
 project/MimaExcludes.scala |   5 +-
 python/pyspark/taskcontext.py  |  51 ++-
 python/pyspark/tests/test_taskcontext.py   |  20 +++
 7 files changed, 388 insertions(+), 80 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala 
b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
index 4e41767..be5036e 100644
--- a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
+++ b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
@@ -17,12 +17,17 @@
 
 package org.apache.spark
 
+import java.nio.charset.StandardCharsets.UTF_8
 import java.util.{Timer, TimerTask}
 import java.util.concurrent.ConcurrentHashMap
 import java.util.function.Consumer
 
 import scala.collection.mutable.ArrayBuffer
 
+import org.json4s.JsonAST._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods.{compact, render}
+
 import org.apache.spark.internal.Logging
 import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
 import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, 
SparkListenerStageCompleted}
@@ -99,10 +104,15 @@ private[spark] class BarrierCoordinator(
 // reset when a barrier() call fails due to timeout.
 private var barrierEpoch: Int = 0
 
-// An array of RPCCallContexts for barrier tasks that are waiting for 
reply of a barrier()
-// call.
+// An Array of RPCCallContexts for barrier tasks that have made a blocking 
runBarrier() call
 private val requesters: ArrayBuffer[RpcCallContext] = new 
ArrayBuffer[RpcCallContext](numTasks)
 
+// An Array of allGather messages for barrier tasks that have made a 
blocking runBarrier() call
+private val allGatherMessages: ArrayBuffer[String] = new 
Array[String](numTasks).to[ArrayBuffer]
+
+// The

[spark] branch branch-3.0 updated: [SPARK-30667][CORE] Add all gather method to BarrierTaskContext

2020-02-21 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new a5b3377  [SPARK-30667][CORE] Add all gather method to 
BarrierTaskContext
a5b3377 is described below

commit a5b3377f795a76fe218e7c9c0dafc0cbcf9a80a6
Author: sarthfrey-db 
AuthorDate: Fri Feb 21 11:40:28 2020 -0800

[SPARK-30667][CORE] Add all gather method to BarrierTaskContext

Fix for #27395

### What changes were proposed in this pull request?

The `allGather` method is added to the `BarrierTaskContext`. This method 
contains the same functionality as the `BarrierTaskContext.barrier` method; it 
blocks the task until all tasks make the call, at which time they may continue 
execution. In addition, the `allGather` method takes an input message. Upon 
returning from the `allGather` the task receives a list of all the messages 
sent by all the tasks that made the `allGather` call.

### Why are the changes needed?

There are many situations where having the tasks communicate in a 
synchronized way is useful. One simple example is if each task needs to start a 
server to serve requests from one another; first the tasks must find a free 
port (the result of which is undetermined beforehand) and then start making 
requests, but to do so they each must know the port chosen by the other task. 
An `allGather` method would allow them to inform each other of the port they 
will run on.

### Does this PR introduce any user-facing change?

Yes, an `BarrierTaskContext.allGather` method will be available through the 
Scala, Java, and Python APIs.

### How was this patch tested?

Most of the code path is already covered by tests to the `barrier` method, 
since this PR includes a refactor so that much code is shared by the `barrier` 
and `allGather` methods. However, a test is added to assert that an all gather 
on each tasks partition ID will return a list of every partition ID.

An example through the Python API:
```python
>>> from pyspark import BarrierTaskContext
>>>
>>> def f(iterator):
... context = BarrierTaskContext.get()
... return [context.allGather('{}'.format(context.partitionId()))]
...
>>> sc.parallelize(range(4), 4).barrier().mapPartitions(f).collect()[0]
[u'3', u'1', u'0', u'2']
```

Closes #27640 from sarthfrey/master.

Lead-authored-by: sarthfrey-db 
Co-authored-by: sarthfrey 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit 274b328f57a626bb770c9831ae81034d060a7e1e)
Signed-off-by: Xingbo Jiang 
---
 .../org/apache/spark/BarrierCoordinator.scala  | 111 +--
 .../org/apache/spark/BarrierTaskContext.scala  | 153 ++---
 .../org/apache/spark/api/python/PythonRunner.scala |  51 +--
 .../spark/scheduler/BarrierTaskContextSuite.scala  |  77 +++
 project/MimaExcludes.scala |   5 +-
 python/pyspark/taskcontext.py  |  51 ++-
 python/pyspark/tests/test_taskcontext.py   |  20 +++
 7 files changed, 388 insertions(+), 80 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala 
b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
index 4e41767..be5036e 100644
--- a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
+++ b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
@@ -17,12 +17,17 @@
 
 package org.apache.spark
 
+import java.nio.charset.StandardCharsets.UTF_8
 import java.util.{Timer, TimerTask}
 import java.util.concurrent.ConcurrentHashMap
 import java.util.function.Consumer
 
 import scala.collection.mutable.ArrayBuffer
 
+import org.json4s.JsonAST._
+import org.json4s.JsonDSL._
+import org.json4s.jackson.JsonMethods.{compact, render}
+
 import org.apache.spark.internal.Logging
 import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
 import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, 
SparkListenerStageCompleted}
@@ -99,10 +104,15 @@ private[spark] class BarrierCoordinator(
 // reset when a barrier() call fails due to timeout.
 private var barrierEpoch: Int = 0
 
-// An array of RPCCallContexts for barrier tasks that are waiting for 
reply of a barrier()
-// call.
+// An Array of RPCCallContexts for barrier tasks that have made a blocking 
runBarrier() call
 private val requesters: ArrayBuffer[RpcCallContext] = new 
ArrayBuffer[RpcCallContext](numTasks)
 
+// An Array of allGather messages for barrier tasks that have made a 
blocking runBarrier() call
+private val allGatherMessages: ArrayBuffer[String] = new 
Array[String](numTasks).to[ArrayBuffer]
+
+// The

[spark] branch master updated (1f0300f -> 274b328)

2020-02-21 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1f0300f  [SPARK-30764][SQL] Improve the readability of EXPLAIN 
FORMATTED style
 add 274b328  [SPARK-30667][CORE] Add all gather method to 
BarrierTaskContext

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/BarrierCoordinator.scala  | 111 +--
 .../org/apache/spark/BarrierTaskContext.scala  | 153 ++---
 .../org/apache/spark/api/python/PythonRunner.scala |  51 +--
 .../spark/scheduler/BarrierTaskContextSuite.scala  |  77 +++
 project/MimaExcludes.scala |   5 +-
 python/pyspark/taskcontext.py  |  51 ++-
 python/pyspark/tests/test_taskcontext.py   |  20 +++
 7 files changed, 388 insertions(+), 80 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (1f0300f -> 274b328)

2020-02-21 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1f0300f  [SPARK-30764][SQL] Improve the readability of EXPLAIN 
FORMATTED style
 add 274b328  [SPARK-30667][CORE] Add all gather method to 
BarrierTaskContext

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/BarrierCoordinator.scala  | 111 +--
 .../org/apache/spark/BarrierTaskContext.scala  | 153 ++---
 .../org/apache/spark/api/python/PythonRunner.scala |  51 +--
 .../spark/scheduler/BarrierTaskContextSuite.scala  |  77 +++
 project/MimaExcludes.scala |   5 +-
 python/pyspark/taskcontext.py  |  51 ++-
 python/pyspark/tests/test_taskcontext.py   |  20 +++
 7 files changed, 388 insertions(+), 80 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: Revert "[SPARK-30667][CORE] Add allGather method to BarrierTaskContext"

2020-02-19 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new cadec3d  Revert "[SPARK-30667][CORE] Add allGather method to 
BarrierTaskContext"
cadec3d is described below

commit cadec3d239186e3abc104b67f7d1aa09d4d7516c
Author: Xingbo Jiang 
AuthorDate: Wed Feb 19 17:06:20 2020 -0800

Revert "[SPARK-30667][CORE] Add allGather method to BarrierTaskContext"

This reverts commit f482187c127418d2ea538ac2551ae0fce1ddbc31.
---
 .../org/apache/spark/BarrierCoordinator.scala  | 113 ++-
 .../org/apache/spark/BarrierTaskContext.scala  | 153 +++--
 .../org/apache/spark/api/python/PythonRunner.scala |  51 ++-
 .../spark/scheduler/BarrierTaskContextSuite.scala  |  74 --
 python/pyspark/taskcontext.py  |  49 +--
 python/pyspark/tests/test_taskcontext.py   |  20 ---
 6 files changed, 79 insertions(+), 381 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala 
b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
index 042a266..4e41767 100644
--- a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
+++ b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
@@ -17,17 +17,12 @@
 
 package org.apache.spark
 
-import java.nio.charset.StandardCharsets.UTF_8
 import java.util.{Timer, TimerTask}
 import java.util.concurrent.ConcurrentHashMap
 import java.util.function.Consumer
 
 import scala.collection.mutable.ArrayBuffer
 
-import org.json4s.JsonAST._
-import org.json4s.JsonDSL._
-import org.json4s.jackson.JsonMethods.{compact, render}
-
 import org.apache.spark.internal.Logging
 import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
 import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, 
SparkListenerStageCompleted}
@@ -104,15 +99,10 @@ private[spark] class BarrierCoordinator(
 // reset when a barrier() call fails due to timeout.
 private var barrierEpoch: Int = 0
 
-// An Array of RPCCallContexts for barrier tasks that have made a blocking 
runBarrier() call
+// An array of RPCCallContexts for barrier tasks that are waiting for 
reply of a barrier()
+// call.
 private val requesters: ArrayBuffer[RpcCallContext] = new 
ArrayBuffer[RpcCallContext](numTasks)
 
-// An Array of allGather messages for barrier tasks that have made a 
blocking runBarrier() call
-private val allGatherMessages: ArrayBuffer[String] = new 
Array[String](numTasks).to[ArrayBuffer]
-
-// The blocking requestMethod called by tasks to sync up for this stage 
attempt
-private var requestMethodToSync: RequestMethod.Value = 
RequestMethod.BARRIER
-
 // A timer task that ensures we may timeout for a barrier() call.
 private var timerTask: TimerTask = null
 
@@ -140,32 +130,9 @@ private[spark] class BarrierCoordinator(
 
 // Process the global sync request. The barrier() call succeed if 
collected enough requests
 // within a configured time, otherwise fail all the pending requests.
-def handleRequest(
-  requester: RpcCallContext,
-  request: RequestToSync
-): Unit = synchronized {
+def handleRequest(requester: RpcCallContext, request: RequestToSync): Unit 
= synchronized {
   val taskId = request.taskAttemptId
   val epoch = request.barrierEpoch
-  val requestMethod = request.requestMethod
-  val partitionId = request.partitionId
-  val allGatherMessage = request match {
-case ag: AllGatherRequestToSync => ag.allGatherMessage
-case _ => ""
-  }
-
-  if (requesters.size == 0) {
-requestMethodToSync = requestMethod
-  }
-
-  if (requestMethodToSync != requestMethod) {
-requesters.foreach(
-  _.sendFailure(new SparkException(s"$barrierId tried to use 
requestMethod " +
-s"`$requestMethod` during barrier epoch $barrierEpoch, which does 
not match " +
-s"the current synchronized requestMethod `$requestMethodToSync`"
-  ))
-)
-cleanupBarrierStage(barrierId)
-  }
 
   // Require the number of tasks is correctly set from the 
BarrierTaskContext.
   require(request.numTasks == numTasks, s"Number of tasks of $barrierId is 
" +
@@ -186,7 +153,6 @@ private[spark] class BarrierCoordinator(
 }
 // Add the requester to array of RPCCallContexts pending for reply.
 requesters += requester
-allGatherMessages(partitionId) = allGatherMessage
 logInfo(s"Barrier sync epoch $barrierEpoch from $barrierId received 
update from Task " +
   s"$taskId, current progress: ${requesters.size}/$numTasks.")
 if (maybeFinishAllRequesters(requesters

[spark] branch master updated (6936799 -> e32411e)

2020-02-19 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6936799  [SPARK-30802][ML] Use Summarizer instead of 
MultivariateOnlineSummarizer in Aggregator test suite
 add e32411e  Revert "[SPARK-30667][CORE] Add allGather method to 
BarrierTaskContext"

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/BarrierCoordinator.scala  | 113 ++-
 .../org/apache/spark/BarrierTaskContext.scala  | 153 +++--
 .../org/apache/spark/api/python/PythonRunner.scala |  51 ++-
 .../spark/scheduler/BarrierTaskContextSuite.scala  |  74 --
 python/pyspark/taskcontext.py  |  49 +--
 python/pyspark/tests/test_taskcontext.py   |  20 ---
 6 files changed, 79 insertions(+), 381 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0 updated: Revert "[SPARK-30667][CORE] Add allGather method to BarrierTaskContext"

2020-02-13 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new eb37aa5  Revert "[SPARK-30667][CORE] Add allGather method to 
BarrierTaskContext"
eb37aa5 is described below

commit eb37aa5595badd79becf4d3d332404cbcdb1b12d
Author: Xingbo Jiang 
AuthorDate: Thu Feb 13 17:48:19 2020 -0800

Revert "[SPARK-30667][CORE] Add allGather method to BarrierTaskContext"

This reverts commit 6001866cea1216da421c5acd71d6fc74228222ac.
---
 .../org/apache/spark/BarrierCoordinator.scala  | 113 ++-
 .../org/apache/spark/BarrierTaskContext.scala  | 153 +++--
 .../org/apache/spark/api/python/PythonRunner.scala |  51 ++-
 .../spark/scheduler/BarrierTaskContextSuite.scala  |  74 --
 python/pyspark/taskcontext.py  |  49 +--
 python/pyspark/tests/test_taskcontext.py   |  20 ---
 6 files changed, 79 insertions(+), 381 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala 
b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
index 042a266..4e41767 100644
--- a/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
+++ b/core/src/main/scala/org/apache/spark/BarrierCoordinator.scala
@@ -17,17 +17,12 @@
 
 package org.apache.spark
 
-import java.nio.charset.StandardCharsets.UTF_8
 import java.util.{Timer, TimerTask}
 import java.util.concurrent.ConcurrentHashMap
 import java.util.function.Consumer
 
 import scala.collection.mutable.ArrayBuffer
 
-import org.json4s.JsonAST._
-import org.json4s.JsonDSL._
-import org.json4s.jackson.JsonMethods.{compact, render}
-
 import org.apache.spark.internal.Logging
 import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
 import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, 
SparkListenerStageCompleted}
@@ -104,15 +99,10 @@ private[spark] class BarrierCoordinator(
 // reset when a barrier() call fails due to timeout.
 private var barrierEpoch: Int = 0
 
-// An Array of RPCCallContexts for barrier tasks that have made a blocking 
runBarrier() call
+// An array of RPCCallContexts for barrier tasks that are waiting for 
reply of a barrier()
+// call.
 private val requesters: ArrayBuffer[RpcCallContext] = new 
ArrayBuffer[RpcCallContext](numTasks)
 
-// An Array of allGather messages for barrier tasks that have made a 
blocking runBarrier() call
-private val allGatherMessages: ArrayBuffer[String] = new 
Array[String](numTasks).to[ArrayBuffer]
-
-// The blocking requestMethod called by tasks to sync up for this stage 
attempt
-private var requestMethodToSync: RequestMethod.Value = 
RequestMethod.BARRIER
-
 // A timer task that ensures we may timeout for a barrier() call.
 private var timerTask: TimerTask = null
 
@@ -140,32 +130,9 @@ private[spark] class BarrierCoordinator(
 
 // Process the global sync request. The barrier() call succeed if 
collected enough requests
 // within a configured time, otherwise fail all the pending requests.
-def handleRequest(
-  requester: RpcCallContext,
-  request: RequestToSync
-): Unit = synchronized {
+def handleRequest(requester: RpcCallContext, request: RequestToSync): Unit 
= synchronized {
   val taskId = request.taskAttemptId
   val epoch = request.barrierEpoch
-  val requestMethod = request.requestMethod
-  val partitionId = request.partitionId
-  val allGatherMessage = request match {
-case ag: AllGatherRequestToSync => ag.allGatherMessage
-case _ => ""
-  }
-
-  if (requesters.size == 0) {
-requestMethodToSync = requestMethod
-  }
-
-  if (requestMethodToSync != requestMethod) {
-requesters.foreach(
-  _.sendFailure(new SparkException(s"$barrierId tried to use 
requestMethod " +
-s"`$requestMethod` during barrier epoch $barrierEpoch, which does 
not match " +
-s"the current synchronized requestMethod `$requestMethodToSync`"
-  ))
-)
-cleanupBarrierStage(barrierId)
-  }
 
   // Require the number of tasks is correctly set from the 
BarrierTaskContext.
   require(request.numTasks == numTasks, s"Number of tasks of $barrierId is 
" +
@@ -186,7 +153,6 @@ private[spark] class BarrierCoordinator(
 }
 // Add the requester to array of RPCCallContexts pending for reply.
 requesters += requester
-allGatherMessages(partitionId) = allGatherMessage
 logInfo(s"Barrier sync epoch $barrierEpoch from $barrierId received 
update from Task " +
   s"$taskId, current progress: ${requesters.size}/$numTasks.")
 if (maybeFinishAllRequesters(requesters

[spark] branch master updated (57254c9 -> fa3517c)

2020-02-13 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 57254c9  [SPARK-30667][CORE] Add allGather method to BarrierTaskContext
 add fa3517c  Revert "[SPARK-30667][CORE] Add allGather method to 
BarrierTaskContext"

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/BarrierCoordinator.scala  | 113 ++-
 .../org/apache/spark/BarrierTaskContext.scala  | 153 +++--
 .../org/apache/spark/api/python/PythonRunner.scala |  51 ++-
 .../spark/scheduler/BarrierTaskContextSuite.scala  |  74 --
 python/pyspark/taskcontext.py  |  49 +--
 python/pyspark/tests/test_taskcontext.py   |  20 ---
 6 files changed, 79 insertions(+), 381 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (57254c9 -> fa3517c)

2020-02-13 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 57254c9  [SPARK-30667][CORE] Add allGather method to BarrierTaskContext
 add fa3517c  Revert "[SPARK-30667][CORE] Add allGather method to 
BarrierTaskContext"

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/BarrierCoordinator.scala  | 113 ++-
 .../org/apache/spark/BarrierTaskContext.scala  | 153 +++--
 .../org/apache/spark/api/python/PythonRunner.scala |  51 ++-
 .../spark/scheduler/BarrierTaskContextSuite.scala  |  74 --
 python/pyspark/taskcontext.py  |  49 +--
 python/pyspark/tests/test_taskcontext.py   |  20 ---
 6 files changed, 79 insertions(+), 381 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (bd7510b -> c49abf8)

2020-01-08 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bd7510b  [SPARK-30281][SS] Consider partitioned/recursive option while 
verifying archive path on FileStreamSource
 add c49abf8  [SPARK-30417][CORE] Task speculation numTaskThreshold should 
be greater than 0 even EXECUTOR_CORES is not set under Standalone mode

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/internal/config/package.scala |  3 +-
 .../apache/spark/scheduler/TaskSetManager.scala|  8 -
 .../spark/scheduler/TaskSetManagerSuite.scala  | 37 ++
 docs/configuration.md  |  3 +-
 4 files changed, 41 insertions(+), 10 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (3d45779 -> 2e71a6e)

2019-11-18 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3d45779  [SPARK-29728][SQL] Datasource V2: Support ALTER TABLE RENAME 
TO
 add 2e71a6e  [SPARK-27558][CORE] Gracefully cleanup task when it fails 
with OOM exception

No new revisions were added by this update.

Summary of changes:
 .../spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java   | 4 
 1 file changed, 4 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (3d45779 -> 2e71a6e)

2019-11-18 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3d45779  [SPARK-29728][SQL] Datasource V2: Support ALTER TABLE RENAME 
TO
 add 2e71a6e  [SPARK-27558][CORE] Gracefully cleanup task when it fails 
with OOM exception

No new revisions were added by this update.

Summary of changes:
 .../spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java   | 4 
 1 file changed, 4 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (39b502a -> 15a72f3)

2019-11-13 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 39b502a  [SPARK-29778][SQL] pass writer options to saveAsTable in 
append mode
 add 15a72f3  [SPARK-29287][CORE] Add LaunchedExecutor message to tell 
driver which executor is ready for making offers

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/executor/CoarseGrainedExecutorBackend.scala | 1 +
 .../spark/scheduler/cluster/CoarseGrainedClusterMessage.scala| 2 ++
 .../spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala  | 9 +++--
 .../apache/spark/deploy/StandaloneDynamicAllocationSuite.scala   | 3 ++-
 4 files changed, 12 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark-website] branch asf-site updated: fix download page (#229)

2019-11-07 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new ce0a7e9  fix download page (#229)
ce0a7e9 is described below

commit ce0a7e93441787d27fdfdee9eccc27503ed7fdc0
Author: Jiang Xingbo 
AuthorDate: Thu Nov 7 12:33:14 2019 -0800

fix download page (#229)

The download page is not able to show `3.0.0-preview` release, because 
`hadoop3p2` was not recognized. This PR fixes the issue.
---
 js/downloads.js  | 2 +-
 site/js/downloads.js | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/js/downloads.js b/js/downloads.js
index 47103f0..f7429e1 100644
--- a/js/downloads.js
+++ b/js/downloads.js
@@ -15,7 +15,7 @@ var sources = {pretty: "Source Code", tag: "sources"};
 var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: 
"without-hadoop"};
 var hadoop2p6 = {pretty: "Pre-built for Apache Hadoop 2.6", tag: "hadoop2.6"};
 var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"};
-var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: 
"hadoop3.2"}
+var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: 
"hadoop3.2"};
 var scala2p12_hadoopFree = {pretty: "Pre-built with Scala 2.12 and 
user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"};
 
 // 2.2.0+
diff --git a/site/js/downloads.js b/site/js/downloads.js
index 47103f0..f7429e1 100644
--- a/site/js/downloads.js
+++ b/site/js/downloads.js
@@ -15,7 +15,7 @@ var sources = {pretty: "Source Code", tag: "sources"};
 var hadoopFree = {pretty: "Pre-built with user-provided Apache Hadoop", tag: 
"without-hadoop"};
 var hadoop2p6 = {pretty: "Pre-built for Apache Hadoop 2.6", tag: "hadoop2.6"};
 var hadoop2p7 = {pretty: "Pre-built for Apache Hadoop 2.7", tag: "hadoop2.7"};
-var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: 
"hadoop3.2"}
+var hadoop3p2 = {pretty: "Pre-built for Apache Hadoop 3.2 and later", tag: 
"hadoop3.2"};
 var scala2p12_hadoopFree = {pretty: "Pre-built with Scala 2.12 and 
user-provided Apache Hadoop", tag: "without-hadoop-scala-2.12"};
 
 // 2.2.0+


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] annotated tag v3.0.0-preview updated (007c873 -> b23e21e)

2019-11-06 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to annotated tag v3.0.0-preview
in repository https://gitbox.apache.org/repos/asf/spark.git.


*** WARNING: tag v3.0.0-preview was modified! ***

from 007c873  (commit)
  to b23e21e  (tag)
 tagging 007c873ae34f58651481ccba30e8e2ba38a692c4 (commit)
 replaces v3.0.0-preview-rc1
  by Xingbo Jiang
  on Wed Oct 30 18:03:22 2019 -0700

- Log -
v3.0.0-preview-rc2
---


No new revisions were added by this update.

Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

svn commit: r36658 - /dev/spark/v3.0.0-preview-rc2-docs/

2019-11-06 Thread jiangxb1987

Author: jiangxb1987
Date: Wed Nov  6 23:28:58 2019
New Revision: 36658

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.0.0-preview-rc2-docs/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

svn commit: r36657 - in /dev/spark: v3.0.0-preview-rc1-bin/ v3.0.0-preview-rc1-docs/

2019-11-06 Thread jiangxb1987

Author: jiangxb1987
Date: Wed Nov  6 23:27:47 2019
New Revision: 36657

Log:
Removing RC artifacts.

Removed:
dev/spark/v3.0.0-preview-rc1-bin/
dev/spark/v3.0.0-preview-rc1-docs/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

svn commit: r36563 - in /dev/spark/v3.0.0-preview-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apac

2019-10-30 Thread jiangxb1987

Author: jiangxb1987
Date: Thu Oct 31 02:38:18 2019
New Revision: 36563

Log:
Apache Spark v3.0.0-preview-rc2 docs


[This commit notification would consist of 1890 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

svn commit: r36562 - /dev/spark/v3.0.0-preview-rc2-bin/

2019-10-30 Thread jiangxb1987

Author: jiangxb1987
Date: Thu Oct 31 02:22:19 2019
New Revision: 36562

Log:
Apache Spark v3.0.0-preview-rc2

Added:
dev/spark/v3.0.0-preview-rc2-bin/
dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz   (with props)
dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc
dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512
dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz   (with props)
dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc
dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.sha512
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz   
(with props)
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.asc

dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz   
(with props)
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz.asc

dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-without-hadoop.tgz 
  (with props)

dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-without-hadoop.tgz.asc

dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview.tgz   (with props)
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview.tgz.asc
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview.tgz.sha512

Added: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc
==
--- dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc (added)
+++ dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc Thu Oct 31 
02:22:19 2019
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl26PZIACgkQ4bfg8l5L
+9Wv1YRAAjyL1er+AJEdWJg8EkTq0Yq1anqthY+6ARmc3gKUm5MiyxB5SXh/5xNrG
+k4Pob3aQ4cQdsMeXv1SX1+zHqVDa2VEbk5GEf4YDeS5OYB0f3SeG/C4rrGVktoq6
+r50VEg0b2ur+2iYup5VG6hvGQmiF4UlICVGvPA8HvyIbvMt6pxxWSW2musIz32Tq
+pnR7fqa4d5/WtLKk+m3W/n+LIk6P04cG++thsr2Xl8UcPB17g0utuEH6l2qFDIT1
+GJqCFvptQRHxJIhBG1fC0wIuXDm4uBSv/yMlS7MlLn+PQ5PB2NLcc30VkRzepWKp
+efuhy9rPGEVMNxcpPQf64OkibnQXNpOkvD7N7RNaDh+sVWhiPtUbmAGPKtNjgpvw
+yWUJO40SbFAyA7UY7gFBI9LmYqE1iQCCpLhsIzT/Q8pQrQ9sssgbO3Oih2Dhj+QA
+K8xVBDmlP0M/9K+Xk8BITAg2zvsOmtEd18dL/te+vaxTHlMsWSGDq0ZcDpbU8S0p
+FXSifMUGb1/wdthg2V+bqi+/VPIRvdOqUj9gfrooXEcUx9XShI39CQxVhJbb1k2+
+SrFNqC3zY8DrXaf/Imx1C5sQPAMY4T2dE22NkrFLhwsiNRVMk3IyfRVtI+8iVxZm
+KuZNI1t5gMvaVfBIRqSZ4teT1YnktapbuDadeiC3zOk2pgB7mQk=
+=ViZ1
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512
==
--- dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512 Thu Oct 
31 02:22:19 2019
@@ -0,0 +1,4 @@
+SparkR_3.0.0-preview.tar.gz: 2817FF08 5338F9BA B889CA98 8F8F1961 03747AC9
+ 8ED4F57D AAA87F0A DDA2A145 089F9C85 C5B6EFC9
+ 4CDC0CCF 1BA48017 F712F4DC 50B534E7 6134F43E
+ BA87D79A

Added: dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc
==
--- dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc Thu Oct 31 
02:22:19 2019
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl26PZMACgkQ4bfg8l5L
+9Wttuw/7Bpkeag7Haalshy0Atn0SBfGHhyn/KD7ri4GvV5DmyOwEwk7fV51uHrOR
+X2SmgmZ068G1/rXYmhiZGyZExg7CWLeqiG2VxOnfnvkgldg2mAs0vbx6pNEZ4b7k
+DdahU3jMBIDtNSc8ivN8jJSswUJVoHb8bJN+GSMZqoOsy5DEIFRVh1yd/eqHABk0
+zNduitnQW8Vdg6Mav0fjetbzP6F/YXEg9mt4B4vV2Mwt2ChTUAGL+SaVrU1oTsUQ
+NY8HVMOf8V8uCshEcsSP+Ga46N9a0StociUkiUHkzHC7PEM0PVrq+qqcmZC1/2HA
+ST55bbZIaGon2D1Zy9mPvgQpY9XXrvcl9itDw/IxwIk4/lddrfoFXo4QERrZoP1m
+DZpE

[spark] annotated tag v3.0.0-preview-rc2 updated (007c873 -> b23e21e)

2019-10-30 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to annotated tag v3.0.0-preview-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git.


*** WARNING: tag v3.0.0-preview-rc2 was modified! ***

from 007c873  (commit)
  to b23e21e  (tag)
 tagging 007c873ae34f58651481ccba30e8e2ba38a692c4 (commit)
 replaces v3.0.0-preview-rc1
  by Xingbo Jiang
  on Wed Oct 30 18:03:22 2019 -0700

- Log -
v3.0.0-preview-rc2
---


No new revisions were added by this update.

Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (155a67d -> 8207c83)

2019-10-30 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 155a67d  [SPARK-29666][BUILD] Fix the publish release failure under 
dry-run mode
 add 007c873  Prepare Spark release v3.0.0-preview-rc2
 add 8207c83  Revert "Prepare Spark release v3.0.0-preview-rc2"

No new revisions were added by this update.

Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

svn commit: r36558 - in /dev/spark/v3.0.0-preview-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apac

2019-10-30 Thread jiangxb1987

Author: jiangxb1987
Date: Thu Oct 31 00:39:27 2019
New Revision: 36558

Log:
Apache Spark v3.0.0-preview-rc2 docs


[This commit notification would consist of 1890 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

svn commit: r36557 - /dev/spark/v3.0.0-preview-rc2-bin/

2019-10-30 Thread jiangxb1987

Author: jiangxb1987
Date: Thu Oct 31 00:23:13 2019
New Revision: 36557

Log:
Apache Spark v3.0.0-preview-rc2

Added:
dev/spark/v3.0.0-preview-rc2-bin/
dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz   (with props)
dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc
dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512
dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz   (with props)
dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc
dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.sha512
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz   
(with props)
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.asc

dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz   
(with props)
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz.asc

dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-without-hadoop.tgz 
  (with props)

dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-without-hadoop.tgz.asc

dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview.tgz   (with props)
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview.tgz.asc
dev/spark/v3.0.0-preview-rc2-bin/spark-3.0.0-preview.tgz.sha512

Added: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc
==
--- dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc (added)
+++ dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.asc Thu Oct 31 
00:23:13 2019
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl26IasACgkQ4bfg8l5L
+9WvrQhAAi115Cm+BH93sVUM0jH9iWq0JN4gwb6ueNdpYArtQLZfMFA7y+d3ssnfM
+44qv9gvb2KSGkSBf9JJRSk9j1pSMLnjYFRfm89p0w2DD0I7CoNqIIMgoup9elMjy
+0SKUQ1CGSGWgZRlnAW5g6ZV0D4mnHzhZiakUipQHglSMLzsTQakTbKycXQPcRfTe
+KhyO/B4UPxSEihFemR89DcqvKN2zuNiPdXF6QVk2/XyqrKndObSllOzgYzBixDuD
+rFtmBguTdLIXv+EVE0U6b3SlW/LUxDDI/uDnwZGoycUizwWevGUcjAVc5hD1DWQv
+2CysbOJAYoOMH0UR1OKIxeUoE2Akpp52byD+pfzfVkC+krqA4G8SIhSrLX1lCuvV
+bcK9WFhlSw7CzMLAbNhdR0UqsO/W/vQXvh7FhEw5DpLSagQkSecqGFHifE+RCw8G
+vRuqLRmNVfGO/FBjuyhpiAaLnHEJnSOcZLP2m73fVKYghhOMrsmK4p6Grl3pFvnA
+pcb+S74AtQoVXmBYSvpx4q853d9RvuZr9g2p8bhywHnERQeJygC00yJbzAtaPe5D
+KJRdN6wzRQw7/I659UvWRHRbCO4eM9O6ecuSTZJH9IjQmYa/7O07QVtTsDkla29d
+LZJ9z17yi/+oAQtVlh4vpw41eLb1HRra4Hy9NRAafVUzKSUu7ss=
+=+nkw
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512
==
--- dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-preview-rc2-bin/SparkR_3.0.0-preview.tar.gz.sha512 Thu Oct 
31 00:23:13 2019
@@ -0,0 +1,4 @@
+SparkR_3.0.0-preview.tar.gz: EFCE3700 A013B5B6 383EE803 3DECB569 48E28BD6
+ 0694077A 22D813C3 A535CA04 9303381F 5C44D018
+ 214042A4 457734DA 87FBC417 3CDB774D 2A1CF133
+ AF9EBB45

Added: dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc
==
--- dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-preview-rc2-bin/pyspark-3.0.0.dev0.tar.gz.asc Thu Oct 31 
00:23:13 2019
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl26Ia0ACgkQ4bfg8l5L
+9Wsq9BAAj5Dn6Ofgq0CwxNw6YnQUn8TCBf521ky+zVGS0oVwxKcUvvZ8ixLk6GkK
+wIslxkkLFqa5WMLP/VQfnQnv38UpzYYQjKyATqLvh7YM65/mwRfsmExWbd7GTEGS
+ReGdVwl2Vj3bHLiZMPW4up3eNhkulQP86K3yfuNhwFHXJedLPH/da3MqBI+mizNx
+FVHDdvoPkmDF2OcVsZR+db/mfNnUt9Zd61nYb+aq4MWOiXduNwV77F8jDr2AZyJ9
+9Pq7RS+KAlX+3Ce+SzoVdFOMulilE6wB9WBRXBfyK1WX1+/QqVhnI8PofEmMyezl
+vRlN1BZioki1zwB13J1BCmLAhRXXsLj3yQi3KAEz9BhBfqxJcae1b0UoA2MRLYRN

[spark] annotated tag v3.0.0-preview-rc2 updated (5eddbb5 -> 9e3baa3)

2019-10-30 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to annotated tag v3.0.0-preview-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git.


*** WARNING: tag v3.0.0-preview-rc2 was modified! ***

from 5eddbb5  (commit)
  to 9e3baa3  (tag)
 tagging 5eddbb5f1d9789696927f435c55df887e50a1389 (commit)
 replaces 3.0.0-preview
  by Xingbo Jiang
  on Wed Oct 30 16:23:08 2019 -0700

- Log -
v3.0.0-preview-rc2
---


No new revisions were added by this update.

Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] annotated tag v3.0.0-preview-rc2 updated (155a67d -> f86eed8)

2019-10-30 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to annotated tag v3.0.0-preview-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git.


*** WARNING: tag v3.0.0-preview-rc2 was modified! ***

from 155a67d  (commit)
  to f86eed8  (tag)
 tagging 155a67d00cb2f12aad179f6df2d992feca8e003e (commit)
 replaces v3.0.0-preview-rc1
  by Xingbo Jiang
  on Wed Oct 30 16:16:56 2019 -0700

- Log -
v3.0.0-preview-rc2
---


No new revisions were added by this update.

Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

svn commit: r36548 - in /dev/spark/v3.0.0-preview-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apac

2019-10-29 Thread jiangxb1987

Author: jiangxb1987
Date: Tue Oct 29 21:25:58 2019
New Revision: 36548

Log:
Apache Spark v3.0.0-preview-rc1 docs


[This commit notification would consist of 1890 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

svn commit: r36547 - /dev/spark/v3.0.0-preview-rc1-bin/

2019-10-29 Thread jiangxb1987

Author: jiangxb1987
Date: Tue Oct 29 21:09:40 2019
New Revision: 36547

Log:
Apache Spark v3.0.0-preview-rc1

Added:
dev/spark/v3.0.0-preview-rc1-bin/
dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz   (with props)
dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.asc
dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.sha512
dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.preview.tar.gz.sha512
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz   
(with props)
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.asc

dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.sha512
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz   
(with props)
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz.asc

dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop3.2.tgz.sha512
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-without-hadoop.tgz 
  (with props)

dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-without-hadoop.tgz.asc

dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview.tgz   (with props)
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview.tgz.asc
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview.tgz.sha512

Added: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.asc
==
--- dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.asc (added)
+++ dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.asc Tue Oct 29 
21:09:40 2019
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl24osoACgkQ4bfg8l5L
+9Wthuw/9Hzrv5SIBFq7PjSn3IaZ9k9HRWuARKFgCX1O4wOY1giLqv/8gVkXyh139
+cT7fgq4lPRc/EwF6DdQqqp5cGGkZeATOVchDWru8/uPOpgjBJtuaM5ZQvHYla5CP
+mYCEz0Y0aDLvb+7QjcBjfXQnj3Jwv92uzdTH9J1upaUJP1UX6KN4AmkC/utHa0BQ
+FM8e/yLk7k0o7PSdXl418wzV88Q/sQZA+C4h7bJjSF8Zgfnnwy99yRmVwOaLNx2o
+gpYo6+0kvX3dh8/jRb1zCgGgidtWHpnuGJZU7teyKpUoZefiE5FL4TG13HdekSjh
+pZGhWlk2iWbca0rRxABmIy7Xvcm8RC8SueOzorTa0CpvlmtOHYpDeEv7mQNSOa19
+alHCkdMIKgCTTDffYBv+baePSo3MKFXSHPQ0xAD0TKUN/Jb3w6UUSTIMlS6wfz47
+KLWIq61EcuW+/irOOQ7ZRUK/e2DDpT0jtM3k6oKtFlbSqWLqCtZZTAzvAhU17GzZ
+SVX+yBDax+n6ZfL9NOEL1geUKqKlhOVChqUPaL/VVZ0KXKphPhqvS0/EHz8/0l52
+R8JwPfZn5u192OfU8Qqtc5oMLcpX2hm7aVgU8q5XNRrF1u9egytOdrnKyRbaHlds
+nWyVHzx1dySMiULY9TO5bRkjKFr+DY828QgsgiHb7VBRfuzpng4=
+=ANBW
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.sha512
==
--- dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-preview.tar.gz.sha512 Tue Oct 
29 21:09:40 2019
@@ -0,0 +1,4 @@
+SparkR_3.0.0-preview.tar.gz: A5317BEC BA7BD226 0971EDCC FEF0F3CB 1BD21180
+ DA41E5DC 3D37CF0F CB873512 B7FDF14A 453F2FFC
+ 7E1B88C0 5E98FEC0 0BC2BA21 FE611726 218C7500
+ 44D44610

Added: dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.preview.tar.gz.sha512
==
(empty)

Added: dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz
==
Binary file - no diff available.

Propchange: 
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz
--
svn:mime-type = application/octet-stream

Added: 
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.asc
==
--- dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.asc 
(added)
+++ dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-preview-bin-hadoop2.7.tgz.asc 
Tue Oct 29 21:09:40 2019
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl24oswACgkQ4bfg8l5L
+9WuV/A//UTaApVOBvsRuw7xMsr0SYVmrjZQEBdAGXiPaDyXUmw6qTNk4q0RTLzMQ
+rXxP90Jmu4T8qPM6gQnLVDGmq7+5GBs9Bvs4eZ9ZMKnrZ/pIYw1+Q4sym5uzPI13
+lAvmnYCnaAQuHHBSN6YJv9O2WAUUX1WnOIo/LOThgabr2j6tWxJDF2u5ZIk5aDni
+iDQ+SJVICzTKcqSS0+IPJr+rR/WJXEd34RrYklCs54pSHgbb8ZbEFY/mErXLXIP4
+4We7oRGwW9Qb6WY0AjShjgD1ru2fjXZz5jJFnTWt9uZw6q7hFsnjvGSNRNbiGNdD

svn commit: r36546 - /dev/spark/KEYS

2019-10-29 Thread jiangxb1987

Author: jiangxb1987
Date: Tue Oct 29 21:05:34 2019
New Revision: 36546

Log:
Update KEYS

Modified:
dev/spark/KEYS

Modified: dev/spark/KEYS
==
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Tue Oct 29 21:05:34 2019
@@ -1050,3 +1050,60 @@ XJ5Dp1pqv9DC6cl9vLSHctRrM2kG
 =mQLW
 -END PGP PUBLIC KEY BLOCK-
 
+pub   rsa4096 2019-10-22 [SC] [expires: 2029-10-19]
+  26959C22EE5D4682049D6526E1B7E0F25E4BF56B
+uid   [ultimate] Xingbo Jiang (CODE SIGNING KEY) 

+sub   rsa4096 2019-10-22 [E] [expires: 2029-10-19]
+
+-BEGIN PGP PUBLIC KEY BLOCK-
+
+mQINBF2uvsQBEAC2Nv1IChLuRTlq/fADcp2Q0dJ4dwsdEZuuGqtius6hiQuJX/6b
+1EoVkns+8EVF88mBLGyw5RtYLqVmfPjuObhw0yHX5wO9ilSjrhyVPf1oSAg7TdZg
+Lm2Sr4Lc1tzMAS+smtkrsNdARzTHYE5s4gFh5Kq65Y4+aNjbNOWomlNIAJumXO62
+5RGYn0cMaqx2DyFmavUYp2ofAf1OnBlTOhZnVFxkl+EIbbZfuynRcWG2dV1IYbPk
++felLi9EDtps0H0MWxZrwqOdHC5gTBoA10XTGmF0mYiTaT60+FclnBTiGipzhrP8
+O4lIWfzokYe9UjE5MX454tsPPNAHQ1b24AYqZJfBOzOn62yVayVzSy2LvRUinbPp
+O7fMmfmMhxQw+TpG5M9wlrxyfNrOwc7kZP8xn0g2MIEPtCEBkkgEB3vrGwvPnIdi
+ZikgUi3zCTWRXhODzdSt/spbtQrAJRaB63KbfdIYRO/wJOQ/I7+ytg/UO+SX1+X1
+nbjjoCUkbShaUPCdxUWM/f9D60Kz9yWG8ue0n911H0s2kBSOUNmerhMt9EJbvW04
+YuLZ3Vtt7ju1b+Ol0Nar2y4H8mVh8DAelc9ZXE+T4CaGWaC0ynqAZbg4y4bvdMvm
+hsdPP0jGHJYh8fxjGbiNGx513i5kshZ+HX9KxX09lsehfVbgPgayk3XQoQARAQAB
+tDhYaW5nYm8gSmlhbmcgKENPREUgU0lHTklORyBLRVkpIDxqaWFuZ3hiMTk4N0Bh
+cGFjaGUub3JnPokCVAQTAQgAPhYhBCaVnCLuXUaCBJ1lJuG34PJeS/VrBQJdrr7E
+AhsDBQkSzAMABQsJCAcCBhUKCQgLAgQWAgMBAh4BAheAAAoJEOG34PJeS/VraCsQ
+AIKYFXrprDCsl/rIc4MhvzLMvibLPWzGgjmkVZ5uJVS9zhGGU6gwatzllbpFeKkV
+uXQrPt8kn2ygx/jhGvRr8ku0eKK2xAoEii89Ta2QVxkDMszKUB31OUZKJDN0ytmM
+i1Y3NeWFwiKWNh9bCh1voamKHtiVHeB3LgxoKPYfYxTodr+lZFZBjo8ZAZEcNAzQ
+iP46XPXCTu0J0jJhdfiP+haofOR49nP7VcD+ALS2AkEwD+2zpkcJLDB/CTEgBxco
+xCCAHL7WKSXqU/526R84JXO8zLVOLIDG3//m4cagQknPvux8eTyu+OZTqwFz1gLc
+Vtvv5tIH3gotmohbTNzzn+9LQhd18YawW+D+1ie29qqwq/K48K8EXpy9JAnBTJv6
+z1VcQ5HS2Va7U/fqQf9TbHLlZiK1u7XVmhvuVzqgLruAAl5zz0AybjNo9iEa+pSB
+7L1MFHx7oEI3aoIiuI39dyrE0JFwUelbfHrpuBgfI/VfFW5vqkBbJrNxb4tSgzcY
++q37nNp72pZkeL5Kn3m9R4M5GDU6saaGsCpqr802f3AGdDfSI4rKfPFA7Y62S/gI
+rs6da1i15sCprQvuYKMIPeUq0YIUoJ0WuHuBX4gF8XfmtGAxriYU0moaGzoCPMuT
+dT4Nc4uYWaTVx5SvIo40w88h0rHg9g6ty65ur4hn7G0iuQINBF2uvsQBEADj0Cb4
+EJKqvV2SxysiW2mdDnRN0c+/aX1zB64U5VMJJl2h4crlYd71hDTXwu0Zt0MMAg+8
+HqkjAAp8szsYTSr0kz/9xSnJzuMuTWnaNe9eCii/HTzHSbkN25JJTThNX5hZ0And
+wmtFfhelgZczIt9EnblCUqsyPiwL1w4vF1x7a2ftbS+k1n2Q+UIYAMZfsoYIlZzZ
+cPKNJdHI0itBPaKBEoOQ3KebCyl91xdXb2elzS1NiID4Rx0dHB6C1w3v8wnBmgaC
+IEv7Z7sS1Fr6NHf3pTnsBiyu/xcwE8eznhH0Bqb/i4qSCA8H6DMzwj5JiZ7yMvGd
+S2PbUCLKSFaeZOcXY9LN9K9rqztxgCSVBqSZ6IB41HnjKd97hO4yMJqvrelg+KUN
+31F3Jp0VBQB1a+j2LLNMJoFcrfG2Tv6R8sx1pwJWySRkdHKX6bB42Dzpd66ReZ0e
+3RrcoOWe0jtXr0WwuxP94EZoyRN/iRQ6k3jR6B3n6RBg8CL0qNc2yRGsuXR9MPfE
+GHUW/cYvZZbfkORYXSPt8pfdziWnr18a6oagcTGdeNyjVWWsQE39psYNM6HnhBSs
+IxkMb7lXTz43MfFk55nrJqE/3LIH3AkJxKr3HOzGbUM832Iiqvg0o0nBRd9B4vLz
+mkdtD+K9hCW6G7g+1hzx1Tqaf6n2Q69gv/zq+wARAQABiQI8BBgBCAAmFiEEJpWc
+Iu5dRoIEnWUm4bfg8l5L9WsFAl2uvsQCGwwFCRLMAwAACgkQ4bfg8l5L9Wv7fw/+
+LDXns0Bph1X/ph2+kUBWL6KC6Fq+f636HvZfaRQMxs9vPXEBWpcTNFAmjPckmNJ+
+y/VbgeUSKR5ylol5jyTIamAR3J9xI0kQdaN7URnBmn2WLIxiRR3houLBMaJFWOnr
+5/4+LJ42R4c1Le3GJOXVfiCustJf0eZTMoAyN4bQHPd2vXb9p/BwQfctTNrvv+ZT
+v5D0rUAZO+kmIO9hzfVnQ/RCDPzMZRXImxWFOKczbwUq3By+ZsoPtG0IHjtbSErt
+/Rl+CK0vzD008Mq2vEP/OJ1IW32+QrokoWRAam87RAe7YATOvWYsMib3m8Hnh0JY
+AwDlgPzhvvymA9eVOscs608ZOtuNzDRTCKTRTv2T6Ktsh7ZBUSU0aXN2nafC+8xA
+/SBAeUqH+C9+aHiP8gdAG9xq2MXs8V5oQRVoYriQZuSlM2r/u0DNq8BGcXbguHjn
+opBNCzchUWR5X19V0KDN8cW9f22RcrenREJhrrn2WfVT/lvBF1E0X70H8ziMHrrf
+fWOysS0CT3qM2hNCa15MH+xsu6Lq56lqzCC5UhR+vTo4SNXSScv2Z5Verf30xCnN
+/JuPHYL8Y48/pX+9dYSAali4oDNQ0mI38j+A44ik8C8o6qOjk7qQwUlluCmV62ut
+kR7loYvuYi9fxvlaW0kc3Bd10JCHCEmobHno5Gflr9I=
+=l4ux
+-END PGP PUBLIC KEY BLOCK-



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] annotated tag v3.0.0-preview-rc1 updated (5eddbb5 -> a480a27)

2019-10-28 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to annotated tag v3.0.0-preview-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git.


*** WARNING: tag v3.0.0-preview-rc1 was modified! ***

from 5eddbb5  (commit)
  to a480a27  (tag)
 tagging 5eddbb5f1d9789696927f435c55df887e50a1389 (commit)
 replaces 3.0.0-preview
  by Xingbo Jiang
  on Mon Oct 28 22:36:57 2019 -0700

- Log -
v3.0.0-preview-rc1
---


No new revisions were added by this update.

Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (5eddbb5 -> b33a58c)

2019-10-28 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 5eddbb5  Prepare Spark release v3.0.0-preview-rc1
 add b33a58c  Revert "Prepare Spark release v3.0.0-preview-rc1"

No new revisions were added by this update.

Summary of changes:
 R/pkg/R/sparkR.R   | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graph/api/pom.xml  | 2 +-
 graph/cypher/pom.xml   | 2 +-
 graph/graph/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 41 files changed, 42 insertions(+), 42 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (fb80dfe -> 5eddbb5)

2019-10-28 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fb80dfe  [SPARK-28158][SQL][FOLLOWUP] HiveUserDefinedTypeSuite: don't 
use RandomDataGenerator to create row for UDT backed by ArrayType
 add 5eddbb5  Prepare Spark release v3.0.0-preview-rc1

No new revisions were added by this update.

Summary of changes:
 R/pkg/R/sparkR.R   | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 4 ++--
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graph/api/pom.xml  | 2 +-
 graph/cypher/pom.xml   | 2 +-
 graph/graph/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 41 files changed, 42 insertions(+), 42 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

svn commit: r36458 - in /dev/spark/v3.0.0-preview-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apac

2019-10-24 Thread jiangxb1987

Author: jiangxb1987
Date: Thu Oct 24 09:29:16 2019
New Revision: 36458

Log:
Apache Spark v3.0.0-preview-rc1 docs


[This commit notification would consist of 1877 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

svn commit: r36457 - /dev/spark/v3.0.0-preview-rc1-bin/

2019-10-24 Thread jiangxb1987

Author: jiangxb1987
Date: Thu Oct 24 09:12:17 2019
New Revision: 36457

Log:
Apache Spark v3.0.0-preview-rc1

Added:
dev/spark/v3.0.0-preview-rc1-bin/
dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz   (with props)
dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.asc
dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.sha512
dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz   (with props)
dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz.asc
dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz.sha512
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT-bin-hadoop2.7.tgz   
(with props)
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT-bin-hadoop2.7.tgz.asc

dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT-bin-hadoop2.7.tgz.sha512

dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT-bin-without-hadoop.tgz   
(with props)

dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT-bin-without-hadoop.tgz.asc

dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT-bin-without-hadoop.tgz.sha512
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT.tgz   (with props)
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT.tgz.asc
dev/spark/v3.0.0-preview-rc1-bin/spark-3.0.0-SNAPSHOT.tgz.sha512

Added: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.asc
==
--- dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.asc (added)
+++ dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.asc Thu Oct 
24 09:12:17 2019
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl2xZxwACgkQ4bfg8l5L
+9WtJng/+Kv6m1B4O1P3FYmC5x3Eq3r1zBRrp0hZdykmDDXUwstAOoZVfL6sGBdMX
+2fyo7lu5GFJOuwNuRgAb4oWNyShAvtRSQsHeDLdhbNUxxiWwY3x9IZH5pI3Uk1eD
+Pzm2cYzjOinGYHEeWRNLUnA5I1pIeZjmhLvEvE9u0k/oxu570pk0MaCwK7aeeXC4
+BbBa2q6BgLOy2+TPbxd88aIl4oZpTBXO+73bEGpW1K4eP3fPLKNgCDmPjYqpRC6z
+1OEMdinzveKZetPNyVP25BNJWDqEalkwJEmC/QJEr3SlPPT8lN12eHomKv5DHyHJ
+gCw9msdb4/RX5oM8/5n3WDjuwhuWVWiEMWlzrWTLtI9wcY4DuzPrXsHdsZPzQzgU
+Jfoh5w8cH7gC0KnrGsKPAGuABh13xXra3+verH1LJmXy5xg3nUnFN6vv8qGJcBVW
+r512rO2XmNgBmidQzuY+7JkdZCnDDg9Du77elzAG3QIVQDDgWGU1osV7ynmsMbiu
+XR8y6XJUIcmfaU3HpjqiIn4U3XTHx676JIPs8RNvMwfzlhoG+NgqfdjGu9SpVMCK
+HqrRfAY2e9XogGCPy4kx7V1Wug49kmIsELIl0dRGLQkfT4M6X21itkWo9nMM9V1M
++287rzDRzcHb6lvcx3l2Sk1DXHkcKDpn7VvpOdxPRFMlj924ZYU=
+=I+VG
+-END PGP SIGNATURE-

Added: dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.sha512
==
--- dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.sha512 (added)
+++ dev/spark/v3.0.0-preview-rc1-bin/SparkR_3.0.0-SNAPSHOT.tar.gz.sha512 Thu 
Oct 24 09:12:17 2019
@@ -0,0 +1,4 @@
+SparkR_3.0.0-SNAPSHOT.tar.gz: F8DBA96E 17909428 BD7A4B9B 2CC3EDDF C5C64440
+  4093CCFC F082154F 4C852AC8 35CE7218 CC855377
+  FBF55297 61F4E098 573892F4 F0EC97D3 708FD39C
+  EE0BA7AD

Added: dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz.asc
==
--- dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz.asc (added)
+++ dev/spark/v3.0.0-preview-rc1-bin/pyspark-3.0.0.dev0.tar.gz.asc Thu Oct 24 
09:12:17 2019
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEJpWcIu5dRoIEnWUm4bfg8l5L9WsFAl2xZx0ACgkQ4bfg8l5L
+9WuFSxAArhO2UwczkBcIztFyEe5dDze7wnD072OQhFvoXKf+Zkn7VOW0ae1tEWcP
+wnqIei2KZK+Ro4hu0Iu8JeI5H/WAZhdkS3pxX/Cl7y9Yg2RFIfMqOtPQFdsFEPWt
+uwGePyvOM4Q/jL/+GaJCwyVsf4sWfDMj0qnuOHjMNPw7zrjPGas3vQYYsFNbiblf
+l0tt/kaW9Ic8PrScrn4liZxULZNZ/b5hEpCRFeyYZt1+FHUPXHI8Lvh9Rc5XcpLE
+ZxNUbrSkcQnMDoHL3P5+lnQWxJabEsXnI5AnVb3fwM72EoLb2jcOGQAzBvY1cs70
+d9N6dzn+Mn9UDMD6+yXOG5ruavdfpEb7kyfa040r3EufZVOXCjdHVWJWOtEU2C9R
+yimAjeq5w/s8rApYO4jMPJU7P/O3iMP3RdOIxM/MXNM1W6uCTFmi0w88eHLqWLgu
+4zdbS5g4UmCo9N55l7+SsBQMIlS2HPQIRwSz7spisM5Pp1Um/W7rDxJwNEcafuZ3
+b/MezuY5k2w5dIzyeHLcf9VCIzouvKnbLHvaFtIB3QoLkCqtZ+N244we3k+T8t1o
+C6McEJyqqdpTyk4nzdqf/W49MbLRWWEBldCcc1wQC0dLUFvrVHeS9zj/U71hpY53

[spark] branch master updated (70dd9c0 -> 0a70951)

2019-10-23 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 70dd9c0  [SPARK-29542][SQL][DOC] Make the descriptions of 
spark.sql.files.* be clearly
 add 0a70951  [SPARK-29499][CORE][PYSPARK] Add mapPartitionsWithIndex for 
RDDBarrier

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/rdd/RDDBarrier.scala| 22 +++
 .../org/apache/spark/rdd/RDDBarrierSuite.scala |  9 ++
 dev/sparktestsupport/modules.py|  1 +
 python/pyspark/rdd.py  | 14 ++
 .../test_group.py => tests/test_rddbarrier.py} | 32 --
 5 files changed, 64 insertions(+), 14 deletions(-)
 copy python/pyspark/{sql/tests/test_group.py => tests/test_rddbarrier.py} (56%)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] annotated tag v3.0.0-preview-rc1 updated (f23c5d7 -> 7a3dad9)

2019-10-23 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to annotated tag v3.0.0-preview-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git.


*** WARNING: tag v3.0.0-preview-rc1 was modified! ***

from f23c5d7  (commit)
  to 7a3dad9  (tag)
 tagging f23c5d7f6705348ddeac0b714b29374cba3a4efe (commit)
 replaces 3.0.0-preview
  by Xingbo Jiang
  on Wed Oct 23 09:41:49 2019 +0200

- Log -
v3.0.0-preview-rc1
---


No new revisions were added by this update.

Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] annotated tag v3.0.0-preview-rc1 updated (322ec0b -> 7c11604)

2019-10-21 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to annotated tag v3.0.0-preview-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git.


*** WARNING: tag v3.0.0-preview-rc1 was modified! ***

from 322ec0b  (commit)
  to 7c11604  (tag)
 tagging 322ec0ba9ba75708cfe679368a43655de7b0e4f9 (commit)
  by Xingbo Jiang
  on Mon Oct 21 11:46:00 2019 +0200

- Log -
v3.0.0-preview-rc1
---


No new revisions were added by this update.

Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] annotated tag 3.0.0-preview updated (322ec0b -> 72c391e)

2019-10-17 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to annotated tag 3.0.0-preview
in repository https://gitbox.apache.org/repos/asf/spark.git.


*** WARNING: tag 3.0.0-preview was modified! ***

from 322ec0b  (commit)
  to 72c391e  (tag)
 tagging 322ec0ba9ba75708cfe679368a43655de7b0e4f9 (commit)
  by Xingbo Jiang
  on Thu Oct 17 19:04:08 2019 +0200

- Log -
3.0.0-preview
---


No new revisions were added by this update.

Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] annotated tag 3.0.0-preview updated (322ec0b -> 72c391e)

2019-10-17 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to annotated tag 3.0.0-preview
in repository https://gitbox.apache.org/repos/asf/spark.git.


*** WARNING: tag 3.0.0-preview was modified! ***

from 322ec0b  (commit)
  to 72c391e  (tag)
 tagging 322ec0ba9ba75708cfe679368a43655de7b0e4f9 (commit)
  by Xingbo Jiang
  on Thu Oct 17 19:04:08 2019 +0200

- Log -
3.0.0-preview
---


No new revisions were added by this update.

Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.0-preview created (now 02c5b4f)

2019-10-15 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch branch-3.0-preview
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 02c5b4f  [SPARK-28947][K8S] Status logging not happens at an interval 
for liveness

No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (d2f21b0 -> 56a3beb)

2019-10-07 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from d2f21b0  [SPARK-27468][CORE] Track correct storage level of RDDs and 
partitions
 add 56a3beb  [SPARK-27492][DOC][FOLLOWUP] Update resource scheduling user 
docs

No new revisions were added by this update.

Summary of changes:
 docs/configuration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-2.4 updated: [SPARK-29263][SCHEDULER] Update `availableSlots` in `resourceOffers()` before checking available slots for barrier taskSet

2019-09-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 99e503c  [SPARK-29263][SCHEDULER] Update `availableSlots` in 
`resourceOffers()` before checking available slots for barrier taskSet
99e503c is described below

commit 99e503cebfd9cb19372c88b0dd70c6743f864454
Author: Juliusz Sompolski 
AuthorDate: Fri Sep 27 11:18:32 2019 -0700

[SPARK-29263][SCHEDULER] Update `availableSlots` in `resourceOffers()` 
before checking available slots for barrier taskSet

### What changes were proposed in this pull request?

availableSlots are computed before the for loop looping over all TaskSets 
in resourceOffers. But the number of slots changes in every iteration, as in 
every iteration these slots are taken. The number of available slots checked by 
a barrier task set has therefore to be recomputed in every iteration from 
availableCpus.

### Why are the changes needed?

Bugfix.
This could make resourceOffer attempt to start a barrier task set, even 
though it has not enough slots available. That would then be caught by the 
`require` in line 519, which will throw an exception, which will get caught and 
ignored by Dispatcher's MessageLoop, so nothing terrible would happen, but the 
exception would prevent resourceOffers from considering further TaskSets.
Note that launching the barrier TaskSet can still fail if other 
requirements are not satisfied, and still can be rolled-back by throwing 
exception in this `require`. Handling it more gracefully remains a TODO in 
SPARK-24818, but this fix at least should resolve the situation when it's 
unable to launch because of insufficient slots.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Added UT

Closes #23375

Closes #25946 from juliuszsompolski/SPARK-29263.

Authored-by: Juliusz Sompolski 
Signed-off-by: Xingbo Jiang 
(cherry picked from commit 420abb457df0f422f73bab19a6ed6d7c6bab3173)
Signed-off-by: Xingbo Jiang 
---
 .../apache/spark/scheduler/TaskSchedulerImpl.scala |  2 +-
 .../org/apache/spark/scheduler/FakeTask.scala  | 36 +++
 .../spark/scheduler/TaskSchedulerImplSuite.scala   | 51 --
 3 files changed, 65 insertions(+), 24 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala 
b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
index e194b79..38dbbe7 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
@@ -391,7 +391,6 @@ private[spark] class TaskSchedulerImpl(
 // Build a list of tasks to assign to each worker.
 val tasks = shuffledOffers.map(o => new 
ArrayBuffer[TaskDescription](o.cores / CPUS_PER_TASK))
 val availableCpus = shuffledOffers.map(o => o.cores).toArray
-val availableSlots = shuffledOffers.map(o => o.cores / CPUS_PER_TASK).sum
 val sortedTaskSets = rootPool.getSortedTaskSetQueue
 for (taskSet <- sortedTaskSets) {
   logDebug("parentName: %s, name: %s, runningTasks: %s".format(
@@ -405,6 +404,7 @@ private[spark] class TaskSchedulerImpl(
 // of locality levels so that it gets a chance to launch local tasks on 
all of them.
 // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, 
RACK_LOCAL, ANY
 for (taskSet <- sortedTaskSets) {
+  val availableSlots = availableCpus.map(c => c / CPUS_PER_TASK).sum
   // Skip the barrier taskSet if the available slots are less than the 
number of pending tasks.
   if (taskSet.isBarrier && availableSlots < taskSet.numTasks) {
 // Skip the launch process.
diff --git a/core/src/test/scala/org/apache/spark/scheduler/FakeTask.scala 
b/core/src/test/scala/org/apache/spark/scheduler/FakeTask.scala
index b29d32f..abc8841 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/FakeTask.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/FakeTask.scala
@@ -42,15 +42,23 @@ object FakeTask {
* locations for each task (given as varargs) if this sequence is not empty.
*/
   def createTaskSet(numTasks: Int, prefLocs: Seq[TaskLocation]*): TaskSet = {
-createTaskSet(numTasks, stageAttemptId = 0, prefLocs: _*)
+createTaskSet(numTasks, stageId = 0, stageAttemptId = 0, priority = 0, 
prefLocs: _*)
   }
 
-  def createTaskSet(numTasks: Int, stageAttemptId: Int, prefLocs: 
Seq[TaskLocation]*): TaskSet = {
-createTaskSet(numTasks, stageId = 0, stageAttemptId, prefLocs: _*)
+  def createTaskSet(
+  numTasks: Int,
+  stageId: Int,
+  stageAttemptId: Int,
+  prefLocs: Seq[TaskLocation]*): TaskSet = {
+

[spark] branch master updated (fda0e6e -> 420abb4)

2019-09-27 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fda0e6e  [SPARK-29240][PYTHON] Pass Py4J column instance to support 
PySpark column in element_at function
 add 420abb4  [SPARK-29263][SCHEDULER] Update `availableSlots` in 
`resourceOffers()` before checking available slots for barrier taskSet

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/scheduler/TaskSchedulerImpl.scala |  2 +-
 .../org/apache/spark/scheduler/FakeTask.scala  | 36 +++
 .../spark/scheduler/TaskSchedulerImplSuite.scala   | 51 --
 3 files changed, 65 insertions(+), 24 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-28561][WEBUI] DAG viz for barrier-execution mode

2019-08-12 Thread jiangxb1987

This is an automated email from the ASF dual-hosted git repository.

jiangxb1987 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 247bebc  [SPARK-28561][WEBUI] DAG viz for barrier-execution mode
247bebc is described below

commit 247bebcf94df77883ac245aea63e7e871fc7aa44
Author: Kousuke Saruta 
AuthorDate: Mon Aug 12 22:38:10 2019 -0700

[SPARK-28561][WEBUI] DAG viz for barrier-execution mode

## What changes were proposed in this pull request?

In the current UI, we cannot identify which RDDs are barrier. Visualizing 
it will make easy to debug.
Following images are shown after this change.

![Screenshot from 2019-07-30 
16-30-35](https://user-images.githubusercontent.com/4736016/62110508-83cec100-b2e9-11e9-83b9-bc2e485a4cbe.png)
![Screenshot from 2019-07-30 
16-31-09](https://user-images.githubusercontent.com/4736016/62110509-83cec100-b2e9-11e9-9e2e-47c4dae23a52.png)

The boxes in pale green mean barrier (We might need to discuss which color 
is proper).

## How was this patch tested?

Tested manually.
The images above are shown by following operations.

```
val rdd1 = sc.parallelize(1 to 10)
val rdd2 = sc.parallelize(1 to 10)
val rdd3 = rdd1.zip(rdd2).barrier.mapPartitions(identity(_))
val rdd4 = rdd3.map(identity(_))
val rdd5 = rdd4.reduceByKey(_+_)
rdd5.collect
```

Closes #25296 from sarutak/barrierexec-dagviz.

Authored-by: Kousuke Saruta 
Signed-off-by: Xingbo Jiang 
---
 .../org/apache/spark/ui/static/spark-dag-viz.css   | 12 +
 .../org/apache/spark/ui/static/spark-dag-viz.js|  6 +
 .../scala/org/apache/spark/status/storeTypes.scala |  4 ++-
 .../scala/org/apache/spark/storage/RDDInfo.scala   |  3 ++-
 .../main/scala/org/apache/spark/ui/UIUtils.scala   |  3 +++
 .../apache/spark/ui/scope/RDDOperationGraph.scala  | 30 +-
 .../scala/org/apache/spark/util/JsonProtocol.scala |  4 ++-
 .../spark/status/AppStatusListenerSuite.scala  |  6 ++---
 .../org/apache/spark/storage/StorageSuite.scala|  4 +--
 .../spark/ui/scope/RDDOperationGraphSuite.scala| 10 
 .../org/apache/spark/util/JsonProtocolSuite.scala  |  7 ++---
 11 files changed, 67 insertions(+), 22 deletions(-)

diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.css 
b/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.css
index 9cc5c79..1fbc90b 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.css
+++ b/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.css
@@ -89,6 +89,12 @@
   stroke-width: 2px;
 }
 
+#dag-viz-graph svg.job g.cluster.barrier rect {
+  fill: #B4E9E2;
+  stroke: #32DBC6;
+  stroke-width: 2px;
+}
+
 /* Stage page specific styles */
 
 #dag-viz-graph svg.stage g.cluster rect {
@@ -123,6 +129,12 @@
   stroke-width: 2px;
 }
 
+#dag-viz-graph svg.stage g.cluster.barrier rect {
+  fill: #84E9E2;
+  stroke: #32DBC6;
+  stroke-width: 2px;
+}
+
 .tooltip-inner {
   white-space: pre-wrap;
 }
diff --git 
a/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.js 
b/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.js
index cf508ac..035d72f 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.js
@@ -172,6 +172,12 @@ function renderDagViz(forJob) {
 svg.selectAll("g." + nodeId).classed("cached", true);
   });
 
+  metadataContainer().selectAll(".barrier-rdd").each(function() {
+var rddId = d3.select(this).text().trim()
+var clusterId = VizConstants.clusterPrefix + rddId
+svg.selectAll("g." + clusterId).classed("barrier", true)
+  });
+
   resizeSvg(svg);
   interpretLineBreak(svg);
 }
diff --git a/core/src/main/scala/org/apache/spark/status/storeTypes.scala 
b/core/src/main/scala/org/apache/spark/status/storeTypes.scala
index eea47b3..9da5bea 100644
--- a/core/src/main/scala/org/apache/spark/status/storeTypes.scala
+++ b/core/src/main/scala/org/apache/spark/status/storeTypes.scala
@@ -402,7 +402,9 @@ private[spark] class RDDOperationClusterWrapper(
 val childClusters: Seq[RDDOperationClusterWrapper]) {
 
   def toRDDOperationCluster(): RDDOperationCluster = {
-val cluster = new RDDOperationCluster(id, name)
+val isBarrier = childNodes.exists(_.barrier)
+val name = if (isBarrier) this.name + "\n(barrier mode)" else this.name
+val cluster = new RDDOperationCluster(id, isBarrier, name)
 childNodes.foreach(cluster.attachChildNode)
 childClusters.foreach { child =>
   cluster.attachChildCluster(child.toRDDOperationCluster())
diff --git a/core/src/main/scala/org/apache/spark/storage/RDDInfo.scala 
b/core/src/main/scala/org/apache/spark/sto

1 2 >

1 - 100 of 108 matches

Mail list logo