[GitHub] spark pull request #22371: [SPARK-25386][CORE] Don't need to synchronize the...

squito Mon, 10 Sep 2018 09:27:15 -0700

GitHub user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22371#discussion_r216387272
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala ---
    @@ -138,13 +154,22 @@ private[spark] class IndexShuffleBlockResolver(
           mapId: Int,
           lengths: Array[Long],
           dataTmp: File): Unit = {
    +    val mapLocks = shuffleIdToLocks.get(shuffleId)
    +    require(mapLocks != null, "Shuffle should be registered to IndexShuffleBlockResolver first")
    +    val lock = mapLocks.synchronized {
    --- End diff --
    
    In the usual case, multiple threads still share the same shuffle ID;
    they just write the output of different map tasks. (That is what happens in
    the simple example job @ConeyLiu gave:
    `spark.range(0, 10000000, 1, 100).repartition(200).count()`.)
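
To make the point concrete, here is a minimal sketch of the lookup pattern in the diff. It is not the PR's actual code: the nested map layout, the `ShuffleLockSketch` object, and the `registerShuffle` and `lockFor` helpers are assumptions for illustration only. It shows that two map tasks of one shuffle resolve to the same `mapLocks` object (so `mapLocks.synchronized` is a shared point of contention) while still ending up with distinct per-map locks.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable

// Hypothetical stand-in for the resolver's lock registry; not Spark's actual code.
object ShuffleLockSketch {
  // shuffleId -> (mapId -> lock object). The names mirror the diff,
  // but this data layout is an assumption.
  private val shuffleIdToLocks =
    new ConcurrentHashMap[Int, mutable.HashMap[Int, Object]]()

  // Hypothetical registration step, implied by the require() message in the diff.
  def registerShuffle(shuffleId: Int): Unit = {
    shuffleIdToLocks.putIfAbsent(shuffleId, new mutable.HashMap[Int, Object]())
  }

  // Returns the per-shuffle map (shared by all tasks of that shuffle)
  // together with the lock for one specific map task.
  def lockFor(shuffleId: Int, mapId: Int): (AnyRef, Object) = {
    val mapLocks = shuffleIdToLocks.get(shuffleId)
    require(mapLocks != null,
      "Shuffle should be registered to IndexShuffleBlockResolver first")
    // Every task of this shuffle synchronizes on the same mapLocks object here,
    // but only for as long as it takes to fetch or create its own lock.
    val lock = mapLocks.synchronized {
      mapLocks.getOrElseUpdate(mapId, new Object())
    }
    (mapLocks, lock)
  }

  def main(args: Array[String]): Unit = {
    registerShuffle(0)
    val (sharedA, lockA) = lockFor(0, 1)
    val (sharedB, lockB) = lockFor(0, 2)
    println(s"same mapLocks object: ${sharedA eq sharedB}")
    println(s"same per-map lock: ${lockA eq lockB}")
  }
}
```

In this sketch the shared monitor is held only for the map lookup, so whatever work is done under the per-map lock can proceed in parallel for different map IDs.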


