squito commented on a change in pull request #25951: [SPARK-28917][CORE] 
Synchronize access to RDD mutable state.
URL: https://github.com/apache/spark/pull/25951#discussion_r329252907
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala
 ##########
 @@ -223,12 +236,12 @@ abstract class RDD[T: ClassTag](
   }
 
   /** Get the RDD's current storage level, or StorageLevel.NONE if none is 
set. */
-  def getStorageLevel: StorageLevel = storageLevel
+  def getStorageLevel: StorageLevel = stateLock.synchronized { storageLevel }
 
 Review comment:
   I will be honest, I don't really have a good understanding of how the 
mutability of storageLevel could be an actual problem.  I do know of an 
instance where moving a `.cache()` before the creation of a Future which 
submitted a job fixed a hanging job.  Unfortunately I have very little info 
about what else was going on in that job.
   
   I have audited the uses of `storageLevel`, like `getOrCompute`, and I think 
it's OK -- there it's passing in a local copy, and anyway that whole function 
runs on the executor, where the storageLevel is immutable.
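   To illustrate what I mean by "passing in a local copy": a minimal sketch (not the actual Spark code -- `SketchRDD` and the other names here are hypothetical stand-ins) of guarding mutable state with a dedicated lock object and handing the caller a snapshot, so later decisions never see a concurrent mutation of the field:

```scala
// Sketch only: a tiny stand-in for the RDD state-lock pattern.
object StateLockSketch {
  // Hypothetical stand-in for StorageLevel.
  sealed trait StorageLevel
  case object NONE extends StorageLevel
  case object MEMORY_ONLY extends StorageLevel

  class SketchRDD {
    // Dedicated lock object guarding the mutable field below.
    private val stateLock = new Object
    private var storageLevel: StorageLevel = NONE

    // All writes go through the lock.
    def persist(level: StorageLevel): Unit = stateLock.synchronized {
      storageLevel = level
    }

    // Read once under the lock, then use only the local copy, so the
    // rest of the method is unaffected by a concurrent persist().
    def getOrComputeSketch(): StorageLevel = {
      val level = stateLock.synchronized { storageLevel }
      // ... every further decision uses `level`, not the field ...
      level
    }
  }
}
```

   The point of the snapshot is that even if another thread calls `persist` mid-method, this method keeps reasoning about one consistent value.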
   
   Another tricky one is `DAGScheduler.getCacheLocs`, which reads both the 
storage level and `rdd.partitions`.  That gets exposed via 
`SparkContext.getPreferredLocs`, and used in `CoalescedRDD` and 
`PartitionerAwareUnionRDD`.  But I still can't come up with an explanation for 
the behavior we see.  I will try to poke some more.
   
   I could also submit a change that just covers `partitions` and 
`dependencies` for now, since that case is clear, though I'd like to understand 
the problem with `storageLevel` as well.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
