Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11242#discussion_r56886896
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/UnionRDD.scala ---
    @@ -62,7 +64,23 @@ class UnionRDD[T: ClassTag](
         var rdds: Seq[RDD[T]])
       extends RDD[T](sc, Nil) {  // Nil since we implement getDependencies
     
    +  // Evaluate partitions in parallel. Partitions of each rdd will be cached by the `partitions`
    +  // val in `RDD`.
    +  private[spark] lazy val parallelPartitionEval: Boolean = {
    --- End diff --
    
    If an implementation is doing that and it isn't thread-safe, surely that's
    a busted implementation? You can't expect that two instances of the class
    in the same JVM will never be accessed concurrently (unless you design for
    it, of course). At the least, no data is read in this path anyway, is it?
    I'm trying to avoid designing around this, since this would be a hidden
    flag that is on by default, and it's not clear how someone would reasonably
    figure out that this was the problem and that this flag was the solution.


