Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22112#discussion_r212654451
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1865,6 +1871,51 @@ abstract class RDD[T: ClassTag](
   // RDD chain.
   @transient protected lazy val isBarrier_ : Boolean =
     dependencies.filter(!_.isInstanceOf[ShuffleDependency[_, _, _]]).exists(_.rdd.isBarrier())
+
+  /**
+   * Returns the random level of this RDD's output. Please refer to [[RandomLevel]] for the
+   * definition.
+   *
+   * By default, an RDD without parents (a root RDD) is IDEMPOTENT. For RDDs with parents, we
+   * will generate a random level candidate per parent according to the dependency. The random
+   * level of the current RDD is the most random of these candidates.
+   */
+  // TODO: make it public so users can set the random level of their custom RDDs.
+  // TODO: this can be per-partition, e.g. UnionRDD can have different random levels for
+  // different partitions.
+  private[spark] def outputRandomLevel: RandomLevel.Value = {
+    val randomLevelCandidates = dependencies.map {
--- End diff --
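
The diff above is cut off right after `dependencies.map {`, so the candidate computation itself is not shown. Purely as a reading aid, here is a self-contained sketch of the "one candidate per parent, keep the most random" reduction described in the Scaladoc; the enumeration values other than IDEMPOTENT, the `Node` type, and all names below are hypothetical and not taken from the PR.

```
// Sketch only: a made-up enumeration and a toy node type standing in for the
// RDD/dependency graph, just to show the "most random candidate wins" reduction.
object RandomLevelSketch {

  // Hypothetical ordering: a larger id means "more random". Only IDEMPOTENT is
  // named in the Scaladoc above; the other values are my guess.
  object RandomLevel extends Enumeration {
    val IDEMPOTENT, UNORDERED, INDETERMINATE = Value
  }

  // Toy stand-in for an RDD: its parents, plus the level it has if it is a root.
  // Real code would derive each candidate from the dependency type instead.
  final case class Node(parents: Seq[Node],
                        rootLevel: RandomLevel.Value = RandomLevel.IDEMPOTENT)

  def outputRandomLevel(node: Node): RandomLevel.Value =
    if (node.parents.isEmpty) {
      node.rootLevel                               // a root RDD is IDEMPOTENT by default
    } else {
      // One candidate per parent; the result is the "most random" candidate.
      node.parents.map(outputRandomLevel).maxBy(_.id)
    }

  def main(args: Array[String]): Unit = {
    val cleanRoot = Node(Nil)
    val randomRoot = Node(Nil, RandomLevel.INDETERMINATE)
    // The INDETERMINATE parent dominates, so the child is INDETERMINATE.
    println(outputRandomLevel(Node(Seq(cleanRoot, randomRoot))))
  }
}
```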
@mridulm I didn't add
```
// if same partitioner, then shuffle not done.
case dep: ShuffleDependency[_, _, _] if dep.partitioner == partitioner =>
  dep.rdd.computingRandomLevel
```
IIUC this condition is always true for `ShuffledRDD`?
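
As a side note on that question: `ShuffledRDD` builds its single `ShuffleDependency` from the same `part` argument it exposes as its `partitioner`, so the dependency's partitioner is the RDD's own partitioner by construction. A minimal standalone check (the object name and local setup below are mine, not from the PR) could look like:

```
import org.apache.spark.{HashPartitioner, ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.ShuffledRDD

// Quick check that a ShuffledRDD's shuffle dependency reuses the RDD's own partitioner.
object ShuffledRddPartitionerCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[1]").setAppName("partitioner-check"))
    val part = new HashPartitioner(4)
    val pairs = sc.parallelize(Seq(1 -> "a", 2 -> "b"))
    val shuffled = new ShuffledRDD[Int, String, String](pairs, part)

    val dep = shuffled.dependencies.head.asInstanceOf[ShuffleDependency[Int, String, String]]
    // Both point at the very same HashPartitioner instance passed to the constructor.
    assert(shuffled.partitioner.contains(dep.partitioner))
    sc.stop()
  }
}
```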
---