Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/20091#discussion_r162778187
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -43,17 +43,19 @@ object Partitioner {
   /**
    * Choose a partitioner to use for a cogroup-like operation between a number of RDDs.
    *
-   * If any of the RDDs already has a partitioner, and the number of partitions of the
-   * partitioner is either greater than or is less than and within a single order of
-   * magnitude of the max number of upstream partitions, choose that one.
+   * If spark.default.parallelism is set, we'll use the value of SparkContext defaultParallelism
+   * as the default partitions number, otherwise we'll use the max number of upstream partitions.
    *
-   * Otherwise, we use a default HashPartitioner. For the number of partitions, if
-   * spark.default.parallelism is set, then we'll use the value from SparkContext
-   * defaultParallelism, otherwise we'll use the max number of upstream partitions.
+   * If any of the RDDs already has a partitioner, and the partitioner is an eligible one (with a
+   * partitions number that is not less than the max number of upstream partitions by an order of
+   * magnitude), or the number of partitions is larger than the default one, we'll choose the
+   * exsiting partitioner.
--- End diff --
We should rephrase this for clarity.
How about
"When available, we choose the partitioner from the RDD with the maximum number of
partitions. If this partitioner is eligible (its number of partitions is within an
order of magnitude of the maximum number of partitions across the RDDs), or it has
more partitions than the default partitions number, we use this partitioner."
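
The rule being described could be sketched roughly as below. This is an illustrative simplification, not the actual `Partitioner.defaultPartitioner` code; `isEligible` and `choosePartitions` are hypothetical names, and the eligibility check assumes the "single order of magnitude" rule is implemented with base-10 logarithms.

```scala
// Hypothetical sketch of the partitioner-selection rule under discussion.
// Not Spark's actual implementation; names are illustrative only.
object PartitionerChoiceSketch {
  // An existing partitioner is "eligible" if its partition count is within a
  // single order of magnitude of the max number of upstream partitions.
  def isEligible(partitions: Int, maxUpstreamPartitions: Int): Boolean =
    math.log10(maxUpstreamPartitions) - math.log10(partitions) < 1

  // Reuse the existing partitioner's partition count when it is eligible or
  // at least as large as the default; otherwise fall back to the default
  // (spark.default.parallelism if set, else the max upstream partitions).
  def choosePartitions(existing: Option[Int],
                       maxUpstream: Int,
                       default: Int): Int =
    existing match {
      case Some(p) if isEligible(p, maxUpstream) || p >= default => p
      case _ => default
    }
}
```

For example, with 100 upstream partitions an existing 11-partition partitioner is eligible (within one order of magnitude), while a 5-partition one is not and the default would be used instead.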
---