Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20091
As currently used in `defaultPartitioner()`, `defaultParallelism` only takes
effect when `spark.default.parallelism` is explicitly set. Before #20002, if
any upstream RDD had an existing partitioner, we would not create a new
partitioner using `defaultParallelism`. After that change, we may create a new
partitioner with `defaultParallelism` partitions when the safety check fails
and `spark.default.parallelism` is explicitly set. However, `defaultParallelism`
can be smaller than the `numPartitions` of the existing partitioner, in which
case the new partitioner would still fail the safety check. I'm proposing that
in the regression case described above, we should still use the existing
partitioner instead of creating a new one with fewer partitions.
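
A minimal sketch of the proposed decision logic, written as a standalone helper rather than the actual `Partitioner.defaultPartitioner` internals. The `isEligiblePartitioner` method here is a hypothetical restatement of the order-of-magnitude safety check introduced in #20002, and the extra `defaultNumPartitions < numPartitions` condition is the fix proposed above:

```scala
import scala.math.log10

import org.apache.spark.{HashPartitioner, Partitioner}
import org.apache.spark.rdd.RDD

object DefaultPartitionerSketch {
  // Hypothetical restatement of the safety check from #20002: the existing
  // partitioner is "eligible" if its partition count is within one order of
  // magnitude of the largest upstream partition count.
  private def isEligiblePartitioner(maxPartitioner: RDD[_], rdds: Seq[RDD[_]]): Boolean = {
    val maxPartitions = rdds.map(_.partitions.length).max
    log10(maxPartitions) - log10(maxPartitioner.partitioner.get.numPartitions) < 1
  }

  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val rdds = Seq(rdd) ++ others
    val hasPartitioner = rdds.filter(_.partitioner.exists(_.numPartitions > 0))
    val hasMaxPartitioner: Option[RDD[_]] =
      if (hasPartitioner.nonEmpty) Some(hasPartitioner.maxBy(_.partitions.length)) else None

    // defaultParallelism only takes effect when spark.default.parallelism
    // is explicitly set; otherwise fall back to the widest upstream RDD.
    val defaultNumPartitions =
      if (rdd.context.getConf.contains("spark.default.parallelism")) {
        rdd.context.defaultParallelism
      } else {
        rdds.map(_.partitions.length).max
      }

    // Proposed fix: reuse the existing partitioner not only when it passes
    // the safety check, but also when defaultParallelism would shrink the
    // partition count below what the existing partitioner already provides.
    hasMaxPartitioner match {
      case Some(maxRdd)
          if isEligiblePartitioner(maxRdd, rdds) ||
            defaultNumPartitions < maxRdd.partitioner.get.numPartitions =>
        maxRdd.partitioner.get
      case _ =>
        new HashPartitioner(defaultNumPartitions)
    }
  }
}
```

With the extra condition, a small explicitly-set `spark.default.parallelism` can no longer replace a wider existing partitioner with a narrower `HashPartitioner` that would itself fail the same safety check.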