Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20091
  
    As currently used in `defaultPartitioner()`, `defaultParallelism` 
only takes effect when `spark.default.parallelism` is explicitly set. Previously, 
before #20002 , if there was an existing partitioner among the upstream RDDs, we 
wouldn't create a new partitioner using `defaultParallelism`. After that change, 
we may create a new partitioner whose number of partitions is `defaultParallelism` 
when the safety check fails and `spark.default.parallelism` is explicitly set. 
However, `defaultParallelism` can be smaller than the `numPartitions` of the 
existing partitioner, in which case the new partitioner would still fail the 
safety check. I'm proposing that in the regression case described above, we 
should still use the existing partitioner instead of creating a new partitioner 
with fewer partitions.
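
    To make the proposal concrete, here is a minimal Python sketch of the partition-count selection being discussed. The names `is_eligible` and `choose_num_partitions` are hypothetical stand-ins, not Spark's actual Scala implementation; the order-of-magnitude eligibility check mirrors the safety check mentioned above, and the `max_existing >= new_count` clause is the proposed fallback to the existing partitioner.

    ```python
    def is_eligible(max_existing_partitions, max_upstream_partitions):
        # Safety check (sketch): the existing partitioner is considered
        # eligible if its partition count is within an order of magnitude
        # of the largest upstream RDD's partition count.
        return max_existing_partitions >= max_upstream_partitions / 10

    def choose_num_partitions(existing_partitions, upstream_partitions,
                              default_parallelism=None):
        """Pick the partition count, following the proposal: reuse the
        existing partitioner when a newly created one would have fewer
        partitions."""
        max_existing = max(existing_partitions) if existing_partitions else 0
        max_upstream = max(upstream_partitions)
        # A new partitioner would use spark.default.parallelism if set,
        # otherwise the largest upstream partition count.
        new_count = (default_parallelism if default_parallelism is not None
                     else max_upstream)
        if existing_partitions and (
                is_eligible(max_existing, max_upstream)
                or max_existing >= new_count):
            # Proposed behavior: keep the existing partitioner rather than
            # build a new one with fewer partitions.
            return max_existing
        return new_count
    ```

    Under this sketch, an existing partitioner with 100 partitions is reused even when `spark.default.parallelism` is set to a smaller value like 8, which is exactly the regression case described above.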

