Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20091
As currently used in `defaultPartitioner()`, `defaultParallelism` only takes
effect when `spark.default.parallelism` is explicitly set. Before #20002, if
any upstream RDD had an existing partitioner, we would not create a new
partitioner using `defaultParallelism`. After that change, we may create a new
partitioner with `defaultParallelism` partitions when the safety check fails
and `spark.default.parallelism` is explicitly set. However, `defaultParallelism`
can be smaller than the `numPartitions` of the existing partitioner, in which
case the new partitioner would still fail the safety check. I'm proposing that
in the regression case described above, we should still use the existing
partitioner instead of creating a new one with fewer partitions.
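
A minimal sketch of the proposed decision logic, written as a standalone helper rather than the actual `Partitioner.defaultPartitioner` internals. The `isEligiblePartitioner` method here is a hypothetical restatement of the order-of-magnitude safety check introduced in #20002, and the extra `defaultNumPartitions < numPartitions` condition is the fix proposed above:

```scala
import scala.math.log10

import org.apache.spark.{HashPartitioner, Partitioner}
import org.apache.spark.rdd.RDD

object DefaultPartitionerSketch {
  // Hypothetical restatement of the safety check from #20002: the existing
  // partitioner is "eligible" if its partition count is within one order of
  // magnitude of the largest upstream partition count.
  private def isEligiblePartitioner(maxPartitioner: RDD[_], rdds: Seq[RDD[_]]): Boolean = {
    val maxPartitions = rdds.map(_.partitions.length).max
    log10(maxPartitions) - log10(maxPartitioner.partitioner.get.numPartitions) < 1
  }

  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val rdds = Seq(rdd) ++ others
    val hasPartitioner = rdds.filter(_.partitioner.exists(_.numPartitions > 0))
    val hasMaxPartitioner: Option[RDD[_]] =
      if (hasPartitioner.nonEmpty) Some(hasPartitioner.maxBy(_.partitions.length)) else None

    // defaultParallelism only takes effect when spark.default.parallelism
    // is explicitly set; otherwise fall back to the widest upstream RDD.
    val defaultNumPartitions =
      if (rdd.context.getConf.contains("spark.default.parallelism")) {
        rdd.context.defaultParallelism
      } else {
        rdds.map(_.partitions.length).max
      }

    // Proposed fix: reuse the existing partitioner not only when it passes
    // the safety check, but also when defaultParallelism would shrink the
    // partition count below what the existing partitioner already provides.
    hasMaxPartitioner match {
      case Some(maxRdd)
          if isEligiblePartitioner(maxRdd, rdds) ||
            defaultNumPartitions < maxRdd.partitioner.get.numPartitions =>
        maxRdd.partitioner.get
      case _ =>
        new HashPartitioner(defaultNumPartitions)
    }
  }
}
```

With the extra condition, a small explicitly-set `spark.default.parallelism` can no longer replace a wider existing partitioner with a narrower `HashPartitioner` that would itself fail the same safety check.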