Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20091
The major concern is that `spark.default.parallelism` is usually set to a
relatively small value, so if the safety check fails, `defaultParallelism` can
end up even smaller than the number of partitions of the existing partitioner;
this is the regression case I want to fix in this PR.
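For concreteness, a minimal sketch of that scenario (the object name, partition
counts, and the `spark.default.parallelism` value below are made up for
illustration, not taken from this PR): the left RDD already carries a
200-partition `HashPartitioner`, while `spark.default.parallelism` is 8, so
falling back to `defaultParallelism` would shrink the join output well below
the existing partitioner.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

object DefaultPartitionerShrinkDemo {
  def main(args: Array[String]): Unit = {
    // spark.default.parallelism is much smaller than the existing partitioner below.
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("default-partitioner-check")
      .config("spark.default.parallelism", "8")
      .getOrCreate()
    val sc = spark.sparkContext

    // The left side is already hash-partitioned into 200 partitions.
    val left = sc.parallelize(1 to 100000)
      .map(i => (i % 1000, i))
      .partitionBy(new HashPartitioner(200))

    // The right side has no partitioner.
    val right = sc.parallelize(1 to 100000)
      .map(i => (i % 1000, i.toString))

    // If the default partitioner ignores left's partitioner and falls back to
    // defaultParallelism, the join output collapses from 200 to 8 partitions.
    val joined = left.join(right)
    println(s"join output partitions: ${joined.getNumPartitions}")

    spark.stop()
  }
}
```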
A further issue is that we should rethink whether we should rely on
`defaultParallelism` to determine the numPartitions of the default partitioner,
or whether the number of partitions should be determined completely dynamically
from the upstream RDDs. The current same-as-defaultParallelism approach is
quite prone to causing OOM during the shuffle stage.
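To make that alternative concrete, here is a hedged sketch (not an existing
Spark API; `UpstreamDrivenPartitioner` is a hypothetical helper for discussion
only) of choosing the partitioner purely from the upstream RDDs rather than
from `defaultParallelism`:

```scala
import org.apache.spark.{HashPartitioner, Partitioner}
import org.apache.spark.rdd.RDD

// Hypothetical helper, only for discussion: derive the partitioner from the
// upstream RDDs themselves instead of consulting spark.default.parallelism.
object UpstreamDrivenPartitioner {
  def choose(rdds: Seq[RDD[_]]): Partitioner = {
    // Reuse the widest existing partitioner so we never shuffle into fewer
    // partitions than an already-partitioned upstream RDD.
    val existing = rdds.flatMap(_.partitioner).filter(_.numPartitions > 0)
    if (existing.nonEmpty) {
      existing.maxBy(_.numPartitions)
    } else {
      // Otherwise, size the shuffle by the largest upstream partition count.
      new HashPartitioner(rdds.map(_.getNumPartitions).max)
    }
  }
}
```

This only avoids shrinking below the upstream partition counts; whether
numPartitions should also scale with upstream data sizes to really avoid
shuffle OOM is a separate question.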
---