Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/20091
    @jiangxb1987 I am not disagreeing with your hypothesis that the default
parallelism might not be optimal in all cases within an application (for
example, when different RDDs in an application have widely varying
cardinalities).
    However, since spark.default.parallelism is an exposed interface that
applications depend on, changing its semantics here would be a functional
regression and would break an exposed contract in Spark.
    This is why we have the option of explicitly overriding the number of
partitions when the default does not work well.
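    To make that last point concrete, here is a minimal, standalone sketch
(not part of this PR; the numbers, app name, and dataset sizes are purely
illustrative) of keeping spark.default.parallelism as the application-wide
default while overriding the partition count on a specific operation whose
cardinality is very different:

```scala
// Sketch only: shows the existing override mechanism, not a change proposed here.
import org.apache.spark.sql.SparkSession

object PartitionOverrideSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-override-sketch")   // hypothetical app name
      .master("local[4]")
      .config("spark.default.parallelism", "8") // application-wide default
      .getOrCreate()
    val sc = spark.sparkContext

    // Small RDD: relies on spark.default.parallelism (8) for the shuffle.
    val small = sc.parallelize(1 to 100)
      .map(x => (x % 10, x))
      .reduceByKey(_ + _)

    // Much larger RDD: explicitly overrides the number of shuffle partitions
    // instead of inheriting the application-wide default.
    val large = sc.parallelize(1 to 10000000)
      .map(x => (x % 1000, x))
      .reduceByKey(_ + _, 200)

    println(s"small partitions: ${small.getNumPartitions}, " +
      s"large partitions: ${large.getNumPartitions}")
    spark.stop()
  }
}
```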