Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/21527
We can definitely update the description with more details.
Personally I'm not fond of hardcoded magic numbers like this; you could make
it overridable with at least an internal config (meaning it is left undocumented
and used only in special cases). A config gives you a way to easily change something without
the user having to change code, redeploy jar, and then run again. You can
simply change the config and rerun. It also allows for easier experimentation.
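
To illustrate the internal-config idea, here is a minimal sketch in plain Python. It is not Spark's config API: the key name, the default value, and the helper functions are all hypothetical, chosen only to show a hardcoded default that an undocumented setting can override.

```python
# Hypothetical sketch: none of these names come from Spark or from the PR.

# The former "magic number", kept as the built-in default (arbitrary value).
DEFAULT_THRESHOLD = 1000

# Hypothetical key; "internal" here only means it is left undocumented.
INTERNAL_KEY = "example.internal.partitionThreshold"


def threshold(conf):
    """Return the override from conf if one is set, else the default."""
    raw = conf.get(INTERNAL_KEY)
    if raw is None:
        return DEFAULT_THRESHOLD
    value = int(raw)
    if value <= 0:
        raise ValueError(f"{INTERNAL_KEY} must be positive, got {value}")
    return value


def pick_kind(num_partitions, conf):
    """Choose between two hypothetical code paths based on the threshold."""
    return "compact" if num_partitions > threshold(conf) else "plain"
```

With this shape, rerunning with `{INTERNAL_KEY: "500"}` in the config changes the behavior without a code change or a redeployed jar, while an empty config keeps the existing default.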
Changing the # of partitions has other side effects; whether they are good or bad is
situation dependent. It can be worse, as you could be increasing the # of output
files when you don't want to, and it affects the # of tasks needed and thus the
executors needed to run them in parallel, etc.
If no one else has seen a situation for this, I'm ok with closing for now
until we have more concrete data. Really, this should perhaps be turned into
just improving it in general so we don't need 2 kinds.