Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/21527
  
    We can definitely update the description with more details.
    
    Personally, I'm not fond of a hardcoded magic number like this when you 
could make it overridable with at least an internal config (meaning it stays 
undocumented and is only for special cases).  It gives you a way to easily 
change something without the user having to change code, redeploy the jar, and 
then run again.  You can simply change the config and rerun. It also allows 
for easier experimentation.  Changing the # of partitions has other side 
effects; whether they are good or bad is situation dependent.  It can be worse, 
as you could be increasing the # of output files when you don't want to, and it 
affects the # of tasks needed and thus the executors needed to run them in 
parallel, etc.
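
    As a rough illustration of the override pattern described above, here is a 
minimal sketch in Python. Everything in it is hypothetical: the key name, the 
default value, and the helper function are made up for illustration and do not 
come from the PR or from Spark.

```python
# Minimal sketch of "hardcoded default, overridable by an internal config".
# All names and values here are hypothetical, chosen only for illustration.

DEFAULT_THRESHOLD = 1000  # stands in for the hardcoded magic number
INTERNAL_KEY = "example.internal.threshold"  # undocumented, special-case key


def resolve_threshold(conf):
    """Return the override from conf if one is set, else the built-in default."""
    raw = conf.get(INTERNAL_KEY)
    if raw is None:
        # Nothing set: behave exactly as the hardcoded version would.
        return DEFAULT_THRESHOLD
    value = int(raw)
    if value <= 0:
        raise ValueError(f"{INTERNAL_KEY} must be positive, got {value}")
    return value


# Default behaviour, no config supplied.
print(resolve_threshold({}))
# Experimenting with another value needs only a config change and a rerun.
print(resolve_threshold({INTERNAL_KEY: "500"}))
```

    The point of the sketch is only that the default path is unchanged, while 
a different value can be tried without a code change or a redeploy.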
    
    If no one else has seen a situation that calls for this, I'm OK with 
closing for now until we have more concrete data. Really, this should perhaps 
be turned into just improving it in general so we don't need 2 kinds.


---
