cloud-fan commented on pull request #28778:
URL: https://github.com/apache/spark/pull/28778#issuecomment-645167188


   After more thought, I think the file partition split logic itself is 
problematic. Its goal is to make the number of partitions equal to the 
total number of cores, which doesn't make sense, as the cluster may only have a 
few free cores.
   
   I think a proper approach is to set an expected size for each partition, e.g. 
64MB. This is also what we do when coalescing shuffle partitions in AQE. Can we 
add such a config?
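
   To illustrate the suggested policy (this is a hypothetical sketch, not Spark's actual implementation): instead of dividing the input by the core count, greedily pack file splits into partitions up to a target byte size, so the partition count follows the data volume. The `pack_files` helper and the 64MB constant below are assumptions for illustration.

   ```python
   # Hypothetical sketch: size-driven file partitioning instead of
   # core-count-driven partitioning.

   TARGET_PARTITION_BYTES = 64 * 1024 * 1024  # assumed 64MB target, as in the comment

   def pack_files(file_sizes, target_bytes=TARGET_PARTITION_BYTES):
       """Greedily group file sizes (in bytes) into partitions of
       roughly target_bytes each. Files larger than the target get
       their own partition."""
       partitions, current, current_bytes = [], [], 0
       for size in sorted(file_sizes, reverse=True):  # place large files first
           if current and current_bytes + size > target_bytes:
               partitions.append(current)       # close the full partition
               current, current_bytes = [], 0
           current.append(size)
           current_bytes += size
       if current:
           partitions.append(current)           # flush the last partition
       return partitions

   mb = 1024 * 1024
   parts = pack_files([10 * mb, 30 * mb, 50 * mb, 40 * mb, 5 * mb])
   print(len(parts))  # partition count now depends on data size, not core count
   ```

   With this policy, a small scan yields few partitions and a large scan yields many, regardless of how many cores the cluster has free.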


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


