GitHub user mingyukim commented on the pull request:
https://github.com/apache/spark/pull/4420#issuecomment-74972296
Thanks for the response. To be clear, I understand the hesitation about
exposing knobs. My proposal was to throttle the frequency of spills by
controlling how much memory is acquired from the shuffle memory manager at a
time (e.g., if you request 100MB at a time, you won't produce spill files
smaller than 100MB), but I understand that this, too, would need some tuning
depending on the executor heap size.
That said, I'm checking whether your simpler proposal of effectively
setting `trackMemoryThreshold=0` fixes the particular workflow we have. If
it fixes our problem and @andrewor14 says this is good to go, I'm fine with
this as a fix for now.