Github user mingyukim commented on the pull request:
https://github.com/apache/spark/pull/4420#issuecomment-73303667
Can you elaborate on the "memory size as an additional heuristic" idea?
This is consistently causing OOMs in one of our workflows, which is exactly
what spilling to disk is supposed to handle. I'm happy to work on it on my end
if you have suggestions.
A few ideas off the top of my head:
- Add a threshold on the {currentMemory - myMemoryThreshold} difference, so the collection tries
to spill once that gap grows large enough.
- In fact, why not remove the threshold check entirely, as originally suggested in #3656? You can
control how often spills happen by setting a minimum on the amount of memory requested from
ShuffleMemoryManager, which guarantees the spill files are not too small. If you still end up with
too many files, that's unavoidable: your shuffle is really big, so you have to spill a lot, or your
JVM will OOM. (A rough sketch of both ideas follows.)
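For concreteness, here's a self-contained sketch of how the two ideas above could fit together. This is not Spark's actual Spillable code; names like SpillHeuristic, maxOvershootBytes, and minRequestBytes are made up for illustration, and acquireMemory just stands in for a ShuffleMemoryManager.tryToAcquire-style call:

```scala
// Hypothetical sketch only; knob names and defaults are invented, not Spark settings.
class SpillHeuristic(
    acquireMemory: Long => Long,                   // stand-in for ShuffleMemoryManager.tryToAcquire
    initialThreshold: Long = 5L * 1024 * 1024,
    maxOvershootBytes: Long = 64L * 1024 * 1024,   // idea 1 knob (hypothetical)
    minRequestBytes: Long = 32L * 1024 * 1024) {   // idea 2 knob (hypothetical)

  private var myMemoryThreshold: Long = initialThreshold
  private var elementsRead: Long = 0L

  /** Returns true if the caller should spill its in-memory collection now. */
  def maybeSpill(currentMemory: Long): Boolean = {
    elementsRead += 1
    // Count-based gating, roughly what the current code does: only look at memory occasionally.
    val countGateOpen = elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold
    // Idea 1: also spill whenever the gap over the granted threshold gets big,
    // even if the count-based gate is closed, so a few huge elements cannot OOM us.
    val overshootTooBig = currentMemory - myMemoryThreshold > maxOvershootBytes
    if (countGateOpen || overshootTooBig) {
      // Idea 2: request at least minRequestBytes each time, so that when the request
      // is denied and we do spill, the spill file is not tiny.
      val amountToRequest = math.max(2 * currentMemory - myMemoryThreshold, minRequestBytes)
      val granted = acquireMemory(amountToRequest)
      myMemoryThreshold += granted
      if (currentMemory >= myMemoryThreshold) {    // didn't get enough memory: spill now
        myMemoryThreshold = initialThreshold
        return true
      }
    }
    false
  }
}
```

The point is just that memory-size conditions, rather than element counts, would bound how far past the granted threshold the collection can drift before spilling.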
Basically, I don't think trackMemoryThreshold and trackMemoryFrequency are the
right way to control spill frequency or spill file size, since count-based checks
can lead to OOMs when individual elements are large.