GitHub user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/3302#issuecomment-63398241
I see, so after a while we unconditionally try to spill every 32 elements,
regardless of whether the in-memory buffer has exceeded the spill threshold.
This is a serious problem, but it looks like a plain omission in the original
code, since we never update `elementsRead` in this code path, so the fix is
easy. Changes here LGTM.
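To make the pattern concrete, here is a rough, hypothetical sketch of the kind of counter-gated spill check involved. This is not the actual `Spillable` code; only `elementsRead` is a name from this discussion, and every other identifier here is made up for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Rough sketch of a spillable collection whose spill check is gated on an
// element counter. Hypothetical names throughout, except `elementsRead`.
class SpillableCollection[T](bytesPerElement: Long) {
  private val buffer = new ArrayBuffer[T]
  private var elementsRead = 0L      // elements inserted since the last spill
  private val trackThreshold = 1000L // don't consider spilling before this many inserts
  private var memoryThreshold = 0L   // memory we are currently allowed to use
  var spillCount = 0                 // each spill writes another file to disk

  def insert(value: T): Unit = {
    buffer += value
    elementsRead += 1                // the kind of bookkeeping that is missing in the
                                     // code path discussed above
    maybeSpill()
  }

  private def currentMemory: Long = buffer.length.toLong * bytesPerElement

  private def maybeSpill(): Unit = {
    // Only check every 32 elements, and only after enough elements have been read.
    if (elementsRead > trackThreshold && elementsRead % 32 == 0 &&
        currentMemory >= memoryThreshold) {
      // Try to grow our allowance; pretend the shared pool is exhausted in this toy.
      val granted = 0L
      memoryThreshold += granted
      if (memoryThreshold <= currentMemory) {
        spill()
      }
    }
  }

  private def spill(): Unit = {
    // The real code writes the buffer to a new temporary file; every extra spill
    // is another file, which is how over-eager spilling becomes "too many open files".
    spillCount += 1
    buffer.clear()
    elementsRead = 0   // if a code path fails to maintain this counter, the guards
                       // above stay satisfied after the first spill and we spill
                       // every 32 elements, no matter how small the buffer is
    memoryThreshold = 0L
  }
}
```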
I think this is the first step towards fixing the "too many open files" issue
that many people are seeing. We still need to hunt down the root cause of why
the lower bound on how much memory a data structure can use is not being
accounted for properly.