Github user davies commented on the issue:
https://github.com/apache/spark/pull/15722
@jiexiong The longArray will not grow indefinitely; it only grows when the
number of keys reaches 50% of its size. Another assumption is that the memory
used by longArray should be much smaller than the pages (longArray takes 32
bytes per key, while the pages take 56 bytes per key for an aggregate with 3
grouping keys and 1 aggregate function). Is that true for your workload?
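To make the per-key arithmetic above concrete, here is a minimal sizing model, assuming a BytesToBytesMap-style layout. Every detail is an assumption for illustration, not Spark's code: 16 bytes per pointer-array slot (an 8-byte address plus an 8-byte hash), doubling when keys reach 50% of the slots, an 8-byte record header, and fixed-width rows made of a null bitset plus 8 bytes per field. The function names and the initial capacity of 64 are hypothetical. Under these assumptions the pointer array costs 32 bytes per key at the 50% load point (up to 64 just after a doubling), and a record with 3 grouping keys and 1 aggregate comes to 8 + 32 + 16 = 56 bytes.

```python
# Hypothetical sizing model for a BytesToBytesMap-style aggregation map.
# Every layout detail below is an ASSUMPTION for illustration, not Spark's code.

LOAD_FACTOR = 0.5      # the pointer array grows when keys reach 50% of its slots
BYTES_PER_SLOT = 16    # assumed: one 8-byte record address + one 8-byte hash code
RECORD_HEADER = 8      # assumed: 4-byte total length + 4-byte key length


def capacity_after_inserts(num_keys, initial_capacity=64):
    """Slot capacity after inserting num_keys keys, doubling at the load factor."""
    capacity = initial_capacity
    while num_keys >= capacity * LOAD_FACTOR:
        capacity *= 2
    return capacity


def pointer_array_bytes(num_keys, initial_capacity=64):
    """Bytes held by the pointer array (the 'longArray') for num_keys keys."""
    return capacity_after_inserts(num_keys, initial_capacity) * BYTES_PER_SLOT


def unsafe_row_bytes(num_fields):
    """Assumed fixed-width row: null bitset rounded up to 8-byte words + 8 bytes per field."""
    bitset_bytes = ((num_fields + 63) // 64) * 8
    return bitset_bytes + 8 * num_fields


def page_record_bytes(num_grouping_keys, num_aggregates):
    """Bytes one key/value record occupies in a data page under the assumed layout."""
    return (RECORD_HEADER
            + unsafe_row_bytes(num_grouping_keys)
            + unsafe_row_bytes(num_aggregates))


if __name__ == "__main__":
    for n in (1_000, 100_000):
        per_key = pointer_array_bytes(n) / n
        print(f"{n} keys: pointer array {per_key:.1f} B/key, "
              f"page record {page_record_bytes(3, 1)} B/key")
```

Because the array only doubles once the load factor is hit, its size stays proportional to the number of keys, which is why it cannot grow without bound on its own.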
If the bookkeeping in the memory manager is right, it may spill more
(because longArray is using more memory than expected), but it should not
OOM. It's true that this patch could fix the OOM you saw in that query, but
changing the memory factor (or other configs) should also fix that. I'm worried
there could be another bug elsewhere, not this one, that causes the problem.
Could you dump logs showing how the memory is used when the OOM happens?
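The "more spilling, but no OOM" argument can be illustrated with a toy model. This is a sketch under stated assumptions, not Spark's TaskMemoryManager API: the classes and parameters are hypothetical, each record costs `record_bytes` of page memory plus `overhead_bytes` of pointer-array memory, and a spill frees everything the consumer holds. With correct bookkeeping, the extra overhead only causes more spills and actual usage never exceeds the budget; if the overhead is allocated but not reported, actual usage can exceed the budget, which is the kind of accounting gap that would surface as an OOM.

```python
class ToyMemoryManager:
    """Hypothetical pool: grants memory only while the tracked total stays within budget."""

    def __init__(self, budget):
        self.budget = budget
        self.tracked = 0  # bytes the manager believes are in use

    def try_acquire(self, nbytes):
        if self.tracked + nbytes > self.budget:
            return False
        self.tracked += nbytes
        return True

    def release(self, nbytes):
        self.tracked -= nbytes


def simulate(num_records, record_bytes, overhead_bytes, budget, track_overhead):
    """Insert records into a toy map, spilling whenever the manager refuses memory.

    If track_overhead is False, the per-record overhead is allocated but never
    reported to the manager (a hypothetical bookkeeping bug).
    Returns (number_of_spills, peak_actual_bytes).
    """
    mm = ToyMemoryManager(budget)
    requested = record_bytes + (overhead_bytes if track_overhead else 0)
    held_tracked = 0  # bytes this consumer reported to the manager
    held_actual = 0   # bytes this consumer really holds
    spills = 0
    peak = 0
    for _ in range(num_records):
        if not mm.try_acquire(requested):
            # Spill: write everything out, free all memory, then retry once.
            mm.release(held_tracked)
            held_tracked = held_actual = 0
            spills += 1
            if not mm.try_acquire(requested):
                raise MemoryError("a single record does not fit in the budget")
        held_tracked += requested
        held_actual += record_bytes + overhead_bytes
        peak = max(peak, held_actual)
    return spills, peak


if __name__ == "__main__":
    for tracked in (True, False):
        spills, peak = simulate(100, 56, 32, 1000, tracked)
        print(f"track_overhead={tracked}: spills={spills}, peak={peak} (budget 1000)")
```

In the tracked case the peak stays under the budget at the cost of more spills; in the untracked case there are fewer spills but the real footprint overshoots, so logging tracked versus actual usage at the time of the OOM would help tell the two situations apart.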