GitHub user rxin commented on the issue:
https://github.com/apache/spark/pull/22010
Thanks for pinging. Please don't merge this until you've addressed the OOM
issue. The aggregators were created to handle incoming data larger than the
size of memory. We should never put all the data into a Scala or Java hash set.
