JoshRosen commented on issue #25084: [SPARK-28314][SQL] Use the same MemoryManager when building HashedRelation URL: https://github.com/apache/spark/pull/25084#issuecomment-511226643

I don't think picking an _arbitrary_ high limit like 10,000 is a good idea. Instead, we should try to gain a deeper understanding of the circumstances where we might attempt to allocate, fail, then retry and fail again. My intuition is that an allocation failure followed by a spill followed by another allocation failure means we're doomed, such that a third retry won't succeed either.

I _think_ (but don't remember offhand) that when a memory consumer asks another consumer to spill, the spilled memory is atomically transferred to the consumer that requested the spill. If that's right, it seems unlikely that we'd ever get into a situation where we need more than a single spill to successfully allocate memory: if back-to-back spills could succeed, why wouldn't all of that memory have been freed in the first spill call? It's admittedly been a couple of years since I've touched this part of the code, though, so my understanding could be outdated.

If we're able to determine that back-to-back spills won't succeed, then we could avoid the problem of choosing this depth-limit constant and instead just bound it at 1 or 2.
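To make the argument concrete, here is a minimal, self-contained sketch of a bounded allocate-spill-retry loop. This is _not_ Spark's actual `TaskMemoryManager`/`MemoryConsumer` API — the class, method names, and the simplified "spill everything back to one pool" model are all hypothetical, chosen only to illustrate why one full spill round that fails implies further rounds will fail too:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a shared pool plus other consumers' holdings.
// Spilling a consumer returns its bytes to the pool, from which the
// requesting consumer can immediately re-acquire.
class RetrySketch {
    static long freeBytes = 0;
    static List<Long> otherConsumers = new ArrayList<>(List.of(64L, 32L));

    // Spill every other consumer; freed bytes go back to the pool.
    static void spillOthers() {
        for (long held : otherConsumers) freeBytes += held;
        otherConsumers.clear();
    }

    // Try to acquire `needed` bytes, spilling at most `maxSpills` times.
    // If one full spill round does not free enough memory, a second round
    // cannot either (nothing is left to spill) -- which is the argument
    // for bounding the retry depth at 1 or 2 rather than 10,000.
    static boolean acquire(long needed, int maxSpills) {
        for (int attempt = 0; attempt <= maxSpills; attempt++) {
            if (freeBytes >= needed) {
                freeBytes -= needed;
                return true;
            }
            if (attempt < maxSpills) spillOthers();
        }
        return false;
    }
}
```

Under this toy model, a request that one spill round can satisfy succeeds with `maxSpills = 1`, and a request that exceeds all spillable memory fails no matter how large `maxSpills` is, so raising the limit past 1 or 2 buys nothing.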
