JoshRosen commented on issue #25084: [SPARK-28314][SQL] Use the same 
MemoryManager when building HashedRelation
URL: https://github.com/apache/spark/pull/25084#issuecomment-511226643
 
 
   I don't think that picking an _arbitrarily_ high limit like 10,000 is a good 
idea. Instead, I think we should try to gain a deeper understanding of the 
circumstances where we might attempt to allocate, fail, then retry and fail 
again.
   
   My intuition is that an allocation failure, followed by a spill, followed by 
another allocation failure means that we're doomed: a third retry won't succeed 
either. I _think_ (but don't remember offhand) that when a memory consumer asks 
another consumer to spill, the spilled memory is atomically transferred to the 
consumer that requested the spill. If that's right, it seems unlikely that we'd 
ever need more than a single spill to successfully allocate memory (because if 
a back-to-back spill could succeed, why wouldn't all of that memory have been 
freed in the first spill call?).
   
   It's admittedly been a couple of years since I've touched this part of the 
code, though, so my understanding could be outdated.
   
   If we're able to determine that back-to-back spills won't succeed, though, 
then we could avoid the problem of choosing a depth-limit constant entirely and 
instead just bound the retry depth at 1 or 2.
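   To make the reasoning above concrete, here is a minimal toy sketch (not 
Spark's actual `TaskMemoryManager` API; all names here are hypothetical) of an 
allocation loop bounded at one retry. It models the assumption that a spill 
atomically returns the freed memory to the pool where the requester can 
immediately claim it, so a second back-to-back spill pass finds nothing left to 
free:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a task memory pool; not Spark code.
class ToyMemoryPool {
    private final long capacity;
    private long used = 0;
    private final List<ToyConsumer> consumers = new ArrayList<>();

    ToyMemoryPool(long capacity) { this.capacity = capacity; }

    void register(ToyConsumer c) { consumers.add(c); }

    // Try to allocate; on failure, ask the other consumers to spill,
    // then retry exactly once (the proposed bound of 1).
    long allocate(ToyConsumer requester, long needed) {
        for (int attempt = 0; attempt <= 1; attempt++) {
            if (capacity - used >= needed) {
                used += needed;
                requester.held += needed;
                return needed;
            }
            // Spill the other consumers; the freed memory goes back to
            // the pool and is immediately visible to the requester.
            for (ToyConsumer c : consumers) {
                if (c != requester && c.held > 0) {
                    used -= c.spillAll();
                }
            }
        }
        // A further spill pass would find nothing left to free, so more
        // retries cannot help: fail here instead of looping deeper.
        return 0;
    }
}

class ToyConsumer {
    long held = 0;
    long spillAll() { long freed = held; held = 0; return freed; }
}
```

   Under this model, an allocation that fails after one spill will also fail 
after any number of additional spills, which is exactly why a retry bound of 1 
or 2 would suffice if the atomic-transfer assumption holds in the real code.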
