Github user zecevicp commented on the issue:

    https://github.com/apache/spark/pull/21109
  
    Implementing spill-over seems like a lot of work because this is a queue. 
If data is spilled to disk and you then need to pop from the queue, it is not 
clear to me what the best way to do that is. Do you spill only one part of the 
queue (so that you can add or pop more efficiently)? Which part (the beginning 
or the end)? Or maybe the middle? What is the threshold for bringing data back 
from disk into memory? And there are other similar questions.
    But I think the queue can be expected to consume much less memory than 
the original `ExternalAppendOnlyUnsafeRowArray`, because the queue's purpose IS 
to reduce the number of rows in memory, so spill-over would rarely be needed 
(that would depend, of course, on the user's range condition). 
    That's why implementing spill-over doesn't seem critical to me. I can try 
to implement it if everybody thinks it's really needed, but as I said, it's 
not clear (to me) what the best approach would be.
    
    Regarding the second point, this is not an ordinary range join, but an 
equi-join with a secondary range condition.
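
    To illustrate what I mean by an equi-join with a secondary range 
condition, and why the queue stays small, here is a minimal sketch. It is NOT 
the code in this PR: `RangeWindowJoinSketch`, `joinGroup` and `width` are 
made-up names, rows are reduced to plain `Long` range values, and I assume both 
sides already share one equi-join key and are sorted ascending by that value.

```scala
import scala.collection.mutable

// Hypothetical illustration only; not the implementation from the PR.
object RangeWindowJoinSketch {
  // Joins the rows of ONE equi-join key group. Both inputs must be sorted
  // ascending. A pair (l, r) is emitted when l - width <= r <= l + width.
  def joinGroup(left: Seq[Long], right: Seq[Long], width: Long): Seq[(Long, Long)] = {
    val window = mutable.Queue.empty[Long]   // right rows currently in range
    val rightIter = right.iterator.buffered
    val out = Seq.newBuilder[(Long, Long)]
    for (l <- left) {
      // Admit right rows up to the upper bound of the current left row.
      while (rightIter.hasNext && rightIter.head <= l + width) {
        window.enqueue(rightIter.next())
      }
      // Evict right rows below the lower bound. Left rows only grow, so
      // these rows can never match again and need not stay in memory.
      while (window.nonEmpty && window.head < l - width) {
        window.dequeue()
      }
      // Everything left in the queue satisfies the range condition for l.
      window.foreach(r => out += ((l, r)))
    }
    out.result()
  }
}
```

    With `left = Seq(1L, 5L, 10L)`, `right = Seq(0L, 2L, 4L, 6L, 11L, 20L)` and 
`width = 1L`, the queue never holds more than four rows at once, even though 
the right side has six. In general its size is bounded by how many rows fall 
inside the range window, not by the size of the key group.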

