GitHub user zecevicp commented on the issue:

https://github.com/apache/spark/pull/21109

Implementing spill-over seems like a lot of work because this is a queue. If data is spilled to disk and you need to pop from the queue, it is not clear to me what the best way to do that is. Do you spill only one part of the queue (so that you can add or pop more efficiently)? Which part: the beginning, the end, or maybe the middle? What is the threshold for bringing it back from disk into memory? And other similar questions...

But I think it can be expected that the queue will consume much less memory than the original `ExternalAppendOnlyUnsafeRowArray`, because the queue's purpose IS to reduce the number of rows held in memory, so spill-over would rarely be needed (that would depend, of course, on the user's range condition). That's why implementing spill-over doesn't seem critical to me. I can try to implement it if everybody thinks it's really needed, but as I said, it's not clear (to me) what the best approach would be.

Regarding the second point, this is not an ordinary range join, but an equi-join with a secondary range condition.
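To illustrate why the queue keeps memory use low, here is a minimal sketch of a sort-merge equi-join with a secondary range condition. It is not the PR's actual code, which works on Spark's row iterators. It assumes both inputs are already sorted by (key, range value) and that the condition has the form `left.t - lower <= right.t <= left.t + upper`. The names `range_equi_join`, `lower` and `upper` are hypothetical. Right-side rows enter the queue once they fall under the upper bound. They are popped from the front once they drop below the lower bound, so the queue only holds the current window rather than every right row for the key.

```python
from collections import deque
from typing import List, Tuple

Row = Tuple[str, int]  # hypothetical row shape: (join key, range column value)


def range_equi_join(left: List[Row], right: List[Row], lower: int, upper: int):
    """Join rows with equal keys where left_t - lower <= right_t <= left_t + upper.

    Assumes both inputs are sorted by (key, t). Returns the joined pairs and
    the largest number of right rows held in the queue at once.
    """
    out = []
    queue = deque()  # right rows of the current key inside the sliding window
    max_queue = 0
    j = 0  # next right row not yet enqueued
    current_key = None
    for lk, lt in left:
        if lk != current_key:
            # New join key: rows queued for the previous key can never match again.
            queue.clear()
            current_key = lk
            while j < len(right) and right[j][0] < lk:
                j += 1
        # Enqueue right rows of the same key that are within the upper bound.
        while j < len(right) and right[j][0] == lk and right[j][1] <= lt + upper:
            queue.append(right[j])
            j += 1
        # Pop rows below the lower bound. Later left rows have a larger t,
        # so these rows are no longer needed.
        while queue and queue[0][1] < lt - lower:
            queue.popleft()
        max_queue = max(max_queue, len(queue))
        for r in queue:
            out.append(((lk, lt), r))
    return out, max_queue
```

The peak queue size is governed by the width of the range condition (`lower + upper`) and the density of right rows within it, not by the total number of right rows per key. This is why spill-over would only matter for very wide ranges.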