On Tue, Dec 31, 2024 at 6:07 PM Tomas Vondra <to...@vondra.me> wrote:
>
> This means that ultimately it's either (1) or (3), and the more I've
> been looking into this the more I prefer (1), for a couple reasons:
>
> * It's much simpler (it doesn't really change anything in the basic
> behavior, doesn't introduce any new files or anything like that).
>
> * There doesn't seem to be a major difference in total memory
> consumption between the two approaches. Both allow allocating more
> memory.
>
> * It actually helps with the "indivisible batch" case - it relaxes the
> limit, so there's a chance the batch eventually fits and we stop adding
> more and more batches. With spill files that's not the case - we still
> keep the original limit, and we end up with the batch explosion (but
> then we handle it much more efficiently).
Okay, I've read all the patches proposed in this mail and most of the
downthread ideas, and I want to cast my vote for option 1.

I find the design of 3 too complicated for what it buys us. The slices
make it harder to understand how the system works. The current 1-1
relationship in master between batches and spill files is easier to
reason about. With the slices, I'm imagining trying to understand why
we, for example, have to move tuples from batch 4 just because batch 5
got too big for the hashtable. I think something like this might be
worth it if it solved the problem entirely, but if it is just a
somewhat better coping mechanism, I don't think it is worth it.

I was excited about your raw file experiment. As Robert and you point
out, we may need a file per batch, but for most of the hash join's
execution we don't need to keep buffers for each batch around. However,
given that the experiment didn't yield great results and we haven't
come up with an alternative solution with sufficiently few flaws, I'm
still in favor of 1.

The part of 1 I struggled to understand is the math in
ExecHashExceededMemoryLimit(). I think the other email you sent with
the charts and diagonals is about choosing the optimal hashtable size
and number of batches (when to stop growing the number of batches and
just increase the size of the hashtable). So, I'll dive into that.

> One thing I'm not sure about yet is whether this needs to tweak the
> hashjoin costing to also consider the files when deciding how many
> batches to use. Maybe it should?

I think it definitely should. The ExecChooseHashTableSize()
calculations look similar to what we use to calculate spaceAllowed, so
it makes sense that we would consider buffile sizes if we are counting
those in spaceUsed now.

- Melanie