Hi all, while working on a prototype of a memory-bounded hash aggregate (an alternative to Jeff's patch discussed in [1]), I ran into difficulties when dealing with dynahash. So I'm asking for help ...
Some of the difficulties stem from my limited knowledge of dynahash and the complexity in execGrouping.c, but it seems to me some are actually due to a mismatch with the proposed hashjoin-like approach to batching.

The hashjoin-like batching essentially does this:

  1) put data (in this case aggregate states) into a hash table,
     until a memory limit is reached

  2) double the number of batches and move half the entries from
     the hash table (based on batch number, computed from the hash
     value)

  3) repeat

This however requires two things that seem quite difficult to do with dynahash:

(a) detecting that a memory limit was reached

This is usually done with a condition like this:

    if (consumed_memory > work_mem) {
        ... do the batching / remove ~50% of data from hash table ...
        ... it's expected consumed_memory drops by ~50% ...
    }

where consumed_memory is the size of a memory context (cheap to compute, thanks to another patch from Jeff [2]). This however does not work with dynahash, because dynahash does not release memory for removed tuples - it just moves them to a freelist, so consumed_memory only grows.

For a while I thought I could do this instead:

    if (consumed_memory > consumed_memory_prev) {
        ...
        consumed_memory_prev = consumed_memory;
    }

But then I found out dynahash does not grow continuously, but in (quite large) steps. Exceeding the limit a bit is not a big deal, but the growth is quite fast and quickly leads to allocating much more than the limit.

(b) removing tuples while batching

Whenever the number of batches is increased (doubled), I need to walk through the hash table and remove entries not belonging to the current batch (should be ~50% of them). The only way to do this with dynahash seems to be iterating over the entries, and then doing another search with HASH_REMOVE. Is there a better way?
I've been considering switching this to a custom hash table (similar to what's used in hashjoin, https://commitfest.postgresql.org/action/patch_view?id=1494), which seems like a better fit for this use-case, but I'm not sure about it. I'm not a big fan of replacing large amounts of code for no good reason. Opinions?

I'd be grateful if someone more knowledgeable of dynahash / the way it's used in execGrouping.c could review the prototype I have so far. There's only a handful of functions related to dynahash, and most of the issues I have are about passing the values properly (slots, dummy values, tuples).

regards
Tomas

[1] http://www.postgresql.org/message-id/1407706010.6623.16.camel@jeff-desktop
[2] http://www.postgresql.org/message-id/1407012053.15301.53.camel@jeff-desktop