On Wed, Feb 1, 2017 at 2:10 AM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:
>>
>> 0003-hj-refactor-memory-accounting-v4.patch:
>> [...]
>>
> I looked at this patch. I agree that it accounts the memory usage more
> accurately. Here are few comments.
Thanks for the review!

> spaceUsed is defined with comment
> Size        spaceUsed;      /* memory space currently used by tuples */
>
> In ExecHashTableCreate(), although the space is allocated for buckets, no
> tuples are yet inserted, so no space is used by the tuples, so going strictly
> by the comment, spaceUsed should be 0 in that function. But I think the patch
> is accounting the spaceUsed more accurately. Without this patch, the actual
> allocation might cross spaceAllowed without being noticed. With this patch
> that's not possible. Probably we should change the comment to say memory space
> currently allocated.

Right, that comment is out of date.  It is now the space used by the
bucket array and the tuples.  I will fix that in the next version.

> However, ExecHashIncreaseNumBatches() may change the
> number of buckets; the patch does not seem to account for spaceUsed changes
> because of that.

That's what this hunk is intended to do:

@@ -795,6 +808,12 @@ ExecHashIncreaseNumBuckets(HashJoinTable hashtable)
        TRACE_POSTGRESQL_HASH_INCREASE_BUCKETS(hashtable->nbuckets,
                                               hashtable->nbuckets_optimal);

+       /* account for the increase in space that will be used by buckets */
+       hashtable->spaceUsed += sizeof(HashJoinTuple) *
+               (hashtable->nbuckets_optimal - hashtable->nbuckets);
+       if (hashtable->spaceUsed > hashtable->spacePeak)
+               hashtable->spacePeak = hashtable->spaceUsed;
+
        hashtable->nbuckets = hashtable->nbuckets_optimal;
        hashtable->log2_nbuckets = hashtable->log2_nbuckets_optimal;

It knows that spaceUsed already includes the old bucket array
(nbuckets), so it figures out how much bigger the new bucket array will
be (nbuckets_optimal - nbuckets) and adds that.

> Without this patch ExecHashTableInsert() used to account for the space used by
> a single tuple inserted. The patch moves this calculation in dense_alloc() and
> accounts for out-of-bound allocation for larger tuples. That's good.
>
> The change in ExecChooseHashTableSize() too looks fine.
> In ExecHashTableReset(), do we want to update spacePeak while setting
> spaceUsed.

I figured there was no way that the new spaceUsed value could be bigger
than spacePeak, because we threw out all chunks and have just the bucket
array, and we had that number of buckets before, so spacePeak must
already have been set to a number >= this number, either when we
expanded to this many buckets or when we created the hashtable in the
first place.  Perhaps I should add
Assert(hashtable->spaceUsed <= hashtable->spacePeak).

> While this patch tracks space usage more accurately, I am afraid we might be
> overdoing it; a reason why we don't track space usage accurately now. But I
> think I will leave it to be judged by someone who is more familiar with the
> code and possibly has historical perspective.

Well, it's not doing more work.  It makes no practical difference
whatsoever, but it's technically doing less work than master, by doing
memory accounting only when acquiring a new 32KB chunk.

But if by overdoing it you mean that no one really cares about the tiny
increase in accuracy, so the patch on its own is a bit of a waste of
time, you're probably right.  Depending on tuple size, you could imagine
something like 64 bytes of header and unused space per 32KB chunk that
we're not accounting for, and that's only 0.2%.  So I probably wouldn't
propose this refactoring on accuracy grounds alone.

This refactoring is intended to pave the way for shared memory
accounting in the later patches, which would otherwise generate ugly IPC
if done every time a tuple is allocated.  I considered using an atomic
add to count space per tuple, or maintaining per-backend local subtotals
and periodically summing.  Then I realised that switching to per-chunk
accounting would fix the IPC problem AND be justifiable on theoretical
grounds.  When we allocate a new 32KB chunk, we really are using 32KB
more of your memory.
-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers