On Mon, Oct 24, 2016 at 10:30 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: > [ new patches ]
I looked over parts of this today, mostly the hashinsert.c changes. + /* + * Copy bucket mapping info now; The comment in _hash_expandtable where + * we copy this information and calls _hash_splitbucket explains why this + * is OK. + */ So, I went and tried to find the comments to which this comment is referring and didn't have much luck. At the point this code is running, we have a pin but no lock on the metapage, so this is only safe if changing any of these fields requires a cleanup lock on the metapage. If that's true, it seems like you could just make the comment say that; if it's false, you've got problems. This code seems rather pointless anyway, the way it's written. All of these local variables are used in exactly one place, which is here: + _hash_finish_split(rel, metabuf, buf, nbuf, maxbucket, + highmask, lowmask); But you hold the same locks at the point where you copy those values into local variables and the point where that code runs. So if the code is safe as written, you could instead just pass metap->hashm_maxbucket, metap->hashm_highmask, and metap->hashm_lowmask to that function instead of having these local variables. Or, for that matter, you could just let that function read the data itself: it's got metabuf, after all. + * In future, if we want to finish the splits during insertion in new + * bucket, we must ensure the locking order such that old bucket is locked + * before new bucket. Not if the locks are conditional anyway. + nblkno = _hash_get_newblk(rel, pageopaque); I think this is not a great name for this function. It's not clear what "new blocks" refers to, exactly. I suggest FIND_SPLIT_BUCKET(metap, bucket) or OLD_BUCKET_TO_NEW_BUCKET(metap, bucket) returning a new bucket number. I think that macro can be defined as something like this: bucket + (1 << (fls(metap->hashm_maxbucket) - 1)). Then do nblkno = BUCKET_TO_BLKNO(metap, newbucket) to get the block number. That'd all be considerably simpler than what you have for hash_get_newblk(). Here's some test code I wrote, which seems to work: #include <stdio.h> #include <stdlib.h> #include <strings.h> #include <assert.h> int newbucket(int bucket, int nbuckets) { assert(bucket < nbuckets); return bucket + (1 << (fls(nbuckets) - 1)); } int main(int argc, char **argv) { int nbuckets = 1; int restartat = 1; int splitbucket = 0; while (splitbucket < 32) { printf("old bucket %d splits to new bucket %d\n", splitbucket, newbucket(splitbucket, nbuckets)); if (++splitbucket >= restartat) { splitbucket = 0; restartat *= 2; } ++nbuckets; } exit(0); } Moving on ... /* * ovfl page exists; go get it. if it doesn't have room, we'll - * find out next pass through the loop test above. + * find out next pass through the loop test above. Retain the + * pin, if it is a primary bucket page. */ - _hash_relbuf(rel, buf); + if (pageopaque->hasho_flag & LH_BUCKET_PAGE) + _hash_chgbufaccess(rel, buf, HASH_READ, HASH_NOLOCK); + else + _hash_relbuf(rel, buf); It seems like it would be cheaper, safer, and clearer to test whether buf != bucket_buf here, rather than examining the page opaque data. That's what you do down at the bottom of the function when you ensure that the pin on the primary bucket page gets released, and it seems like it should work up here, too. + bool retain_pin = false; + + /* page flags must be accessed before releasing lock on a page. */ + retain_pin = pageopaque->hasho_flag & LH_BUCKET_PAGE; Similarly. I have also attached a patch with some suggested comment changes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
hashinsert-comments.patch
Description: application/download
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers