Re: [HACKERS] Hash Indexes

Robert Haas Tue, 01 Nov 2016 18:10:12 -0700

On Mon, Oct 24, 2016 at 10:30 AM, Amit Kapila <[email protected]> wrote:
> [ new patches ]


I looked over parts of this today, mostly the hashinsert.c changes.

+    /*
+     * Copy bucket mapping info now;  The comment in _hash_expandtable where
+     * we copy this information and calls _hash_splitbucket explains why this
+     * is OK.
+     */

So, I went and tried to find the comments to which this comment is
referring and didn't have much luck.  At the point this code is
running, we have a pin but no lock on the metapage, so this is only
safe if changing any of these fields requires a cleanup lock on the
metapage.  If that's true, it seems like you could just make the
comment say that; if it's false, you've got problems.

This code seems rather pointless anyway, the way it's written.  All of
these local variables are used in exactly one place, which is here:

+            _hash_finish_split(rel, metabuf, buf, nbuf, maxbucket,
+                               highmask, lowmask);

But you hold the same locks at the point where you copy those values
into local variables and the point where that code runs.  So if the
code is safe as written, you could instead just pass
metap->hashm_maxbucket, metap->hashm_highmask, and
metap->hashm_lowmask to that function instead of having these local
variables.  Or, for that matter, you could just let that function read
the data itself: it's got metabuf, after all.

+     * In future, if we want to finish the splits during insertion in new
+     * bucket, we must ensure the locking order such that old bucket is locked
+     * before new bucket.

Not if the locks are conditional anyway.

+        nblkno = _hash_get_newblk(rel, pageopaque);

I think this is not a great name for this function.  It's not clear
what "new blocks" refers to, exactly.  I suggest
FIND_SPLIT_BUCKET(metap, bucket) or OLD_BUCKET_TO_NEW_BUCKET(metap,
bucket) returning a new bucket number.  I think that macro can be
defined as something like this: bucket + (1 <<
(fls(metap->hashm_maxbucket) - 1)). Then do nblkno =
BUCKET_TO_BLKNO(metap, newbucket) to get the block number.  That'd all
be considerably simpler than what you have for hash_get_newblk().

Here's some test code I wrote, which seems to work:

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <assert.h>

int
newbucket(int bucket, int nbuckets)
{
    assert(bucket < nbuckets);
    return bucket + (1 << (fls(nbuckets) - 1));
}

int
main(int argc, char **argv)
{
    int    nbuckets = 1;
    int restartat = 1;
    int    splitbucket = 0;

    while (splitbucket < 32)
    {
        printf("old bucket %d splits to new bucket %d\n", splitbucket,
               newbucket(splitbucket, nbuckets));
        if (++splitbucket >= restartat)
        {
            splitbucket = 0;
            restartat *= 2;
        }
        ++nbuckets;
    }

    exit(0);
}

Moving on ...

             /*
              * ovfl page exists; go get it.  if it doesn't have room, we'll
-             * find out next pass through the loop test above.
+             * find out next pass through the loop test above.  Retain the
+             * pin, if it is a primary bucket page.
              */
-            _hash_relbuf(rel, buf);
+            if (pageopaque->hasho_flag & LH_BUCKET_PAGE)
+                _hash_chgbufaccess(rel, buf, HASH_READ, HASH_NOLOCK);
+            else
+                _hash_relbuf(rel, buf);

It seems like it would be cheaper, safer, and clearer to test whether
buf != bucket_buf here, rather than examining the page opaque data.
That's what you do down at the bottom of the function when you ensure
that the pin on the primary bucket page gets released, and it seems
like it should work up here, too.

+            bool        retain_pin = false;
+
+            /* page flags must be accessed before releasing lock on a page. */
+            retain_pin = pageopaque->hasho_flag & LH_BUCKET_PAGE;

Similarly.

I have also attached a patch with some suggested comment changes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

hashinsert-comments.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hash Indexes

Reply via email to