Re: Hash index build performance tweak from sorting

David Rowley Wed, 23 Nov 2022 05:04:58 -0800

On Wed, 16 Nov 2022 at 17:33, Simon Riggs <simon.ri...@enterprisedb.com> wrote:
>
> Thanks for the review, apologies for the delay in acting upon your comments.
>
> My tests show the sorted and random tests are BOTH 4.6% faster with
> the v3 changes using 5-test avg, but you'll be pleased to know your
> kit is about 15.5% faster than mine, comparing absolute execution
> times.


Thanks for the updated patch.

I started to look at this again and I'm starting to think that the
HashInsertState struct is the wrong approach for passing along the
sorted flag to _hash_doinsert().  The reason I think this is that in
hashbuild() when setting buildstate.spool to NULL, you're also making
the decision about what to set the sorted flag to.  However, in
reality, we already know what we should be passing *every* time we
call _hash_doinsert().  The only place where we can pass the sorted
option as true is in _h_indexbuild() when we're doing the sorted
version of the index build. Trying to make that decision any sooner
seems error-prone.

I understand you have made HashInsertState so that we don't need to
add new parameters should we ever need to pass something else along,
but I'm just thinking that if we ever need to add more, then we should
just reconsider this in the future.  I think for today, the better
option is just to add the bool sorted as a parameter to
_hash_doinsert() and pass it as true in the single case where it's
valid to do so.  That seems less likely that we'll inherit some
options from some other place after some future modification and end
up passing sorted as true when it should be false.

Another reason I didn't like the HashInsertState idea is that in the
v3 patch there's an HashInsertState in both HashBuildState and HSpool.
Because in the normal insert path (hashinsert), we've neither a
HashBuildState nor an HSpool, you're having to fake up a
HashInsertStateData to pass something along to _hash_doinsert() in
hashinsert(). When we're building an index, in the non-sorted index
build case, you're always passing the HashInsertStateData from the
HashBuildState, but when we're doing the sorted index build the one
from HSpool is passed.  In other words, in each of the 3 calls to
_hash_doinsert(), the HashInsertStateData comes from a different
place.

Now, I do see that you've coded hashbuild() so both versions of the
HashInsertState point to the same HashInsertStateData, but I find it
unacceptable programming that in _h_spoolinit() the code palloc's the
memory for the HSpool and you're setting the istate field to the
HashInsertStateData that's on the stack. That just seems like a great
way to end up having istate pointing to junk should the HSpool ever
live beyond the hashbuild() call. If we really don't want HSpool to
live beyond hashbuild(), then it too should be a local variable to
hashbuild() instead of being palloc'ed in _h_spoolinit().
_h_spoolinit() could just be passed a pointer to the HSpool to
populate.

After getting rid of the  HashInsertState code and just adding bool
sorted to _hash_doinsert() and _hash_pgaddtup(), the resulting patch
is much more simple:

v3:
 src/backend/access/hash/hash.c       | 19 ++++++++++++++++---
 src/backend/access/hash/hashinsert.c | 40
++++++++++++++++++++++++++++++++++------
 src/backend/access/hash/hashsort.c   |  8 ++++++--
 src/include/access/hash.h            | 14 +++++++++++---
 4 files changed, 67 insertions(+), 14 deletions(-)

v4:
src/backend/access/hash/hash.c       |  4 ++--
src/backend/access/hash/hashinsert.c | 40 ++++++++++++++++++++++++++++--------
src/backend/access/hash/hashsort.c   |  3 ++-
src/include/access/hash.h            |  6 ++++--
4 files changed, 40 insertions(+), 13 deletions(-)

and v4 includes 7 extra lines in hashinsert.c for the Assert() I
mentioned in my previous email plus a bunch of extra comments.

I'd rather see this solved like v4 is doing it.

David

diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index c361509d68..77fd147f68 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -231,7 +231,7 @@ hashbuildCallback(Relation index,
                itup = index_form_tuple(RelationGetDescr(index),
                                                                index_values, 
index_isnull);
                itup->t_tid = *tid;
-               _hash_doinsert(index, itup, buildstate->heapRel);
+               _hash_doinsert(index, itup, buildstate->heapRel, false);
                pfree(itup);
        }
 
@@ -265,7 +265,7 @@ hashinsert(Relation rel, Datum *values, bool *isnull,
        itup = index_form_tuple(RelationGetDescr(rel), index_values, 
index_isnull);
        itup->t_tid = *ht_ctid;
 
-       _hash_doinsert(rel, itup, heapRel);
+       _hash_doinsert(rel, itup, heapRel, false);
 
        pfree(itup);
 
diff --git a/src/backend/access/hash/hashinsert.c 
b/src/backend/access/hash/hashinsert.c
index 23907d2e5b..6718ff18f3 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -32,9 +32,12 @@ static void _hash_vacuum_one_page(Relation rel, Relation 
hrel,
  *
  *             This routine is called by the public interface routines, 
hashbuild
  *             and hashinsert.  By here, itup is completely filled in.
+ 
+ * 'sorted' must only be passed as 'true' when inserts are done in hashkey
+ * order.
  */
 void
-_hash_doinsert(Relation rel, IndexTuple itup, Relation heapRel)
+_hash_doinsert(Relation rel, IndexTuple itup, Relation heapRel, bool sorted)
 {
        Buffer          buf = InvalidBuffer;
        Buffer          bucket_buf;
@@ -198,7 +201,7 @@ restart_insert:
        START_CRIT_SECTION();
 
        /* found page with enough space, so add the item here */
-       itup_off = _hash_pgaddtup(rel, buf, itemsz, itup);
+       itup_off = _hash_pgaddtup(rel, buf, itemsz, itup, sorted);
        MarkBufferDirty(buf);
 
        /* metapage operations */
@@ -263,21 +266,42 @@ restart_insert:
  *
  * Returns the offset number at which the tuple was inserted.  This function
  * is responsible for preserving the condition that tuples in a hash index
- * page are sorted by hashkey value.
+ * page are sorted by hashkey value, however, if the caller is certain that
+ * the hashkey for the tuple being added is >= the hashkeys of all existing
+ * tuples on the page, then the 'sorted' flag may be passed as true.
  */
 OffsetNumber
-_hash_pgaddtup(Relation rel, Buffer buf, Size itemsize, IndexTuple itup)
+_hash_pgaddtup(Relation rel, Buffer buf, Size itemsize, IndexTuple itup,
+                          bool sorted)
 {
        OffsetNumber itup_off;
        Page            page;
-       uint32          hashkey;
 
        _hash_checkpage(rel, buf, LH_BUCKET_PAGE | LH_OVERFLOW_PAGE);
        page = BufferGetPage(buf);
 
-       /* Find where to insert the tuple (preserving page's hashkey ordering) 
*/
-       hashkey = _hash_get_indextuple_hashkey(itup);
-       itup_off = _hash_binsearch(page, hashkey);
+       /*
+        * Find where to insert the tuple (preserving page's hashkey ordering).
+        * If the input is already sorted by hashkey, then we can avoid the
+        * binary search and just add it to the end of the page.
+        */
+       if (sorted)
+       {
+               itup_off = PageGetMaxOffsetNumber(page) + 1;
+
+               /* check the hashkey for itup is >= to the last existing tuple 
*/
+               Assert(PageGetMaxOffsetNumber(page) == 0 ||
+                          _hash_get_indextuple_hashkey((IndexTuple)
+                                                                               
PageGetItem(page, PageGetItemId(page,
+                                                                               
        PageGetMaxOffsetNumber(page)))) <=
+                                               
_hash_get_indextuple_hashkey(itup));
+       }
+       else
+       {
+               uint32          hashkey = _hash_get_indextuple_hashkey(itup);
+
+               itup_off = _hash_binsearch(page, hashkey);
+       }
 
        if (PageAddItem(page, (Item) itup, itemsize, itup_off, false, false)
                == InvalidOffsetNumber)
diff --git a/src/backend/access/hash/hashsort.c 
b/src/backend/access/hash/hashsort.c
index 19563148d0..a4ab8384e1 100644
--- a/src/backend/access/hash/hashsort.c
+++ b/src/backend/access/hash/hashsort.c
@@ -145,7 +145,8 @@ _h_indexbuild(HSpool *hspool, Relation heapRel)
                Assert(hashkey >= lasthashkey);
 #endif
 
-               _hash_doinsert(hspool->index, itup, heapRel);
+               /* the tuples are sorted by hashkey, so pass 'sorted' as true */
+               _hash_doinsert(hspool->index, itup, heapRel, true);
 
                pgstat_progress_update_param(PROGRESS_CREATEIDX_TUPLES_DONE,
                                                                         
++tups_done);
diff --git a/src/include/access/hash.h b/src/include/access/hash.h
index da372841c4..3ee37258d2 100644
--- a/src/include/access/hash.h
+++ b/src/include/access/hash.h
@@ -390,9 +390,11 @@ extern void hashadjustmembers(Oid opfamilyoid,
 /* private routines */
 
 /* hashinsert.c */
-extern void _hash_doinsert(Relation rel, IndexTuple itup, Relation heapRel);
+extern void _hash_doinsert(Relation rel, IndexTuple itup, Relation heapRel,
+                                                  bool sorted);
 extern OffsetNumber _hash_pgaddtup(Relation rel, Buffer buf,
-                                                                  Size 
itemsize, IndexTuple itup);
+                                                                  Size 
itemsize, IndexTuple itup,
+                                                                  bool sorted);
 extern void _hash_pgaddmultitup(Relation rel, Buffer buf, IndexTuple *itups,
                                                                OffsetNumber 
*itup_offsets, uint16 nitups);

Re: Hash index build performance tweak from sorting

Reply via email to