On Mon, Mar 8, 2021 at 8:43 AM Amit Khandekar <amitdkhan...@gmail.com> wrote: > > On Wed, 3 Mar 2021 at 23:32, John Naylor <john.nay...@enterprisedb.com> wrote:
> > 0001: > > > > + /* > > + * We can process 64-bit chunks only if both are mis-aligned by the same > > + * number of bytes. > > + */ > > + if (b_aligned - b == a_aligned - a) > > > > The obvious question here is: how often are they identically misaligned? You > > don't indicate that your measurements differ in a bimodal fashion, so does > > that mean they happen to be always (mis)aligned? > > I ran CREATE INDEX on tsvector columns using the tsearch.data that I > had attached upthread, with some instrumentation; here are the > proportions : > 1. In 15% of the cases, only one among a and b was aligned. The other > was offset from the 8-byte boundary by 4 bytes. > 2. 6% of the cases, both were offset by 4 bytes, i.e. identically misaligned. > 3. Rest of the cases, both were aligned. That's somewhat strange. I would have assumed always aligned, or usually not. It sounds like we could require them both to be aligned and not bother with the byte-by-byte prologue. I also wonder if it's worth it to memcpy to a new buffer if the passed pointer is not aligned. > > + /* For smaller lengths, do simple byte-by-byte traversal */ > > + if (bytes <= 32) > > > > You noted upthread: > > > > > Since for the gist index creation of some of these types the default > > > value for siglen is small (8-20), I tested with small siglens. For > > > siglens <= 20, particularly for values that are not multiples of 8 > > > (e.g. 10, 13, etc), I see a 1-7 % reduction in speed of index > > > creation. It's probably because of > > > an extra function call for pg_xorcount(); and also might be due to the > > > extra logic in pg_xorcount() which becomes prominent for shorter > > > traversals. So for siglen less than 32, I kept the existing method > > > using byte-by-byte traversal. > > > > I wonder if that can be addressed directly, while cleaning up the loops and > > alignment checks in pg_xorcount_long() a little. For example, have a look at > > pg_crc32c_armv8.c -- it might be worth testing a similar approach. > > Yeah, we can put the bytes <= 32 condition inside pg_xorcount_long(). > I avoided that to not hamper the <= 32 scenarios. Details explained > below for "why inline pg_xorcount is calling global function" > > > Also, pardon my ignorance, but where can I find the default siglen for various types? > Check SIGLEN_DEFAULT. Okay, so it's hard-coded in various functions in contrib modules. In that case, let's just keep the existing coding for those. In fact, the comments that got removed by your patch specifically say: /* Using the popcount functions here isn't likely to win */ ...which your testing confirmed. The new function should be used only for Gist and possibly Intarray, whose default is 124. That means we can't get rid of hemdistsign(), but that's fine. Alternatively, we could say that a small regression is justifiable reason to refactor all call sites, but I'm not proposing that. > > (I did make sure to remove indirect calls from the retail functions > > in [1], in case we want to go that route). > > Yeah, I quickly had a look at that. I am still going over that thread. > Thanks for the exhaustive analysis there. I'll post a patch soon that builds on that, so you can see what I mean. -- John Naylor EDB: http://www.enterprisedb.com