On Tue, Jul 26, 2022 at 3:28 PM David Rowley <dgrowle...@gmail.com> wrote:
> Thanks for looking at this.
>
> On Sat, 23 Jul 2022 at 01:23, Amit Langote <amitlangot...@gmail.com> wrote:
> > + /*
> > +  * The Datum has changed.  Zero the number of times we've
> > +  * found last_found_datum_index in a row.
> > +  */
> > + partdesc->last_found_count = 0;
> >
> > + /* Zero the "winning streak" on the cache hit count */
> > + partdesc->last_found_count = 0;
> >
> > Might it be better for the two comments to say the same thing?  Also,
> > I wonder which one you intend as the resetting of last_found_count:
> > setting it to 0 or 1?  I can see that the stanza at the end of the
> > function sets it to 1 to start a new cycle.
>
> I think I've addressed all of your comments.  The one above in
> particular caused me to make some larger changes.
>
> The reason I was zeroing last_found_count in LIST-partitioned tables
> when the Datum was not equal to the previously found Datum was that
> the code at the end of the function was only checking that the
> partition indexes matched, rather than comparing bound_offset with
> last_found_datum_index.  I wanted to zero the count because, if you
> had a partition FOR VALUES IN (1, 2) and received rows with values
> alternating between 1 and 2, we'd match the same partition each time,
> yet the equality test between the current 'values' and the Datum at
> last_found_datum_index would be false each time.  If we didn't zero
> last_found_count, we'd have kept using the cache path even though the
> Datum and the last Datum were never equal.  That would have meant
> always doing the cache check, failing it, and then doing the binary
> search anyway.
>
> I've now changed the code so that instead of checking whether the
> last found partition is the same as the previous one, I check whether
> bound_offset is the same as last_found_datum_index.  This will be
> false in the "values alternating between 1 and 2" case from above.
> This caused me to change how the caching works for LIST partitions
> with a NULL partition which is receiving NULL values.  I've coded
> things now to just skip the cache in that case.  Finding the correct
> LIST partition for a NULL value is cheap, so there's no need to cache
> it.  I've also moved all the code which updates the cache fields to
> the bottom of get_partition_for_tuple().  I'm only expecting to do
> that when bound_offset is set by the lookup code in the switch
> statement.  Any other paths, e.g. HASH partitioning lookups and LIST
> or RANGE lookups with NULL values, shouldn't reach the code which
> updates the cache fields.  I've added an Assert(bound_offset >= 0)
> to ensure that stays true.
>
> There's probably a bit more to optimise here too, but not much.  I
> don't think the partdesc->last_found_part_index = -1; is needed when
> we're in the code block that does return boundinfo->default_index;
> however, that would only very slightly speed up the case where we're
> inserting continuously into the DEFAULT partition.  That code path is
> also used when we fail to find any matching partition, and that's not
> one we need to worry about making faster.
>
> I also ran the benchmarks again and saw that most of the uses of
> likely() and unlikely() no longer did what I found them to do
> earlier, so the weirdness we saw there was most likely just down to
> random code layout changes.  In this patch, I dropped the use of both
> macros.
>
> David

Hi,

+ return boundinfo->indexes[last_datum_offset + 1];
+
+ else if (cmpval < 0 && last_datum_offset + 1 < boundinfo->ndatums)

nit: the `else` keyword is not needed, since the preceding branch
returns.

Cheers