On 07/04/2021 15:12, Andrey Borodin wrote:
On 7 Apr 2021, at 14:56, Heikki Linnakangas <hlinn...@iki.fi>
wrote:
Ok, I think I understand that now. In btree_gist, the *_cmp()
function operates on non-leaf values, and *_lt(), *_gt() et al
operate on leaf values. For all other datatypes, the leaf and
non-leaf representation is the same, but for bit/varbit, the
non-leaf representation is different. The leaf representation is
VarBit, and non-leaf is just the bits without the 'bit_len' field.
That's why it is indeed correct for gbt_bitcmp() to just use
byteacmp(), whereas gbt_bitlt() et al compare the 'bit_len' field
separately. That's subtle, and 100% uncommented.
What that means for this patch is that gbt_bit_sort_build_cmp()
should *not* call byteacmp(), but bitcmp(), because it operates on
the original datatype stored in the table.
+1 Thanks for investigating this. If I understand things right,
adding test values with different lengths of bit sequences would not
uncover the problem anyway?
That's right, the only consequence of a "wrong" sort order is that the
quality of the tree suffers, and scans need to scan more pages
unnecessarily.
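
To make the difference concrete, here is a minimal standalone sketch, not
the btree_gist code. SketchVarBit, sketch_bitcmp() and sketch_rawcmp() are
hypothetical names invented for illustration. It assumes a leaf value is a
length header followed by packed bits, and that a byte-wise comparison
handed such a value would see the header as ordinary payload bytes. The
length-aware function follows the general shape of bitcmp() as I understand
it: compare the bits first, then use the bit length as a tie-break.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a leaf bit string: length header + packed bits. */
typedef struct
{
    int32_t       bit_len;  /* number of valid bits, at most 64 in this sketch */
    unsigned char bits[8];  /* packed bit data, zero padded */
} SketchVarBit;

/* Number of bytes occupied by the valid bits. */
static size_t
sketch_nbytes(const SketchVarBit *v)
{
    assert(v->bit_len >= 0 && v->bit_len <= 64);
    return (size_t) (v->bit_len + 7) / 8;
}

/* Length-aware comparison: bits first, bit length only as a tie-break. */
static int
sketch_bitcmp(const SketchVarBit *a, const SketchVarBit *b)
{
    size_t na = sketch_nbytes(a);
    size_t nb = sketch_nbytes(b);
    int    cmp = memcmp(a->bits, b->bits, na < nb ? na : nb);

    if (cmp != 0)
        return (cmp > 0) - (cmp < 0);
    return (a->bit_len > b->bit_len) - (a->bit_len < b->bit_len);
}

/*
 * Byte-wise comparison over the whole payload, header included. This is an
 * assumption about what a bytea-style comparison would effectively do if it
 * were handed the leaf representation.
 */
static int
sketch_rawcmp(const SketchVarBit *a, const SketchVarBit *b)
{
    unsigned char bufa[sizeof(int32_t) + 8];
    unsigned char bufb[sizeof(int32_t) + 8];
    size_t        la = sizeof(int32_t) + sketch_nbytes(a);
    size_t        lb = sizeof(int32_t) + sketch_nbytes(b);
    int           cmp;

    /* Lay out header and bits contiguously, as a flat byte string. */
    memcpy(bufa, &a->bit_len, sizeof(int32_t));
    memcpy(bufa + sizeof(int32_t), a->bits, sketch_nbytes(a));
    memcpy(bufb, &b->bit_len, sizeof(int32_t));
    memcpy(bufb + sizeof(int32_t), b->bits, sketch_nbytes(b));

    cmp = memcmp(bufa, bufb, la < lb ? la : lb);
    if (cmp != 0)
        return (cmp > 0) - (cmp < 0);
    return (la > lb) - (la < lb);
}
```

With a = B'11111111' (8 bits) and b = sixteen zero bits, sketch_bitcmp()
orders a after b, while sketch_rawcmp() orders a before b because the
length header dominates. Both orderings are total, so a sorted build would
still produce a valid tree either way; only its quality would differ.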
I tried to investigate this by creating a varbit index with and without
sorting and comparing the two with pageinspect, but in quick testing I
wasn't able to find cases where the sorted version was badly ordered. I
guess I haven't found the right data set yet.
- Heikki