Hello Andrey!

Following up on the things I owed you: the benchmarks, the consistency check,
and a note on the 2^53 case.

I added a fast path: each integer opclass's consistent() / distance() now
detects the same-type case (query type equal to the indexed type) and calls the
original gbt_num_consistent() / gbt_num_distance() directly.
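A minimal sketch of the fast-path dispatch, written as a self-contained C model rather than the patch itself. It assumes the query arrives as an int64 together with its subtype OID (standing in for a Datum plus the subtype argument) and that an int4 key is a [lower, upper] range. The names Int4Key, range_consistent(), int4_same_type() and int4_consistent() are hypothetical; int4_same_type() plays the role of the call into gbt_num_consistent(). The type OIDs and btree strategy numbers are PostgreSQL's real values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

/* PostgreSQL's OIDs for the integer types. */
#define INT8OID 20
#define INT2OID 21
#define INT4OID 23

/* Btree strategy numbers, as in PostgreSQL's stratnum.h. */
#define BTLessStrategyNumber 1
#define BTLessEqualStrategyNumber 2
#define BTEqualStrategyNumber 3
#define BTGreaterEqualStrategyNumber 4
#define BTGreaterStrategyNumber 5

/* Hypothetical model of an int4 GiST key: the range of values below it. */
typedef struct
{
    int32_t lower;
    int32_t upper;
} Int4Key;

/* Can some value in [lower, upper] satisfy "value OP query"? */
static bool
range_consistent(int64_t lower, int64_t upper, int64_t query, int strategy)
{
    assert(strategy >= BTLessStrategyNumber && strategy <= BTGreaterStrategyNumber);
    switch (strategy)
    {
        case BTLessStrategyNumber:
            return lower < query;
        case BTLessEqualStrategyNumber:
            return lower <= query;
        case BTEqualStrategyNumber:
            return lower <= query && query <= upper;
        case BTGreaterEqualStrategyNumber:
            return upper >= query;
        default:                /* BTGreaterStrategyNumber */
            return upper > query;
    }
}

/* Stand-in for calling the original gbt_num_consistent() with an int4 query. */
static bool
int4_same_type(const Int4Key *key, int32_t query, int strategy)
{
    return range_consistent(key->lower, key->upper, query, strategy);
}

/* Hypothetical entry point; *fast reports which path was taken. */
static bool
int4_consistent(const Int4Key *key, int64_t query, Oid subtype,
                int strategy, bool *fast)
{
    if (subtype == INT4OID)
    {
        /* Same type: the value fits in int32, go straight to the original code. */
        *fast = true;
        return int4_same_type(key, (int32_t) query, strategy);
    }
    /* Cross-type (int2 or int8 query): compare on int64-widened bounds. */
    *fast = false;
    return range_consistent(key->lower, key->upper, query, strategy);
}
```

In this model both branches give the same answer for any query that fits in int4; the early subtype check only lets the same-type case skip the cross-type handling.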

To confirm there is no regression, I ran a microbenchmark on an -O2 build
without assertions, with a single client, over a 500k-row int4 GiST index,
using the following settings:

-c enable_seqscan=off \
-c enable_bitmapscan=off \
-c enable_sort=off \
-c max_parallel_workers_per_gather=0

This is the benchmark setup:

CREATE EXTENSION IF NOT EXISTS btree_gist;
DROP TABLE IF EXISTS benchg;
CREATE TABLE benchg (a int4);
INSERT INTO benchg SELECT g FROM generate_series(0, 499999) g;
CREATE INDEX benchg_idx ON benchg USING gist (a);
VACUUM (ANALYZE, FREEZE) benchg;

And the two workloads:

consistent(), full-range index-only count(*):
SELECT count(*) FROM benchg WHERE a >= 0 AND a <= 499999;

distance(), full KNN ordering (ORDER BY a <-> k over all rows):
SELECT count(*) FROM (SELECT a FROM benchg ORDER BY a <-> 250000 LIMIT 1000000) q;

Timings in ms (12 repetitions, 15 s each), before (at
3e3d7875e95621b02311ea3443e5139e3bce944a) and after my patch:

  before   consistent   min/med/mean = 51.754 52.718 54.137 ms
  after    consistent   min/med/mean = 52.042 52.480 52.572 ms
  ------------------------------------------------------------------------
  before   distance     min/med/mean = 76.863 77.177 77.395 ms
  after    distance     min/med/mean = 77.357 77.803 77.980 ms

All differences appear to be within measurement noise. The one outlier is the
consistent() "before" mean, which is probably inflated by a single slow
repetition: its minimum and median are in line with the "after" run.
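For reference, a small C sketch of how min/median/mean summaries like the ones above can be computed, and why one slow repetition moves the mean but not the median. The median convention (average of the two middle values for even n) is an assumption about how the numbers were produced. The helper implied_outlier() is hypothetical and deliberately crude: it assumes the other n - 1 repetitions sat exactly at the median. Applied to the consistent() "before" row (n = 12) it gives 12 * 54.137 - 11 * 52.718 = 69.746 ms, i.e. one repetition roughly 17 ms slower than the median, which is only a rough estimate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    double min;
    double median;
    double mean;
} Summary;

static int
cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;

    return (x > y) - (x < y);
}

/* min/median/mean of n timings (n <= 64 in this sketch). */
static Summary
summarize(const double *ms, size_t n)
{
    double sorted[64];
    double sum = 0.0;
    Summary s;

    assert(n > 0 && n <= 64);
    memcpy(sorted, ms, n * sizeof(double));
    qsort(sorted, n, sizeof(double), cmp_double);
    for (size_t i = 0; i < n; i++)
        sum += sorted[i];
    s.min = sorted[0];
    /* Assumed convention: average the two middle values when n is even. */
    s.median = (n % 2) ? sorted[n / 2]
                       : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    s.mean = sum / (double) n;
    return s;
}

/* Crude estimate: if n - 1 reps sat at the median, what was the remaining one? */
static double
implied_outlier(double mean, double median, size_t n)
{
    return mean * (double) n - median * (double) (n - 1);
}
```

With 11 repetitions at 52 ms and one at 70 ms, the median stays at 52 ms while the mean rises to 53.5 ms, the same shape as the consistent() "before" row.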

Regarding the consistency check, I explored the regression-suite approach I
mentioned.

The consistent() / distance() functions dispatch cross-type queries through a
single static table of supported subtype OIDs (gbt_int_crosstype_table in
btree_utils_num.c). I expose that same table to SQL through
gbt_int_crosstype_subtypes(), so there is no hand-maintained second copy of the
list.
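A minimal model of the single-source-of-truth idea, assuming nothing about the real code beyond what is described above: one static array drives both the C dispatch check and the rows handed to SQL. The struct layout, the set of pairs (assumed here to be all six int2/int4/int8 combinations) and the names crosstype_supported() and crosstype_subtypes() are hypothetical; the real gbt_int_crosstype_table and gbt_int_crosstype_subtypes() may be shaped differently, and the latter would be a set-returning function rather than a copy into a buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

/* PostgreSQL's OIDs for the integer types. */
#define INT8OID 20
#define INT2OID 21
#define INT4OID 23

/* Hypothetical row layout: indexed type and a supported query subtype. */
typedef struct
{
    Oid lefttype;
    Oid righttype;
} CrossTypeEntry;

/* The single list; assumed here to cover all six int2/int4/int8 pairs. */
static const CrossTypeEntry crosstype_table[] = {
    {INT2OID, INT4OID}, {INT2OID, INT8OID},
    {INT4OID, INT2OID}, {INT4OID, INT8OID},
    {INT8OID, INT2OID}, {INT8OID, INT4OID},
};

#define N_CROSSTYPE (sizeof(crosstype_table) / sizeof(crosstype_table[0]))

/* Dispatch side: does the C code handle this (lefttype, righttype) pair? */
static bool
crosstype_supported(Oid lefttype, Oid righttype)
{
    for (size_t i = 0; i < N_CROSSTYPE; i++)
        if (crosstype_table[i].lefttype == lefttype &&
            crosstype_table[i].righttype == righttype)
            return true;
    return false;
}

/* SQL side: emit the very same rows (a real SRF would return them one by one). */
static size_t
crosstype_subtypes(CrossTypeEntry *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL);
    for (size_t i = 0; i < N_CROSSTYPE && n < cap; i++)
        out[n++] = crosstype_table[i];
    return n;
}
```

Because both functions read the same array, every row visible from SQL is by construction a pair the dispatch accepts, and adding a pair in one place updates both.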

The int_crosstype.sql regression test uses that function to build the set of
cross-type (lefttype, righttype, strategy) entries that should exist in
pg_amop, and compares it with EXCEPT against the cross-type rows actually
present in gist_int{2,4,8}_ops:

  - a pg_amop row whose subtype the C dispatch does not handle shows up as
    "unexpected in pg_amop", and
  - a dispatch entry without the matching pg_amop rows shows up as
    "missing from pg_amop".
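The two-way EXCEPT comparison can be modelled in C as a pair of set differences over (lefttype, righttype, strategy) triples. This sketches the comparison logic only; AmopRow, row_in(), except_rows() and check_drift() are hypothetical names, and unlike SQL EXCEPT this version does not deduplicate its output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t Oid;

/* One (lefttype, righttype, strategy) triple, as compared by the test. */
typedef struct
{
    Oid lefttype;
    Oid righttype;
    int16_t strategy;
} AmopRow;

static bool
row_in(const AmopRow *set, size_t n, const AmopRow *row)
{
    for (size_t i = 0; i < n; i++)
        if (set[i].lefttype == row->lefttype &&
            set[i].righttype == row->righttype &&
            set[i].strategy == row->strategy)
            return true;
    return false;
}

/* "a EXCEPT b": print each row of a absent from b under label; return the count. */
static size_t
except_rows(const AmopRow *a, size_t na, const AmopRow *b, size_t nb,
            const char *label)
{
    size_t count = 0;

    assert(label != NULL);
    for (size_t i = 0; i < na; i++)
    {
        if (!row_in(b, nb, &a[i]))
        {
            printf("%s: (%u, %u, %d)\n", label, (unsigned) a[i].lefttype,
                   (unsigned) a[i].righttype, a[i].strategy);
            count++;
        }
    }
    return count;
}

/* Check both directions; true only when expected and actual agree as sets. */
static bool
check_drift(const AmopRow *expected, size_t ne, const AmopRow *actual, size_t na)
{
    size_t unexpected = except_rows(actual, na, expected, ne, "unexpected in pg_amop");
    size_t missing = except_rows(expected, ne, actual, na, "missing from pg_amop");

    return unexpected == 0 && missing == 0;
}
```

When the two sets agree nothing is printed; any printed line corresponds to one of the two kinds of drift listed above.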

Either kind of drift produces a diff under `make check`, so adding an ALTER
OPERATOR FAMILY entry without a matching dispatch entry (or vice versa) fails
the suite. As I mentioned in my previous email, I'm not aware of a way to do
this check in amvalidate() without patching core.

Attached is the new patch set; this time it includes the tests.

Best regards!

Attachment: 0001-Implement-cross-type-operators-for-GiST-indexes.patch

Attachment: 0002-Add-tests-for-cross-type-operators-for-GiST-indexes.patch
