On 03/01/2025 21:43, Peter Geoghegan wrote:
> The newly revised "skipskip" optimization seems to get the regressions
> down to only a 5% - 10% increase in runtime across a wide variety of
> unsympathetic cases -- I'm now validating performance against a test
> suite based on the adversarial cases presented by Masahiro Ikeda on
> this thread. Although I think that I'll end up tuning the "skipskip"
> mechanism some more (I may have been too conservative in marginal
> cases that actually do benefit from skipping), I deem these
> regressions to be acceptable. They're only seen in the most
> unsympathetic cases, where an omitted leading column has groupings of
> no more than about 50 index tuples, making skipping pretty hopeless.
>
> On my laptop, this is the worst case I could come up with:
>
> create table skiptest as select g / 10 as a, g%10 as b from
> generate_series(1, 10000000) g;
> vacuum freeze skiptest;
> create index on skiptest (a, b);
> set enable_seqscan=off; set max_parallel_workers_per_gather=0;
> \timing on
>
> After repeating a few times, to warm the cache:
>
> postgres=# select count(*) from skiptest where b=1;
>   count
> ---------
>  1000000
> (1 row)
>
> Time: 202.501 ms
>
> And after 'set skipscan_prefix_cols=0':
>
> select count(*) from skiptest where b=1;
>   count
> ---------
>  1000000
> (1 row)
>
> Time: 164.762 ms
>
> EXPLAIN ANALYZE confirms that it uses an Index Only scan in both cases.
>
> I knew from the outset that the hardest part of this project would be
> avoiding regressions in highly unsympathetic cases. The regressions
> that are still there seem very difficult to minimize any further; the
> overhead that remains comes from the simple need to maintain the
> scan's skip arrays once per page, before leaving the page. Once a scan
> decides to apply the "skipskip" optimization, it tends to stick with
> it for all future leaf pages -- leaving only the overhead of checking
> the high key while advancing the scan's arrays. I've cut just about
> all that I can reasonably cut from the hot code paths that are at
> issue with the regressed cases.
Hmm, looking at the code and profiling with perf, a lot of code is
executed per tuple: function calls, argument passing, flag checks, etc.
I suspect you could shave off some cycles by structuring the code in a
more compiler- and branch-prediction-friendly way, or just with more
aggressive inlining. I'm not sure what exactly to suggest, but the code
of _bt_readpage() and all the subroutines it calls is complicated.
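To illustrate the general idea (this is a made-up sketch, not the actual
nbtree code -- the names and data layout here are invented for
illustration), forcing the tiny per-tuple helper inline and annotating
the rarely-taken branch can keep the hot loop tight:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helpers, similar in spirit to Postgres's likely()/unlikely() */
#define likely(x)   __builtin_expect((x) != 0, 1)
#define unlikely(x) __builtin_expect((x) != 0, 0)

/* Forced inlining avoids per-tuple call overhead in the hot loop */
static inline bool
key_matches(int tupval, int keyval)
{
    return tupval == keyval;
}

/* Count matching entries in a sorted run; the "past the key" exit is rare */
static int
count_matches(const int *tuples, size_t ntuples, int keyval)
{
    int nmatches = 0;

    for (size_t i = 0; i < ntuples; i++)
    {
        if (likely(key_matches(tuples[i], keyval)))
            nmatches++;
        else if (unlikely(tuples[i] > keyval))
            break;              /* rare: sorted input has moved past the key */
    }
    return nmatches;
}
```

With the branch hints and the guaranteed inlining, the compiler can lay
out the loop so the common "tuple matches" path falls through without a
call or a mispredicted jump.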
Aside from the performance of those routines, the code doesn't feel very
intuitive. _bt_checkkeys() not only checks the current keys, it can
also read ahead to later tuples on the same page and update the array keys.
One little thing I noticed while stepping through with a debugger is that it calls
index_getattr() twice for the same tuple and attribute. First in
_bt_check_compare(), and if that sets *continuescan=false, again in
_bt_tuple_before_array_skeys(). The first index_getattr() call is
certainly pretty expensive because that's where you get the cache miss
on the tuple when scanning. Not sure if the second call matters much,
but it feels like a bad smell.
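The usual cure is to fetch the attribute once and hand the value to both
checks. A schematic sketch (the types and function names here are
invented, not the nbtree API) of the difference:

```c
#include <assert.h>
#include <stdbool.h>

/* Schematic stand-ins; not the actual Postgres structures */
typedef struct { int a; } FakeTuple;

static int nfetches = 0;        /* counts expensive attribute fetches */

static int
get_attr(const FakeTuple *tup)  /* stands in for index_getattr() */
{
    nfetches++;                 /* pretend this is the cache-missing load */
    return tup->a;
}

/* Anti-pattern: each check fetches the attribute independently */
static bool
check_twice(const FakeTuple *tup, int key)
{
    if (get_attr(tup) == key)
        return true;
    return get_attr(tup) < key; /* second fetch of the same attribute */
}

/* Better: fetch once, reuse the value in both comparisons */
static bool
check_once(const FakeTuple *tup, int key)
{
    int val = get_attr(tup);

    if (val == key)
        return true;
    return val < key;
}
```

Even if the second fetch usually hits cache, passing the already-fetched
value along keeps the slow path from re-doing the tuple deforming work.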
The comment on _bt_tuple_before_array_skeys() says:
* readpagetup callers must only call here when _bt_check_compare already set
* continuescan=false. We help these callers deal with _bt_check_compare's
* inability to distinguish between the < and > cases (it uses equality
* operator scan keys, whereas we use 3-way ORDER procs)
That raises the question: why doesn't _bt_check_compare() call the 3-way
ORDER proc in the first place? That would avoid the overhead of the
second call here.
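The point being that a single 3-way comparison already distinguishes all
three outcomes, whereas an equality-operator check needs a second
comparison to learn which side a non-matching tuple falls on. A toy
sketch over plain ints (the names are made up; Postgres's actual ORDER
support procs work through fmgr calls):

```c
#include <assert.h>

/* 3-way ORDER-style comparison: negative, zero, or positive */
static int
cmp3(int tupval, int keyval)
{
    return (tupval > keyval) - (tupval < keyval);
}

typedef enum { TUP_BEFORE, TUP_MATCH, TUP_AFTER } TupPos;

/* One 3-way call classifies the tuple; no follow-up comparison needed */
static TupPos
classify(int tupval, int keyval)
{
    int c = cmp3(tupval, keyval);

    if (c < 0)
        return TUP_BEFORE;
    if (c == 0)
        return TUP_MATCH;
    return TUP_AFTER;
}
```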
Aside from these micro-optimizations, I wonder about the "look-ahead"
logic in _bt_checkkeys_look_ahead. It feels pretty simplistic. Could you
use Exponential Search
(https://en.wikipedia.org/wiki/Exponential_search) instead of a plain
binary search on the page?
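For concreteness, the galloping pattern would look roughly like this
over a plain sorted array (a generic sketch, not code operating on
actual index pages): double the probe distance from the current
position until you overshoot, then binary-search only the narrowed
window. That costs O(log d) comparisons when the target is d elements
away, instead of O(log n) for the whole page.

```c
#include <assert.h>
#include <stddef.h>

/* Return the first index i in sorted arr[0..n) with arr[i] >= key,
 * using exponential ("galloping") search. */
static size_t
exp_search_lower_bound(const int *arr, size_t n, int key)
{
    size_t lo = 0;
    size_t hi = 1;

    /* Gallop: double the probe distance until we overshoot the key */
    while (hi < n && arr[hi] < key)
    {
        lo = hi;
        hi *= 2;
    }
    if (hi > n)
        hi = n;

    /* Standard binary search within the narrowed window [lo, hi) */
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (arr[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```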
--
Heikki Linnakangas
Neon (https://neon.tech)