Hello,
I've been thinking about a scan strategy that is sort of in between a
bitmap scan and an index scan, and I wanted to check whether this has been
explored before much or if there's a
reason it wouldn't work well in practice.
Example scenario where it could be useful: you need the first N rows
ordered by a B-tree column, filtered by a condition on another column with
a GIN index, e.g. full text search. Today I believe Postgres has two
options that use indexes:
1. Bitmap scan on the filter index → heap fetch all matches → sort →
limit. Can be expensive
when there are many matches but you only want a few results returned.
2. Index scan on the B-tree → recheck the filter condition against the
heap for each
row. Can be expensive when the match rate is low and many non-matching
rows are found before finding N hits.
The idea for the new scan strategy is: build the TID bitmap from the filter
index first (same as a bitmap scan would), then walk the B-tree in order,
but instead of going to the heap to check the
filter condition, check each TID against the bitmap. Matching TIDs get
returned
directly in order.
I built a small proof of concept as a pgrx extension against PG 18 to try
and test this:
https://github.com/lauri3new/ordered-bitmap-scan. The benchmarks there show
it using fewer buffer hits than either a bitmap scan or index scan for the
example data distribution - which is quite skewed to show the benefit.
I want to caveat that I am a web backend developer and not a Postgres
expert by any means, so there's probably good reasons this isn't a thing
(planner cost or benefit is too small/niche). I would appreciate though any
pointers to prior discussion or feedback on whether this is worth looking
into further or not.
Thanks