Hi folks, I am working on optimizing a query that attempts to efficiently paginate through a large table using multi-column "cursors", a.k.a. the "seek method" (described in detail here: https://use-the-index-luke.com/sql/partial-results/fetch-next-page).
The table (drastically simplified) looks like this:

CREATE TABLE data (
  col_1 int NOT NULL,
  col_2 int NOT NULL,
  col_3 int NOT NULL,
  content varchar(10) NOT NULL
);

And it has an appropriate index:

CREATE INDEX data_index ON data (col_1, col_2, col_3);

The recommended way to paginate through this table is the "row values" syntax:

SELECT content
FROM data
WHERE (col_1, col_2, col_3) > (10, 20, 29)
ORDER BY col_1, col_2, col_3
LIMIT 100;

This results in a perfectly optimized query plan (slightly edited for readability; test data of 1 million rows, PostgreSQL 17.2):

Limit  (cost=0.42..5.33 rows=100 width=20) (actual time=0.084..0.197 rows=100 loops=1)
  ->  Index Scan using data_index on data  (cost=0.42..43617.30 rows=889600 width=20) (actual time=0.081..0.176 rows=100 loops=1)
        Index Cond: (ROW(col_1, col_2, col_3) > ROW(10, 20, 29))
Planning Time: 0.344 ms
Execution Time: 0.264 ms

However, in reality my query uses a mix of ascending and descending ordering (with an index matching the order columns), in which case the WHERE (col_1, col_2, col_3) > (10, 20, 29) syntax is not an option (unless I somehow derive "reversed" data from the columns, which I would like to avoid).
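For concreteness, a minimal sketch of the mixed-ordering case I mean (assuming, hypothetically, that col_2 is the descending column):

```sql
-- Hypothetical index with mixed sort directions, matching the ORDER BY:
CREATE INDEX data_index_mixed ON data (col_1 ASC, col_2 DESC, col_3 ASC);

-- Desired page ordering. There is no single row-value comparison that
-- means "col_1 greater, col_2 less, col_3 greater", so the
-- WHERE (col_1, col_2, col_3) > (10, 20, 29) shortcut does not apply here:
SELECT content
FROM data
ORDER BY col_1 ASC, col_2 DESC, col_3 ASC
LIMIT 100;
```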
Therefore I am using an equivalent query with multiple WHERE conditions, something like this (for simplicity, no mixed ordering is involved in the examples):

SELECT content
FROM data
WHERE col_1 >= 10
  AND (
    col_1 > 10
    OR (
      col_2 >= 20
      AND (col_2 > 20 OR col_3 > 29)
    )
  )
ORDER BY col_1, col_2, col_3
LIMIT 100;

This returns the same rows, but the query plan is slightly less efficient:

Limit  (cost=0.42..6.48 rows=100 width=20) (actual time=0.848..0.893 rows=100 loops=1)
  ->  Index Scan using data_index on data  (cost=0.42..52984.52 rows=874874 width=20) (actual time=0.847..0.884 rows=100 loops=1)
        Index Cond: (col_1 >= 10)
        Filter: ((col_1 > 10) OR ((col_2 >= 20) AND ((col_2 > 20) OR (col_3 > 29))))
        Rows Removed by Filter: 2030
Planning Time: 0.133 ms
Execution Time: 0.916 ms

Instead of relying exclusively on an index access predicate, this plan involves an index filter predicate. Without being familiar with the internals of the query planner, I *think* there *should* be a way to phrase the WHERE conditions so that they produce the "perfect" plan. There are several ways to phrase the conditions, of which I've tried a few, only to get the same or worse performance. Does anyone have a suggestion for a better query?

I am also aware that I might be chasing an optimization with low returns (I have yet to see how it performs on Real Life data), but I'm already too deep down a rabbit hole to turn back without knowing The Truth.

I've dumped my experiments into a DB Fiddle: https://www.db-fiddle.com/f/kd8zaibZGKGH1HSStyxNkx/0

Cheers,
Márton
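P.S. For completeness, here is a sketch of what the nested conditions look like in the mixed-ordering case (again hypothetically, with col_2 descending and an index on (col_1 ASC, col_2 DESC, col_3 ASC)):

```sql
-- Rows strictly after the cursor (10, 20, 29) under
-- ORDER BY col_1 ASC, col_2 DESC, col_3 ASC:
SELECT content
FROM data
WHERE col_1 >= 10
  AND (
    col_1 > 10
    OR (
      col_2 <= 20                      -- descending column: comparison flips
      AND (col_2 < 20 OR col_3 > 29)
    )
  )
ORDER BY col_1 ASC, col_2 DESC, col_3 ASC
LIMIT 100;
```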