red hit=56
Planning time: 0.515 ms
Execution time: 7339.772 ms
(19 rows)
** "Rows Removed by Index Recheck: 1639294" - OK, statistics are
pooched post-clustering. Let's reanalyze.
===
db0=# ANALYZE VERBOSE schema0.table0;
INFO: analyzing "schema0.table0"
INFO: "table0": scanned 3 of 1048881 pages, containing 12 live
rows and 0 dead rows; 3 rows in sample, 4195524 estimated total rows
ANALYZE
** Let's try this again...
===
db0=# EXPLAIN (analyze, verbose, costs, buffers) SELECT ctid,
(data->>'date')::numeric as date FROM schema0.table0 WHERE
(data->>'date')::numeric >= '1531267200'
AND (data->>'date')::numeric <= '1531353600'
ORDER BY ctid, (data->>'date')::numeric
DESC LIMIT 1000;
QUERY PLAN
---
Limit (cost=5444.33..5445.71 rows=555 width=38) (actual
time=6533.824..6533.927 rows=1000 loops=1)
Output: ctid, (((data ->> 'date'::text))::numeric)
Buffers: shared hit=410808
-> Sort (cost=5444.33..5445.71 rows=555 width=38) (actual
time=6533.822..6533.862 rows=1000 loops=1)
Output: ctid, (((data ->> 'date'::text))::numeric)
Sort Key: table0.ctid, (((table0.data ->>
'date'::text))::numeric) DESC
Sort Method: top-N heapsort Memory: 95kB
Buffers: shared hit=410808
-> Bitmap Heap Scan on schema0.table0 (cost=162.22..5419.03
rows=555 width=38) (actual time=59.316..6532.913 rows=3713 loops=1)
Output: ctid, ((data ->> 'date'::text))::numeric
Recheck Cond: table0.data ->>
'date'::text))::numeric >= '1531267200'::numeric) AND (((table0.data ->>
'date'::text))::numeric <= '1531353600'::numeric))
Rows Removed by Index Recheck: 1639294
Heap Blocks: lossy=410752
Buffers: shared hit=410808
-> Bitmap Index Scan on date_idx (cost=0.00..162.08
rows=1339 width=0) (actual time=30.859..30.859 rows=4107520 loops=1)
Index Cond: table0.data ->>
'date'::text))::numeric >= '1531267200'::numeric) AND (((table0.data ->>
'date'::text))::numeric <= '1531353600'::numeric))
Buffers: shared hit=56
Planning time: 1.100 ms
Execution time: 6534.026 ms
(19 rows)
** This cannot be possible! The table must be clustered on the same
expression we cluster BRIN! Let's look at the data.
===
db0=# SELECT ctid, (data->>'date')::numeric as date FROM schema0.table0
WHERE (data->>'date')::numeric >= '1531267200'
AND (data->>'date')::numeric <= '1531353600'
ORDER BY ctid, (data->>'date')::numeric
DESC LIMIT 1000;
ctid | date
---+
(4155,2) | 1531281600
...
(4159,1) | 1531281600
(4188,4) | 1531281600
(4189,1) | 1531281600
(4189,2) | 1531281600
(4189,3) | 1531281600
(4189,4) | 1531281600
(4231,4) | 1531281600
(4247,3) | 1531281600
(4247,4) | 1531281600
(4248,1) | 1531281600
(4248,2) | 1531281600
(5367,3) | 1531281600
...
(5368,4) | 1531281600
(5376,3) | 1531281600
...
(5441,1) | 1531281600
(5518,2) | 1531281600
(5518,3) | 1531281600
(5518,4) | 1531281600
(7836,1) | 1531281600
(7836,2) | 1531281600
(7851,3) | 1531281600
...
(7913,4) | 1531281600
(7984,4) | 1531281600
(8014,1) | 1531281600
(8014,2) | 1531281600
(8014,3) | 1531281600
(8014,4) | 1531281600
(8035,1) | 1531281600
(8035,2) | 1531281600
(8035,3) | 1531281600
(8035,4) | 1531281600
(9190,3) | 1531281600
(9190,4) | 1531281600
(10307,4) | 1531281600
...
(10349,3) | 1531281600
(10371,3) | 1531281600
...
(10392,2) | 1531281600
(10445,4) | 1531281600
(10446,1) | 1531281600
(10446,2) | 1531281600
(11580,3) | 1531281600
...
(11586,2) | 1531281600
(11595,3) | 1531281600
...
(11598,2) | 1531281600
(11600,1) | 1531281600
...
(11601,2) | 1531281600
(11609,1) | 1531281600
...
(11621,2) | 1531281600
(11621,3) | 1531281600
(11664,3) | 1531281600
...
(11684,3) | 1531281600
(11726,2) | 1531281600
...
(11770,4) | 1531281600
(12891,2) | 1531281600
(14057,1) | 1531281600
...
(14058,2) | 1531281600
(14064,3) | 1531281600
...
(14071,3) | 1531281600
(1000 rows)
The issue seems to be with both clustering and BRIN stats.
Fillfactor on the table is not a factor, can be 100 can be 75, I'm able
to reproduce and fail to reproduce with both.
Clustering on `date_ord_idx` can be done with ANALYZE and without, in
some cases PG picks scan and sort, sometimes it uses index regardless of
stats.
Sometimes the table is properly clustered, sometime