On 2/18/23 19:51, Justin Pryzby wrote: > Are (any of) these patches targetting v16 ? >
Probably not. Maybe if there's more feedback / scrutiny, but I'm not sure one commitfest is enough to polish the patch (especially considering I haven't done much on the costing yet). > typos: > ar we - we are? > morestly - mostly > interstect - intersect > >> + * XXX We don't sort the bins, so just do binary sort. For large number of >> values >> + * this might be an issue, for small number of values a linear search is >> fine. > > "binary sort" is wrong? > >> + * only half of there ranges, thus 1/2. This can be extended to randomly > > half of *these* ranges ? > Thanks, I'll fix those. >> From 7b3307c27b35ece119feab4891f03749250e454b Mon Sep 17 00:00:00 2001 >> From: Tomas Vondra <tomas.von...@postgresql.org> >> Date: Mon, 17 Oct 2022 18:39:28 +0200 >> Subject: [PATCH 01/11] Allow index AMs to build and use custom statistics > > I think the idea can also apply to btree - currently, correlation is > considered to be a property of a column, but not an index. But that > fails to distinguish between a freshly built index, and an index with > out of order heap references, which can cause an index scan to be a lot > more expensive. > > I implemented per-index correlation stats way back when: > https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com > > See also: > https://www.postgresql.org/message-id/14438.1512499...@sss.pgh.pa.us > > With my old test case: > > Index scan is 3x slower than bitmap scan, but index scan is costed as > being cheaper: > > postgres=# explain analyze SELECT * FROM t WHERE i>11 AND i<55; > Index Scan using t_i_idx on t (cost=0.43..21153.74 rows=130912 width=8) > (actual time=0.107..222.737 rows=128914 loops=1) > > postgres=# SET enable_indexscan =no; > postgres=# explain analyze SELECT * FROM t WHERE i>11 AND i<55; > Bitmap Heap Scan on t (cost=2834.28..26895.96 rows=130912 width=8) (actual > time=16.830..69.860 rows=128914 loops=1) > > If it's clustered, then the index scan is almost twice as fast, and the > costs are more consistent with the associated time. The planner assumes > that the indexes are freshly built... > > postgres=# CLUSTER t USING t_i_idx ; > postgres=# explain analyze SELECT * FROM t WHERE i>11 AND i<55; > Index Scan using t_i_idx on t (cost=0.43..20121.74 rows=130912 width=8) > (actual time=0.084..117.549 rows=128914 loops=1) > Yeah, the concept of indexam statistics certainly applies to other index types, and for btree we might collect information about correlation etc. I haven't looked at the 2017 patch, but it seems reasonable. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company