Re: estimate correlation of index separately from table (Re: [PERFORM] index fragmentation on insert-only table with non-unique column)

2017-07-07 Thread Peter Geoghegan
On Fri, Jul 7, 2017 at 4:41 PM, Justin Pryzby wrote: > The second change averages separate correlation values of small lists of 300 > consecutive TIDs, rather than the course-granularity/aggregate correlation of > a > small sample of pages across the entire index. Postgres' existing sampling is

estimate correlation of index separately from table (Re: [PERFORM] index fragmentation on insert-only table with non-unique column)

2017-07-07 Thread Justin Pryzby
Months ago I reported an issue with very slow index scan of tables with high "correlation" of its indexed column, due to (we concluded at the time) duplicate/repeated values of that column causing many lseek()s. References: https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsa

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-08-15 Thread Claudio Freire
On Sat, Aug 13, 2016 at 3:54 PM, Justin Pryzby wrote: > On Sun, Jun 05, 2016 at 12:28:47PM -0700, Jeff Janes wrote: >> On Sun, Jun 5, 2016 at 9:03 AM, Tom Lane wrote: >> > Claudio Freire writes: >> >> So correlated index scans look extra favourable vs bitmap index scans >> >> because bitmap heap

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-08-13 Thread Justin Pryzby
Regarding this earlier thread: https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.ga11...@telsasoft.com On Tue, May 24, 2016 at 10:39 AM, Justin Pryzby wrote: > Summary: Non-unique btree indices are returning CTIDs for rows with same > value of indexe

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-06-05 Thread Jeff Janes
On Sun, Jun 5, 2016 at 9:03 AM, Tom Lane wrote: > Claudio Freire writes: >> So correlated index scans look extra favourable vs bitmap index scans >> because bitmap heap scans consider random page costs sans correlation >> effects (even though correlation applies to bitmap heap scans as >> well).

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-06-05 Thread Tom Lane
Claudio Freire writes: > So correlated index scans look extra favourable vs bitmap index scans > because bitmap heap scans consider random page costs sans correlation > effects (even though correlation applies to bitmap heap scans as > well). Really? How? The index ordering has nothing to do wi

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-06-04 Thread Kevin Grittner
On Fri, Jun 3, 2016 at 6:54 PM, Justin Pryzby wrote: > max_wal_size| 4GB | configuration file > min_wal_size| 6GB | configuration file Just a minor digression -- it generally doesn't make sense to configure the minimum for somethi

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-06-03 Thread Claudio Freire
On Fri, Jun 3, 2016 at 8:54 PM, Justin Pryzby wrote: > As a test, I did SET effective_cache_size='1MB', before running explain, and > still does: > > |-> Index Scan using > cdrs_huawei_pgwrecord_2016_05_29_recordopeningtime_idx on > cdrs_huawei_pgwrecord_2016_05_29 (cost=0.44..1526689.

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-06-03 Thread Justin Pryzby
On Fri, Jun 03, 2016 at 06:26:33PM -0300, Claudio Freire wrote: > On Wed, May 25, 2016 at 11:00 AM, Justin Pryzby wrote: > >> > First, I found I was able to get 30-50min query results on full week's > >> > table by > >> > prefering a seq scan to an index scan. The row estimates seemed fine, > >

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-06-03 Thread Claudio Freire
On Wed, May 25, 2016 at 11:00 AM, Justin Pryzby wrote: >> > First, I found I was able to get 30-50min query results on full week's >> > table by >> > prefering a seq scan to an index scan. The row estimates seemed fine, and >> > the >> > only condition is the timestamp, so the planner's use of

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-05-25 Thread Justin Pryzby
On Tue, May 24, 2016 at 11:23:48PM -0700, Jeff Janes wrote: > > But note the non-uniqueness of the index column: > > ts=# SELECT recordopeningtime, COUNT(1) FROM cdrs_huawei_pgwrecord WHERE > > recordopeningtime>='2016-05-21' AND recordopeningtime<'2016-05-22' GROUP BY > > 1 ORDER BY 2 DESC; > >

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-05-25 Thread Justin Pryzby
On Tue, May 24, 2016 at 09:16:20PM -0700, Peter Geoghegan wrote: > On Tue, May 24, 2016 at 10:39 AM, Justin Pryzby wrote: > > Postgres seems to assume that the high degree of correlation of the table > > column seen in pg_stats is how it will get data from the index scan, which > > assumption seem

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-05-24 Thread Jeff Janes
On Tue, May 24, 2016 at 10:39 AM, Justin Pryzby wrote: > Summary: Non-unique btree indices are returning CTIDs for rows with same > value of indexed column not in logical order, imposing a high performance > penalty. > > Running PG 9.5.3 now, we have a time-based partitions of append-only tables >

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-05-24 Thread Peter Geoghegan
On Tue, May 24, 2016 at 9:43 PM, Tom Lane wrote: > Yeah. I wonder what would happen if we used the same rule for index > insertions. It would likely make insertions more expensive, but maybe > not by much. The existing "randomization" rule for where to insert new > items in a run of identical i

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-05-24 Thread Tom Lane
Peter Geoghegan writes: > The basic problem is that the B-Tree code doesn't maintain this > property. However, B-Tree index builds will create an index that > initially has this property, because the tuplesort.c code happens to > sort index tuples with a CTID tie-breaker. Yeah. I wonder what wou

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-05-24 Thread Stephen Frost
* Peter Geoghegan (p...@bowt.ie) wrote: > On Tue, May 24, 2016 at 10:39 AM, Justin Pryzby wrote: > > I was able to see great improvement without planner parameters by REINDEX > > the > > timestamp index. My theory is that the index/planner doesn't handle well > > the > > case of many tuples wit

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-05-24 Thread Peter Geoghegan
On Tue, May 24, 2016 at 10:39 AM, Justin Pryzby wrote: > I was able to see great improvement without planner parameters by REINDEX the > timestamp index. My theory is that the index/planner doesn't handle well the > case of many tuples with same column value, and returns pages out of logical > or

[PERFORM] index fragmentation on insert-only table with non-unique column

2016-05-24 Thread Justin Pryzby
Summary: Non-unique btree indices are returning CTIDs for rows with same value of indexed column not in logical order, imposing a high performance penalty. Running PG 9.5.3 now, we have a time-based partitions of append-only tables with data loaded from other sources. The tables are partitioned b