Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-06-03 Thread Claudio Freire
On Fri, Jun 3, 2016 at 8:54 PM, Justin Pryzby  wrote:
> As a test, I did SET effective_cache_size='1MB', before running explain, and
> still does:
>
> |->  Index Scan using 
> cdrs_huawei_pgwrecord_2016_05_29_recordopeningtime_idx on 
> cdrs_huawei_pgwrecord_2016_05_29  (cost=0.44..1526689.49 rows=8342796 
> width=355)
> |  Index Cond: ((recordopeningtime >= '2016-05-29 
> 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-30 
> 00:00:00'::timestamp without time zone))
>
> I Set enable_indexscan=0, and got:
>
> |->  Bitmap Heap Scan on cdrs_huawei_pgwrecord_2016_05_29  
> (cost=168006.10..4087526.04 rows=8342796 width=355)
> |  Recheck Cond: ((recordopeningtime >= '2016-05-29 
> 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-30 
> 00:00:00'::timestamp without time zone))
> |  ->  Bitmap Index Scan on 
> cdrs_huawei_pgwrecord_2016_05_29_recordopeningtime_idx  (cost=0.00..165920.40 
> rows=8342796 width=0)
> |Index Cond: ((recordopeningtime >= '2016-05-29 
> 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-30 
> 00:00:00'::timestamp without time zone))
>
> Here's a minimal query which seems to isolate the symptom:
>
> ts=# explain (analyze,buffers) SELECT sum(duration) FROM 
> cdrs_huawei_pgwrecord_2016_05_22 WHERE recordopeningtime>='2016-05-22' AND 
> recordopeningtime<'2016-05-23';
> | Aggregate  (cost=2888731.67..2888731.68 rows=1 width=8) (actual 
> time=388661.892..388661.892 rows=1 loops=1)
> |   Buffers: shared hit=4058501 read=1295147 written=35800
> |   ->  Index Scan using 
> cdrs_huawei_pgwrecord_2016_05_22_recordopeningtime_idx on 
> cdrs_huawei_pgwrecord_2016_05_22  (cost=0.56..2867075.33 rows=8662534 w
> |idth=8) (actual time=0.036..379332.910 rows=8575673 loops=1)
> | Index Cond: ((recordopeningtime >= '2016-05-22 00:00:00'::timestamp 
> without time zone) AND (recordopeningtime < '2016-05-23 00:00:00'::timestamp
> | without time zone))
> | Buffers: shared hit=4058501 read=1295147 written=35800
> | Planning time: 0.338 ms
> | Execution time: 388661.947 ms
>
> And here's an older one to avoid cache, with enable_indexscan=0
> |ts=# explain (analyze,buffers)  SELECT sum(duration) FROM 
> cdrs_huawei_pgwrecord_2016_05_08 WHERE recordopeningtime>='2016-05-08' AND 
> recordopeningtime<'2016-05-09';
> | Aggregate  (cost=10006286.58..10006286.59 rows=1 width=8) (actual 
> time=44219.156..44219.156 rows=1 loops=1)
> |   Buffers: shared hit=118 read=1213887 written=50113
> |   ->  Bitmap Heap Scan on cdrs_huawei_pgwrecord_2016_05_08  
> (cost=85142.24..9985848.96 rows=8175048 width=8) (actual 
> time=708.024..40106.062 rows=8179338 loops=1)
> | Recheck Cond: ((recordopeningtime >= '2016-05-08 
> 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-09 
> 00:00:00'::timestamp without time zone))
> | Rows Removed by Index Recheck: 74909
> | Heap Blocks: lossy=1213568
> | Buffers: shared hit=118 read=1213887 written=50113
> | ->  Bitmap Index Scan on 
> cdrs_huawei_pgwrecord_2016_05_08_recordopeningtime_idx1  (cost=0.00..83098.48 
> rows=8175048 width=0) (actual time=706.557..706.557 rows=12135680 loops=1)
> |   Index Cond: ((recordopeningtime >= '2016-05-08 
> 00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-09 
> 00:00:00'::timestamp without time zone))
> |   Buffers: shared hit=117 read=320
> | Planning time: 214.786 ms
> | Execution time: 44228.874 ms
> |(12 rows)


Correct me if I'm wrong, but this looks like the planner not
accounting for correlation when using bitmap heap scans.

Checking the source, it really doesn't.

So correlated index scans look extra favourable vs bitmap index scans
because bitmap heap scans consider random page costs sans correlation
effects (even though correlation applies to bitmap heap scans as
well). While that sounds desirable a priori, it seems it's hurting
this case quite badly.

I'm not sure there's any simple way of working around that.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-06-03 Thread Justin Pryzby
On Fri, Jun 03, 2016 at 06:26:33PM -0300, Claudio Freire wrote:
> On Wed, May 25, 2016 at 11:00 AM, Justin Pryzby  wrote:
> >> > First, I found I was able to get 30-50min query results on full week's 
> >> > table by
> >> > prefering a seq scan to an index scan.  The row estimates seemed fine, 
> >> > and the
> >> > only condition is the timestamp, so the planner's use of index scan is as
> >> > expected.
> >>
> >> Can you show us the query?  I would expect a bitmap scan of the index
> >> (which would do what you want, but even more so), instead.
> > See explain, also showing additional tables/views being joined.  It's NOT 
> > doing
> > a bitmap scan though, and I'd be interested to find why; I'm sure that 
> > would've
> > improved this query enough so it never would've been an issue.
> > https://explain.depesz.com/s/s8KP
> >
> >  ->  Index Scan using 
> > cdrs_huawei_pgwrecord_2016_05_01_recordopeningtime_idx on 
> > cdrs_huawei_pgwrecord_2016_05_01  (cost=0.56..1601734.57 rows=8943848 
> > width=349)
> >Index Cond: ((recordopeningtime >= '2016-05-07 00:00:00'::timestamp 
> > without time zone) AND (recordopeningtime < '2016-05-08 
> > 00:00:00'::timestamp without time zone))
> 
> Please show your guc settings ( see
> https://wiki.postgresql.org/wiki/Server_Configuration )
> 
> A plan node like that, if it would result in I/O, with proper
> configuration should have selected a bitmap index/heap scan. If it
> didn't, it probably thinks it has more cache than it really does, and
> that would mean the wrong setting was set in effective_cache_size.

ts=# SELECT name, current_setting(name), SOURCE FROM pg_settings WHERE 
SOURCE='configuration file';
 dynamic_shared_memory_type  | posix   | configuration file
 effective_cache_size| 64GB| configuration file
 effective_io_concurrency| 8   | configuration file
 huge_pages  | try | configuration file
 log_autovacuum_min_duration | 0   | configuration file
 log_checkpoints | on  | configuration file
 maintenance_work_mem| 6GB | configuration file
 max_connections | 200 | configuration file
 max_wal_size| 4GB | configuration file
 min_wal_size| 6GB | configuration file
 shared_buffers  | 8GB | configuration file
 wal_compression | on  | configuration file
 work_mem| 1GB | configuration file

I changed at least maintenance_work_mem since I originally wrote, to try to
avoid tempfiles during REINDEX (though I'm not sure it matters, as the
tempfiles are effective cached and may never actually be written).

It's entirely possible those settings aren't ideal.  The server has 72GB RAM.
There are usually very few (typically n<3 but at most a handful) nontrivial
queries running at once, if at all.

I wouldn't expect any data that's not recent (table data last 2 days or index
from this month) to be cached, and wouldn't expect that to be entirely cached,
either:

ts=# SELECT sum(pg_table_size(oid))/1024^3 gb FROM pg_class WHERE 
relname~'_2016_05_..$';
gb | 425.783050537109

ts=# SELECT sum(pg_table_size(oid))/1024^3 gb FROM pg_class WHERE 
relname~'_2016_05_...*idx';
gb | 60.0909423828125

ts=# SELECT sum(pg_table_size(oid))/1024^3 gb FROM pg_class WHERE 
relname~'_201605.*idx';
gb | 4.85528564453125

ts=# SELECT sum(pg_table_size(oid))/1024^3 gb FROM pg_class WHERE 
relname~'_201605$';
gb | 86.8688049316406

As a test, I did SET effective_cache_size='1MB', before running explain, and
still does:

|->  Index Scan using 
cdrs_huawei_pgwrecord_2016_05_29_recordopeningtime_idx on 
cdrs_huawei_pgwrecord_2016_05_29  (cost=0.44..1526689.49 rows=8342796 width=355)
|  Index Cond: ((recordopeningtime >= '2016-05-29 
00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-30 
00:00:00'::timestamp without time zone))

I Set enable_indexscan=0, and got:

|->  Bitmap Heap Scan on cdrs_huawei_pgwrecord_2016_05_29  
(cost=168006.10..4087526.04 rows=8342796 width=355)
|  Recheck Cond: ((recordopeningtime >= '2016-05-29 
00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-30 
00:00:00'::timestamp without time zone))
|  ->  Bitmap Index Scan on 
cdrs_huawei_pgwrecord_2016_05_29_recordopeningtime_idx  (cost=0.00..165920.40 
rows=8342796 width=0)
|Index Cond: ((recordopeningtime >= '2016-05-29 
00:00:00'::timestamp without time zone) AND (recordopeningtime < '2016-05-30 
00:00:00'::timestamp without time zone))

Here's a minimal query which seems to isolate the symptom:

ts=# explain (analyze,buffers) SELECT sum(duration) FROM 
cdrs_huawei_pgwrecord_2016_05_22 WHERE recordopeningtime>='2016-05-22' AND 

[PERFORM] slony rpm help slony1-95-2.2.2-1.rhel6.x86_64

2016-06-03 Thread avi Singh
Hi All
 Can anyone please point me to location from where i can get slony
slony1-95-2.2.2-1.rhel5.x86_64

  rpm. I'm upgrading database from version 9.3 to 9.5. Current version of
rpm we are using is  slony1-93-2.2.2-1.el5.x86_64 and the one that is
available on postgresql website for 9.5 is slony1-95-2.2.4-4.rhel5.x86_64

  which is not compatible and throws an error when i test the upgrade.  In
the past i was able to find the 2.2.2-1 version rpm for previous versions
on postgres website but not this time for postgresql 9.5



Thanks
Avi


Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-06-03 Thread Claudio Freire
On Wed, May 25, 2016 at 11:00 AM, Justin Pryzby  wrote:
>> > First, I found I was able to get 30-50min query results on full week's 
>> > table by
>> > prefering a seq scan to an index scan.  The row estimates seemed fine, and 
>> > the
>> > only condition is the timestamp, so the planner's use of index scan is as
>> > expected.
>>
>> Can you show us the query?  I would expect a bitmap scan of the index
>> (which would do what you want, but even more so), instead.
> See explain, also showing additional tables/views being joined.  It's NOT 
> doing
> a bitmap scan though, and I'd be interested to find why; I'm sure that 
> would've
> improved this query enough so it never would've been an issue.
> https://explain.depesz.com/s/s8KP
>
>  ->  Index Scan using cdrs_huawei_pgwrecord_2016_05_01_recordopeningtime_idx 
> on cdrs_huawei_pgwrecord_2016_05_01  (cost=0.56..1601734.57 rows=8943848 
> width=349)
>Index Cond: ((recordopeningtime >= '2016-05-07 00:00:00'::timestamp 
> without time zone) AND (recordopeningtime < '2016-05-08 00:00:00'::timestamp 
> without time zone))

Please show your guc settings ( see
https://wiki.postgresql.org/wiki/Server_Configuration )

A plan node like that, if it would result in I/O, with proper
configuration should have selected a bitmap index/heap scan. If it
didn't, it probably thinks it has more cache than it really does, and
that would mean the wrong setting was set in effective_cache_size.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance