Re: [PERFORM] Testing in AWS, EBS

2016-05-25 Thread Rayson Ho
Actually, when "EBS-Optimized" is on, the instance gets dedicated
bandwidth to EBS.
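
You can check (and, where the instance type supports it, toggle) that flag with
the AWS CLI -- a rough sketch only; the instance ID below is a placeholder, and
some instance types need a stop/start for the change to take effect:

$ aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
      --query 'Reservations[].Instances[].EbsOptimized'
$ aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --ebs-optimized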

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html



On Wed, May 25, 2016 at 7:56 PM, Yves Dorfsman  wrote:

> Indeed, old-style disk EBS vs new-style SSD EBS.
>
> Be aware that EBS traffic is considered part of the total "network"
> traffic, and each type of instance has different limits on maximum network
> throughput. Those differences are very significant; do tests on the same
> volume between two different types of instances, both with enough CPU and
> memory for the I/O to be the bottleneck, and you will be surprised!
>
>
> On 2016-05-25 17:02, Rayson Ho wrote:
> > There are many factors that can affect EBS performance. For example, the
> > type of EBS volume, the instance type, whether EBS-optimized is turned on
> > or not, etc.
> >
> > Without the details, there is no apples-to-apples comparison...
> >
> > Rayson
> >
> > ==
> > Open Grid Scheduler - The Official Open Source Grid Engine
> > http://gridscheduler.sourceforge.net/
> > http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
> >
> >
> >
> > On Wed, May 25, 2016 at 6:34 PM, Tory M Blue wrote:
> >>
> >> We are starting some testing in AWS, with EC2, EBS backed setups.
> >>
> >> What I found interesting today was that a single 1TB EBS volume gave me
> >> something like 108MB/s throughput; however, a RAID10 (4x 250GB EBS
> >> volumes) gave me something like 31MB/s (test after test after test).
> >>
> >> I'm wondering what you folks are using inside of Amazon (not
> >> interested in RDS at the moment).
> >>
> >> Thanks
> >> Tory
> >>
> >>
> >> --
> >> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-performance
>
>
> --
> http://yves.zioup.com
> gpg: 4096R/32B0F416
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Testing in AWS, EBS

2016-05-25 Thread Yves Dorfsman
Indeed, old-style disk EBS vs new-style SSD EBS.

Be aware that EBS traffic is considered part of the total "network"
traffic, and each type of instance has different limits on maximum network
throughput. Those differences are very significant; do tests on the same
volume between two different types of instances, both with enough CPU and
memory for the I/O to be the bottleneck, and you will be surprised!
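
For a rough apples-to-apples throughput test of the same volume from two
instance types, something like fio works -- a sketch only, assuming the volume
is mounted at /mnt/ebs and the fio package is installed:

$ fio --name=seqread --filename=/mnt/ebs/testfile --rw=read --bs=1M --size=4G \
      --direct=1 --numjobs=1 --runtime=60 --time_based

Run the identical command from both instance types and compare the reported
bandwidth.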


On 2016-05-25 17:02, Rayson Ho wrote:
> There are many factors that can affect EBS performance. For example, the type
> of EBS volume, the instance type, whether EBS-optimized is turned on or not, 
> etc.
> 
> Without the details, there is no apples-to-apples comparison...
> 
> Rayson
> 
> ==
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
> 
> 
> 
> On Wed, May 25, 2016 at 6:34 PM, Tory M Blue wrote:
>>
>> We are starting some testing in AWS, with EC2, EBS backed setups.
>>
>> What I found interesting today was that a single 1TB EBS volume gave me
>> something like 108MB/s throughput; however, a RAID10 (4x 250GB EBS
>> volumes) gave me something like 31MB/s (test after test after test).
>>
>> I'm wondering what you folks are using inside of Amazon (not
>> interested in RDS at the moment).
>>
>> Thanks
>> Tory
>>
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance


-- 
http://yves.zioup.com
gpg: 4096R/32B0F416



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Testing in AWS, EBS

2016-05-25 Thread Rayson Ho
There are many factors that can affect EBS performance. For example, the
type of EBS volume, the instance type, whether EBS-optimized is turned on
or not, etc.

Without the details, there is no apples-to-apples comparison...
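
For reference, the kind of details worth posting can be pulled straight from
the AWS CLI -- a sketch; the instance and volume IDs below are placeholders:

$ aws ec2 describe-instances --instance-ids i-0123456789abcdef0 \
      --query 'Reservations[].Instances[].[InstanceType,EbsOptimized]'
$ aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 \
      --query 'Volumes[].[VolumeType,Size,Iops]'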

Rayson

==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html



On Wed, May 25, 2016 at 6:34 PM, Tory M Blue  wrote:
>
> We are starting some testing in AWS, with EC2, EBS backed setups.
>
> What I found interesting today was that a single 1TB EBS volume gave me
> something like 108MB/s throughput; however, a RAID10 (4x 250GB EBS
> volumes) gave me something like 31MB/s (test after test after test).
>
> I'm wondering what you folks are using inside of Amazon (not
> interested in RDS at the moment).
>
> Thanks
> Tory
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] 9.4 -> 9.5 regression with queries through pgbouncer on RHEL 6

2016-05-25 Thread Vladimir Borodin
Hi all.

We have found that queries through PgBouncer 1.7.2 (with transaction pooling)
to local PostgreSQL are almost two times slower in 9.5.3 than in 9.4.8 on
RHEL 6 hosts (all packages are updated to the latest versions). Meanwhile the
problem can't be reproduced e.g. on Ubuntu 14.04 (also fully updated).

Here is how the results look for 9.4, 9.5 and 9.6. All are built from the
latest commits as of yesterday in
	* REL9_4_STABLE (a0cc89a28141595d888d8aba43163d58a1578bfb),
	* REL9_5_STABLE (e504d915bbf352ecfc4ed335af934e799bf01053),
	* master (6ee7fb8244560b7a3f224784b8ad2351107fa55d).

All of them are built on the host where testing is done (with stock gcc
versions). Sysctls, the pgbouncer config and everything else we checked are
the same, postgres configs are default, PGDATA is in tmpfs. All numbers are
reproducible and stable between runs.

Shortly:

OS            PostgreSQL version  TPS    Avg. latency
RHEL 6        9.4                 44898  1.425 ms
RHEL 6        9.5                 26199  2.443 ms
RHEL 6        9.6                 43027  1.487 ms
Ubuntu 14.04  9.4                 67458  0.949 ms
Ubuntu 14.04  9.5                 64065  0.999 ms
Ubuntu 14.04  9.6                 64350  0.995 ms

You can see that the difference between major versions on Ubuntu is not
significant, but on RHEL, 9.5 is 70% slower than 9.4 and 9.6.

Below are more details.

RHEL 6:

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 2693962
latency average: 1.425 ms
tps = 44897.461518 (including connections establishing)
tps = 44898.763258 (excluding connections establishing)

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg95'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 1572014
latency average: 2.443 ms
tps = 26198.928627 (including connections establishing)
tps = 26199.803363 (excluding connections establishing)

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg96'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 2581645
latency average: 1.487 ms
tps = 43025.676995 (including connections establishing)
tps = 43027.038275 (excluding connections establishing)
postgres@pgload05g ~ $

Ubuntu 14.04 (the same hardware):

postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg94'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 4047653
latency average: 0.949 ms
tps = 67458.361515 (including connections establishing)
tps = 67459.983480 (excluding connections establishing)

postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg95'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 3844084
latency average: 0.999 ms
tps = 64065.447458 (including connections establishing)
tps = 64066.943627 (excluding connections establishing)

postgres@pgloadpublic02:~$ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=6432 dbname=pg96'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 3861088
latency average: 0.995 ms
tps = 64348.573126 (including connections establishing)
tps = 64350.195750 (excluding connections establishing)
postgres@pgloadpublic02:~$

In both tests (RHEL and Ubuntu) the bottleneck is the performance of a single
CPU core, which is 100% consumed by PgBouncer. If pgbench connects to postgres
directly I get the following (expected) numbers:

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=5432'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 10010710
latency average: 0.384 ms
tps = 166835.937859 (including connections establishing)
tps = 166849.730224 (excluding connections establishing)

postgres@pgload05g ~ $ /usr/lib/postgresql/9.4/bin/pgbench -U postgres -T 60 -j 64 -c 64 -S -n 'host=localhost port=5433'
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 60 s
number of transactions actually processed: 13373890
latency average: 0.287 ms
tps = 222888.311289 (including connections establishing)
tps = 
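
For reference, a minimal pgbouncer.ini for transaction pooling looks roughly
like this; the ports, database names and auth settings below are illustrative
placeholders, not necessarily what was used in these tests:

[databases]
pg94 = host=127.0.0.1 port=5432 dbname=postgres
pg95 = host=127.0.0.1 port=5433 dbname=postgres
pg96 = host=127.0.0.1 port=5434 dbname=postgres

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = trust
pool_mode = transaction
max_client_conn = 200
default_pool_size = 64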

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-05-25 Thread Justin Pryzby
On Tue, May 24, 2016 at 11:23:48PM -0700, Jeff Janes wrote:
> > But note the non-uniqueness of the index column:
> > ts=# SELECT recordopeningtime, COUNT(1) FROM cdrs_huawei_pgwrecord WHERE 
> > recordopeningtime>='2016-05-21' AND recordopeningtime<'2016-05-22' GROUP BY 
> > 1 ORDER BY 2 DESC;
> >   recordopeningtime  | count
> > ---------------------+-------
> >  2016-05-21 12:17:29 |   176
> >  2016-05-21 12:17:25 |   171
> >  2016-05-21 13:11:33 |   170
> >  2016-05-21 10:20:02 |   169
> >  2016-05-21 11:30:02 |   167
> > [...]
> 
> That is not that much duplication.  You aren't going to have dozens or
> hundreds of leaf pages all with equal values.   (and you only showed
> the most highly duplicated ones, presumably the average is much less)

Point taken, but it's not that great of a range either:

ts=# SELECT recordopeningtime, COUNT(1) FROM cdrs_huawei_pgwrecord WHERE 
recordopeningtime>='2016-05-21' AND recordopeningtime<'2016-05-22' GROUP BY 1 
ORDER BY 2 LIMIT 19;
  recordopeningtime  | count 
---------------------+-------
 2016-05-21 03:10:05 |    44
 2016-05-21 03:55:05 |    44
 2016-05-21 04:55:05 |    45

ts=# SELECT count(distinct recordopeningtime) FROM cdrs_huawei_pgwrecord WHERE 
recordopeningtime>='2016-05-21' AND recordopeningtime<'2016-05-22';
-[ RECORD 1 ]
count | 86400

ts=# SELECT count(recordopeningtime) FROM cdrs_huawei_pgwrecord WHERE 
recordopeningtime>='2016-05-21' AND recordopeningtime<'2016-05-22';
-[ RECORD 1 ]--
count | 8892865
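
i.e. on average only ~103 rows per distinct timestamp (8892865 / 86400); one
way to compute it directly:

ts=# SELECT count(recordopeningtime)::float8 / count(DISTINCT recordopeningtime)
     FROM cdrs_huawei_pgwrecord
     WHERE recordopeningtime>='2016-05-21' AND recordopeningtime<'2016-05-22';
     -- = 8892865 / 86400, i.e. roughly 103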

> > We have a daily analytic query which processes the previous day's data.
> > For new child tables, with only 1 day's data loaded, this runs in ~30min,
> > and for child tables with an entire week's worth of data loaded, it takes
> > several hours (even though both queries process the same amount of data).
> 
> For an append only table, why would the first day of a new partition
> be any less fragmented than that same day would be a week from now?
> Are you sure it isn't just that your week-old data has all been aged
> out of the cache?
I don't think it's a cache effect, since we're not using the source table for
(maybe) anything else the entire rest of the day.  The server has 72GB RAM,
the same size as the largest of the tables being joined (beginning at 4am).

I didn't mean that a given day is more fragmented now than it was last week
(but I don't know, either).  I guess when we do a query on the table with ~32
hours of data in it, it might do a seq scan rather than an index scan, too.

Compare the end of month partition tables:
ts=# select * FROM 
pgstatindex('cdrs_huawei_pgwrecord_2016_02_29_recordopeningtime_idx');
leaf_fragmentation | 48.6
ts=# select * FROM 
pgstatindex('cdrs_huawei_pgwrecord_2016_03_29_recordopeningtime_idx');
leaf_fragmentation | 48.38
ts=# select * FROM 
pgstatindex('cdrs_huawei_pgwrecord_2016_04_29_recordopeningtime_idx');
leaf_fragmentation | 48.6
ts=# SELECT * FROM 
pgstatindex('cdrs_huawei_pgwrecord_2016_04_22_recordopeningtime_idx');
leaf_fragmentation | 48.66
ts=# SELECT * FROM 
pgstatindex('cdrs_huawei_pgwrecord_2016_03_22_recordopeningtime_idx');
leaf_fragmentation | 48.27
ts=# SELECT * FROM 
pgstatindex('cdrs_huawei_pgwrecord_2016_02_22_recordopeningtime_idx');
leaf_fragmentation | 48

This one I reindexed as a test:
ts=# SELECT * FROM 
pgstatindex('cdrs_huawei_pgwrecord_2016_05_01_recordopeningtime_idx');
leaf_fragmentation | 0.01

.. and query ran in ~30min (reran a 2nd time, with cache effects: 25min).
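
(pgstatindex() here is from the contrib pgstattuple extension; a sketch for
pulling leaf_fragmentation for all the partition indexes at once, assuming the
naming pattern above:)

ts=# CREATE EXTENSION IF NOT EXISTS pgstattuple;
ts=# SELECT c.relname, (pgstatindex(c.relname::text)).leaf_fragmentation
     FROM pg_class c
     WHERE c.relkind = 'i'
       AND c.relname LIKE 'cdrs_huawei_pgwrecord%recordopeningtime_idx'
     ORDER BY 1;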

> > First, I found I was able to get 30-50min query results on a full week's
> > table by preferring a seq scan to an index scan.  The row estimates seemed
> > fine, and the only condition is the timestamp, so the planner's use of an
> > index scan is as expected.
> 
> Can you show us the query?  I would expect a bitmap scan of the index
> (which would do what you want, but even more so), instead.
See explain, also showing additional tables/views being joined.  It's NOT doing
a bitmap scan though, and I'd be interested to find why; I'm sure that would've
improved this query enough so it never would've been an issue.
https://explain.depesz.com/s/s8KP

 ->  Index Scan using cdrs_huawei_pgwrecord_2016_05_01_recordopeningtime_idx on 
cdrs_huawei_pgwrecord_2016_05_01  (cost=0.56..1601734.57 rows=8943848 width=349)
   Index Cond: ((recordopeningtime >= '2016-05-07 00:00:00'::timestamp 
without time zone) AND (recordopeningtime < '2016-05-08 00:00:00'::timestamp 
without time zone))
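
(A quick way to see what the bitmap scan would do -- sketch only, using the
standard planner enable_* GUCs, and just the partition scan rather than the
full join:)

ts=# BEGIN;
ts=# SET LOCAL enable_indexscan = off;  -- leave enable_bitmapscan on
ts=# SET LOCAL enable_seqscan = off;
ts=# EXPLAIN (ANALYZE, BUFFERS)
     SELECT count(*) FROM cdrs_huawei_pgwrecord_2016_05_01
     WHERE recordopeningtime >= '2016-05-07' AND recordopeningtime < '2016-05-08';
ts=# ROLLBACK;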

> > AFAICT what's happening is that the index scan was returning pages
> > nonsequentially.  strace-ing the backend showed alternating lseek()s and
> > read()s, with the offsets not consistently increasing (nor consistently
> > decreasing):
..
> 
> Which of those are the table, and which the index?
Those weren't necessarily straces of the same process; I believe both of these
were table data/heap, and didn't include any index access.

> Something doesn't add up here.  How could an index of an append-only
> table possibly become that fragmented, when the highest amount of key
> duplication is about 170?

Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-05-25 Thread Justin Pryzby
On Tue, May 24, 2016 at 09:16:20PM -0700, Peter Geoghegan wrote:
> On Tue, May 24, 2016 at 10:39 AM, Justin Pryzby  wrote:
> > Postgres seems to assume that the high degree of correlation of the table
> > column seen in pg_stats is how it will get data from the index scan, which
> > assumption seems to be very poor on what turns out to be a highly fragmented
> > index.  Is there a way to help it to understand otherwise??
> 
> Your complaint is vague. Are you complaining about the planner making
> a poor choice? I don't think that's the issue here, because you never
> made any firm statement about the planner making a choice that was
> worse than an alternative that it had available.

I was thinking there are a few possible places to make improvements: the planner
could have understood that scans of non-unique indices don't result in strictly
sequential scans of the table, the degree of non-sequentialness being
determined by the column statistics, and perhaps by properties of the index
itself.

Or the INSERT code or btree scan could improve on this, even if tuples aren't
fully ordered.

> If you're arguing for the idea that B-Trees should reliably keep
> tuples in order by a tie-break condition, that seems difficult to
> implement, and likely not worth it in practice.

I had the awful idea to change the index to use (recordopeningtime,ctid).
Maybe somebody will convince me otherwise, but it may actually work better than
trying to reindex this table daily by 4am.
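
(For the record, the daily reindex would be something like this from cron --
a sketch; the index name has to track the current partition, the schedule is a
placeholder, and plain REINDEX takes an exclusive lock while it runs:)

# crontab for the postgres user; run early enough to finish before the 4am job
0 2 * * * psql ts -c "REINDEX INDEX cdrs_huawei_pgwrecord_2016_05_01_recordopeningtime_idx"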

Thanks,
Justin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index fragmentation on insert-only table with non-unique column

2016-05-25 Thread Jeff Janes
On Tue, May 24, 2016 at 10:39 AM, Justin Pryzby  wrote:
> Summary: Non-unique btree indices are returning CTIDs for rows with the same
> value of the indexed column not in logical order, imposing a high performance
> penalty.
>
> Running PG 9.5.3 now, we have time-based partitions of append-only tables
> with data loaded from other sources.  The tables are partitioned by time, and
> the timestamp column has a non-unique, not-null btree index.
>
> The child tables are each ~75GB and expected to keep growing.  For a child
> table with a week's worth of data:
> relpages  | 11255802
> reltuples | 5.90502e+07
>
> The data is loaded shortly after it's available, so it has high correlation in
> pg_statistic:
> [pryzbyj@viaero ~]$ psql ts -c "SELECT tablename, correlation, n_distinct 
> FROM pg_stats s JOIN pg_class c ON (c.relname=s.tablename) WHERE tablename 
> LIKE 'cdrs_huawei_pgwrecord%' AND attname='recordopeningtime' ORDER BY 1" 
> |head
>             tablename             | correlation | n_distinct
> ----------------------------------+-------------+------------
>  cdrs_huawei_pgwrecord            |        0.97 |     102892
>  cdrs_huawei_pgwrecord_2016_02_15 |    0.999658 |      96145
>  cdrs_huawei_pgwrecord_2016_02_22 |        0.43 |      91916
>  cdrs_huawei_pgwrecord_2016_02_29 |    0.997219 |      50341
>  cdrs_huawei_pgwrecord_2016_03_01 |        0.47 |      97485
>
> But note the non-uniqueness of the index column:
> ts=# SELECT recordopeningtime, COUNT(1) FROM cdrs_huawei_pgwrecord WHERE 
> recordopeningtime>='2016-05-21' AND recordopeningtime<'2016-05-22' GROUP BY 1 
> ORDER BY 2 DESC;
>   recordopeningtime  | count
> ---------------------+-------
>  2016-05-21 12:17:29 |   176
>  2016-05-21 12:17:25 |   171
>  2016-05-21 13:11:33 |   170
>  2016-05-21 10:20:02 |   169
>  2016-05-21 11:30:02 |   167
> [...]

That is not that much duplication.  You aren't going to have dozens or
hundreds of leaf pages all with equal values.   (and you only showed
the most highly duplicated ones, presumably the average is much less)


> We have a daily analytic query which processes the previous day's data.  For
> new child tables, with only 1 day's data loaded, this runs in ~30min, and for
> child tables with an entire week's worth of data loaded, it takes several hours
> (even though both queries process the same amount of data).

For an append only table, why would the first day of a new partition
be any less fragmented than that same day would be a week from now?
Are you sure it isn't just that your week-old data has all been aged
out of the cache?
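
(One way to check how much of a relation is actually sitting in shared_buffers
is the contrib pg_buffercache extension -- a sketch only, and it says nothing
about the OS page cache:)

ts=# CREATE EXTENSION IF NOT EXISTS pg_buffercache;
ts=# SELECT c.relname, count(*) AS buffers
     FROM pg_buffercache b
     JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
     WHERE c.relname LIKE 'cdrs_huawei_pgwrecord%'
     GROUP BY 1 ORDER BY 2 DESC;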


> First, I found I was able to get 30-50min query results on a full week's
> table by preferring a seq scan to an index scan.  The row estimates seemed
> fine, and the only condition is the timestamp, so the planner's use of an
> index scan is as expected.

Can you show us the query?  I would expect a bitmap scan of the index
(which would do what you want, but even more so), instead.

>
> AFAICT what's happening is that the index scan was returning pages
> nonsequentially.  strace-ing the backend showed alternating lseek()s and
> read()s, with the offsets not consistently increasing (nor consistently
> decreasing):
> % sudo strace -p 25588 2>&1 |grep -m9 'lseek(773'
> lseek(773, 1059766272, SEEK_SET)= 1059766272
> lseek(773, 824926208, SEEK_SET) = 824926208
> lseek(773, 990027776, SEEK_SET) = 990027776
> lseek(773, 990330880, SEEK_SET) = 990330880
> lseek(773, 1038942208, SEEK_SET)= 1038942208
> lseek(773, 1059856384, SEEK_SET)= 1059856384
> lseek(773, 977305600, SEEK_SET) = 977305600
> lseek(773, 990347264, SEEK_SET) = 990347264
> lseek(773, 871096320, SEEK_SET) = 871096320
>
> .. and consecutive read()s being rare:
> read(802, "g"..., 8192) = 8192
> lseek(802, 918003712, SEEK_SET) = 918003712
> read(802, "c"..., 8192) = 8192
> lseek(802, 859136000, SEEK_SET) = 859136000
> read(802, "a"..., 8192) = 8192
> lseek(802, 919601152, SEEK_SET) = 919601152
> read(802, "d"..., 8192) = 8192
> lseek(802, 905101312, SEEK_SET) = 905101312
> read(802, "c"..., 8192) = 8192
> lseek(801, 507863040, SEEK_SET) = 507863040
> read(801, "p"..., 8192) = 8192
> lseek(802, 914235392, SEEK_SET) = 914235392
> read(802, "c"..., 8192) = 8192


Which of those are the table, and which the index?

Something doesn't add up here.  How could an index of an append-only
table possibly become that fragmented, when the highest amount of key
duplication is about 170?

Cheers,

Jeff


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance