Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Mark Kirkwood
Thinking about this a bit more - if somewhat more blazing performance is 
needed, then this could be achieved by losing the RAID card and 
spinning disks altogether and buying one of the NVMe or SATA solid 
state products, e.g.:


- Samsung 960 Pro or Evo 2 TB (approx. 1-2 GB/s seq scan speeds and 
200K IOPS)


- Intel S3610 or similar 1.2 TB (500 MB/s seq scan and 30K IOPS)


The Samsung needs an M.2 port on the mobo (but most should have 'em - 
and if not, PCIe x4 adapter cards are quite cheap). The Intel is a bit 
more expensive than the Samsung, and is slower, but has a longer 
lifetime. However, for your workload the Sammy is probably fine.
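
Whichever you pick, it is worth sanity checking the raw sequential read 
speed of the device itself before looking at Postgres numbers. A rough 
sketch (the device name /dev/nvme0n1 is just an example - substitute 
your own):

# quick timing of sequential reads straight off the device
sudo hdparm -t /dev/nvme0n1

# or a longer, steadier read-only run with fio
sudo fio --name=seqread --filename=/dev/nvme0n1 --readonly \
     --rw=read --bs=1M --direct=1 --runtime=30 --time_based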


regards

Mark

On 15/07/17 11:09, Mark Kirkwood wrote:
Ah yes - that seems more sensible (but still slower than I would 
expect for 5 disks RAID 0).




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Mark Kirkwood
Ah yes - that seems more sensible (but still slower than I would expect 
for 5 disks RAID 0). You should be able to get something like 5 * 
(single disk speed), i.e. about 500MB/s.


Might be worth increasing device read ahead (more than you have 
already). Some of these so-called 'smart' RAID cards need to be hit over 
the head before they will perform. E.g.: I believe you have it set to 128 
- I'd try 4096 or even 16384 (in the past I've used those settings on 
some extremely stupid cards that refused to max out their disks' known 
speeds).
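
For example, something along these lines (assuming the array shows up 
as /dev/sda - adjust to suit; the value is in 512-byte sectors):

blockdev --getra /dev/sda          # show current read ahead
blockdev --setra 16384 /dev/sda    # 16384 sectors = 8MB read ahead
# note this does not survive a reboot - redo it via a udev rule or
# startup script if it helps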


Also worth investigating is the RAID stripe size - for DW work it makes 
sense for it to be reasonably big (256K to 1M), which again will help 
sequential scan speed.
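
(For what it is worth, with Linux md software RAID rather than a 
hardware card, the chunk/stripe size is set at array creation time - 
a sketch only, device names made up:

mdadm --create /dev/md0 --level=0 --raid-devices=5 --chunk=1024 /dev/sd[b-f]
# --chunk is in KiB, so 1024 = 1MB; verify with: mdadm --detail /dev/md0

On a hardware controller the equivalent knob lives in its own CLI or 
BIOS utility.)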


Cheers

Mark


On 15/07/17 02:09, Charles Nadeau wrote:

Mark,

First I must say that I changed my disk configuration from 4 disks in 
RAID 10 to 5 disks in RAID 0 because I almost ran out of disk space 
during the last ingest of data.

Here is the result of the test you asked for. It was done with a cold cache:

flows=# \timing
Timing is on.
flows=# explain select count(*) from flows;
  QUERY PLAN

---
 Finalize Aggregate  (cost=17214914.09..17214914.09 rows=1 width=8)
   ->  Gather  (cost=17214914.07..17214914.09 rows=1 width=8)
 Workers Planned: 1
 ->  Partial Aggregate  (cost=17213914.07..17213914.07
rows=1 width=8)
   ->  Parallel Seq Scan on flows
 (cost=0.00..17019464.49 rows=388899162 width=0)
(5 rows)

Time: 171.835 ms
flows=# select pg_relation_size('flows');
 pg_relation_size
--
 129865867264
(1 row)

Time: 57.157 ms
flows=# select count(*) from flows;
LOG:  duration: 625546.522 ms  statement: select count(*) from flows;
   count
---
 589831190
(1 row)

Time: 625546.662 ms

The throughput reported by PostgreSQL is almost 198MB/s, and the 
throughput as measured by dstat during the query execution was between 
25 and 299MB/s. It is much better than what I had before! The I/O wait 
was about 12% all through the query. One thing I noticed is the 
discrepancy between the read throughput reported by pg_activity and 
the one reported by dstat: pg_activity always reports a value lower 
than dstat.


Besides the change of disk configuration, here is what contributed 
the most to the performance improvement so far:


Using Hugepages
Increasing effective_io_concurrency to 256
Reducing random_page_cost from 22 to 4
Reducing min_parallel_relation_size to 512kB to have more workers
when doing sequential parallel scan of my biggest table


Thanks for recommending this test, I now know what the real throughput 
should be!


Charles

On Wed, Jul 12, 2017 at 4:11 AM, Mark Kirkwood wrote:


Hmm - how are you measuring that sequential scan speed of 4MB/s?
I'd recommend doing a very simple test, e.g. here's one on my
workstation - 13 GB single table on 1 SATA drive - cold cache
after reboot, sequential scan using Postgres 9.6.2:

bench=#  EXPLAIN SELECT count(*) FROM pgbench_accounts;
 QUERY PLAN


 Aggregate  (cost=2889345.00..2889345.01 rows=1 width=8)
   ->  Seq Scan on pgbench_accounts (cost=0.00..2639345.00
rows=1 width=0)
(2 rows)


bench=#  SELECT pg_relation_size('pgbench_accounts');
 pg_relation_size
--
  13429514240
(1 row)

bench=# SELECT count(*) FROM pgbench_accounts;
   count
---
 1
(1 row)

Time: 118884.277 ms


So doing the math, seq read speed is about 110MB/s (i.e. 13 GB in
120 sec). Sure enough, while I was running the query iostat showed:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.00  926.00    0.00   114.89     0.00   254.10     1.90    2.03    2.03    0.00   1.08 100.00



So might be useful for us to see something like that from your
system - note you need to check you really have flushed the cache,
and that no other apps are using the db.
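
On Linux something like the following does it (the service name is 
just a guess - adjust for your distro):

sudo systemctl stop postgresql     # or pg_ctl -D <datadir> stop
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes
sudo systemctl start postgresql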

regards

Mark


On 12/07/17 00:46, Charles Nadeau wrote:

After reducing random_page_cost to 4 and testing more, I can
report that the aggregate read throughput for parallel
sequential scan is about 90MB/s. However the throughput for
sequential scan is still around 4MB/s.





--
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/






Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Igor Neyman
Sent: Friday, July 14, 2017 3:13 PM
To: Charles Nadeau 
Cc: Jeff Janes ; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Friday, July 14, 2017 11:35 AM
To: Igor Neyman
Cc: Jeff Janes ; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

Igor,

Initially temp_buffers was left at its default value (8MB). Watching the content 
of the directory that stores the temporary files, I found that I need at most 
21GB of temporary file space. Should I set temp_buffers to 21GB?
Here is the explain you requested with work_mem set to 6GB:
flows=# set work_mem='6GB';
SET
flows=# explain (analyze, buffers) SELECT DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
  SELECT
 dstaddr
  FROM
 dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;

   QUERY PLAN

 Limit  (cost=48135680.07..48135680.22 rows=50 width=52) (actual 
time=2227678.196..2227678.223 rows=50 loops=1)
   Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
   ->  Unique  (cost=48135680.07..48143613.62 rows=2644514 width=52) (actual 
time=2227678.194..2227678.217 rows=50 loops=1)
 Buffers: shared hit=728798038 read=82974833, temp read=381154 
written=381154
 ->  Sort  (cost=48135680.07..48137002.33 rows=2644514 width=52) 
(actual time=2227678.192..2227678.202 rows=50 loops=1)
   Sort Key: (((sum(flows.doctets) / '1024'::numeric) / 
'1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
   Sort Method: quicksort  Memory: 654395kB
   Buffers: shared hit=728798038 read=82974833, temp read=381154 
written=381154
   ->  GroupAggregate  (cost=48059426.65..48079260.50 rows=2644514 
width=52) (actual time=2167909.030..2211446.192 rows=5859671 loops=1)
 Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
 Buffers: shared hit=728798038 read=82974833, temp 
read=381154 written=381154
 ->  Sort  (cost=48059426.65..48060748.90 rows=2644514 
width=20) (actual time=2167896.815..2189107.205 rows=91745640 loops=1)
   Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
   Sort Method: external merge  Disk: 3049216kB
   Buffers: shared hit=728798038 read=82974833, temp 
read=381154 written=381154
   ->  Gather  (cost=30060688.07..48003007.07 
rows=2644514 width=20) (actual time=1268989.000..1991357.232 rows=91745640 
loops=1)
 Workers Planned: 12
 Workers Launched: 12
 Buffers: shared hit=728798037 read=82974833
 ->  Hash Semi Join  
(cost=30059688.07..47951761.31 rows=220376 width=20) (actual 
time=1268845.181..2007864.725 rows=7057357 loops=13)
   Hash Cond: (flows.dstaddr = 
flows_1.dstaddr)
   Buffers: shared hit=728795193 
read=82974833
   ->  Nested Loop  (cost=0.03..17891246.86 
rows=220376 width=20) (actual time=0.207..723790.283 rows=37910370 loops=13)
 Buffers: shared hit=590692229 
read=14991777
 ->  Parallel Seq Scan on flows  
(cost=0.00..16018049.14 rows=55094048 width=20) (actual time=0.152..566179.117 
rows=45371630 loops=13)
   Buffers: shared hit=860990 
read=14991777
 ->  Index Only Scan using 
mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual 
time=0.002..0.002 rows=1 loops=589831190)
   Index Cond: (ipaddr >>= 
(flows.srcaddr)::ip4r)
   Heap Fetches: 0
   Buffers: shared hit=589831203
   ->  

Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Charles Nadeau
Igor,

Initially temp_buffers was left at its default value (8MB). Watching the
content of the directory that stores the temporary files, I found that I
need at most 21GB of temporary file space. Should I set temp_buffers to
21GB?
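
As an aside, temporary-file usage can also be watched without poking at
the directory by hand - a sketch, assuming the database is named flows
(these counters are cumulative since the last stats reset):

flows=# SELECT temp_files, pg_size_pretty(temp_bytes) AS temp_spilled
flows-# FROM pg_stat_database WHERE datname = 'flows';
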
Here is the explain you requested with work_mem set to 6GB:

flows=# set work_mem='6GB';
SET
flows=# explain (analyze, buffers) SELECT DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
  SELECT
 dstaddr
  FROM
 dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;

   QUERY PLAN


 Limit  (cost=48135680.07..48135680.22 rows=50 width=52) (actual
time=2227678.196..2227678.223 rows=50 loops=1)
   Buffers: shared hit=728798038 read=82974833, temp read=381154
written=381154
   ->  Unique  (cost=48135680.07..48143613.62 rows=2644514 width=52)
(actual time=2227678.194..2227678.217 rows=50 loops=1)
 Buffers: shared hit=728798038 read=82974833, temp read=381154
written=381154
 ->  Sort  (cost=48135680.07..48137002.33 rows=2644514 width=52)
(actual time=2227678.192..2227678.202 rows=50 loops=1)
   Sort Key: (((sum(flows.doctets) / '1024'::numeric) /
'1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport,
(count(*))
   Sort Method: quicksort  Memory: 654395kB
   Buffers: shared hit=728798038 read=82974833, temp
read=381154 written=381154
   ->  GroupAggregate  (cost=48059426.65..48079260.50
rows=2644514 width=52) (actual time=2167909.030..2211446.192 rows=5859671
loops=1)
 Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
 Buffers: shared hit=728798038 read=82974833, temp
read=381154 written=381154
 ->  Sort  (cost=48059426.65..48060748.90 rows=2644514
width=20) (actual time=2167896.815..2189107.205 rows=91745640 loops=1)
   Sort Key: flows.srcaddr, flows.dstaddr,
flows.dstport
   Sort Method: external merge  Disk: 3049216kB
   Buffers: shared hit=728798038 read=82974833,
temp read=381154 written=381154
   ->  Gather  (cost=30060688.07..48003007.07
rows=2644514 width=20) (actual time=1268989.000..1991357.232 rows=91745640
loops=1)
 Workers Planned: 12
 Workers Launched: 12
 Buffers: shared hit=728798037 read=82974833
 ->  Hash Semi Join
 (cost=30059688.07..47951761.31 rows=220376 width=20) (actual
time=1268845.181..2007864.725 rows=7057357 loops=13)
   Hash Cond: (flows.dstaddr =
flows_1.dstaddr)
   Buffers: shared hit=728795193
read=82974833
   ->  Nested Loop
 (cost=0.03..17891246.86 rows=220376 width=20) (actual
time=0.207..723790.283 rows=37910370 loops=13)
 Buffers: shared hit=590692229
read=14991777
 ->  Parallel Seq Scan on flows
 (cost=0.00..16018049.14 rows=55094048 width=20) (actual
time=0.152..566179.117 rows=45371630 loops=13)
   Buffers: shared
hit=860990 read=14991777
 ->  Index Only Scan using
mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8)
(actual time=0.002..0.002 rows=1 loops=589831190)
   Index Cond: (ipaddr >>=
(flows.srcaddr)::ip4r)
   Heap Fetches: 0
   Buffers: shared
hit=589831203
   ->  Hash
 (cost=30059641.47..30059641.47 rows=13305 width=4) (actual
time=1268811.101..1268811.101 rows=3803508 loops=13)
 Buckets: 4194304 (originally
16384)  Batches: 1 (originally 1)  Memory Usage: 166486kB
 Buffers: shared hit=138102964
read=67983056
 ->  HashAggregate
 (cost=30059561.64..30059601.56 rows=13305 width=4) (actual
time=1265248.165..1267432.083 rows=3803508 loops=13)
   Group Key:
flows_1.dstaddr
   Buffers: shared
hit=138102964 read=67983056
   ->  Nested Loop Anti
Join  (cost=0.00..29729327.92 rows=660467447 width=4) (actual

Re: [PERFORM] Very poor read performance, query independent

2017-07-14 Thread Charles Nadeau
Mark,

First I must say that I changed my disk configuration from 4 disks in RAID
10 to 5 disks in RAID 0 because I almost ran out of disk space during the
last ingest of data.
Here is the result of the test you asked for. It was done with a cold cache:

flows=# \timing
Timing is on.
flows=# explain select count(*) from flows;
  QUERY PLAN


---
 Finalize Aggregate  (cost=17214914.09..17214914.09 rows=1 width=8)
   ->  Gather  (cost=17214914.07..17214914.09 rows=1 width=8)
 Workers Planned: 1
 ->  Partial Aggregate  (cost=17213914.07..17213914.07 rows=1
width=8)
   ->  Parallel Seq Scan on flows  (cost=0.00..17019464.49
rows=388899162 width=0)
(5 rows)

Time: 171.835 ms
flows=# select pg_relation_size('flows');
 pg_relation_size
--
 129865867264
(1 row)

Time: 57.157 ms
flows=# select count(*) from flows;
LOG:  duration: 625546.522 ms  statement: select count(*) from flows;
   count
---
 589831190
(1 row)

Time: 625546.662 ms

The throughput reported by PostgreSQL is almost 198MB/s, and the throughput
as measured by dstat during the query execution was between 25 and 299MB/s.
It is much better than what I had before! The I/O wait was about 12% all
through the query. One thing I noticed is the discrepancy between the read
throughput reported by pg_activity and the one reported by dstat:
pg_activity always reports a value lower than dstat.
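
(For reference, that 198MB/s is just the table size divided by the elapsed
time: 129865867264 bytes / 625.5 s = about 207.6 MB/s, i.e. roughly 198 MiB/s.)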

Besides the change of disk configuration, here is what contributed the
most to the performance improvement so far:

Using Hugepages
Increasing effective_io_concurrency to 256
Reducing random_page_cost from 22 to 4
Reducing min_parallel_relation_size to 512kB to have more workers when
doing sequential parallel scan of my biggest table
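
For reference, those changes map onto postgresql.conf entries roughly like
the following (values as listed above - a sketch of the config-file form,
not a copy of the actual file):

huge_pages = on                     # also needs vm.nr_hugepages set at the
                                    # OS level, and a server restart
effective_io_concurrency = 256
random_page_cost = 4
min_parallel_relation_size = 512kB  # 9.6 name; renamed in PostgreSQL 10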


Thanks for recommending this test, I now know what the real throughput
should be!

Charles

On Wed, Jul 12, 2017 at 4:11 AM, Mark Kirkwood <
mark.kirkw...@catalyst.net.nz> wrote:

> Hmm - how are you measuring that sequential scan speed of 4MB/s? I'd
> recommend doing a very simple test, e.g. here's one on my workstation - 13
> GB single table on 1 SATA drive - cold cache after reboot, sequential scan
> using Postgres 9.6.2:
>
> bench=#  EXPLAIN SELECT count(*) FROM pgbench_accounts;
>  QUERY PLAN
> 
> 
>  Aggregate  (cost=2889345.00..2889345.01 rows=1 width=8)
>->  Seq Scan on pgbench_accounts (cost=0.00..2639345.00 rows=1
> width=0)
> (2 rows)
>
>
> bench=#  SELECT pg_relation_size('pgbench_accounts');
>  pg_relation_size
> --
>   13429514240
> (1 row)
>
> bench=# SELECT count(*) FROM pgbench_accounts;
>count
> ---
>  1
> (1 row)
>
> Time: 118884.277 ms
>
>
> So doing the math, seq read speed is about 110MB/s (i.e. 13 GB in 120 sec).
> Sure enough, while I was running the query iostat showed:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda               0.00     0.00  926.00    0.00   114.89     0.00   254.10     1.90    2.03    2.03    0.00   1.08 100.00
>
>
> So might be useful for us to see something like that from your system -
> note you need to check you really have flushed the cache, and that no other
> apps are using the db.
>
> regards
>
> Mark
>
>
> On 12/07/17 00:46, Charles Nadeau wrote:
>
>> After reducing random_page_cost to 4 and testing more, I can report that
>> the aggregate read throughput for parallel sequential scan is about 90MB/s.
>> However the throughput for sequential scan is still around 4MB/s.
>>
>>
>


-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/