Jeff,

Here are the two EXPLAINs for one of my simplest queries:
flows=# SET track_io_timing = on;
LOG:  duration: 24.101 ms  statement: SET track_io_timing = on;
SET
flows=# explain (analyze, timing off) SELECT DISTINCT
flows-#     srcaddr,
flows-#     dstaddr,
flows-#     dstport,
flows-#     COUNT(*) AS conversation,
flows-#     SUM(doctets) / 1024 / 1024 AS mbytes
flows-# FROM
flows-#     flowscompact,
flows-#     mynetworks
flows-# WHERE
flows-#     mynetworks.ipaddr >>= flowscompact.srcaddr
flows-#     AND dstaddr IN
flows-#     (
flows(#         SELECT
flows(#             dstaddr
flows(#         FROM
flows(#             dstexterne
flows(#     )
flows-# GROUP BY
flows-#     srcaddr,
flows-#     dstaddr,
flows-#     dstport
flows-# ORDER BY
flows-#     mbytes DESC LIMIT 50;
LOG:  temporary file: path "pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.3", size 1073741824
LOG:  temporary file: path "pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.4", size 1073741824
LOG:  temporary file: path "pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.5", size 639696896
LOG:  duration: 2632108.352 ms  statement: explain (analyze, timing off) SELECT DISTINCT srcaddr, dstaddr, dstport, COUNT(*) AS conversation, SUM(doctets) / 1024 / 1024 AS mbytes FROM flowscompact, mynetworks WHERE mynetworks.ipaddr >>= flowscompact.srcaddr AND dstaddr IN ( SELECT dstaddr FROM dstexterne ) GROUP BY srcaddr, dstaddr, dstport ORDER BY mbytes DESC LIMIT 50;
                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=37762321.83..37762321.98 rows=50 width=52) (actual rows=50 loops=1)
   ->  Unique  (cost=37762321.83..37769053.57 rows=2243913 width=52) (actual rows=50 loops=1)
         ->  Sort  (cost=37762321.83..37763443.79 rows=2243913 width=52) (actual rows=50 loops=1)
               Sort Key: (((sum(flows.doctets) / '1024'::numeric) / '1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
               Sort Method: quicksort  Memory: 563150kB
               ->  GroupAggregate  (cost=37698151.34..37714980.68 rows=2243913 width=52) (actual rows=4691734 loops=1)
                     Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
                     ->  Sort  (cost=37698151.34..37699273.29 rows=2243913 width=20) (actual rows=81896988 loops=1)
                           Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
                           Sort Method: external merge  Disk: 2721856kB
                           ->  Gather  (cost=19463936.00..37650810.19 rows=2243913 width=20) (actual rows=81896988 loops=1)
                                 Workers Planned: 9
                                 Workers Launched: 9
                                 ->  Hash Semi Join  (cost=19462936.00..37622883.23 rows=249324 width=20) (actual rows=8189699 loops=10)
                                       Hash Cond: (flows.dstaddr = flows_1.dstaddr)
                                       ->  Nested Loop  (cost=0.03..18159012.30 rows=249324 width=20) (actual rows=45499045 loops=10)
                                             ->  Parallel Seq Scan on flows  (cost=0.00..16039759.79 rows=62330930 width=20) (actual rows=54155970 loops=10)
                                             ->  Index Only Scan using mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual rows=1 loops=541559704)
                                                   Index Cond: (ipaddr >>= (flows.srcaddr)::ip4r)
                                                   Heap Fetches: 48679396
                                       ->  Hash  (cost=19462896.74..19462896.74 rows=11210 width=4) (actual rows=3099798 loops=10)
                                             Buckets: 4194304 (originally 16384)  Batches: 1 (originally 1)  Memory Usage: 141746kB
                                             ->  HashAggregate  (cost=19462829.48..19462863.11 rows=11210 width=4) (actual rows=3099798 loops=10)
                                                   Group Key: flows_1.dstaddr
                                                   ->  Nested Loop Anti Join  (cost=0.12..19182620.78 rows=560417390 width=4) (actual rows=113420172 loops=10)
                                                         Join Filter: (mynetworks_1.ipaddr >> (flows_1.dstaddr)::ip4r)
                                                         Rows Removed by Join Filter: 453681377
                                                         ->  Index Only Scan using flows_srcaddr_dstaddr_idx on flows flows_1  (cost=0.12..9091067.70 rows=560978368 width=4) (actual rows=541559704 loops=10)
                                                               Heap Fetches: 91
                                                         ->  Materialize  (cost=0.00..1.02 rows=4 width=8) (actual rows=2 loops=5415597040)
                                                               ->  Seq Scan on mynetworks mynetworks_1  (cost=0.00..1.01 rows=4 width=8) (actual rows=4 loops=10)
 Planning time: 62.066 ms
 Execution time: 2631923.716 ms
(33 rows)

flows=# explain (analyze, buffers) SELECT DISTINCT
flows-#     srcaddr,
flows-#     dstaddr,
flows-#     dstport,
flows-#     COUNT(*) AS conversation,
flows-#     SUM(doctets) / 1024 / 1024 AS mbytes
flows-# FROM
flows-#     flowscompact,
flows-#     mynetworks
flows-# WHERE
flows-#     mynetworks.ipaddr >>= flowscompact.srcaddr
flows-#     AND dstaddr IN
flows-#     (
flows(#         SELECT
flows(#             dstaddr
flows(#         FROM
flows(#             dstexterne
flows(#     )
flows-# GROUP BY
flows-#     srcaddr,
flows-#     dstaddr,
flows-#     dstport
flows-# ORDER BY
flows-#     mbytes DESC LIMIT 50;
LOG:  temporary file: path "pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.6", size 1073741824
LOG:  temporary file: path "pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.7", size 1073741824
LOG:  temporary file: path "pg_tblspc/36238/PG_9.6_201608131/pgsql_tmp/pgsql_tmp14573.8", size 639696896
LOG:  duration: 2765020.327 ms  statement: explain (analyze, buffers) SELECT DISTINCT srcaddr, dstaddr, dstport, COUNT(*) AS conversation, SUM(doctets) / 1024 / 1024 AS mbytes FROM flowscompact, mynetworks WHERE mynetworks.ipaddr >>= flowscompact.srcaddr AND dstaddr IN ( SELECT dstaddr FROM dstexterne ) GROUP BY srcaddr, dstaddr, dstport ORDER BY mbytes DESC LIMIT 50;
                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=37762321.83..37762321.98 rows=50 width=52) (actual time=2764548.863..2764548.891 rows=50 loops=1)
   Buffers: shared hit=1116590560 read=15851133, temp read=340244 written=340244
   I/O Timings: read=5323746.860
   ->  Unique  (cost=37762321.83..37769053.57 rows=2243913 width=52) (actual time=2764548.861..2764548.882 rows=50 loops=1)
         Buffers: shared hit=1116590560 read=15851133, temp read=340244 written=340244
         I/O Timings: read=5323746.860
         ->  Sort  (cost=37762321.83..37763443.79 rows=2243913 width=52) (actual time=2764548.859..2764548.872 rows=50 loops=1)
               Sort Key: (((sum(flows.doctets) / '1024'::numeric) / '1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
               Sort Method: quicksort  Memory: 563150kB
               Buffers: shared hit=1116590560 read=15851133, temp read=340244 written=340244
               I/O Timings: read=5323746.860
               ->  GroupAggregate  (cost=37698151.34..37714980.68 rows=2243913 width=52) (actual time=2696721.610..2752109.551 rows=4691734 loops=1)
                     Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
                     Buffers: shared hit=1116590560 read=15851133, temp read=340244 written=340244
                     I/O Timings: read=5323746.860
                     ->  Sort  (cost=37698151.34..37699273.29 rows=2243913 width=20) (actual time=2696711.428..2732781.705 rows=81896988 loops=1)
                           Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
                           Sort Method: external merge  Disk: 2721856kB
                           Buffers: shared hit=1116590560 read=15851133, temp read=340244 written=340244
                           I/O Timings: read=5323746.860
                           ->  Gather  (cost=19463936.00..37650810.19 rows=2243913 width=20) (actual time=1777219.713..2590530.887 rows=81896988 loops=1)
                                 Workers Planned: 9
                                 Workers Launched: 9
                                 Buffers: shared hit=1116590559 read=15851133
                                 I/O Timings: read=5323746.860
                                 ->  Hash Semi Join  (cost=19462936.00..37622883.23 rows=249324 width=20) (actual time=1847579.360..2602039.780 rows=8189699 loops=10)
                                       Hash Cond: (flows.dstaddr = flows_1.dstaddr)
                                       Buffers: shared hit=1116588309 read=15851133
                                       I/O Timings: read=5323746.860
                                       ->  Nested Loop  (cost=0.03..18159012.30 rows=249324 width=20) (actual time=1.562..736556.583 rows=45499045 loops=10)
                                             Buffers: shared hit=996551813 read=15851133
                                             I/O Timings: read=5323746.860
                                             ->  Parallel Seq Scan on flows  (cost=0.00..16039759.79 rows=62330930 width=20) (actual time=1.506..547485.066 rows=54155970 loops=10)
                                                   Buffers: shared hit=1634 read=15851133
                                                   I/O Timings: read=5323746.860
                                             ->  Index Only Scan using mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=541559704)
                                                   Index Cond: (ipaddr >>= (flows.srcaddr)::ip4r)
                                                   Heap Fetches: 59971474
                                                   Buffers: shared hit=996550152
                                       ->  Hash  (cost=19462896.74..19462896.74 rows=11210 width=4) (actual time=1847228.894..1847228.894 rows=3099798 loops=10)
                                             Buckets: 4194304 (originally 16384)  Batches: 1 (originally 1)  Memory Usage: 141746kB
                                             Buffers: shared hit=120036496
                                             ->  HashAggregate  (cost=19462829.48..19462863.11 rows=11210 width=4) (actual time=1230049.015..1845955.764 rows=3099798 loops=10)
                                                   Group Key: flows_1.dstaddr
                                                   Buffers: shared hit=120036496
                                                   ->  Nested Loop Anti Join  (cost=0.12..19182620.78 rows=560417390 width=4) (actual time=0.084..831832.333 rows=113420172 loops=10)
                                                         Join Filter: (mynetworks_1.ipaddr >> (flows_1.dstaddr)::ip4r)
                                                         Rows Removed by Join Filter: 453681377
                                                         Buffers: shared hit=120036496
                                                         ->  Index Only Scan using flows_srcaddr_dstaddr_idx on flows flows_1  (cost=0.12..9091067.70 rows=560978368 width=4) (actual time=0.027..113052.437 rows=541559704 loops=10)
                                                               Heap Fetches: 91
                                                               Buffers: shared hit=120036459
                                                         ->  Materialize  (cost=0.00..1.02 rows=4 width=8) (actual time=0.000..0.000 rows=2 loops=5415597040)
                                                               Buffers: shared hit=10
                                                               ->  Seq Scan on mynetworks mynetworks_1  (cost=0.00..1.01 rows=4 width=8) (actual time=0.007..0.008 rows=4 loops=10)
                                                                     Buffers: shared hit=10
 Planning time: 6.689 ms
 Execution time: 2764860.853 ms
(58 rows)

Regarding "Also using dstat I can see that iowait time is at about 25%": I don't think the server was doing anything else at the time. If it is important, I can repeat the benchmarks.
Thanks!
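For what it's worth, here is a quick sanity check on the numbers above (a sketch in Python; the figures are copied from the plan, and the default 8 kB PostgreSQL block size is assumed):

```python
# Back-of-the-envelope checks on the EXPLAIN (analyze, buffers) output above.

# The three temporary files logged for the sort add up exactly to the
# "external merge  Disk: 2721856kB" reported by the Sort node:
temp_files_bytes = 1073741824 + 1073741824 + 639696896
print(temp_files_bytes // 1024)            # 2721856 (kB)

# Average read throughput over the whole query, from the top-level node:
blocks_read = 15851133                     # "shared ... read=15851133"
exec_time_s = 2764.86                      # "Execution time: 2764860.853 ms"
mib_read = blocks_read * 8 / 1024          # total MiB read from disk
print(round(mib_read / exec_time_s, 1))    # 44.8 MiB/s
```

So the query reads roughly 121 GiB from disk at about 45 MiB/s sustained, on top of the buffer-cache hits.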
Charles

On Wed, Jul 12, 2017 at 2:39 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Tue, Jul 11, 2017 at 4:02 AM, Charles Nadeau <charles.nad...@gmail.com>
> wrote:
>
>> Jeff,
>>
>> I used fio in a quick benchmarking script inspired by
>> https://smcleod.net/benchmarking-io/:
>>
>> #!/bin/bash
>> #Random throughput
>> echo "Random throughput"
>> sync
>> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
>> --name=test --filename=test --bs=4M --iodepth=256 --size=10G
>> --readwrite=randread --ramp_time=4
>> #Random IOPS
>> echo "Random IOPS"
>> sync
>> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
>> --name=test --filename=test --bs=4k --iodepth=256 --size=4G
>> --readwrite=randread --ramp_time=4
>> #Sequential throughput
>> echo "Sequential throughput"
>> sync
>> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
>> --name=test --filename=test --bs=4M --iodepth=256 --size=10G
>> --readwrite=read --ramp_time=4
>> #Sequential IOPS
>> echo "Sequential IOPS"
>> sync
>> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1
>> --name=test --filename=test --bs=4k --iodepth=256 --size=4G
>> --readwrite=read --ramp_time=4
>>
>
> I don't think any of those are directly relevant to PostgreSQL, as it
> doesn't use direct IO, doesn't use libaio, and is rarely going to get
> anywhere near 256 iodepth.  So the best they can do is put a theoretical
> ceiling on the performance.  Also, random IO with a 4MB stride doesn't
> make any sense from a PostgreSQL perspective.
>
>> Performing the test you suggested, I get 128.5MB/s. Monitoring the test,
>> I find that the throughput is constant from start to finish and that the
>> iowait is also constant at 5%:
>
> I would have expected it to do better than that.  Maybe you should
> increase the kernel readahead setting.  I've found the default to be much
> too small.  But it doesn't make much difference to you, as you appear to
> be doing random IO in your queries, not sequential.
>
>> Could you suggest another way to benchmark random reads?
>>
>
> Your 1100 IOPS times an 8kB block size gives about 8MB/s of throughput,
> which is close to what you report.  So I think I'd instead focus on tuning
> your actual queries.  You say the problem is not query-dependent, but I
> think that just means all the queries you looked at are similar.  If you
> looked at a query that can't use indexes, like count(unindexed_column)
> from biggest_table, you would find it doing much more IO than 4MB/s.
>
> Can you pick the simplest query you actually care about, and post both an
> "explain (analyze, timing off)" and an "explain (analyze, buffers)" for
> it?  (Preferably turning "track_io_timing" on first.)
>
> One other question I had: you said you had "2x Intel Xeon E5550", which
> should be 8 CPUs (or 16, if the hyperthreads are reported as separate
> CPUs).  But you also said: "Also using dstat I can see that iowait time is
> at about 25%".  Usually, if there is only one thing going on on the
> server, IOWAIT won't be more than the reciprocal of the number of CPUs.
> Is the server busy doing other stuff at the same time you are
> benchmarking it?
>
> Cheers,
>
> Jeff

--
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/