Igor,

Initially temp_buffer was left to its default value (8MB). Watching the
content of the directory that stores the temporary files, I found that I
need at most 21GB of temporary files space. Should I set temp_buffer to
21GB?
Here is the explain you requested with work_mem set to 6GB:

flows=# set work_mem='6GB';
SET
flows=# explain (analyze, buffers) SELECT DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
      SELECT
         dstaddr
      FROM
         dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;

               QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=48135680.07..48135680.22 rows=50 width=52) (actual
time=2227678.196..2227678.223 rows=50 loops=1)
   Buffers: shared hit=728798038 read=82974833, temp read=381154
written=381154
   ->  Unique  (cost=48135680.07..48143613.62 rows=2644514 width=52)
(actual time=2227678.194..2227678.217 rows=50 loops=1)
         Buffers: shared hit=728798038 read=82974833, temp read=381154
written=381154
         ->  Sort  (cost=48135680.07..48137002.33 rows=2644514 width=52)
(actual time=2227678.192..2227678.202 rows=50 loops=1)
               Sort Key: (((sum(flows.doctets) / '1024'::numeric) /
'1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport,
(count(*))
               Sort Method: quicksort  Memory: 654395kB
               Buffers: shared hit=728798038 read=82974833, temp
read=381154 written=381154
               ->  GroupAggregate  (cost=48059426.65..48079260.50
rows=2644514 width=52) (actual time=2167909.030..2211446.192 rows=5859671
loops=1)
                     Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
                     Buffers: shared hit=728798038 read=82974833, temp
read=381154 written=381154
                     ->  Sort  (cost=48059426.65..48060748.90 rows=2644514
width=20) (actual time=2167896.815..2189107.205 rows=91745640 loops=1)
                           Sort Key: flows.srcaddr, flows.dstaddr,
flows.dstport
                           Sort Method: external merge  Disk: 3049216kB
                           Buffers: shared hit=728798038 read=82974833,
temp read=381154 written=381154
                           ->  Gather  (cost=30060688.07..48003007.07
rows=2644514 width=20) (actual time=1268989.000..1991357.232 rows=91745640
loops=1)
                                 Workers Planned: 12
                                 Workers Launched: 12
                                 Buffers: shared hit=728798037 read=82974833
                                 ->  Hash Semi Join
 (cost=30059688.07..47951761.31 rows=220376 width=20) (actual
time=1268845.181..2007864.725 rows=7057357 loops=13)
                                       Hash Cond: (flows.dstaddr =
flows_1.dstaddr)
                                       Buffers: shared hit=728795193
read=82974833
                                       ->  Nested Loop
 (cost=0.03..17891246.86 rows=220376 width=20) (actual
time=0.207..723790.283 rows=37910370 loops=13)
                                             Buffers: shared hit=590692229
read=14991777
                                             ->  Parallel Seq Scan on flows
 (cost=0.00..16018049.14 rows=55094048 width=20) (actual
time=0.152..566179.117 rows=45371630 loops=13)
                                                   Buffers: shared
hit=860990 read=14991777
                                             ->  Index Only Scan using
mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8)
(actual time=0.002..0.002 rows=1 loops=589831190)
                                                   Index Cond: (ipaddr >>=
(flows.srcaddr)::ip4r)
                                                   Heap Fetches: 0
                                                   Buffers: shared
hit=589831203
                                       ->  Hash
 (cost=30059641.47..30059641.47 rows=13305 width=4) (actual
time=1268811.101..1268811.101 rows=3803508 loops=13)
                                             Buckets: 4194304 (originally
16384)  Batches: 1 (originally 1)  Memory Usage: 166486kB
                                             Buffers: shared hit=138102964
read=67983056
                                             ->  HashAggregate
 (cost=30059561.64..30059601.56 rows=13305 width=4) (actual
time=1265248.165..1267432.083 rows=3803508 loops=13)
                                                   Group Key:
flows_1.dstaddr
                                                   Buffers: shared
hit=138102964 read=67983056
                                                   ->  Nested Loop Anti
Join  (cost=0.00..29729327.92 rows=660467447 width=4) (actual
time=0.389..1201072.707 rows=125838232 loops=13)
                                                         Join Filter:
(mynetworks_1.ipaddr >> (flows_1.dstaddr)::ip4r)
                                                         Rows Removed by
Join Filter: 503353617
                                                         Buffers: shared
hit=138102964 read=67983056
                                                         ->  Seq Scan on
flows flows_1  (cost=0.00..17836152.73 rows=661128576 width=4) (actual
time=0.322..343152.274 rows=589831190 loops=13)
                                                               Buffers:
shared hit=138102915 read=67983056
                                                         ->  Materialize
 (cost=0.00..1.02 rows=4 width=8) (actual time=0.000..0.000 rows=2
loops=7667805470)
                                                               Buffers:
shared hit=13
                                                               ->  Seq Scan
on mynetworks mynetworks_1  (cost=0.00..1.01 rows=4 width=8) (actual
time=0.006..0.007 rows=4 loops=13)

 Buffers: shared hit=13
 Planning time: 0.941 ms
 Execution time: 2228345.171 ms
(48 rows)


With a work_mem at 6GB, I noticed that for the first 20 minutes the query
was running, the i/o wait was much lower, hovering aroun 3% then it jumped
45% until almost the end of the query.

flowscompact and dstexterne are actually views. I use views to simplify
query writing and to "abstract" queries that are use often in other
queries. flowscompact is a view built on table flows (having about 590
million rows), it only keeps the most often used fields.

flows=# \d+ flowscompact;
                        View "public.flowscompact"
  Column   |           Type           | Modifiers | Storage | Description
-----------+--------------------------+-----------+---------+-------------
 flow_id   | bigint                   |           | plain   |
 sysuptime | bigint                   |           | plain   |
 exaddr    | ip4                      |           | plain   |
 dpkts     | integer                  |           | plain   |
 doctets   | bigint                   |           | plain   |
 first     | bigint                   |           | plain   |
 last      | bigint                   |           | plain   |
 srcaddr   | ip4                      |           | plain   |
 dstaddr   | ip4                      |           | plain   |
 srcport   | integer                  |           | plain   |
 dstport   | integer                  |           | plain   |
 prot      | smallint                 |           | plain   |
 tos       | smallint                 |           | plain   |
 tcp_flags | smallint                 |           | plain   |
 timestamp | timestamp with time zone |           | plain   |
View definition:
 SELECT flowstimestamp.flow_id,
    flowstimestamp.sysuptime,
    flowstimestamp.exaddr,
    flowstimestamp.dpkts,
    flowstimestamp.doctets,
    flowstimestamp.first,
    flowstimestamp.last,
    flowstimestamp.srcaddr,
    flowstimestamp.dstaddr,
    flowstimestamp.srcport,
    flowstimestamp.dstport,
    flowstimestamp.prot,
    flowstimestamp.tos,
    flowstimestamp.tcp_flags,
    flowstimestamp."timestamp"
   FROM flowstimestamp;

mynetworks is a table having one column and 4 rows; it contains a list of
our network networks:

flows=# select * from mynetworks;
     ipaddr
----------------
 192.168.0.0/24
 10.112.12.0/30
 10.112.12.4/30
 10.112.12.8/30
(4 row)
flows=# \d+ mynetworks;
                    Table "public.mynetworks"
 Column | Type | Modifiers | Storage | Stats target | Description
--------+------+-----------+---------+--------------+-------------
 ipaddr | ip4r |           | plain   |              |
Indexes:
    "mynetworks_ipaddr_idx" gist (ipaddr)

dstexterne is a view listing all the destination IPv4 addresses not inside
our network; it has one column and 3.8 million rows.

flows=# \d+ dstexterne;
              View "public.dstexterne"
 Column  | Type | Modifiers | Storage | Description
---------+------+-----------+---------+-------------
 dstaddr | ip4  |           | plain   |
View definition:
 SELECT DISTINCT flowscompact.dstaddr
   FROM flowscompact
     LEFT JOIN mynetworks ON mynetworks.ipaddr >> flowscompact.dstaddr::ip4r
  WHERE mynetworks.ipaddr IS NULL;

Thanks!

Charles

On Wed, Jul 12, 2017 at 6:39 PM, Igor Neyman <iney...@perceptron.com> wrote:

>
>
>
>
> *From:* pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org <pgsql-performance-ow...@postgresql.org>] *On Behalf
> Of *Charles Nadeau
> *Sent:* Wednesday, July 12, 2017 6:05 AM
> *To:* Jeff Janes <jeff.ja...@gmail.com>
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] Very poor read performance, query independent
>
>
>
>
>
> flows=# explain (analyze, buffers) SELECT DISTINCT
>
> flows-#    srcaddr,
>
> flows-#    dstaddr,
>
> flows-#    dstport,
>
> flows-#    COUNT(*) AS conversation,
>
> flows-#    SUM(doctets) / 1024 / 1024 AS mbytes
>
> flows-# FROM
>
> flows-#    flowscompact,
>
> flows-#    mynetworks
>
> flows-# WHERE
>
> flows-#    mynetworks.ipaddr >>= flowscompact.srcaddr
>
> flows-#    AND dstaddr IN
>
> flows-#    (
>
> flows(#       SELECT
>
> flows(#          dstaddr
>
> flows(#       FROM
>
> flows(#          dstexterne
>
> flows(#    )
>
> flows-# GROUP BY
>
> flows-#    srcaddr,
>
> flows-#    dstaddr,
>
> flows-#    dstport
>
> flows-# ORDER BY
>
> flows-#    mbytes DESC LIMIT 50;
>
> LOG:  temporary file: path "pg_tblspc/36238/PG_9.6_
> 201608131/pgsql_tmp/pgsql_tmp14573.6", size 1073741824
>
> LOG:  temporary file: path "pg_tblspc/36238/PG_9.6_
> 201608131/pgsql_tmp/pgsql_tmp14573.7", size 1073741824
>
> LOG:  temporary file: path "pg_tblspc/36238/PG_9.6_
> 201608131/pgsql_tmp/pgsql_tmp14573.8", size 639696896
>
> LOG:  duration: 2765020.327 ms  statement: explain (analyze, buffers)
> SELECT DISTINCT
>
>    srcaddr,
>
>    dstaddr,
>
>    dstport,
>
>    COUNT(*) AS conversation,
>
>    SUM(doctets) / 1024 / 1024 AS mbytes
>
> FROM
>
>    flowscompact,
>
>    mynetworks
>
> WHERE
>
>    mynetworks.ipaddr >>= flowscompact.srcaddr
>
>    AND dstaddr IN
>
>    (
>
>       SELECT
>
>          dstaddr
>
>       FROM
>
>          dstexterne
>
>    )
>
> GROUP BY
>
>    srcaddr,
>
>    dstaddr,
>
>    dstport
>
> ORDER BY
>
>    mbytes DESC LIMIT 50;
>
>
>                                     QUERY PLAN
>
>
>
> ------------------------------------------------------------
> ------------------------------------------------------------
> ------------------------------------------------------------
> --------------------------------------------------
>
>  Limit  (cost=37762321.83..37762321.98 rows=50 width=52) (actual
> time=2764548.863..2764548.891 rows=50 loops=1)
>
>    Buffers: shared hit=1116590560 read=15851133, temp read=340244
> written=340244
>
>    I/O Timings: read=5323746.860
>
>    ->  Unique  (cost=37762321.83..37769053.57 rows=2243913 width=52)
> (actual time=2764548.861..2764548.882 rows=50 loops=1)
>
>          Buffers: shared hit=1116590560 read=15851133, temp read=340244
> written=340244
>
>          I/O Timings: read=5323746.860
>
>          ->  Sort  (cost=37762321.83..37763443.79 rows=2243913 width=52)
> (actual time=2764548.859..2764548.872 rows=50 loops=1)
>
>                Sort Key: (((sum(flows.doctets) / '1024'::numeric) /
> '1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport,
> (count(*))
>
>                Sort Method: quicksort  Memory: 563150kB
>
>                Buffers: shared hit=1116590560 read=15851133, temp
> read=340244 written=340244
>
>                I/O Timings: read=5323746.860
>
>                ->  GroupAggregate  (cost=37698151.34..37714980.68
> rows=2243913 width=52) (actual time=2696721.610..2752109.551 rows=4691734
> loops=1)
>
>                      Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
>
>                      Buffers: shared hit=1116590560 read=15851133, temp
> read=340244 written=340244
>
>                      I/O Timings: read=5323746.860
>
>                      ->  Sort  (cost=37698151.34..37699273.29
> rows=2243913 width=20) (actual time=2696711.428..2732781.705 rows=81896988
> loops=1)
>
>                            Sort Key: flows.srcaddr, flows.dstaddr,
> flows.dstport
>
>                            Sort Method: external merge  Disk: 2721856kB
>
>                            Buffers: shared hit=1116590560 read=15851133,
> temp read=340244 written=340244
>
>                            I/O Timings: read=5323746.860
>
>                            ->  Gather  (cost=19463936.00..37650810.19
> rows=2243913 width=20) (actual time=1777219.713..2590530.887 rows=81896988
> loops=1)
>
>                                  Workers Planned: 9
>
>                                  Workers Launched: 9
>
>                                  Buffers: shared hit=1116590559
> read=15851133
>
>                                  I/O Timings: read=5323746.860
>
>                                  ->  Hash Semi Join
>  (cost=19462936.00..37622883.23 rows=249324 width=20) (actual
> time=1847579.360..2602039.780 rows=8189699 loops=10)
>
>                                        Hash Cond: (flows.dstaddr =
> flows_1.dstaddr)
>
>                                        Buffers: shared hit=1116588309
> read=15851133
>
>                                        I/O Timings: read=5323746.860
>
>                                        ->  Nested Loop
>  (cost=0.03..18159012.30 rows=249324 width=20) (actual
> time=1.562..736556.583 rows=45499045 loops=10)
>
>                                              Buffers: shared hit=996551813
> read=15851133
>
>                                              I/O Timings: read=5323746.860
>
>                                              ->  Parallel Seq Scan on
> flows  (cost=0.00..16039759.79 rows=62330930 width=20) (actual
> time=1.506..547485.066 rows=54155970 loops=10)
>
>                                                    Buffers: shared
> hit=1634 read=15851133
>
>                                                    I/O Timings:
> read=5323746.860
>
>                                              ->  Index Only Scan using
> mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8)
> (actual time=0.002..0.002 rows=1 loops=541559704)
>
>                                                    Index Cond: (ipaddr >>=
> (flows.srcaddr)::ip4r)
>
>                                                    Heap Fetches: 59971474
>
>                                                    Buffers: shared
> hit=996550152
>
>                                        ->  Hash
>  (cost=19462896.74..19462896.74 rows=11210 width=4) (actual
> time=1847228.894..1847228.894 rows=3099798 loops=10)
>
>                                              Buckets: 4194304 (originally
> 16384)  Batches: 1 (originally 1)  Memory Usage: 141746kB
>
>                                              Buffers: shared hit=120036496
>
>                                              ->  HashAggregate
>  (cost=19462829.48..19462863.11 rows=11210 width=4) (actual
> time=1230049.015..1845955.764 rows=3099798 loops=10)
>
>                                                    Group Key:
> flows_1.dstaddr
>
>                                                    Buffers: shared
> hit=120036496
>
>                                                    ->  Nested Loop Anti
> Join  (cost=0.12..19182620.78 rows=560417390 width=4) (actual
> time=0.084..831832.333 rows=113420172 loops=10)
>
>                                                          Join Filter:
> (mynetworks_1.ipaddr >> (flows_1.dstaddr)::ip4r)
>
>                                                          Rows Removed by
> Join Filter: 453681377
>
>                                                          Buffers: shared
> hit=120036496
>
>                                                          ->  Index Only
> Scan using flows_srcaddr_dstaddr_idx on flows flows_1
>  (cost=0.12..9091067.70 rows=560978368 width=4) (actual
> time=0.027..113052.437 rows=541559704 loops=10)
>
>                                                                Heap
> Fetches: 91
>
>                                                                Buffers:
> shared hit=120036459
>
>                                                          ->  Materialize
>  (cost=0.00..1.02 rows=4 width=8) (actual time=0.000..0.000 rows=2
> loops=5415597040)
>
>                                                                Buffers:
> shared hit=10
>
>                                                                ->  Seq
> Scan on mynetworks mynetworks_1  (cost=0.00..1.01 rows=4 width=8) (actual
> time=0.007..0.008 rows=4 loops=10)
>
>
>  Buffers: shared hit=10
>
>  Planning time: 6.689 ms
>
>  Execution time: 2764860.853 ms
>
> (58 rows)
>
>
>
> Regarding "Also using dstat I can see that iowait time is at about 25%", I
> don't think the server was doing anything else. If it is important, I can
> repeat the benchmarks.
>
> Thanks!
>
>
>
> Charles
>
>
>
> Charles,
>
>
>
> In your original posting I couldn’t find what value you set for
> temp_buffers.
>
> Considering you have plenty of RAM, try setting temp_buffers=’6GB’ and
> then run ‘explain (analyze, buffers) select…’ in the same session. This
> should alleviate “disk sort’ problem.
>
>
>
> Also, could you post the structure of flowscompact, mynetworks, and
> dstextern tables with all the indexes and number of rows. Actually, are
> they all – tables, or some of them – views?
>
>
>
> Igor
>
>
>
>
>
> Sorry, I misstated the parameter to change.
>
> It is work_mem (not temp_buffers) you should try to increase to 6GB.
>
>
>
> Igor
>
>
>
>
>



-- 
Charles Nadeau Ph.D.
http://charlesnadeau.blogspot.com/

Reply via email to