From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Igor Neyman
Sent: Friday, July 14, 2017 3:13 PM
To: Charles Nadeau <charles.nad...@gmail.com>
Cc: Jeff Janes <jeff.ja...@gmail.com>; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very poor read performance, query independent

From: Charles Nadeau [mailto:charles.nad...@gmail.com]
Sent: Friday, July 14, 2017 11:35 AM
To: Igor Neyman <iney...@perceptron.com<mailto:iney...@perceptron.com>>
Cc: Jeff Janes <jeff.ja...@gmail.com<mailto:jeff.ja...@gmail.com>>; 
pgsql-performance@postgresql.org<mailto:pgsql-performance@postgresql.org>
Subject: Re: [PERFORM] Very poor read performance, query independent

Igor,

Initially temp_buffer was left to its default value (8MB). Watching the content 
of the directory that stores the temporary files, I found that I need at most 
21GB of temporary files space. Should I set temp_buffer to 21GB?
Here is the explain you requested with work_mem set to 6GB:
flows=# set work_mem='6GB';
SET
flows=# explain (analyze, buffers) SELECT DISTINCT
   srcaddr,
   dstaddr,
   dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM
   flowscompact,
   mynetworks
WHERE
   mynetworks.ipaddr >>= flowscompact.srcaddr
   AND dstaddr IN
   (
      SELECT
         dstaddr
      FROM
         dstexterne
   )
GROUP BY
   srcaddr,
   dstaddr,
   dstport
ORDER BY
   mbytes DESC LIMIT 50;
                                                                                
           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=48135680.07..48135680.22 rows=50 width=52) (actual 
time=2227678.196..2227678.223 rows=50 loops=1)
   Buffers: shared hit=728798038 read=82974833, temp read=381154 written=381154
   ->  Unique  (cost=48135680.07..48143613.62 rows=2644514 width=52) (actual 
time=2227678.194..2227678.217 rows=50 loops=1)
         Buffers: shared hit=728798038 read=82974833, temp read=381154 
written=381154
         ->  Sort  (cost=48135680.07..48137002.33 rows=2644514 width=52) 
(actual time=2227678.192..2227678.202 rows=50 loops=1)
               Sort Key: (((sum(flows.doctets) / '1024'::numeric) / 
'1024'::numeric)) DESC, flows.srcaddr, flows.dstaddr, flows.dstport, (count(*))
               Sort Method: quicksort  Memory: 654395kB
               Buffers: shared hit=728798038 read=82974833, temp read=381154 
written=381154
               ->  GroupAggregate  (cost=48059426.65..48079260.50 rows=2644514 
width=52) (actual time=2167909.030..2211446.192 rows=5859671 loops=1)
                     Group Key: flows.srcaddr, flows.dstaddr, flows.dstport
                     Buffers: shared hit=728798038 read=82974833, temp 
read=381154 written=381154
                     ->  Sort  (cost=48059426.65..48060748.90 rows=2644514 
width=20) (actual time=2167896.815..2189107.205 rows=91745640 loops=1)
                           Sort Key: flows.srcaddr, flows.dstaddr, flows.dstport
                           Sort Method: external merge  Disk: 3049216kB
                           Buffers: shared hit=728798038 read=82974833, temp 
read=381154 written=381154
                           ->  Gather  (cost=30060688.07..48003007.07 
rows=2644514 width=20) (actual time=1268989.000..1991357.232 rows=91745640 
loops=1)
                                 Workers Planned: 12
                                 Workers Launched: 12
                                 Buffers: shared hit=728798037 read=82974833
                                 ->  Hash Semi Join  
(cost=30059688.07..47951761.31 rows=220376 width=20) (actual 
time=1268845.181..2007864.725 rows=7057357 loops=13)
                                       Hash Cond: (flows.dstaddr = 
flows_1.dstaddr)
                                       Buffers: shared hit=728795193 
read=82974833
                                       ->  Nested Loop  (cost=0.03..17891246.86 
rows=220376 width=20) (actual time=0.207..723790.283 rows=37910370 loops=13)
                                             Buffers: shared hit=590692229 
read=14991777
                                             ->  Parallel Seq Scan on flows  
(cost=0.00..16018049.14 rows=55094048 width=20) (actual time=0.152..566179.117 
rows=45371630 loops=13)
                                                   Buffers: shared hit=860990 
read=14991777
                                             ->  Index Only Scan using 
mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual 
time=0.002..0.002 rows=1 loops=589831190)
                                                   Index Cond: (ipaddr >>= 
(flows.srcaddr)::ip4r)
                                                   Heap Fetches: 0
                                                   Buffers: shared hit=589831203
                                       ->  Hash  (cost=30059641.47..30059641.47 
rows=13305 width=4) (actual time=1268811.101..1268811.101 rows=3803508 loops=13)
                                             Buckets: 4194304 (originally 
16384)  Batches: 1 (originally 1)  Memory Usage: 166486kB
                                             Buffers: shared hit=138102964 
read=67983056
                                             ->  HashAggregate  
(cost=30059561.64..30059601.56 rows=13305 width=4) (actual 
time=1265248.165..1267432.083 rows=3803508 loops=13)
                                                   Group Key: flows_1.dstaddr
                                                   Buffers: shared 
hit=138102964 read=67983056
                                                   ->  Nested Loop Anti Join  
(cost=0.00..29729327.92 rows=660467447 width=4) (actual time=0.389..1201072.707 
rows=125838232 loops=13)
                                                         Join Filter: 
(mynetworks_1.ipaddr >> (flows_1.dstaddr)::ip4r)
                                                         Rows Removed by Join 
Filter: 503353617
                                                         Buffers: shared 
hit=138102964 read=67983056
                                                         ->  Seq Scan on flows 
flows_1  (cost=0.00..17836152.73 rows=661128576 width=4) (actual 
time=0.322..343152.274 rows=589831190 loops=13)
                                                               Buffers: shared 
hit=138102915 read=67983056
                                                         ->  Materialize  
(cost=0.00..1.02 rows=4 width=8) (actual time=0.000..0.000 rows=2 
loops=7667805470)
                                                               Buffers: shared 
hit=13
                                                               ->  Seq Scan on 
mynetworks mynetworks_1  (cost=0.00..1.01 rows=4 width=8) (actual 
time=0.006..0.007 rows=4 loops=13)
                                                                     Buffers: 
shared hit=13
 Planning time: 0.941 ms
 Execution time: 2228345.171 ms
(48 rows)

With a work_mem at 6GB, I noticed that for the first 20 minutes the query was 
running, the i/o wait was much lower, hovering aroun 3% then it jumped 45% 
until almost the end of the query.

flowscompact and dstexterne are actually views. I use views to simplify query 
writing and to "abstract" queries that are use often in other queries. 
flowscompact is a view built on table flows (having about 590 million rows), it 
only keeps the most often used fields.
flows=# \d+ flowscompact;
                        View "public.flowscompact"
  Column   |           Type           | Modifiers | Storage | Description
-----------+--------------------------+-----------+---------+-------------
 flow_id   | bigint                   |           | plain   |
 sysuptime | bigint                   |           | plain   |
 exaddr    | ip4                      |           | plain   |
 dpkts     | integer                  |           | plain   |
 doctets   | bigint                   |           | plain   |
 first     | bigint                   |           | plain   |
 last      | bigint                   |           | plain   |
 srcaddr   | ip4                      |           | plain   |
 dstaddr   | ip4                      |           | plain   |
 srcport   | integer                  |           | plain   |
 dstport   | integer                  |           | plain   |
 prot      | smallint                 |           | plain   |
 tos       | smallint                 |           | plain   |
 tcp_flags | smallint                 |           | plain   |
 timestamp | timestamp with time zone |           | plain   |
View definition:
 SELECT flowstimestamp.flow_id,
    flowstimestamp.sysuptime,
    flowstimestamp.exaddr,
    flowstimestamp.dpkts,
    flowstimestamp.doctets,
    flowstimestamp.first,
    flowstimestamp.last,
    flowstimestamp.srcaddr,
    flowstimestamp.dstaddr,
    flowstimestamp.srcport,
    flowstimestamp.dstport,
    flowstimestamp.prot,
    flowstimestamp.tos,
    flowstimestamp.tcp_flags,
    flowstimestamp."timestamp"
   FROM flowstimestamp;
mynetworks is a table having one column and 4 rows; it contains a list of our 
network networks:
flows=# select * from mynetworks;
     ipaddr
----------------
 192.168.0.0/24<http://192.168.0.0/24>
 10.112.12.0/30<http://10.112.12.0/30>
 10.112.12.4/30<http://10.112.12.4/30>
 10.112.12.8/30<http://10.112.12.8/30>
(4 row)
flows=# \d+ mynetworks;
                    Table "public.mynetworks"
 Column | Type | Modifiers | Storage | Stats target | Description
--------+------+-----------+---------+--------------+-------------
 ipaddr | ip4r |           | plain   |              |
Indexes:
    "mynetworks_ipaddr_idx" gist (ipaddr)
dstexterne is a view listing all the destination IPv4 addresses not inside our 
network; it has one column and 3.8 million rows.
flows=# \d+ dstexterne;
              View "public.dstexterne"
 Column  | Type | Modifiers | Storage | Description
---------+------+-----------+---------+-------------
 dstaddr | ip4  |           | plain   |
View definition:
 SELECT DISTINCT flowscompact.dstaddr
   FROM flowscompact
     LEFT JOIN mynetworks ON mynetworks.ipaddr >> flowscompact.dstaddr::ip4r
  WHERE mynetworks.ipaddr IS NULL;
Thanks!

Charles

Charles,

Also, let’s try to simplify your query and see if it performs better.
You are grouping by srcaddr, dstaddr, dstport, that makes DISTINCT not needed.
And after simplifying WHERE clause (let me know if the result is not what you 
want), the query looks like:

SELECT srcaddr, dstaddr, dstport,
   COUNT(*) AS conversation,
   SUM(doctets) / 1024 / 1024 AS mbytes
FROM flowscompact
WHERE srcaddr IN (SELECT ipaddr FROM mynetworks)
     AND dstaddr NOT IN (SELECT ipaddr FROM mynetworks)
GROUP BY srcaddr, dstaddr, dstport
ORDER BY mbytes DESC
LIMIT 50;

Now, you didn’t provide the definition of flowstimestamp table.
If this table doesn’t have an index on (srcaddr, dstaddr, dstport) creating one 
should help (I think).

Igor



Reply via email to