Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows

2016-10-24 Thread Scott Marlowe
On Mon, Oct 24, 2016 at 2:07 PM, Lars Aksel Opsahl  wrote:
> Hi
>
> Yes, this makes the update and both selects much faster. We are now down
> to 3000 ms for the select, but then I get a problem with another SQL statement
> where I only use epoch in the query.
>
> SELECT count(o.*) FROM  met_vaer_wisline.nora_bc25_observation o WHERE 
> o.epoch = 128844;
>  count
> ---
>  97831
> (1 row)
> Time: 92763.389 ms
>
> To get the SQL above to work fast it seems we also need a single-column index
> on epoch; this means two indexes covering the same column, and that eats
> memory when we have more than 4 billion rows.
>
> Is there any way to avoid having two indexes on the epoch column?

You could try reversing the order. Basically, whatever comes first in a
two-column index is easier / possible for Postgres to use like a
single-column index. If not, then you're probably stuck with two
indexes.
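
For example (untested, and the index name here is made up), something like:

CREATE INDEX nora_bc25_observation_epoch_point_idx
    ON met_vaer_wisline.nora_bc25_observation (epoch, point_uid_ref);

With epoch leading, the epoch-only count can use this index directly, and since
the join compares both columns with equality it should be able to use it as
well, so you might get away with just the one index.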




Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows

2016-10-24 Thread Lars Aksel Opsahl
Hi

Yes, this makes the update and both selects much faster. We are now down to
3000 ms for the select, but then I get a problem with another SQL statement
where I only use epoch in the query.

SELECT count(o.*) FROM  met_vaer_wisline.nora_bc25_observation o WHERE o.epoch 
= 128844;
 count 
---
 97831
(1 row)
Time: 92763.389 ms

To get the SQL above to work fast it seems we also need a single-column index
on epoch; this means two indexes covering the same column, and that eats memory
when we have more than 4 billion rows.

Is there any way to avoid having two indexes on the epoch column?

Thanks.

Lars
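
For reference, a rough sketch of the two indexes being discussed (the
definitions are guessed from the index names and conditions in the plans below,
not taken from the actual schema):

-- Composite index used by the join; an epoch-only predicate has to walk the
-- whole index, which is why the count above takes ~92 seconds.
CREATE INDEX idx_met_vaer_wisline_nora_bc25_observation_test
    ON met_vaer_wisline.nora_bc25_observation (point_uid_ref, epoch);

-- The extra single-column index the epoch-only query would want.
CREATE INDEX idx_met_vaer_wisline_nora_bc25_observation_epoch
    ON met_vaer_wisline.nora_bc25_observation (epoch);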

EXPLAIN  analyze SELECT count(o.*) FROM  met_vaer_wisline.nora_bc25_observation 
o WHERE o.epoch = 128844;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=44016888.13..44016888.14 rows=1 width=42) (actual time=91307.470..91307.471 rows=1 loops=1)
   ->  Index Scan using idx_met_vaer_wisline_nora_bc25_observation_test on nora_bc25_observation o  (cost=0.58..44016649.38 rows=95500 width=42) (actual time=1.942..91287.495 rows=97831 loops=1)
         Index Cond: (epoch = 128844)
 Total runtime: 91307.534 ms


EXPLAIN  analyze
SELECT count(o.*)
FROM 
met_vaer_wisline.nora_bc25_observation o,
met_vaer_wisline.new_data n
WHERE o.point_uid_ref = n.id_point AND o.epoch = n.epoch;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=131857.71..131857.72 rows=1 width=42) (actual time=182.459..182.459 rows=1 loops=1)
   ->  Nested Loop  (cost=0.58..131727.00 rows=52283 width=42) (actual time=0.114..177.420 rows=5 loops=1)
         ->  Seq Scan on new_data n  (cost=0.00..1136.00 rows=5 width=8) (actual time=0.050..7.873 rows=5 loops=1)
         ->  Index Scan using idx_met_vaer_wisline_nora_bc25_observation_test on nora_bc25_observation o  (cost=0.58..2.60 rows=1 width=50) (actual time=0.003..0.003 rows=1 loops=5)
               Index Cond: ((point_uid_ref = n.id_point) AND (epoch = n.epoch))
 Total runtime: 182.536 ms

Time: 3095.618 ms


Lars
 


From: pgsql-performance-ow...@postgresql.org
 on behalf of Tom Lane

Sent: 24 October 2016 14:52
To: Lars Aksel Opsahl
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Fast insert, but slow join and updates for table with 4
billion rows

Lars Aksel Opsahl  writes:
> In this example I have two tables, one with 4 billion rows and another with
> 5 rows, and then I try to do a standard simple join between these two
> tables; this takes 397391 ms with this SQL (the query plan is added further
> down):

This particular query would work a lot better if you had an index on
nora_bc25_observation (point_uid_ref, epoch), ie both join columns
in one index.  I get the impression that that ought to be the primary
key of the table, which would be an even stronger reason to have a
unique index on it.

Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows

2016-10-24 Thread Tom Lane
Lars Aksel Opsahl  writes:
> In this example I have two tables, one with 4 billion rows and another with
> 5 rows, and then I try to do a standard simple join between these two
> tables; this takes 397391 ms with this SQL (the query plan is added further
> down):

This particular query would work a lot better if you had an index on
nora_bc25_observation (point_uid_ref, epoch), ie both join columns
in one index.  I get the impression that that ought to be the primary
key of the table, which would be an even stronger reason to have a
unique index on it.

regards, tom lane
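
A minimal sketch of that suggestion (the constraint and index names are made
up, and CONCURRENTLY is optional but avoids blocking writes on a table this
size):

-- Declare the join columns as the primary key ...
ALTER TABLE met_vaer_wisline.nora_bc25_observation
    ADD PRIMARY KEY (point_uid_ref, epoch);

-- ... or build the unique composite index without blocking writes:
CREATE UNIQUE INDEX CONCURRENTLY nora_bc25_observation_point_epoch_key
    ON met_vaer_wisline.nora_bc25_observation (point_uid_ref, epoch);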




[PERFORM] Fast insert, but slow join and updates for table with 4 billion rows

2016-10-24 Thread Lars Aksel Opsahl

Hi


I have two main problems, and those are slow updates and joins. When I build up
the table met_vaer_wisline.nora_bc25_observation to more than 4 billion rows we
are able to insert about 85,000 rows per second, so that is OK.


The problems start when I need to update this table or join it with other
tables.


In this example I have two tables, one with 4 billion rows and another with
5 rows, and then I try to do a standard simple join between these two tables;
this takes 397391 ms with this SQL (the query plan is added further down):

SELECT o.*
FROM
  met_vaer_wisline.nora_bc25_observation o,
  met_vaer_wisline.new_data n
WHERE o.point_uid_ref = n.id_point AND o.epoch = n.epoch

but if I use this SQL it takes 25727 ms (the query plan is added further
down):

SELECT o.*
FROM
  (
    SELECT o.*
    FROM met_vaer_wisline.nora_bc25_observation o
    WHERE
      EXISTS (SELECT 1 FROM (SELECT DISTINCT epoch FROM met_vaer_wisline.new_data) AS n
              WHERE n.epoch = o.epoch)
      AND
      EXISTS (SELECT 1 FROM (SELECT DISTINCT id_point FROM met_vaer_wisline.new_data) AS n
              WHERE n.id_point = o.point_uid_ref)
  ) AS o,
  met_vaer_wisline.new_data n
WHERE o.point_uid_ref = n.id_point AND o.epoch = n.epoch


The columns are indexed and I ran VACUUM ANALYZE on both tables before I
tested. work_mem is 200MB; I also tested with much more work_mem, but that did
not change the execution time.
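
(For reference, the preparation was roughly the following; the exact statements
are a reconstruction:)

VACUUM ANALYZE met_vaer_wisline.nora_bc25_observation;
VACUUM ANALYZE met_vaer_wisline.new_data;
SET work_mem = '200MB';  -- session-level setting used while testing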

The CPU goes to 100% while the query is running, and there is no I/O wait.


Why is the second SQL 15 times faster?

Is this normal, or have I done something wrong here?


I have tested clustering on an index, but that did not help.
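
(Roughly like this; the index name is taken from the plan below and the exact
statement is a reconstruction:)

CLUSTER met_vaer_wisline.nora_bc25_observation
    USING idx_met_vaer_wisline_nora_bc25_observation_point_uid_ref;
-- CLUSTER rewrites the table under an exclusive lock, so on a
-- 4-billion-row table this is a long, blocking operation.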


Is the only way to fix slow updates and joins to use partitioning?

https://www.postgresql.org/docs/9.6/static/ddl-partitioning.html
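
For example, a minimal sketch of 9.6-style inheritance partitioning on epoch
(the child table name and the epoch range are made-up assumptions; insert
routing via a trigger is left out):

-- Child table holding one epoch range, inheriting the parent's columns.
CREATE TABLE met_vaer_wisline.nora_bc25_observation_e1 (
    CHECK (epoch >= 100000 AND epoch < 150000)
) INHERITS (met_vaer_wisline.nora_bc25_observation);

-- Each child gets its own composite index.
CREATE INDEX ON met_vaer_wisline.nora_bc25_observation_e1 (point_uid_ref, epoch);

-- With constraint_exclusion = partition (the default), queries filtering on
-- epoch only touch the children whose CHECK constraints match.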



Here are the SQL statements and more info:


EXPLAIN ANALYZE
SELECT o.*
FROM
  met_vaer_wisline.nora_bc25_observation o,
  met_vaer_wisline.new_data n
WHERE o.point_uid_ref = n.id_point AND o.epoch = n.epoch



                                  QUERY PLAN
------------------------------------------------------------------------------
 Merge Join  (cost=0.87..34374722.51 rows=52579 width=16) (actual time=0.127..397379.844 rows=5 loops=1)
   Merge Cond: (n.id_point = o.point_uid_ref)
   Join Filter: (o.epoch = n.epoch)
   Rows Removed by Join Filter: 217915
   ->  Index Scan using idx_met_vaer_wisline_new_data_id_point on new_data n  (cost=0.29..23802.89 rows=5 width=8) (actual time=0.024..16.736 rows=5 loops=1)
   ->  Index Scan using idx_met_vaer_wisline_nora_bc25_observation_point_uid_ref on nora_bc25_observation o  (cost=0.58..2927642364.25 rows=4263866624 width=16) (actual time=0.016..210486.136 rows=217921 loops=1)
 Total runtime: 397383.663 ms


Time: 397391.388 ms



EXPLAIN ANALYZE
SELECT o.*
FROM
  (
    SELECT o.*
    FROM met_vaer_wisline.nora_bc25_observation o
    WHERE
      EXISTS (SELECT 1 FROM (SELECT DISTINCT epoch FROM met_vaer_wisline.new_data) AS n
              WHERE n.epoch = o.epoch)
      AND
      EXISTS (SELECT 1 FROM (SELECT DISTINCT id_point FROM met_vaer_wisline.new_data) AS n
              WHERE n.id_point = o.point_uid_ref)
  ) AS o,
  met_vaer_wisline.new_data n
WHERE o.point_uid_ref = n.id_point AND o.epoch = n.epoch


-[ RECORD 1