Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows
On Mon, Oct 24, 2016 at 2:07 PM, Lars Aksel Opsahl wrote:
> Hi
>
> Yes, this makes both the update and both selects much faster. We are now
> down to 3000 ms for the select, but then I get a problem with another SQL
> where I only use epoch in the query.
>
> SELECT count(o.*) FROM met_vaer_wisline.nora_bc25_observation o
> WHERE o.epoch = 128844;
>  count
> -------
>  97831
> (1 row)
> Time: 92763.389 ms
>
> To get the SQL above to run fast it seems we also need a single-column
> index on the epoch column. That means two indexes on the same column,
> which eats memory when we have more than 4 billion rows.
>
> Is there any way to avoid two indexes on the epoch column?

You could try reversing the order. Whatever comes first in a two-column
index is easier (or only possible) for Postgres to use like a
single-column index. If not, then you're probably stuck with two indexes.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
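The reordering suggested above could be sketched as follows. This is a hypothetical DDL sketch reusing the table and index names visible in the thread; the actual schema is not shown, so column types and the existing index definition are assumptions:

```sql
-- Put epoch first in the composite index so the epoch-only query can use
-- its leading column; a separate single-column index on epoch then
-- becomes unnecessary.
DROP INDEX IF EXISTS met_vaer_wisline.idx_met_vaer_wisline_nora_bc25_observation_test;

CREATE INDEX idx_nora_bc25_observation_epoch_point
    ON met_vaer_wisline.nora_bc25_observation (epoch, point_uid_ref);

-- Both of these predicates can now use the same index:
--   WHERE epoch = 128844                                    (leading column only)
--   WHERE point_uid_ref = n.id_point AND epoch = n.epoch    (both columns)
```

The trade-off is that queries filtering on point_uid_ref alone would no longer get efficient use of this index, which is why two indexes may still be needed if both access patterns matter.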
Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows
Hi

Yes, this makes both the update and both selects much faster. We are now
down to 3000 ms for the select, but then I get a problem with another SQL
where I only use epoch in the query.

SELECT count(o.*) FROM met_vaer_wisline.nora_bc25_observation o
WHERE o.epoch = 128844;
 count
-------
 97831
(1 row)
Time: 92763.389 ms

To get the SQL above to run fast it seems we also need a single-column
index on the epoch column. That means two indexes on the same column,
which eats memory when we have more than 4 billion rows.

Is there any way to avoid two indexes on the epoch column?

Thanks.

Lars

EXPLAIN ANALYZE SELECT count(o.*)
FROM met_vaer_wisline.nora_bc25_observation o
WHERE o.epoch = 128844;

Aggregate  (cost=44016888.13..44016888.14 rows=1 width=42) (actual time=91307.470..91307.471 rows=1 loops=1)
  ->  Index Scan using idx_met_vaer_wisline_nora_bc25_observation_test on nora_bc25_observation o  (cost=0.58..44016649.38 rows=95500 width=42) (actual time=1.942..91287.495 rows=97831 loops=1)
        Index Cond: (epoch = 128844)
Total runtime: 91307.534 ms

EXPLAIN ANALYZE SELECT count(o.*)
FROM met_vaer_wisline.nora_bc25_observation o, met_vaer_wisline.new_data n
WHERE o.point_uid_ref = n.id_point AND o.epoch = n.epoch;

Aggregate  (cost=131857.71..131857.72 rows=1 width=42) (actual time=182.459..182.459 rows=1 loops=1)
  ->  Nested Loop  (cost=0.58..131727.00 rows=52283 width=42) (actual time=0.114..177.420 rows=5 loops=1)
        ->  Seq Scan on new_data n  (cost=0.00..1136.00 rows=5 width=8) (actual time=0.050..7.873 rows=5 loops=1)
        ->  Index Scan using idx_met_vaer_wisline_nora_bc25_observation_test on nora_bc25_observation o  (cost=0.58..2.60 rows=1 width=50) (actual time=0.003..0.003 rows=1 loops=5)
              Index Cond: ((point_uid_ref = n.id_point) AND (epoch = n.epoch))
Total runtime: 182.536 ms
Time: 3095.618 ms

Lars

From: pgsql-performance-ow...@postgresql.org on behalf of Tom Lane
Sent: 24 October 2016 14:52
To: Lars Aksel Opsahl
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows

Lars Aksel Opsahl writes:
> In this example I have two tables, one with 4 billion rows and another
> with 5 rows, and then I try to do a standard simple join between these
> two tables, and this takes 397391 ms with this SQL (the query plan is
> added further down)

This particular query would work a lot better if you had an index on
nora_bc25_observation (point_uid_ref, epoch), ie both join columns in one
index.  I get the impression that that ought to be the primary key of the
table, which would be an even stronger reason to have a unique index on it.
Re: [PERFORM] Fast insert, but slow join and updates for table with 4 billion rows
Lars Aksel Opsahl writes:
> In this example I have two tables, one with 4 billion rows and another
> with 5 rows, and then I try to do a standard simple join between these
> two tables, and this takes 397391 ms with this SQL (the query plan is
> added further down)

This particular query would work a lot better if you had an index on
nora_bc25_observation (point_uid_ref, epoch), ie both join columns in one
index.  I get the impression that that ought to be the primary key of the
table, which would be an even stronger reason to have a unique index on it.

regards, tom lane
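The composite index Tom Lane describes could be sketched as below. The table definition is not shown in the thread, so whether (point_uid_ref, epoch) is actually unique is an assumption:

```sql
-- Both join columns in one index, declared as the primary key so it is
-- also unique (assumes each point/epoch pair occurs at most once):
ALTER TABLE met_vaer_wisline.nora_bc25_observation
    ADD PRIMARY KEY (point_uid_ref, epoch);

-- Or, if rows are not unique on that pair, a plain composite index:
CREATE INDEX idx_nora_bc25_observation_point_epoch
    ON met_vaer_wisline.nora_bc25_observation (point_uid_ref, epoch);
```

Either form lets the planner resolve the join condition `o.point_uid_ref = n.id_point AND o.epoch = n.epoch` with a single index lookup per row of the small table.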
[PERFORM] Fast insert, but slow join and updates for table with 4 billion rows
Hi

I have two main problems, and those are slow updates and joins. When I
build up the table met_vaer_wisline.nora_bc25_observation with more than
4 billion rows, we are able to insert about 85,000 rows per second, so
that's ok. The problems start when I need to update or join with other
tables using this table.

In this example I have two tables, one with 4 billion rows and another
with 5 rows, and then I try to do a standard simple join between these
two tables. This takes 397391 ms with this SQL (the query plan is added
further down):

SELECT o.*
FROM met_vaer_wisline.nora_bc25_observation o,
     met_vaer_wisline.new_data n
WHERE o.point_uid_ref = n.id_point
AND o.epoch = n.epoch

but if I use this SQL it takes 25727 ms (the query plan is added further
down):

SELECT o.* FROM
(
  SELECT o.*
  FROM met_vaer_wisline.nora_bc25_observation o
  WHERE EXISTS (SELECT 1 FROM (SELECT distinct epoch FROM met_vaer_wisline.new_data) AS n
                WHERE n.epoch = o.epoch)
  AND EXISTS (SELECT 1 FROM (SELECT distinct id_point FROM met_vaer_wisline.new_data) AS n
              WHERE n.id_point = o.point_uid_ref)
) AS o,
met_vaer_wisline.new_data n
WHERE o.point_uid_ref = n.id_point
AND o.epoch = n.epoch

The columns are indexed and I ran VACUUM ANALYZE on both tables before I
tested. work_mem is 200MB, but I also tested with much more work_mem and
that does not change the execution time. The CPU goes to 100% while the
query is running and there is no iowait.

Why is the second SQL 15 times faster? Is this normal, or have I done
something wrong here? I have tested clustering on an index, but that did
not help.

Is the only way to fix slow updates and joins to use partitioning?
https://www.postgresql.org/docs/9.6/static/ddl-partitioning.html

Here are the SQL and more info:

EXPLAIN ANALYZE SELECT o.*
FROM met_vaer_wisline.nora_bc25_observation o, met_vaer_wisline.new_data n
WHERE o.point_uid_ref = n.id_point AND o.epoch = n.epoch

Merge Join  (cost=0.87..34374722.51 rows=52579 width=16) (actual time=0.127..397379.844 rows=5 loops=1)
  Merge Cond: (n.id_point = o.point_uid_ref)
  Join Filter: (o.epoch = n.epoch)
  Rows Removed by Join Filter: 217915
  ->  Index Scan using idx_met_vaer_wisline_new_data_id_point on new_data n  (cost=0.29..23802.89 rows=5 width=8) (actual time=0.024..16.736 rows=5 loops=1)
  ->  Index Scan using idx_met_vaer_wisline_nora_bc25_observation_point_uid_ref on nora_bc25_observation o  (cost=0.58..2927642364.25 rows=4263866624 width=16) (actual time=0.016..210486.136 rows=217921 loops=1)
Total runtime: 397383.663 ms
Time: 397391.388 ms

EXPLAIN ANALYZE SELECT o.* FROM
(
  SELECT o.*
  FROM met_vaer_wisline.nora_bc25_observation o
  WHERE EXISTS (SELECT 1 FROM (SELECT distinct epoch FROM met_vaer_wisline.new_data) AS n
                WHERE n.epoch = o.epoch)
  AND EXISTS (SELECT 1 FROM (SELECT distinct id_point FROM met_vaer_wisline.new_data) AS n
              WHERE n.id_point = o.point_uid_ref)
) AS o,
met_vaer_wisline.new_data n
WHERE o.point_uid_ref = n.id_point
AND o.epoch = n.epoch
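For the partitioning approach mentioned above, PostgreSQL 9.6 (the version the linked docs cover) uses inheritance-based partitioning. A minimal sketch, where the child-table name and epoch range are made up for illustration:

```sql
-- One child table per epoch range; the CHECK constraint lets the planner
-- skip children that cannot match a query's epoch filter.
CREATE TABLE met_vaer_wisline.nora_bc25_observation_p1 (
    CHECK (epoch >= 100000 AND epoch < 200000)
) INHERITS (met_vaer_wisline.nora_bc25_observation);

-- Each child gets its own (much smaller) composite index:
CREATE INDEX ON met_vaer_wisline.nora_bc25_observation_p1 (point_uid_ref, epoch);

-- Enable pruning of non-matching children at plan time:
SET constraint_exclusion = partition;
```

Rows must be routed to the right child on insert (typically via a trigger on the parent, as the linked docs describe), so this adds maintenance cost in exchange for smaller per-partition indexes.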