Re: trying to delete most of the table by range of date col
Hi Jeff, I tried every solution I could find online. I can't disable the foreign keys or the indexes, so I'm trying to get better performance by changing only the query or its parameters.

On Mon, Sep 3, 2018 at 18:25, Jeff Janes <jeff.ja...@gmail.com> wrote:

> Why would it be faster? The same amount of work needs to get done, no
> matter how you slice it. Unless there is a specific reason to think it
> would be faster, I would expect it won't be.
>
> If you aren't willing to drop the constraints, then I think you just need
> to resign yourself to paying the price of checking those constraints. Maybe
> some future version of PostgreSQL will be able to do them in parallel.
>
> Cheers,
>
> Jeff
Re: trying to delete most of the table by range of date col
> 4) delete in chunks:
>
> do $$
> declare
>     rec integer;
> begin
>     select count(*) into rec from my_table
>     where end_date <= to_date('12/12/2018','DD/MM/')
>       and end_date > to_date('11/12/2018','DD/MM/');
>     while rec > 0 loop
>         DELETE FROM my_table WHERE id IN (
>             select id from my_table
>             where end_date <= to_date('12/12/2018','DD/MM/')
>               and end_date > to_date('11/12/2018','DD/MM/')
>             limit 5000);
>         rec := rec - 5000;
>         raise notice '5000 records were deleted, current rows :%', rec;
>     end loop;
> end;
> $$;
>
> Execution time: 6 minutes.
>
> So, it seems that the second solution is the fastest one. Is there a
> reason why the delete in chunks (solution 4) wasn't faster?

Why would it be faster? The same amount of work needs to get done, no matter how you slice it. Unless there is a specific reason to think it would be faster, I would expect it won't be.

If you aren't willing to drop the constraints, then I think you just need to resign yourself to paying the price of checking those constraints. Maybe some future version of PostgreSQL will be able to do them in parallel.

Cheers,

Jeff
Re: trying to delete most of the table by range of date col
Hi,

I understand about having to deal with a bad design. How big is the table ("select pg_size_pretty(pg_table_size(table_name))")? If the table is not that large relative to the IOPS on your disk system, another solution is to add a binary column IS_DELETED to the table and modify the queries that hit the table to exclude rows where IS_DELETED = 'y'. You also need an index on this column. I did this with a user table that was a parent table to 120 data tables, where users could not be dropped from the system.

On Mon, Sep 3, 2018 at 7:19 AM Mariel Cherkassky <mariel.cherkas...@gmail.com> wrote:

> I'm not responsible for this design, but I'm trying to improve it. Using
> partitioning isn't an option because partitions don't support foreign keys.
> Moreover, most queries on all those tables use the id column of the main
> table.
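The soft-delete approach described above could be sketched roughly as follows. The column and index names are illustrative only (not from the thread), a boolean is used instead of a 'y' flag, and a full 'DD/MM/YYYY' format mask is assumed where the thread's text shows a truncated 'DD/MM/':

```sql
-- Hypothetical sketch of the IS_DELETED approach; all names are assumed.
ALTER TABLE my_table ADD COLUMN is_deleted boolean NOT NULL DEFAULT false;

-- A partial index keeps the index small, covering only the flagged rows:
CREATE INDEX my_table_is_deleted_idx ON my_table (is_deleted)
    WHERE is_deleted;

-- "Deleting" becomes an UPDATE, so no foreign-key triggers fire:
UPDATE my_table
   SET is_deleted = true
 WHERE end_date <= to_date('12/12/2018', 'DD/MM/YYYY')
   AND end_date >  to_date('11/12/2018', 'DD/MM/YYYY');

-- Existing queries would then add:  ... AND NOT is_deleted
```

The trade-off, as noted above, is IOPS: the update still rewrites every affected row, so this only pays off when the table is small relative to the disk system's throughput.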
Re: trying to delete most of the table by range of date col
I'm not responsible for this design, but I'm trying to improve it. Using partitioning isn't an option because partitions don't support foreign keys. Moreover, most queries on all those tables use the id column of the main table.

On Mon, Sep 3, 2018 at 14:09, Carrie Berlin <berlincar...@gmail.com> wrote:

> This is a terribly inflexible design; why so many foreign keys? If the
> table requires removing data, rebuild it with partitions. Parent keys
> should be in reference tables, not in the fact table.
Re: trying to delete most of the table by range of date col
I checked; the results:

1) explain (analyze,buffers) delete from my_table where end_date <= to_date('12/12/2018','DD/MM/') and end_date > to_date('11/12/2018','DD/MM/');

                                  QUERY PLAN
------------------------------------------------------------------------------
 Delete on my_table  (cost=0.00..97294.80 rows=1571249 width=6) (actual time=4706.791..4706.791 rows=0 loops=1)
   Buffers: shared hit=3242848
   ->  Seq Scan on my_table  (cost=0.00..97294.80 rows=1571249 width=6) (actual time=0.022..2454.686 rows=1572864 loops=1)
         Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/'::text)))
         Rows Removed by Filter: 40253
         Buffers: shared hit=65020   (* 8k / 1024 = 507MB)
 Planning time: 0.182 ms

2) explain (analyze,buffers) DELETE FROM my_table WHERE id IN (select id from my_table where end_date <= to_date('12/12/2018','DD/MM/') and end_date > to_date('11/12/2018','DD/MM/'));

                                  QUERY PLAN
------------------------------------------------------------------------------
 Delete on my_table  (cost=108908.17..252425.01 rows=1559172 width=12) (actual time=11168.090..11168.090 rows=0 loops=1)
   Buffers: shared hit=3307869 dirtied=13804, temp read=13656 written=13594
   ->  Hash Join  (cost=108908.17..252425.01 rows=1559172 width=12) (actual time=1672.222..6401.288 rows=1572864 loops=1)
         Hash Cond: (my_table_1.id = my_table.id)
         Buffers: shared hit=130040, temp read=13656 written=13594
         ->  Seq Scan on my_table my_table_1  (cost=0.00..97075.26 rows=1559172 width=14) (actual time=0.008..2474.671 rows=1572864 loops=1)
               Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/'::text)))
               Rows Removed by Filter: 40253
               Buffers: shared hit=65020
         ->  Hash  (cost=81047.63..81047.63 rows=1602763 width=14) (actual time=1671.613..1671.613 rows=1613117 loops=1)
               Buckets: 131072  Batches: 32  Memory Usage: 3392kB
               Buffers: shared hit=65020, temp written=6852
               ->  Seq Scan on my_table  (cost=0.00..81047.63 rows=1602763 width=14) (actual time=0.003..778.311 rows=1613117 loops=1)
                     Buffers: shared hit=65020

3) explain (analyze,buffers) DELETE FROM my_table USING id_test WHERE my_table.id = id_test.id;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Delete on my_table  (cost=109216.05..178743.05 rows=1572960 width=12) (actual time=7307.465..7307.465 rows=0 loops=1)
   Buffers: shared hit=3210748, local hit=6960, temp read=13656 written=13594
   ->  Hash Join  (cost=109216.05..178743.05 rows=1572960 width=12) (actual time=1636.744..4489.246 rows=1572864 loops=1)
         Hash Cond: (id_test.id = my_table.id)
         Buffers: shared hit=65020, local hit=6960, temp read=13656 written=13594
         ->  Seq Scan on id_test  (cost=0.00..22689.60 rows=1572960 width=14) (actual time=0.009..642.859 rows=1572864 loops=1)
               Buffers: local hit=6960
         ->  Hash  (cost=81160.02..81160.02 rows=1614002 width=14) (actual time=1636.228..1636.228 rows=1613117 loops=1)
               Buckets: 131072  Batches: 32  Memory Usage: 3392kB
               Buffers: shared hit=65020, temp written=6852
               ->  Seq Scan on my_table  (cost=0.00..81160.02 rows=1614002 width=14) (actual time=0.297..815.133 rows=1613117 loops=1)
                     Buffers: shared hit=65020

I restarted the cluster after running every query.

On Mon, Sep 3, 2018 at 12:23, Justin Pryzby <pry...@telsasoft.com> wrote:

> I suggest running:
>
> SET track_io_timing=on; -- requires superuser
> explain(ANALYZE,BUFFERS) DELETE [...]
>
> https://wiki.postgresql.org/wiki/Slow_Query_Questions
>
> Maybe you just need larger shared_buffers?
>
> Justin
Re: trying to delete most of the table by range of date col
On Mon, Sep 03, 2018 at 11:17:58AM +0300, Mariel Cherkassky wrote:
> Hi,
> I already checked, and all the tables that use the id column of the main
> table as a foreign key have an index on that column.
>
> So, it seems that the second solution is the fastest one. Is there a
> reason why the delete in chunks (solution 4) wasn't faster?

I suggest running:

SET track_io_timing=on; -- requires superuser
explain(ANALYZE,BUFFERS) DELETE [...]

https://wiki.postgresql.org/wiki/Slow_Query_Questions

Maybe you just need larger shared_buffers?

Justin
Re: trying to delete most of the table by range of date col
Can't drop the foreign keys, there are too many.

On Mon, Sep 3, 2018 at 11:35, Sergei Kornilov <s...@zsrv.org> wrote:

> As you can see in "actual time", the delete itself ran for only 5 seconds.
> All the other time postgresql spent checking the foreign-key triggers.
> 0.02 ms per row seems adequate for an index lookup.
> It may be better to drop the foreign keys, delete the data, and create
> the foreign keys back.
>
> regards, Sergei
Re: trying to delete most of the table by range of date col
Hello

> Delete on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=5121.344..5121.344 rows=0 loops=1)
>   ->  Seq Scan on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=0.012..2244.393 rows=1572864 loops=1)
>         Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/'::text)))
>         Rows Removed by Filter: 40253
> Planning time: 0.210 ms
> Trigger for constraint table1: time=14730.816 calls=1572864
> Trigger for constraint table2: time=30718.084 calls=1572864
> Trigger for constraint table3: time=28170.363 calls=1572864
> Trigger for constraint table4: time=29573.681 calls=1572864
> Trigger for constraint table5: time=29629.263 calls=1572864
> Trigger for constraint table6: time=29628.489 calls=1572864
> Trigger for constraint table7: time=29798.121 calls=1572864
> Trigger for constraint table8: time=29645.705 calls=1572864
> Trigger for constraint table9: time=29657.177 calls=1572864
> Trigger for constraint table10: time=29487.054 calls=1572864
> Trigger for constraint table11: time=30010.978 calls=1572864
> Trigger for constraint table12: time=26383.924 calls=1572864
> Execution time: 350603.047 ms

As you can see in "actual time", the delete itself ran for only 5 seconds. All the other time postgresql spent checking the foreign-key triggers. 0.02 ms per row seems adequate for an index lookup.
It may be better to drop the foreign keys, delete the data, and create the foreign keys back.

regards, Sergei
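A minimal sketch of the drop/delete/recreate pattern suggested here, using one of the thread's constraint names as an example (each of the 12 constraints would need the same treatment, and a full 'DD/MM/YYYY' format mask is assumed where the thread shows a truncated 'DD/MM/'):

```sql
BEGIN;

-- Drop the referencing constraint so the bulk delete skips its trigger.
ALTER TABLE table1 DROP CONSTRAINT application_change_my_table_id_fkey;

DELETE FROM my_table
 WHERE end_date <= to_date('12/12/2018', 'DD/MM/YYYY')
   AND end_date >  to_date('11/12/2018', 'DD/MM/YYYY');

-- Re-add the constraint; NOT VALID makes the ALTER itself cheap because
-- existing rows are not checked immediately.
ALTER TABLE table1
  ADD CONSTRAINT application_change_my_table_id_fkey
  FOREIGN KEY (my_table_id) REFERENCES my_table(id) NOT VALID;

COMMIT;

-- Later, outside the critical window:
ALTER TABLE table1 VALIDATE CONSTRAINT application_change_my_table_id_fkey;
```

Note this assumes the referencing rows in table1 pointing at the deleted parents have already been removed (or never existed); otherwise the VALIDATE step will fail on the orphans.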
Re: trying to delete most of the table by range of date col
Hi,
I already checked, and all the tables that use the id column of the main table as a foreign key have an index on that column.

I tried all of the next 4 solutions:

1) delete from my_table where end_date <= to_date('12/12/2018','DD/MM/') and end_date > to_date('11/12/2018','DD/MM/');
Execution time: 350603.047 ms ~ 5.8 minutes

2) DELETE FROM my_table WHERE id IN (select id from my_table where end_date <= to_date('12/12/2018','DD/MM/') and end_date > to_date('11/12/2018','DD/MM/'));
Execution time: 333166.233 ms ~ 5.5 minutes

3) set temp_buffers='1GB';
SET
create temp table id_temp as select id from my_table where end_date <= to_date('12/12/2018','DD/MM/') and end_date > to_date('11/12/2018','DD/MM/');
SELECT 1572864
Time: 2196.670 ms
DELETE FROM my_table USING id_temp WHERE my_table.id = id_temp.id;
Execution time: 459650.621 ms ~ 7.6 minutes

4) delete in chunks:

do $$
declare
    rec integer;
begin
    select count(*) into rec from my_table
    where end_date <= to_date('12/12/2018','DD/MM/')
      and end_date > to_date('11/12/2018','DD/MM/');
    while rec > 0 loop
        DELETE FROM my_table WHERE id IN (
            select id from my_table
            where end_date <= to_date('12/12/2018','DD/MM/')
              and end_date > to_date('11/12/2018','DD/MM/')
            limit 5000);
        rec := rec - 5000;
        raise notice '5000 records were deleted, current rows :%', rec;
    end loop;
end;
$$;

Execution time: 6 minutes.

So, it seems that the second solution is the fastest one. Is there a reason why the delete in chunks (solution 4) wasn't faster?

On Mon, Sep 3, 2018 at 10:35, Andreas Kretschmer <andr...@a-kretschmer.de> wrote:

> On 03.09.2018 at 09:06, Justin Pryzby wrote:
> > Note, I believe it's planned in the future for foreign keys to support
> > references to partitioned tables, at which point you could just DROP the
> > monthly partition... but not supported right now.
>
> the future is close, that's possible in 11 ;-)
>
> Regards, Andreas
Re: trying to delete most of the table by range of date col
On 03.09.2018 at 09:06, Justin Pryzby wrote:
> Note, I believe it's planned in the future for foreign keys to support
> references to partitioned tables, at which point you could just DROP the
> monthly partition... but not supported right now.

the future is close, that's possible in 11 ;-)

Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com
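If the foreign-key limitation were lifted, the monthly-partition approach could look roughly like this; a sketch with assumed names and a deliberately simplified column list, not the thread's actual schema:

```sql
-- Declarative range partitioning on end_date (simplified, assumed schema).
CREATE TABLE my_table_part (
    id       bigint NOT NULL,
    end_date timestamp with time zone
) PARTITION BY RANGE (end_date);

CREATE TABLE my_table_2018_12 PARTITION OF my_table_part
    FOR VALUES FROM ('2018-12-01') TO ('2019-01-01');

-- Retiring a month then becomes a metadata operation instead of a
-- row-by-row DELETE with per-row trigger checks:
ALTER TABLE my_table_part DETACH PARTITION my_table_2018_12;
DROP TABLE my_table_2018_12;
```

This is why partitioning keeps coming up in the thread: dropping a detached partition touches no individual rows, so none of the per-row foreign-key trigger cost applies.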
Re: trying to delete most of the table by range of date col
On Mon, Sep 03, 2018 at 09:27:52AM +0300, Mariel Cherkassky wrote:
> I'm trying to find the best way to delete most of the table but not all of
> it according to a range of dates.
> Indexes:
>     "end_date_idx" btree (end_date)
> Referenced by:
>     TABLE "table1" CONSTRAINT "application_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
>     TABLE "table2" CONSTRAINT "configuration_changes_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
...
> As you can see, a lot of other tables use the id column as a foreign key,
> which makes the delete much slower.
> Trigger for constraint table1: time=14730.816 calls=1572864
> Trigger for constraint table2: time=30718.084 calls=1572864
> Trigger for constraint table3: time=28170.363 calls=1572864
...

Do the other tables have indices on their referencING columns?

https://www.postgresql.org/docs/devel/static/ddl-constraints.html#DDL-CONSTRAINTS-FK
"Since a DELETE of a row from the referenced table [...] will require a scan of the referencing table for rows matching the old value, it is often a good idea to index the referencing columns too."

Note, I believe it's planned in the future for foreign keys to support references to partitioned tables, at which point you could just DROP the monthly partition... but not supported right now.

Justin
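What the quoted documentation advice amounts to, sketched with the thread's referencing column name (the index names here are assumed, and the poster later confirms such indexes already exist):

```sql
-- One index per referencing table, so each foreign-key trigger's lookup
-- during the DELETE is an index scan instead of a sequential scan of
-- the child table.
CREATE INDEX table1_my_table_id_idx ON table1 (my_table_id);
CREATE INDEX table2_my_table_id_idx ON table2 (my_table_id);
-- ...and likewise for the remaining referencing tables.
```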
trying to delete most of the table by range of date col
Hi,
I have a big table (with 1.6 million records). One of the columns is called end_date and its type is timestamp. I'm trying to find the best way to delete most of the table, but not all of it, according to a range of dates.

The table structure:

afa=# \d my_table;
                                      Table "public.my_table"
   Column    |           Type           |                       Modifiers
-------------+--------------------------+--------------------------------------------------------
 id          | bigint                   | not null default nextval('my_table_id_seq'::regclass)
 devid       | integer                  | not null
 column_name | integer                  | not null
 column_name | integer                  | not null
 column_name | integer                  | not null
 column_name | timestamp with time zone |
 column_name | integer                  | not null
 column_name | integer                  | not null
 column_name | integer                  | not null
 column_name | integer                  | not null
 column_name | integer                  | not null
 column_name | integer                  | not null
 column_name | integer                  | not null
 column_name | text                     | not null
 column_name | integer                  | not null default 0
 column_name | integer                  | not null default 0
 column_name | integer                  | not null default 0
 column_name | integer                  | not null default 0
 column_name | integer                  | not null default 0
 column_name | integer                  | not null default 0
 column_name | integer                  | not null default 0
 column_name | integer                  | default 0
 column_name | integer                  | default 0
 column_name | integer                  | default 0
 column_name | integer                  | default 0
 column_name | integer                  | default 0
 column_name | integer                  | default 0
 end_date    | timestamp with time zone |
Indexes:
    "my_table_pkey" PRIMARY KEY, btree (id)
    "my_table_date_idx" btree (date)
    "my_table_device_idx" btree (devid)
    "end_date_idx" btree (end_date)
Foreign-key constraints:
    "fk_a" FOREIGN KEY (devid) REFERENCES device_data(id)
Referenced by:
    TABLE "table1" CONSTRAINT "application_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table2" CONSTRAINT "configuration_changes_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table3" CONSTRAINT "fk_57hmvnx423bw9h203260r8gic" FOREIGN KEY (my_table) REFERENCES my_table(id)
    TABLE "table3" CONSTRAINT "interface_change_my_table_fk" FOREIGN KEY (my_table) REFERENCES my_table(id)
    TABLE "table4" CONSTRAINT "my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id) ON DELETE CASCADE
    TABLE "table5" CONSTRAINT "my_table_report_my_table_fk" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table6" CONSTRAINT "my_table_to_policy_change_my_table_foreign_key" FOREIGN KEY (my_table) REFERENCES my_table(id)
    TABLE "table7" CONSTRAINT "network_object_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table8" CONSTRAINT "orig_nat_rule_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table9" CONSTRAINT "risk_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table10" CONSTRAINT "rule_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)
    TABLE "table11" CONSTRAINT "service_change_my_table_id_fkey" FOREIGN KEY (my_table_id) REFERENCES my_table(id)

As you can see, a lot of other tables use the id column as a foreign key, which makes the delete much slower.

*Solution I tried for the query:*

delete from my_table where end_date <= to_date('12/12/2018','DD/MM/') and end_date > to_date('11/12/2018','DD/MM/');

                                  QUERY PLAN
------------------------------------------------------------------------------
 Delete on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=5121.344..5121.344 rows=0 loops=1)
   ->  Seq Scan on my_table  (cost=0.00..65183.30 rows=1573862 width=6) (actual time=0.012..2244.393 rows=1572864 loops=1)
         Filter: ((end_date <= to_date('12/12/2018'::text, 'DD/MM/'::text)) AND (end_date > to_date('11/12/2018'::text, 'DD/MM/'::text)))
         Rows Removed by Filter: 40253
 Planning time: 0.210 ms
 Trigger for constraint table1: time=14730.816 calls=1572864
 Trigger for constraint table2: time=30718.084 calls=1572864
 Trigger for constraint table3: time=28170.363 calls=1572864
 Trigger for constraint table4: time=29573.681 calls=1572864
 Trigger for constraint table5: time=29629.263 calls=1572864
 Trigger for constraint table6: time=29628.489 calls=1572864
 Trigger for constraint table7: time=29798.121 calls=1572864
 Trigger for constraint table8: time=29645.705 calls=1572864
 Trigger for constraint table9: time=29657.177 calls=1572864
 Trigger for constraint table10: time=29487.054 calls=1572864
 Trigger for constraint table11: time=30010.978 calls=1572864
 Trigger for constraint table12: time=26383.924 calls=1572864
 Execution time: 350603.047 ms