Re: [PERFORM] Delete performance
On 01/06/2011, at 11:45 AM, Jarrod Chesney wrote:
 I'm executing 30,000 single delete statements in one transaction. At this point I'm looking into combining the multiple deletes into one statement and breaking my big transaction into smaller ones of about 100 deletes or so.

On 01/06/2011, at 11:40 AM, Craig Ringer wrote:
 On 1/06/2011 7:11 AM, Pierre C wrote:
  If I run 30,000 prepared "DELETE FROM xxx WHERE ID = ?" commands it takes close to 10 minutes.
 Do you run those in a single transaction or do you use one transaction per DELETE? In the latter case, postgres will ensure each transaction is committed to disk at each commit. Since this involves waiting for the physical I/O to happen, it is slow. If you do it 30,000 times, it will be 30,000 times slow.

Not only that, but if you're doing it via some application, the app has to wait for Pg to respond before it can send the next query. This adds even more delay, as do all the processor switches between Pg and your application. If you really must issue individual DELETE commands one-by-one, I *think* you can use synchronous_commit=off or "SET LOCAL synchronous_commit TO OFF;". See: http://www.postgresql.org/docs/current/static/runtime-config-wal.html

-- Craig Ringer
Tech-related writing at http://soapyfrogs.blogspot.com/

Apologies for top posting.

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
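Craig's two suggestions can be combined. A minimal sketch, assuming the table name from the thread (the ids and batch size are illustrative):

```sql
-- One statement per batch instead of one DELETE per row, plus relaxed
-- commit durability for this transaction only. synchronous_commit=off
-- never risks corruption; a crash can only lose the last few commits.
BEGIN;
SET LOCAL synchronous_commit TO OFF;
DELETE FROM xxx WHERE id IN (101, 102, 103);  -- a few hundred/thousand ids per batch
COMMIT;
```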
Re: [PERFORM] Delete performance
9.0rc1? You know that the stable 9.0 has been out for quite a while now. It's not going to affect the delete speed in any way, but I would generally advise you to upgrade to the latest 9.0.x.

As for the delete itself, check that you have indexes on the columns in the tables that refer to the main table (the referencing columns of the foreign keys). Often that's the issue. The other thing to check is the number of triggers on the other tables.

-- GJ
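Checking every referencing column by hand is tedious; a catalog query along these lines can list candidates. This is a hedged sketch: it only compares the first column of each foreign key against leading index columns, and pg_constraint/pg_index details may vary by version.

```sql
-- Foreign-key constraints whose first referencing column is not the
-- leading column of any index on the referencing table.
SELECT c.conrelid::regclass AS referencing_table,
       c.conname            AS fk_name
FROM pg_constraint c
WHERE c.contype = 'f'
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid = c.conrelid
          -- indkey is an int2vector rendered like '1 3'; compare its
          -- leading attribute number with the FK's first column.
          AND (string_to_array(i.indkey::text, ' '))[1] = c.conkey[1]::text
      );
```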
Re: [PERFORM] Delete performance
If I run 30,000 prepared "DELETE FROM xxx WHERE ID = ?" commands it takes close to 10 minutes.

Do you run those in a single transaction or do you use one transaction per DELETE? In the latter case, postgres will ensure each transaction is committed to disk at each commit. Since this involves waiting for the physical I/O to happen, it is slow. If you do it 30,000 times, it will be 30,000 times slow.

Note that you should really do:

DELETE FROM table WHERE id IN (huge list of ids);

or

DELETE FROM table USING (VALUES ...) AS v(id) WHERE table.id = v.id;

Also, check that the foreign keys using cascading deletes have indexes on the referencing columns in the referencing tables. Without an index, finding the rows to cascade-delete will be slow.
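Spelled out, the VALUES-join form suggested above would look like this (a sketch; the ids and the column name are placeholders):

```sql
-- Delete a whole list of ids with a single statement by joining
-- against a VALUES list instead of issuing one DELETE per id.
DELETE FROM xxx
USING (VALUES (101), (102), (103)) AS dead(id)
WHERE xxx.id = dead.id;
```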
[PERFORM] Delete performance
Hi All
My database uses joined table inheritance and my server version is 9.0.
Version string: PostgreSQL 9.0rc1 on x86_64-pc-linux-gnu, compiled by GCC x86_64-pc-linux-gnu-gcc (Gentoo 4.4.4-r1 p1.1, pie-0.4.5) 4.4.4, 64-bit

I have about 120,000 records in the table that everything else inherits from; if I truncate-cascade this table it happens almost instantly. If I run 30,000 prepared "DELETE FROM xxx WHERE ID = ?" commands it takes close to 10 minutes.

My foreign keys to the base table are all set with ON DELETE CASCADE. I've looked through all the fields that relate to the ID in the base table and created btree indexes for them.

Can anyone outline what I need to verify/do to ensure I'm getting the best performance for my deletes?

Regards, Jarrod Chesney
Re: [PERFORM] DELETE performance problem
On Tue, Nov 24, 2009 at 2:37 PM, Luca Tettamanti kronos...@gmail.com wrote:
 ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605 width=8) (actual time=571807.575..610178.552 rows=26185953 loops=1)

This is your problem. The system's estimate for the number of distinct annotation_ids in t2 is wildly off. The disk activity is almost certainly swapping (you can check with iostat on the Linux machine). Can you try an ANALYZE on t2 just before the delete query? Maybe try raising the statistics target for the annotation_id column. If all else fails, you may try "set enable_hashagg to false" just before the query.

Greetings
Marcin Mańk
Re: [PERFORM] DELETE performance problem
On Wed, Nov 25, 2009 at 04:22:47PM +0100, marcin mank wrote:
 On Tue, Nov 24, 2009 at 2:37 PM, Luca Tettamanti kronos...@gmail.com wrote:
  ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605 width=8) (actual time=571807.575..610178.552 rows=26185953 loops=1)
 This is your problem. The system's estimate for the number of distinct annotation_ids in t2 is wildly off.

Ah, I see.

 The disk activity is almost certainly swapping (you can check with iostat on the Linux machine).

Nope, zero swap activity. Under Linux postgres tops out at about 4.4GB, leaving 3.6GB of page cache (nothing else is running right now).

 Can you try an ANALYZE on t2 just before the delete query? Maybe try raising the statistics target for the annotation_id column.

I already tried; the estimate is still way off.

 If all else fails, you may try "set enable_hashagg to false" just before the query.

 Hash IN Join  (cost=1879362.27..11080576.17 rows=202376 width=6) (actual time=250281.607..608638.141 rows=26185953 loops=1)
   Hash Cond: (t1.annotation_id = t2.annotation_id)
   ->  Seq Scan on t1  (cost=0.00..661734.12 rows=45874812 width=14) (actual time=0.017..193661.353 rows=45874812 loops=1)
   ->  Hash  (cost=879289.12..879289.12 rows=60956812 width=8) (actual time=250271.012..250271.012 rows=60956812 loops=1)
         ->  Seq Scan on t2  (cost=0.00..879289.12 rows=60956812 width=8) (actual time=0.023..178297.862 rows=60956812 loops=1)
 Total runtime: 900019.033 ms
(6 rows)

This is after an ANALYZE.

The alternative query suggested by Shrirang Chitnis:

DELETE FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t1.annotation_id = t2.annotation_id)

performs even better:

 Seq Scan on t1  (cost=0.00..170388415.89 rows=22937406 width=6) (actual time=272.625..561241.294 rows=26185953 loops=1)
   Filter: (subplan)
   SubPlan
     ->  Index Scan using t2_idx on t2  (cost=0.00..1113.63 rows=301 width=0) (actual time=0.008..0.008 rows=1 loops=45874812)
           Index Cond: ($0 = annotation_id)
 Total runtime: 629426.014 ms
(6 rows)

Will try on the full data set.
thanks, Luca
Re: [PERFORM] DELETE performance problem
On Wed, Nov 25, 2009 at 4:13 PM, Luca Tettamanti kronos...@gmail.com wrote:
 DELETE FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t1.annotation_id = t2.annotation_id)
 performs even better:
  Seq Scan on t1  (cost=0.00..170388415.89 rows=22937406 width=6) (actual time=272.625..561241.294 rows=26185953 loops=1)
    Filter: (subplan)
    SubPlan
      ->  Index Scan using t2_idx on t2  (cost=0.00..1113.63 rows=301 width=0) (actual time=0.008..0.008 rows=1 loops=45874812)
            Index Cond: ($0 = annotation_id)
  Total runtime: 629426.014 ms
 (6 rows)

Have you tried:

DELETE FROM t1 USING t2 WHERE t1.annotation_id = t2.annotation_id;

?

-- GJ
[PERFORM] DELETE performance problem
Hello, I've run into a severe performance problem with the following statement:

DELETE FROM t1 WHERE t1.annotation_id IN (SELECT t2.annotation_id FROM t2)

t1 contains about 48M records (table size is 5.8GB), while t2 contains about 60M records (total size 8.6GB). annotation_id is the PK in t1 but not in t2 (it's not even unique; in fact there are duplicates - there are about 20M distinct annotation_ids in this table). There are no FKs on either table. I've killed the query after 14h(!) of runtime...

I've reproduced the problem using only the ids (extracted from the full tables) with the following schemas:

test2=# \d t1
      Table "public.t1"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 annotation_id | bigint | not null
Indexes: t1_pkey PRIMARY KEY, btree (annotation_id)

test2=# \d t2
      Table "public.t2"
    Column     |  Type  | Modifiers
---------------+--------+-----------
 annotation_id | bigint |
Indexes: t2_idx btree (annotation_id)

The query above takes about 30 minutes to complete. The slowdown is not as severe, but (IMHO) the behaviour is strange. On a win2k8 machine with 8.3.8, using procexp I see the process churning the disk and using more memory until it hits some limit (at about 1.8GB), then the IO slows down considerably. See this screenshot[1]. This is exactly what happens with the full dataset.
This is the output of the explain:

test2=# explain analyze delete from t1 where annotation_id in (select annotation_id from t2);
 Hash Join  (cost=1035767.26..2158065.55 rows=181605 width=6) (actual time=643395.565..1832056.588 rows=26185953 loops=1)
   Hash Cond: (t1.annotation_id = t2.annotation_id)
   ->  Seq Scan on t1  (cost=0.00..661734.12 rows=45874812 width=14) (actual time=0.291..179119.487 rows=45874812 loops=1)
   ->  Hash  (cost=1033497.20..1033497.20 rows=181605 width=8) (actual time=643393.742..643393.742 rows=26185953 loops=1)
         ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605 width=8) (actual time=571807.575..610178.552 rows=26185953 loops=1)
               ->  Seq Scan on t2  (cost=0.00..879289.12 rows=60956812 width=8) (actual time=2460.595..480446.581 rows=60956812 loops=1)
 Total runtime: 2271122.474 ms
(7 rows)
Time: 2274723,284 ms

An identical linux machine (with 8.4.1) shows the same issue; with strace I see lots of seeks (30s sample):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------
 90.37    0.155484          15     10601           read
  9.10    0.015649        5216         3           fadvise64
  0.39    0.000668           0      5499           write
  0.15    0.000253           0     10733           lseek
  0.00    0.000000           0         3           open
  0.00    0.000000           0         3           close
  0.00    0.000000           0         3           semop
------ ----------- ----------- --------- --------- ----------
100.00    0.172054                 26845           total

Before hitting the memory limit (AS on win2k8, unsure about Linux) the trace is the following:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------
100.00    0.063862           0    321597           read
  0.00    0.000000           0         3           lseek
  0.00    0.000000           0        76           mmap
------ ----------- ----------- --------- --------- ----------
100.00    0.063862                321676           total

The machines have 8 cores (2 Xeon E5320) and 8GB of RAM. The Postgres data directory is on a hardware (Dell PERC5) raid mirror, with the log on a separate array. One machine is running linux 64bit (Debian/stable), the other win2k8 (32 bit).

shared_buffers = 512MB
work_mem = 512MB
maintenance_work_mem = 1GB
checkpoint_segments = 16
wal_buffers = 8MB
fsync = off   # Just in case...
              # usually it's enabled
effective_cache_size = 4096MB

(the machine with win2k8 is running with a smaller shared_buffers - 16MB)

Any idea on what's going wrong here?

thanks, Luca

[1] http://img10.imageshack.us/i/psql2.png/
Re: [PERFORM] DELETE performance problem
On Tue, Nov 24, 2009 at 3:59 PM, Jerry Champlin jchamp...@absolute-performance.com wrote:
 You may want to consider using partitioning. That way you can drop the appropriate partition and never have the overhead of a delete.

Hum, I don't think it's doable in my case; the partitioning is not known a priori. First t1 is fully populated, then the data is loaded and manipulated by my application and the result is stored in t2; only then do I want to remove (part of) the data from t1.

thanks, Luca
Re: [PERFORM] DELETE performance problem
2009/11/24 Luca Tettamanti kronos...@gmail.com:
 Hum, I don't think it's doable in my case; the partitioning is not known a priori. First t1 is fully populated, then the data is loaded and manipulated by my application and the result is stored in t2; only then do I want to remove (part of) the data from t1.

It's a shame there isn't a LIMIT option on DELETE so this can be done in small batches.

Thom
Re: [PERFORM] DELETE performance problem
You may want to consider using partitioning. That way you can drop the appropriate partition and never have the overhead of a delete.

Jerry Champlin | Absolute Performance Inc. | Mobile: 303-588-2547

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Luca Tettamanti
Sent: Tuesday, November 24, 2009 6:37 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] DELETE performance problem
Re: [PERFORM] DELETE performance problem
On Tue, Nov 24, 2009 at 3:19 PM, Thom Brown thombr...@gmail.com wrote:
 It's a shame there isn't a LIMIT option on DELETE so this can be done in small batches.

You sort of can do it, using the PK on the table as a pointer: DELETE FROM foo USING ... etc., with a subquery in USING that limits the number of rows ;)

-- GJ
Re: [PERFORM] DELETE performance problem
On Tuesday 24 November 2009, Thom Brown thombr...@gmail.com wrote:
 It's a shame there isn't a LIMIT option on DELETE so this can be done in small batches.

delete from table where pk in (select pk from table where delete_condition limit X);

-- No animals were harmed in the recording of this episode. We tried but that damn monkey was just too fast.
Re: [PERFORM] DELETE performance problem
Even though the column in question is not unique on t2, could you not index it? That should improve the performance of the inline query. Are dates applicable in any way? In some cases adding a date field and partitioning or indexing on that, then adding a WHERE clause on the date (e.g. newer than x days), can be an effective way to limit the records searched.

Kris

On 24-Nov-09, at 9:59, Jerry Champlin jchamp...@absolute-performance.com wrote:
 You may want to consider using partitioning. That way you can drop the appropriate partition and never have the overhead of a delete.
Re: [PERFORM] Delete performance again
BTW: I have just tried comparing the clean (without any foreign key constraints) performance of

delete from tbl where field not in (select);

vs

create temporary table tmp(id) as select distinct field from tbl;
delete from tmp where id in (select);
delete from tbl where field in (select id from tmp);

Both tbl and the select are huge. tbl cardinality is ~5 million, the select is ~1 million. The number of records to delete is small. The select is simply "select id from table2". The first (simple) one could not finish in a night; the second did it in a few seconds.
Re: [PERFORM] Delete performance again
OK, I did try your proposal and a correlated subselect. I have a database with ~90 companies. The first try was to remove 1000 randomly selected companies:

Uncorrelated subselect: 65899ms
Correlated subselect: 97467ms
USING: 9605ms
My way: 104979ms (without constraints recreate)

Mine is the worst because it is oriented toward massive deletes. Since I thought USING would perform better, I then tried 1 companies:

My way: 190527ms (without constraints recreate)
USING: 694144ms

I was a little shocked, but I checked the plans and found that it had switched from Nested Loop to Hash Join. I disabled Hash Join, and it then showed Merge Join. That was also disabled and I got 747253ms. Then I tried combinations: without hash join it was the best result at 402629ms; without merge join it was 1096116ms.

My conclusion: until the optimizer takes into account the additional actions needed (like constraint checks / cascade deletes / triggers), it cannot make a good plan.
Re: [PERFORM] Delete performance again
Hi, maybe you can try this syntax. I'm not sure, but it may perform better:

delete from company_alias USING comprm where company_alias.company_id = comprm.id

Cheers, Marc
[PERFORM] Delete performance again
Hello. I have a database with a company table that has a bunch of related (delete=cascade) tables. It also has a 1-M relation to a company_descr table. At some point we found that ~half of our companies do not have any description and we would like to remove them. The first thing I tried was:

delete from company where id not in (select company_id from company_descr);

I tried to analyze the command, but unlike other RDBMSs I've used, it did not include the cascade deletes/checks in the query plan. That is the first problem. It was SLOW. To make it faster I did the following:

create temporary table comprm(id) as select id from company;
delete from comprm where id in (select company_id from company_descr);
delete from company where id in (select id from comprm);

That was much better. So the question is why postgresql can't do such a thing itself. But it was better only while the dataset being removed was small (~5% of the whole table). As soon as I tried to remove 50%, I got speed problems. I've ensured I have all indexes on both ends of the foreign keys. I've tried to remove all cascaded entries myself, e.g.:

create temporary table comprm(id) as select id from company;
delete from comprm where id in (select company_id from company_descr);
delete from company_alias where company_id in (select id from comprm);
...
delete from company where id in (select id from comprm);

It did not help until I dropped all constraints before and recreated all constraints after. Now it runs in 15 minutes, while previously it could not finish in a day. Is this OK? I'd say some (if not all) of these optimizations could be done by the postgresql optimizer.
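One more variant worth trying on the first statement above: NOT IN has awkward NULL semantics and often plans poorly on large subselects, while the equivalent anti-join form can get a better plan (a hashed anti-join on 8.4+). A sketch using the thread's table names:

```sql
-- Anti-join form of "id not in (select company_id ...)".
-- Note: NOT IN and NOT EXISTS differ if company_descr.company_id
-- can contain NULLs (NOT IN then matches nothing at all).
DELETE FROM company c
WHERE NOT EXISTS (
    SELECT 1 FROM company_descr d
    WHERE d.company_id = c.id
);
```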
Re: [PERFORM] Delete performance again
Віталій Тимчишин [EMAIL PROTECTED] writes:
 delete from company where id not in (select company_id from company_descr);
 I've tried to analyze the command, but unlike other RDBMSs I've used, it did not include the cascade deletes/checks in the query plan. That is the first problem. It was SLOW.

Usually the reason for that is having forgotten to make an index on the referencing column(s)?

regards, tom lane
Re: [PERFORM] Delete performance again
2008/10/2 Tom Lane [EMAIL PROTECTED]:
 Usually the reason for that is having forgotten to make an index on the referencing column(s)?

Not at all. As you can see below in the original message, simply extending the query to what should have been done by the optimizer helps. I'd say the optimizer always uses a fixed plan, not taking into account that this is a massive update: it is doing an index lookup of child records for each parent record, while it would be much more effective to perform the removal of all child records in a single table scan. It's like a trigger "for each record" instead of "for each statement".
Re: [PERFORM] Delete performance on delete from table with inherited tables
Thanks Stephan and Tom for your responses. We have been busy, so I haven't had time to do any further research on this till yesterday. I found that the large number of triggers on the parent or master table were foreign key triggers for each table back to the child tables (update and delete on master, insert on child). The triggers have existed through several versions of postgres and as far as we can tell were automatically created using the references keyword at inception.

Yesterday I dropped all the current triggers on parent and children and ran a script that did an "alter table ... add foreign key constraint" to each of the 67 child tables with update cascade and delete cascade. After this, the delete from the parent where no records existed in the child tables was far more acceptable. Instead of taking hours to do the delete, the process ran for about 5 minutes on my workstation. Removing all constraints entirely reduces this time to a couple of seconds. I am currently evaluating whether the foreign key constraints are worth the performance penalty in this particular case.

To finish up, it appears that the foreign key implementation has changed since these first tables were created in our database. Dropping the existing triggers and re-adding the constraints on each table significantly improved performance for us. I do not know enough of the internals to know why this happened, but our experience seems to prove that the newer implementation of foreign keys is more efficient than previous versions. YMMV.

One other item that was brought up was whether the child tables have the fk column indexed, and the answer was yes. Each had a standard btree index on the foreign key. Explain showed nothing, as all the time was being spent in the triggers. Time spent in triggers is not shown in the pg 7.3.4 version of explain (nor would I necessarily expect it to be).

Thanks for your time, expertise and responses.
-Chris

On Tuesday 09 March 2004 7:18 pm, Stephan Szabo wrote:
 On Wed, 3 Mar 2004, Chris Kratz wrote:
  Which certainly points to the triggers being the culprit. In reading the documentation, it seems like the "delete from only..." statement should ignore the constraint triggers. But it seems quite obvious from the

 "Delete from only" merely means that children of the table being deleted from will not have their rows checked against any where conditions and removed for that reason. It does not affect constraint triggers at all. Given I'm guessing it's going to be running about 7000 * 67 queries to check the validity of the delete for 7000 rows each having 67 foreign keys, I'm not sure there's much to do other than hack around the issue right now. If you're a superuser, you could temporarily hack reltriggers on the table's pg_class row to 0, run the delete and then set it back to the correct number. I'm guessing from your message that there's never any chance of a concurrent transaction putting in a matching row in a way that something is marked as deletable when it isn't?

--
Chris Kratz
Systems Analyst/Programmer
VistaShare LLC
www.vistashare.com

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
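Stephan's reltriggers hack was the 7.3-era workaround; later versions (8.1 and up) expose the same effect as a supported command. A sketch using the thread's table names, with the same caveat that no concurrent writers may touch the tables while the constraints are disabled:

```sql
BEGIN;
-- Requires table ownership (or superuser). This skips ALL triggers,
-- including the RI constraint triggers, so referential integrity is
-- NOT checked for anything deleted inside this transaction.
ALTER TABLE transactioninfo DISABLE TRIGGER ALL;
DELETE FROM transactioninfo WHERE del = true;
ALTER TABLE transactioninfo ENABLE TRIGGER ALL;
COMMIT;
```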
[PERFORM] Delete performance on delete from table with inherited tables
Hello all, I have a performance issue that I cannot seem to solve and am hoping that someone might be able to make some suggestions. First some background information. We are using PostgreSQL 7.3.4 on Linux with kernel 2.4.19. The box is a single P4 2.4GHz proc with 1G ram and UW SCSI drives in a hardware raid setup.

We have a transactioninfo table with about 163k records. psql describes the table as:

\d transactioninfo
                    Table "public.transactioninfo"
    Column     |           Type           |               Modifiers
---------------+--------------------------+----------------------------------------
 transactionid | integer                  | not null default nextval('transaction_sequence'::text)
 userid        | integer                  |
 programid     | integer                  |
 time          | timestamp with time zone |
 comment       | text                     |
 undoable      | boolean                  |
 del           | boolean                  |
Indexes: transactioninfo_pkey primary key btree (transactionid),
         delidx btree (del),
         transactioninfo_date btree (time, programid, userid)
Triggers: RI_ConstraintTrigger_6672989,
          RI_ConstraintTrigger_6672990,
          RI_ConstraintTrigger_6672992,
          --snip--
          RI_ConstraintTrigger_6673121,
          RI_ConstraintTrigger_6673122

There are about 67 inherited tables that inherit the fields from this table, hence the 134 constraint triggers. There is a related table, transactionlog, which has a fk (foreign key) to transactioninfo. It contains about 600k records. There are 67 hist_tablename tables, each with a different structure, and an additional 67 tables called hist_tablename_log which inherit from the transactionlog table and the appropriate hist_tablename table. By the automagic of inheritance, since transactionlog has a fk to transactioninfo, each of the hist_tablename_log tables does as well (if I am reading the pg_trigger table correctly).

Once a day we run a sql statement to clear out all records in transactioninfo that don't have a matching record in transactionlog. We accumulate between 5k-10k records a day that need clearing from transactioninfo. That clear ran this morning for 5 hours and 45 minutes.
Today I am working on streamlining the sql to try to get the delete down to a manageable time frame. The original delete statement was quite inefficient. So far, I've found that it appears to be much faster to break the task into two pieces: first update a flag on transactioninfo to mark empty transactions, then do a follow-up delete which clears based on that flag. The update takes about a minute or so:

update only transactioninfo set del=TRUE where not exists (select transactionid from transactionlog l where l.transactionid=transactioninfo.transactionid);
UPDATE 6911
Time: 59763.26 ms

Now if I delete a single transactioninfo record, found by selecting del=true limit 1, I get:

explain analyze delete from transactioninfo where transactionid=734607;
 Index Scan using transactioninfo_pkey on transactioninfo  (cost=0.00..6.01 rows=1 width=6) (actual time=0.18..0.18 rows=1 loops=1)
   Index Cond: (transactionid = 734607)
 Total runtime: 0.41 msec
(3 rows)
Time: 855.08 ms

With the 7000 records to delete and a delete time of 0.855s each, we are looking at 1.5hrs to do the clear, which is a great improvement over the 6 hours we have been seeing. But it still seems like it should run faster. The actual statement used in the clear is as follows:

explain delete from transactioninfo where del=true;
 Seq Scan on transactioninfo  (cost=0.00..6177.21 rows=78528 width=6)
   Filter: (del = true)
(2 rows)

Another interesting observation is that the raid subsystem shows very low activity during the clear. The backend process is almost entirely cpu bound. Some of the documentation implies that inherited tables cause deletes to be very slow on the parent table, so I did the following experiment.
vistashare=# create table transactioninfo_copy as select * from transactioninfo;
SELECT
Time: 6876.88 ms
vistashare=# create index transinfo_copy_del_idx on transactioninfo_copy(del);
CREATE INDEX
Time: 446.20 ms
vistashare=# delete from transactioninfo_copy where del=true;
DELETE 6904
Time: 202.33 ms

Which certainly points to the triggers being the culprit. In reading the documentation, it seems like the "delete from only..." statement should ignore the constraint triggers. But it seems quite obvious from the experiments that it does not. Also, the fact that the query plan