Re: [PERFORM] Delete performance again
BTW: Have just tried clean (without any foreign keys constraints) peformance of delete from tbl where field not in (select) vs create temporary table tmp(id) as select distinct field from tbl; delete from tmp where id in (select); delete from tbl where field in (select id from tmp). both tbl and select are huge. tbl cardinality is ~5 million, select is ~1 milliion. Number of records to delete is small. select is simply select id from table2. First (simple) one could not do in a night, second did in few seconds.
Re: [PERFORM] Delete performance again
OK, I did try you proposal and correlated subselect. I have a database ~90 companies. First try was to remove randomly selected 1000 companies Uncorrelated subselect: 65899ms Correlated subselect: 97467ms using: 9605ms my way: 104979ms. (without constraints recreate) My is the worst because it is oriented on massive delete. So I thought USING would perform better, so I did try 1 companies my way: 190527ms. (without constraints recreate) using: 694144ms I was a little shocked, but I did check plans and found out that it did switch from Nested Loop to Hash Join. I did disable Hash Join, it not show Merge Join. This was also disabled and I've got 747253ms. Then I've tried combinations: Without hash join it was the best result of 402629ms, without merge join it was 1096116ms. My conclusion: Until optimizer would take into account additional actions needed (like constraints check/cascade deletes/triggers), it can not make good plan.
Re: [PERFORM] Delete performance again
Hi, Maybe you can try this syntax. I'm not sure, but it eventually perform better: delete from company_alias USING comprm where company_alias.company_id =comprm.id Cheers, Marc -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Delete performance again
Hello. I have a database with company table that have a bunch of related (delete=cascade) tables. Also it has 1-M relation to company_descr table. Once we've found that ~half of our companies do not have any description and we would like to remove them. First this I've tried was delete from company where id not in (select company_id from company_descr); I've tried to analyze command, but unlike to other RDBM I've used it did not include cascade deletes/checks into query plan. That is first problem. It was SLOW. To make it faster I've done next thing: create temporary table comprm(id) as select id from company; delete from comprm where id in (select company_id from company_descr); delete from company where id in (select id from comprm); That was much better. So the question is why postgresql can't do such a thing. But it was better only until removing dataset was small (~5% of all table). As soon as I've tried to remove 50% I've got speed problems. I've ensured I have all indexes for both ends of foreign key. I've tried to remove all cascaded entries by myself, e.g.: create temporary table comprm(id) as select id from company; delete from comprm where id in (select company_id from company_descr); delete from company_alias where company_id in (select id from comprm); ... delete from company where id in (select id from comprm); It did not help until I drop all constraints before and recreate all constraints after. Now I have it work for 15minutes, while previously it could not do in a day. Is it OK? I'd say, some (if not all) of the optimizations could be done by postgresql optimizer.
Re: [PERFORM] Delete performance again
=?ISO-8859-5?B?svbi0Nv22SDC2Nzn2OjY3Q==?= [EMAIL PROTECTED] writes: delete from company where id not in (select company_id from company_descr); I've tried to analyze command, but unlike to other RDBM I've used it did not include cascade deletes/checks into query plan. That is first problem. It was SLOW. Usually the reason for that is having forgotten to make an index on the referencing column(s) ? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Delete performance again
2008/10/2 Tom Lane [EMAIL PROTECTED] =?ISO-8859-5?B?svbi0Nv22SDC2Nzn2OjY3Q==?= [EMAIL PROTECTED] writes: delete from company where id not in (select company_id from company_descr); I've tried to analyze command, but unlike to other RDBM I've used it did not include cascade deletes/checks into query plan. That is first problem. It was SLOW. Usually the reason for that is having forgotten to make an index on the referencing column(s) ? Not at all. As you can see below in original message, simply extending the query to what should have been done by optimizer helps. I'd say optimizer always uses fixed plan not taking into account that this is massive update and id doing index lookup of children records for each parent record, while it would be much more effective to perform removal of all children records in single table scan. It's like trigger for each record instead of for each statement.