Even DB2 and Oracle will take a hellishly long time to perform large-scale deletes...
What I do for a database just under 300 GB in size is run deletes in groups of 10,000. So your WHERE clause might look something like:

    WHERE id NOT IN (SELECT id FROM unique_records FETCH FIRST 10000 ROWS ONLY)

DB2 has a "with ur" clause to say it's OK to use dirty reads. I'm not sure whether Postgres has an equivalent -- it's been a while. The goal is to make sure Postgres allows "dirty reads", which prevents row locking. In DB2 the query would look like:

    DELETE FROM records
    WHERE id NOT IN (SELECT id FROM unique_records FETCH FIRST 10000 ROWS ONLY)
    WITH UR

Another tip that may improve performance: make sure the unique_records table is indexed, even if it has a primary key. In some cases the optimizer may choose an index to satisfy the subselect, or it may do a table scan. Table scans are more costly than index scans.

What's going to save you the real time is breaking your delete into chunks. Every RDBMS logs its transactions, and each delete is a transaction -- see where this is going? Some RDBMSs allow you to turn off transactional logging; some don't. DB2 doesn't (at least not without more effort than is reasonably necessary), so I write my delete queries to work in chunks. The most I would recommend deleting at once is 100,000 records. Play with the timing and see what works for you.

Hope this helps,
-Rob

-----Original Message-----
From: pgsql-sql-ow...@postgresql.org [mailto:pgsql-sql-ow...@postgresql.org] On Behalf Of Josh
Sent: Thursday, February 10, 2011 11:57 AM
To: pgsql-sql@postgresql.org
Subject: [SQL] DELETE FROM takes forever

Hi,

I'm trying to do a DELETE FROM on my large table (about 800 million rows) based on the contents of another, moderately large table (about 110 million rows). The command I'm using is:

    DELETE FROM records WHERE id NOT IN (SELECT id FROM unique_records);

This process ran for about two weeks before I decided to stop it -- it was dragging down the DB server.
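For Postgres specifically, a chunked delete along these lines might be sketched as below. Two caveats worth noting as facts about Postgres: it has no "with ur" equivalent (requesting READ UNCOMMITTED simply gives you READ COMMITTED behavior), and DELETE does not accept a LIMIT clause directly, so the batch has to come from a subselect. This is only a sketch against the table and column names from this thread (records, unique_records, id), with a made-up index name, not something tested against the original schema -- you'd repeat the DELETE from a script or plpgsql loop until it removes zero rows:

    -- Hypothetical index name; per the tip above, index the temp table first.
    CREATE INDEX unique_records_id_idx ON unique_records (id);
    ANALYZE unique_records;

    -- One batch of at most 10,000 rows; rerun until 0 rows are deleted.
    -- NOT EXISTS often plans better than NOT IN for large subqueries.
    DELETE FROM records
    WHERE id IN (
        SELECT r.id
        FROM records r
        WHERE NOT EXISTS (SELECT 1 FROM unique_records u WHERE u.id = r.id)
        LIMIT 10000
    );

Each batch commits as its own transaction, which keeps the lock footprint and the WAL/vacuum pressure of any one statement bounded -- the same reasoning as the chunking advice above.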
I can understand long-running processes, but two weeks seems a bit much even for a big table. Is this the best way to approach the problem? Is there a better way?

Some background: the server is version 8.3, running nothing but Pg. The 'records' table has 'id' as its primary key, and one other index on another column. The table is referenced by just about every other table in my DB (about 15 other tables) via foreign key constraints, which I don't want to break (which is why I'm not just recreating the table rather than deleting rows). Most of the dependent tables have ON DELETE CASCADE.

The 'unique_records' table is a temp table I got via something like:

    SELECT DISTINCT (other_column) id INTO unique_records FROM records

Thanks very much!
Josh Leder

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql