On 2011-02-10, Josh <slus...@gmail.com> wrote:
> Hi
>
> I'm trying to do a DELETE FROM on my large table (about 800 million
> rows) based on the contents of another, moderately large table (about
> 110 million rows). The command I'm using is:
>
> DELETE FROM records WHERE id NOT IN (SELECT id FROM unique_records);
>
> This process ran for about two weeks before I decided to stop it -- it
> was dragging down the DB server. I can understand long-running
> processes, but two weeks seems a bit much even for a big table.
I find that scripting deletes of smaller batches of rows can help here;
long-running queries do bad things to the efficiency of postgres. One
strategy that could work for your task would be to create a temp table
first:

  create temp table planned_deletions as
    select id from records except select id from unique_records;

  create index badids on planned_deletions(id);

then repeatedly run

  delete from records where id in
    (select id from planned_deletions order by id limit 10000);

  delete from planned_deletions where id in
    (select id from planned_deletions order by id limit 10000);

until there are none left. Note that ORDER BY must come before LIMIT,
and it is the ORDER BY that guarantees both statements see the same
10000 ids each pass. Possibly pause a few seconds between each slug if
there is a heavy load on the server (that you were able to run the
query for two weeks suggests that there may not be).

> Is this the best way to approach the problem? Is there a better way?
>
> Some background: The server is version 8.3, running nothing but Pg.
> The 'records' table has 'id' as its primary key, and one other index
> on another column. The table is referenced by just about every other
> table in my DB (about 15 other tables) via foreign key constraints,
> which I don't want to break (which is why I'm not just recreating the
> table rather than deleting rows). Most of the dependent tables have ON
> DELETE CASCADE. The 'unique_records' table is a temp table I got via
> something like: SELECT DISTINCT (other_column) id INTO unique_records
> FROM records

If you can accept the downtime, I would drop the constraints (if you
don't have them on file, do a pg_dump --schema-only, grep it for the
ADD CONSTRAINT commands, and use sed or similar to create matching
DROP CONSTRAINT commands, then run them), rebuild the table, and then
reintroduce the constraints. Keep a copy of the grep output above.

--
⚂⚃ 100% natural

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
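The batch-delete loop above can also be driven from a client script. Here is a minimal sketch of the pattern using Python's stdlib sqlite3 in place of PostgreSQL, purely so it is self-contained; the table names come from the thread, but the row counts and batch size are made up for illustration:

```python
import sqlite3

# Stand-in for the Postgres database: 1000 rows, of which ids 0..499
# have been marked for deletion in planned_deletions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE records (id INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE planned_deletions (id INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO records VALUES (?)", [(i,) for i in range(1000)])
cur.executemany("INSERT INTO planned_deletions VALUES (?)",
                [(i,) for i in range(500)])

BATCH = 100  # per-slug size; the post suggests 10000 for the real table
while True:
    # ORDER BY before LIMIT, so every pass works on a deterministic slug.
    cur.execute("SELECT id FROM planned_deletions ORDER BY id LIMIT ?",
                (BATCH,))
    batch = [row[0] for row in cur.fetchall()]
    if not batch:
        break
    placeholders = ",".join("?" * len(batch))
    cur.execute(f"DELETE FROM records WHERE id IN ({placeholders})", batch)
    cur.execute(f"DELETE FROM planned_deletions WHERE id IN ({placeholders})",
                batch)
    conn.commit()  # commit each slug so each transaction stays short
    # time.sleep(2)  # optional pause between slugs under heavy load

remaining = cur.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(remaining)  # 500
```

Committing after every slug is the point of the exercise: each transaction stays short, so vacuum and concurrent queries are not held up the way one giant two-week DELETE holds them up.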
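The grep/sed step for the constraint-dropping route might look like the sketch below. The schema fragment here is hypothetical (real input would come from `pg_dump --schema-only`), and it assumes the dump's usual two-line `ALTER TABLE ONLY ... / ADD CONSTRAINT ...` layout:

```shell
# Sketch only: fabricate a small schema dump in place of
#   pg_dump --schema-only mydb > schema.sql
cat > schema.sql <<'EOF'
ALTER TABLE ONLY orders
    ADD CONSTRAINT orders_record_id_fkey FOREIGN KEY (record_id) REFERENCES records(id) ON DELETE CASCADE;
ALTER TABLE ONLY invoices
    ADD CONSTRAINT invoices_record_id_fkey FOREIGN KEY (record_id) REFERENCES records(id) ON DELETE CASCADE;
EOF

# Keep the ADD CONSTRAINT lines so the constraints can be restored later.
grep 'ADD CONSTRAINT' schema.sql > add_constraints.sql

# The ADD CONSTRAINT line does not name the table, so pair it with the
# preceding ALTER TABLE line, then rewrite the pair into a DROP command.
grep -B1 'ADD CONSTRAINT' schema.sql \
  | sed -n 'N;s/ALTER TABLE ONLY \([^ ]*\)\n *ADD CONSTRAINT \([^ ]*\) .*/ALTER TABLE ONLY \1 DROP CONSTRAINT \2;/p' \
  > drop_constraints.sql

cat drop_constraints.sql
```

Running `drop_constraints.sql`, rebuilding the table, then running `add_constraints.sql` restores the original foreign keys; hence the advice to keep a copy of the grep output.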