Hi,

Try the "NOT EXISTS" construct. I have found it to be an order of magnitude faster than "NOT IN".
It should be something like this (test the query before using it, I'm not sure of it :)

DELETE FROM portal.metadata m
WHERE NOT EXISTS
(
    SELECT 1
    FROM (
        SELECT MIN(idx) AS keep_idx
        FROM portal.metadata
        GROUP BY "data_object.unique_id"
    ) k
    WHERE k.keep_idx = m.idx
);

Nicolas

On 5 March 2012 17:17, Stephen Woodbridge <wood...@swoodbridge.com> wrote:
> On 3/5/2012 11:12 AM, DrYSG wrote:
>>
>> I have a large table (20M records) of mostly short text fields. There are
>> duplicates that I am trying to remove. I have a bigserial index that is
>> unique, but there are duplicates in another field.
>>
>> I have an 8 core, 24GB memory computer with RAID disks.
>>
>> This request has been running for 70 hours (is it safe to kill it?).
>
> Yes, it is safe.
>
>> How can I make this run faster? This is a one-time processing task, but it
>> is taking a long time.
>
> CREATE TABLE portal.new_metadata AS
> SELECT DISTINCT ON ("data_object.unique_id") * FROM portal.metadata;
>
> Or something of this ilk should be faster, because it only needs to do a sort
> on "data_object.unique_id" and then an insert. After you have verified the
> results you can do:
>
> BEGIN;
> ALTER TABLE portal.metadata RENAME TO metadata_old;
> ALTER TABLE portal.new_metadata RENAME TO metadata;
> COMMIT;
>
> Other people might have other ideas.
> -Steve
>
>> DELETE FROM portal.metadata
>> WHERE idx NOT IN
>> (
>>     SELECT MIN(idx)
>>     FROM portal.metadata
>>     GROUP BY "data_object.unique_id"
>> );
>>
>> CREATE TABLE metadata
>> (
>>     "data_object.unique_id" CHARACTER(64) NOT NULL,
>>     size_bytes BIGINT,
>>     object_date TIMESTAMP(6) WITHOUT TIME ZONE,
>>     object_type CHARACTER VARYING(25),
>>     classification CHARACTER VARYING(7),
>>     object_source CHARACTER VARYING(50),
>>     object_managed_date TIMESTAMP(6) WITHOUT TIME ZONE,
>>     clevel INTEGER,
>>     fsctlh CHARACTER VARYING(50),
>>     oname CHARACTER VARYING(40),
>>     description CHARACTER VARYING(80),
>>     horizontal_datum CHARACTER VARYING(20),
>>     do_location CHARACTER VARYING(200),
>>     elevation_ft INTEGER,
>>     location USER-DEFINED,
>>     idx BIGINT DEFAULT nextval('portal.metadata_idx_seq'::regclass) NOT NULL,
>>     bbox CHARACTER VARYING(160),
>>     CONSTRAINT MetaDataKey PRIMARY KEY (idx)
>> )
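For anyone following along, the rebuild-and-swap Steve outlines might look like the following end to end. This is an untested sketch based only on the schema posted above, not something from the thread: the ORDER BY (to keep the lowest idx in each duplicate group, matching the MIN(idx) in the original NOT IN query), the metadata_old name, and the primary key / default / sequence / ANALYZE steps are additions, since CREATE TABLE ... AS does not copy constraints, defaults, or indexes.

-- Build the deduplicated copy: one row per "data_object.unique_id",
-- keeping the row with the lowest idx in each group.
CREATE TABLE portal.new_metadata AS
SELECT DISTINCT ON ("data_object.unique_id") *
FROM portal.metadata
ORDER BY "data_object.unique_id", idx;

-- Sanity check before swapping: the two counts should be equal.
SELECT COUNT(*), COUNT(DISTINCT "data_object.unique_id")
FROM portal.new_metadata;

BEGIN;
ALTER TABLE portal.metadata     RENAME TO metadata_old;
ALTER TABLE portal.new_metadata RENAME TO metadata;
-- Recreate what CREATE TABLE ... AS did not copy.
ALTER TABLE portal.metadata ADD PRIMARY KEY (idx);
ALTER TABLE portal.metadata
    ALTER COLUMN idx SET DEFAULT nextval('portal.metadata_idx_seq'::regclass);
-- If idx was originally a bigserial, the sequence is owned by the old
-- table's column; reattach it so dropping the old table does not drop it.
ALTER SEQUENCE portal.metadata_idx_seq OWNED BY portal.metadata.idx;
COMMIT;

ANALYZE portal.metadata;

-- Once you are satisfied nothing is missing:
-- DROP TABLE portal.metadata_old;

Any other indexes on the original table (for example on the location geometry column) would also need to be recreated on the new table before it goes back into use.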