I have a large table (20M records), but the fields are mostly short text. There are duplicate rows that I am trying to remove. I have a bigserial index that is unique, but there are duplicates in another field.
I have an 8-core, 24GB-memory machine with RAID disks. The DELETE below has been running for 70 hours (is it safe to kill it?). How can I make this run faster? This is a one-time processing task, but it is taking a long time.

DELETE FROM portal.metadata
WHERE idx NOT IN (
    SELECT MIN(idx)
    FROM portal.metadata
    GROUP BY "data_object.unique_id"
);

CREATE TABLE metadata
(
  "data_object.unique_id" CHARACTER(64) NOT NULL,
  size_bytes BIGINT,
  object_date TIMESTAMP(6) WITHOUT TIME ZONE,
  object_type CHARACTER VARYING(25),
  classification CHARACTER VARYING(7),
  object_source CHARACTER VARYING(50),
  object_managed_date TIMESTAMP(6) WITHOUT TIME ZONE,
  clevel INTEGER,
  fsctlh CHARACTER VARYING(50),
  oname CHARACTER VARYING(40),
  description CHARACTER VARYING(80),
  horizontal_datum CHARACTER VARYING(20),
  do_location CHARACTER VARYING(200),
  elevation_ft INTEGER,
  location USER-DEFINED,
  idx BIGINT DEFAULT nextval('portal.metadata_idx_seq'::regclass) NOT NULL,
  bbox CHARACTER VARYING(160),
  CONSTRAINT MetaDataKey PRIMARY KEY (idx)
)
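In case it helps, an alternative formulation I have seen suggested (untested here) keeps the lowest idx per "data_object.unique_id" with a self-join delete instead of NOT IN; the index name below is made up, and the index itself is just an assumption to speed up the duplicate lookup:

-- Untested sketch: for each "data_object.unique_id", keep the row with
-- the smallest idx and delete every other row via a self-join.
-- metadata_dupe_idx is a hypothetical supporting index.
CREATE INDEX metadata_dupe_idx
    ON portal.metadata ("data_object.unique_id", idx);

DELETE FROM portal.metadata AS a
USING portal.metadata AS b
WHERE a."data_object.unique_id" = b."data_object.unique_id"
  AND a.idx > b.idx;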