On 3/5/2012 11:12 AM, DrYSG wrote:
I have a large table (20M records) but mostly short text fields. There are
duplicates that I am trying to remove.  I have a bigseriel index that I
unique, but duplicates in another field.

I have an 8 core, 24GB memory computer with RAID disks.

This request has been running for 70 hours (is it safe to kill it?).

Yes, it is safe.

How can I make this run faster? This is a one time processing task, but it
is taking a long time.

CREATE TABLE portal.new_metatdata AS
select distinct on (data_object.unique_id) * FROM portal.metadata;

Or something of this ilk should be faster because it only need to do a sort on data_object.unique_id and then an insert. After you have verified the results you can do:

BEGIN;
ALTER TABLE portal.metatdata rename TO portal.new_metatdata_old;
ALTER TABLE portal.new_metatdata rename TO portal.metatdata_old;
COMMIT;

Other people might have other ideas.
-Steve

DELETE FROM portal.metadata
WHERE idx NOT IN
(
     SELECT MIN(idx)
     FROM portal.metadata
     GROUP BY "data_object.unique_id"
);

CREATE TABLE
     metadata
     (
         data_object.unique_id CHARACTER(64) NOT NULL,
         size_bytes BIGINT,
         object_date TIMESTAMP(6) WITHOUT TIME ZONE,
         object_type CHARACTER VARYING(25),
         classification CHARACTER VARYING(7),
         object_source CHARACTER VARYING(50),
         object_managed_date TIMESTAMP(6) WITHOUT TIME ZONE,
         clevel INTEGER,
         fsctlh CHARACTER VARYING(50),
         oname CHARACTER VARYING(40),
         description CHARACTER VARYING(80),
         horizontal_datum CHARACTER VARYING(20),
         do_location CHARACTER VARYING(200),
         elevation_ft INTEGER,
         location USER-DEFINED,
         idx BIGINT DEFAULT nextval('portal.metadata_idx_seq'::regclass) NOT
NULL,
         bbox CHARACTER VARYING(160),
         CONSTRAINT MetaDataKey PRIMARY KEY (idx)
     )

--
View this message in context: 
http://postgis.17.n6.nabble.com/Slow-delete-of-duplicate-rows-tp4548251p4548251.html
Sent from the PostGIS - User mailing list archive at Nabble.com.
_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
postgis-users@postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users

Reply via email to