--- Klint Gore <[EMAIL PROTECTED]> wrote: > On Wed, 26 Jan 2005 21:25:19 -0500, "D. Richard Hipp" > <[EMAIL PROTECTED]> wrote: > > On Wed, 2005-01-26 at 17:45 -0800, Clark Christensen > wrote: > > > I'm new to SQL, and SQLite, and I find myself needing > to > > > identify duplicate records in a SQLite table (there > are > > > about 2K duplicates in a 36K row table). Any > suggestions > > > or magic SQL queries appreciated :-) > > > > > > > SELECT * FROM table EXCEPT SELECT DISTINCT * FROM > table; > > Does that actually work? > > sqlite> create table firstnames (fname varchar(5)); > sqlite> insert into firstnames values ('bob'); > sqlite> insert into firstnames values ('bob'); > sqlite> insert into firstnames values ('bill'); > sqlite> select * from firstnames except select distinct * > from firstnames; > sqlite> > sqlite> select fname from firstnames group by fname > having count(*) > 1; > bob > sqlite> > > klint. > > +---------------------------------------+-----------------+ > : Klint Gore : "Non rhyming > : > : EMail : [EMAIL PROTECTED] : slang - the > : > : Snail : A.B.R.I. : possibilities > : > : Mail University of New England : are useless" > : > : Armidale NSW 2351 Australia : L.J.J. > : > : Fax : +61 2 6772 5376 : > : > +---------------------------------------+-----------------+ >
Many thanks to all who replied with help. As has been pointed out, the EXCEPT method wasn't successful :-( Seemed almost too good to be true. I managed to find and eliminate the duplicates using a combination of the GROUP BY method Klint suggested, and a temp table. While possibly not clear without having the schema for table DMI_SKU, here's what I finally was able to do after lots of trial and error (gotta love transactions). create temp table TEMP_DUPES (MPN text, DMI text, MDESC text, RID integer, DCOUNT integer); -- Get all duplicates that are dupes by mpn, dmi, and modeldesc into temp table begin transaction; insert into temp_dupes select ModelPartNumber MPN, dmistring DMI, modeldesc MDESC, rowid RID, count(1) DCOUNT from dmi_sku group by MPN, DMI, MDESC HAVING count(*) >1; --examine all duplicates select * from dmi_sku where modelpartnumber in (select mpn from temp_dupes); delete from dmi_sku where modelpartnumber in (select mpn from temp_dupes) and rowid not in (select rid from temp_dupes); --inspect the results again select ModelPartNumber, ModelDesc, DMIString, rowid from dmi_sku where modelpartnumber in (select mpn from temp_dupes); --and, of course... commit; Fortunately, this is a one-time build-up of a dataset for another project, so fast, elegant code wasn't necessary. Also, many thanks to DRH for an excellent product in SQLite. -Clark