--- Brass Tilde <[EMAIL PROTECTED]> wrote: > > I'm new to SQL, and SQLite, and I find myself needing > to > > identify duplicate records in a SQLite table (there are > > about 2K duplicates in a 36K row table). Any > suggestions > > or magic SQL queries appreciated :-) > > Here are a couple of strategies I've used successfully > for finding and > potentially removing duplicate records. It assumes a > primary key field that > is *separate* from the fields being checked for > duplication. This example > also assumes that I only want to check for duplication on > two of the fields > in the table: field1 and field2. > > Assume a table such as follows, that might included > duplicate records. > > create table checkedtable > ( > priKey integer primary key, > field1 char (10), > field2 char (10), > <other possible fields> > ); > > and a temporary table to hold the duplicates > > create temp table dups > ( > field1, > field2, > reccount, > keyvalue > ); > > The first method is probably the simplest: > > insert into dups (field1, field2, keyvalue) > select field1, field2, max(priKey) > from checkedtable > group by field1, field2 > > > I don't know if this one works on SQLite, as I've never > had reason to try > it. (I've learned a fair bit about DB design in the last > couple of years > before starting with SQLite, so duplication that needs > correction is less of > a problem now :) It does work on SQL Server 2000, > though. > > The second method is one that I used before I understood > that GROUP BY could > be used for more than counting. :-) > > Get all the records that have duplicate information and > put them into the > temp table: > > insert into dups (field1, field2, reccount) > select field1, field2, count(priKey) > from checkedtable > group by field1, field2 > having count(priKey) > 1 > > Now, get the key for one of them: > > update dups > set keyvalue = checkedtable.priKey > from checkedtable > where checkedtable.field1 = checkedtable.field1 > and checkedtable.field2 = checkedtable.field2 > > > > Using either of these methods, the temporary table now > contains key values > that point to one of each set of duplicated records in > the original table. > It can be used to delete them, archive them, change them, > whatever, by > joining back to the original. > > Note that the code presented here is off the top of my > head, and not tested, > so there may be syntactic errors that I didn't catch > while writing it. With > any luck, there aren't any logic errors. Hopefully, it > will give you a > starting point. > > Brad > > Basically, it worked in SQLite. Thanks for your help.
-Clark