Re: [sqlite] finding duplicate records i.e. records with same values across 4 colums...
On 8 November 2012 11:23, e-mail mgbg25171 wrote: > I'm using SQLIte. > All columns can have duplicate values but it would be helpful to report > on those rows which are identical across all columns. > More specifically I'm looking for matching itm values where the first 3 > cols ALSO match but am not sure of the sqlite select query to do this. > > epic, yr, statement, itm > == > mcro, 2002, income, revs > mcro, 2002, income, cogs > mcro, 2002, income, sg&a > mcro, 2002, income, cogs > mcro, 2003, balance, gdwil > etc > > in the example I'm looking for a query that would return > mcro, 2002, income, cogs > mcro, 2002, income, cogs select * from t natural join ( select epic, yr, statement, itm from t group by epic, yr, statement, itm having count(*) > 1 ) t1; > > any help much appreciated > > BTW no field can be unique in the createtable statement Regards, Simon ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] finding duplicate records i.e. records with same values across 4 colums...
I'm using SQLIte. All columns can have duplicate values but it would be helpful to report on those rows which are identical across all columns. More specifically I'm looking for matching itm values where the first 3 cols ALSO match but am not sure of the sqlite select query to do this. epic, yr, statement, itm == mcro, 2002, income, revs mcro, 2002, income, cogs mcro, 2002, income, sg&a mcro, 2002, income, cogs mcro, 2003, balance, gdwil etc in the example I'm looking for a query that would return mcro, 2002, income, cogs mcro, 2002, income, cogs any help much appreciated BTW no field can be unique in the createtable statement ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Finding duplicate records
On Wed, 2005-01-26 at 17:45 -0800, Clark Christensen wrote: > Hello all, > > I'm new to SQL, and SQLite, and I find myself needing to > identify duplicate records in a SQLite table (there are > about 2K duplicates in a 36K row table). Any suggestions > or magic SQL queries appreciated :-) > I've used these queries to delete rows with the same GROUPING_FIELDS: 1) keep first inserted item: delete from TABLE where rowid not in (select min(rowid) from TABLE group by GROUPING_FIELDS); 2) keep last inserted item: delete from TABLE where rowid not in (select max(rowid) from TABLE group by GROUPING_FIELDS); they *seem* to work as expected bye, stefano
Re: [sqlite] Finding duplicate records
On Wed, 2005-01-26 at 17:45 -0800, Clark Christensen wrote: > Hello all, > > I'm new to SQL, and SQLite, and I find myself needing to > identify duplicate records in a SQLite table (there are > about 2K duplicates in a 36K row table). Any suggestions > or magic SQL queries appreciated :-) > I've used these queries to delete rows with the same GROUPING_FIELDS: 1) keep first inserted item: delete from TABLE where rowid not in (select min(rowid) from TABLE group by GROUPING_FIELDS); 2) keep last inserted item: delete from TABLE where rowid not in (select max(rowid) from TABLE group by GROUPING_FIELDS); they *seem* to work as expected bye, stefano
Re: [sqlite] Finding duplicate records
--- Brass Tilde <[EMAIL PROTECTED]> wrote: > > I'm new to SQL, and SQLite, and I find myself needing > to > > identify duplicate records in a SQLite table (there are > > about 2K duplicates in a 36K row table). Any > suggestions > > or magic SQL queries appreciated :-) > > Here are a couple of strategies I've used successfully > for finding and > potentially removing duplicate records. It assumes a > primary key field that > is *separate* from the fields being checked for > duplication. This example > also assumes that I only want to check for duplication on > two of the fields > in the table: field1 and field2. > > Assume a table such as follows, that might included > duplicate records. > > create table checkedtable > ( > priKeyinteger primary key, > field1char(10), > field2char(10), > > ); > > and a temporary table to hold the duplicates > > create temp table dups > ( > field1, > field2, > reccount, > keyvalue > ); > > The first method is probably the simplest: > > insert into dups (field1, field2, keyvalue) > select field1, field2, max(priKey) > from checkedtable > group by field1, field2 > > > I don't know if this one works on SQLite, as I've never > had reason to try > it. (I've learned a fair bit about DB design in the last > couple of years > before starting with SQLite, so duplication that needs > correction is less of > a problem now :) It does work on SQL Server 2000, > though. > > The second method is one that I used before I understood > that GROUP BY could > be used for more than counting. :-) > > Get all the records that have duplicate information and > put them into the > temp table: > > insert into dups (field1, field2, reccount) > select field1, field2, count(priKey) > from checkedtable > group by field1, field2 > having count(priKey) > 1 > > Now, get the key for one of them: > > update dups > set keyvalue = checkedtable.priKey > fromcheckedtable > where checkedtable.field1 = checkedtable.field1 > and checkedtable.field2 = checkedtable.field2 > > > > Using either of these methods, the temporary table now > contains key values > that point to one of each set of duplicated records in > the original table. > It can be used to delete them, archive them, change them, > whatever, by > joining back to the original. > > Note that the code presented here is off the top of my > head, and not tested, > so there may be syntactic errors that I didn't catch > while writing it. With > any luck, there aren't any logic errors. Hopefully, it > will give you a > starting point. > > Brad > > Basically, it worked in SQLite. Thanks for your help. -Clark
Re: [sqlite] Finding duplicate records
--- Klint Gore <[EMAIL PROTECTED]> wrote: > On Wed, 26 Jan 2005 21:25:19 -0500, "D. Richard Hipp" > <[EMAIL PROTECTED]> wrote: > > On Wed, 2005-01-26 at 17:45 -0800, Clark Christensen > wrote: > > > I'm new to SQL, and SQLite, and I find myself needing > to > > > identify duplicate records in a SQLite table (there > are > > > about 2K duplicates in a 36K row table). Any > suggestions > > > or magic SQL queries appreciated :-) > > > > > > > SELECT * FROM table EXCEPT SELECT DISTINCT * FROM > table; > > Does that actually work? > > sqlite> create table firstnames (fname varchar(5)); > sqlite> insert into firstnames values ('bob'); > sqlite> insert into firstnames values ('bob'); > sqlite> insert into firstnames values ('bill'); > sqlite> select * from firstnames except select distinct * > from firstnames; > sqlite> > sqlite> select fname from firstnames group by fname > having count(*) > 1; > bob > sqlite> > > klint. > > +---+-+ > : Klint Gore: "Non rhyming > : > : EMail : [EMAIL PROTECTED] : slang - the > : > : Snail : A.B.R.I.: possibilities > : > : Mail University of New England : are useless" > : > : Armidale NSW 2351 Australia : L.J.J. > : > : Fax : +61 2 6772 5376 : > : > +---+-+ > Many thanks to all who replied with help. As has been pointed out, the EXCEPT method wasn't successful :-( Seemed almost too good to be true. I managed to find and eliminate the duplicates using a combination of the GROUP BY method Klint suggested, and a temp table. While possibly not clear without having the schema for table DMI_SKU, here's what I finally was able to do after lots of trial and error (gotta love transactions). create temp table TEMP_DUPES (MPN text, DMI text, MDESC text, RID integer, DCOUNT integer); -- Get all duplicates that are dupes by mpn, dmi, and modeldesc into temp table begin transaction; insert into temp_dupes select ModelPartNumber MPN, dmistring DMI, modeldesc MDESC, rowid RID, count(1) DCOUNT from dmi_sku group by MPN, DMI, MDESC HAVING count(*) >1; --examine all duplicates select * from dmi_sku where modelpartnumber in (select mpn from temp_dupes); delete from dmi_sku where modelpartnumber in (select mpn from temp_dupes) and rowid not in (select rid from temp_dupes); --inspect the results again select ModelPartNumber, ModelDesc, DMIString, rowid from dmi_sku where modelpartnumber in (select mpn from temp_dupes); --and, of course... commit; Fortunately, this is a one-time build-up of a dataset for another project, so fast, elegant code wasn't necessary. Also, many thanks to DRH for an excellent product in SQLite. -Clark
Re: [sqlite] Finding duplicate records
> I'm new to SQL, and SQLite, and I find myself needing to > identify duplicate records in a SQLite table (there are > about 2K duplicates in a 36K row table). Any suggestions > or magic SQL queries appreciated :-) Here are a couple of strategies I've used successfully for finding and potentially removing duplicate records. It assumes a primary key field that is *separate* from the fields being checked for duplication. This example also assumes that I only want to check for duplication on two of the fields in the table: field1 and field2. Assume a table such as follows, that might included duplicate records. create table checkedtable ( priKeyinteger primary key, field1char(10), field2char(10), ); and a temporary table to hold the duplicates create temp table dups ( field1, field2, reccount, keyvalue ); The first method is probably the simplest: insert into dups (field1, field2, keyvalue) select field1, field2, max(priKey) from checkedtable group by field1, field2 I don't know if this one works on SQLite, as I've never had reason to try it. (I've learned a fair bit about DB design in the last couple of years before starting with SQLite, so duplication that needs correction is less of a problem now :) It does work on SQL Server 2000, though. The second method is one that I used before I understood that GROUP BY could be used for more than counting. :-) Get all the records that have duplicate information and put them into the temp table: insert into dups (field1, field2, reccount) select field1, field2, count(priKey) from checkedtable group by field1, field2 having count(priKey) > 1 Now, get the key for one of them: update dups set keyvalue = checkedtable.priKey fromcheckedtable where checkedtable.field1 = checkedtable.field1 and checkedtable.field2 = checkedtable.field2 Using either of these methods, the temporary table now contains key values that point to one of each set of duplicated records in the original table. It can be used to delete them, archive them, change them, whatever, by joining back to the original. Note that the code presented here is off the top of my head, and not tested, so there may be syntactic errors that I didn't catch while writing it. With any luck, there aren't any logic errors. Hopefully, it will give you a starting point. Brad
Re: [sqlite] Finding duplicate records
At 10:43 PM -0500 1/26/05, D. Richard Hipp wrote: You are right - my original suggestion does not work. I forgot that the EXCEPT operator does a DISTINCT on the result set of both operands. Your approach works much better. This could be fixed by updating SQLite to accept the standard SQL syntax where a "set quantifier" can be used with not only a standard query but a compound query. Meaning: SELECT ... And: SELECT ... [ ] SELECT ... Where: ::= DISTINCT|ALL ::= UNION|EXCEPT|INTERCEPT This way, users can specify what behaviour they want, and get flexability. it should be very simple to implement. Since the is optional, the default behaviour is akin to ALL for regular selects and DISTINCT for compound selects, as SQLite and other databases already do. See SQL:2003, 7.13 "" (p351) for the definition. A slightly modified version of your suggestion would then produce the desired result, a duplicates list: SELECT * FROM table EXCEPT ALL SELECT DISTINCT * FROM table; -- Darren Duncan
Re: [sqlite] Finding duplicate records
On Thu, 2005-01-27 at 14:40 +1100, Klint Gore wrote: > On Wed, 26 Jan 2005 21:25:19 -0500, "D. Richard Hipp" <[EMAIL PROTECTED]> > wrote: > > On Wed, 2005-01-26 at 17:45 -0800, Clark Christensen wrote: > > > I'm new to SQL, and SQLite, and I find myself needing to > > > identify duplicate records in a SQLite table (there are > > > about 2K duplicates in a 36K row table). Any suggestions > > > or magic SQL queries appreciated :-) > > > > > > > SELECT * FROM table EXCEPT SELECT DISTINCT * FROM table; > > Does that actually work? > > sqlite> create table firstnames (fname varchar(5)); > sqlite> insert into firstnames values ('bob'); > sqlite> insert into firstnames values ('bob'); > sqlite> insert into firstnames values ('bill'); > sqlite> select * from firstnames except select distinct * from firstnames; > sqlite> > sqlite> select fname from firstnames group by fname having count(*) > 1; > bob > sqlite> > You are right - my original suggestion does not work. I forgot that the EXCEPT operator does a DISTINCT on the result set of both operands. Your approach works much better. -- D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] Finding duplicate records
On Wed, 26 Jan 2005 21:25:19 -0500, "D. Richard Hipp" <[EMAIL PROTECTED]> wrote: > On Wed, 2005-01-26 at 17:45 -0800, Clark Christensen wrote: > > I'm new to SQL, and SQLite, and I find myself needing to > > identify duplicate records in a SQLite table (there are > > about 2K duplicates in a 36K row table). Any suggestions > > or magic SQL queries appreciated :-) > > > > SELECT * FROM table EXCEPT SELECT DISTINCT * FROM table; Does that actually work? sqlite> create table firstnames (fname varchar(5)); sqlite> insert into firstnames values ('bob'); sqlite> insert into firstnames values ('bob'); sqlite> insert into firstnames values ('bill'); sqlite> select * from firstnames except select distinct * from firstnames; sqlite> sqlite> select fname from firstnames group by fname having count(*) > 1; bob sqlite> klint. +---+-+ : Klint Gore: "Non rhyming: : EMail : [EMAIL PROTECTED] : slang - the: : Snail : A.B.R.I.: possibilities : : Mail University of New England : are useless" : : Armidale NSW 2351 Australia : L.J.J. : : Fax : +61 2 6772 5376 : : +---+-+
Re: [sqlite] Finding duplicate records
On Wed, 2005-01-26 at 17:45 -0800, Clark Christensen wrote: > Hello all, > > I'm new to SQL, and SQLite, and I find myself needing to > identify duplicate records in a SQLite table (there are > about 2K duplicates in a 36K row table). Any suggestions > or magic SQL queries appreciated :-) > SELECT * FROM table EXCEPT SELECT DISTINCT * FROM table;
[sqlite] Finding duplicate records
Hello all, I'm new to SQL, and SQLite, and I find myself needing to identify duplicate records in a SQLite table (there are about 2K duplicates in a 36K row table). Any suggestions or magic SQL queries appreciated :-) Thanks! -Clark