Re: [SQL] identifying duplicates in table with redundancies

Oliver d'Azevedo Christina Fri, 24 Sep 2010 10:11:30 -0700

Hey,Tarlika.

I tried to reproduce your test case through a series of inserts and Itseems that the lower case "d" went unnoticed.

That explains the empty list i got as result.  My fault.  Sorry :-(
Great to hear it helped you


Best,
Oliveiros



Enviado de meu iPhone

Em 24/09/2010, às 05:12 PM, "Tarlika Elisabeth Schmitz" <[email protected]e> escreveu:

Dear Oliveiros,
Thank you for taking the time to help.

On Fri, 24 Sep 2010 11:22:21 +0100
"Oliveiros d'Azevedo Cristina" <[email protected]> wrote:

----- Original Message -----
From: "Tarlika Elisabeth Schmitz" <[email protected]>
To: <[email protected]>
Sent: Thursday, September 23, 2010 10:39 PM
Subject: [SQL] identifying duplicates in table with redundancies

[...] I want to check for duplicates:

1) multiples trainer names for same trainer id
2) multiple trainer ids for same trainer name

I cobbled together the SQL and it does the job but it seems rather
convoluted. I would like to know how I can improve it.

CREATE TABLE student (
id INTEGER NOT NULL,
name VARCHAR(256) NOT NULL,
trainer_id INTEGER,
trainer_name VARCHAR(256),
);

====
EXAMPLE DATA

22 John 1 Macdonald
23 Jane 1 MacDonald
24 Paul 1 MacDonald
25 Dick 2 Smith
26 Bill 3 Smith
27 Kate 3 Smith
====

select trainer_id, trainer_name from
(
  select distinct on (trainer_name) trainer_id, trainer_name
  from student
  where trainer_id in
  (
      select distinct on (id) id
      from
      (
          select distinct on (trainer_id,trainer_name)
   trainer_id as id,
          trainer_name as name from student
      ) as trainer
      group by trainer.id
      having count (trainer.name) > 1
  )
) as y
order by trainer_id

Howdy, Tarlika.

First, did you past correctly your query into your mail?
I am asking this because your query doesn't seem work for me, it
returns an empty list :-|

Your most nested query, [...]
returns this

1|"MacDonald"
2|"Smith"
3|"Smith"



For me, the innermost query returns:
1|"Macdonald"
1|"MacDonald"
2|"Smith"
3|"Smith"

(note the lower/uppercase "d" in MacDonald)

The whole query returns:
1|"Macdonald"
1|"MacDonald"

1) multiples trainer names for same trainer id
2) multiple trainer ids for same trainer name


To achieve 2) I would use this query

SELECT DISTINCT trainer_id,trainer_name
FROM (
SELECT trainer_name   -- The field you want to test for duplicates
FROM (
SELECT DISTINCT "trainer_id","trainer_name"
FROM student
) x

GROUP BY "trainer_name" -- the field you want to test forduplicates

HAVING (COUNT(*) > 1)
) z
NATURAL JOIN student y


I see my 2 innermost queries are the same as yours, just a bit more
wordy. I messed up at the third query, which threw up an error when I
tried to add an ORDER BY.

It will give you a list of the trainer names who have more than one
trainer ID and the respective trainer IDS.
For your particular example data result will be
2|"Smith"
3|"Smith"


Splendid! Just what I wanted.

As Smith is the only trainer with records with diferent trainer IDs.
Question : Can this be what you want?


The real table has 250000 entries and quite a few dups.

If you want to achieve 1) just substitute the trainer_name by
trainer_id on the commented places.


1) works as well now - just had to transpose id/name.





--

Best Regards,
Tarlika Elisabeth Schmitz

--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] identifying duplicates in table with redundancies

Reply via email to