On Wed, 2006-02-08 at 16:27 -0500, Frank Bax wrote:
At 04:10 PM 2/8/06, Ken Hill wrote:
>I need some help with a bit of SQL. I have two tables. I want to find 
>records in one table that don't match records in another table based on a 
>common column in the two tables. Both tables have a column named 'key100'. 
>I was trying something like:
>
>SELECT count(*)
>FROM table1, table2
>WHERE (table1.key100 != table2.key100);
>
>But the query is very slow and I finally just cancel it. Any help is very 
>much appreciated.


vacuum analyse table1;

vacuum analyse table2;

select count(*) from table1 full outer join table2 on 
table1.key100=table2.key100 where table1.key100 is null or table2.key100 is 
null;

If this is also slow, post output of "EXPLAIN ANALYSE SELECT ...." 


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly

OK. I added indexes on the two columns in the two tables:

CREATE INDEX key100_idex
ON ncccr9 (key100);

CREATE INDEX key100_ncccr10_idex
ON ncccr10 (key100);

Here is the analysis of the query:

csalgorithm=# EXPLAIN ANALYSE SELECT count(*)
csalgorithm-#   FROM ncccr9 LEFT JOIN ncccr10 ON ncccr9.key100 = ncccr10.key100
csalgorithm-#   WHERE ncccr10.key100 IS NULL;
                                                               QUERY PLAN
-------------------------------------------------------------------------------- --------------------------------------------------------
Aggregate  (cost=208337.59..208337.59 rows=1 width=0) (actual time=255723.212.. 255723.214 rows=1 loops=1)
   ->  Hash Left Join  (cost=99523.55..207101.41 rows=494471 width=0) (actual ti me=92326.635..255538.447 rows=38144 loops=1)
         Hash Cond: ("outer".key100 = "inner".key100)
         Filter: ("inner".key100 IS NULL)
         ->  Seq Scan on ncccr9  (cost=0.00..59360.71 rows=494471 width=104) (ac tual time=171.778..75099.734 rows=494471 loops=1)
         ->  Hash  (cost=88438.64..88438.64 rows=611564 width=104) (actual time= 91962.956..91962.956 rows=0 loops=1)
               ->  Seq Scan on ncccr10  (cost=0.00..88438.64 rows=611564 width=1 04) (actual time=11.704..76519.323 rows=611564 loops=1)
Total runtime: 255724.219 ms
(8 rows)

The result of 38,144 non-matching records seems too much:

csalgorithm=# SELECT count(*)
csalgorithm-#   FROM ncccr9 LEFT JOIN ncccr10 ON ncccr9.key100 = ncccr10.key100
csalgorithm-#   WHERE ncccr10.key100 IS NULL;
count
-------
38144
(1 row)

Maybe I need to do a RIGHT JOIN to return the count of records in table 'ncccr10' that don't match records in 'ncccr9'?

Thanks for your help. JOINS are fairly new to me.

Reply via email to