Re: [SQL] "large" IN/NOT IN subqueries result in query returning

John McCawley Tue, 27 Dec 2005 11:53:56 -0800

At a glance I would guess that NULL values in one or both of your tablesis the culprit. NULL values always return false.


Example:


A quick test on my database:

select count(*) FROM tbl_employee;

count
-------
 2689

select count(*) FROM tbl_employee WHERE username IS NULL;
count
-------
   35

So I have 35 null values.

create table tbl_foo (username varchar(32));

insert into tbl_foo (username) values ('poop');

select count(*) FROM tbl_employee WHERE tbl_employee.username NOT IN(select tbl_foo.username FROM tbl_foo);

count
-------
 2654

So I only get 2654 values that are NOT IN the set 'poop'...i.e. the NULLvalues are not included when I use the "NOT IN" my query. Nulls can beconfusing.


Hope this helps.

George Pavlov wrote:

The following looks like a bug to me, but please someone review and tell
me what I might be missing. Seems that past a certain result set size a
"[NOT] IN (subquery)" stops behaving as expected and returns 0 matches
even when there should be matches. No errors are returned, just faulty
data. The exact threshholds seem to depend on the data, subquery and
possibly the indexes in place. Nothing in the documentation mentions a
limit to what "IN subquery" can take so I am quite dismayed (if there is
a limit I would expect an ERROR/WARNING/something, but not wrong data).
Read below for details. This has been tried on PG 8.0.4 and 8.1.1 on
WinXP, and 8.0.4 and 8.0.5 on Linux.

I have two tables:
 t1 (id, name, a_type)
 t2 (id, name)

The t1 table is "big" (483211 rows), the name column contains
duplicates, the a_type has only two values. The t2 table is "small" (40
rows), the name values are unique. Some, but not all, t2.name values are
present in t1.name. To be precise, there are 10 t2.name values that do
not occur in t1.name (this is based on extraneous knowledge). I want to
find out which of the t2.name values are not ever used in t1.name:

select count(*) from t2 where t2.name not in (
 select t1.name from t1);
--> 0

This should return 10, instead it returns 0!!! Adding a LIMIT to the
subquery and doing some trial and error produces very interesting
results:

select count(*) from t2 where t2.name not in (

select t1.name from t1 limit 261683)--> 13

select count(*) from t2 where t2.name not in (

select t1.name from t1 limit 261684)--> 0


What is so magical about 261683? The JOIN alternative produces the
correct results no matter what:

select count(*)from t2 left join t1 using (name)where t1.name is null--> 10


This pretty much summarizes the issue. Any thoughts greatly appreciated.
Follow a few variations to show how the threshhold varies.

-- restrict to only one a_type


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Re: [SQL] "large" IN/NOT IN subqueries result in query returning

Reply via email to