>To reiterate, Keith: to get the query to execute properly, I didn't
>change
>the LEFT JOIN to an INNER JOIN!
>Nope,
>I rewrote
>
>SELECT
>FROM
>  a
>  LEFT JOIN b ON <key>
>WHERE b.c=5

This means:

Take all rows of table a,
match them with rows of table b on the key, and where there is no match
"fill" the columns that would have come from b with NULL,
AND THEN AFTER THAT
return only the rows where b.c = 5.
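
Spelled out with made-up names (a(id) and b(a_id, c) are just an assumed
schema for illustration, not your actual tables), that first form is:

  SELECT a.id, b.c
  FROM
    a
    LEFT JOIN b ON b.a_id = a.id
  WHERE b.c = 5;

Any a-row with no match gets b.c = NULL, and NULL = 5 is not true, so the
WHERE clause throws those manufactured rows away again after the join has
gone to the trouble of producing them.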


>to

>SELECT
>FROM
>  a
>  LEFT JOIN b ON <key> AND b.c=5
>WHERE b.c IS NOT NULL

This translates to:

Take all rows of table a,
match them with rows of table b on the key AND where b.c = 5, and where there
is no match "fill" the columns that would have come from b with NULL,
AND THEN AFTER THAT
return only the rows where b.c is not NULL.
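
With the same made-up names, the rewritten form would be:

  SELECT a.id, b.c
  FROM
    a
    LEFT JOIN b ON b.a_id = a.id AND b.c = 5
  WHERE b.c IS NOT NULL;

Now the b.c = 5 test is part of the join condition, so the only "filled" rows
left over are the ones where every b column is NULL, and the IS NOT NULL test
weeds exactly those out.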

The primary difference is likely the test on b.c.  You see, checking whether 
b.c equals a particular value requires unpacking the row and retrieving the 
value.  Checking whether b.c IS NULL does not require unpacking the data at 
all; it merely requires reading the stored 'datatype' (the serial type in the 
record header), which should be a much faster operation.

If you wrote the query as an inner join it would be much faster, since after 
the join it would not be possible for b.c to be NULL in the first place.
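
For comparison, the inner-join form (same made-up names) would simply be:

  SELECT a.id, b.c
  FROM
    a
    INNER JOIN b ON b.a_id = a.id
  WHERE b.c = 5;

And if you want to see what SQLite decides to do with each variant, prefix
the statement with EXPLAIN QUERY PLAN, e.g.:

  EXPLAIN QUERY PLAN
  SELECT a.id, b.c FROM a INNER JOIN b ON b.a_id = a.id WHERE b.c = 5;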

>So I just added a redundant predicate and it runs perfectly, on
>SQLite!

>That's why I said this simple improvement can surely be taken care of
>on the
>optimizer, while the larger discussion of actually changing the outer
>join
>to an inner join or even tackling the -OR- case is for sure something
>nice
>to think of, but increasingly more complicated.

I think the query is ill-conceived when written as an outer join.  You are 
asking the query to "project" a result containing a crap-load of rows that you 
will subsequently remove from the result set.  For bitty-bases (a couple of 
hundred rows) this is irrelevant.  However, for tables containing millions of 
rows, creating a googillion rows and then culling all of them except 4 is a 
very bad use of computer resources.  Why generate them in the first place if 
you are just going to discard them?

I have seen many a mis-designed program which works blazingly quickly on a 
"test" database that contains a dozen customers, half-a-dozen products, and 
two or three components each.  But when populated with the production data of 
1,000,000 customers, 500,000 products, and between 100 and 10,000 components 
per product, it frankly takes "years" to run any step and is completely 
unusable.

One should always design as if each table contained a googillion rows.  If it 
is fast enough in that case, it will certainly be efficient enough to handle 
your ten-rows-per-table database.
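
If you want to convince yourself, it is easy enough to fake a big table.  On
any reasonably recent SQLite a recursive CTE will do it (table and column
names here are made up, and a million rows is an arbitrary number):

  CREATE TABLE big(id INTEGER PRIMARY KEY, c INTEGER);
  WITH RECURSIVE cnt(x) AS (
    SELECT 1
    UNION ALL
    SELECT x + 1 FROM cnt WHERE x < 1000000
  )
  INSERT INTO big(id, c)
  SELECT x, x % 100 FROM cnt;

Run the query variants against that instead of the dozen-row test table and
the difference stops being academic.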



