This response would be good to put in swiki.

--Keith

******************************************************
- I'm not a professional; I just get paid to do this.

- Things I've learned about multithreaded programming:

    123...   PPArrvooottieedcc ttm  ueelvvteeirrtyyhtt
rhheiianndgge  dwi hnpi rctohhg eri aslm omscitanalgt 
 iowcbh,je engceltvo ebwrah lip,co hso srci abonlt ehb
.ee^Nr waicscee snsoetd  'aotb jtehcet -slaomcea lt'il
m^Ne from two or more threads
******************************************************

-----Original Message-----
From: D. Richard Hipp [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 08, 2004 5:59 PM
To: [EMAIL PROTECTED]
Subject: Re: [sqlite] Questions about sqlite's join translation


Keith Herold wrote:
> The swiki says that making JOINs into a where clause is more 
> efficient, since sqlite translates the join condition into a where 
> clause.

When SQLite sees this:

    SELECT * FROM a JOIN b ON a.x=b.y;

It translate it into the following before compiling it:

    SELECT * FROM a, b WHERE a.x=b.y;

Neither form is more efficient that the other.  Both will generate
identical code.  (There are subtle differences on an LEFT OUTER JOIN,
but those details can be ignored when you are looking at things at a
high level, as we are.)

 > It also
> says that you make queries more effiecient by minimizing the number of

> rows returned in the FROM clause as far to the left as possible in the

> join.  Does the latter matter if you are translating everything into a

> where  clause anyway?
> 

SQLite implements joins using nested loops with the outer
loop formed by the first table in the join and the inner loop formed by
the last table in the join.  So for the example above you would have:

    For each row in a:
      For each row in b such that b.y=a.x:
        Return the row

If you reverse the order of the tables in the FROM clause like
this:

    SELECT * FROM b, a WHERE a.x=b.y;

You should get an equivalent result on output, but SQLite will implement
the query differently.  Specifically it does this:

    For each row in b:
      For each row in a such that a.x=b.y:
        Return the row

The trick is that you want to arrange the order of tables so that the
"such that" clause on the inner loop is able to use an index to jump
right to the appropriate row instead of having to do a full table scan.
Suppose, for example, that you have an index on a(x) but not on b(y).
Then if you do this:

    SELECT * FROM a, b WHERE a.x=b.y;

    For each row in a:
      For each row in b such that b.y=a.x:
        Return the row

For each row in a, you have to do a full scan of table b.  So the time
complexity will be O(N^2).  But if you reverse the order of the tables
in the FROM clause, like this:

    SELECT * FROM b, a WHERE b.y=a.x;

    For each row in b:
      For each row in a such that a.x=b.y
        Return the row

No the inner loop is able to use an index to jump directly to the rows
in a that it needs and does not need to do a full scan of the table.
The time complexity drops to O(NlogN).

So the rule should be:  For every table other than the first, make sure
there is a term in the WHERE clause (or the ON or USING clause if that
is your preference) that lets the search jump directly to the relavant
rows in that table based on the results from tables to the left.

Other database engines with more complex query optimizers will typically
attempt to reorder the tables in the FROM clause in order to give you
the best result.  SQLite is more simple-minded - it codes whatever you
tell it to code.

Before you ask, I'll point out that it makes no different whether you
say "a.x=b.y" or "b.y=a.x".  They are equivalent.  All of the following
generate the same code:

      ON a.x=b.y
      ON b.y=a.x
      WHERE a.x=b.y
      WHERE b.y=a.x

-- 
D. Richard Hipp -- [EMAIL PROTECTED] -- 704.948.4565

Reply via email to