Interesting... I googled cross join and got the following definition:

"A cross join (or Cartesian Product join) will return a result table where
 each row from the first table is combined with each row from the second
 table. The number of rows in the result table is the product of the number
 of rows in each table. If the tables involved are large, this join can
 take a very long time."

So this isn't SQLite's definition of a cross join?


The schema is follows:

table1.schema

ATTACH DATABASE './table1.db' AS t1 ;

CREATE TABLE t1.table1
(
        i_id INT4,
        b_id INT4,
        d_id INT4,
        c_id INT2,
        data_in REAL,
        data_out REAL
);
CREATE INDEX t1.ix_table1_b_id ON table1( b_id );

DETACH DATABASE t1 ;

table2.schema

ATTACH DATABASE './table2.db' AS t2 ;

CREATE TABLE t2.table2
(
        d_id INT4 PRIMARY KEY,
        r_id INT2,
        m_id INT2,
        i TEXT,
        ct TEXT,
        cc TEXT,
        type TEXT,
        notes TEXT
);

DETACH DATABASE t2 ;

The actual query is

select    t1.b_id, t1.c_id, t2.r_id, t2.m_id,
          sum( t1.data_in ) as data_in,
          sum( t1.data_out ) as data_out
from      t1
join      t2
on        t2.d_id = t1.d_id and t1.b_id >= 100 and t1.b_id < 200
group by  t1.b_id, t1.c_id, t2.m_id, t2.r_id

Is this enough information?

Steve

Joe Wilson wrote:

CROSS JOIN is an SQLite-specific thing to disable table join optimization.

Please post the schema and indexes of yours tables so that this bug may be corrected when CROSS JOIN is not used.

--- Steve Green <[EMAIL PROTECTED]> wrote:


Interestingly, using

from       t1
cross join t2

fixes the problem; using this, causes indices from both tables to be used...

addr  opcode          p1          p2          p3
----  --------------  ----------  ----------  ---------------------------------
<snip>
37    OpenRead        0           4
38    SetNumColumns   0           7
39    Integer         2           0
40    OpenRead        3           6           keyinfo(1,BINARY)
41    Integer         0           0
42    OpenRead        1           2
43    SetNumColumns   1           3
44    Integer         0           0
45    OpenRead        4           3           keyinfo(1,BINARY)
<snip>

However, using

from       t2
cross join t1

does *not* cause both indices to be used.

As well, using

from t1, t2

which is suppossed to be the same as

from       t1
cross join t2

does *not* cause both indices to be used.

This is all rather puzzling; I had also thought that using "join" was equivalent
to "cross join", but this could be ignorance on my behalf...

Steve

Joe Wilson wrote:


Hard to say what's the problem if you don't post the schema of the tables and the indexes, and provide some
sample data.

Perhaps there is a bug in the join optimizer.
Try using an explicit CROSS JOIN.

select     t1.a, t1.b, t2.c, t2.d
from       t1
cross join t2
where      t1.x = t2.x and t1.a >= 100 and t1.a < 200
group by   t1.a, t1.b, t2.c, t2.d


--- Steve Green <[EMAIL PROTECTED]> wrote:



The performance is exactly the same after running analyze on both tables.

Steve

Joe Wilson wrote:



Run an ANALYZE statement on your database and your queries will be fast once again.

Perhaps SQLite should fall back to the old non-optimized join behavior when the sqlite_stat1 table is missing. This would cover 99% of the pre-SQLite3.2.3 legacy databases out there where the queries have already been hand-tuned.

--- Steve Green <[EMAIL PROTECTED]> wrote:




Hi,

I currently have a v3.2.0 database that contains two tables that I regularly
query with a join, e.g.,

select    t1.a, t1.b, t2.c, t2.d

from      t1


join      t2
on        t1.x = t2.x and t1.a >= 100 and t1.a < 200
group by  t1.a, t1.b, t2.c, t2.d

Table t1 has an index on a
Table t2 has an index on x

I've created a new v3.3.4 database with the same schema and data.  Table t1 has
about 150 million rows and t2 has about 3000 rows.  When I run the query using
v3.2.0, it returns in about 0.1 seconds.  When run using v3.3.4, it takes about
20 seconds.  For completeness, I ran the same query using v3.2.8, v3.3.2, and
v3.3.3.  v3.2.8 performed the same (roughly) as v3.2.0; v3.3.2 and v3.3.3 
performed
the same as v3.3.4 (so, it appears that the change was made going into v3.3.x).

When I ran an explain on the query in both databases, the main difference I
noticed was in the opening of the database tables; in v3.2.0, it appears that
the indices of both tables are being used...

addr  opcode          p1          p2          p3
----  --------------  ----------  ----------  ---------------------------------
<snip>
37    OpenRead        1           4
38    SetNumColumns   1           7
39    Integer         0           0
40    OpenRead        3           6           keyinfo(1,BINARY)
41    Integer         2           0
42    OpenRead        0           2
43    SetNumColumns   0           8
44    Integer         2           0
45    OpenRead        4           3           keyinfo(1,BINARY)
46    Integer         3795633     0
<snip>

In v3.3.4, it appears that the index of only one of the tables is being used...

addr  opcode          p1          p2          p3
----  --------------  ----------  ----------  ---------------------------------
<snip>
37    OpenRead        0           2
38    SetNumColumns   0           3
39    Integer         0           0
40    OpenRead        1           4
41    SetNumColumns   1           7
42    Integer         0           0
43    OpenRead        3           6           keyinfo(1,BINARY)
44    Rewind          0           78
45    Integer         3795633     0
<snip>

It's quite possible that I'm misinterpreting the explain data as I'm fairly new
with it... However, it seems to me that this might explain the huge difference
in performance.

Has anyone else seen this type of performance difference between v3.2.x and 
v3.3.x?
Any ideas on how we can improve the performance using v3.3.x (we'd really like 
to
use the most current version of the database, but the performance difference 
would
be a killer)?

Thanks,

Steve




__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com

--
Steve Green
SAVVIS
Transforming Information Technology SM

This message contains information which may be confidential and/or
privileged.  Unless you are the intended recipient (or authorized
to receive for the intended recipient), you may not read, use,
copy or disclose to anyone the message or any information contained
in the message. If you have received the message in error, please
advise the sender by reply e-mail at [EMAIL PROTECTED] and
delete the message and any attachment(s) thereto without retaining
any copies.




__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com

--
Steve Green
SAVVIS
Transforming Information Technology SM

This message contains information which may be confidential and/or
privileged.  Unless you are the intended recipient (or authorized
to receive for the intended recipient), you may not read, use,
copy or disclose to anyone the message or any information contained
in the message. If you have received the message in error, please
advise the sender by reply e-mail at [EMAIL PROTECTED] and
delete the message and any attachment(s) thereto without retaining
any copies.




__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com

--
Steve Green
SAVVIS
Transforming Information Technology SM

This message contains information which may be confidential and/or
privileged.  Unless you are the intended recipient (or authorized
to receive for the intended recipient), you may not read, use,
copy or disclose to anyone the message or any information contained
in the message. If you have received the message in error, please
advise the sender by reply e-mail at [EMAIL PROTECTED] and
delete the message and any attachment(s) thereto without retaining
any copies.

Reply via email to