Let's construct this example from the reverse.

The object is to avoid the sort at the end (sorting "millions" to return 100 is 
a bad tradeoff), so the B table needs to be visited in B1 order.

-> outer loop = B
-> inner loop = A
-> index B on (B1,...)

The join is on A2 = B2

->index A on (A2,...)

The WHERE clause specifies to quickly determine the subset of rows that have a 
specific value of A1

->index A on (A1,...)

To minimize the number of rows visited in A, the more specific field should 
come first

-> index A on (A1,A2,...) if card(A1) > card(A2)

Which results in

CREATE INDEX B_B1 on B (B1);
CREATE INDEX A_A1_A2 on A (A1, A2);

SELECT ... FROM B CROSS JOIN A ON (A.A1 = ? AND A.A2 = B.B2) ORDER BY B.B1 
LIMIT 100;

For reversed cardinality, only the index on A needs to switch field order.


Rowids will be faster than primary keys.

CREATE TABE C AS SELECT ( A.A1, B.B1, A.rowid AS IDA, B.rowid AS IDB FROM A 
JOIN B ON A.A2 = B.B2);
CREATE INDEX ON C (A1,B1);

SELECT .... FROM C JOIN A ON A.ROWID = C.IDA JOIN B ON B.ROWID = C.IDB WHERE 
C.A1 = ? ORDER BY C.B1 LIMIT 100;


-----Urspr?ngliche Nachricht-----
Von: Eric Grange [mailto:zarglu at gmail.com]
Gesendet: Dienstag, 03. M?rz 2015 12:10
An: General Discussion of SQLite Database
Betreff: [sqlite] Multi-table index ersatz?

Hi,

I have problem where I need a "multi-table index" ersatz, or maybe a better 
data structure :-)

The problem is as follow:

   - Table A : some fields plus fields A1 & A2
   - Table B : some fields plus fields B1 & B2

Both tables have several dozen millions of rows, and both are accessed 
independently of each others by some queries, their current structure has no 
performance issues for those queries.

However I have a new query which is like

select ...some fields of A & B...
from A join B on A.A2 = B.B2
where A.A1 = ?1
order by B.B1
limit 100


Without the limit, there can be tens of thousandths resulting rows, without the 
A1 condition, there can be millions of resulting rows.

With indexes on A & B, the performance of the above is not very good, as 
indexing A1 is not enough, and indexing B1 is not enough either, so no query 
plan is satisfying.

I can make the query instantaneous by duplicating the A1 & B1 fields in a 
dedicated C table (along with the primary keys of A & B), index that table, and 
then join back the A & B table to get the other fields.

However this results in a fairly large table of duplicated data, whose sole 
purpose is to allow the creation of a fairly large index, which gets me the 
performance.

Note that if the fields A1 & B1 are removed from their tables and kept only in 
C, this has massive performance implication on other queries running only 
against A & B, as those fields are leveraged in other composite indexes.

Is there a better way that would not involve duplicating the data?

Eric
_______________________________________________
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: hick at scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


Reply via email to