Re: [sqlite] [EXTERNAL] Slow joining of tables with indexes

Hick Gunter Tue, 26 Nov 2019 01:57:32 -0800

You are using text columns as primary keys and referencing them directly in 
foreign keys. This is probably not what you want, because it duplicates the 
text key. Also, with foreign keys enabled, your join is not accomplishing 
anything more than a direct select from joining_table, just with more effort 
(and circumventing the count() optimization).


SQLite uses an 64bit Integer as a rowid that uniquely identifies the row in the 
table. This is what you should be using as a foreign key, because it is twice 
as fast as using an index.

OTOH, SQLite supports WITHOUT ROWID tables, you might like to read up on those 
too

-----Ursprüngliche Nachricht-----
Von: sqlite-users [mailto:[email protected]] Im 
Auftrag von Jonathan Moules
Gesendet: Dienstag, 26. November 2019 10:25
An: SQLite mailing list <[email protected]>
Betreff: [EXTERNAL] [sqlite] Slow joining of tables with indexes

Hi List,
I have a relational table setup where I've built indexes but I'm still seeing 
very slow join times on middling amounts of data. I'm guessing I'm doing 
something wrong but I can't see what. (SQLite: 3.24.0)

Simplified schema as below.
The ids are 16 character hex strings. I've included the ignore_me table only 
because it's relevant to the indexes.
Note: I can *guarantee* that the data inserted into `data_table` and 
`ignore_me` is ordered by their respective primary keys ASC. Entries in 
joining_table are ordered by one of either data_id ASC or ignored_id ASC 
depending on creation method.

--==================================

-- 1.7 million items
CREATE TABLE data_table (
     data_id                    TEXT     PRIMARY KEY
                                         NOT NULL
                                         COLLATE NOCASE,
     data_1                        TEXT,
     data_2                         TEXT );

-- 1.9 million items
CREATE TABLE joining_table (
     data_id         TEXT     REFERENCES data_table (data_id)
                             NOT NULL
                             COLLATE NOCASE,
     ignored_id         TEXT    REFERENCES ignore_me (ignored_id)
                             NOT NULL
                             COLLATE NOCASE,
     misc_col_1        TEXT,
     misc_col_2        TEXT
);

-- ~200,000 items
CREATE TABLE ignore_me (
     ignored_id                    TEXT     PRIMARY KEY
                                         NOT NULL
                                         COLLATE NOCASE );

CREATE INDEX IF NOT EXISTS data_table__data_id__pk_idx ON data_table (
     data_id
);
CREATE INDEX IF NOT EXISTS ignore_me__ignored_id__pk_idx ON ignore_me (
     ignored_id
);

-- Allow quick joining from data_table to ignore_me CREATE INDEX IF NOT EXISTS 
joining_table__data_ignored_id__fk_idx ON joining_table (
     data_id ASC,
     ignored_id ASC
);
-- Allow quick joining from ignore_me to data_table CREATE INDEX IF NOT EXISTS 
joining_table__ignored_data_id__fk_idx ON joining_table (
     ignored_id ASC,
     data_id ASC
);

-- Example data:

INSERT INTO data_table (data_id) VALUES ('00196a21e8c0f9f6'); INSERT INTO 
data_table (data_id) VALUES ('579c57f1268c0f5c');

INSERT INTO ignore_me VALUES ('c402eb3f05d433f2'); INSERT INTO ignore_me VALUES 
('d827e58953265f63'); INSERT INTO ignore_me VALUES ('ec1d2e817f55b249');

INSERT INTO joining_table (data_id, ignored_id) VALUES ('00196a21e8c0f9f6', 
'c402eb3f05d433f2'); INSERT INTO joining_table (data_id, ignored_id) VALUES 
('00196a21e8c0f9f6', 'd827e58953265f63'); INSERT INTO joining_table (data_id, 
ignored_id) VALUES ('579c57f1268c0f5c', 'ec1d2e817f55b249');


--------------------

-- Then to test the speed I'm simply doing:
     SELECT
         count(1)
     FROM
         data_table
         JOIN joining_table USING (data_id);

--==================================

The query plan says it's using the indexes:
     SCAN TABLE joining_table USING COVERING INDEX 
joining_table__ignored_data_id__fk_idx
     SEARCH TABLE data_table USING COVERING INDEX data_table__data_id__pk_idx 
(data_id=?)

But it takes about 20 seconds to do that count on the full dataset.

The full EXPLAIN from the full dataset:

0    Init    0    16    0        00
1    Null    0    1    1        00
2    OpenRead    2    771875    0    k(3,NOCASE,NOCASE,)    00
3    OpenRead    3    737715    0    k(2,NOCASE,)    02
4    Rewind    2    12    2    0    00
5    Column    2    1    2        00
6    SeekGE    3    11    2    1    00
7    IdxGT    3    11    2    1    00
8    Integer    1    3    0        00
9    AggStep0    0    3    1    count(1)    01
10    Next    3    7    1        00
11    Next    2    5    0        01
12    AggFinal    1    1    0    count(1)    00
13    Copy    1    4    0        00
14    ResultRow    4    1    0        00
15    Halt    0    0    0        00
16    Transaction    0    0    77    0    01
17    Goto    0    1    0        00

Thoughts? What (probably obvious) thing am I missing?

Thanks,
Jonathan



_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___________________________________________
 Gunter Hick | Software Engineer | Scientific Games International GmbH | 
Klitschgasse 2-4, A-1130 Vienna | FN 157284 a, HG Wien, DVR: 0430013 | (O) +43 
1 80100 - 0

May be privileged. May be confidential. Please delete if not the addressee.
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] [EXTERNAL] Slow joining of tables with indexes

Reply via email to