Hi List,
I have a relational table setup where I've built indexes but I'm still seeing very slow join times on middling amounts of data. I'm guessing I'm doing something wrong but I can't see what. (SQLite: 3.24.0)

Simplified schema as below.
The ids are 16 character hex strings. I've included the ignore_me table only because it's relevant to the indexes. Note: I can *guarantee* that the data inserted into `data_table` and `ignore_me` is ordered by their respective primary keys ASC. Entries in joining_table are ordered by one of either data_id ASC or ignored_id ASC depending on creation method.

--==================================

-- 1.7 million items
CREATE TABLE data_table (
    data_id                    TEXT     PRIMARY KEY
                                        NOT NULL
                                        COLLATE NOCASE,
    data_1                        TEXT,
    data_2                         TEXT
);

-- 1.9 million items
CREATE TABLE joining_table (
    data_id         TEXT     REFERENCES data_table (data_id)
                            NOT NULL
                            COLLATE NOCASE,
    ignored_id         TEXT    REFERENCES ignore_me (ignored_id)
                            NOT NULL
                            COLLATE NOCASE,
    misc_col_1        TEXT,
    misc_col_2        TEXT
);

-- ~200,000 items
CREATE TABLE ignore_me (
    ignored_id                    TEXT     PRIMARY KEY
                                        NOT NULL
                                        COLLATE NOCASE
);

CREATE INDEX IF NOT EXISTS data_table__data_id__pk_idx ON data_table (
    data_id
);
CREATE INDEX IF NOT EXISTS ignore_me__ignored_id__pk_idx ON ignore_me (
    ignored_id
);

-- Allow quick joining from data_table to ignore_me
CREATE INDEX IF NOT EXISTS joining_table__data_ignored_id__fk_idx ON joining_table (
    data_id ASC,
    ignored_id ASC
);
-- Allow quick joining from ignore_me to data_table
CREATE INDEX IF NOT EXISTS joining_table__ignored_data_id__fk_idx ON joining_table (
    ignored_id ASC,
    data_id ASC
);

-- Example data:

INSERT INTO data_table (data_id) VALUES ('00196a21e8c0f9f6');
INSERT INTO data_table (data_id) VALUES ('579c57f1268c0f5c');

INSERT INTO ignore_me VALUES ('c402eb3f05d433f2');
INSERT INTO ignore_me VALUES ('d827e58953265f63');
INSERT INTO ignore_me VALUES ('ec1d2e817f55b249');

INSERT INTO joining_table (data_id, ignored_id) VALUES ('00196a21e8c0f9f6', 'c402eb3f05d433f2'); INSERT INTO joining_table (data_id, ignored_id) VALUES ('00196a21e8c0f9f6', 'd827e58953265f63'); INSERT INTO joining_table (data_id, ignored_id) VALUES ('579c57f1268c0f5c', 'ec1d2e817f55b249');


--------------------

-- Then to test the speed I'm simply doing:
    SELECT
        count(1)
    FROM
        data_table
        JOIN joining_table USING (data_id);

--==================================

The query plan says it's using the indexes:
    SCAN TABLE joining_table USING COVERING INDEX joining_table__ignored_data_id__fk_idx     SEARCH TABLE data_table USING COVERING INDEX data_table__data_id__pk_idx (data_id=?)

But it takes about 20 seconds to do that count on the full dataset.

The full EXPLAIN from the full dataset:

0    Init    0    16    0        00
1    Null    0    1    1        00
2    OpenRead    2    771875    0    k(3,NOCASE,NOCASE,)    00
3    OpenRead    3    737715    0    k(2,NOCASE,)    02
4    Rewind    2    12    2    0    00
5    Column    2    1    2        00
6    SeekGE    3    11    2    1    00
7    IdxGT    3    11    2    1    00
8    Integer    1    3    0        00
9    AggStep0    0    3    1    count(1)    01
10    Next    3    7    1        00
11    Next    2    5    0        01
12    AggFinal    1    1    0    count(1)    00
13    Copy    1    4    0        00
14    ResultRow    4    1    0        00
15    Halt    0    0    0        00
16    Transaction    0    0    77    0    01
17    Goto    0    1    0        00

Thoughts? What (probably obvious) thing am I missing?

Thanks,
Jonathan



_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to