Hi List,
I have a relational table setup where I've built indexes but I'm still
seeing very slow join times on middling amounts of data. I'm guessing
I'm doing something wrong but I can't see what. (SQLite: 3.24.0)
Simplified schema as below.
The ids are 16 character hex strings. I've included the ignore_me table
only because it's relevant to the indexes.
Note: I can *guarantee* that the data inserted into `data_table` and
`ignore_me` is ordered by their respective primary keys ASC. Entries in
joining_table are ordered by one of either data_id ASC or ignored_id ASC
depending on creation method.
--==================================
-- 1.7 million items
CREATE TABLE data_table (
data_id TEXT PRIMARY KEY
NOT NULL
COLLATE NOCASE,
data_1 TEXT,
data_2 TEXT
);
-- 1.9 million items
CREATE TABLE joining_table (
data_id TEXT REFERENCES data_table (data_id)
NOT NULL
COLLATE NOCASE,
ignored_id TEXT REFERENCES ignore_me (ignored_id)
NOT NULL
COLLATE NOCASE,
misc_col_1 TEXT,
misc_col_2 TEXT
);
-- ~200,000 items
CREATE TABLE ignore_me (
ignored_id TEXT PRIMARY KEY
NOT NULL
COLLATE NOCASE
);
CREATE INDEX IF NOT EXISTS data_table__data_id__pk_idx ON data_table (
data_id
);
CREATE INDEX IF NOT EXISTS ignore_me__ignored_id__pk_idx ON ignore_me (
ignored_id
);
-- Allow quick joining from data_table to ignore_me
CREATE INDEX IF NOT EXISTS joining_table__data_ignored_id__fk_idx ON
joining_table (
data_id ASC,
ignored_id ASC
);
-- Allow quick joining from ignore_me to data_table
CREATE INDEX IF NOT EXISTS joining_table__ignored_data_id__fk_idx ON
joining_table (
ignored_id ASC,
data_id ASC
);
-- Example data:
INSERT INTO data_table (data_id) VALUES ('00196a21e8c0f9f6');
INSERT INTO data_table (data_id) VALUES ('579c57f1268c0f5c');
INSERT INTO ignore_me VALUES ('c402eb3f05d433f2');
INSERT INTO ignore_me VALUES ('d827e58953265f63');
INSERT INTO ignore_me VALUES ('ec1d2e817f55b249');
INSERT INTO joining_table (data_id, ignored_id) VALUES
('00196a21e8c0f9f6', 'c402eb3f05d433f2');
INSERT INTO joining_table (data_id, ignored_id) VALUES
('00196a21e8c0f9f6', 'd827e58953265f63');
INSERT INTO joining_table (data_id, ignored_id) VALUES
('579c57f1268c0f5c', 'ec1d2e817f55b249');
--------------------
-- Then to test the speed I'm simply doing:
SELECT
count(1)
FROM
data_table
JOIN joining_table USING (data_id);
--==================================
The query plan says it's using the indexes:
SCAN TABLE joining_table USING COVERING INDEX
joining_table__ignored_data_id__fk_idx
SEARCH TABLE data_table USING COVERING INDEX
data_table__data_id__pk_idx (data_id=?)
But it takes about 20 seconds to do that count on the full dataset.
The full EXPLAIN from the full dataset:
0 Init 0 16 0 00
1 Null 0 1 1 00
2 OpenRead 2 771875 0 k(3,NOCASE,NOCASE,) 00
3 OpenRead 3 737715 0 k(2,NOCASE,) 02
4 Rewind 2 12 2 0 00
5 Column 2 1 2 00
6 SeekGE 3 11 2 1 00
7 IdxGT 3 11 2 1 00
8 Integer 1 3 0 00
9 AggStep0 0 3 1 count(1) 01
10 Next 3 7 1 00
11 Next 2 5 0 01
12 AggFinal 1 1 0 count(1) 00
13 Copy 1 4 0 00
14 ResultRow 4 1 0 00
15 Halt 0 0 0 00
16 Transaction 0 0 77 0 01
17 Goto 0 1 0 00
Thoughts? What (probably obvious) thing am I missing?
Thanks,
Jonathan
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users