Re: [sqlite] speeding up an FTS5 search with a JOIN

Puneet Kishor Sun, 08 Mar 2020 11:45:07 -0700

Update: so, after much hitting of my head against the wall of sql, I came up 
with the following – as noted above, I really have two distinct set of queries 
I can do separately like so


Q1: (SELECT t1Id FROM t1 WHERE …) AS a

Q2: (SELECT t1Id FROM vt1 WHERE vt1 MATCH ‘bar’) AS b

Then, I can do the following -

SELECT Count(*) FROM a WHERE a.t1Id IN b

Of course, in reality, I don’t do this separately but all in one go to make a 
really messy SQL but a really fast query, a couple of hundred ms as opposed to 
> 25s

You might notice that in my Q2 above I MATCHed for ‘bar’ instead of ‘foo’. That 
is because ‘bar’ returns fewer rows than ‘foo’ does. The problem remains when 
there are too many matches in the FTS query in which case the FTS query itself 
is slow, for example, with ‘foo’ which matches > 80K rows.

Now, one interesting point of comparison – the same kind of query (from the 
user point-of-view) against an ElasticSearch instance (that is, all the rows 
with ‘foo’ anywhere in the text) is very fast, in the order of sub-hundred ms. 
I realize it might be unfair comparing SQLite with ElasticSearch, but still. 
(Or, is it a fair comparison?)


> On Mar 7, 2020, at 8:59 AM, P Kishor <[email protected]> wrote:
> 
> I asked this question on Stackoverflow with not much success, and a 
> suggestion to ask it on the list. So here I am. I have two tables, t1(id, 
> t1Id, … other cols …, fullText) and a FTS5 virtual table vt1(t1Id, fullText)
> 
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
> 
> QUERY 
> PLAN
> --SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=?)
> 
> sqlite> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
> 308498
> 
> Run Time
> : real 0.043 user 0.023668 sys 0.009005
> ```
> 
> 
> As can be see above, the actual query takes ~43ms
> 
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
> 
> QUERY 
> PLAN
> --SCAN TABLE vt1 VIRTUAL TABLE INDEX 131073:
> 
> sqlite> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
> 80789
> 
> Run Time
> : real 0.047 user 0.008021 sys 0.009640
> ```
> 
> The actual query, in this case, takes ~47ms. So far so good. But the problem 
> occurs when I join the two tables
> 
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
> 
> QUERY 
> PLAN
> |--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
> --SEARCH TABLE t1 USING INDEX sqlite_autoindex_t1_1 (t1Id=?)
> 
> sqlite> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
> 80789
> 
> Run Time
> : real 26.218 user 1.396376 sys 5.413630
> ```
> 
> The answer is correct but the query takes more than 26 seconds! Of course, I 
> would like to speed up this query by several orders of magnitude, but I would 
> also like to understand why this join is causing the slowdown.
> 
> Now, the reason I have constructed a query like this is because users can add 
> further constraints for the table t1. For example, 
> 
> ```
> sqlite> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND 
>   ...> WHERE t1.frob = ‘bar' AND 
>   ...> WHERE t1.nob = ‘baz' AND 
>   ...> vt1 MATCH 'foo’;
> ```
> 
> Also, in every operation, for every given constraint, two queries are 
> performed, one that returns the count and the other that returns the actual 
> columns. And, finally, only a subset of the results are returned using LIMIT 
> and OFFSET but *after* a sort ORDER has been prescribed. So, in reality, the 
> last constraint above would result in the following
> 
> ```
> sqlite> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND 
>   ...> WHERE t1.frob = ‘bar' AND 
>   ...> WHERE t1.nob = ‘baz' AND 
>   ...> vt1 MATCH 'foo’;
> 20367
> 
> sqlite> SELECT t1.id, t1.t1Id, … other cols …, 
>   ...> snippet(vt1, 1, "<b>", "</b>", "", 50) AS context,
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND 
>   ...> WHERE t1.frob = ‘bar' AND 
>   ...> WHERE t1.nob = ‘baz' AND 
>   ...> vt1 MATCH ‘foo’
>   ...> ORDER BY <some t1 col>
>   ...> LIMIT 30 OFFSET <some offset calculated by pagination>;
> ```
> 
> When no t1 columns are prescribed in the constraint, the default count (shown 
> above) and default cols are returned with the FTS search
> 
> ```
> sqlite> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 
>   ...> vt1 MATCH 'foo’;
> 20367
> 
> sqlite> SELECT t1.id, t1.t1Id, … other cols …, 
>   ...> snippet(vt1, 1, "<b>", "</b>", "", 50) AS context,
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 
>   ...> vt1 MATCH ‘foo’
>   ...> ORDER BY <some t1 col>
>   ...> LIMIT 30 OFFSET 0;
> ```
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] speeding up an FTS5 search with a JOIN

Reply via email to