I have no idea if this would work... but here are some more thoughts...
#1 How long does this take:

    select count(*) from fts_uri where fts_uri match 'education school';

#2 Create a view on uris with just what you need and use that in your join (I'm guessing that uri_content takes up most of your database space).

    create view v_uris as select uri_id, feed_history_id from uris;

#3 Move your uri_content to another database and attach it. Again, I'm assuming that's your big data field. That will separate the paging needed to walk through uris. (A rough sketch of what I mean is at the very bottom of this message, below the quoted thread.)

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate

________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Puneet Kishor [punk.k...@gmail.com]
Sent: Wednesday, September 28, 2011 2:54 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] speeding up FTS4

On Sep 28, 2011, at 2:41 PM, Black, Michael (IS) wrote:

> What happens if you create an index on uris(feed_history_id)


Yeah, I noticed that lacking as well.

sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id
   ...> FROM projects p
   ...> JOIN feeds f ON f.project_id = p.project_id
   ...> JOIN feed_history fh ON f.feed_id = fh.feed_id
   ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id
   ...> WHERE p.project_id = 3;
0|0|0|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|1|3|SCAN TABLE uris AS u (~46608 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|3|1|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
CPU Time: user 0.000079 sys 0.000014

sqlite> CREATE INDEX uris_feed_history_id ON uris (feed_history_id);
CPU Time: user 12.766977 sys 82.766372

sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id
   ...> FROM projects p
   ...> JOIN feeds f ON f.project_id = p.project_id
   ...> JOIN feed_history fh ON f.feed_id = fh.feed_id
   ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id
   ...> WHERE p.project_id = 3;
0|0|0|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|1|1|SCAN TABLE feeds AS f (~7 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING AUTOMATIC COVERING INDEX (feed_id=?) (~5 rows)
0|3|3|SEARCH TABLE uris AS u USING COVERING INDEX uris_feed_history_id (feed_history_id=?) (~10 rows)
CPU Time: user 0.000123 sys 0.000015

But, no joy.

SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on
FROM fts_uri f
JOIN uris u ON f.uri_id = u.uri_id
JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
JOIN feeds f ON fh.feed_id = f.feed_id
JOIN projects p ON f.project_id = p.project_id
WHERE p.project_id = 3
    AND u.u_downloaded_on >= p.u_project_start
    AND fts_uri MATCH 'education school'
ORDER BY u.uri_id, u_downloaded_on DESC;
..
CPU Time: user 28.599581 sys 108.518205


> ________________________________
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Puneet Kishor [punk.k...@gmail.com]
> Sent: Wednesday, September 28, 2011 2:00 PM
> To: General Discussion of SQLite Database
> Subject: EXT :Re: [sqlite] speeding up FTS4
>
> On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote:
>
>> strftime returns a text representation. So you didn't really change anything.
>
> That's not true at all.
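>
> The column has INTEGER affinity, so even though strftime('%s', ...) itself returns text, SQLite stores the value as an integer. A quick way to confirm that is typeof(); a minimal sketch (the table name t is just a throwaway for illustration):
>
>     SELECT typeof(strftime('%s', 'now'));          -- 'text': strftime itself returns text
>     CREATE TABLE t (u_downloaded_on INTEGER);      -- throwaway table
>     INSERT INTO t VALUES (strftime('%s', 'now'));  -- INTEGER affinity converts it on insert
>     SELECT typeof(u_downloaded_on) FROM t;         -- 'integer'
>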
> I added u_downloaded_on (u_ for unixtime)
>
> CREATE TABLE uris (
>     uri_id INTEGER PRIMARY KEY,
>     uri TEXT,
>     uri_html TEXT,
>     uri_content TEXT,
>     downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
>     feed_history_id INTEGER,
>     u_downloaded_on INTEGER
> );
>
> sqlite> SELECT u_downloaded_on FROM uris LIMIT 5;
> 1306450769
> 1306450769
> 1306450770
> 1306450774
> 1306450776
>
>> You need to use julianday() as I said.
>>
>> And you want a REAL number... not integer... though SQLite doesn't really care what you call it. It's more for your own reference.
>>
>> You just added a bunch more strings, increasing the size of your database... ergo it ran slower.
>
> If I understand correctly, the *size* of the database should not matter. Or, at least, not matter as much. Imagine a database with one table with only one row, but with so much content in that row that it is 27 GB, vs. a database with a few million rows, each with a small amount of content but together totaling 27 GB.
>
> What should matter is using the indexes correctly. In this case, my query plan shows that I am hitting all the indexes. But, I think I am getting closer to understanding this.
>
> I started backward with
>
> sqlite> SELECT p.project_id
>    ...> FROM projects p
>    ...> WHERE p.project_id = 3;
> 3
> CPU Time: user 0.000080 sys 0.000089
>
> sqlite> SELECT f.feed_id, p.project_id
>    ...> FROM projects p JOIN feeds f ON f.project_id = p.project_id
>    ...> WHERE p.project_id = 3;
> ..
> CPU Time: user 0.000239 sys 0.000170
>
> sqlite> SELECT fh.feed_history_id, f.feed_id, p.project_id
>    ...> FROM projects p
>    ...> JOIN feeds f ON f.project_id = p.project_id
>    ...> JOIN feed_history fh ON f.feed_id = fh.feed_id
>    ...> WHERE p.project_id = 3;
> ..
> CPU Time: user 0.008491 sys 0.008054
>
> sqlite> SELECT u.uri_id, fh.feed_history_id, f.feed_id, p.project_id
>    ...> FROM projects p
>    ...> JOIN feeds f ON f.project_id = p.project_id
>    ...> JOIN feed_history fh ON f.feed_id = fh.feed_id
>    ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id
>    ...> WHERE p.project_id = 3;
>
> BOOM! Adding that last table makes my query way too slow.
>
> Taking out the uris table and querying only the fts table
>
> sqlite> SELECT uri_id
>    ...> FROM fts_uri
>    ...> WHERE fts_uri MATCH 'education school';
>
> starts producing the results immediately, but gives me back way too many matches. I am now getting all the matches, but I want only the matches for project id = 3, or matches throttled by u_downloaded_on (both of which are achievable only via a JOIN with the uris table).
>
>> Michael D. Black
>> Senior Scientist
>> NG Information Systems
>> Advanced Analytics Directorate
>>
>> ________________________________
>> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Puneet Kishor [punk.k...@gmail.com]
>> Sent: Wednesday, September 28, 2011 12:44 PM
>> To: General Discussion of SQLite Database
>> Subject: EXT :Re: [sqlite] speeding up FTS4
>>
>> On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote:
>>
>>> Your change to numeric date/time may not take as long as you think.
>>
>> Took an hour and a half.
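>>
>> For the uris table, the steps below boiled down to roughly this (SQLite's ALTER TABLE can only add a column, not change an existing column's type, hence the separate u_ column next to downloaded_on):
>>
>>     ALTER TABLE uris ADD COLUMN u_downloaded_on INTEGER;
>>     UPDATE uris SET u_downloaded_on = strftime('%s', downloaded_on);
>>     CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on);
>>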
>> Step 1: Alter all tables with datetime columns, converting those columns to integer;
>>
>> Step 2: Update all tables setting new datetime columns to unixtime
>>
>>     UPDATE table SET new_column = strftime('%s', old_column);
>>
>> Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on);
>>
>> Step 4: Run the following query
>>
>>     SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on
>>     FROM fts_uri f
>>     JOIN uris u ON f.uri_id = u.uri_id
>>     JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
>>     JOIN feeds f ON fh.feed_id = f.feed_id
>>     JOIN projects p ON f.project_id = p.project_id
>>     WHERE p.project_id = 3
>>         AND u.u_downloaded_on >= p.u_project_start
>>         AND fts_uri MATCH 'education school'
>>     ORDER BY u.uri_id, u_downloaded_on DESC;
>>
>> Terrible time.
>>
>> CPU Time: user 27.584849 sys 115.219293
>>
>> Step 5: EXPLAIN QUERY PLAN (above SELECT query)
>>
>> 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
>> 0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>> 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>> 0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>> 0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>> 0|0|0|USE TEMP B-TREE FOR ORDER BY
>> CPU Time: user 0.000099 sys 0.000008
>>
>> Step 6: ANALYZE;
>>
>> Step 7: Run the above SELECT query again
>>
>> Terrible time.
>>
>> CPU Time: user 27.703538 sys 116.684390
>>
>> This train is going nowhere. The times are actually worse than they were when I was using a non-text column for date time.
>>
>> By the way, I have tried this on two machines -- the previous one was a dual-Xeon Xserve with 12 GB RAM. The current machine of choice is the top of the line iMac (quad core 3.4 GHz Intel Core i7) with 12 GB RAM and a 7200 RPM SATA drive.
>>
>> ..
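
A rough sketch of #3 above -- splitting the bulky uri_content out into its own database file and attaching it. The file name content.db and the exact column split are just placeholders for illustration:

    ATTACH DATABASE 'content.db' AS content;

    CREATE TABLE content.uri_content (
        uri_id      INTEGER PRIMARY KEY,
        uri_content TEXT
    );

    INSERT INTO content.uri_content (uri_id, uri_content)
        SELECT uri_id, uri_content FROM main.uris;

    -- rebuild main.uris without the big column (SQLite has no DROP COLUMN),
    -- then pull the content in only when it is actually needed:
    SELECT u.uri_id, c.uri_content
    FROM main.uris u
    JOIN content.uri_content c ON c.uri_id = u.uri_id;

That way walking uris and its indexes doesn't have to page past all the stored page content.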