Re: [sqlite] EXT : speeding up FTS4
On 28 Sep 2011, at 3:52pm, Petite Abeille wrote: > On Sep 28, 2011, at 4:48 PM, Puneet Kishor wrote: > >> But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there >> really is no such thing as DATETIME value. Internally, it is stored as TEXT >> anyway. > > Or as a number. Your choice: > > • TEXT as ISO8601 strings ("-MM-DD HH:MM:SS.SSS"). > • REAL as Julian day numbers, the number of days since noon in > Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian > calendar. > • INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 > UTC. > > Considering the amount of data you have, perhaps something like a unix time > would be more, hmmm, more frugal. Of course, if he is consistent for any of these he can just replace > WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start) with WHERE u.downloaded_on >= p.project_start and make indexes which include the columns. No conversion needed. The only reason to need conversion is if the source data is in one format in some rows and another format in other rows. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] EXT : speeding up FTS4
On 28 Sep 2011, at 3:48pm, Puneet Kishor wrote: > Could I? Sure, if I had known better. Should I? I would be happy to create a > new column, convert the values to julian days, and try that, but on a 27 GB > db, that would take a bit of a while. You only have to do it once, you can do it overnight, and you can pick which night you do it. You can store them as INTEGER or REAL, which is far faster to search than TEXT. And the alternative is to do some extra work every time someone uses that SELECT or anything like it. It also means you can usefully put those columns in an INDEX. Speeding up SELECTs is what INDEXes are all about. What you have done is the equivalent of collecting all the knowledge of the world and putting it unsorted in a huge warehouse. Every time anyone wants something they have to wade through, on average, half the warehouse before they find it. > But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there > really is no such thing as DATETIME value. Internally, it is stored as TEXT > anyway. If you want to see what value is actually being stored just SELECT it without converting to Datatype and see what you get. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] EXT : speeding up FTS4
On Sep 28, 2011, at 4:48 PM, Puneet Kishor wrote: > But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there > really is no such thing as DATETIME value. Internally, it is stored as TEXT > anyway. Or as a number. Your choice: • TEXT as ISO8601 strings ("-MM-DD HH:MM:SS.SSS"). • REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar. • INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC. Considering the amount of data you have, perhaps something like a unix time would be more, hmmm, more frugal. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] EXT : speeding up FTS4
On Sep 28, 2011, at 9:44 AM, Simon Slavin wrote: > > On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote: > >> WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start) > > Why are you doing 'Datetime' here ? Not only does the conversion take time, > but it means you can't usefully index either of those two columns. > > Can you instead store your stamps in a format which is readily sortable ? > Either in text form or as julian days. Could I? Sure, if I had known better. Should I? I would be happy to create a new column, convert the values to julian days, and try that, but on a 27 GB db, that would take a bit of a while. But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there really is no such thing as DATETIME value. Internally, it is stored as TEXT anyway. > > Simon. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] EXT : speeding up FTS4
On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote: >WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start) Why are you doing 'Datetime' here ? Not only does the conversion take time, but it means you can't usefully index either of those two columns. Can you instead store your stamps in a format which is readily sortable ? Either in text form or as julian days. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] EXT : speeding up FTS4
On Sep 28, 2011, at 8:02 AM, Black, Michael (IS) wrote: > Have you done "ANALYZE"? That might help. > > Also...try to arrange your joins based on record count (both high-to-low and > low-to-high) and see what difference it makes. > > Since you have only one WHERE clause I'm guessing having project_ids as the > first join makes sense. > > So, I ran ANALYZE. Then, with the query as is, I got CPU Time: user 24.742481 sys 79.120486 If I flip the WHERE clause to WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start) AND p.project_id = 3 AND fts_uri MATCH 'education school' I still get CPU Time: user 24.726792 sys 79.240780 Yesterday, in desperation, I tried to pare down my fts_uri table which brought the file size to 17 GB, but ended up creating a "malformed disk image" whatever the heck that means. Thankfully I had a backup (clever boy). So, I am now back with a 27 GB file, and a query that takes forever thereby locking up my web app. > > > > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Puneet Kishor [punk.k...@gmail.com] > Sent: Tuesday, September 27, 2011 5:46 PM > To: General Discussion of SQLite Database > Subject: EXT :[sqlite] speeding up FTS4 > > > I have > >CREATE TABLE uris ( >uri_id INTEGER PRIMARY KEY, >uri TEXT, >uri_content TEXT, >downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP, >feed_history_id INTEGER >); > > with 46608 rows > >CREATE TABLE feed_history ( >feed_history_id INTEGER PRIMARY KEY, >feed_id INTEGER, >scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP >); > > with 3276 rows > >CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id > INTEGER); > > with 79 rows > >CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT); > > with 3 rows > >CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on); >CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content); > > The database file is about 27 GB. > > > The following query takes (CPU Time: user 23.952698 sys 73.188765) returning > 46608 rows > >SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on, >Snippet(fts_uri, '', '', '', -1, > 64) snippet >FROM fts_uri f >JOIN uris u ON f.uri_id = u.uri_id >JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id >JOIN feeds f ON fh.feed_id = f.feed_id >JOIN projects p ON f.project_id = p.project_id >WHERE p.project_id = 3 >AND Datetime(u.downloaded_on) >= Datetime(p.project_start) >AND fts_uri MATCH 'education,school' >ORDER BY u.uri_id, downloaded_on DESC; > > > The EXPLAIN QUERY PLAN for the above query tells me > >0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) >0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) >0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) > (~1 rows) >0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) >0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 > rows) >0|0|0|USE TEMP B-TREE FOR ORDER BY > > > Is there anything I can do to speed this up? > > -- > Puneet Kishor ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] EXT : speeding up FTS4
Have you done "ANALYZE"? That might help. Also...try to arrange your joins based on record count (both high-to-low and low-to-high) and see what difference it makes. Since you have only one WHERE clause I'm guessing having project_ids as the first join makes sense. Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Puneet Kishor [punk.k...@gmail.com] Sent: Tuesday, September 27, 2011 5:46 PM To: General Discussion of SQLite Database Subject: EXT :[sqlite] speeding up FTS4 I have CREATE TABLE uris ( uri_id INTEGER PRIMARY KEY, uri TEXT, uri_content TEXT, downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP, feed_history_id INTEGER ); with 46608 rows CREATE TABLE feed_history ( feed_history_id INTEGER PRIMARY KEY, feed_id INTEGER, scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP ); with 3276 rows CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id INTEGER); with 79 rows CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT); with 3 rows CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on); CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content); The database file is about 27 GB. The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 46608 rows SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on, Snippet(fts_uri, '', '', '', -1, 64) snippet FROM fts_uri f JOIN uris u ON f.uri_id = u.uri_id JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id JOIN feeds f ON fh.feed_id = f.feed_id JOIN projects p ON f.project_id = p.project_id WHERE p.project_id = 3 AND Datetime(u.downloaded_on) >= Datetime(p.project_start) AND fts_uri MATCH 'education,school' ORDER BY u.uri_id, downloaded_on DESC; The EXPLAIN QUERY PLAN for the above query tells me 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) 0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|0|0|USE TEMP B-TREE FOR ORDER BY Is there anything I can do to speed this up? -- Puneet Kishor ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users