Re: [sqlite] EXT : speeding up FTS4

2011-09-28 Thread Simon Slavin

On 28 Sep 2011, at 3:52pm, Petite Abeille wrote:

> On Sep 28, 2011, at 4:48 PM, Puneet Kishor wrote:
> 
>> But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there 
>> really is no such thing as DATETIME value. Internally, it is stored as TEXT 
>> anyway.
> 
> Or as a number. Your choice:
> 
>   • TEXT as ISO8601 strings ("-MM-DD HH:MM:SS.SSS").
>   • REAL as Julian day numbers, the number of days since noon in 
> Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian 
> calendar.
>   • INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 
> UTC.
> 
> Considering the amount of data you have, perhaps something like a unix time 
> would be more, hmmm, more frugal.

Of course, if he is consistent for any of these he can just replace

>   WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start) 

with

  WHERE u.downloaded_on >= p.project_start

and make indexes which include the columns.  No conversion needed.  The only 
reason to need conversion is if the source data is in one format in some rows 
and another format in other rows.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXT : speeding up FTS4

2011-09-28 Thread Simon Slavin

On 28 Sep 2011, at 3:48pm, Puneet Kishor wrote:

> Could I? Sure, if I had known better. Should I? I would be happy to create a 
> new column, convert the values to julian days, and try that, but on a 27 GB 
> db, that would take a bit of a while.

You only have to do it once, you can do it overnight, and you can pick which 
night you do it.  You can store them as INTEGER or REAL, which is far faster to 
search than TEXT.  And the alternative is to do some extra work every time 
someone uses that SELECT or anything like it.  It also means you can usefully 
put those columns in an INDEX.  Speeding up SELECTs is what INDEXes are all 
about.

What you have done is the equivalent of collecting all the knowledge of the 
world and putting it unsorted in a huge warehouse.  Every time anyone wants 
something they have to wade through, on average, half the warehouse before they 
find it.

> But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there 
> really is no such thing as DATETIME value. Internally, it is stored as TEXT 
> anyway.

If you want to see what value is actually being stored just SELECT it without 
converting to Datatype and see what you get.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXT : speeding up FTS4

2011-09-28 Thread Petite Abeille

On Sep 28, 2011, at 4:48 PM, Puneet Kishor wrote:

> But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there 
> really is no such thing as DATETIME value. Internally, it is stored as TEXT 
> anyway.

Or as a number. Your choice:

• TEXT as ISO8601 strings ("-MM-DD HH:MM:SS.SSS").
• REAL as Julian day numbers, the number of days since noon in 
Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian 
calendar.
• INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 
UTC.

Considering the amount of data you have, perhaps something like a unix time 
would be more, hmmm, more frugal.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXT : speeding up FTS4

2011-09-28 Thread Puneet Kishor

On Sep 28, 2011, at 9:44 AM, Simon Slavin wrote:

> 
> On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote:
> 
>>   WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start) 
> 
> Why are you doing 'Datetime' here ?  Not only does the conversion take time, 
> but it means you can't usefully index either of those two columns.
> 
> Can you instead store your stamps in a format which is readily sortable ?  
> Either in text form or as julian days.


Could I? Sure, if I had known better. Should I? I would be happy to create a 
new column, convert the values to julian days, and try that, but on a 27 GB db, 
that would take a bit of a while.

But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there 
really is no such thing as DATETIME value. Internally, it is stored as TEXT 
anyway.


> 
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXT : speeding up FTS4

2011-09-28 Thread Simon Slavin

On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote:

>WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start) 

Why are you doing 'Datetime' here ?  Not only does the conversion take time, 
but it means you can't usefully index either of those two columns.

Can you instead store your stamps in a format which is readily sortable ?  
Either in text form or as julian days.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXT : speeding up FTS4

2011-09-28 Thread Puneet Kishor

On Sep 28, 2011, at 8:02 AM, Black, Michael (IS) wrote:

> Have you done "ANALYZE"?  That might help.
> 
> Also...try to arrange your joins based on record count (both high-to-low and 
> low-to-high) and see what difference it makes.
> 
> Since you have only one WHERE clause I'm guessing having project_ids as the 
> first join makes sense.
> 
> 


So, I ran ANALYZE. Then, with the query as is, I got

CPU Time: user 24.742481 sys 79.120486

If I flip the WHERE clause to

WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start) 
AND p.project_id = 3
AND fts_uri MATCH 'education school'

I still get

CPU Time: user 24.726792 sys 79.240780



Yesterday, in desperation, I tried to pare down my fts_uri table which brought 
the file size to 17 GB, but ended up creating a "malformed disk image" whatever 
the heck that means. Thankfully I had a backup (clever boy).

So, I am now back with a 27 GB file, and a query that takes forever thereby 
locking up my web app.


> 
> 
> 
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
> behalf of Puneet Kishor [punk.k...@gmail.com]
> Sent: Tuesday, September 27, 2011 5:46 PM
> To: General Discussion of SQLite Database
> Subject: EXT :[sqlite] speeding up FTS4
> 
> 
> I have
> 
>CREATE TABLE uris (
>uri_id INTEGER PRIMARY KEY,
>uri TEXT,
>uri_content TEXT,
>downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
>feed_history_id INTEGER
>);
> 
> with 46608 rows
> 
>CREATE TABLE feed_history (
>feed_history_id INTEGER PRIMARY KEY,
>feed_id INTEGER,
>scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP
>);
> 
> with 3276 rows
> 
>CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id 
> INTEGER);
> 
> with 79 rows
> 
>CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT);
> 
> with 3 rows
> 
>CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on);
>CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content);
> 
> The database file is about 27 GB.
> 
> 
> The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 
> 46608 rows
> 
>SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on,
>Snippet(fts_uri, '', '', '', -1, 
> 64) snippet
>FROM fts_uri f
>JOIN uris u ON f.uri_id = u.uri_id
>JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
>JOIN feeds f ON fh.feed_id = f.feed_id
>JOIN projects p ON f.project_id = p.project_id
>WHERE p.project_id = 3
>AND Datetime(u.downloaded_on) >= Datetime(p.project_start)
>AND fts_uri MATCH 'education,school'
>ORDER BY u.uri_id, downloaded_on DESC;
> 
> 
> The EXPLAIN QUERY PLAN for the above query tells me
> 
>0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
>0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) 
> (~1 rows)
>0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
>0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 
> rows)
>0|0|0|USE TEMP B-TREE FOR ORDER BY
> 
> 
> Is there anything I can do to speed this up?
> 
> --
> Puneet Kishor

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] EXT : speeding up FTS4

2011-09-28 Thread Black, Michael (IS)
Have you done "ANALYZE"?  That might help.

Also...try to arrange your joins based on record count (both high-to-low and 
low-to-high) and see what difference it makes.

Since you have only one WHERE clause I'm guessing having project_ids as the 
first join makes sense.

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Puneet Kishor [punk.k...@gmail.com]
Sent: Tuesday, September 27, 2011 5:46 PM
To: General Discussion of SQLite Database
Subject: EXT :[sqlite] speeding up FTS4


I have

CREATE TABLE uris (
uri_id INTEGER PRIMARY KEY,
uri TEXT,
uri_content TEXT,
downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
feed_history_id INTEGER
);

with 46608 rows

CREATE TABLE feed_history (
feed_history_id INTEGER PRIMARY KEY,
feed_id INTEGER,
scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP
);

with 3276 rows

CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id 
INTEGER);

with 79 rows

CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT);

with 3 rows

CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on);
CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content);

The database file is about 27 GB.


The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 
46608 rows

SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on,
Snippet(fts_uri, '', '', '', -1, 
64) snippet
FROM fts_uri f
JOIN uris u ON f.uri_id = u.uri_id
JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
JOIN feeds f ON fh.feed_id = f.feed_id
JOIN projects p ON f.project_id = p.project_id
WHERE p.project_id = 3
AND Datetime(u.downloaded_on) >= Datetime(p.project_start)
AND fts_uri MATCH 'education,school'
ORDER BY u.uri_id, downloaded_on DESC;


The EXPLAIN QUERY PLAN for the above query tells me

0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) 
(~1 rows)
0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 
rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY


Is there anything I can do to speed this up?

--
Puneet Kishor
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users