On Sep 29, 2011, at 8:52 AM, Black, Michael (IS) wrote:

> Why do you think postgres would be any better?
> 

I don't. That is why I have two options.


> Have you thought about writing your own code to process the tables rather 
> than letting the database do all the work?

Well, that kinda defeats the purpose of having a database... I want the 
database to do all the work (where "all" has a sliding scale definition).

In any case, I think (again, "think"... no scientific tests here) that FTS4 
itself may not be the problem. I can see fts searches are slow, but not that 
slow. The problem is throttling the fts searches based on criteria that are 
coming from joined tables.

So, I do have my work cut out for me. I have to go through my SQL code, really 
narrow down the query that is gumming up the works, then try to eliminate that 
or code a work around.

We'll see. The database is not that large when it comes to rows (less than 45K 
rows), but it is definitely gargantuan when it comes to size (because of all 
the nonsense I have stuffed in it). I have to figure out a workaround for that 
as well.


> 
> Load your data into memory and then slog through the uris to winnow out the 
> matches?
> Probably a LOT faster than letting the db do it.
> 
> 
> 
> Michael D. Black
> Senior Scientist
> NG Information Systems
> Advanced Analytics Directorate
> 
> 
> 
> 
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
> behalf of Mr. Puneet Kishor [punk.k...@gmail.com]
> Sent: Thursday, September 29, 2011 8:30 AM
> To: General Discussion of SQLite Database
> Subject: EXT :Re: [sqlite] speeding up FTS4
> 
> 
> Well, defeated by FTS4 for now, I will try the following approach --
> 
> 1. drop the fts tables and rebuild them and test.
> 
> 2. if the above doesn't work, then either migrate the data to Postgres and 
> use its fts, or implement e-Swish or httpdig for full text search.
> 
> On Sep 28, 2011, at 4:35 PM, Puneet Kishor wrote:
> 
>> 
>> On Sep 28, 2011, at 4:18 PM, Black, Michael (IS) wrote:
>> 
>>> I have no idea if this would work...but...here's some more thoughts...
>>> 
>>> 
>>> 
>>> #1 How long does this take:
>>> 
>>>   select count(*) from fts_uri match 'education school';
>>> 
>>> 
>>> 
>>> #2 Create a view on uris with just what you need and use that in your join 
>>> (I'm guessing that uri_content takes up most of your database space).
>>> 
>>> 
>>> 
>>> create view v_uris as select uri_id,feed_history_id from uri;
>>> 
>>> 
>>> ..
>> 
>> 
>> <snipped a bunch of stuff>
>> 
>> I did a query on just the fts table and got the answers relatively quickly. 
>> Not instantly, but very fast compared to all the attempts so far. So, 
>> assuming that the bottleneck is the multiple JOINs to get the data for the 
>> correct project_id, I created a temp table with all that JOIN nonsense
>> 
>>       sqlite> CREATE TEMP TABLE tmp_uris AS SELECT u.uri_id, u.uri uri, 
>> u.u_downloaded_on
>>          ...> FROM projects p
>>          ...>   JOIN feeds f ON f.project_id = p.project_id
>>          ...>   JOIN feed_history fh ON f.feed_id = fh.feed_id
>>          ...>   JOIN uris u ON fh.feed_history_id = u.feed_history_id
>>          ...> WHERE p.project_id = 3 AND u.u_downloaded_on >= 
>> p.u_project_start;
>>       CPU Time: user 16.369556 sys 81.393235
>> 
>> 
>>       sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id uri_id, u.uri uri, 
>> u.u_downloaded_on,
>>          ...>   Snippet(fts_uri, '<span class="hilite">', '</span>', 
>> '&hellip;', -1, 64) snippet
>>          ...> FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id
>>          ...> WHERE fts_uri MATCH 'education school'
>>          ...> ORDER BY u.uri_id, u_downloaded_on DESC;
>>       0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
>>       0|1|1|SEARCH TABLE tmp_uris AS u USING INDEX tmp_uris_uri_id 
>> (uri_id=?) (~10 rows)
>>       0|0|0|USE TEMP B-TREE FOR ORDER BY
>>       CPU Time: user 0.000086 sys 0.000006
>> 
>> and yet
>> 
>>       sqlite> SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on,
>>          ...>   Snippet(fts_uri, '<span class="hilite">', '</span>', 
>> '&hellip;', -1, 64) snippet
>>          ...> FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id
>>          ...> WHERE fts_uri MATCH 'education school'
>>          ...> ORDER BY u.uri_id, u_downloaded_on DESC;
>> 
>>       CPU Time: user 21.871541 sys 26.414337
>> 
>> 
>> A lot better, but simply not usable for a web application.
>> 
>> --
>> Puneet Kishor
> 
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to