[sqlite] ROWID-based queries slow in FTS db

2015-01-28 Thread farkas andras
Hi all, Im using FTS through DBD::SQLite (perl) to query large text 
databases (~10GB, ~10 million records). The regular FTS MATCH searches work 
fine (they usually run under a second), but searches based on ROWID are 
atrociously slow and hog massive amounts of memory. Im trying to retrieve 
a couple of adjacent rows like so:
my $q_c = $dbh->prepare( "SELECT * FROM ftstable WHERE (ROWID BETWEEN 1000 AND 
1040)" );
# my $q_c = $dbh->prepare( "SELECT * FROM ftstable LIMIT 1040 OFFSET 1000" ); # 
tried this too, it isnt any better
The execute takes several minutes and uses ~600 MB of memory. Now, 
http://www.sqlite.org/changes.html writes that:
3.8.1: FTS4 queries are better able to make use of docid<$limit constraints to 
limit the amount of I/O required
Theres also this thread, indicating that rowid searches on FTS databases 
are optimized: 
 I was using 3.7.x so I updated DBD::SQLite and that got me up to SQLite 3.8.7, 
but I see no significant improvement. Explain query gives the same result as 
the linked thread: 0|0|0|SCAN TABLE tmdata VIRTUAL TABLE INDEX 393216.
Maybe there is a better way to write the query? If not, is there any hope that 
this will get fixed? Thanks,András Farkas
sqlite-users mailing list

Re: [sqlite] ROWID-based queries slow in FTS db

2015-01-30 Thread farkas andras
Thank you for the quick fix and the info, I will wait for 3.8.9 to trickle down 
into DBD::SQLite. AF
Dan Kennedy <danielk1...@gmail.com> írta:
>On 01/30/2015 10:49 PM, Dominique Devienne wrote:
>> On Fri, Jan 30, 2015 at 4:38 PM, Dan Kennedy <danielk1...@gmail.com> wrote:
>>> On 01/29/2015 02:29 AM, farkas andras wrote:
>>>> [...] but searches based on ROWID are atrociously slow and hog massive
>>>> amounts of memory [...]
>>> Looks like range constraints on rowids were only taken into account when
>>> there was also a MATCH term in the WHERE clause. Now fixed here:
>>> The fix should be in 3.8.9.
>> Just curious Dan. The tests added do not seem to check the query plans
>> despite the report being about a performance issue. I only skimmed them,
>> and Im unfamiliar with TCL and the exact specifics of SQLite testing, 
>> so I
>> could well have missed them, but I do recall seen other perf tests checking
>> execution plans, in addition to checking correctness. Did I miss them?
>Fair point. It would be better if there were tests to show that the 
>queries were being correctly optimized.
>But the change was fairly trivial, and I didnt think there was much 
>chance that it would fail to optimize the queries correctly. Also, its 
>a pretty obscure optimization (one complaint in how many years?), so I 
>figured it wasnt all that important. Finally its fiddly to test in 
>this case, as the EXPLAIN QUERY PLAN (or even EXPLAIN) output is not 
>sufficient to figure out if its working properly or not. So I just 
>checked by hand that the optimization is working.
>On the other hand, that the change could contain some bug related to 
>integer overflow or some other boundary condition is a real risk. So the 
>tests focus on that.
>sqlite-users mailing list
sqlite-users mailing list

[sqlite] FTS5

2015-02-11 Thread farkas andras
Hi all, I found a previous FTS5 thread and, encouraged by the comments of Dan 
Kennedy, thought I would comment on the issue. - Smaller memory footprint and 
more speed is always great. Im already very impressed with the speed but 
even faster is even better of course. My experience is that searches that 
produce few hits are very fast (well under a second on a db with 10M+ records). 
Searches that produce many hits (tens of thousands) are much slower: several 
seconds, or even minutes if there are 100,000+ hits. I can live with that, but 
improvement on queries with many hits would be welcome. - I would probably also 
make tokenize=unicode61 "remove_diacritics=0" the default tokenization 
behaviour instead of simple, but thats a minor issue. - In my usage, the 
most inconvenient limitation is that the first  search term cant be 
negative in queries (i.e. MATCH foo NOT bar is good but MATCH NOT bar foo 
throws an error). I would also like to have negative-only queries (MATCH NOT f
 oo, returning all records that dont contain foo). Negative-only queries 
would mostly be used in combination (INTERSECT) with a positive query on 
another column. I know this is probably not a common need, but one can dream. - 
Fuzzy matching would be useful as well, but obviously thats a major 
feature and introducing it might well compromise performance.
- Same for in-word matching (i.e. MATCH reasonable also matching "unreasonable")
- Same for advanced matching like matching 3 out of 4 search terms if there is 
no match with 4 out of 4, or ranking hits based on how close to each other 
terms occur.
- For some reason, searches like SELECT * FROM ftstable WHERE col1 MATCH ? 
INTERSECT SELECT * FROM ftstable WHERE col2 MATCH ? run very slowly for me. 
Much slower than running the two queries separately. This may not be related to 
FTS per se, and maybe the query could be written better.
- BTW, will there be full backwards compatibility? And I assume one will need 
to recreate (export/reimport) existing databases with FTS5 in order to enjoy 
the new features, right? AF 
"Fts5 will use less memory and be faster than fts4 (I think - initial 
testing has been positive). It will also be smaller, as we can do 
without a bunch of code that is used to workaround problems inherent in 
the file-format. 
The most user-visible change is the addition of an API that allows users 
to write their own auxiliary (i.e. snippet(), rank(), offsets()) functions: 
The included snippet() and rank() functions use this API. 
Fts5 is still in the experimental stage at the moment. 
If anybody has any ideas for useful features, or knows of problems with 
FTS4 that could be fixed in FTS5, dont keep them to yourself! - Dan 
sqlite-users mailing list