[sqlite] FTS5
Hi all, I found a previous FTS5 thread and, encouraged by the comments of Dan Kennedy, thought I would comment on the issue. - Smaller memory footprint and more speed is always great. Im already very impressed with the speed but even faster is even better of course. My experience is that searches that produce few hits are very fast (well under a second on a db with 10M+ records). Searches that produce many hits (tens of thousands) are much slower: several seconds, or even minutes if there are 100,000+ hits. I can live with that, but improvement on queries with many hits would be welcome. - I would probably also make tokenize=unicode61 "remove_diacritics=0" the default tokenization behaviour instead of simple, but thats a minor issue. - In my usage, the most inconvenient limitation is that the first search term cant be negative in queries (i.e. MATCH foo NOT bar is good but MATCH NOT bar foo throws an error). I would also like to have negative-only queries (MATCH NOT f oo, returning all records that dont contain foo). Negative-only queries would mostly be used in combination (INTERSECT) with a positive query on another column. I know this is probably not a common need, but one can dream. - Fuzzy matching would be useful as well, but obviously thats a major feature and introducing it might well compromise performance. - Same for in-word matching (i.e. MATCH reasonable also matching "unreasonable") - Same for advanced matching like matching 3 out of 4 search terms if there is no match with 4 out of 4, or ranking hits based on how close to each other terms occur. - For some reason, searches like SELECT * FROM ftstable WHERE col1 MATCH ? INTERSECT SELECT * FROM ftstable WHERE col2 MATCH ? run very slowly for me. Much slower than running the two queries separately. This may not be related to FTS per se, and maybe the query could be written better. - BTW, will there be full backwards compatibility? And I assume one will need to recreate (export/reimport) existing databases with FTS5 in order to enjoy the new features, right? AF Context: "Fts5 will use less memory and be faster than fts4 (I think - initial testing has been positive). It will also be smaller, as we can do without a bunch of code that is used to workaround problems inherent in the file-format. ... The most user-visible change is the addition of an API that allows users to write their own auxiliary (i.e. snippet(), rank(), offsets()) functions: http://www.sqlite.org/src/artifact/064f9bf705e59d The included snippet() and rank() functions use this API. ... Fts5 is still in the experimental stage at the moment. If anybody has any ideas for useful features, or knows of problems with FTS4 that could be fixed in FTS5, dont keep them to yourself! - Dan Kennedy" ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] ROWID-based queries slow in FTS db
Thank you for the quick fix and the info, I will wait for 3.8.9 to trickle down into DBD::SQLite. AF Dan Kennedy <danielk1...@gmail.com> írta: >On 01/30/2015 10:49 PM, Dominique Devienne wrote: >> On Fri, Jan 30, 2015 at 4:38 PM, Dan Kennedy <danielk1...@gmail.com> wrote: >> >>> On 01/29/2015 02:29 AM, farkas andras wrote: >>> >>>> [...] but searches based on ROWID are atrociously slow and hog massive >>>> amounts of memory [...] >>>> >>> Looks like range constraints on rowids were only taken into account when >>> there was also a MATCH term in the WHERE clause. Now fixed here: >>> >>>http://www.sqlite.org/src/info/85dc12625d300f >>> >>> The fix should be in 3.8.9. >> >> Just curious Dan. The tests added do not seem to check the query plans >> despite the report being about a performance issue. I only skimmed them, >> and Im unfamiliar with TCL and the exact specifics of SQLite testing, >> so I >> could well have missed them, but I do recall seen other perf tests checking >> execution plans, in addition to checking correctness. Did I miss them? > >Fair point. It would be better if there were tests to show that the >queries were being correctly optimized. > >But the change was fairly trivial, and I didnt think there was much >chance that it would fail to optimize the queries correctly. Also, its >a pretty obscure optimization (one complaint in how many years?), so I >figured it wasnt all that important. Finally its fiddly to test in >this case, as the EXPLAIN QUERY PLAN (or even EXPLAIN) output is not >sufficient to figure out if its working properly or not. So I just >checked by hand that the optimization is working. > >On the other hand, that the change could contain some bug related to >integer overflow or some other boundary condition is a real risk. So the >tests focus on that. > >Dan. > > >___ >sqlite-users mailing list >sqlite-users@sqlite.org >http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] ROWID-based queries slow in FTS db
Hi all, Im using FTS through DBD::SQLite (perl) to query large text databases (~10GB, ~10 million records). The regular FTS MATCH searches work fine (they usually run under a second), but searches based on ROWID are atrociously slow and hog massive amounts of memory. Im trying to retrieve a couple of adjacent rows like so: my $q_c = $dbh->prepare( "SELECT * FROM ftstable WHERE (ROWID BETWEEN 1000 AND 1040)" ); # my $q_c = $dbh->prepare( "SELECT * FROM ftstable LIMIT 1040 OFFSET 1000" ); # tried this too, it isnt any better $q_c->execute(); The execute takes several minutes and uses ~600 MB of memory. Now, http://www.sqlite.org/changes.html writes that: 3.8.1: FTS4 queries are better able to make use of docid<$limit constraints to limit the amount of I/O required Theres also this thread, indicating that rowid searches on FTS databases are optimized: http://sqlite.1065341.n5.nabble.com/FTS-full-text-query-vs-query-by-rowid-td77534.html I was using 3.7.x so I updated DBD::SQLite and that got me up to SQLite 3.8.7, but I see no significant improvement. Explain query gives the same result as the linked thread: 0|0|0|SCAN TABLE tmdata VIRTUAL TABLE INDEX 393216. Maybe there is a better way to write the query? If not, is there any hope that this will get fixed? Thanks,András Farkas ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users