I am using FTS5 for pretty complex search in my application, but recently,
trying to make it even more complex I faced some problems that are more general
than only FTS5.
I have a forum engine where are several tables for the threads, for the posts,
for the users etc. At first I want to be able to search in the posts text, but
moreover, this search have to be limiter to some subset of the posts, for
example in the posts of a particular thread or posts of some user. Also, there
are cases where free-text search is not actually necessary, for example when I
am searching for all posts from a particular user.
At first, I tried to create a FTS5 table, containing only the text data that
need to be searched and then to access it by queries of the type:
select
some,
fields
from
fts
left join posts p on p.id = fts.rowid
left join threads t on t.id = p.threadid
left join users u on u.id = p.userid
where
fts match ?1 and u.nick = ?2 and t.id = ?3
order by ORDER
Such queries are pretty fast when there is only fts match directive in the
where clause.
But any additional condition added ruins the performance, especially if the fts
match returns big amount of matches.
Additional problem is the order by clause. If the ORDER BY term is "rank"
everything works great, but changing it to
other field (for example the post time in order to get first most recent posts)
causes huge slow down of the query.
My second attempt was to sacrifice space for speed and to put all searchable
data in the fts table - post text, the thread titles and the usernames. This
way, building complex fts queries kind of:
(content: ?1 OR caption: ?2) AND thread: ?3 AND user: ?4
I can leave only the fts query in the WHERE clause. This way, the search is
pretty fast, but the huge problem remains
the ORDER BY clause. Again everything works fine with "rank", but attempts to
use any other field for sorting, causes
huge probems: slow downs up to tens of seconds (usual search time is few
milliseconds) and out of memory errors.
Such problems with this second approach are even more serious than on the first
approach. i.e. with the second approach everything works fine and quick with
"rank" order by, and very, very slow and with errors, on any other "order by"
option.
So, he main question follows:
What is the right way to design such complex search systems, based on FTS? How
to properly approach the sorting of the search results in order to not have so
big slowdowns and out of memory errors.
Any tips are highly welcome!
Regards
--
http://fresh.flatassembler.net
http://asm32.info
John Found <[email protected]>
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users