I am using FTS5 for pretty complex search in my application, but recently, 
trying to make it even more complex I faced some problems that are more general 
than only FTS5.

I have a forum engine where are several tables for the threads, for the posts, 
for the users etc. At first I want to be able to search in the posts text, but 
moreover, this search have to be limiter to some subset of the posts, for 
example in the posts of a particular thread or posts of some user. Also, there 
are cases where free-text search is not actually necessary, for example when I 
am searching for all posts from a particular user.

At first, I tried to create a FTS5 table, containing only the text data that 
need to be searched and then to access it by queries of the type:

      left join posts p on p.id = fts.rowid
      left join threads t on t.id = p.threadid
      left join users u on u.id = p.userid
      fts match ?1 and u.nick = ?2 and t.id = ?3
    order by ORDER

Such queries are pretty fast when there is only fts match directive in the 
where clause. 
But any additional condition added ruins the performance, especially if the fts 
match returns big amount of matches.

Additional problem is the order by clause. If the ORDER BY term is "rank" 
everything works great, but changing it to
other field (for example the post time in order to get first most recent posts) 
causes huge slow down of the query.

My second attempt was to sacrifice space for speed and to put all searchable 
data in the fts table - post text, the thread titles and the usernames. This 
way, building complex fts queries kind of:

   (content: ?1 OR caption: ?2) AND thread: ?3 AND user: ?4

I can leave only the fts query in the WHERE clause. This way, the search is 
pretty fast, but the huge problem remains
the ORDER BY clause. Again everything works fine with "rank", but attempts to 
use any other field for sorting, causes
huge probems: slow downs up to tens of seconds (usual search time is few 
milliseconds) and out of memory errors. 

Such problems with this second approach are even more serious than on the first 
approach. i.e. with the second approach everything works fine and quick with 
"rank" order by, and very, very slow and with errors, on any other "order by" 

So, he main question follows:

What is the right way to design such complex search systems, based on FTS? How 
to properly approach the sorting of the search results in order to not have so 
big slowdowns and out of memory errors.

Any tips are highly welcome!

John Found <johnfo...@asm32.info>
sqlite-users mailing list

Reply via email to