On Wed, 14 Feb 2018 14:26:21 +0000
Wout Mertens <wout.mert...@gmail.com> wrote:

> I too am interested in this answer, I still have to start using fts5.
> 
> What would be interesting is to see the `EXPLAIN QUERY PLAN [query]` for
> each of your queries, so as to see what causes the slowness.
> 

It is clear what causes the slowness. For example here is one query:

select 
  PostFTS.rowid,
  PostFTS.user as UserName,
  P.userID,
  U.av_time as AVer,
  PostFTS.slug,
  strftime('%d.%m.%Y %H:%M:%S', P.postTime, 'unixepoch') as PostTime,
  P.ReadCount,
  snippet(PostFTS, 0, '*', '*', '...', 16) as Content,
  PostFTS.Caption,
  (select count() from UnreadPosts UP where UP.UserID = 2 and UP.PostID = 
PostFTS.rowid) as Unread
from
  PostFTS
  left join Posts P on P.id = PostFTS.rowid
  left join Users U on U.id = P.userID
where
  PostFTS match "user: s*"
order by SOME_CLAUSE
limit 20;

1. With SOME_CLAUSE=rank, the execution time is between 28ms and 40ms
2. With SOME_CLAUSE=P.PostTime, the execution time is approximately 500ms!
3. Without "order by" clause at all, the execution time is 1.1ms.

The respective EXPLAIN QUERY PLAN:

1. order by rank (28..40ms)
0       0       0       SCAN TABLE PostFTS VIRTUAL TABLE INDEX 327713:
0       1       1       SEARCH TABLE Posts AS P USING INTEGER PRIMARY KEY 
(rowid=?)
0       2       2       SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY 
(rowid=?)

2. order by P.PostTime (500ms)
0       0       0       SCAN TABLE PostFTS VIRTUAL TABLE INDEX 327681:
0       1       1       SEARCH TABLE Posts AS P USING INTEGER PRIMARY KEY 
(rowid=?)
0       2       2       SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY 
(rowid=?)
0       0       0       USE TEMP B-TREE FOR ORDER BY

3. without order by: (1.1ms)
0       0       0       SCAN TABLE PostFTS VIRTUAL TABLE INDEX 327681:
0       1       1       SEARCH TABLE Posts AS P USING INTEGER PRIMARY KEY 
(rowid=?)
0       2       2       SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY 
(rowid=?)

Obviously, the slow down is because of the "USE TEMP B-TREE FOR ORDER BY". 
Order by any field
other than "rank" and "rowid" makes this query very slow.


> On Thu, Feb 8, 2018, 7:14 PM John Found, <johnfo...@asm32.info> wrote:
> 
> >
> > I am using FTS5 for pretty complex search in my application, but recently,
> > trying to make it even more complex I faced some problems that are more
> > general than only FTS5.
> >
> > I have a forum engine where are several tables for the threads, for the
> > posts, for the users etc. At first I want to be able to search in the posts
> > text, but moreover, this search have to be limiter to some subset of the
> > posts, for example in the posts of a particular thread or posts of some
> > user. Also, there are cases where free-text search is not actually
> > necessary, for example when I am searching for all posts from a particular
> > user.
> >
> > At first, I tried to create a FTS5 table, containing only the text data
> > that need to be searched and then to access it by queries of the type:
> >
> >     select
> >       some,
> >       fields
> >     from
> >       fts
> >       left join posts p on p.id = fts.rowid
> >       left join threads t on t.id = p.threadid
> >       left join users u on u.id = p.userid
> >     where
> >       fts match ?1 and u.nick = ?2 and t.id = ?3
> >     order by ORDER
> >
> > Such queries are pretty fast when there is only fts match directive in the
> > where clause.
> > But any additional condition added ruins the performance, especially if
> > the fts match returns big amount of matches.
> >
> > Additional problem is the order by clause. If the ORDER BY term is "rank"
> > everything works great, but changing it to
> > other field (for example the post time in order to get first most recent
> > posts) causes huge slow down of the query.
> >
> > My second attempt was to sacrifice space for speed and to put all
> > searchable data in the fts table - post text, the thread titles and the
> > usernames. This way, building complex fts queries kind of:
> >
> >    (content: ?1 OR caption: ?2) AND thread: ?3 AND user: ?4
> >
> > I can leave only the fts query in the WHERE clause. This way, the search
> > is pretty fast, but the huge problem remains
> > the ORDER BY clause. Again everything works fine with "rank", but attempts
> > to use any other field for sorting, causes
> > huge probems: slow downs up to tens of seconds (usual search time is few
> > milliseconds) and out of memory errors.
> >
> > Such problems with this second approach are even more serious than on the
> > first approach. i.e. with the second approach everything works fine and
> > quick with "rank" order by, and very, very slow and with errors, on any
> > other "order by" option.
> >
> > So, he main question follows:
> >
> > What is the right way to design such complex search systems, based on FTS?
> > How to properly approach the sorting of the search results in order to not
> > have so big slowdowns and out of memory errors.
> >
> > Any tips are highly welcome!
> >
> > Regards
> > --

-- 
http://fresh.flatassembler.net
http://asm32.info
John Found <johnfo...@asm32.info>
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to