Re: [sqlite] Need some tips on using FTS5 with SQLite

2018-02-16 Thread John Found
On Wed, 14 Feb 2018 14:26:21 +
Wout Mertens  wrote:

> I too am interested in this answer, I still have to start using fts5.
> 
> What would be interesting is to see the `EXPLAIN QUERY PLAN [query]` for
> each of your queries, so as to see what causes the slowness.
> 

It is clear what causes the slowness. For example here is one query:

select 
  PostFTS.rowid,
  PostFTS.user as UserName,
  P.userID,
  U.av_time as AVer,
  PostFTS.slug,
  strftime('%d.%m.%Y %H:%M:%S', P.postTime, 'unixepoch') as PostTime,
  P.ReadCount,
  snippet(PostFTS, 0, '*', '*', '...', 16) as Content,
  PostFTS.Caption,
  (select count() from UnreadPosts UP where UP.UserID = 2 and UP.PostID = 
PostFTS.rowid) as Unread
from
  PostFTS
  left join Posts P on P.id = PostFTS.rowid
  left join Users U on U.id = P.userID
where
  PostFTS match "user: s*"
order by SOME_CLAUSE
limit 20;

1. With SOME_CLAUSE=rank, the execution time is between 28ms and 40ms
2. With SOME_CLAUSE=P.PostTime, the execution time is approximately 500ms!
3. Without "order by" clause at all, the execution time is 1.1ms.

The respective EXPLAIN QUERY PLAN:

1. order by rank (28..40ms)
0   0   0   SCAN TABLE PostFTS VIRTUAL TABLE INDEX 327713:
0   1   1   SEARCH TABLE Posts AS P USING INTEGER PRIMARY KEY 
(rowid=?)
0   2   2   SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY 
(rowid=?)

2. order by P.PostTime (500ms)
0   0   0   SCAN TABLE PostFTS VIRTUAL TABLE INDEX 327681:
0   1   1   SEARCH TABLE Posts AS P USING INTEGER PRIMARY KEY 
(rowid=?)
0   2   2   SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY 
(rowid=?)
0   0   0   USE TEMP B-TREE FOR ORDER BY

3. without order by: (1.1ms)
0   0   0   SCAN TABLE PostFTS VIRTUAL TABLE INDEX 327681:
0   1   1   SEARCH TABLE Posts AS P USING INTEGER PRIMARY KEY 
(rowid=?)
0   2   2   SEARCH TABLE Users AS U USING INTEGER PRIMARY KEY 
(rowid=?)

Obviously, the slow down is because of the "USE TEMP B-TREE FOR ORDER BY". 
Order by any field
other than "rank" and "rowid" makes this query very slow.


> On Thu, Feb 8, 2018, 7:14 PM John Found,  wrote:
> 
> >
> > I am using FTS5 for pretty complex search in my application, but recently,
> > trying to make it even more complex I faced some problems that are more
> > general than only FTS5.
> >
> > I have a forum engine where are several tables for the threads, for the
> > posts, for the users etc. At first I want to be able to search in the posts
> > text, but moreover, this search have to be limiter to some subset of the
> > posts, for example in the posts of a particular thread or posts of some
> > user. Also, there are cases where free-text search is not actually
> > necessary, for example when I am searching for all posts from a particular
> > user.
> >
> > At first, I tried to create a FTS5 table, containing only the text data
> > that need to be searched and then to access it by queries of the type:
> >
> > select
> >   some,
> >   fields
> > from
> >   fts
> >   left join posts p on p.id = fts.rowid
> >   left join threads t on t.id = p.threadid
> >   left join users u on u.id = p.userid
> > where
> >   fts match ?1 and u.nick = ?2 and t.id = ?3
> > order by ORDER
> >
> > Such queries are pretty fast when there is only fts match directive in the
> > where clause.
> > But any additional condition added ruins the performance, especially if
> > the fts match returns big amount of matches.
> >
> > Additional problem is the order by clause. If the ORDER BY term is "rank"
> > everything works great, but changing it to
> > other field (for example the post time in order to get first most recent
> > posts) causes huge slow down of the query.
> >
> > My second attempt was to sacrifice space for speed and to put all
> > searchable data in the fts table - post text, the thread titles and the
> > usernames. This way, building complex fts queries kind of:
> >
> >(content: ?1 OR caption: ?2) AND thread: ?3 AND user: ?4
> >
> > I can leave only the fts query in the WHERE clause. This way, the search
> > is pretty fast, but the huge problem remains
> > the ORDER BY clause. Again everything works fine with "rank", but attempts
> > to use any other field for sorting, causes
> > huge probems: slow downs up to tens of seconds (usual search time is few
> > milliseconds) and out of memory errors.
> >
> > Such problems with this second approach are even more serious than on the
> > first approach. i.e. with the second approach everything works fine and
> > quick with "rank" order by, and very, very slow and with errors, on any
> > other "order by" option.
> >
> > So, he main question follows:
> >
> > What is the right way to design such complex search systems, based on FTS?
> > How to properly approach the sorting of the search results in order to not
> > have so big slowdowns and out of memory errors.
> >
> > Any tips are high

Re: [sqlite] Need some tips on using FTS5 with SQLite

2018-02-14 Thread Wout Mertens
I too am interested in this answer, I still have to start using fts5.

What would be interesting is to see the `EXPLAIN QUERY PLAN [query]` for
each of your queries, so as to see what causes the slowness.

On Thu, Feb 8, 2018, 7:14 PM John Found,  wrote:

>
> I am using FTS5 for pretty complex search in my application, but recently,
> trying to make it even more complex I faced some problems that are more
> general than only FTS5.
>
> I have a forum engine where are several tables for the threads, for the
> posts, for the users etc. At first I want to be able to search in the posts
> text, but moreover, this search have to be limiter to some subset of the
> posts, for example in the posts of a particular thread or posts of some
> user. Also, there are cases where free-text search is not actually
> necessary, for example when I am searching for all posts from a particular
> user.
>
> At first, I tried to create a FTS5 table, containing only the text data
> that need to be searched and then to access it by queries of the type:
>
> select
>   some,
>   fields
> from
>   fts
>   left join posts p on p.id = fts.rowid
>   left join threads t on t.id = p.threadid
>   left join users u on u.id = p.userid
> where
>   fts match ?1 and u.nick = ?2 and t.id = ?3
> order by ORDER
>
> Such queries are pretty fast when there is only fts match directive in the
> where clause.
> But any additional condition added ruins the performance, especially if
> the fts match returns big amount of matches.
>
> Additional problem is the order by clause. If the ORDER BY term is "rank"
> everything works great, but changing it to
> other field (for example the post time in order to get first most recent
> posts) causes huge slow down of the query.
>
> My second attempt was to sacrifice space for speed and to put all
> searchable data in the fts table - post text, the thread titles and the
> usernames. This way, building complex fts queries kind of:
>
>(content: ?1 OR caption: ?2) AND thread: ?3 AND user: ?4
>
> I can leave only the fts query in the WHERE clause. This way, the search
> is pretty fast, but the huge problem remains
> the ORDER BY clause. Again everything works fine with "rank", but attempts
> to use any other field for sorting, causes
> huge probems: slow downs up to tens of seconds (usual search time is few
> milliseconds) and out of memory errors.
>
> Such problems with this second approach are even more serious than on the
> first approach. i.e. with the second approach everything works fine and
> quick with "rank" order by, and very, very slow and with errors, on any
> other "order by" option.
>
> So, he main question follows:
>
> What is the right way to design such complex search systems, based on FTS?
> How to properly approach the sorting of the search results in order to not
> have so big slowdowns and out of memory errors.
>
> Any tips are highly welcome!
>
> Regards
> --
> http://fresh.flatassembler.net
> http://asm32.info
> John Found 
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Need some tips on using FTS5 with SQLite

2018-02-08 Thread John Found

I am using FTS5 for pretty complex search in my application, but recently, 
trying to make it even more complex I faced some problems that are more general 
than only FTS5.

I have a forum engine where are several tables for the threads, for the posts, 
for the users etc. At first I want to be able to search in the posts text, but 
moreover, this search have to be limiter to some subset of the posts, for 
example in the posts of a particular thread or posts of some user. Also, there 
are cases where free-text search is not actually necessary, for example when I 
am searching for all posts from a particular user.

At first, I tried to create a FTS5 table, containing only the text data that 
need to be searched and then to access it by queries of the type:

select 
  some, 
  fields 
from 
  fts 
  left join posts p on p.id = fts.rowid
  left join threads t on t.id = p.threadid
  left join users u on u.id = p.userid
where
  fts match ?1 and u.nick = ?2 and t.id = ?3
order by ORDER

Such queries are pretty fast when there is only fts match directive in the 
where clause. 
But any additional condition added ruins the performance, especially if the fts 
match returns big amount of matches.

Additional problem is the order by clause. If the ORDER BY term is "rank" 
everything works great, but changing it to
other field (for example the post time in order to get first most recent posts) 
causes huge slow down of the query.

My second attempt was to sacrifice space for speed and to put all searchable 
data in the fts table - post text, the thread titles and the usernames. This 
way, building complex fts queries kind of:

   (content: ?1 OR caption: ?2) AND thread: ?3 AND user: ?4

I can leave only the fts query in the WHERE clause. This way, the search is 
pretty fast, but the huge problem remains
the ORDER BY clause. Again everything works fine with "rank", but attempts to 
use any other field for sorting, causes
huge probems: slow downs up to tens of seconds (usual search time is few 
milliseconds) and out of memory errors. 

Such problems with this second approach are even more serious than on the first 
approach. i.e. with the second approach everything works fine and quick with 
"rank" order by, and very, very slow and with errors, on any other "order by" 
option.

So, he main question follows:

What is the right way to design such complex search systems, based on FTS? How 
to properly approach the sorting of the search results in order to not have so 
big slowdowns and out of memory errors.

Any tips are highly welcome!

Regards
-- 
http://fresh.flatassembler.net
http://asm32.info
John Found 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users