Re: [sqlite] Optimizing list retrieval with a FTS3 table

2011-03-17 Thread Travis Orr
Scott, 

Thank you for clarifying the inefficiency of FTS3 when not using a MATCH
criteria. Unfortunately there are other use cases that do require the
MATCH criteria so the FTS3 is required. 

I believe reading everything into a temp table would consume too much
memory as in an ideal situation we would not place an upper bound on the
number of songs that can be indexed. My next thought is to go ahead and
keep two separate tables with the same information. One a FTS3 table for
when searching by a word is necessary and another that does not use FTS3
for all other cases. Using TRIGGERS and VIEWS it should be relatively
simple to keep both tables up to date. Also the FTS3 version of the
table will not need all of the information that is in the main table
since not all columns need to be searchable.

In case you still cared, here is the current schema.

query = "CREATE VIRTUAL TABLE " LDB_ST " USING fts3 (
tokenize='unaccent', "
LDB_ST_ID " INTEGER PRIMARY KEY,
"
LDB_ST_FPATH " TEXT NOT NULL, "
LDB_ST_TITLE " TEXT NOT NULL, "
LDB_ST_ARTIST " TEXT NOT NULL, "
LDB_ST_ALBUM " TEXT NOT NULL, "
LDB_ST_TRACKNUM " INTEGER, "
LDB_ST_GENRE " TEXT NOT NULL, "
LDB_ST_BPM " REAL NOT NULL, "
LDB_ST_TAPPED_BPM " REAL NOT
NULL, "
LDB_ST_PLAYTIME " INTEGER NOT
NULL, "
LDB_ST_COMMENTS " TEXT NOT NULL,
"
LDB_ST_ISANALYSED " INTEGER NOT
NULL, "
LDB_ST_SESSION " INTEGER, "
LDB_ST_BPM_TYPE " INTEGER); ";



Thanks again,

Travis

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Scott Hess
Sent: March-16-11 4:20 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Optimizing list retrieval with a FTS3 table

On Wed, Mar 16, 2011 at 12:15 PM, Travis Orr <t...@ivl.com> wrote:
> Can someone enlighten me about this. To me a lot of the details appear

> to be hidden since my main SongTable is a FTS3 virtual table.

You don't provide your schema, but based on your queries, I'll make
unwarranted assumptions :-).

In fts3, there is a rowid column (standard SQLite meaning), a docid
column which is an alias of rowid, and all the columns you define are
TEXT columns.  If you say:

CREATE VIRTUAL TABLE MyTable USING FTS3(
  songid INTEGER PRIMARY KEY AUTO_INCREMENT MAGIC KEYWORDS,
  title VARCHAR(23),
  recorded DATETIME
);

All three of those columns are TEXT..  Based on your queries, I'm
betting that you're assuming that the various typing keywords for a
CREATE TABLE statement apply, but they don't.  If you want to know why,
you can scan the archives or read the source code, but suffice to say
that this is the truth at this time.

Anyhow, the gist of it is that the FTS3 table has a full-text index on
the TEXT of the columns, and that any other queries will be full table
scans, as if there were no optimizations at all.  So complicated queries
with ORDER BY, LIMIT, and OFFSET can absolutely destroy performance if
your result sets are all all big (or can be big, watch for the query of
death!).  If you will not be using MATCH, then there is no gain at all
from FTS3, and you should consider just using a regular table.

As I understand your problem, the solution I'd probably use would be to
create a new temporary table to hold the data while scanning it.
So something like:

CREATE TEMPORARY TABLE MyResults AS
  SELECT docid, title, artist FROM songtable WHERE ... ORDER BY ...;

I _think_ the resulting table will effectively capture the ORDER BY
results, so you can then scan it using OFFSET and LIMIT (or rowid)
efficiently.  If this is too big, you could experiment with capturing
only the docid values in order, and then joining MyResults back against
songtable to get the original values.  That won't be particularly
efficient with OFFSET and LIMIT, but it should be able to join directly
with songtable.docid, so it shouldn't be particularly inefficient,
either.

Of course, you could also just read the entire docid set into memory and
manage it that way.  It's a little cumbersome because then you have to
keep re-binding the query to walk through things, but it probably won't
perform any worse.

-scott
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing list retrieval with a FTS3 table

2011-03-16 Thread Scott Hess
On Wed, Mar 16, 2011 at 12:15 PM, Travis Orr  wrote:
> Can someone enlighten me about this. To me a lot of the details appear
> to be hidden since my main SongTable is a FTS3 virtual table.

You don't provide your schema, but based on your queries, I'll make
unwarranted assumptions :-).

In fts3, there is a rowid column (standard SQLite meaning), a docid
column which is an alias of rowid, and all the columns you define are
TEXT columns.  If you say:

CREATE VIRTUAL TABLE MyTable USING FTS3(
  songid INTEGER PRIMARY KEY AUTO_INCREMENT MAGIC KEYWORDS,
  title VARCHAR(23),
  recorded DATETIME
);

All three of those columns are TEXT..  Based on your queries, I'm
betting that you're assuming that the various typing keywords for a
CREATE TABLE statement apply, but they don't.  If you want to know
why, you can scan the archives or read the source code, but suffice to
say that this is the truth at this time.

Anyhow, the gist of it is that the FTS3 table has a full-text index on
the TEXT of the columns, and that any other queries will be full table
scans, as if there were no optimizations at all.  So complicated
queries with ORDER BY, LIMIT, and OFFSET can absolutely destroy
performance if your result sets are all all big (or can be big, watch
for the query of death!).  If you will not be using MATCH, then there
is no gain at all from FTS3, and you should consider just using a
regular table.

As I understand your problem, the solution I'd probably use would be
to create a new temporary table to hold the data while scanning it.
So something like:

CREATE TEMPORARY TABLE MyResults AS
  SELECT docid, title, artist FROM songtable WHERE ... ORDER BY ...;

I _think_ the resulting table will effectively capture the ORDER BY
results, so you can then scan it using OFFSET and LIMIT (or rowid)
efficiently.  If this is too big, you could experiment with capturing
only the docid values in order, and then joining MyResults back
against songtable to get the original values.  That won't be
particularly efficient with OFFSET and LIMIT, but it should be able to
join directly with songtable.docid, so it shouldn't be particularly
inefficient, either.

Of course, you could also just read the entire docid set into memory
and manage it that way.  It's a little cumbersome because then you
have to keep re-binding the query to walk through things, but it
probably won't perform any worse.

-scott
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing list retrieval with a FTS3 table

2011-03-16 Thread Jim Wilcoxson
On Wed, Mar 16, 2011 at 4:12 PM, Travis Orr <t...@ivl.com> wrote:
> Jim, just did a test with your recommendation, however that ended up
> using too much memory, since it is selecting the entire list in on go
> and keeping the memory for a significant amount of time, for the
> embedded system it is running on. Other tasks started having problems.
>
> Travis

I think it should only use cachesize*pagesize memory, so you should be
able to control memory usage with pragma cache_size.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


>
> -Original Message-
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Jim Wilcoxson
> Sent: March-16-11 12:51 PM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Optimizing list retrieval with a FTS3 table
>
> On Wed, Mar 16, 2011 at 3:15 PM, Travis Orr <t...@ivl.com> wrote:
>> I am currently working on a project that requires retrieving a list of
>> all the rows from a FTS3 table. The ordering of the results varies by
>> search criteria. Since this is for an embedded project the list
> results
>> are passed in chunks to another module to give the appearance of
> faster
>> operations.
>
> Somewhere you have some state information so that you know what offset
> to use.  Store the SQLite cursor with that state information, and use
> it to fetch the next 2000 rows on each call, ie, only do the query
> once.
>
> Jim
> --
> HashBackup: easy onsite and offsite Unix backup
> http://www.hashbackup.com
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing list retrieval with a FTS3 table

2011-03-16 Thread Travis Orr
Jim, just did a test with your recommendation, however that ended up
using too much memory, since it is selecting the entire list in on go
and keeping the memory for a significant amount of time, for the
embedded system it is running on. Other tasks started having problems.

Travis

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Jim Wilcoxson
Sent: March-16-11 12:51 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Optimizing list retrieval with a FTS3 table

On Wed, Mar 16, 2011 at 3:15 PM, Travis Orr <t...@ivl.com> wrote:
> I am currently working on a project that requires retrieving a list of
> all the rows from a FTS3 table. The ordering of the results varies by
> search criteria. Since this is for an embedded project the list
results
> are passed in chunks to another module to give the appearance of
faster
> operations.

Somewhere you have some state information so that you know what offset
to use.  Store the SQLite cursor with that state information, and use
it to fetch the next 2000 rows on each call, ie, only do the query
once.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing list retrieval with a FTS3 table

2011-03-16 Thread Travis Orr
Jim,

That would require maintaining the current connection and using a
prepared statement, correct? 

Thanks for recommending that I had considered that earlier but couldn't
afford to have the db locked from writes for the length of time a query
could take. At that point in time I was still on Sqlite 3.6.xx without
shared-cache enabled. Since then I have pulled Sqlite 3.7.5 and am using
the WAL making this feasible.

Thanks,
Travis

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Jim Wilcoxson
Sent: March-16-11 12:51 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Optimizing list retrieval with a FTS3 table

On Wed, Mar 16, 2011 at 3:15 PM, Travis Orr <t...@ivl.com> wrote:
> I am currently working on a project that requires retrieving a list of

> all the rows from a FTS3 table. The ordering of the results varies by 
> search criteria. Since this is for an embedded project the list 
> results are passed in chunks to another module to give the appearance 
> of faster operations.

Somewhere you have some state information so that you know what offset
to use.  Store the SQLite cursor with that state information, and use it
to fetch the next 2000 rows on each call, ie, only do the query once.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing list retrieval with a FTS3 table

2011-03-16 Thread Jim Wilcoxson
On Wed, Mar 16, 2011 at 3:15 PM, Travis Orr  wrote:
> I am currently working on a project that requires retrieving a list of
> all the rows from a FTS3 table. The ordering of the results varies by
> search criteria. Since this is for an embedded project the list results
> are passed in chunks to another module to give the appearance of faster
> operations.

Somewhere you have some state information so that you know what offset
to use.  Store the SQLite cursor with that state information, and use
it to fetch the next 2000 rows on each call, ie, only do the query
once.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Optimizing list retrieval with a FTS3 table

2011-03-16 Thread Travis Orr
I am currently working on a project that requires retrieving a list of
all the rows from a FTS3 table. The ordering of the results varies by
search criteria. Since this is for an embedded project the list results
are passed in chunks to another module to give the appearance of faster
operations.

 

Currently the selection of rows is done as follows:

SELECT songid, title, artist FROM SongTable ORDER BY title COLLATE
NOCASE DESC, artist COLLATE NOCASE DESC, songid DESC LIMIT 2000 OFFSET
4000;

With the offset increasing for each subsequent page of data. The big
problem I have with this is that when the offset gets large there is a
huge inefficiency because all rows are selected and then stepped through
until the offset is reached. My current test database has 34,000 rows.

 

I am considering changing to queries of the following style but am
unsure of the performance gains since I will still not be using the
MATCH operator on the data.

SELECT songid, title, artist FROM songtable WHERE (title = "some title
AND artist = "some artist" AND songid > 4194419) OR (title = "some
title" AND artist > "some artist") OR title > "some title" ORDER BY
title, artist, songid LIMIT 2000;

When using the "explain query plan" operator I see that both of the
above cases only use "VIRTUAL TABLE INDEX 0"

 

The advantage of the second case is that is only selects the necessary
rows, however it does use need to perform comparisons and I am unsure
what the performance hit of the comparison operators on a FTS3 table is.

 

sqlite> explain query plan SELECT songid, title, artist FROM songtable
WHERE (title = "some title AND artist = "some artist" AND songid >
4194419) OR (title = "some title" AND artist > "some artist") OR title >
"some title" ORDER BY title, artist, songid LIMIT 2000;

0|0|0|SCAN TABLE songtable VIRTUAL TABLE INDEX 0: (~0 rows)

0|0|0|SCAN TABLE songtable VIRTUAL TABLE INDEX 0: (~0 rows)

0|0|0|SCAN TABLE songtable VIRTUAL TABLE INDEX 0: (~0 rows)

0|0|0|USE TEMP B-TREE FOR ORDER BY

sqlite> explain query plan SELECT songid, title, artist FROM SongTable
ORDER BY title COLLATE NOCASE DESC, artist COLLATE NOCASE DESC, songid
DESC LIMIT 2000 OFFSET 4000;

0|0|0|SCAN TABLE songtable VIRTUAL TABLE INDEX 0: (~0 rows)

0|0|0|USE TEMP B-TREE FOR ORDER BY

 

Can someone enlighten me about this. To me a lot of the details appear
to be hidden since my main SongTable is a FTS3 virtual table.

 

I hope this makes sense.

 

Thanks,

Travis

 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users