OK!  This gives me something I can sink my teeth into!

I hacked together a TCL script which takes info about 100 albums
(snagged from a list on the Internet), makes up 10 song titles each,
makes up a path in the obvious way, and stuffs them into a table.
Then it times SELECT COUNT(*).  Three versions, a single table, split
between a regular table and fts3, and split between two regular tables
(convert the second table from fts3 to a regular table, as a control).
 The results were:

regular - 290 microseconds per iteration
regular w/fts3 - 63858 microseconds per iteration
regular w/regular - 1299 microseconds per iteration

I ran it 5 times and took the middle timing.  The queries looked like:
   SELECT COUNT(*) FROM Songs
   SELECT COUNT(*) FROM SongsBase JOIN SongsText ON ID=docid

So, indeed, there's room for improvement!

I have a patch which improves regular w/fts3 to '8048 microseconds per
iteration' by re-using the prepared statement better.  I need to spend
some time to think on correctness before checking it in, though.

AFAICT, the effect seems to be linear.  In stepping around in the
code, I'm not sure how much improvement can be gotten beyond my patch,
but I'll be thinking on it.

---

Additionally, if I let the fts3 table drive the query:
      SELECT COUNT(*) FROM SongsText CROSS JOIN SongsBase ON ID=docid

I get '2971 microseconds per iteration' without my patch.  The use of
"CROSS JOIN" is to force SongsText to be the outer loop (otherwise it
performs like the earlier statement).  This might be a problem with
the fulltextBestIndex hinting.

This join-ordering trick may have issues if you need a WHERE clause
which references SongsBase, and it may depend a lot on what your
queries are and their result sizes versus the size of your dataset.
Future cloudy!  The patch I mention above should improve joins from
SongsBase to SongsText.

Thanks for bearing with me,
scott


On Wed, Jul 23, 2008 at 7:47 AM, Jiri Hajek <[EMAIL PROTECTED]> wrote:
>> Again, you've given a relatively broad description of what you're
>> trying to do.  I could make up a bunch of stuff and answer my own
>> question, but you'd probably rather than I considered the problem
>> _you_ are having.
>
> Ok, I'll try to be as specific as possible. The main table I have is (the
> real version has much more fields, but it isn't important for our example):
>
> CREATE TABLE Songs (
>  ID INTEGER PRIMARY KEY AUTOINCREMENT,
>  Artist TEXT COLLATE IUNICODE,
>  Album TEXT COLLATE IUNICODE,
>  SongTitle TEXT COLLATE IUNICODE,
>  Path TEXT COLLATE IUNICODE,
>  Year INTEGER,
>  Bitrate INTEGER)
>
> This table can have even >100k records, even close to million and is mostly
> accessed by SELECTing all fields of some records, i.e.:
>
> SELECT * FROM Songs WHERE {something}
>
> In order to use FTS3, I could take all the text fields from Songs table and
> move them to a FTS3 table:
>
> CREATE TABLE SongsBase (
>  ID INTEGER PRIMARY KEY AUTOINCREMENT,
>  Year INTEGER,
>  Bitrate INTEGER)
>
> CREATE VIRTUAL TABLE SongsText USING FTS3(TOKENIZE mm,
>  Artist,
>  Album,
>  SongTitle,
>  Path)
>
> This way I would lose my custom collation (IUNICODE), which would be quite a
> problem, particularly for Path field (and if you're asking, yes, I'd like to
> include Path in the full-text index). Another problem is that joined SELECT
> on SongsBase and SongsText is slower than SELECT on the original Songs
> table.
>
> So, the only solution using FTS3 seems to be to use the original Songs table
> and add SongsText table, automatically updated by triggers like:
>
> CREATE TRIGGER update_songs UPDATE OF Artist,Album,SongTitle,Path ON Songs
> BEGIN
>  UPDATE SongsText SET Artist=new.Artist, Album=new.Album,
> SongTitle=new.Title, Path=new.Path WHERE rowid=new.id;
> END;
>
> This solution probably isn't bad, but according to my knowledge of FTS3, it
> unnecessarily occupies some DB space (all text fields are actually stored
> twice, once in Songs and once in SongsText).
>
> Any ideas or recommedations?
>
> Thanks,
> Jiri
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to