OK! This gives me something I can sink my teeth into! I hacked together a TCL script which takes info about 100 albums (snagged from a list on the Internet), makes up 10 song titles each, makes up a path in the obvious way, and stuffs them into a table. Then it times SELECT COUNT(*). Three versions, a single table, split between a regular table and fts3, and split between two regular tables (convert the second table from fts3 to a regular table, as a control). The results were:
regular - 290 microseconds per iteration regular w/fts3 - 63858 microseconds per iteration regular w/regular - 1299 microseconds per iteration I ran it 5 times and took the middle timing. The queries looked like: SELECT COUNT(*) FROM Songs SELECT COUNT(*) FROM SongsBase JOIN SongsText ON ID=docid So, indeed, there's room for improvement! I have a patch which improves regular w/fts3 to '8048 microseconds per iteration' by re-using the prepared statement better. I need to spend some time to think on correctness before checking it in, though. AFAICT, the effect seems to be linear. In stepping around in the code, I'm not sure how much improvement can be gotten beyond my patch, but I'll be thinking on it. --- Additionally, if I let the fts3 table drive the query: SELECT COUNT(*) FROM SongsText CROSS JOIN SongsBase ON ID=docid I get '2971 microseconds per iteration' without my patch. The use of "CROSS JOIN" is to force SongsText to be the outer loop (otherwise it performs like the earlier statement). This might be a problem with the fulltextBestIndex hinting. This join-ordering trick may have issues if you need a WHERE clause which references SongsBase, and it may depend a lot on what your queries are and their result sizes versus the size of your dataset. Future cloudy! The patch I mention above should improve joins from SongsBase to SongsText. Thanks for bearing with me, scott On Wed, Jul 23, 2008 at 7:47 AM, Jiri Hajek <[EMAIL PROTECTED]> wrote: >> Again, you've given a relatively broad description of what you're >> trying to do. I could make up a bunch of stuff and answer my own >> question, but you'd probably rather than I considered the problem >> _you_ are having. > > Ok, I'll try to be as specific as possible. The main table I have is (the > real version has much more fields, but it isn't important for our example): > > CREATE TABLE Songs ( > ID INTEGER PRIMARY KEY AUTOINCREMENT, > Artist TEXT COLLATE IUNICODE, > Album TEXT COLLATE IUNICODE, > SongTitle TEXT COLLATE IUNICODE, > Path TEXT COLLATE IUNICODE, > Year INTEGER, > Bitrate INTEGER) > > This table can have even >100k records, even close to million and is mostly > accessed by SELECTing all fields of some records, i.e.: > > SELECT * FROM Songs WHERE {something} > > In order to use FTS3, I could take all the text fields from Songs table and > move them to a FTS3 table: > > CREATE TABLE SongsBase ( > ID INTEGER PRIMARY KEY AUTOINCREMENT, > Year INTEGER, > Bitrate INTEGER) > > CREATE VIRTUAL TABLE SongsText USING FTS3(TOKENIZE mm, > Artist, > Album, > SongTitle, > Path) > > This way I would lose my custom collation (IUNICODE), which would be quite a > problem, particularly for Path field (and if you're asking, yes, I'd like to > include Path in the full-text index). Another problem is that joined SELECT > on SongsBase and SongsText is slower than SELECT on the original Songs > table. > > So, the only solution using FTS3 seems to be to use the original Songs table > and add SongsText table, automatically updated by triggers like: > > CREATE TRIGGER update_songs UPDATE OF Artist,Album,SongTitle,Path ON Songs > BEGIN > UPDATE SongsText SET Artist=new.Artist, Album=new.Album, > SongTitle=new.Title, Path=new.Path WHERE rowid=new.id; > END; > > This solution probably isn't bad, but according to my knowledge of FTS3, it > unnecessarily occupies some DB space (all text fields are actually stored > twice, once in Songs and once in SongsText). > > Any ideas or recommedations? > > Thanks, > Jiri > > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users