On Sun, Sep 14, 2008 at 11:01 AM, Holger Lembke <[EMAIL PROTECTED]> wrote: > (Hello all, new user... refreshing my rusty sql knowledge) > > I have a simple table created with > > create virtual table ft using fts3(id, content); > > and store id/text-pairs with > > delete from ft where (id="theid");
Note that this will involve a full table scan to find the row with id="theid". fts3 tables have an index on tokens, and an implicit unique index on docid/rowid, and this can use neither. If having a text id is important, you could do something like: CREATE TABLE ft_id(docid INTEGER PRIMARY KEY, id TEXT UNIQUE); CREATE VIRTUAL TABLE ft USING fts3(content); Then inserts would be something like: INSERT INTO ft_id (docid, id) VALUES (null, ?); -- bind "theid" or whatever INSERT INTO ft (docid, content) VALUES (LAST_INSERT_ROWID(), ?); -- bind "sometext" or whatever and delete would be something like: DELETE FROM ft WHERE docid = (SELECT docid FROM ft_id WHERE id = ?); -- bind "theid" or whatever DELETE FROM ft_id WHERE id = ?; > insert into ft values ("theid", "sometext"); > > To test the table I have 400 files containing old mails. Just an random > choice of contents. Ids are internally counted up simply, so it is 1 to 400. > I remove ":" and cr/lf from mails but leave everything else. But the effect > seems to be content independant. > > If I repeat this process multiple times I get a growing database. select > count(*) from ft and ft_content stays at 400, ft_segdir stays at 30. > > But select count(*) from ft_segments is raising, from 756, 1237 to 2100 and > so on. Ok, that explains the growing of my database file. You're seeing two effects. For performance reasons, fts3 stores new updates in new segments (in ft_segments), and over time merges them together. So index information doesn't always immediately change to reflect "truth". Over time as additional updates happen, older segments are merged and older data is dropped (as newer data supercedes it), so you should eventually see things stabilize. Additionally, fts3 implements delete as a sort of negative posting list. Unfortunately, when implementing things I was not able to think of a clear way to handle negative posting lists supercede positive posting lists while still maintaining correctness across segments. I have an implementation which implements this, but haven't polished it up enough to check it in. Over the summer I added an fts3 function called optimize() which can be used to deal with some of this. If you do something like this: SELECT optimize(ft) FROM ft LIMIT 1; fts3 will combine all index data into a single segment, and drop unnecessary negative posting lists. Also, if you happen to delete all data from your fts3 table, the index will be cleared out. Long-term, the ability to have negative posting lists annihilate positive posting lists is obviously the better solution. I had hoped to get it in last spring, but unfortunately I keep getting distracted by other stuff. The main sticking point was that I also wanted to build an fts_migrate facility to make it easy to migrate data between differently-defined fts tables (either between fts3 and fts4, or between fts tables with different schema). No estimate as to completion point for that. -scott _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users