I am now totally convinced that FTS3/4 does not work for this usage model. If you are deleting and inserting documents, the size of the FTS index grows linearly without limit, no matter how you use the merge command (when you run it or what parameters you provide).
I have separated this issue from my application code and created a Python test script (https://drive.google.com/folderview?id=0B2OIoOWKs9isdkpySWhmaDg5dmc&usp=sharing) which does the following:

- Creates a simple FTS4 table with a single document column
- Explicitly turns on automerge
- Adds 2000 identical documents
- Repeatedly deletes the oldest 100 documents and adds 100 more
- Runs merge=300,8 until no more work is being done, after each delete or every 1000 insertions
- Periodically logs the size of the database and the structure of the segdir table, along with other metrics, to a csv file

Note that you will need a recent sqlite library in your Python path (eg. C:\Python27\DLLs). I have been using sqlite version 3.8.4.3.

The results of running this test are very clear. Despite the size of the documents totalling less than 60 KB, the database size grows at a linear rate with no sign of trending towards a limit. You can actually reproduce this behaviour just by pre-populating the database with one document and then repeatedly adding and deleting a subsequent document.

The segdir table contents seem to back up Dan's hypothesis about the FTS index's behaviour:

On 2 May 2014 07:57, Dan Kennedy <danielk1...@gmail.com> wrote:
> Say your entire database fits into a single level-N b-tree. You keep adding
> data (and delete markers) until there are 15 level-N b-trees and almost
> enough data to create the 16th in lower levels. So at this point the FTS
> index is 16 times its optimal size. If you then add even more data so that
> the 16th level-N b-tree is created, everything gets merged together and
> we're back in the optimal state - everything in a single b-tree. However -
> this b-tree is deemed to be a level-N+1 b-tree. Meaning that this time,
> much more data will have to be added before everything is merged together
> again.
Looking at column F in my csv output, I can see that the maximum level is slowly increasing as the test progresses, so the delete markers for recently deleted documents (added at level 0) are getting further and further away from the original trees that reference the deleted documents. And as I understand it, merge is only going to remove delete markers that are on the same level in the segdir table as the original document index data.

> So I'm thinking a solution might be:
>
>  * Fix FTS so that it picks this case - when a merge includes so many
>    delete markers that the output is small enough to be deemed a level-N
>    b-tree, not a level-N+1 b-tree, and
>
>  * Instead of using the default 16-way merges, the app could organize
>    to periodically invoke the "merge=X,Y" command with a smaller Y value
>    (say 2) to limit the maximum size of the index to Y times its optimal
>    size (instead of 16 times).

I'm not sure about Dan's suggested change to FTS though. Would changing the incremental merge to allow merging between levels not be better?

I am really surprised that FTS behaves this way. To my mind this is a bug in the FTS extension that makes it unusable for many applications. Was anyone else aware of this problem, or has anyone made attempts at resolving it?

Andrew

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users