I am now totally convinced that FTS3/4 does not work for this usage model.
If you are deleting and inserting documents, the size of the FTS index
grows linearly without limit, no matter what you do with the merge command
(when you run it or what parameters you provide).

I have separated this issue from my application code and created a Python
test script (
https://drive.google.com/folderview?id=0B2OIoOWKs9isdkpySWhmaDg5dmc&usp=sharing)
which does the following:

   - Creates a simple FTS4 table with a single document column
   - Explicitly turns on automerge
   - Adds 2000 identical documents
   - Repeatedly deletes the oldest 100 documents and adds 100 more
   - Runs merge=300,8, after each delete and after every 1000 insertions,
until no more work is being done
   - Periodically logs the size of the database and the structure of the
segdir table, along with other metrics, to a CSV file
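A condensed sketch of such a test follows; the table and function names are
illustrative (the real script is at the link above), and the termination
test for the merge command follows the FTS4 documentation's advice of
checking sqlite3_total_changes():

```python
import sqlite3

def churn_test(total_docs=2000, batch=100, rounds=5):
    """Sketch of the churn test: fill an FTS4 table, then repeatedly
    delete the oldest batch, insert a fresh batch, and merge."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE docs USING fts4(body)")
    db.execute("INSERT INTO docs(docs) VALUES('automerge=1')")  # enable automerge

    text = "the quick brown fox jumps over the lazy dog " * 10
    for _ in range(total_docs):
        db.execute("INSERT INTO docs(body) VALUES (?)", (text,))

    metrics = []
    for _ in range(rounds):
        # Delete the oldest batch and insert an equal number of new docs.
        db.execute(
            "DELETE FROM docs WHERE docid IN "
            "(SELECT docid FROM docs ORDER BY docid LIMIT ?)", (batch,))
        for _ in range(batch):
            db.execute("INSERT INTO docs(body) VALUES (?)", (text,))

        # Run merge=300,8 until it reports no further work
        # (fewer than 2 changes, per the FTS4 documentation).
        while True:
            before = db.total_changes
            db.execute("INSERT INTO docs(docs) VALUES('merge=300,8')")
            if db.total_changes - before < 2:
                break

        # Log the metrics: database size in pages, segdir layout.
        pages = db.execute("PRAGMA page_count").fetchone()[0]
        levels = db.execute(
            "SELECT level, count(*) FROM docs_segdir GROUP BY level").fetchall()
        metrics.append((pages, levels))
    return metrics
```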

Note that you will need a recent SQLite library in your Python path (e.g.
C:\Python27\DLLs). I have been using SQLite version 3.8.4.3.

The results of running this test are very clear. Despite the documents
totalling less than 60 KB, the database size grows at a linear rate with no
sign of trending towards a limit. You can reproduce this behaviour just by
pre-populating the database with a single document and then repeatedly
adding and deleting one further document.
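That minimal reproduction can be sketched as follows (table name is
illustrative; on 3.8.4.3 the page count sampled below was what grew without
bound):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE t USING fts4(body)")
db.execute("INSERT INTO t(t) VALUES('automerge=1')")

# Pre-populate with a single document.
db.execute("INSERT INTO t(body) VALUES ('hello world')")

# Repeatedly add and delete one further document, sampling the
# database size (in pages) as we go.
page_counts = []
for i in range(500):
    cur = db.execute("INSERT INTO t(body) VALUES ('hello world again')")
    db.execute("DELETE FROM t WHERE docid = ?", (cur.lastrowid,))
    if i % 100 == 0:
        page_counts.append(db.execute("PRAGMA page_count").fetchone()[0])

print(page_counts)
```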

The segdir table contents seem to back up Dan's hypothesis about the FTS
index's behaviour:

On 2 May 2014 07:57, Dan Kennedy <danielk1...@gmail.com> wrote:

> Say your entire database fits into a single level-N b-tree. You keep adding
> data (and delete markers) until there are 15 level-N b-trees and almost
> enough data to create the 16th in lower levels. So at this point the FTS
> index is 16 times its optimal size. If you then add even more data so that
> the 16th level-N b-tree is created, everything gets merged together and
> we're back in the optimal state - everything in a single b-tree. However -
> this b-tree is deemed to be a level-N+1 b-tree. Meaning that this time,
> much more data will have to be added before everything is merged together
> again.
>

Looking at column F in my CSV output, I can see that the maximum level
slowly increases as the test progresses, so the delete markers for recently
deleted documents (added at level 0) get further and further away from the
original b-trees that reference the deleted documents. As I understand it,
merge will only remove delete markers that are on the same level in the
segdir table as the original document index data.

I'm not sure about Dan's suggested change to FTS, though. Wouldn't it be
better to change the incremental merge to allow merging between levels?


> So I'm thinking a solution might be:
>
>   * Fix FTS so that it picks this case - when a merge includes so many
>     delete markers that the output is small enough to be deemed a level-N
>     b-tree, not a level-N+1 b-tree, and
>
>   * Instead of using the default 16-way merges, the app could organize
>     to periodically invoke the "merge=X,Y" command with a smaller Y value
>     (say 2) to limit the maximum size of the index to Y times its optimal
>     size (instead of 16 times).
>
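Dan's second point, calling merge with a small Y value, would look
something like this in application code (function name is mine; the
stop-when-fewer-than-2-changes test is the one recommended in the FTS4
documentation):

```python
import sqlite3

def merge_to_limit(db, table, pages=300, min_segments=2):
    """Run 'merge=pages,min_segments' repeatedly until FTS reports no
    further merging is possible (fewer than 2 changes per call)."""
    # Table name interpolated for illustration only; it is not user input.
    sql = "INSERT INTO %s(%s) VALUES (?)" % (table, table)
    while True:
        before = db.total_changes
        db.execute(sql, ("merge=%d,%d" % (pages, min_segments),))
        if db.total_changes - before < 2:
            break

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE docs USING fts4(body)")
for i in range(300):
    db.execute("INSERT INTO docs(body) VALUES (?)", ("some text %d" % i,))

# e.g. invoked periodically, after a batch of writes:
merge_to_limit(db, "docs")
```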

I am really surprised that FTS behaves this way. To my mind this is a bug
in the FTS extension that makes it unusable for many applications. Was
anyone else aware of this problem, or has anyone attempted to resolve it?

Andrew
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users