Re: [sqlite] SQLite with single writer on Windows network share

2019-05-13 Thread Andrew Moss
Many thanks for all your comments and suggestions. I will bear it all in
mind.

On Sun, 12 May 2019 at 12:26, Warren Young  wrote:

> On May 11, 2019, at 5:52 PM, Jose Isaias Cabrera 
> wrote:
> >
> > Warren Young, on Saturday, May 11, 2019 06:20 PM, wrote...
> >>
> >> On May 11, 2019, at 4:10 PM, Thomas Kurz 
> wrote:
> >>
> >> It should run under Cygwin and WSL.
> >
> > I tried to build it. I found out that I needed Brew
>
> Cygwin’s own package repo should have all of the packages you need to
> build Bedrock.  There should be no need to drag Brew into this.
>
> > So, it looks like it sees Cygwin as a macOS system.
>
> Homebrew started off as macOS-only, and is a newcomer on other platforms.
> That’s another reason not to reach for it when you have a large decades-old
> package repo like Cygwin’s to draw from already.
>
> Cygwin’s default install is minimal:
>
> https://stackoverflow.com/a/21233990/142454
>
> It usually takes me several passes through the Cygwin setup.exe to get a
> new system into shape for useful work.
>
> > Does anybody know what is gcc@6?
>
> It’s Brew-speak for GCC v6, as opposed to other versions of GCC also in
> the Brew package repo.  The package “gcc” is simply the current version.
> You only give the @NN syntax when you need to pin Brew down to a specific
> version for some reason.


[sqlite] SQLite with single writer on Windows network share

2019-05-08 Thread Andrew Moss
Hi,

We are currently backed into a corner by a customer and are looking at
using an SQLite database hosted on a Windows network share (using Server
2012 R2 or later). We are well aware this is not advisable and have read
https://www.sqlite.org/whentouse.html.

My question is: if we limit the application (through other means; one
possibility is sketched below) to a single writer but allow multiple
readers, does that remove the risk of database corruption from multiple
SQLite processes?
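
To illustrate the sort of limiting I mean: every reader connection can be
opened read-only via a URI, so only one process ever holds a writable
handle. A minimal sketch in Python (the mapped-drive path is made up, and
this only disciplines our own processes; it does nothing to make the
share's locking more trustworthy):

    import sqlite3

    # The single writer opens the database normally (hypothetical
    # mapped-drive path pointing at the network share).
    writer = sqlite3.connect(r"Z:\appdata\app.db")

    # Every reader opens it read-only via a URI; an accidental write
    # then fails with "attempt to write a readonly database".
    reader = sqlite3.connect("file:///Z:/appdata/app.db?mode=ro", uri=True)
    count = reader.execute("SELECT count(*) FROM sqlite_master").fetchone()[0]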

Any notes from other users who had to do something similar in the past?

Many thanks,

Andrew M


Re: [sqlite] FTS3/4 merge function behaviour when deleting rows

2014-05-09 Thread Andrew Moss
I am now totally convinced that FTS3/4 does not work for this usage model.
If you are deleting and inserting documents, the size of the FTS index
grows linearly without limit, no matter what you do with the merge command
(when you run it or what parameters you provide).

I have separated this issue from my application code and created a Python
test script (
https://drive.google.com/folderview?id=0B2OIoOWKs9isdkpySWhmaDg5dmc&usp=sharing)
which does the following (a condensed sketch appears after the list);

   - Creates a simple FTS4 table with a single document column
   - Explicitly turns on automerge
   - Adds 2000 identical documents
   - Repeatedly deletes the oldest 100 documents and adds 100 more
   - After each delete (or every 1000 insertions), runs merge=300,8 until
no more work is being done
   - Periodically logs the size of the database and the structure of the
segdir table, along with other metrics, to a csv file
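
For reference, the core of the script boils down to something like the
following (condensed; the real script's table name, document text and csv
logging differ):

    import sqlite3

    DOC = "some short document text " * 12  # ~300 bytes, identical rows

    conn = sqlite3.connect("fts_test.db")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts4(content)")
    conn.execute("INSERT INTO docs(docs) VALUES('automerge=1')")  # explicit

    # Seed with 2000 identical documents.
    conn.executemany("INSERT INTO docs(content) VALUES (?)", [(DOC,)] * 2000)
    conn.commit()

    def merge_until_done(blocks=300, segments=8):
        # Per the FTS4 docs, repeat 'merge=X,Y' until a pass changes
        # fewer than two rows, meaning no useful work was done.
        while True:
            before = conn.total_changes
            conn.execute("INSERT INTO docs(docs) VALUES('merge=%d,%d')"
                         % (blocks, segments))
            conn.commit()
            if conn.total_changes - before < 2:
                break

    for cycle in range(10000):
        # Delete the oldest 100 documents, then add 100 more.
        conn.execute("DELETE FROM docs WHERE docid IN "
                     "(SELECT docid FROM docs ORDER BY docid LIMIT 100)")
        conn.executemany("INSERT INTO docs(content) VALUES (?)",
                         [(DOC,)] * 100)
        conn.commit()
        merge_until_done()
        # ...periodically log PRAGMA page_count and the docs_segdir
        # rows to the csv here...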

Note that you will need a recent SQLite library on your Python path (e.g.
C:\Python27\DLLs). I have been using SQLite version 3.8.4.3.

The results of running this test are very clear. Despite the documents
totaling less than 60 KB, the database size grows at a linear rate with no
sign of trending towards a limit. You can actually trigger this behaviour
just by pre-populating the database with one document and then repeatedly
adding and deleting a second document.
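
The one-document version of that, under the same assumptions as the sketch
above, is just:

    import sqlite3

    # Minimal reproduction: one permanent document plus endless churn
    # on a single second document.
    conn2 = sqlite3.connect("fts_minimal.db")
    conn2.execute("CREATE VIRTUAL TABLE t USING fts4(content)")
    conn2.execute("INSERT INTO t(content) VALUES ('seed document')")
    for i in range(100000):
        conn2.execute("INSERT INTO t(content) VALUES ('churn document')")
        conn2.execute("DELETE FROM t WHERE docid = "
                      "(SELECT max(docid) FROM t)")
        if i % 10000 == 0:
            conn2.commit()
            # The page count keeps climbing even though the content
            # never exceeds two tiny documents.
            print(conn2.execute("PRAGMA page_count").fetchone()[0])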

The segdir table contents seem to back up Dan's hypothesis about the FTS
index's behaviour;

On 2 May 2014 07:57, Dan Kennedy  wrote:

> Say your entire database fits into a single level-N b-tree. You keep adding
> data (and delete markers) until there are 15 level-N b-trees and almost
> enough data to create the 16th in lower levels. So at this point the FTS
> index is 16 times its optimal size. If you then add even more data so that
> the 16th level-N b-tree is created, everything gets merged together and
> we're back in the optimal state - everything in a single b-tree. However -
> this b-tree is deemed to be a level-N+1 b-tree. Meaning that this time,
> much more data will have to be added before everything is merged together
> again.
>

Looking at column F in my csv output, I can see that the maximum level is
slowly increasing as the test progresses, so the delete markers for
recently deleted documents (added at level 0) are getting further and
further away from the original trees that reference the deleted documents.
And as I understand it, merge is only going to remove delete markers that
are on the same level in the segdir table as the original document index
data.
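
(The level counts come straight from the segdir shadow table; a short
sketch, reusing the connection and table name from the script above:)

    # B-tree count per level; the maximum level creeps upward over time
    # while fresh delete markers keep entering at level 0.
    for level, n in conn.execute(
            "SELECT level, count(*) FROM docs_segdir GROUP BY level"):
        print(level, n)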

I'm not sure about Dan's suggested change to FTS though. Would changing the
incremental merge to allow merging between levels not be better?


> So I'm thinking a solution might be:
>
>   * Fix FTS so that it picks this case - when a merge includes so many
> delete markers that the output is small enough to be deemed a level-N
> b-tree, not a level-N+1 b-tree, and
>
>   * Instead of using the default 16-way merges, the app could organize
> to periodically invoke the "merge=X,Y" command with a smaller Y value
> (say 2) to limit the maximum size of the index to Y times its optimal
> size (instead of 16 times).
>

I am really surprised that FTS behaves this way. To my mind this is a bug
in the FTS extension that makes it unusable for many applications. Was
anyone else aware of this problem, or has anyone made attempts at
resolving it?

Andrew


Re: [sqlite] FTS3/4 merge function behaviour when deleting rows

2014-05-05 Thread Andrew Moss
On 2 May 2014 10:22, Dan Kennedy  wrote:
>
>
> A query like:
>
>   SELECT level, count(*) AS ntree FROM yourftstablename_segdir;
>
> will tell you how many b-trees there currently are at each level. Which
> might help you figure out what is going on and when you might expect
> a merge to actually start removing data from the index. More on the
> schema of the underlying data structure used by the FTS index here:
>
>   http://www.sqlite.org/fts3.html#section_9
>
>
I ran my accelerated test for about 50 hours. At the end of the test I had
a database with 31.0% of its space (393861 pages) occupied by the index;
Level | # of trees
    0 | 3
    1 | 6
    2 | 5
    3 | 6
    4 | 4
    5 | 5
    6 | 2
    7 | 1
    8 | 1

If I then manually run merge=100,2 until it stops having an effect (my app
was using merge=300,8), the index size is reduced to 27.7% (352121 pages)
with;
Level | # of trees
    0 | 1
    1 | 1
    4 | 1
    5 | 1
    8 | 1
    9 | 1

Running 'optimize' shows that this could be reduced to 11.4% (145200
pages). So I had about three times the optimal amount of index data.
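
(In case anyone wants to repeat the measurement without taking a live
database offline: one way is to run 'optimize' on a throwaway copy and
count index pages via the dbstat virtual table, if your build includes it,
i.e. was compiled with SQLITE_ENABLE_DBSTAT_VTAB. A sketch with made-up
file names, assuming an FTS4 table called docs:)

    import shutil, sqlite3

    # Run 'optimize' on a copy so the live database stays available,
    # then count pages used by the FTS index shadow tables
    # (docs_segments and docs_segdir; autoindexes are ignored here).
    shutil.copy("fts_test.db", "fts_test_opt.db")
    c = sqlite3.connect("fts_test_opt.db")
    c.execute("INSERT INTO docs(docs) VALUES('optimize')")
    c.commit()
    total = c.execute("PRAGMA page_count").fetchone()[0]
    index = c.execute("SELECT count(*) FROM dbstat "
                      "WHERE name LIKE 'docs_seg%'").fetchone()[0]
    print("index pages: %d (%.1f%%)" % (index, 100.0 * index / total))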

I can see that Dan's suggestion of preventing a merge spawning a Level-N+1
b-tree could prevent this growth. I was also wondering if a new merge
function that allows the user to force merging between levels in the index
might be worthwhile.


Re: [sqlite] FTS3/4 merge function behaviour when deleting rows

2014-05-02 Thread Andrew Moss
On 2 May 2014 07:57, Dan Kennedy  wrote:

> On 05/01/2014 03:30 PM, andrewmo wrote:
>
>> We are using the FTS3 extension to sqlite to store large numbers of
>> short (~300 byte) documents. This is working very well and providing
>> us with very fast text search, but the behaviour around deletion of
>> documents has me confused.
>>
>> Our system must control the size of the database and will delete the
>> oldest documents when the database size breaches a certain limit. I
>> now understand from comments on this mailing list and elsewhere that
>> this is not an optimal pattern for the FTS extension, as doclists for
>> the oldest documents are the least likely to be 'merged'.
>>
>> My question is, does this actually work at all? If I delete a row from
>> my FTS4 table (resulting in a new empty doclist being added to the
>> index), then I subsequently add many (1000s of) new documents and call
>> the 'merge' function several times (automerge is also enabled), is
>> there any guarantee that the empty doclist and the populated doclist
>> that it superseded will ever be removed? My testing suggests this
>> isn't the case.
>>
>> I have a 1GB database with 6 million documents. If I keep adding new
>> documents at around 1 per second and deleting documents when the size
>> of the data goes beyond 1GB, the size of the index seems to grow and
>> the number of documents I can store in the 1GB file seems to decrease
>> in a linear manner.
>>
>> Calling the 'optimize' function seems to solve this issue (removing
>> all the dead doclists), but that isn't practical for our software, as
>> it implies some downtime for our high availability service due to the
>> long execution time of the optimize function (could be minutes for a
>> 1GB file).
>>
>> I have seen this post from 2008:
>> http://sqlite.1065341.n5.nabble.com/fts3-database-grows-td42069.html
>> However, it predates the 'automerge' and manual merge features, and
>> from the documentation I assumed these new features would delete all
>> the data related to deleted documents. Am I incorrect in my
>> assumption?
>>
>> Thanks for any clarification you can offer.
>>
>
> Normally, when you write to an FTS index (either to add new doclists or to
> add delete markers) the new entries are accumulated in-memory for a while
> and then flushed to a new "level-0" b-tree. A level-0 b-tree is often
> roughly 1MB in size. Once there are 16 level-0 b-trees, they are merged
> and written to a single level-1 b-tree. Once there are 16 level-1
> b-trees...
> And so on.
>
> So when an entry is deleted from the FTS index, a delete marker is added.
> But the original doclists are not actually deleted until the delete marker
> and the doclists are merged into the same b-tree. Delete markers are
> discarded when they are merged into the oldest b-tree in the index.
>
> At first glance it seems (to me) that this means the index might grow to
> anything up to 16 times its "optimized" size. But I think it's actually
> worse than that.
>
> Say your entire database fits into a single level-N b-tree. You keep adding
> data (and delete markers) until there are 15 level-N b-trees and almost
> enough data to create the 16th in lower levels. So at this point the FTS
> index is 16 times its optimal size. If you then add even more data so that
> the 16th level-N b-tree is created, everything gets merged together and
> we're back in the optimal state - everything in a single b-tree. However -
> this b-tree is deemed to be a level-N+1 b-tree. Meaning that this time,
> much more data will have to be added before everything is merged together
> again.
>
> So I'm thinking a solution might be:
>
>   * Fix FTS so that it picks this case - when a merge includes so many
> delete markers that the output is small enough to be deemed a level-N
> b-tree, not a level-N+1 b-tree, and
>
>   * Instead of using the default 16-way merges, the app could organize
> to periodically invoke the "merge=X,Y" command with a smaller Y value
> (say 2) to limit the maximum size of the index to Y times its optimal
> size (instead of 16 times).
>
> It is an interesting problem. And the above is just guesswork... It
> would be good to verify experimentally that the index really does grow
> indefinitely with this kind of input before trying to "fix" anything.
>
> Dan.
>
>

Thanks for your response, Dan. Our application currently runs "merge=300,8"
(in a loop until no more modifications are being made) whenever it deletes
records due to the database size, or after a certain number of insertions
have been made.

I did read the advice in the documentation that this should be "run in an
idle thread". But adding an extra thread to trigger this seemed like an
unnecessary complication to our system, where spiky insertion performance
isn't too much of a concern.