On 20 September 2016 at 14:29, Simon Slavin <slav...@bigfraud.org> wrote:
> On 20 Sep 2016, at 8:59am, Jose Arroyo <jose.m.arroyo...@gmail.com> wrote:
> > My current issue is that I'm encountering WAL checkpoint starvation and
> > trying to figure out a way to create checkpoint gaps only when needed.
> Can I ask why you're trying to have as few checkpoints as possible ? The
> fewer you have, the more time they take, the more annoying it is to your
> users, the more processes reach their timeout times.
> I can understand that there are times when you want as few checkpoints as
> possible, but if you explain why then we may be able to suggest a strategy
> which doesn't involve hacking SQLite and becoming dependent on a particular
> version behaving a specific way.
> For instance, you could use "PRAGMA wal_autocheckpoint=N" and use a
> smaller value for N than 1000. That sort-of does what you asked since it
> will automatically fire when the WAL journal reaches a certain size (of
> unwritten data).
> Another common strategy is that every process which does writing has a
> checkpoint every 1000 transactions. Or 100. or 10. Whatever suits.
> Again, it is a system which takes care of itself and will not stop working
> if a new version of SQLite handles WAL files differently.
> sqlite-users mailing list
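(For anyone following along, Simon's wal_autocheckpoint suggestion is a one-liner. A minimal sketch in Python's sqlite3 module; the file name and the value of 200 are picked just for illustration:)

```python
import sqlite3

conn = sqlite3.connect("example.db")  # path assumed for illustration
# WAL mode is persistent once set on the database file.
conn.execute("PRAGMA journal_mode=WAL")
# Fire the automatic checkpoint once the WAL holds ~200 pages instead of
# the default 1000:
conn.execute("PRAGMA wal_autocheckpoint=200")
conn.close()
```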
Sorry, I used the wrong word: I meant "reader gaps", not "checkpoint gaps".
I don't want to minimize the number of checkpoints. What I want is a way to
force a successful checkpoint if the WAL starts growing unbounded, by
leaving a "reader gap", without the writer process itself being the one
responsible for creating that gap.
My problem is that, given how the writer and reader processes interact,
there is a certain DB size threshold beyond which long periods of
checkpoint starvation occur. The writer process (there is only one in my
scenario) actually implements its own WAL checkpointing mechanism: it tries
to checkpoint in a separate thread when wal_pages > 1000, and checkpoints
synchronously, blocking all queued updates, if ever wal_pages > 10000.
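(A sketch of that two-tier policy, in case it helps to make it concrete; the limits, names, and locking are assumptions standing in for the real implementation:)

```python
import sqlite3
import threading

SOFT_LIMIT = 1000    # frames: try a best-effort background checkpoint
HARD_LIMIT = 10000   # frames: block queued updates and force a checkpoint

def checkpoint_if_needed(db_path, wal_frames, write_lock):
    if wal_frames > HARD_LIMIT:
        # Synchronous path: hold the write lock so queued updates wait.
        with write_lock:
            conn = sqlite3.connect(db_path)
            busy, log, done = conn.execute(
                "PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
            conn.close()
            return busy == 0   # True only if the checkpoint fully completed
    elif wal_frames > SOFT_LIMIT:
        # Background path: PASSIVE never blocks readers or writers.
        def passive():
            conn = sqlite3.connect(db_path)
            conn.execute("PRAGMA wal_checkpoint(PASSIVE)")
            conn.close()
        threading.Thread(target=passive, daemon=True).start()
        return None
```

Note that even the TRUNCATE branch can come back with busy != 0 when a reader holds a read transaction open, which is exactly the starvation I'm describing below.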
However, this writer process has no control over the reader processes, so
checkpoints may not necessarily complete successfully ("If another
connection has a read transaction open, then the checkpoint cannot reset
the WAL file because doing so might delete content out from under the
reader"). A large WAL means slow queries, which means fewer gaps in which a
checkpoint could finish successfully, so this issue is a bit of a vicious
circle.
I can't modify the writer process too much, and without a busy_handler that
notifies all readers to pause for a while, any wal_checkpoints that block
readers (FULL, RESTART, TRUNCATE) just end up timing out during these
"checkpoint starvation" periods (this is linked to the SPARQL layer on top
of SQLite that I'm working with, I think).
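(To illustrate the mechanism I wish I had, here is a hypothetical in-process version: a shared flag that a forced checkpoint raises to ask readers for a gap. My readers are separate processes, so a real version would need cross-process signalling, and none of these names exist in the actual system:)

```python
import sqlite3
import threading
import time

readers_should_pause = threading.Event()

def reader_query(conn, sql, params=()):
    # Readers check the flag between short read transactions, leaving gaps
    # in which a blocking checkpoint can finish.
    while readers_should_pause.is_set():
        time.sleep(0.05)
    return conn.execute(sql, params).fetchall()

def forced_checkpoint(conn):
    readers_should_pause.set()       # ask readers to back off
    try:
        # RESTART waits for readers to finish, then resets the WAL.
        return conn.execute("PRAGMA wal_checkpoint(RESTART)").fetchone()
    finally:
        readers_should_pause.clear()
```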
So that's why I'm looking for a way of detecting moments of unusual WAL
growth and notifying all readers, so that they leave just enough time for a
checkpoint to complete, while avoiding doing this unless absolutely
necessary. If the writer process does TRUNCATE checkpoints after a certain
WAL size, monitoring the WAL file size directly, as Paul mentioned, might
be a good approach.
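(A sketch of that "monitor the -wal file size directly" idea; the size limit and the notify_readers callback are assumptions standing in for whatever IPC the real system would use:)

```python
import os
import threading

WAL_SIZE_LIMIT = 16 * 1024 * 1024   # 16 MiB; tune to taste

def wal_too_big(db_path):
    # The WAL lives next to the database file with a "-wal" suffix.
    wal = db_path + "-wal"
    return os.path.exists(wal) and os.path.getsize(wal) > WAL_SIZE_LIMIT

def monitor_wal(db_path, notify_readers, stop, interval=1.0):
    # Poll from outside the writer process; ask readers for a gap only
    # when the WAL actually looks unhealthy.
    while not stop.is_set():
        if wal_too_big(db_path):
            notify_readers()
        stop.wait(interval)
```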