Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-12 Thread Clemens Ladisch
Simon Slavin wrote:
> On 12 Dec 2014, at 10:27am, Clemens Ladisch  wrote:
>> If you write your own backup tool that simply calls
>> "sqlite3_backup_step(b, -1)", the entire database is copied in
>> a single atomic transaction.
>
> OP's problem is that he runs several processes which are constantly
> (every few seconds) writing to the database he needs to copy.  So any
> operation which locks the database for a long period would mean that
> some data was not captured.

This is not a problem because he uses WAL mode.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-12 Thread Simon Slavin

On 12 Dec 2014, at 10:27am, Clemens Ladisch  wrote:


> If you write your own backup tool that simply calls
> "sqlite3_backup_step(b, -1)", the entire database is copied in
> a single atomic transaction.

OP's problem is that he runs several processes which are constantly (every few 
seconds) writing to the database he needs to copy.  So any operation which 
locks the database for a long period would mean that some data was not captured.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-12 Thread Clemens Ladisch
Nick wrote:
> On 11 Dec 2014, at 20:39, David King wrote:
>> Why are you trying to hard to avoid using the backup API? It sounds
>> like it does exactly what you want
>
> Backup API works great if you have periods of no writing.
> However, if a process writes during the backup then the API would stop
> and start over again. So if you have frequent writes then
> theoretically the backup API would not complete.

The ".backup" command of the sqlite3 command-line shell uses a step
size of 100 pages, i.e., gives other processes opportunities to write
regularly.

If you write your own backup tool that simply calls
"sqlite3_backup_step(b, -1)", the entire database is copied in
a single atomic transaction.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-12 Thread Dan Kennedy

On 12/12/2014 03:31 AM, Nick wrote:

On 11 Dec 2014, at 10:08, Dan Kennedy wrote:


On 12/11/2014 05:49 AM, Nick wrote:

On 10 Dec 2014, at 07:35, Dan Kennedy wrote:


Strictly speaking the database file may not be well-formed even if there is no 
ongoing checkpoint. If:

  a) process A opens a read transaction,
  b) process B opens and commits a write transaction to the database,
  c) process C checkpoints the db,

then the db file considered without the *-wal file may be corrupt. The problem 
comes about because process C can only checkpoint frames up until the start of 
B's transaction. And there is an optimization that will prevent it from copying 
any earlier frames for which there exists a frame in B's transaction that 
corresponds to the same database page. So it effectively copis only a subset of 
the modifications made by earlier transactions into the db file - not 
necessarily creating a valid db file.

Can this corruption be detected by running PRAGMA quick_check / 
integrity_check? Having the occasional backup db corrupted would be tolerable.

In many cases, but not generally. There would exist cases where a part of a 
committed transaction was lost, or the values in unindexed columns where 
replaced, that sort of thing.

Ok. Presumably a SQLITE_CHECKPOINT_FULL or SQLITE_CHECKPOINT_RESTART 
checkpoint mode would ensure the db file is valid?


That sounds right. A successful FULL or RESTART checkpoint will always 
copy entire transactions into the db.



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-11 Thread Nick

On 11 Dec 2014, at 20:39, David King wrote:

> Why are you trying to hard to avoid using the backup API? It sounds like it 
> does exactly what you want

Backup API works great if you have periods of no writing. However, if a process 
writes during the backup then the API would stop and start over again. So if 
you have frequent writes then theoretically the backup API would not complete. 

In an ideal world the backup API would only copy pages altered during the write 
rather than start over.

Regards
Nick
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-11 Thread David King
Why are you trying to hard to avoid using the backup API? It sounds like it 
does exactly what you want



On 11 Dec 2014, at 12:36, Nick  wrote:

> 
> On 11 Dec 2014, at 10:43, Simon Slavin wrote:
> 
>> 
>> I don't know enough about the internals of SQLite to be sure, but various 
>> parts of me are concerned that this is a bad idea.  I don't know what WAL 
>> mode would be like without checkpointing but there has to be a reason for 
>> checkpointing and disabling it between backups sounds bad.
>> 
> 
> I would say the docs (https://www.sqlite.org/wal.html#how_to_checkpoint) do 
> not imply application initiated checkpoints is a bad idea.
> 
> Regards
> Nick
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> 



signature.asc
Description: Message signed with OpenPGP using GPGMail
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-11 Thread Nick

On 11 Dec 2014, at 10:43, Simon Slavin wrote:

> 
> I don't know enough about the internals of SQLite to be sure, but various 
> parts of me are concerned that this is a bad idea.  I don't know what WAL 
> mode would be like without checkpointing but there has to be a reason for 
> checkpointing and disabling it between backups sounds bad.
> 

I would say the docs (https://www.sqlite.org/wal.html#how_to_checkpoint) do not 
imply application initiated checkpoints is a bad idea.

Regards
Nick
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-11 Thread Nick

On 11 Dec 2014, at 10:08, Dan Kennedy wrote:

> On 12/11/2014 05:49 AM, Nick wrote:
>> On 10 Dec 2014, at 07:35, Dan Kennedy wrote:
>> 
>>> Strictly speaking the database file may not be well-formed even if there is 
>>> no ongoing checkpoint. If:
>>> 
>>>  a) process A opens a read transaction,
>>>  b) process B opens and commits a write transaction to the database,
>>>  c) process C checkpoints the db,
>>> 
>>> then the db file considered without the *-wal file may be corrupt. The 
>>> problem comes about because process C can only checkpoint frames up until 
>>> the start of B's transaction. And there is an optimization that will 
>>> prevent it from copying any earlier frames for which there exists a frame 
>>> in B's transaction that corresponds to the same database page. So it 
>>> effectively copis only a subset of the modifications made by earlier 
>>> transactions into the db file - not necessarily creating a valid db file.
>> Can this corruption be detected by running PRAGMA quick_check / 
>> integrity_check? Having the occasional backup db corrupted would be 
>> tolerable.
> 
> In many cases, but not generally. There would exist cases where a part of a 
> committed transaction was lost, or the values in unindexed columns where 
> replaced, that sort of thing.

Ok. Presumably a SQLITE_CHECKPOINT_FULL or SQLITE_CHECKPOINT_RESTART 
checkpoint mode would ensure the db file is valid?

Regards
Nick
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-11 Thread Simon Slavin

On 10 Dec 2014, at 10:40pm, Nick  wrote:

> All the processes would have automatic checkpointing disabled. Just the 
> backup process would perform the checkpoint.

I don't know enough about the internals of SQLite to be sure, but various parts 
of me are concerned that this is a bad idea.  I don't know what WAL mode would 
be like without checkpointing but there has to be a reason for checkpointing 
and disabling it between backups sounds bad.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-11 Thread Dan Kennedy

On 12/11/2014 05:49 AM, Nick wrote:

On 10 Dec 2014, at 07:35, Dan Kennedy wrote:


Strictly speaking the database file may not be well-formed even if there is no 
ongoing checkpoint. If:

  a) process A opens a read transaction,
  b) process B opens and commits a write transaction to the database,
  c) process C checkpoints the db,

then the db file considered without the *-wal file may be corrupt. The problem 
comes about because process C can only checkpoint frames up until the start of 
B's transaction. And there is an optimization that will prevent it from copying 
any earlier frames for which there exists a frame in B's transaction that 
corresponds to the same database page. So it effectively copis only a subset of 
the modifications made by earlier transactions into the db file - not 
necessarily creating a valid db file.

Can this corruption be detected by running PRAGMA quick_check / 
integrity_check? Having the occasional backup db corrupted would be tolerable.


In many cases, but not generally. There would exist cases where a part 
of a committed transaction was lost, or the values in unindexed columns 
where replaced, that sort of thing.








Regards
Nick
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-10 Thread Nick

On 10 Dec 2014, at 07:35, Dan Kennedy wrote:

> Strictly speaking the database file may not be well-formed even if there is 
> no ongoing checkpoint. If:
> 
>  a) process A opens a read transaction,
>  b) process B opens and commits a write transaction to the database,
>  c) process C checkpoints the db,
> 
> then the db file considered without the *-wal file may be corrupt. The 
> problem comes about because process C can only checkpoint frames up until the 
> start of B's transaction. And there is an optimization that will prevent it 
> from copying any earlier frames for which there exists a frame in B's 
> transaction that corresponds to the same database page. So it effectively 
> copis only a subset of the modifications made by earlier transactions into 
> the db file - not necessarily creating a valid db file.

Can this corruption be detected by running PRAGMA quick_check / 
integrity_check? Having the occasional backup db corrupted would be tolerable.

Regards
Nick 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-10 Thread Nick

On 10 Dec 2014, at 02:36, Simon Slavin wrote:

> 
> On 10 Dec 2014, at 12:30am, Nick  wrote:
> 
>> That's interesting Simon I didn't expect the database not to be trustworthy.
> 
> The database will be trustworthy at any instant.  Your copy of it will be 
> corrupt because the file will be changing while you are copying it.
> 
>> In WAL mode I thought the database file is only written to when 
>> checkpointing. Have I misunderstood this journaling mode?
> 
> How do you intend to prevent your other processes from checkpointing while 
> you take the backup ?  You can disable checkpointing for your own connection 
> to the database but not for the connections other processes have.

All the processes would have automatic checkpointing disabled. Just the backup 
process would perform the checkpoint.

>> Again I may have misunderstood the docs around the Backup API, does it not 
>> start again from the beginning copying pages if another process writes to 
>> the database during the process? In practice could it successfully backup a 
>> 2GB database that is being written to once a second?
> 
> Not if the writing never stopped.  But there's no way to take a copy of a 
> file which is constantly being rewritten.  rsync can't do it either.  How can 
> anything copy a file which is constantly being modified ?
> 
> You can BEGIN EXCLUSIVE and then END once your backup is finished.  That 
> should prevent other processes writing to the file.  You will have to deal 
> with what happens if your BEGIN EXCLUSIVE times out, and you will have to put 
> long timeouts in your other processes so they can handle the file being 
> locked long enough for the entire copy to be taken.  That's the only way I 
> can think of to do it.  And yes, it will prevent writing to the database 
> while it's being copied.
> 
> On the other hand, there's a different way to clone a database: log the 
> changes.
> 
> When something issues an INSERT/DELETE/UPDATE command, execute the command 
> but also append a copy of that command to a text file somewhere.  When you 
> want to bring your backup copy up-to-date, take a copy of the log file, then 
> execute all the commands in it to your out-of-date copy.
> 
> You need a method of zeroing out the log file, or knowing where you got to on 
> your last backup.

Thanks for the info, Simon.

Regards
Nick

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-09 Thread Dan Kennedy

On 12/10/2014 05:06 AM, Simon Slavin wrote:

On 9 Dec 2014, at 8:57pm, Nick  wrote:


Environment is Linux with multiple (c. 4-6) processes accessing a single sqlite database 
named "test.db".

Backup:
- New process started using cronjob to initiate application checkpoint until 
completion.
- rsync diff the file "test.db" to another drive/location (specifically ignoring the 
"-shm" and "-wal" file).
- exit process

Restore:
- rsync the file "test.db" from another drive/location.

Will not be trustworthy if the database is being written to during the rsync 
operations.




Strictly speaking the database file may not be well-formed even if there 
is no ongoing checkpoint. If:


  a) process A opens a read transaction,
  b) process B opens and commits a write transaction to the database,
  c) process C checkpoints the db,

then the db file considered without the *-wal file may be corrupt. The 
problem comes about because process C can only checkpoint frames up 
until the start of B's transaction. And there is an optimization that 
will prevent it from copying any earlier frames for which there exists a 
frame in B's transaction that corresponds to the same database page. So 
it effectively copis only a subset of the modifications made by earlier 
transactions into the db file - not necessarily creating a valid db file.


Dan.









A) Ensure all processes besides the backup process have the database closed 
while it is being copied. Establish some kind of semaphore so they can tell 
when it's safe to open the database again.

B) Use the SQLite Backup API which was invented to do what you want.





Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-09 Thread Simon Slavin

On 10 Dec 2014, at 12:30am, Nick  wrote:

> That's interesting Simon I didn't expect the database not to be trustworthy.

The database will be trustworthy at any instant.  Your copy of it will be 
corrupt because the file will be changing while you are copying it.

> In WAL mode I thought the database file is only written to when 
> checkpointing. Have I misunderstood this journaling mode?

How do you intend to prevent your other processes from checkpointing while you 
take the backup ?  You can disable checkpointing for your own connection to the 
database but not for the connections other processes have.

> Again I may have misunderstood the docs around the Backup API, does it not 
> start again from the beginning copying pages if another process writes to the 
> database during the process? In practice could it successfully backup a 2GB 
> database that is being written to once a second?

Not if the writing never stopped.  But there's no way to take a copy of a file 
which is constantly being rewritten.  rsync can't do it either.  How can 
anything copy a file which is constantly being modified ?

You can BEGIN EXCLUSIVE and then END once your backup is finished.  That should 
prevent other processes writing to the file.  You will have to deal with what 
happens if your BEGIN EXCLUSIVE times out, and you will have to put long 
timeouts in your other processes so they can handle the file being locked long 
enough for the entire copy to be taken.  That's the only way I can think of to 
do it.  And yes, it will prevent writing to the database while it's being 
copied.

On the other hand, there's a different way to clone a database: log the changes.

When something issues an INSERT/DELETE/UPDATE command, execute the command but 
also append a copy of that command to a text file somewhere.  When you want to 
bring your backup copy up-to-date, take a copy of the log file, then execute 
all the commands in it to your out-of-date copy.

You need a method of zeroing out the log file, or knowing where you got to on 
your last backup.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-09 Thread Nick

On 9 Dec 2014, at 22:06, Simon Slavin wrote:

> 
> On 9 Dec 2014, at 8:57pm, Nick  wrote:
> 
>> Environment is Linux with multiple (c. 4-6) processes accessing a single 
>> sqlite database named "test.db".
>> 
>> Backup:
>> - New process started using cronjob to initiate application checkpoint until 
>> completion.
>> - rsync diff the file "test.db" to another drive/location (specifically 
>> ignoring the "-shm" and "-wal" file).
>> - exit process
>> 
>> Restore:
>> - rsync the file "test.db" from another drive/location.
> 
> Will not be trustworthy if the database is being written to during the rsync 
> operations.  Recommend either of the following:
> 
> A) Ensure all processes besides the backup process have the database closed 
> while it is being copied. Establish some kind of semaphore so they can tell 
> when it's safe to open the database again.
> 
> B) Use the SQLite Backup API which was invented to do what you want.
> 
> Simon.

That's interesting Simon I didn't expect the database not to be trustworthy. In 
WAL mode I thought the database file is only written to when checkpointing. 
Have I misunderstood this journaling mode?

Again I may have misunderstood the docs around the Backup API, does it not 
start again from the beginning copying pages if another process writes to the 
database during the process? In practice could it successfully backup a 2GB 
database that is being written to once a second?

Regards
Nick
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Online/Hot backup of WAL journalling mode database

2014-12-09 Thread Simon Slavin

On 9 Dec 2014, at 8:57pm, Nick  wrote:

> Environment is Linux with multiple (c. 4-6) processes accessing a single 
> sqlite database named "test.db".
> 
> Backup:
> - New process started using cronjob to initiate application checkpoint until 
> completion.
> - rsync diff the file "test.db" to another drive/location (specifically 
> ignoring the "-shm" and "-wal" file).
> - exit process
> 
> Restore:
> - rsync the file "test.db" from another drive/location.

Will not be trustworthy if the database is being written to during the rsync 
operations.  Recommend either of the following:

A) Ensure all processes besides the backup process have the database closed 
while it is being copied. Establish some kind of semaphore so they can tell 
when it's safe to open the database again.

B) Use the SQLite Backup API which was invented to do what you want.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Online/Hot backup of WAL journalling mode database

2014-12-09 Thread Nick
Hi,

I'd like to check my understanding of Sqlite in WAL journalling mode. With 
automatic checkpointing turned off would the following psuedo-code result in a 
online backup approach that allows robust restore of the database with data 
fresh up to the last checkpoint?

Environment is Linux with multiple (c. 4-6) processes accessing a single sqlite 
database named "test.db".

Backup:
- New process started using cronjob to initiate application checkpoint until 
completion.
- rsync diff the file "test.db" to another drive/location (specifically 
ignoring the "-shm" and "-wal" file).
- exit process

Restore:
- rsync the file "test.db" from another drive/location.

Regards
Nick



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users