[sqlite] Ideas on how to use backup API for remote, incremental backup?
Thanks Richard, this is exactly what I was thinking.  One question on this:

On Tue, Oct 6, 2015 at 10:22 PM, Richard Hipp wrote:

> (2) For the source database connection of the backup, use the same
> database connection that is used for writing to the database.  That
> means that when changes are made to the source database, the backup
> mechanism can go back and resend only those pages that actually
> changed.  If the database is modified by any database connection other
> than the one used for backup, then the backup has no way of knowing
> which pages changed, and so it has to start over again at the
> beginning and rewrite every page.

Hm, that's interesting.  I was thinking that when the backup started, all
new writes would just accumulate in the WAL file -- and then be
checkpointed after the backup completed.  Thus the backup would be of the
state of the database at the moment the backup was *started*.  However,
based on your description it sounds like the backup API will copy over
changes that have occurred throughout the backup process itself.  Our
database is modified about 40 times per second, so my fear is that the
backup API would never actually complete because it would be continuously
re-backing up the same pages.  Am I understanding this correctly?  Is
there any way to obtain the behavior I initially assumed?

Thanks!

-david
[sqlite] Ideas on how to use backup API for remote, incremental backup?
On 10/9/15, David Barrett wrote:
> Thanks Richard, this is exactly what I was thinking.  One question on this:
>
> On Tue, Oct 6, 2015 at 10:22 PM, Richard Hipp wrote:
>
>> (2) For the source database connection of the backup, use the same
>> database connection that is used for writing to the database.  That
>> means that when changes are made to the source database, the backup
>> mechanism can go back and resend only those pages that actually
>> changed.  If the database is modified by any database connection other
>> than the one used for backup, then the backup has no way of knowing
>> which pages changed, and so it has to start over again at the
>> beginning and rewrite every page.
>
> Hm, that's interesting.  I was thinking that when the backup started, all
> new writes would just accumulate in the WAL file -- and then be
> checkpointed after the backup completed.  Thus the backup would be of the
> state of the database at the moment the backup was *started*.  However,
> based on your description it sounds like the backup API will copy over
> changes that have occurred throughout the backup process itself.  Our
> database is modified about 40 times per second, so my fear is the backup
> API would never actually complete because it would be continuously
> re-backing up the same pages.  Am I understanding this correctly?  Is
> there any way to obtain the behavior I initially assumed?

Yes - run the backup using a separate "source" database connection and do
the entire backup within a read transaction.

Note that having this long-running read transaction open in a separate
database connection will prevent checkpoint operations from resetting the
WAL file, and so the WAL file will continue to grow until the backup
completes.  So you may want to take steps to ensure that the backup
doesn't take *too* long.

--
D. Richard Hipp
drh at sqlite.org
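[Editor's note: a minimal sketch of the snapshot behavior described above, using Python's built-in sqlite3 module, which wraps the same C backup API. `Connection.backup` stands in for sqlite3_backup_init()/sqlite3_backup_step(); the file names are hypothetical.]

```python
import sqlite3, tempfile, os

# Hypothetical paths for illustration.
dbdir = tempfile.mkdtemp()
src_path = os.path.join(dbdir, "source.db")
bak_path = os.path.join(dbdir, "backup.db")

# Writer connection: WAL mode, continually updated.
writer = sqlite3.connect(src_path, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE t(x)")
writer.execute("INSERT INTO t VALUES (1)")

# Separate "source" connection for the backup; open a read
# transaction so the backup sees the snapshot at this moment.
snap = sqlite3.connect(src_path, isolation_level=None)
snap.execute("BEGIN")
snap.execute("SELECT count(*) FROM t")  # materializes the read txn

# Writes landing after this point are invisible to the backup.
writer.execute("INSERT INTO t VALUES (2)")

dest = sqlite3.connect(bak_path)
snap.backup(dest)          # entire backup runs inside the read txn
snap.execute("ROLLBACK")   # release the snapshot so the WAL can reset

# The backup holds the snapshot's row count, not the writer's.
dest_rows = dest.execute("SELECT count(*) FROM t").fetchone()[0]
```

As Richard notes, the price of pinning the snapshot is that the WAL file grows for the duration of the backup.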
[sqlite] Ideas on how to use backup API for remote, incremental backup?
On Tue, Oct 6, 2015 at 8:36 PM, Simon Slavin wrote:
> Or copy an existing /open/ database file to the new server using the
> SQLite Backup API, [requires other connections to stop modifying the
> database for long enough for the copy to be made]

Well, the backup API works with WAL mode [1], so the database can still be
modified while being backed up (though I imagine the WAL file will get
huge during the backup, but that's fine).

Regardless, all those solutions require me to wait for the entire backup
to complete before sending the file to the remote host -- my goal is to
send it one page at a time (eg, send the pages as the backup API processes
them) so as to avoid the large delay and disk-space cost of the backup
itself.

-david

[1] http://sqlite.1065341.n5.nabble.com/WAL-mode-and-backup-API-td40107.html
[sqlite] Ideas on how to use backup API for remote, incremental backup?
On Tue, Oct 6, 2015 at 2:57 PM, Clemens Ladisch wrote:
> It backs up to any disk that you can access.
> Do you have a network file system?

Well, yes, but I'd like to handle it at the application layer.
Basically, we operate a custom replication layer atop sqlite.  It
replicates individual transactions well with 2-phase commit, but right now
you need to manually "bootstrap" a new server by copying the database from
a different server.  I'd like to auto-bootstrap a new node by just
starting it: it would connect to a peer and then download the entire
database.

>> how to use this API to do an incremental backup
>
> This API is not incremental; it always copies the entire database.

Agreed, it will copy the entire database -- but incrementally, one bit at
a time.  Each call to sqlite3_backup_step() backs up a bit more to the
target file on disk.  My goal is to instead have sqlite3_backup_step()
copy some pages to some block of RAM, so I can send that data over the
wire and write it to the remote disk.

-david
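[Editor's note: the batch-at-a-time behavior of sqlite3_backup_step() described above can be observed through Python's built-in sqlite3 module; `pages=16` below mirrors calling sqlite3_backup_step(p, 16) repeatedly, and the progress callback fires once per batch, which is where a network sender would ship the freshly copied pages. File names are hypothetical.]

```python
import sqlite3, tempfile, os

dbdir = tempfile.mkdtemp()
src = sqlite3.connect(os.path.join(dbdir, "big.db"))
src.execute("CREATE TABLE t(x)")
src.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10_000)])
src.commit()

dest = sqlite3.connect(os.path.join(dbdir, "copy.db"))

batches = []
def progress(status, remaining, total):
    # Called after each batch of pages is copied; in C this maps to the
    # loop around sqlite3_backup_step()/sqlite3_backup_remaining().
    batches.append((remaining, total))

# Copy 16 pages per step rather than the whole file at once.
src.backup(dest, pages=16, progress=progress)

copied = dest.execute("SELECT count(*) FROM t").fetchone()[0]
```

The callback's `remaining` counts down to 0 across multiple invocations, confirming that the copy really is performed in bounded increments rather than in one pass.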
[sqlite] Ideas on how to use backup API for remote, incremental backup?
Well, there's the obvious elephant in the room: SQL.  You could just
reduce any DB to SQL statements and pass those along at whichever
pace/destination/byte-mode you fancy.  The target system will have zero
trouble turning it back into a DB, since the SQLite engine can already
parse SQL.

This can become really efficient later when you only move the updated bits
-- plus, if bandwidth is an issue, SQL text compresses well.

Cheers,
Ryan

On 2015-10-06 03:44 PM, David Barrett wrote:
> On Tue, Oct 6, 2015 at 8:36 PM, Simon Slavin wrote:
>
>> Or copy an existing /open/ database file to the new server using the
>> SQLite Backup API, [requires other connections to stop modifying the
>> database for long enough for the copy to be made]
>
> Well the backup API works with WAL mode [1] so it can still be modified
> while being backed up (though I imagine the WAL file will get huge
> during the backup, but that's fine).
>
> Regardless, all those solutions require me to wait for the entire backup
> to complete before sending the file to the remote host -- my goal is to
> send it one page at a time (eg, send the pages as the backup API
> processes them) so as to avoid the large delay and disk space of the
> backup itself.
>
> -david
>
> [1] http://sqlite.1065341.n5.nabble.com/WAL-mode-and-backup-API-td40107.html
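[Editor's note: Ryan's dump-to-SQL idea can be sketched with Python's built-in sqlite3 module, whose `iterdump()` produces the same kind of SQL script as the shell's `.dump` command. The table and values are made up for illustration.]

```python
import sqlite3

# Source database to replicate.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE accounts(id INTEGER PRIMARY KEY, balance REAL)")
src.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 250.5)")
src.commit()

# Reduce the whole DB to SQL text -- this is what you would compress
# and send over the wire.
sql_text = "\n".join(src.iterdump())

# Receiver side: replay the statements to rebuild the database.
dst = sqlite3.connect(":memory:")
dst.executescript(sql_text)
```

Since `sql_text` is plain text, it compresses well with any general-purpose compressor before transmission, as Ryan points out.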
[sqlite] Ideas on how to use backup API for remote, incremental backup?
On 6 Oct 2015, at 2:44pm, David Barrett wrote:
> Regardless, all those solutions require me to wait for the entire backup
> to complete before sending the file to the remote host -- my goal is to
> send it one page at a time (eg, send the pages as the backup API
> processes them) so as to avoid the large delay and disk space of the
> backup itself.

In that case it's probably best, as Clemens suggested, to use a network
file system so that one computer can access both the source file storage
and the destination file storage.  By opening a file on the remote
filespace with SQLite, you can use the Backup API to write directly to the
destination filespace.

Simon.
[sqlite] Ideas on how to use backup API for remote, incremental backup?
On 6 Oct 2015, at 1:52pm, David Barrett wrote:
> Well yes, but I'd like to handle it at the application layer.
> Basically, we operate a custom replication layer atop sqlite.  It
> replicates individual transactions great with 2-phase commit, but right
> now you need to manually "bootstrap" a new server by copying the
> database from a different server.  I'd like to auto-bootstrap a new node
> by just starting it, it'd connect to a peer, and then download the
> entire database.

Copy an existing /closed/ database file to the new server using file
commands, [other connections trying to make modifications will, I think,
get error messages]

Or copy an existing /open/ database file to the new server using the
SQLite Backup API, [requires other connections to stop modifying the
database for long enough for the copy to be made]

Or copy an existing /open/ database file to the new server by creating the
new database using sqlite3_open(), then ATTACHing the source database and
using INSERT ... SELECT, [locks other connections trying to use the source
database out while the copying is done]

Simon.
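[Editor's note: Simon's third option -- ATTACH plus INSERT ... SELECT -- sketched with Python's built-in sqlite3 module. The paths and schema are made up; a real copy would also need to recreate indexes, triggers, and views from sqlite_master.]

```python
import sqlite3, tempfile, os

dbdir = tempfile.mkdtemp()
src_path = os.path.join(dbdir, "source.db")
dst_path = os.path.join(dbdir, "clone.db")

# Build a small source database to copy from.
src = sqlite3.connect(src_path)
src.execute("CREATE TABLE t(x)")
src.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
src.commit()
src.close()

# Create the new database, attach the source, and bulk-copy each table.
dst = sqlite3.connect(dst_path, isolation_level=None)
dst.execute(f"ATTACH DATABASE '{src_path}' AS src")  # assumes no quotes in path
dst.execute("BEGIN")
tables = dst.execute(
    "SELECT name, sql FROM src.sqlite_master "
    "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
).fetchall()
for name, sql in tables:
    dst.execute(sql)  # recreate the table's schema in main
    dst.execute(f'INSERT INTO main."{name}" SELECT * FROM src."{name}"')
dst.execute("COMMIT")
dst.execute("DETACH DATABASE src")
```

As Simon notes, the single big transaction keeps the copy consistent but locks other writers out of the source for its duration.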
[sqlite] Ideas on how to use backup API for remote, incremental backup?
On 10/6/15, David Barrett wrote:
> sqlite has a cool "online backup" API: https://www.sqlite.org/backup.html
> However, it only backs up to a local disk.  I'm wondering if anybody can
> think of how to use this API to do an incremental backup over a network
> connection to a remote host?  The networking part is easy.  But I can't
> figure out how to programmatically get the binary of the database file
> into memory such that I can send it over the network (other than waiting
> for the whole thing to back up to disk and then just doing a remote disk
> copy, obviously, but my file is so large I don't want to wait that long
> nor do I have enough disk to do that).  I'm not really familiar with the
> VFS layer -- would there be some way to do a custom VFS that the online
> backup API writes to, and then I just copy the blocks over from that?
> Thanks for any creative ideas!

Suppose you have a really massive database that is being continually
updated, and you want to have a live backup on another host across a
network.  The SQLite backup API will work for this, we believe, as long as
the database is only being written from a single database connection.

The backup API transfers information from a "source" database connection
over to a "destination" database connection.  There are several
considerations here:

(1) For the destination database connection, use a custom VFS that writes
content over the network rather than writing content to disk.  This custom
VFS does not need to be general-purpose.  It can be highly specialized so
that it only handles the specific VFS methods that the backup interface
will generate.

(2) For the source database connection of the backup, use the same
database connection that is used for writing to the database.  That means
that when changes are made to the source database, the backup mechanism
can go back and resend only those pages that actually changed.  If the
database is modified by any database connection other than the one used
for backup, then the backup has no way of knowing which pages changed, and
so it has to start over again at the beginning and rewrite every page.

(3) The thread that is running the backup should monitor how fast new
content is being shipped over the network and should throttle calls to
sqlite3_backup_step() to prevent too much content from being queued up in
cache.

--
D. Richard Hipp
drh at sqlite.org
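[Editor's note: consideration (3) -- throttling sqlite3_backup_step() against network back-pressure -- can be sketched with the progress callback of Python's sqlite3 wrapper. The `time.sleep` here is a stand-in for blocking until a hypothetical network send queue drains; file names are made up.]

```python
import sqlite3, tempfile, os, time

dbdir = tempfile.mkdtemp()
src = sqlite3.connect(os.path.join(dbdir, "live.db"))
src.execute("CREATE TABLE t(x)")
src.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(2000)])
src.commit()

dest = sqlite3.connect(os.path.join(dbdir, "replica.db"))

def throttle(status, remaining, total):
    # Called between batches of pages.  A real sender would block here
    # until its outbound queue falls below some threshold, so that the
    # backup never races far ahead of the network.
    if remaining > 0:
        time.sleep(0.01)  # simulated back-pressure

# Small batches (8 pages per step) give the throttle frequent chances
# to pause the copy.
src.backup(dest, pages=8, progress=throttle)
```

The C-level equivalent is simply to wait between sqlite3_backup_step() calls whenever the custom VFS reports that too many pages are still queued for transmission.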
[sqlite] Ideas on how to use backup API for remote, incremental backup?
David Barrett wrote:
> sqlite has a cool "online backup" API: https://www.sqlite.org/backup.html
> However, it only backs up to a local disk.

It backs up to any disk that you can access.  Do you have a network file
system?

> how to use this API to do an incremental backup

This API is not incremental; it always copies the entire database.

Regards,
Clemens
[sqlite] Ideas on how to use backup API for remote, incremental backup?
You could mount a directory as a ramdisk.

On Tue, Oct 6, 2015 at 7:52 AM, David Barrett wrote:
> On Tue, Oct 6, 2015 at 2:57 PM, Clemens Ladisch wrote:
>
>> It backs up to any disk that you can access.
>> Do you have a network file system?
>
> Well yes, but I'd like to handle it at the application layer.
> Basically, we operate a custom replication layer atop sqlite.  It
> replicates individual transactions great with 2-phase commit, but right
> now you need to manually "bootstrap" a new server by copying the
> database from a different server.  I'd like to auto-bootstrap a new node
> by just starting it, it'd connect to a peer, and then download the
> entire database.
>
>> > how to use this API to do an incremental backup
>>
>> This API is not incremental; it always copies the entire database.
>
> Agreed it will copy the entire database, but incrementally -- one bit at
> a time.  Each call to sqlite3_backup_step() backs up a bit more to the
> target file on disk.  My goal is to instead have sqlite3_backup_step()
> copy some pages to some block of RAM, so I can send that data over the
> wire and write it to the remote disk.
>
> -david
[sqlite] Ideas on how to use backup API for remote, incremental backup?
sqlite has a cool "online backup" API: https://www.sqlite.org/backup.html

However, it only backs up to a local disk.  I'm wondering whether anybody
can think of how to use this API to do an incremental backup over a
network connection to a remote host?  The networking part is easy.  But I
can't figure out how to programmatically get the binary content of the
database file into memory so that I can send it over the network (other
than waiting for the whole thing to back up to disk and then doing a
remote disk copy, obviously -- but my file is so large that I don't want
to wait that long, nor do I have enough disk space to do that).  I'm not
really familiar with the VFS layer -- would there be some way to write a
custom VFS that the online backup API writes to, and then I just copy the
blocks over from that?

Thanks for any creative ideas!

-david