Re: Data backup and restore

Andy Seaborne Tue, 09 Jun 2020 09:24:26 -0700



On 09/06/2020 15:25, Tim Flicker wrote:

Hi Andy,

Thanks for the response.
The plan is to implement new endpoints server side to do backup and restore. The backup process will run in the same JVM as the server and use the same logic that is implemented in org.apache.jena.tdb.TDBBackup.backup(...). My only concern is if this operation is safe for both storage types while the application is running and updates are potentially still occurring during the backup process.


Yes - backup runs inside a "read" transaction.


The backup process will also grab all connector files.

To be robust, consider locking so connector files can't be created or deleted while the backup of the database runs.
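
A minimal sketch of that locking, assuming the connector files live in one directory and that every create/delete goes through a single class inside the server JVM. ConnectorFileGuard and its method names are hypothetical (not part of Jena or your platform), and the databaseBackup callback stands in for the TDB backup call. The backup holds the shared lock, so connector changes, which need the exclusive lock, wait until it has finished:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.stream.Stream;

/** Hypothetical guard; all names are illustrative, not Jena or platform API. */
final class ConnectorFileGuard {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Path connectorDir;

    ConnectorFileGuard(Path connectorDir) {
        this.connectorDir = connectorDir;
    }

    /** Creating a connector file needs the exclusive lock. */
    void createConnector(String name, byte[] content) throws IOException {
        lock.writeLock().lock();
        try {
            Files.write(connectorDir.resolve(name), content);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Deleting a connector file needs the exclusive lock as well. */
    void deleteConnector(String name) throws IOException {
        lock.writeLock().lock();
        try {
            Files.deleteIfExists(connectorDir.resolve(name));
        } finally {
            lock.writeLock().unlock();
        }
    }

    /**
     * Runs the database backup, then copies the connector files, all under
     * the shared lock. Returns the number of connector files copied.
     */
    int backupTo(Path targetDir, Runnable databaseBackup) throws IOException {
        lock.readLock().lock();
        try {
            databaseBackup.run(); // assumption: wraps the TDB backup call
            Files.createDirectories(targetDir);
            int copied = 0;
            try (Stream<Path> files = Files.list(connectorDir)) {
                for (Path file : (Iterable<Path>) files::iterator) {
                    if (Files.isRegularFile(file)) {
                        Files.copy(file, targetDir.resolve(file.getFileName()),
                                StandardCopyOption.REPLACE_EXISTING);
                        copied++;
                    }
                }
            }
            return copied;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

This only protects against changes made through the same JVM; anything that edits the connector directory from outside the server is not covered.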

For restore, the plan is to take the application offline to ensure data integrity.
Regards,
Tim

On 6/8/2020 5:50 PM, Andy Seaborne wrote:
Hi Tim,
Some context for our readers: gTDB and xTDB are different ways of using TDB. Last I heard, it was TDB1, not that it makes very much difference here.
gTDB - one graph stored in the default graph of a TDB database.
   Many graphs, many databases.
xTDB - single, shared TDB database with graphs stored as named graphs.

On 08/06/2020 14:33, Tim Flicker wrote:
Hi Jena Community,
I'm working on an auto data backup and restore feature for our platform which uses Jena for data access (gTDB and xTDB). The requirement is to have the application up during the backup operation although it can be taken down for restore. I've been looking into the tdbbackup and tdbloader scripts that come packaged with Jena.
Only one process can access a TDB database at a time so when the server is running, only it can use the database.
Live backup of databases has to be done by the server - that's what is done by the backup servlet [1].
The backup could be written to local disk or delivered over HTTP.
curl -v -XPOST 'http://localhost:8080/tbl/backup?storage=xdb' --output test.trig
writes the entire database as a single TriG file.
This backup is a single transactional snapshot of the database so the data is consistent even if changes are also being made.
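
For automation, the same request can be sent from Java. This is only a sketch, assuming Java 11+ (java.net.http) and the servlet URL and storage parameter from the curl example above; BackupClient and its method names are made up for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

/** Hypothetical client for the backup servlet; names are illustrative only. */
final class BackupClient {

    /** Builds the same empty-body POST that the curl command sends. */
    static HttpRequest backupRequest(String baseUrl, String storage) {
        URI uri = URI.create(baseUrl + "/backup?storage=" + storage);
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /**
     * Sends the request and streams the response body (the TriG dump) to target.
     * Note: the body is written to the file even for an error response,
     * so the status code is checked afterwards.
     */
    static Path download(String baseUrl, String storage, Path target)
            throws IOException, InterruptedException {
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(backupRequest(baseUrl, storage),
                      HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            throw new IOException("Backup failed: HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Usage would be along the lines of BackupClient.download("http://localhost:8080/tbl", "xdb", Path.of("test.trig")).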
gTDB is harder because the graphs are in many databases. There isn't an easy way to back up all the graphs at once without additional code to take a lock or transaction on each database - that's something outside of TDB.
Restore is either: build the database separately, stop the server and install it; or write to the database from inside a running server.
That is for the TDB databases - in your situation, there is also the configuration in disk files (connector files) that goes with the graphs - they aren't in TDB so these backup procedures aren't going to be enough on their own. It is a problem if graphs have been added or deleted from the system between backup and restore.
Is it safe to run the tdbbackup script while graphs are being accessed, either read or write?
No.
In fact, it should refuse to do it.
An alternative approach would be to programmatically place lock files which seems like it would put the system in "read only" mode.
The TDB lock files control exclusive access, not a read/write mode.
Once the lock files are in place, I could do a backup at the filesystem level then remove the locks once complete.
Parallel operation with the server is possible with transactions, including having writers while a TDB backup is taken - the backup will not see the changes, only the data in the database as it was when the backup started.
Any advice on how to proceed with this operation safely is greatly appreciated.
Regards,
Tim
    Hope that helps,
    Andy
[1] https://doc.topquadrant.com/6.3/backup-and-restore/#Live_Data_Backup_of_a_Shared_Graph_TDB
