Hi Andy,

Thanks for the response.

The plan is to implement new server-side endpoints for backup and restore. The backup process will run in the same JVM as the server and use the same logic that is implemented in org.apache.jena.tdb.TDBBackup.backup(...). My only concern is whether this operation is safe for both storage types while the application is running and updates may still be occurring during the backup.

The backup process will also grab all connector files.
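A minimal sketch of the connector-file step, assuming the connector files all live under one directory. The directory layout and the ConnectorFileBackup / copyConnectorFiles names are hypothetical, not part of Jena or the platform. The sketch copies each file into the backup directory, keeping relative paths, and returns a sorted list of what it copied that could be saved as a manifest next to the TDB dump. It does nothing to keep these files consistent with the TDB snapshot if graphs are added or removed while it runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConnectorFileBackup {

    // Hypothetical helper: copies every regular file under sourceDir into
    // backupDir, keeping the relative layout, and returns the copied relative
    // paths in sorted order (with '/' separators) for use as a manifest.
    public static List<String> copyConnectorFiles(Path sourceDir, Path backupDir) throws IOException {
        List<Path> files;
        // Files.walk holds directory handles open, so close the stream promptly.
        try (Stream<Path> paths = Files.walk(sourceDir)) {
            files = paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> copied = new ArrayList<>();
        for (Path file : files) {
            Path relative = sourceDir.relativize(file);
            Path target = backupDir.resolve(relative);
            Files.createDirectories(target.getParent());
            Files.copy(file, target,
                    StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.COPY_ATTRIBUTES);
            copied.add(relative.toString().replace('\\', '/'));
        }
        return copied;
    }
}
```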

For restore, the plan is to take the application offline to ensure data integrity.

Regards,
Tim

On 6/8/2020 5:50 PM, Andy Seaborne wrote:
Hi Tim,

Some context for our readers: gTDB and xTDB are different ways of using TDB. Last I heard, it was TDB1, not that it makes much difference here.

gTDB - one graph stored in the default graph of a TDB database.
   Many graphs, many databases.
xTDB - single, shared TDB database with graphs stored as named graphs.

On 08/06/2020 14:33, Tim Flicker wrote:
Hi Jena Community,

I'm working on an automatic data backup and restore feature for our platform, which uses Jena for data access (gTDB and xTDB). The requirement is to keep the application up during backup, although it can be taken down for restore. I've been looking into the tdbbackup and tdbloader scripts that come packaged with Jena.

Only one process can access a TDB database at a time, so when the server is running, only it can use the database.

Live backup of databases has to be done by the server - that's what the backup servlet does [1].

The backup could be written to local disk or delivered over HTTP.

curl -v -XPOST 'http://localhost:8080/tbl/backup?storage=xdb' --output test.trig

writes the entire database as a single TriG file.

This backup is a single transactional snapshot of the database, so the data is consistent even if changes are being made at the same time.

gTDB is harder because the graphs are in many databases. There isn't an easy way to back up all the graphs at once without additional code to take a lock or transaction on each database - that's something outside of TDB.
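To illustrate what that "additional code" might look like, here is a JDK-only sketch under stated assumptions: each gTDB database is modelled by a hypothetical Store guarded by its own ReentrantReadWriteLock, and every writer goes through that lock. None of this is TDB API. The backup takes a read lock on every store in a fixed (name) order, a common way to avoid deadlock when several backups run at once, copies them all, then releases the locks. The trade-off is that writers to every database are blocked for the whole copy, unlike a single TDB read transaction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class MultiStoreBackup {

    /** Hypothetical stand-in for one gTDB database: a name, its data, and a lock. */
    public static final class Store {
        public final String name;
        public final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final Map<String, String> data = new HashMap<>();

        public Store(String name) { this.name = name; }

        /** Every write must hold this store's write lock. */
        public void write(String key, String value) {
            lock.writeLock().lock();
            try { data.put(key, value); } finally { lock.writeLock().unlock(); }
        }
    }

    /**
     * Read-locks all stores in name order, copies each one while no writer
     * can run, then unlocks in reverse order. Returns store name -> copy.
     */
    public static Map<String, Map<String, String>> snapshotAll(List<Store> stores) {
        List<Store> ordered = new ArrayList<>(stores);
        ordered.sort(Comparator.comparing((Store s) -> s.name));
        List<Store> locked = new ArrayList<>();
        try {
            for (Store s : ordered) {
                s.lock.readLock().lock();
                locked.add(s);
            }
            Map<String, Map<String, String>> snapshot = new HashMap<>();
            for (Store s : ordered) {
                snapshot.put(s.name, new HashMap<>(s.data));
            }
            return snapshot;
        } finally {
            for (int i = locked.size() - 1; i >= 0; i--) {
                locked.get(i).lock.readLock().unlock();
            }
        }
    }
}
```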


Restore is done either by building the database separately, stopping the server and installing it, or by writing to the database from inside a running server.

That is for the TDB databases. In your situation, there is also configuration in disk files (connector files) that goes with the graphs. Those files aren't in TDB, so these backup procedures aren't enough on their own. It is also a problem if graphs have been added to or deleted from the system between backup and restore.

Is it safe to run the tdbbackup script while graphs are being accessed, for either reads or writes?

No.
In fact, it should refuse to do it.

An alternative approach would be to programmatically place lock files, which seems like it would put the system into "read only" mode.

The TDB lock files control exclusive access, not a read/write mode.

Once the lock files are in place, I could do a backup at the file system level then remove the locks once complete.

Parallel operation with the server is possible with transactions, including having writers active while a TDB backup is taken. The backup will not see their changes, only the data in the database as it was when the backup started.
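The "backup sees the data as of its start" behaviour can be illustrated with a small JDK-only sketch. This is not how TDB implements transactions; it is a copy-on-write stand-in, with hypothetical names, that shows the same snapshot semantics: a writer publishes a new immutable version atomically, and a reader that has already started keeps the version it pinned, so a long-running backup never sees a half-applied change.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class SnapshotStore {

    // The latest committed state. Each version is an unmodifiable map that is
    // never changed after publication, so readers can use it without locks.
    private final AtomicReference<Map<String, String>> committed =
            new AtomicReference<>(Collections.emptyMap());

    /** Stand-in for a write transaction: copy, modify, publish atomically. */
    public synchronized void commitPut(String key, String value) {
        Map<String, String> next = new HashMap<>(committed.get());
        next.put(key, value);
        committed.set(Collections.unmodifiableMap(next));
    }

    /** Stand-in for a read transaction: pins the version committed right now. */
    public Map<String, String> beginRead() {
        return committed.get();
    }
}
```

A backup would call beginRead() once and write out everything in the returned map; commits made after that call land in a new version and do not appear in the backup.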

Any advice on how to proceed with this operation safely is greatly appreciated.

Regards,
Tim

    Hope that helps,
    Andy

[1] https://doc.topquadrant.com/6.3/backup-and-restore/#Live_Data_Backup_of_a_Shared_Graph_TDB

