On 09/06/2020 15:25, Tim Flicker wrote:
Hi Andy,
Thanks for the response.
The plan is to implement new endpoints server side to do backup and
restore. The backup process will run in the same JVM as the server and
uses the same logic that is implemented in
org.apache.jena.tdb.TDBBackup.backup(...). My only concern is if this
operation is safe for both storage types while the application is
running and updates are potentially still occurring during the backup
process.
Yes - backup runs inside a "read" transaction.
The backup process will also grab all connector files.
To be robust, consider locking so connector files can't be created or
deleted while the backup of the database runs.
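A minimal sketch of that locking, assuming every connector file change in the server goes through one shared guard object. The class and method names (ConnectorFileGuard, createConnector, deleteConnector, backup) are hypothetical, not part of Jena or the product; the database backup itself is passed in as a Runnable.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical helper: all connector file changes must go through this class.
public class ConnectorFileGuard {
    // Connector changes take the shared side; the backup takes the exclusive side.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Path connectorDir;

    public ConnectorFileGuard(Path connectorDir) {
        this.connectorDir = connectorDir;
    }

    public void createConnector(String name, String content) throws IOException {
        lock.readLock().lock();
        try {
            Files.writeString(connectorDir.resolve(name), content);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void deleteConnector(String name) throws IOException {
        lock.readLock().lock();
        try {
            Files.deleteIfExists(connectorDir.resolve(name));
        } finally {
            lock.readLock().unlock();
        }
    }

    // Runs the database backup and copies the connector files while no
    // connector file can be created or deleted. Returns the number of files copied.
    public int backup(Path targetDir, Runnable databaseBackup) throws IOException {
        lock.writeLock().lock();
        try {
            databaseBackup.run();
            Files.createDirectories(targetDir);
            int copied = 0;
            try (DirectoryStream<Path> files = Files.newDirectoryStream(connectorDir)) {
                for (Path file : files) {
                    if (Files.isRegularFile(file)) {
                        Files.copy(file, targetDir.resolve(file.getFileName()),
                                StandardCopyOption.REPLACE_EXISTING);
                        copied++;
                    }
                }
            }
            return copied;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

This only protects against changes made inside the same JVM through the guard; anything that edits the connector directory directly bypasses it.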
For restore, the plan is to take the application offline to ensure data
integrity.
Regards,
Tim
On 6/8/2020 5:50 PM, Andy Seaborne wrote:
Hi Tim,
Some context for our readers: gTDB and xTDB are different ways of
using TDB. Last I heard, it was TDB1, not that it makes very much
difference here.
gTDB - one graph stored in the default graph of a TDB database.
Many graphs, many databases.
xTDB - single, shared TDB database with graphs stored as named graphs.
On 08/06/2020 14:33, Tim Flicker wrote:
Hi Jena Community,
I'm working on an auto data backup and restore feature for our
platform which uses Jena for data access (gTDB and xTDB). The
requirement is to have the application up during the backup operation
although it can be taken down for restore. I've been looking into the
tdbbackup and tdbloader scripts that come packaged with Jena.
Only one process can access a TDB database at a time so when the
server is running, only it can use the database.
Live backup of databases has to be done by the server - that's what is
done by the backup servlet [1].
The backup could be written to local disk or delivered over HTTP.
curl -v -XPOST 'http://localhost:8080/tbl/backup?storage=xdb' --output test.trig
writes the entire database as a single TriG file.
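If the backup is triggered from Java rather than curl, the same request can be sketched with the JDK's built-in HTTP client (Java 11+). The class and method names are hypothetical; the URL is the one from the curl example, and any authentication the server needs is left out here.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical client: POSTs to the backup endpoint and streams the reply to a file.
public class LiveBackupClient {

    public static Path fetchBackup(String backupUrl, Path target)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(backupUrl))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();

        // Stream the response body straight to disk, replacing any old file.
        HttpResponse<Path> response = client.send(request,
                HttpResponse.BodyHandlers.ofFile(target,
                        StandardOpenOption.CREATE,
                        StandardOpenOption.WRITE,
                        StandardOpenOption.TRUNCATE_EXISTING));

        if (response.statusCode() != 200) {
            // Do not leave an error page behind looking like a backup.
            Files.deleteIfExists(target);
            throw new IOException("Backup request failed with HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Usage, mirroring the curl call: LiveBackupClient.fetchBackup("http://localhost:8080/tbl/backup?storage=xdb", Path.of("test.trig")).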
This backup is a single transactional snapshot of the database so the
data is consistent even if changes are also being made.
gTDB is harder because the graphs are in many databases. There isn't
an easy way to back up all the graphs at once without additional code to
take a lock or transaction on each database - that's something outside
of TDB.
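One possible shape for that additional code, as a sketch only: an application-level lock per graph database, held exclusively for all graphs while the backup runs. Everything here (GraphDatabaseLocks, update, backupAll, the graph ids) is hypothetical and lives outside TDB; it assumes all updates in the server are routed through update(...).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical application-level locks, one per graph database.
public class GraphDatabaseLocks {
    private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String graphId) {
        return locks.computeIfAbsent(graphId, id -> new ReentrantReadWriteLock());
    }

    // Updates take the shared side, so they only wait for a running backup.
    // Ordering between writers is left to the database's own transactions.
    public void update(String graphId, Runnable action) {
        Lock shared = lockFor(graphId).readLock();
        shared.lock();
        try {
            action.run();
        } finally {
            shared.unlock();
        }
    }

    // Takes the exclusive side of every listed graph, then runs the backup.
    // Locks are acquired in sorted order so two concurrent backups cannot deadlock.
    public void backupAll(Iterable<String> graphIds, Runnable backup) {
        TreeSet<String> ordered = new TreeSet<>();
        for (String id : graphIds) {
            ordered.add(id);
        }
        Deque<Lock> held = new ArrayDeque<>();
        try {
            for (String id : ordered) {
                Lock exclusive = lockFor(id).writeLock();
                exclusive.lock();
                held.push(exclusive);
            }
            backup.run();
        } finally {
            // Release in reverse order of acquisition.
            while (!held.isEmpty()) {
                held.pop().unlock();
            }
        }
    }
}
```

The cost of this design is that updates to every graph wait for the whole backup; holding a read transaction on each database instead would avoid that, at the price of more bookkeeping.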
Restore is either: build the database separately, stop the server and
install it; or write to the database from inside a running server.
That is for the TDB databases - in your situation, there is also the
configuration in disk files (connector files) that go with the graphs
- they aren't in TDB so these backup procedures aren't going to be
enough on their own. It is a problem if graphs have been added or
deleted from the system between backup and restore.
Is it safe to run the tdbbackup script while graphs are being accessed
either read or write?
No.
In fact, it should refuse to do it.
An alternative approach would be to programmatically place lock files
which seems like it would put the system in "read only" mode.
The TDB lock files control exclusive access, not a read/write mode.
Once the lock files are in place, I could do a backup at the file
system level then remove the locks once complete.
Parallel operation with the server is possible with transactions,
including having writers while a TDB backup is taken - the backup will
not see the changes, only the data in the database as it was when the
backup started.
Any advice on how to proceed with this operation safely is greatly
appreciated.
Regards,
Tim
Hope that helps,
Andy
[1]
https://doc.topquadrant.com/6.3/backup-and-restore/#Live_Data_Backup_of_a_Shared_Graph_TDB