Re: Jena av

Andy Seaborne Fri, 22 Sep 2017 01:40:34 -0700

On 22/09/17 08:39, Dave Reynolds wrote:

On 22/09/17 00:58, Dimov, Stefan wrote:
I’ve got two questions:
 1. The documentation of Jena/TDB states that the replication of the DB
    is possible, just by copying the TDB files. (I tried that, it’s
    working.) It also states that the copying should be done ONLY while
    the Fuseki server is stopped, even if there’s no writing in TDB
    (just reading). Is that true? Is it not possible to replicate TDB
    without stopping the nodes?
In practice, if there's no writing going on, then I've not had it fail and seems like it should work, but not something we've used in production. It would need someone more expert to say why it could be an issue for a read-only workload.

Hmm. True but ... in TDB1 readers can cause the journal to flush. If previous writers could not cause the write-ahead-log to be written back to the main database, then it's delayed and happens when the readers have finished.


So the rule is when the database is quiet - no activity - it is safe.

You can tell by looking whether the journal becomes empty, and stays empty, and you are blocking all writers so it is pretty risky. Otherwise the copy is irreparably broken.
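As a rough illustration of the "journal becomes empty, and stays empty" check, here is a minimal Python sketch. It assumes the TDB1 journal is a file named journal.jrnl inside the database directory and that an empty journal has size zero; the function name, polling count and interval are hypothetical. It does not block writers, so it only shows the check and does not make a file copy supported.

```python
import os
import time


def journal_stays_empty(db_dir, checks=5, interval=1.0, journal_name="journal.jrnl"):
    """Return True only if the journal is zero bytes on every poll.

    journal_name is an assumption about the TDB1 file layout.
    """
    path = os.path.join(db_dir, journal_name)
    for i in range(checks):
        # Assumption: a missing journal file is treated the same as an empty one.
        size = os.path.getsize(path) if os.path.exists(path) else 0
        if size != 0:
            return False
        if i < checks - 1:
            time.sleep(interval)
    return True
```

A True result only says the journal was empty at each poll; a writer or reader arriving afterwards can still change that.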


Doing this is not supported.

Better to cause a backup to happen and restore the backup on the other machine.
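One way to cause a backup on a running Fuseki server is its HTTP admin protocol, which accepts a POST to /$/backup/{name}. The Python sketch below assumes the admin endpoint is reachable without authentication; the server URL and dataset name in the usage line are placeholders, and the helper names are hypothetical.

```python
import urllib.request


def backup_url(server, dataset):
    """Build the admin backup URL for a dataset name."""
    return server.rstrip("/") + "/$/backup/" + dataset.strip("/")


def trigger_backup(server, dataset):
    """POST to the backup endpoint and return the HTTP status code."""
    req = urllib.request.Request(backup_url(server, dataset), data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, trigger_backup("http://localhost:3030", "ds") would ask a local server to back up a dataset called ds. The backup runs as a background task on the server, so the response does not mean the backup file is complete.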

One thing, make sure you delete the lock file (used to prevent multiple JVMs reading the same file set).
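If the files are copied anyway, the lock file can be left out during the copy instead of deleted afterwards. A minimal Python sketch, assuming the lock file is named tdb.lock in the database directory (the function name and paths are hypothetical) and that the database is quiet while the copy runs:

```python
import shutil


def copy_database(src, dst, lock_name="tdb.lock"):
    """Copy a database directory to dst, skipping the lock file.

    lock_name is an assumption about the TDB1 file layout.
    dst must not already exist (shutil.copytree default behaviour).
    """
    return shutil.copytree(src, dst, ignore=shutil.ignore_patterns(lock_name))
```

This covers only the lock file; it does not check the journal or stop writers.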
 2. We’re considering this architecture: Multiple virtual nodes running
    separate instances of Fuseki and all of them using TDB, which is
    residing on a shared file system
Note there's no attachments on this list so no embedded images.
Would it be possible? Are there going to be any concurrency issues? What if there’s read operations only?
You can't have multiple TDB instances in different JVMs using the same file set; TDB creates a lock file containing the PID to prevent this.

And the case of multiple machines accessing a shared file system really, really does not work. The lock file, which is OS-scoped, will not prevent this, so you will corrupt the database. The damage is silent at the time of writing, and the database is not recoverable afterwards. Data has been lost.

In any case performance depends largely on disk speed so replicating the data to each instance and using fast local storage is generally a better plan than running off slower, shared network storage.
Dave

What you need is to replicate the TDB databases and keep them in-step as changes happen. Sometimes, that is as easy as taking a backup and building read-only slaves to a master database.

If you want semi-live up-to-date (sync'ed within a few seconds) copies, the good news is that there is a system to do that!


https://afs.github.io/rdf-delta/

which will keep two Fuseki installations in-step.

It's early days but it works. We are using it at my employer to keep TDB copies in-step across a cluster of Tomcat servers (not Fuseki - but replicated Fuseki is in the test suite for RDF Delta).


    Andy
