I am working towards deploying Fuseki in production, and my plan is to use DRBD [1] to mirror TDB's data and journal files to a failover machine. The failover machine will be "warm", meaning it won't be running Fuseki during normal operation; upon failure of the master, a Fuseki instance will be started up using the data/journal files kept current by DRBD's block-level replication. As our data is fairly critical, we plan to use fully synchronous block mirroring. Given this set-up (combined with daily N-Quads dumps for backup), we hope to have a pretty highly available infrastructure.
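For what it's worth, a minimal sketch of what such a DRBD resource might look like - the resource name, hostnames, devices, and addresses here are all made-up assumptions, not details from this thread, and the exact syntax varies between DRBD versions:

```
# Hypothetical DRBD resource covering the block device that holds the
# TDB data and journal directory. "protocol C" is DRBD's fully
# synchronous mode: a write completes only after it has reached the
# peer's disk as well.
resource tdb {
  protocol  C;
  device    /dev/drbd0;
  disk      /dev/sdb1;       # backing device holding the TDB directory
  meta-disk internal;
  on fuseki-master {
    address 10.0.0.1:7789;
  }
  on fuseki-standby {
    address 10.0.0.2:7789;
  }
}
```

The important point for this use case is protocol C: with asynchronous modes, a master crash could leave the standby's journal behind the master's, which defeats the purpose of replicating the journal at all.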
This is still in the planning stages, and I don't have any details on the supporting software yet, but right now I'm thinking of using Pacemaker (and possibly Corosync) in a set-up very similar to the one that can be used for MySQL [2] or PostgreSQL [3].

-Stephen

[1] http://www.drbd.org/
[2] https://dev.mysql.com/doc/refman/5.0/en/ha-drbd.html
[3] http://wiki.postgresql.org/images/0/07/Ha_postgres.pdf

On Sun, May 12, 2013 at 11:03 AM, Andy Seaborne <a...@apache.org> wrote:
> We have systems which exploit the "publishing" nature of the usage. The
> public interface is read-only; the updates happen via a different workflow
> and go to a single master copy. This master copy is not public-facing, and
> it exploits the fact that users read the data whereas publishers update it,
> and the two are different groups.
>
> It has two useful features:
> 1/ The master copy also serves as "staging" - the publisher can verify the
> data is correct before publication; in one case, it also means they update
>
> 2/ The replication is to play the original updates to the publication
> machines.
>
> Playing the updates could happen in parallel, but the updates are played
> sequentially, partly to increase resilience to system messiness and partly
> to spread out the performance impact (they don't take very long anyway).
>
> There is a notional period of inconsistency - in practice, the load
> balancer is sticky, so one user's requests have an affinity to the same
> server each time, making inconsistency hard to observe from a single
> user/application perspective. Not impossible, but the web is always
> trading the perfect off against the practical.
>
> What I think Jena can do is provide the common building blocks - there are
> different approaches for different use cases. The log/diff of dataset
> changes is one example - what other building blocks are there?
>
> Andy
>
> On 03/05/13 09:17, Bill Roberts wrote:
>>
>> Rob, Andy and Dick,
>>
>> Thanks very much for your responses. Andy: yes, I'm interested in
>> master-slave replication, but also in bringing a backup up to date.
>> Your RDF diff log approach sounds very interesting - keen to hear
>> more about it in due course. Thanks for explaining why the TDB
>> transaction journals are not a good starting point for replication or
>> backup.
>>
>> We've been using a kind of request-replication approach in a
>> single-app-server, multiple-database (currently two) situation, which
>> is working fine for us. We have the simplification of only using
>> Fuseki via HTTP, with no Java access to TDB, and we currently do all
>> updates with the graph protocol, as there is no requirement for SPARQL
>> Update in this situation.
>>
>> However, I'm interested in other options, and a triple (quad) diff
>> log is definitely an approach that appeals.
>>
>> Cheers
>>
>> Bill
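To make the "play the updates sequentially to the publication machines" idea concrete, here is a rough Python sketch. It is only an illustration of the control flow, not anyone's actual implementation: the endpoint URLs, graph names, and the `put` callable are all assumptions. In a real deployment `put` would issue an HTTP PUT/POST to each replica's Graph Store Protocol endpoint (e.g. Fuseki's /ds/data?graph=...).

```python
from typing import Callable, Iterable, List, Tuple

def replay_updates(
    updates: Iterable[Tuple[str, str]],    # (graph URI, serialized payload)
    replicas: List[str],                   # base URLs of publication servers
    put: Callable[[str, str, str], None],  # put(replica, graph_uri, payload)
) -> None:
    """Apply each recorded update to every replica, one call at a time.

    Sequential (rather than parallel) replay spreads the write load out
    over time, as described in the thread; each update is fully applied
    to all replicas before the next one is started.
    """
    for graph_uri, payload in updates:
        for replica in replicas:
            put(replica, graph_uri, payload)
```

Whether a failed PUT should abort the replay, retry, or mark the replica as out of sync is a policy decision this sketch deliberately leaves to the caller.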