Hi,

Lots of buzzwords in the subject line, but let me explain.
(Please read on past the first paragraph even though I'll mention PHP :) The whole discussion should be of interest to everyone, not just "PHP people".)

We (we == some people from the PHP/Symfony community) are currently in the process of developing a new "Content Management Framework" (see http://cmf.symfony-project.org/ for more info), where we'd like to use JCR as the storage API and Jackrabbit as the mid-term, easy backend for it. As the transport layer between PHP and Jackrabbit we will use the not-yet-finished http://liip.to/jackalope, which speaks the DavEx protocol.

So much for the background. Now I'm looking for the best way to set up the Jackrabbit side so that it is potentially scalable, failsafe and distributed. As far as I understand it, the "traditional" clustered setup of a Jackrabbit server looks like http://flic.kr/p/8TQL1N, with one central database:

* Failsafe: Only as failsafe as the central database is. In the case of MySQL (which is not a given, btw), we'd have to set up a master-slave scenario where a slave takes over the master role should the master go down. No idea how feasible that is for Jackrabbit (I guess it's a lot of manual work and monitoring from outside).

* Scalable: Only as scalable as that single master database. If your app produces more reads or writes than the db can handle, you're doomed. But as the db schema is very lightweight, maybe it can be assumed that most websites never hit that ceiling.

* Distributed: Not really (except for the Jackrabbit nodes). You have one fat database in one location.

Is that the way most Jackrabbit setups are done? And the database is never the performance bottleneck?

Now, coming from the LAMP world, I'm traditionally more used to a master-slave scenario (if not using one of those fancy new NoSQL approaches), where there is one master db server for writes, which replicates to many slave db servers for all the reads.
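For reference, my understanding of that traditional setup is that every cluster node gets a <Cluster> element in its repository.xml pointing at the shared database journal, roughly like this (node id, host, credentials and prefix are placeholders I made up, not values from a real setup):

```xml
<!-- Per-node clustering config in repository.xml (sketch).
     Each node needs a unique id; syncDelay is in milliseconds. -->
<Cluster id="node1" syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <param name="revision" value="${rep.home}/revision.log"/>
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" value="jdbc:mysql://db-host:3306/journal"/>
    <param name="user" value="clusteruser"/>
    <param name="password" value="secret"/>
    <param name="schemaObjectPrefix" value="journal_"/>
  </Journal>
</Cluster>
```

So the journal database is yet another thing that hangs off that one central db server, if I read the clustering docs correctly.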
Assuming your typical website has many more reads than writes, this usually scales pretty well. In a Jackrabbit world, I imagine this could look something like http://flic.kr/p/8TMG1Z (we would do the read/write differentiation on the PHP/client level, but if Jackrabbit could do that by itself, it would save some logic on the client side):

* Failsafe: As above; as long as a slave can be put in charge if the master goes down, this can be made failsafe.

* Scalable: It scales very well for reads; the write issue remains (but I can live with that).

* Distributed: You could move parts of the setup to other locations and read performance wouldn't degrade. Write latency is usually not as critical as read latency, so I could live with longer round trips for writes. And if you don't need writes (in general, or in a "fail" scenario), you can even keep serving your websites while the two locations are disconnected.

One of the technical problems with this approach could be: http://jackrabbit.510166.n4.nabble.com/Reading-repository-content-from-a-read-only-MySQL-td522668.html

This approach still has the write-to-a-single-db problem, but I can live with that (as most approaches have it). To avoid it, you'd maybe need a totally different approach than an RDBMS, e.g. CouchDB, which has replication built in from the ground up. Has anyone tried to use something like that as a PM?

So what do you think? Is my approach feasible? Am I overthinking it, and the first approach is by far good enough? I'm not saying I need the full setup yet; I just don't want to get into trouble later, when we actually need it and would have to refactor a lot.

Any input is very much appreciated.

chregu

--
Liip AG // Feldstrasse 133 // CH-8004 Zurich
Tel +41 43 500 39 81 // Mobile +41 76 561 88 60
www.liip.ch // blog.liip.ch // GnuPG 0x0748D5FE
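P.S. To make the client-side read/write differentiation concrete, here is a rough sketch of the routing logic we have in mind (in Python just as pseudocode; the "session factories" are hypothetical stand-ins for whatever opens a DavEx session against one Jackrabbit instance, not actual Jackalope API):

```python
class RepositoryRouter:
    """Route JCR sessions: writes always go to the master,
    reads are spread round-robin over the slaves.

    Sketch only -- a 'factory' is any callable that opens a
    session against one Jackrabbit instance.
    """

    def __init__(self, master_factory, slave_factories):
        self.master_factory = master_factory
        # Fall back to the master if no slaves are configured.
        self.slave_factories = list(slave_factories) or [master_factory]
        self._next = 0

    def session(self, write=False):
        if write:
            # All writes must hit the single master.
            return self.master_factory()
        # Reads: pick the next slave, round-robin.
        factory = self.slave_factories[self._next % len(self.slave_factories)]
        self._next += 1
        return factory()


if __name__ == "__main__":
    router = RepositoryRouter(
        master_factory=lambda: "session@master",
        slave_factories=[lambda: "session@slave1", lambda: "session@slave2"],
    )
    print(router.session(write=True))  # session@master
    print(router.session())            # session@slave1
    print(router.session())            # session@slave2
```

If Jackrabbit (or a proxy in front of it) could do this dispatching itself, all of this would disappear from the client.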
