Hi,

Lots of buzzwords in the subject line, but let me explain.
(Please read on past the first paragraph even though I'll mention PHP :) The whole discussion should be of interest to everyone, not just "PHP people".)

We (we == some people from the PHP/Symfony community) are currently in the process of developing a new "Content Management Framework" (see http://cmf.symfony-project.org/ for more info), where we'd like to use JCR as the storage API and Jackrabbit as the mid-term, easy backend for it. As the transport layer between PHP and Jackrabbit we will use the not-yet-finished http://liip.to/jackalope, which speaks the DavEx protocol.

So much for the background. Now I'm looking for the best way to set up the Jackrabbit side so that it is potentially scalable, failsafe and distributed. As far as I understand it, the "traditional" clustered setup of a Jackrabbit server looks like http://flic.kr/p/8TQL1N, with one central database:

* Failsafe: Only as failsafe as the central database is. In the case of MySQL (which is not a given, btw), we'd have to set up a master-slave scenario where a slave takes over the master role should the master go down. No idea how feasible that is for Jackrabbit (I guess it's a lot of manual work and monitoring from outside).

* Scalable: Only as scalable as that single master database. If your app produces more reads or writes than the db can handle, you're doomed. But as the db schema is very lightweight, maybe it can be assumed that most websites never hit that ceiling.

* Distributed: Not really (except for the Jackrabbit nodes). You have one fat database in one location.

Is that the way most Jackrabbit setups are done? And the database is never the performance bottleneck?

Now, coming from the LAMP world, I'm traditionally more used to a master-slave scenario (if not using one of those fancy new NoSQL approaches), where there is one master db server for writes, which replicates to many slave db servers for all the reads.
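For reference, my understanding of that traditional setup is that every cluster node gets a <Cluster> element in its repository.xml pointing at the shared database journal, roughly like this (node id, host, credentials and prefix are placeholders I made up, not values from a real setup):

```xml
<!-- Per-node clustering config in repository.xml (sketch).
     Each node needs a unique id; syncDelay is in milliseconds. -->
<Cluster id="node1" syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <param name="revision" value="${rep.home}/revision.log"/>
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" value="jdbc:mysql://db-host:3306/journal"/>
    <param name="user" value="clusteruser"/>
    <param name="password" value="secret"/>
    <param name="schemaObjectPrefix" value="journal_"/>
  </Journal>
</Cluster>
```

So the journal database is yet another thing that hangs off that one central db server, if I read the clustering docs correctly.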
Assuming your typical website has many more reads than writes, this usually scales pretty well. In a Jackrabbit world, I imagine this could look something like http://flic.kr/p/8TMG1Z (we would do the read/write differentiation on the PHP/client level, but if Jackrabbit could do that by itself, it would save some logic on the client side):

* Failsafe: As above; as long as a slave can be put in charge if the master goes down, this can be made failsafe.

* Scalable: It scales very well for reads; the write issue remains (but I can live with that).

* Distributed: You could move parts of the setup to other locations and read performance wouldn't degrade. Write latency is usually not as critical as read latency, so I could live with longer round trips for writes. And if you don't need writes (in general, or in a "fail" scenario), you can even keep serving your websites while the two locations are disconnected.

One of the technical problems with this approach could be: http://jackrabbit.510166.n4.nabble.com/Reading-repository-content-from-a-read-only-MySQL-td522668.html

This approach still has the write-to-a-single-db problem, but I can live with that (as most approaches have it). To avoid it, you'd maybe need a totally different approach than an RDBMS, e.g. CouchDB, which has replication built in from the ground up. Has anyone tried to use something like that as a PM?

So what do you think? Is my approach feasible? Am I overthinking it, and the first approach is by far good enough? I'm not saying I need the full setup yet; I just don't want to get into trouble later, when we actually need it and would have to refactor a lot.

Any input is very much appreciated.

chregu

--
Liip AG // Feldstrasse 133 // CH-8004 Zurich
Tel +41 43 500 39 81 // Mobile +41 76 561 88 60
www.liip.ch // blog.liip.ch // GnuPG 0x0748D5FE
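P.S. To make the client-side read/write differentiation concrete, here is a rough sketch of the routing logic we have in mind (in Python just as pseudocode; the "session factories" are hypothetical stand-ins for whatever opens a DavEx session against one Jackrabbit instance, not actual Jackalope API):

```python
class RepositoryRouter:
    """Route JCR sessions: writes always go to the master,
    reads are spread round-robin over the slaves.

    Sketch only -- a 'factory' is any callable that opens a
    session against one Jackrabbit instance.
    """

    def __init__(self, master_factory, slave_factories):
        self.master_factory = master_factory
        # Fall back to the master if no slaves are configured.
        self.slave_factories = list(slave_factories) or [master_factory]
        self._next = 0

    def session(self, write=False):
        if write:
            # All writes must hit the single master.
            return self.master_factory()
        # Reads: pick the next slave, round-robin.
        factory = self.slave_factories[self._next % len(self.slave_factories)]
        self._next += 1
        return factory()


if __name__ == "__main__":
    router = RepositoryRouter(
        master_factory=lambda: "session@master",
        slave_factories=[lambda: "session@slave1", lambda: "session@slave2"],
    )
    print(router.session(write=True))  # session@master
    print(router.session())            # session@slave1
    print(router.session())            # session@slave2
```

If Jackrabbit (or a proxy in front of it) could do this dispatching itself, all of this would disappear from the client.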
