Hey Jukka,

Thanks for taking the time :)
> The servlet you're looking for is
> org.apache.jackrabbit.j2ee.SimpleWebdavServlet.

Great!

>> Maybe even a webdav servlet that transparently versions changes?
>
> It doesn't do versioning transparently, but it does support the WebDAV
> versioning features.

Hm ...so how would that work if you use the standard OSX/Windows client
and just mount the repository? Would it version the files or not? I
remember mod_dav has such an autoversioning option (SVNAutoversioning).

>> * Repository Browser
>>
>> While for webdav it would be nice to show less, it would be nice to
>> show more on the 'browse' view of the standalone jar. In fact,
>> switching the amount of information between the two (webdav/browse)
>> would be great. I know other 3rd parties have sophisticated browsers
>> for JCR. But is there one that comes with Jackrabbit that I've
>> missed? What do people use?
>
> No. We planned to have a content browser included already in 1.5.0
> (see JCR-1455), but in the end that unfortunately didn't happen. At
> Day we have the commercial CRX Content Explorer that we're planning to
> contribute to Jackrabbit, but that effort is a bit stalled due to
> technical and legal issues. There are also a few good open source
> browsers around; I've personally used and liked the JCR Explorer
> available at http://www.jcr-explorer.org/.

That one looks quite good indeed. It's ASL 2.0 - why not include that
one if there are problems with the CRX one? IMO it would be a big step
forward to have something like that out of the box. (Still, congrats on
the standalone jar ... that is pretty sweet!)

>> * Scaling Out and SOA
>>
>> I am wondering what the suggested architecture would look like for
>> Jackrabbit in a bigger installation. The classic setup would be a
>> couple of front-end machines rendering the content that comes out of
>> a bigger database or a database cluster. The question is how to
>> translate this into a Jackrabbit setup.
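(To make the question concrete: what I have in mind per front-end node
is roughly the following cluster section in repository.xml - the host,
credentials and ids are of course made up, and I'm not sure I have all
the parameters right:)

```xml
<!-- Sketch of a per-node cluster config (all connection details invented).
     Each front-end node would get a unique id and point at the same
     shared journal database. -->
<Cluster id="node1" syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <!-- local file tracking this node's last seen journal revision -->
    <param name="revision" value="${rep.home}/revision.log"/>
    <param name="driver" value="oracle.jdbc.driver.OracleDriver"/>
    <param name="url" value="jdbc:oracle:thin:@dbhost:1521:repo"/>
    <param name="user" value="jackrabbit"/>
    <param name="password" value="secret"/>
  </Journal>
</Cluster>
```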
> As you noticed, the recommended approach for now would be to use a
> Jackrabbit cluster with each cluster node running locally on each
> front-end server (and in the same JVM process as your application).

OK ... what about the persistence part? I know CRX has the mighty Tar
PM :) ...but what about scaling at that end? Has this ever been a
problem? If you have a cluster of 5-10 machines and just a single
database for persistence, I would imagine this could potentially become
a bottleneck. Has anyone ever used a whole database cluster for
persistence? Any suggestions there? I might "have to" use Oracle.

> This is mostly due to current performance limitations of the JCR-RMI
> layer. There are no architectural reasons why the performance of
> remote JCR access couldn't be similar (or even notably better, due to
> the cache-friendly design of JCR) to that of many relational
> databases, but so far not much work has been done to optimize remote
> access performance, as the common deployment model has been to have
> the repository running locally within the application or the
> application server. Remote API access has mostly been used for
> administrative purposes where performance is not that critical.

My first thought was: shouldn't the JCR server just have a REST API?
...and then I thought of Sling. And CouchDB. Or probably even more so
FeatherDB
(http://fourspaces.com/blog/2008/4/11/FeatherDB_Java_JSON_Document_database).
How this fits the picture is probably more something for the dev list.

> In fact, one of our reasons for introducing the new standalone server
> jar is to raise awareness about this performance issue and to perhaps
> get some contributions to improve it. :-)

hint, hint ...I noticed ;-)

>> Especially as RMI is hinted to be slow, and syncing the replay logs
>> across the cluster is also a bit of an overhead, I would be grateful
>> for some more details and advice here.
>
> See above.
> The main reason for the current slow performance is that the JCR-RMI
> layer was originally designed to map most JCR API calls one-to-one to
> equivalent remote method calls, with no caching or batching features.
> This approach worked great in that we were able to support almost the
> entire range of JCR functionality quite easily, but it does come with
> quite severe performance limitations, as for example each individual
> Node.getProperty() call causes a network roundtrip instead of being
> executed against a locally cached copy of the node.

I see - that is indeed very expensive then. I was actually surprised by
the choice of RMI anyway. (Forgive my words - but it's a bitch of a
protocol.)

>> * Searching in a Cluster
>>
>> Assuming I have a Jackrabbit cluster - how is the index generation
>> handled? Will every Jackrabbit instance have its own index and also
>> be the one that keeps the local index up-to-date?
>
> Yes, each node keeps its own indexes.

OK. That can be good.

>> Does the index get synchronized through the Jackrabbit cluster
>> mechanism?
>
> Yes. The cluster nodes listen for changes recorded in the cluster
> journal, and update the indexes based on the observed updates.

Incrementally? Are there any guarantees for the observation? I can
imagine a node going down, missing an update and being out of sync when
it comes back up - something you really don't want to have in a
cluster, because then the individual indexes become a problem :)

>> * Searching and Versioning
>>
>> When I search and I have versioned resources, will it search all
>> versions? ...or only the latest one? How is this handled?
>
> The version histories of all versionable nodes are available in the
> /jcr:system/jcr:versionStorage subtree. You can search for all past
> versions in that subtree, or for the checked-out versions in normal
> workspace storage outside /jcr:system.

So the index includes and references all versions?

>> I heard about InfoQ using Jackrabbit.
>> Could not find exact details about their infrastructure though.
>
> Have you seen
> http://www.infoq.com/presentations/design-and-architecture-of-infoq ?
> I guess that's the best introduction there is to how they're set up.

Thanks for the pointer!

>> Someone else using it in bigger installations?
>
> At Day we use Jackrabbit as the core of all our current products. We
> do have some performance and scalability features that go beyond
> what's there in Jackrabbit, but most of the customer cases you can
> find on our web site are based on the clustering and bundle
> persistence features that have also been in Jackrabbit for some while.

"bundle persistence features"? WDYM?

cheers
--
Torsten
