Hey Jukka,

Thanks for taking the time :)
> The servlet you're looking for is
> org.apache.jackrabbit.j2ee.SimpleWebdavServlet.

Great!

>> Maybe even a webdav servlet that transparently versions changes?
>
> It doesn't do versioning transparently, but it does support the WebDAV
> versioning features.

Hm ...so how would that work if you use the standard OSX/Windows client
and just mount the repository? Would it version the files or not? I
remember mod_dav has such an autoversioning option (SVNAutoversioning).

>> * Repository Browser
>>
>> While for webdav it would be nice to show less, it would be nice to
>> show more on the 'browse' view of the standalone jar. In fact,
>> switching the amount of information between the two (webdav/browse)
>> would be great. I know other 3rd parties have sophisticated browsers
>> for JCR. But is there one that comes with Jackrabbit that I've
>> missed? What do people use?
>
> No. We planned to have a content browser included already in 1.5.0
> (see JCR-1455), but in the end that unfortunately didn't happen. At
> Day we have the commercial CRX Content Explorer that we're planning to
> contribute to Jackrabbit, but that effort is a bit stalled due to
> technical and legal issues. There are also a few good open source
> browsers around; I've personally used and liked the JCR Explorer
> available at http://www.jcr-explorer.org/.

That one looks quite good indeed. It's ASL 2.0 - why not include that
one if there are problems with the CRX one? IMO it would be a big step
forward to have something like that out of the box. (Still, congrats on
the standalone jar ... that is pretty sweet!)

>> * Scaling Out and SOA
>>
>> I am wondering what the suggested architecture would look like for
>> Jackrabbit in a bigger installation. The classic setup would be a
>> couple of front-end machines rendering the content that comes out of
>> a bigger database or a database cluster. The question is how to
>> translate this into a Jackrabbit setup.
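(To make the question concrete: what I have in mind per front-end node
is roughly the following cluster section in repository.xml - the host,
credentials and ids are of course made up, and I'm not sure I have all
the parameters right:)

```xml
<!-- Sketch of a per-node cluster config (all connection details invented).
     Each front-end node would get a unique id and point at the same
     shared journal database. -->
<Cluster id="node1" syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <!-- local file tracking this node's last seen journal revision -->
    <param name="revision" value="${rep.home}/revision.log"/>
    <param name="driver" value="oracle.jdbc.driver.OracleDriver"/>
    <param name="url" value="jdbc:oracle:thin:@dbhost:1521:repo"/>
    <param name="user" value="jackrabbit"/>
    <param name="password" value="secret"/>
  </Journal>
</Cluster>
```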
> As you noticed, the recommended approach for now would be to use a
> Jackrabbit cluster with each cluster node running locally on each
> front-end server (and in the same JVM process as your application).

OK ... what about the persistence part? I know CRX has the mighty Tar
PM :) ...but what about scaling at that end? Has this ever been a
problem? If you have a cluster of 5-10 machines and just a single
database for persistence, I would imagine this could potentially become
a bottleneck. Has anyone ever used a whole database cluster for
persistence? Any suggestions there? I might "have to" use Oracle.

> This is mostly due to current performance limitations of the JCR-RMI
> layer. There are no architectural reasons why the performance of
> remote JCR access couldn't be similar (or even notably better, due to
> the cache-friendly design of JCR) to that of many relational
> databases, but so far not much work has been done to optimize remote
> access performance, as the common deployment model has been to have
> the repository running locally within the application or the
> application server. Remote API access has mostly been used for
> administrative purposes where performance is not that critical.

My first thought was: shouldn't the JCR server just have a REST API?
...and then I thought of Sling. And CouchDB. Or probably even more so
FeatherDB
(http://fourspaces.com/blog/2008/4/11/FeatherDB_Java_JSON_Document_database).
How this fits the picture is probably more something for the dev list.

> In fact, one of our reasons for introducing the new standalone server
> jar is to raise awareness about this performance issue and to perhaps
> get some contributions to improve it. :-)

hint, hint ...I noticed ;-)

>> Especially as RMI is hinted to be slow, and syncing the replay logs
>> across the cluster is also a bit of an overhead, I would be grateful
>> for some more details and advice here.
>
> See above.
> The main reason for the current slow performance is that the JCR-RMI
> layer was originally designed to map most JCR API calls one-to-one to
> equivalent remote method calls, with no caching or batching features.
> This approach worked great in that we were able to support almost the
> entire range of JCR functionality quite easily, but it does come with
> quite severe performance limitations, as for example each individual
> Node.getProperty() call causes a network roundtrip instead of being
> executed against a locally cached copy of the node.

I see - that is indeed very expensive then. I was actually surprised by
the choice of RMI anyway. (Forgive my words - but it's a bitch of a
protocol.)

>> * Searching in a Cluster
>>
>> Assuming I have a Jackrabbit cluster - how is the index generation
>> handled? Will every Jackrabbit instance have its own index and also
>> be the one that keeps the local index up-to-date?
>
> Yes, each node keeps its own indexes.

OK. That can be good.

>> Does the index get synchronized through the Jackrabbit cluster
>> mechanism?
>
> Yes. The cluster nodes listen for changes recorded in the cluster
> journal, and update the indexes based on the observed updates.

Incrementally? Are there any guarantees for the observation? I can
imagine a node going down, missing an update and being out of sync when
it comes back up - something you really don't want to have in a
cluster, because then the individual indexes become a problem :)

>> * Searching and Versioning
>>
>> When I search and I have versioned resources, will it search all
>> versions? ...or only the latest one? How is this handled?
>
> The version histories of all versionable nodes are available in the
> /jcr:system/jcr:versionStorage subtree. You can search for all past
> versions in that subtree, or for the checked-out versions in normal
> workspace storage outside /jcr:system.

So the index includes and references all versions?

>> I heard about InfoQ using Jackrabbit.
>> Could not find exact details about their infrastructure though.
>
> Have you seen
> http://www.infoq.com/presentations/design-and-architecture-of-infoq ?
> I guess that's the best introduction there is to how they're set up.

Thanks for the pointer!

>> Someone else using it in bigger installations?
>
> At Day we use Jackrabbit as the core of all our current products. We
> do have some performance and scalability features that go beyond
> what's there in Jackrabbit, but most of the customer cases you can
> find on our web site are based on the clustering and bundle
> persistence features that have also been in Jackrabbit for some while.

"bundle persistence features"? WDYM?

cheers
--
Torsten
