Hi,

I know this isn't directly related to the "in-memory microkernel" idea
itself, but it seems to me the reason to propose an "in-memory microkernel"
is to improve performance, unless I misunderstood the mail.

As for Jackrabbit 2.x read and write performance, I found that JCR-2857
helps, especially for larger repositories; depending on the use case, by
more than an order of magnitude. This is without any additional heap
memory.

Also, I found that the more sessions are open, the slower the writes, due
to internal event processing. So opening fewer sessions may help to
improve write performance.

Regards,
Thomas


On 8/9/12 10:29 AM, "Felix Meschberger" wrote:

>Hi,
>
>Interesting thoughts.
>
>To double on the in-memory assumption of the complete tree, I'd like to
>add that the most dramatic improvement in overall Jackrabbit performance
>on a configuration can probably be reached by increasing the bundle cache
>size which eventually is more or less what you are proposing.
>
>Regards
>Felix
>
>On 07.08.2012 at 16:07, Jukka Zitting wrote:
>
>> Hi,
>>
>> [Just throwing an idea around, no active plans for further work on
>> this.]
>>
>> One of the biggest performance bottlenecks with current repository
>> implementations is disk speed, especially seek times but also raw data
>> transfer rate in many cases. To work around those limitations we've in
>> Jackrabbit used various caching strategies that considerably
>> complicate the codebase and still have trouble with cache misses and
>> write-through performance.
>>
>> As an alternative to such designs, I was thinking of a microkernel
>> implementation that would keep the *entire* tree structure in memory,
>> i.e. only use the disk or another backend for binaries and possibly
>> for periodic backup dumps. Fault tolerance against hardware failures
>> or other restarts would be achieved by requiring a clustered
>> deployment where all content is kept as copies on at least three
>> separate physical servers. Redis (http://redis.io/) is a good example
>> of the potential performance gains of such a design.
>>
>> To estimate how much memory such a model would need, I looked at the
>> average bundle size of a vanilla CQ5 installation. There the average
>> bundle (i.e. a node with all its properties and child node references)
>> size is just 251 bytes. Even assuming larger bundles and some level of
>> storage and index overhead it seems safe to assume up to about 1kB of
>> memory per node on average. That would allow one to store some 1M
>> nodes in each 1GB of memory.
>>
>> Assuming that all content is evenly spread across the cluster in a way
>> that puts copies of each individual bundle on at least three different
>> cluster nodes and that each cluster node additionally keeps a large
>> cache of most frequently accessed content, a large repository with
>> 100+M content nodes could easily run on a twelve-node cluster where
>> each cluster node has 32GB RAM, a reasonable size for a modern server
>> (also available from EC2 as m2.2xlarge). A mid-size repository with
>> 10+M content nodes could run on a three- or four-node cluster with
>> just 16GB RAM per cluster node (or m2.xlarge in EC2).
>>
>> I believe such a microkernel could set a pretty high bar on
>> performance! The only major performance limit I foresee is the network
>> overhead when writing (need to send updates to other cluster nodes)
>> and during cache misses (need to retrieve data from other nodes), but
>> the cache misses would only start affecting repositories that go
>> beyond what fits in memory on a single server (i.e. the mid-size
>> repository described above wouldn't yet be hit by that limit) and the
>> write overhead could be amortized by allowing the nodes to temporarily
>> diverge until they have a chance to sync up again in the background
>> (as allowed by the MK contract).
>>
>> BR,
>>
>> Jukka Zitting
>

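For reference, the sizing arithmetic in Jukka's quoted mail can be
reproduced with a few lines of Python. This is only a rough sketch, not
anything from Jackrabbit or the MK code: the function name cluster_budget
and the constants are made up for illustration, the units are decimal
(1kB = 1000 bytes, 1GB = 10^9 bytes) to match the "1M nodes per 1GB"
figure, and JVM, OS and index overhead beyond the 1kB-per-node allowance
are ignored.

```python
# Back-of-the-envelope sizing for the in-memory proposal quoted above.
# All names here are hypothetical; the numbers come from the quoted mail.

BYTES_PER_NODE = 1000   # "up to about 1kB of memory per node on average"
REPLICAS = 3            # "copies on at least three separate physical servers"
GB = 10 ** 9            # decimal gigabyte, an assumption for round numbers

# "some 1M nodes in each 1GB of memory"
NODES_PER_GB = GB // BYTES_PER_NODE


def cluster_budget(content_nodes, servers, ram_gb_per_server):
    """Estimate memory use for a repository replicated across a cluster."""
    one_copy_gb = content_nodes * BYTES_PER_NODE / GB
    replicated_gb = one_copy_gb * REPLICAS
    total_ram_gb = servers * ram_gb_per_server
    return {
        "one_copy_gb": one_copy_gb,
        "replicated_gb": replicated_gb,
        "total_ram_gb": total_ram_gb,
        # RAM left per server for the "large cache of most frequently
        # accessed content" once the replicated data is spread evenly.
        "cache_gb_per_server": (total_ram_gb - replicated_gb) / servers,
        # If one full copy fits on a single server, reads never need to
        # fetch data from other cluster nodes.
        "fits_on_one_server": one_copy_gb <= ram_gb_per_server,
    }


large = cluster_budget(100_000_000, servers=12, ram_gb_per_server=32)
mid = cluster_budget(10_000_000, servers=3, ram_gb_per_server=16)
print(large)
print(mid)
```

Under these assumptions the large example needs 300GB for three copies
against 384GB of total RAM, leaving about 7GB per server for caching,
while the mid-size example fits a full copy (10GB) on every server, which
is why it would not yet be hit by remote cache misses.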