Hi,

There are a number of points in this thread that I wanted to address, so instead of replying to them individually, let me try to summarize my thinking.

One of the bigger pain points in the Jackrabbit 2.x architecture has been the query engine and the workspace-global query index, which has been pretty difficult to customize for special needs and to handle in terms of backup/recovery and scaling to multiple cluster nodes. My wish for Oak is that we come up with a much more flexible search and indexing architecture that solves these issues and is easy to extend for any future use cases we may encounter.

I think the biggest issue, as brought up by Alex and then elaborated by Ard, is the way we handle indexing. Instead of having a single, more or less fixed index for a repository like in Jackrabbit 2.x, Oak should provide generic extension points that various different kinds of indexing components could hook into.

We should have at least three such extension points: pre- and post-commit hooks, and observation based on the commit journal. For example, a low-level UUID-to-path index should preferably use the pre-commit hook for atomic index updates as a part of each commit. A post-commit hook could be used to trigger full-text extraction of nt:file binaries, a bit like we currently do in Jackrabbit 2.x. And an observation client could use the commit journal to feed an external Solr index for application-level index features.

A given deployment can choose which of these and any other indexing components are needed based on relevant application needs and related performance/scalability overhead. A single solution does not fit all needs, so we need to make such customization as easy as possible.

On the other hand, there's a lot of value in having a single, unified query abstraction instead of having client applications reach out directly to Solr, Lucene, or custom indexes. Thus, in addition to the extension points for indexing, we need a way for the indexing components to extend the Oak query engine with ways to evaluate given queries against the various configured indexes.
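
To make the extension points a bit more concrete, here is a rough sketch in Java of how they might fit together. None of the names below (Commit, PreCommitHook, PostCommitHook, JournalObserver, IndexLookup, UuidIndex, Repository) are existing Oak or Jackrabbit APIs; they are made up purely for illustration, and the toy in-memory "repository" does no real persistence or transaction handling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: every type and method name here is hypothetical, not an Oak API. */
final class IndexingSketch {

    /** A commit, reduced to the UUIDs of the nodes it adds, keyed by path. */
    static final class Commit {
        final Map<String, String> uuidByPath;
        Commit(Map<String, String> uuidByPath) { this.uuidByPath = uuidByPath; }
    }

    /** Runs before the commit is recorded; a real store would make its changes part of the same atomic commit. */
    interface PreCommitHook { void beforeCommit(Commit commit); }

    /** Runs after the commit is recorded, e.g. to trigger text extraction. */
    interface PostCommitHook { void afterCommit(Commit commit); }

    /** Reads the commit journal at its own pace, e.g. to feed an external index. */
    interface JournalObserver { void onCommit(Commit commit); }

    /** Lets an indexing component answer lookups on behalf of the query engine. */
    interface IndexLookup { String pathForUuid(String uuid); }

    /** UUID-to-path index maintained from the pre-commit hook. */
    static final class UuidIndex implements PreCommitHook, IndexLookup {
        private final Map<String, String> pathByUuid = new HashMap<>();

        @Override public void beforeCommit(Commit commit) {
            for (Map.Entry<String, String> e : commit.uuidByPath.entrySet()) {
                pathByUuid.put(e.getValue(), e.getKey());
            }
        }

        @Override public String pathForUuid(String uuid) { return pathByUuid.get(uuid); }
    }

    /** Toy repository that only wires the extension points together. */
    static final class Repository {
        final List<PreCommitHook> preCommitHooks = new ArrayList<>();
        final List<PostCommitHook> postCommitHooks = new ArrayList<>();
        final List<IndexLookup> indexes = new ArrayList<>();
        private final List<Commit> journal = new ArrayList<>();

        void commit(Commit commit) {
            // An exception thrown by a pre-commit hook propagates before the journal is touched.
            for (PreCommitHook hook : preCommitHooks) hook.beforeCommit(commit);
            journal.add(commit);
            for (PostCommitHook hook : postCommitHooks) hook.afterCommit(commit);
        }

        /** Delivers journal entries starting at the given offset; returns the new offset. */
        int observe(int offset, JournalObserver observer) {
            for (int i = offset; i < journal.size(); i++) observer.onCommit(journal.get(i));
            return journal.size();
        }

        /** Single query entry point: asks each configured index in turn. */
        String findPathByUuid(String uuid) {
            for (IndexLookup index : indexes) {
                String path = index.pathForUuid(uuid);
                if (path != null) return path;
            }
            return null;
        }
    }
}
```

In this sketch a deployment would register a UuidIndex both as a pre-commit hook and as an index lookup, add a post-commit hook for text extraction if it wants one, and let an external indexer call observe() with the last offset it has processed. Clients only ever call the repository's query method and never talk to an index directly.
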
This way all applications can use the same generic Oak query API (exposed through QueryManager in JCR, DASL in WebDAV, and/or something else in JSOP) while leveraging the custom indexes available in each deployment.

BR,

Jukka Zitting
