On Mon, Mar 26, 2012 at 12:56 PM, Jukka Zitting <jukka.zitt...@gmail.com> wrote:
> Hi,
>
> There's a number of points in this thread that I wanted to address, so
> instead of replying to them individually, let me try to summarize my
> thinking.
>
> One of the bigger pain points in the Jackrabbit 2.x architecture has
> been the query engine and the workspace-global query index that has
> been pretty difficult to customize for special needs and to handle in
> terms of backup/recovery and scaling to multiple cluster nodes. My
> wish for Oak is that we come up with a much more flexible search and
> indexing architecture that solves these issues and is easy to extend
> for any future use cases we may encounter.
>
> I think the biggest issue, as brought up by Alex and then elaborated
> by Ard, is the way we handle indexing. Instead of having a single,
> more or less fixed index for a repository like in Jackrabbit 2.x, Oak
> should provide generic extension points that various different kinds
> of indexing components could hook into. We should have at least three
> such extension points: pre- and post-commit hooks, and observation
> based on the commit journal.
>
> For example a low-level UUID-to-path index should preferably use the
> pre-commit hook for atomic index updates as a part of each commit. A
> post-commit hook could be used to trigger full-text extraction of
> nt:file binaries, a bit like we currently do in Jackrabbit 2.x. And an
> observation client could use the commit journal to feed an external
> Solr index for application-level index features. A given deployment
> can choose which ones of these and any other indexing components are
> needed based on relevant application needs and related
> performance/scalability overhead. A single solution does not fit all
> needs, so we need to make such customization as easy as possible.
>
> On the other hand there's a lot of value in having a single, unified
> query abstraction instead of having client applications reach out
> directly to Solr, Lucene, or custom indexes. Thus, in addition to the
> extension points for indexing, we need a way for the indexing
> components to extend the Oak query engine with ways to evaluate given
> queries against the various configured indexes. This way all
> applications can use the same generic Oak query API (exposed through
> QueryManager in JCR, DASL in WebDAV, and/or something else in JSOP)
> while leveraging the custom indexes available in each deployment.
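To make the proposal above a little more concrete, here is a minimal Java (16+) sketch of the three indexing extension points and a single query entry point that delegates to whichever index can answer. Everything in it is hypothetical and for illustration only: the types Commit, PreCommitHook, PostCommitHook, QueryIndex, UuidIndex and Repository are not the Oak or Jackrabbit API, nodes are modeled as a flat map from path to string properties, and a query is a single "property = value" condition. Picking the index by an estimated cost is also an assumption of this sketch; the summary only says that indexing components should be able to extend the query engine.

```java
import java.util.*;

// Hypothetical sketch of pluggable indexing; none of these types are the Oak API.
public class IndexingSketch {

    /** Changed paths mapped to their new properties; a null value marks a removed node. */
    record Commit(Map<String, Map<String, String>> changes) {}

    /** Called before the changes are applied, so index updates are part of the same commit. */
    interface PreCommitHook { void beforeCommit(Commit commit); }

    /** Called after the changes are applied, e.g. to queue full-text extraction. */
    interface PostCommitHook { void afterCommit(Commit commit); }

    /** Lets an index take part in evaluating "property = value" queries. */
    interface QueryIndex {
        /** Estimated cost, or POSITIVE_INFINITY if this index cannot answer the query. */
        double cost(String property, String value);
        List<String> query(String property, String value);
    }

    /** Example index: jcr:uuid -> path, maintained by a pre-commit hook. */
    static class UuidIndex implements PreCommitHook, QueryIndex {
        private final Map<String, String> uuidToPath = new HashMap<>();

        public void beforeCommit(Commit commit) {
            commit.changes().forEach((path, props) -> {
                uuidToPath.values().remove(path); // drop any stale entry for this path
                if (props != null && props.containsKey("jcr:uuid")) {
                    uuidToPath.put(props.get("jcr:uuid"), path);
                }
            });
        }

        public double cost(String property, String value) {
            return "jcr:uuid".equals(property) ? 1.0 : Double.POSITIVE_INFINITY;
        }

        public List<String> query(String property, String value) {
            String path = uuidToPath.get(value);
            return path == null ? List.of() : List.of(path);
        }
    }

    static class Repository {
        final Map<String, Map<String, String>> nodes = new TreeMap<>();
        final List<Commit> journal = new ArrayList<>();
        final List<PreCommitHook> preHooks = new ArrayList<>();
        final List<PostCommitHook> postHooks = new ArrayList<>();
        final List<QueryIndex> indexes = new ArrayList<>();

        void commit(Commit commit) {
            preHooks.forEach(h -> h.beforeCommit(commit));
            commit.changes().forEach((path, props) -> {
                if (props == null) nodes.remove(path); else nodes.put(path, props);
            });
            journal.add(commit);
            postHooks.forEach(h -> h.afterCommit(commit));
        }

        /** Observation: an external indexer (e.g. one feeding Solr) reads commits it has not seen. */
        List<Commit> journalSince(int position) {
            return List.copyOf(journal.subList(position, journal.size()));
        }

        /** Single query entry point: cheapest capable index, else a full traversal. */
        List<String> query(String property, String value) {
            QueryIndex best = null;
            double bestCost = Double.POSITIVE_INFINITY;
            for (QueryIndex index : indexes) {
                double c = index.cost(property, value);
                if (c < bestCost) { best = index; bestCost = c; }
            }
            if (best != null) return best.query(property, value);
            List<String> result = new ArrayList<>();
            nodes.forEach((path, props) -> {
                if (value.equals(props.get(property))) result.add(path);
            });
            return result;
        }
    }

    public static void main(String[] args) {
        Repository repo = new Repository();
        UuidIndex uuidIndex = new UuidIndex();
        repo.preHooks.add(uuidIndex);
        repo.indexes.add(uuidIndex);
        List<String> extractionQueue = new ArrayList<>();
        repo.postHooks.add(c -> c.changes().forEach((path, props) -> {
            if (props != null && "nt:file".equals(props.get("jcr:primaryType"))) extractionQueue.add(path);
        }));

        repo.commit(new Commit(Map.of("/docs/a.pdf",
                Map.of("jcr:primaryType", "nt:file", "jcr:uuid", "1234"))));

        System.out.println(repo.query("jcr:uuid", "1234"));           // answered by UuidIndex
        System.out.println(repo.query("jcr:primaryType", "nt:file")); // falls back to traversal
        System.out.println(extractionQueue);
    }
}
```

Here the UUID index is updated inside commit() before the changes are applied, the nt:file check only runs after the commit has gone through, and an external indexer would poll journalSince() with the last position it processed. Queries that no index claims fall back to a full traversal, which gives correct results but does not scale.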
Thanks for this summary. I now really understand what the goals are and how to achieve them. I especially like the unified, generic Oak query API. Currently, for Hippo, I am doing something similar for the query API: it seamlessly delegates to either Solr or Jackrabbit, and both return a JCR node iterator (although the Solr index, through SolrJ, can also return plain POJOs).

I really like the first option (the pre-commit example) and the third (observation based), but I still see many obstacles ahead for the second (full-text extraction on post-commit).

I have one more question regarding Oak search and indexes: will we be able to run queries that return something other than JCR nodes/rows? I frequently want to get a query result from the repository that cannot be returned as a node iterator. For example, a query on statistics, or a query for 'auto-completion' on some property (which would return part of the TermEnum, for example).

Regards,
Ard

>
> BR,
>
> Jukka Zitting

--
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com