[RESULT] [VOTE] Release Apache Jackrabbit Oak 1.0.5
Hi,

On 26/08/14 08:42, Thomas Mueller wrote:
> Please vote on releasing this package as Apache Jackrabbit Oak 1.0.5. The vote is open for the next 72 hours and passes if a majority of at least three +1 Jackrabbit PMC votes are cast.

The vote passes as follows:

+1 Thomas Mueller
+1 Michael Dürig
+1 Alex Parvulescu
+1 Tommaso Teofili
+1 Davide Giannella
+1 Julian Reschke

Thanks for voting! I'll push the release out.

Regards,
Thomas
Re: [DISCUSS] supporting faceting in Oak query engine
Hello,

On Mon, Aug 25, 2014 at 7:02 PM, Lukas Smith sm...@pooteeweet.org wrote:
> Aloha,
>
> you should definitely talk to the HippoCMS developers. They forked Jackrabbit 2.x to add faceting as virtual nodes. They ran into some performance issues but I am sure they still have valuable feedback on this.

Well, performance actually wasn't the biggest hurdle: exposing and integrating virtual nodes was quite a bit tougher. Indeed I think I might have quite some feedback, but honestly, these days I am also full of doubts about what the best approach will be. I'll try to keep it short:

1) When exposing faceting from Jackrabbit, we wouldn't use virtual layers any more to expose them over pure JCR spec APIs. Instead, we would extend the JCR QueryResult so that, next to getRows/getNodes/etc., it also exposes methods such as:

    public Map<String, Integer> getFacetValues(final String facet) {
        return result.getFacetValues(facet);
    }

    public QueryResult drilldown(final FacetValue facetValue) {
        // return current query result drilled down for facet value
        return ...
    }

2) Authorized counts: for faceting, it doesn't make sense to expose that there are 314 results if you can only read 54 of them. Accounting for authorization through the access manager can be way too slow. The alternatives are to not show authorized counts, or to try to translate the authorization model to a Lucene query, which is in general not possible unless you restrict your authorization model severely (which results in a domain-specific solution unusable for JR).

3) If you support faceting through Oak, will that be competitive enough with what Solr and Elasticsearch offer? Customers these days have some expectations on search result quality and faceting capabilities, performance included. Oak's faceting support will be compared to dedicated search servers and is quite unlikely to be nearly as good, or to keep up with what is being built: aggregations are the new buzz, a very cool superset of faceting. You really don't wanna have to leverage that next from Oak.

So, my take would be to invest time in easy integration with Solr/Elasticsearch and focus in Oak on the parts (hierarchy, authorization, merging, versioning) that aren't covered by already existing frameworks. Perhaps provide an extended JCR API as described in (1) which under the hood can delegate to a Solr or ES Java client. In the end, you'll still end up having the authorized counts issue, but if you make the integration pluggable enough, it might be possible to leverage domain-specific solutions to this (Solr/ES doesn't do anything with authorization either; it is a tough nut to crack).

Regards
Ard

> regards,
> Lukas Kahwe Smith
>
> On 25 Aug 2014, at 18:43, Laurie Byrum lby...@adobe.com wrote:
>> Hi Tommaso,
>>
>> I am happy to see this thread!
>>
>> Questions:
>> Do you expect to want to support hierarchical or pivoted facets soonish? If so, does that influence this decision?
>> Do you know how ACLs will come into play with your facet implementation? If so, does that influence this decision?
>>
>> :-)
>>
>> Thanks!
>> Laurie
>>
>> On 8/25/14 7:08 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote:
>>> Hi all,
>>>
>>> since this has been asked every now and then [1] and since I think it's a pretty useful and common feature for search engines nowadays, I'd like to discuss the introduction of facets [2] for the Oak query engine.
>>>
>>> Pros: having facets in search results usually helps filtering (drill down) the results before browsing all of them, so the main usage would be for client code.
>>>
>>> Impact: probably a change / addition in both the JCR and Oak APIs to support returning other than just nodes (a NodeIterator and a Cursor respectively).
>>>
>>> Right now a couple of ideas on how we could do that come to my mind, both based on the approach of having an Oak index for them:
>>> 1. a (multivalued) property index for facets, meaning we would store the facets in the repository, so that we would run a query against it to have the facets of an originating query.
>>> 2. a dedicated QueryIndex implementation, eventually leveraging Lucene faceting capabilities, which could use the Lucene index we already have, together with a sidecar index [3].
>>>
>>> What do you think?
>>>
>>> Regards,
>>> Tommaso
>>>
>>> [1] : http://markmail.org/search/?q=oak%20faceting#query:oak%20faceting%20list%3Aorg.apache.jackrabbit.oak-dev+page:1+state:facets
>>> [2] : http://en.wikipedia.org/wiki/Faceted_search
>>> [3] : http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html

--
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142
US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com
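
Point (1) in the message above, a query result that exposes facet counts and a drill-down operation next to its rows, could be sketched roughly as follows. This is only an in-memory illustration under stated assumptions: `FacetSketch`, `Hit`, `FacetValue`, `FacetedResult` and the `hit(...)` helper are hypothetical names, nothing here is part of the JCR or Oak APIs, and a real implementation would obtain counts from an index rather than by iterating over hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetSketch {

    /** Hypothetical stand-in for a query hit: a path plus single-valued properties. */
    static final class Hit {
        final String path;
        final Map<String, String> properties;

        Hit(String path, Map<String, String> properties) {
            this.path = path;
            this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
        }
    }

    /** Hypothetical facet value: a facet (property) name and one of its values. */
    static final class FacetValue {
        final String facet;
        final String value;

        FacetValue(String facet, String value) {
            this.facet = facet;
            this.value = value;
        }
    }

    /** Hypothetical result type exposing facet counts next to the hits themselves. */
    static final class FacetedResult {
        private final List<Hit> hits;

        FacetedResult(List<Hit> hits) {
            this.hits = Collections.unmodifiableList(new ArrayList<>(hits));
        }

        List<Hit> getHits() {
            return hits;
        }

        /** Counts how many hits in this result carry each value of the given facet. */
        Map<String, Integer> getFacetValues(String facet) {
            Map<String, Integer> counts = new TreeMap<>();
            for (Hit hit : hits) {
                String value = hit.properties.get(facet);
                if (value != null) {
                    counts.merge(value, 1, Integer::sum);
                }
            }
            return counts;
        }

        /** Returns a new result restricted to the hits that match the facet value. */
        FacetedResult drilldown(FacetValue facetValue) {
            List<Hit> narrowed = new ArrayList<>();
            for (Hit hit : hits) {
                if (facetValue.value.equals(hit.properties.get(facetValue.facet))) {
                    narrowed.add(hit);
                }
            }
            return new FacetedResult(narrowed);
        }
    }

    /** Convenience factory, e.g. hit("/shop/a", "color", "red", "size", "L"). */
    static Hit hit(String path, String... keyValues) {
        Map<String, String> props = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            props.put(keyValues[i], keyValues[i + 1]);
        }
        return new Hit(path, props);
    }

    public static void main(String[] args) {
        FacetedResult all = new FacetedResult(Arrays.asList(
                hit("/shop/a", "color", "red"),
                hit("/shop/b", "color", "red"),
                hit("/shop/c", "color", "blue")));
        System.out.println(all.getFacetValues("color"));
        FacetedResult red = all.drilldown(new FacetValue("color", "red"));
        System.out.println(red.getHits().size());
    }
}
```

Returning a new result from `drilldown` rather than mutating the current one keeps the original counts available, which is what a UI typically needs when it lets the user step back out of a facet.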
Re: oak-run public distribution
Hi,

On Thu, Aug 28, 2014 at 8:42 PM, Michael Dürig mdue...@apache.org wrote:
> So far we didn't deploy oak-run. I'm not sure why but I think there were concerns regarding making developer tooling available to end users.

During 0.x we figured that anyone working with Oak should be able to build the jar directly from sources, so making the pre-built binary available as a download wasn't too important. Now with 1.x I think it would make sense to include the oak-run jar as a download just like we do with jackrabbit-standalone.

2014-08-28 11:53 GMT-04:00 Chetan Mehrotra chetan.mehro...@gmail.com:
> This was discussed earlier [1] and Jukka mentioned that there were some restrictions on deployment size. I tried pushing a snapshot version some time back and that got deployed fine. So I think we should try to deploy artifacts again.

That's a bit orthogonal, as the deployment question is about making oak-run available on the central Maven repository, which we can do regardless of whether we also post the jar on the Jackrabbit downloads page.

BR,

Jukka Zitting
Re: [DISCUSS] supporting faceting in Oak query engine
On 29.08.2014, at 03:10, Ard Schrijvers a.schrijv...@onehippo.com wrote:
> 1) When exposing faceting from Jackrabbit, we wouldn't use virtual layers any more to expose them over pure JCR spec APIs. Instead, we would extend the JCR QueryResult so that, next to getRows/getNodes/etc., it also exposes methods such as:
>
>     public Map<String, Integer> getFacetValues(final String facet) {
>         return result.getFacetValues(facet);
>     }
>
>     public QueryResult drilldown(final FacetValue facetValue) {
>         // return current query result drilled down for facet value
>         return ...
>     }

We actually have a similar API in our CQ/AEM product:

    Query = represents a query [1]
    SearchResult result = query.getResult();
    Map<String, Facet> facets = result.getFacets();

A facet is a list of Buckets [2] (the same as FacetValue above, I assume), an abstraction over different values. You could have distinct values (e.g. red, green, blue), but also ranges (last year, last month etc.). Each bucket has a count, i.e. the number of times it occurs in the current result.

Then on Query you have a method

    Query refine(Bucket bucket)

which is the same as the drilldown above.

So in the end it looks pretty much the same, and seems to be a good way to represent this as an API. Doesn't say much about the implementation yet, though :)

> 2) Authorized counts: for faceting, it doesn't make sense to expose that there are 314 results if you can only read 54 of them. Accounting for authorization through the access manager can be way too slow. ...
>
> 3) If you support faceting through Oak, will that be competitive enough with what Solr and Elasticsearch offer? Customers these days have some expectations on search result quality and faceting capabilities, performance included. ...
>
> So, my take would be to invest time in easy integration with Solr/Elasticsearch and focus in Oak on the parts (hierarchy, authorization, merging, versioning) that aren't covered by already existing frameworks. Perhaps provide an extended JCR API as described in (1) which under the hood can delegate to a Solr or ES Java client. In the end, you'll still end up having the authorized counts issue, but if you make the integration pluggable enough, it might be possible to leverage domain-specific solutions to this (Solr/ES doesn't do anything with authorization either; it is a tough nut to crack).

Good points. When facets are used, the worst case (showing facets for all your content) might actually be the very first thing you see, when something like a product search/browse page is shown, before any actual search by the user is done. Optimizing for performance right from the start is a must, I agree.

What I can imagine, though, is leveraging some kind of caching. In practice, if you have a public site with content that does not change constantly, the facet values are pretty much stable, and authorization shouldn't cost much.

[1] http://docs.adobe.com/docs/en/aem/6-0/develop/ref/javadoc/com/day/cq/search/Query.html
[2] http://docs.adobe.com/docs/en/aem/6-0/develop/ref/javadoc/com/day/cq/search/facets/Bucket.html

Cheers,
Alex
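
The caching idea in the message above, combined with the authorized-counts concern from point (2), might look something like the sketch below. Everything in it is an assumption made for illustration: `CachedFacetCounts`, `Doc`, the single `color` facet and the `canRead` predicate (a stand-in for an access manager check) are hypothetical and not taken from Oak, Jackrabbit or AEM. Counts are computed only over documents the principal may read, cached per principal, and dropped when content or access control changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiPredicate;

public class CachedFacetCounts {

    /** Hypothetical indexed document: a path and the value of one facet property. */
    static final class Doc {
        final String path;
        final String color;

        Doc(String path, String color) {
            this.path = path;
            this.color = color;
        }
    }

    private final List<Doc> docs;
    // Stand-in for an access manager check: (principal, path) -> readable?
    private final BiPredicate<String, String> canRead;
    // Cached counts, keyed by principal.
    private final Map<String, Map<String, Integer>> cache = new HashMap<>();
    private int computations = 0;

    CachedFacetCounts(List<Doc> docs, BiPredicate<String, String> canRead) {
        this.docs = new ArrayList<>(docs);
        this.canRead = canRead;
    }

    /** Facet counts over only those documents the given principal may read. */
    Map<String, Integer> colorCounts(String principal) {
        Map<String, Integer> cached = cache.get(principal);
        if (cached != null) {
            return cached;
        }
        computations++;
        Map<String, Integer> counts = new TreeMap<>();
        for (Doc doc : docs) {
            if (canRead.test(principal, doc.path)) {
                counts.merge(doc.color, 1, Integer::sum);
            }
        }
        cache.put(principal, counts);
        return counts;
    }

    /** Must be called whenever content or access control changes. */
    void invalidate() {
        cache.clear();
    }

    /** Number of times counts were actually computed rather than served from cache. */
    int computations() {
        return computations;
    }

    public static void main(String[] args) {
        CachedFacetCounts counts = new CachedFacetCounts(
                Arrays.asList(
                        new Doc("/public/a", "red"),
                        new Doc("/public/b", "red"),
                        new Doc("/private/c", "blue")),
                (principal, path) -> principal.equals("admin") || path.startsWith("/public/"));
        System.out.println(counts.colorCounts("anonymous"));
        System.out.println(counts.colorCounts("admin"));
    }
}
```

Keying the cache by principal is the simplest choice but scales poorly with many users; on a mostly public site, keying by a shared group such as "everyone" would presumably give a much better hit rate.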