[RESULT] [VOTE] Release Apache Jackrabbit Oak 1.0.5

2014-08-29 Thread Thomas Mueller
Hi,

On 26/08/14 08:42, Thomas Mueller wrote:

 Please vote on releasing this package as Apache Jackrabbit Oak 1.0.5.
 The vote is open for the next 72 hours and passes if a majority of at
 least three +1 Jackrabbit PMC votes are cast.


The vote passes as follows:

+1 Thomas Mueller
+1 Michael Dürig
+1 Alex Parvulescu
+1 Tommaso Teofili
+1 Davide Giannella
+1 Julian Reschke

Thanks for voting! I'll push the release out.

Regards,
Thomas




Re: [DISCUSS] supporting faceting in Oak query engine

2014-08-29 Thread Ard Schrijvers
Hello,

On Mon, Aug 25, 2014 at 7:02 PM, Lukas Smith sm...@pooteeweet.org wrote:
 Aloha,

 you should definitely talk to the HippoCMS developers. They forked Jackrabbit
 2.x to add faceting as virtual nodes. They ran into some performance issues
 but I am sure they still have valuable feedback on this.

Well, performance actually wasn't the biggest hurdle: exposing and
integrating virtual nodes was quite a bit tougher.

Indeed, I think I have quite some feedback, but honestly, these days I
am also full of doubts about what the best approach would be. I'll
try to keep it short:

1) When exposing faceting from Jackrabbit, we wouldn't use virtual
layers any more to expose them over pure JCR spec APIs. Instead, we
would extend the JCR QueryResult so that, next to getRows/getNodes/etc.,
it also exposes methods such as

public Map<String, Integer> getFacetValues(final String facet) {
    return result.getFacetValues(facet);
}

public QueryResult drilldown(final FacetValue facetValue) {
    // return current query result drilled down for facet value
    return ...
}
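
A minimal sketch of how such an extended result could behave, assuming an in-memory list of rows instead of a real JCR QueryResult. The names FacetedResult and FacetValue are hypothetical and not part of the JCR, Jackrabbit, or Oak APIs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a query result that also exposes facets. */
final class FacetedResult {

    /** A facet name plus one of its values, e.g. color=red (hypothetical type). */
    static final class FacetValue {
        final String facet;
        final String value;

        FacetValue(String facet, String value) {
            this.facet = facet;
            this.value = value;
        }
    }

    // Each row maps property name -> value; a real result would hold JCR rows.
    private final List<Map<String, String>> rows;

    FacetedResult(List<Map<String, String>> rows) {
        this.rows = new ArrayList<>(rows);
    }

    int size() {
        return rows.size();
    }

    /** Counts how often each value of the given facet occurs in this result. */
    Map<String, Integer> getFacetValues(String facet) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> row : rows) {
            String value = row.get(facet);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns a new result restricted to the rows matching the facet value. */
    FacetedResult drilldown(FacetValue facetValue) {
        List<Map<String, String>> matching = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (facetValue.value.equals(row.get(facetValue.facet))) {
                matching.add(row);
            }
        }
        return new FacetedResult(matching);
    }
}
```

Because drilldown returns another result of the same type, a client can chain drilldowns and ask for fresh facet counts at every step.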

2) Authorized counts: for faceting, it doesn't make sense to report
that there are 314 results if you can only read 54 of them. Accounting
for authorization through the access manager can be way too slow. The
alternatives are to not show authorized counts, or to try to translate
the authorization model into a Lucene query, which is in general not
possible unless you restrict your authorization model severely (which
results in a domain-specific solution unusable for JR).
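
To illustrate why authorized counts are expensive, here is a rough sketch that filters every hit through a read check before counting. The Hit class and the canRead predicate are assumptions made for illustration; the predicate merely stands in for an access manager call and is not an actual Jackrabbit or Oak API:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

final class AuthorizedFacetCounts {

    /** A search hit: repository path plus the facet value indexed for it. */
    static final class Hit {
        final String path;
        final String facetValue;

        Hit(String path, String facetValue) {
            this.path = path;
            this.facetValue = facetValue;
        }
    }

    /**
     * Counts facet values, skipping hits the caller may not read.
     * canRead runs once per hit, so the cost grows with the size of the
     * unfiltered result set rather than with the number of readable hits.
     */
    static Map<String, Integer> count(List<Hit> hits, Predicate<String> canRead) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Hit hit : hits) {
            if (canRead.test(hit.path)) {
                counts.merge(hit.facetValue, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Passing a predicate that always returns true yields the raw index counts, which makes the gap between unauthorized and authorized counts easy to compare.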

3) If you support faceting through Oak, will that be competitive
enough with what Solr and Elasticsearch offer? Customers these days have
some expectations on search result quality and faceting capabilities,
performance included. Oak's faceting support will be compared to
dedicated search servers and is quite unlikely to be nearly as good
or to keep up with what is being built: aggregations are the new buzz,
a very cool superset of faceting. You really don't want to have
to provide that next from Oak.

So, my take would be to invest time in easy integration with
Solr/Elasticsearch and to focus in Oak on the parts (hierarchy,
authorization, merging, versioning) that aren't covered by already
existing frameworks. Perhaps provide an extended JCR API as described
in (1), which under the hood can delegate to a Solr or ES Java client.
In the end, you'll still end up having the authorized counts issue,
but if you make the integration pluggable enough, it might be possible
to leverage domain-specific solutions to this (Solr/ES don't do
anything with authorization either; it is a tough nut to crack).

Regards Ard


 regards,
 Lukas Kahwe Smith

 On 25 Aug 2014, at 18:43, Laurie Byrum lby...@adobe.com wrote:

 Hi Tommaso,
 I am happy to see this thread!

 Questions:
 Do you expect to want to support hierarchical or pivoted facets soonish?
 If so, does that influence this decision?
 Do you know how ACLs will come into play with your facet implementation?
 If so, does that influence this decision? :-)

 Thanks!
 Laurie



 On 8/25/14 7:08 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote:

 Hi all,

 since this has been asked every now and then [1] and since I think it's a
 pretty useful and common feature for search engines nowadays, I'd like to
 discuss the introduction of facets [2] for the Oak query engine.

 Pros: having facets in search results usually helps filtering (drill down)
 the results before browsing all of them, so the main usage would be for
 client code.

 Impact: probably change / addition in both the JCR and Oak APIs to support
 returning other than just nodes (a NodeIterator and a Cursor
 respectively).

 Right now a couple of ideas on how we could do that come to my mind, both
 based on the approach of having an Oak index for them:
 1. a (multivalued) property index for facets, meaning we would store the
 facets in the repository, so that we would run a query against it to have
 the facets of an originating query.
 2. a dedicated QueryIndex implementation, eventually leveraging Lucene
 faceting capabilities, which could use the Lucene index we already have,
 together with a sidecar index [3].

 What do you think?
 Regards,
 Tommaso

 [1] :
 http://markmail.org/search/?q=oak%20faceting#query:oak%20faceting%20list%3Aorg.apache.jackrabbit.oak-dev+page:1+state:facets
 [2] : http://en.wikipedia.org/wiki/Faceted_search
 [3] :
 http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html




-- 
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com


Re: oak-run public distribution

2014-08-29 Thread Jukka Zitting
Hi,

On Thu, Aug 28, 2014 at 8:42 PM, Michael Dürig mdue...@apache.org wrote:
 So far we didn't deploy oak-run. I'm not sure why, but I think there were
 concerns regarding making developer tooling available to end users.

During 0.x we figured that anyone working with Oak should be able to
build the jar directly from sources, so making the pre-built binary
available as a download wasn't too important. Now with 1.x I think it
would make sense to include the oak-run jar as a download just like we
do with jackrabbit-standalone.

2014-08-28 11:53 GMT-04:00 Chetan Mehrotra chetan.mehro...@gmail.com:
 This was discussed earlier [1] and Jukka mentioned that there were
 some restrictions on deployment size. I tried pushing a snapshot
 version some time back and that got deployed fine. So I think we should
 try to deploy artifacts again.

That's a bit orthogonal, as the deployment question is about making
oak-run available on the central Maven repository, which we can do
regardless of whether we also post the jar on the Jackrabbit downloads
page.

BR,

Jukka Zitting


Re: [DISCUSS] supporting faceting in Oak query engine

2014-08-29 Thread Alexander Klimetschek
On 29.08.2014, at 03:10, Ard Schrijvers a.schrijv...@onehippo.com wrote:

 1) When exposing faceting from Jackrabbit, we wouldn't use virtual
 layers any more to expose them over pure JCR spec APIs. Instead, we
 would extend the JCR QueryResult so that, next to getRows/getNodes/etc.,
 it also exposes methods such as
 
 public Map<String, Integer> getFacetValues(final String facet) {
     return result.getFacetValues(facet);
 }
 
 public QueryResult drilldown(final FacetValue facetValue) {
     // return current query result drilled down for facet value
     return ...
 }

We actually have a similar API in our CQ/AEM product:

Query = represents a query [1]
SearchResult result = query.getResult();
Map<String, Facet> facets = result.getFacets();

A facet is a list of Buckets [2] - same as FacetValue above, I assume - an
abstraction over different values. You could have distinct values (e.g.
red, green, blue), but also ranges (last year, last month, etc.). Each
bucket has a count, i.e. the number of times it occurs in the current result.

Then on Query you have a method

Query refine(Bucket bucket)

which is the same as the drilldown above.

So in the end it looks pretty much the same, and seems to be a good way to 
represent this as API. Doesn't say much about the implementation yet, though :)
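
As a rough illustration of the bucket idea (distinct values and ranges behind one abstraction, plus a refine step), here is a self-contained sketch. It is not the CQ/AEM API: BucketSketch, Bucket, value, range, count, and refine are hypothetical names chosen for this example only:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class BucketSketch {

    /** A labeled group of values: either one distinct value or a range. */
    static final class Bucket<T> {
        final String label;
        final Predicate<T> matches;

        Bucket(String label, Predicate<T> matches) {
            this.label = label;
            this.matches = matches;
        }
    }

    /** Bucket matching exactly one distinct value, e.g. "red". */
    static <T> Bucket<T> value(T v) {
        return new Bucket<>(String.valueOf(v), v::equals);
    }

    /** Bucket matching a half-open integer range [fromInclusive, toExclusive). */
    static Bucket<Integer> range(String label, int fromInclusive, int toExclusive) {
        return new Bucket<>(label, v -> v >= fromInclusive && v < toExclusive);
    }

    /** Counts matches per bucket, keeping the bucket order and zero counts. */
    static <T> Map<String, Integer> count(List<T> values, List<Bucket<T>> buckets) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Bucket<T> bucket : buckets) {
            int n = 0;
            for (T v : values) {
                if (bucket.matches.test(v)) {
                    n++;
                }
            }
            counts.put(bucket.label, n);
        }
        return counts;
    }

    /** Narrows the values to those in the bucket, analogous to a refine/drilldown. */
    static <T> List<T> refine(List<T> values, Bucket<T> bucket) {
        List<T> result = new ArrayList<>();
        for (T v : values) {
            if (bucket.matches.test(v)) {
                result.add(v);
            }
        }
        return result;
    }
}
```

Modeling a bucket as a label plus a predicate lets distinct values and ranges share the same counting and refining code.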

 2) Authorized counts: for faceting, it doesn't make sense to report
 that there are 314 results if you can only read 54 of them. Accounting
 for authorization through the access manager can be way too slow.
 ...
 3) If you support faceting through Oak, will that be competitive
 enough with what Solr and Elasticsearch offer? Customers these days have
 some expectations on search result quality and faceting capabilities,
 performance included.
 ...
 So, my take would be to invest time in easy integration with
 Solr/Elasticsearch and to focus in Oak on the parts (hierarchy,
 authorization, merging, versioning) that aren't covered by already
 existing frameworks. Perhaps provide an extended JCR API as described
 in (1), which under the hood can delegate to a Solr or ES Java client.
 In the end, you'll still end up having the authorized counts issue,
 but if you make the integration pluggable enough, it might be possible
 to leverage domain-specific solutions to this (Solr/ES don't do
 anything with authorization either; it is a tough nut to crack).

Good points. When facets are used, the worst case (showing facets for all your 
content) might actually be the very first thing you see, when something like a 
product search/browse page is shown, before any actual search by the user is 
done. Optimizing for performance right from the start is a must, I agree.

What I can imagine, though, is leveraging some kind of caching.
In practice, if you have a public site with content that does not change
constantly, the facet values are pretty much stable, and authorization
shouldn't cost much.
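
One way to picture such caching is the sketch below, which assumes that facet counts can be keyed by a string (for example facet name plus the user's group set) and that something notifies the cache when content or ACLs change. FacetCountCache and its loader function are hypothetical and only illustrate the idea:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache for facet counts that are expensive to compute. */
final class FacetCountCache {

    private final Map<String, Map<String, Integer>> cache = new ConcurrentHashMap<>();

    // Computes the (authorized) counts for a key; stands in for the real query.
    private final Function<String, Map<String, Integer>> loader;

    FacetCountCache(Function<String, Map<String, Integer>> loader) {
        this.loader = loader;
    }

    /** Returns the cached counts for the key, computing them on first use. */
    Map<String, Integer> get(String key) {
        return cache.computeIfAbsent(key, loader);
    }

    /** Drops all entries; intended to be called when content or ACLs change. */
    void invalidateAll() {
        cache.clear();
    }
}
```

With mostly static public content, the expensive authorized count would then run once per key between content changes instead of once per page view.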

[1] 
http://docs.adobe.com/docs/en/aem/6-0/develop/ref/javadoc/com/day/cq/search/Query.html
[2] 
http://docs.adobe.com/docs/en/aem/6-0/develop/ref/javadoc/com/day/cq/search/facets/Bucket.html

Cheers,
Alex