Hello,

On Mon, Aug 25, 2014 at 7:02 PM, Lukas Smith <[email protected]> wrote:
> Aloha,
>
> you should definitely talk to the HippoCMS developers. They forked Jackrabbit
> 2.x to add faceting as virtual nodes. They ran into some performance issues
> but I am sure they still have valuable feedback on this.

Well, performance actually wasn't the biggest hurdle: exposing and
integrating virtual nodes was quite a bit tougher.

Indeed, I think I have quite some feedback, but honestly, these days I
am also full of doubts about what the best approach would be. I'll
try to keep it short:

1) When exposing faceting from Jackrabbit, we would no longer use
virtual layers to expose facets over the pure JCR spec APIs. Instead, we
would extend the JCR QueryResult so that, next to getRows/getNodes/etc.,
it also exposes methods such as:

public Map<String, Integer> getFacetValues(final String facet) {
    return result.getFacetValues(facet);
}

public QueryResult drilldown(final FacetValue facetValue) {
    // return current query result drilled down for facet value
    return ...
}
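
To make the idea in (1) a bit more concrete, here is a minimal in-memory
sketch. It is only an illustration, not an existing API: the names
FacetedResultSketch, FacetedQueryResult, FacetValue and row are
hypothetical, and a row is simply a map of property names to values
instead of a real JCR Row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these types are NOT part of JCR, Jackrabbit or Oak.
final class FacetedResultSketch {

    /** One facet value to drill down on, e.g. facet "color", value "red". */
    static final class FacetValue {
        final String facet;
        final String value;

        FacetValue(String facet, String value) {
            this.facet = facet;
            this.value = value;
        }
    }

    /** Stand-in for an extended QueryResult; a row is a property map. */
    static final class FacetedQueryResult {
        private final List<Map<String, String>> rows;

        FacetedQueryResult(List<Map<String, String>> rows) {
            this.rows = new ArrayList<>(rows);
        }

        /** Number of rows in this result. */
        int size() {
            return rows.size();
        }

        /** Counts how often each value of the facet occurs in this result. */
        Map<String, Integer> getFacetValues(String facet) {
            Map<String, Integer> counts = new LinkedHashMap<>();
            for (Map<String, String> row : rows) {
                String value = row.get(facet);
                if (value != null) {
                    counts.merge(value, 1, Integer::sum);
                }
            }
            return counts;
        }

        /** Returns a new result holding only the rows with the facet value. */
        FacetedQueryResult drilldown(FacetValue facetValue) {
            List<Map<String, String>> matching = new ArrayList<>();
            for (Map<String, String> row : rows) {
                if (facetValue.value.equals(row.get(facetValue.facet))) {
                    matching.add(row);
                }
            }
            return new FacetedQueryResult(matching);
        }
    }

    /** Builds a row from alternating property names and values. */
    static Map<String, String> row(String... namesAndValues) {
        Map<String, String> row = new HashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            row.put(namesAndValues[i], namesAndValues[i + 1]);
        }
        return row;
    }
}
```

Because drilldown returns another result of the same type, client code
can chain drill-downs and ask for fresh facet counts at every step.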

2) Authorized counts: for faceting, it doesn't make sense to report
that there are 314 results if you can only read 54 of them. Accounting
for authorization through the access manager can be way too slow. The
alternatives are to not show authorized counts, or to try to translate
the authorization model into a Lucene query, which is in general not
possible unless you restrict your authorization model severely (which
results in a domain-specific solution unusable for JR).
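
A rough sketch of why authorized counts are expensive, under the
assumption that authorization is a per-node check: every hit has to
pass the check before it may be counted. AuthorizedCountSketch, Hit,
countFacet and the canRead predicate are hypothetical names; canRead
only stands in for a real access manager call.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical sketch: canRead stands in for a per-node access manager check.
final class AuthorizedCountSketch {

    /** A search hit: the node path and its value for the facet. */
    static final class Hit {
        final String path;
        final String facetValue;

        Hit(String path, String facetValue) {
            this.path = path;
            this.facetValue = facetValue;
        }
    }

    /**
     * Counts facet values over the hits the caller may read.
     * There is one canRead call per hit, which is the part that gets slow.
     */
    static Map<String, Integer> countFacet(List<Hit> hits, Predicate<String> canRead) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Hit hit : hits) {
            if (canRead.test(hit.path)) {
                counts.merge(hit.facetValue, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Passing a predicate that always returns true gives the unauthorized
counts an index can produce directly; any real predicate costs one
check per hit, so the work grows with the size of the full result set
rather than with the number of facet values shown.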

3) If you support faceting through Oak, will it be competitive
enough with what Solr and Elasticsearch offer? Customers these days have
some expectations on search result quality and faceting capabilities,
performance included. Oak's faceting support will be compared to
dedicated search servers and is quite unlikely to be nearly as good
or to keep up with what is being built: aggregations are the new buzz,
a very cool superset of faceting. You really don't want to have
to provide that from Oak next.

So, my take would be to invest time in easy integration with
Solr/Elasticsearch and to focus in Oak on the parts (hierarchy,
authorization, merging, versioning) that aren't covered by already
existing frameworks. Perhaps provide an extended JCR API as described
in (1), which under the hood can delegate to a Solr or ES Java client.
In the end, you'll still have the authorized counts issue,
but if you make the integration pluggable enough, it might be possible
to plug in domain-specific solutions to this (Solr/ES don't do
anything with authorization either; it is a tough nut to crack).
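
One way such a pluggable integration could look, purely as a sketch:
the facet counts come from a backend interface that a Solr or ES
client could implement, and an optional plug-in gets the chance to
adjust the raw counts per user. All names here (PluggableFacetSketch,
FacetBackend, CountAuthorizer, FacetService) are hypothetical, and no
real Solr, Elasticsearch or Oak API is used.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: no real Solr, Elasticsearch or Oak API is used here.
final class PluggableFacetSketch {

    /** Implemented once per search server, e.g. on top of a Solr or ES client. */
    interface FacetBackend {
        Map<String, Long> facetCounts(String query, String facetField);
    }

    /** Optional plug-in for a domain-specific fix-up of the raw counts. */
    interface CountAuthorizer {
        Map<String, Long> authorize(String userId, Map<String, Long> rawCounts);
    }

    /** What the extended API would call under the hood. */
    static final class FacetService {
        private final FacetBackend backend;
        private final CountAuthorizer authorizer;

        FacetService(FacetBackend backend, CountAuthorizer authorizer) {
            this.backend = backend;
            this.authorizer = authorizer;
        }

        /** Fetches raw counts from the backend, then lets the plug-in adjust them. */
        Map<String, Long> facets(String userId, String query, String facetField) {
            Map<String, Long> raw = backend.facetCounts(query, facetField);
            // Hand the plug-in a private copy so it cannot alter backend state.
            return authorizer.authorize(userId, new TreeMap<>(raw));
        }
    }
}
```

A deployment without a usable authorization shortcut would plug in an
authorizer that returns the raw counts unchanged, or one that hides
the counts altogether.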

Regards,
Ard

>
> regards,
> Lukas Kahwe Smith
>
>> On 25 Aug 2014, at 18:43, Laurie Byrum <[email protected]> wrote:
>>
>> Hi Tommaso,
>> I am happy to see this thread!
>>
>> Questions:
>> Do you expect to want to support hierarchical or pivoted facets soonish?
>> If so, does that influence this decision?
>> Do you know how ACLs will come into play with your facet implementation?
>> If so, does that influence this decision? :-)
>>
>> Thanks!
>> Laurie
>>
>>
>>
>>> On 8/25/14 7:08 AM, "Tommaso Teofili" <[email protected]> wrote:
>>>
>>> Hi all,
>>>
>>> since this has been asked every now and then [1] and since I think it's a
>>> pretty useful and common feature for search engines nowadays, I'd like to
>>> discuss the introduction of facets [2] for the Oak query engine.
>>>
>>> Pros: having facets in search results usually helps filtering (drill down)
>>> the results before browsing all of them, so the main usage would be for
>>> client code.
>>>
>>> Impact: probably changes / additions in both the JCR and Oak APIs to support
>>> returning something other than "just nodes" (a NodeIterator and a Cursor
>>> respectively).
>>>
>>> Right now a couple of ideas on how we could do that come to my mind, both
>>> based on the approach of having an Oak index for them:
>>> 1. a (multivalued) property index for facets, meaning we would store the
>>> facets in the repository, so that we would run a query against it to have
>>> the facets of an originating query.
>>> 2. a dedicated QueryIndex implementation, possibly leveraging Lucene
>>> faceting capabilities, which could "use" the Lucene index we already have,
>>> together with a "sidecar" index [3].
>>>
>>> What do you think?
>>> Regards,
>>> Tommaso
>>>
>>> [1] :
>>> http://markmail.org/search/?q=oak%20faceting#query:oak%20faceting%20list%3Aorg.apache.jackrabbit.oak-dev+page:1+state:facets
>>> [2] : http://en.wikipedia.org/wiki/Faceted_search
>>> [3] :
>>> http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html
>>



-- 
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com
