> On 30 Jan 2015, at 10:44, Ard Schrijvers <a.schrijv...@onehippo.com> wrote:
>
> On Fri, Jan 30, 2015 at 10:03 AM, cfalletta <cedric.falle...@gmail.com> wrote:
>> Hello Thomas,
>>
>> Thanks for your answer.
>>
>> I'm using version 2.6.5 of Jackrabbit.
>>
>> We're loading 300.000+ documents in production and it takes 3-5 minutes to
>> load them all. Two queries are run: the select * with a limit, and the
>> select * without a limit. I'll attach the source file source_jackrabbit.txt
>> <http://jackrabbit.510166.n4.nabble.com/file/n4661929/source_jackrabbit.txt>
>>
>> In the development environment, I set Jackrabbit's logging to DEBUG, and it
>> appeared that the first query was taking a lot of time. However, setting the
>> logging level to DEBUG seriously decreased the overall performance. I'll run
>> another test without the count and without debug mode on a large set of
>> documents to be sure, thanks for the advice.
>>
>> By the way, I've heard of another implementation of QueryResult that would
>> return the totalSize of the query without "limit":
>> org.apache.jackrabbit.core.query.lucene.QueryResultImpl. But
>> org.apache.jackrabbit.core.query.lucene.QueryResult only works with
>> SingleColumnQueryResult.
>> -> Any idea how to use QueryResultImpl, and whether it is a viable solution?
>>
>> Is Jackrabbit able to properly handle queries on millions of documents as
>> long as we have a limit in the query?
>
> In general, yes.
>
> A bit more detail: the problem is not really the query itself (most of the
> time), but the authorization of the results. If you set a limit, say of 100,
> then the authorization part can stop after the query has granted read access
> to 100 nodes. A limit will still result in bad performance if your user has
> read access to only, say, 0.1% of nodes, because then, on average, 100.000
> nodes must be checked to produce 100 granted results.
>
> The performance also depends on your bundle caches: if all nodes are in
> memory, checking 100.000 nodes won't be blistering fast, but not really slow
> either. If you overrun your caches, so that nodes have to be fetched from the
> backing database, performance will drop dramatically.
>
> Please realize that if you want to compare Jackrabbit searches with something
> like Solr or Elasticsearch, a fair comparison would be to check every result
> from Solr or Elasticsearch separately for read access against some external
> system, for example. There is a reason that Solr and ES hardly do anything
> for fine-grained ACL-aware indexing: it is a really complex problem.
>
> Hope this helps.
>
> Last thing: some queries, mainly queries with hierarchical constraints, do
> not perform well for millions of nodes. Again, something that is hard to
> achieve with Lucene.
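Ard's arithmetic can be made concrete with a small standalone sketch. This is not Jackrabbit API, just an illustrative model: it assumes a deterministic grant pattern in which every N-th node is readable (a grant rate of 1/N) and counts how many candidates must be ACL-checked before the limit is reached.

```java
// Illustrative model of the cost argument above, not Jackrabbit's actual
// access manager. Assumes every `grantEvery`-th candidate node is readable.
public class AclScanCost {

    // Number of candidate nodes that must be ACL-checked before the
    // result set reaches `wanted` readable nodes.
    static long checksNeeded(int wanted, int grantEvery) {
        long checked = 0;
        int granted = 0;
        while (granted < wanted) {
            checked++;
            if (checked % grantEvery == 0) { // this candidate is readable
                granted++;
            }
        }
        return checked;
    }

    public static void main(String[] args) {
        // 0.1% read access (1 in 1000) with a limit of 100:
        System.out.println(checksNeeded(100, 1000)); // prints 100000
    }
}
```

With full read access (`grantEvery = 1`) the scan stops exactly at the limit, which is why a limit helps so much in the common case and so little for a heavily restricted user.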
We also found that for some queries SQL1 performed much better than SQL2:
http://blog.liip.ch/archive/2012/06/26/jackrabbit-and-its-two-sql-languages-some-findings.html

regards,
Lukas Kahwe Smith
sm...@pooteeweet.org
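For readers unfamiliar with the two dialects: JCR-SQL (SQL1, deprecated since JCR 2.0) and JCR-SQL2 express the same constraint quite differently, which is one reason their query plans can diverge. A hypothetical example (the node type and path are assumptions for illustration, not taken from the linked post):

```sql
-- JCR-SQL (SQL1): hierarchy expressed via jcr:path LIKE
SELECT * FROM nt:unstructured WHERE jcr:path LIKE '/content/docs/%'

-- JCR-SQL2: hierarchy expressed via ISDESCENDANTNODE
SELECT * FROM [nt:unstructured] AS n WHERE ISDESCENDANTNODE(n, '/content/docs')
```

Both are the kind of hierarchical constraint Ard warns about above; whichever dialect you use, such path restrictions are hard to index efficiently in Lucene.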