> On 30 Jan 2015, at 10:44, Ard Schrijvers <a.schrijv...@onehippo.com> wrote:
>
> On Fri, Jan 30, 2015 at 10:03 AM, cfalletta <cedric.falle...@gmail.com> wrote:
>> Hello Thomas,
>>
>> Thanks for your answer.
>>
>> I'm using version 2.6.5 of Jackrabbit.
>>
>> We're loading 300.000+ documents in production and it takes 3-5 minutes to
>> load them all. Two queries are run: the select * with a limit, and the
>> select * without a limit. I'll attach the source file source_jackrabbit.txt
>> <http://jackrabbit.510166.n4.nabble.com/file/n4661929/source_jackrabbit.txt>
>>
>> In the development environment, I set Jackrabbit's logging to DEBUG, and it
>> appeared that the first query was taking a lot of time. However, setting the
>> logging level to DEBUG seriously decreased the overall performance. I'll run
>> another test without the count and without debug mode on a large set of
>> documents to be sure, thanks for the advice.
>>
>> By the way, I've heard of another implementation of QueryResult that would
>> return the totalSize of the query without "limit":
>> org.apache.jackrabbit.core.query.lucene.QueryResultImpl. But
>> org.apache.jackrabbit.core.query.lucene.QueryResult only works with
>> SingleColumnQueryResult.
>> -> Any idea how to use QueryResultImpl, and whether it is a viable solution?
>>
>> Is Jackrabbit able to properly handle queries on millions of documents as
>> long as we have a limit in the query?
>
> In general, yes.
>
> A bit more detail: the problem is not really the query itself (most of the
> time), but the authorization of the results. If you set a limit, say of 100,
> then the authorization part can stop after the query has granted read access
> to 100 nodes. A limit will still result in bad performance if your user has
> read access to only, say, 0.1% of nodes, because then, on average, 100.000
> nodes must be checked to produce 100 granted results.
>
> The performance also depends on your bundle caches: if all nodes are in
> memory, checking 100.000 nodes won't be blistering fast, but not really slow
> either. If you overrun your caches, so that nodes have to be fetched from the
> backing database, performance will drop dramatically.
>
> Please realize that if you want to compare Jackrabbit searches with something
> like Solr or Elasticsearch, a fair comparison would be to check every result
> from Solr or Elasticsearch separately for read access against some external
> system, for example. There is a reason that Solr and ES hardly do anything
> for fine-grained ACL-aware indexing: it is a really complex problem.
>
> Hope this helps.
>
> Last thing: some queries, mainly queries with hierarchical constraints, do
> not perform well for millions of nodes. Again, something that is hard to
> achieve with Lucene.
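Ard's arithmetic can be made concrete with a small standalone sketch. This is not Jackrabbit API, just an illustrative model: it assumes a deterministic grant pattern in which every N-th node is readable (a grant rate of 1/N) and counts how many candidates must be ACL-checked before the limit is reached.

```java
// Illustrative model of the cost argument above, not Jackrabbit's actual
// access manager. Assumes every `grantEvery`-th candidate node is readable.
public class AclScanCost {

    // Number of candidate nodes that must be ACL-checked before the
    // result set reaches `wanted` readable nodes.
    static long checksNeeded(int wanted, int grantEvery) {
        long checked = 0;
        int granted = 0;
        while (granted < wanted) {
            checked++;
            if (checked % grantEvery == 0) { // this candidate is readable
                granted++;
            }
        }
        return checked;
    }

    public static void main(String[] args) {
        // 0.1% read access (1 in 1000) with a limit of 100:
        System.out.println(checksNeeded(100, 1000)); // prints 100000
    }
}
```

With full read access (`grantEvery = 1`) the scan stops exactly at the limit, which is why a limit helps so much in the common case and so little for a heavily restricted user.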
We also found that for some queries SQL1 performed much better than SQL2:
http://blog.liip.ch/archive/2012/06/26/jackrabbit-and-its-two-sql-languages-some-findings.html

regards,
Lukas Kahwe Smith
sm...@pooteeweet.org
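For readers unfamiliar with the two dialects: JCR-SQL (SQL1, deprecated since JCR 2.0) and JCR-SQL2 express the same constraint quite differently, which is one reason their query plans can diverge. A hypothetical example (the node type and path are assumptions for illustration, not taken from the linked post):

```sql
-- JCR-SQL (SQL1): hierarchy expressed via jcr:path LIKE
SELECT * FROM nt:unstructured WHERE jcr:path LIKE '/content/docs/%'

-- JCR-SQL2: hierarchy expressed via ISDESCENDANTNODE
SELECT * FROM [nt:unstructured] AS n WHERE ISDESCENDANTNODE(n, '/content/docs')
```

Both are the kind of hierarchical constraint Ard warns about above; whichever dialect you use, such path restrictions are hard to index efficiently in Lucene.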