Re: Query that sorts a large result set.

Ian Boston Wed, 17 Jun 2009 08:58:45 -0700


On 17 Jun 2009, at 09:13, Marcel Reutegger wrote:

Hi,

the sorting is pretty well optimized, it basically uses underlying
lucene functionality for that. there are two other important points
that will influence performance:

1) workspace configuration

the default workspace configuration will cause initial fetching of the
entire result set. you can change this behavior by setting the
resultFetchSize parameter. See [0].

Yes, we already have this in place; it's made a huge difference, several orders of magnitude.
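
For readers following along, here is a rough sketch of why limiting the initial fetch helps when only the first page of a large result is read. It is a simulation only: LazyResultSketch, fetchSize and fetched are hypothetical names, not Jackrabbit classes, and the fixed batch size is an assumption rather than a description of how Jackrabbit grows its fetch window.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; none of these names are Jackrabbit classes.
class LazyResultSketch implements Iterator<Integer> {
    private final int total;      // size of the full (simulated) result set
    private final int fetchSize;  // plays the role of resultFetchSize (must be > 0)
    private final List<Integer> buffer = new ArrayList<>();
    private int next = 0;         // index of the next hit to hand out
    int fetched = 0;              // how many hits have actually been loaded

    LazyResultSketch(int total, int fetchSize) {
        this.total = total;
        this.fetchSize = fetchSize;
    }

    // Load the next batch of simulated hits (assumed fixed batch size).
    private void fetchBatch() {
        int end = Math.min(total, fetched + fetchSize);
        for (int i = fetched; i < end; i++) {
            buffer.add(i);
        }
        fetched = end;
    }

    @Override
    public boolean hasNext() {
        return next < total;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        if (next >= fetched) {
            fetchBatch(); // only touch more hits when the caller asks for them
        }
        return buffer.get(next++);
    }

    public static void main(String[] args) {
        LazyResultSketch hits = new LazyResultSketch(1_000_000, 100);
        for (int i = 0; i < 100; i++) {
            hits.next();
        }
        System.out.println("hits loaded: " + hits.fetched); // prints "hits loaded: 100"
    }
}
```

Reading the first 100 of a million simulated hits loads only 100 of them; an unbounded fetch size would have loaded all of them before the first one was returned.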


2) Ian wrote: "I only want to see a small number of items eg 100 after
a particular date."

that might actually become a problem. it will result in a range query
that potentially selects lots (millions?) of nodes with distinct date
properties. this case is not optimized. there's a new indexing
technique in lucene called trierange queries [1] which was
specifically built to perform such queries efficiently. but this is
not yet integrated with jackrabbit.
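
The idea behind trie range queries can be sketched as follows: each value is also indexed at reduced precisions, so a wide range can be matched by a few coarse prefix terms in the middle plus a few full-precision terms at the edges, instead of one term per distinct value. The code below is a simplified illustration under stated assumptions, not Lucene's implementation: TrieRangeSketch, cover, matches, STEP and MAX_SHIFT are hypothetical names, and it only handles non-negative longs with lo <= hi.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of the trie-range idea; not Lucene's implementation.
class TrieRangeSketch {
    static final int STEP = 4;       // assumed: bits dropped per precision level
    static final int MAX_SHIFT = 60; // assumed: coarsest precision level

    // Cover [lo, hi] (requires 0 <= lo <= hi) with prefix terms,
    // each stored as {shift, value >>> shift}.
    static List<long[]> cover(long lo, long hi) {
        List<long[]> terms = new ArrayList<>();
        long pos = lo;
        while (true) {
            int shift = 0;
            // Pick the coarsest block that starts at pos and stays inside the range.
            while (shift + STEP <= MAX_SHIFT) {
                long size = 1L << (shift + STEP);
                boolean aligned = (pos & (size - 1)) == 0;
                boolean fits = hi - pos >= size - 1;
                if (!aligned || !fits) {
                    break;
                }
                shift += STEP;
            }
            terms.add(new long[] {shift, pos >>> shift});
            long size = 1L << shift;
            if (hi - pos == size - 1) {
                break; // this block ends exactly at hi
            }
            pos += size;
        }
        return terms;
    }

    // A value matches if it shares a term's prefix at that term's precision.
    static boolean matches(List<long[]> terms, long value) {
        for (long[] t : terms) {
            if ((value >>> (int) t[0]) == t[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<long[]> terms = cover(5, 1000);
        System.out.println(terms.size() + " terms cover 996 distinct values");
    }
}
```

With timestamps the effect is larger: a range spanning millions of distinct lastModified values collapses to a number of terms bounded by the number of precision levels rather than by the number of values.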

So if I don't query for all items after a certain date, but just ask for a sort and do paging of the sorted result set, will that be optimized by Lucene?


I've created a JIRA issue to discuss and keep track of such an
enhancement in jackrabbit: [2]

Thank you, I will go and do some reading. We use Lucene in so many places outside Jackrabbit that knowing the details of things like this is always valuable.

Thanks
Ian

regards
marcel

[0] http://issues.apache.org/jira/browse/JCR-651
[1] http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
[2] https://issues.apache.org/jira/browse/JCR-2151

On Wed, Jun 17, 2009 at 01:50, Ian Boston wrote:
Hi,
I want to perform a query where the full result set could be millions of
items. That set needs to be sorted by the lastModified attribute on the
node, and I only want to see a small number of items eg 100 after a
particular date.
If I do this, will there be scalability issues, or is the sorting of a date
field optimized in the query engine?

Thanks
Ian
