On 17 Jun 2009, at 09:13, Marcel Reutegger wrote:
Hi,
the sorting is pretty well optimized, it basically uses underlying
lucene functionality for that. there are two other important points
that will influence performance:
1) workspace configuration
the default workspace configuration will cause initial fetching of the
entire result set. you can change this behavior by setting the
resultFetchSize parameter. See [0].
yes, we already have this in place, it's made a huge difference,
several orders of magnitude.
2) Ian wrote: "I only want to see a small number of items eg 100 after
a particular date."
that might actually become a problem. it will result in a range query
that potentially selects lots (millions?) of nodes with distinct date
properties. this case is not optimized. there's a new indexing
technique in lucene called trierange queries [1] which was
specifically built to perform such queries efficiently. but this is
not yet integrated with jackrabbit.
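
To make the trie-range idea from [1] a bit more concrete, here is a
minimal sketch in plain Java with no Lucene or Jackrabbit dependency.
It assumes non-negative long values (for example dates stored as epoch
milliseconds). The names TrieRangeSketch, indexTerms, splitRange,
matches and MAX_SHIFT are made up for illustration and are not Lucene's
API. Each value is indexed at several precisions, and a range is then
covered by a limited number of aligned blocks instead of one term per
distinct date.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical names, not Lucene's or Jackrabbit's API. */
final class TrieRangeSketch {
    /** Coarsest precision kept in the index (an assumption of this sketch). */
    static final int MAX_SHIFT = 48;

    /** Terms stored for one value: {shift, value >>> shift} at every precision. */
    static List<long[]> indexTerms(long value, int step) {
        List<long[]> terms = new ArrayList<>();
        for (int shift = 0; shift <= MAX_SHIFT; shift += step) {
            terms.add(new long[] {shift, value >>> shift});
        }
        return terms;
    }

    /** Covers [lo, hi] exactly with aligned blocks {shift, prefix}, largest blocks first tried. */
    static List<long[]> splitRange(long lo, long hi, int step) {
        if (lo < 0 || hi < lo || hi == Long.MAX_VALUE || step < 1 || step > MAX_SHIFT) {
            throw new IllegalArgumentException("unsupported range or step");
        }
        List<long[]> blocks = new ArrayList<>();
        long cur = lo;
        while (cur <= hi) {
            int shift = 0;
            // Grow the block while it stays aligned at cur and still fits below hi.
            while (shift + step <= MAX_SHIFT) {
                long size = 1L << (shift + step);
                boolean aligned = (cur & (size - 1)) == 0;
                boolean fits = size - 1 <= hi - cur;
                if (!aligned || !fits) {
                    break;
                }
                shift += step;
            }
            blocks.add(new long[] {shift, cur >>> shift});
            cur += 1L << shift;
        }
        return blocks;
    }

    /** A value is in the range if one of its index terms equals one of the block terms. */
    static boolean matches(long value, List<long[]> blocks, int step) {
        for (long[] term : indexTerms(value, step)) {
            for (long[] block : blocks) {
                if (term[0] == block[0] && term[1] == block[1]) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long from = 1245196800000L;                    // epoch millis, 17 Jun 2009 00:00 UTC
        long to = from + 365L * 24 * 60 * 60 * 1000;   // one year later
        List<long[]> blocks = splitRange(from, to, 8);
        System.out.println("values in range:  " + (to - from + 1));
        System.out.println("terms to look up: " + blocks.size());
    }
}
```

The point of the design is that the number of terms to look up depends
on the number of precision levels and the step size, not on how many
distinct dates fall inside the range. A smaller step gives fewer terms
per level at the cost of more terms stored per value.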
So if I don't query for all items after a certain date, but just ask
for a sort and do paging of the sorted result set, will that be
optimized by lucene?
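
As background for the question, one general way to get the first page
of a sorted result without sorting every hit is a bounded heap. The
sketch below is plain Java and only illustrates that technique; it is
not Jackrabbit's or Lucene's actual code, and the names TopNSketch and
newest are hypothetical. It assumes each hit carries its lastModified
value as a long.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only: hypothetical names, not Lucene's or Jackrabbit's API. */
final class TopNSketch {
    /** Returns the n largest timestamps, newest first, keeping at most n in memory. */
    static List<Long> newest(Iterable<Long> timestamps, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the head is the oldest timestamp currently kept.
        PriorityQueue<Long> heap = new PriorityQueue<>(n);
        for (Long t : timestamps) {
            if (heap.size() < n) {
                heap.add(t);
            } else if (t > heap.peek()) {
                heap.poll();   // drop the oldest kept entry
                heap.add(t);   // keep the newer one instead
            }
        }
        List<Long> page = new ArrayList<>(heap);
        page.sort(Comparator.reverseOrder());
        return page;
    }
}
```

With millions of hits and n = 100, every hit is still visited once, but
only 100 entries are ever held and ordered.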
I've created a JIRA issue to discuss and keep track of such an
enhancement in jackrabbit: [2]
Thank you, I will go and do some reading. We use Lucene in so many
places outside jackrabbit that knowing the details of things like this
is always valuable.
Thanks
Ian
regards
marcel
[0] http://issues.apache.org/jira/browse/JCR-651
[1] http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
[2] https://issues.apache.org/jira/browse/JCR-2151
On Wed, Jun 17, 2009 at 01:50, Ian Boston wrote:
Hi,
I want to perform a query where the full result set could be millions
of items. That set needs to be sorted by the lastModified attribute on
the node, and I only want to see a small number of items eg 100 after
a particular date.
If I do this, will there be scalability issues, or is the sorting of a
date field optimized in the query engine ?
Thanks
Ian