Hi all,
I've included some background below, but my question is: "Should using
jcr:path in a query be avoided for performance reasons?"
We have some Jackrabbit repositories which have grown to include circa
200,000 nodes of type acme:Story. The nodes have been structured in a deep
hierarchy to comply with Jackrabbit best practices on the maximum number of
nodes per folder. An example of our JCR structure:
/library/
    news/
    entertainment/
    sport/
        tennis/
            2009/
                01/
        Football/
        ..
When accessing the repository we typically use queries that a) restrict by
path and b) order by a date field, e.g. to get the latest sport items:
SELECT * FROM acme:Story WHERE jcr:path LIKE '/library/sport/%' ORDER BY acme:createdDate DESC
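As an illustration only, here is a minimal Java sketch of building this kind of path-restricted statement for an arbitrary folder. The class and method names (StoryQueries, quote, byPath) are hypothetical and exist only for this example; it does nothing beyond string construction.

```java
// Illustrative sketch only: StoryQueries and its methods are hypothetical names.
final class StoryQueries {
    private StoryQueries() {}

    // Wrap a value in single quotes, doubling any embedded quote so it is
    // safe inside a SQL string literal.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Path-restricted form: matches every acme:Story below the given folder,
    // newest first.
    static String byPath(String folder) {
        String prefix = folder.endsWith("/") ? folder : folder + "/";
        return "SELECT * FROM acme:Story WHERE jcr:path LIKE " + quote(prefix + "%")
                + " ORDER BY acme:createdDate DESC";
    }
}
```

With JCR 1.0 the resulting string would typically be handed to QueryManager.createQuery(statement, Query.SQL) and then executed.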
As the number of stories has increased, we're seeing more and more queries
take longer than 2 seconds. We've also seen Lucene use more and more heap
space while executing these queries. Based on
http://www.nabble.com/Explanation-and-solutions-of-some-Jackrabbit-queries-regarding-performance-td15028655.html
I understand that the above query would be much faster if we added a
tag/category-type property to the node, e.g. acme:category = "Sport", so that
the query could be:
SELECT * FROM acme:Story WHERE acme:category LIKE 'Sport' ORDER BY acme:createdDate DESC
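To make the idea concrete, here is a minimal Java sketch of how such a category value might be derived from a node's path and used in the property-based query. Everything here is an assumption for illustration: the class and method names are hypothetical, the category is taken to be the first folder below /library/ (so it keeps the folder's own case, e.g. "sport" rather than "Sport"), and the query uses = instead of LIKE since no wildcard is involved.

```java
// Illustrative sketch only: StoryCategory and its methods are hypothetical names.
final class StoryCategory {
    private StoryCategory() {}

    // Assumption: the category is the first path segment below /library/.
    // Returns null when the path is not under /library/ or has no segment.
    static String fromPath(String path) {
        String root = "/library/";
        if (!path.startsWith(root)) {
            return null;
        }
        String rest = path.substring(root.length());
        int slash = rest.indexOf('/');
        String segment = slash < 0 ? rest : rest.substring(0, slash);
        return segment.isEmpty() ? null : segment;
    }

    // Property-restricted form of the query, newest first.
    // Embedded single quotes are doubled to keep the literal valid.
    static String byCategory(String category) {
        return "SELECT * FROM acme:Story WHERE acme:category = '"
                + category.replace("'", "''")
                + "' ORDER BY acme:createdDate DESC";
    }
}
```

The derived value would be stored when a story is saved, for example with node.setProperty("acme:category", value), so that existing nodes would need a one-off update before the new query returns them.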
Is that a fair assessment?
All comments appreciated.
Regards,
Shaun