Hi Ard,

Thanks! The performance went up by a factor of 10. That is still not what I hoped for, but I'm not sure the query itself is still the problem.

A related question: could it be that a query that returns no results is slower than one that does return a result? Might it have something to do with Lucene not having an index for that particular property value?

> Hello Dennis,
>
> it's because you're using a child axis query twice (jcr:root and one
> within the where clause) that it is slow. Explaining why is out
> of scope for the user list, but I wrote quite some time ago a few
> guidelines (most of them still valid):
>
> http://n4.nabble.com/Explanation-and-solutions-of-some-Jackrabbit-queries-regarding-performance-td516614.html#a516614
>
> I am not sure what nodetype jcr:content is, but suppose: my:contenttype
>
> now, if your query would be:
>
> //element(*,my:contenttype)[fn:lower-case(@cms:virtualPath)= '" + vpath +
> "']";
>
> the query will be instant. Just take the parent node of the result and
> you should be fine.
>
> Just wondering, are you building a brand new cms
> on jcr? I am not sure what the @cms:virtualPath holds, but if you also
> need virtual environments showing the same jcr nodes in different tree
> structures you might wanna take a look here [1].
>

We're not building a brand new CMS, we're migrating our old Oracle iFS storage to a JCR repository. The CMS itself stays the same.

regards,
Dennis

> Regards Ard
>
> [1] http://www.onehippo.org/cms7
>
>
> On Thu, Dec 3, 2009 at 9:47 AM, Dennis van der Laan
> <[email protected]> wrote:
>
>> Hi,
>>
>> It seems querying on a property is very slow on our system (running
>> Jackrabbit 1.6.0): almost 1 second per query which would normally return
>> 0 or 1 result.
>>
>> We use jcr:content nodes of type nt:unstructured to store the contents
>> of a file (in the jcr:data property) and we store an array of Strings in
>> a property cms:virtualPath on the same node. So basically, every file in
>> our repository has the JCR path and zero or more virtual paths in the
>> cms:virtualPath property. If we want to add a virtual path to a file, we
>> have to check if the virtual path does not exist already. For this, we
>> use an XPath query in the following code snippet:
>>
>> String vpath = QueryUtil.escapeForAttributeSearch(path.toLowerCase());
>> String query =
>>     "/jcr:root//element(*,nt:hierarchyNode)[fn:lower-case(jcr:content/@cms:virtualPath) = '" + vpath + "']";
>> Query q = queryManager.createQuery(query, Query.XPATH);
>> NodeIterator ni = q.execute().getNodes();
>> if (ni.getSize() == 0) {
>>     throw new ItemNotFoundException("Unable to find item by virtual path: " + path);
>> }
>> else if (ni.getSize() > 1) {
>>     throw new IllegalStateException("More than 1 item on virtual path: " + path);
>> }
>> else {
>>     return ni.nextNode();
>> }
>>
>> Our repository now contains around 500,000 virtual paths, more or less
>> divided over 150,000 files which are evenly distributed over more than
>> 1000 (nested) folders.
>>
>> The repository runs on an Intel Nehalem Xeon (2 x 2.5GHz) running
>> Solaris 10 and the repository database (for datastore, filesystem, etc)
>> runs on the same specs, on a different server, running Oracle 10g.
>>
>> When we try to add virtual paths in a batch (about 2000 virtual path
>> properties for 1000 files) and all virtual paths already exist (so the
>> above query returns 1 virtual path), we see a 100% load of our
>> Tomcat application (which means 1 core fully utilized).
>>
>> I would expect a JCR repository to be able to handle this kind of
>> queries. How are these properties indexed? Is it possible to optimize
>> the repository for this kind of queries? Or should I use a different
>> query? The alternative would be to keep a different database which keeps
>> track of the virtual paths, but keeping that in sync with the JCR
>> repository would be a pain, at the least.
>>
>> Thanks for your ideas about this issue,
>> Kind regards,
>> Dennis van der Laan
>>
>>

--
Dennis van der Laan
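
For reference, the query shape Ard suggests could be assembled roughly as in the sketch below. It only builds the XPath statement and does not touch a repository. The node type name `my:contenttype` is the placeholder from Ard's message, and `escapeQuotes` is a hypothetical stand-in for `QueryUtil.escapeForAttributeSearch` (whose implementation is not shown in this thread), assumed here to do nothing more than double single quotes.

```java
import java.util.Locale;

class VirtualPathQuery {

    // Hypothetical stand-in for QueryUtil.escapeForAttributeSearch, which is not
    // shown in the thread. Assumption: it only needs to double single quotes so
    // the value can sit inside an XPath string literal.
    static String escapeQuotes(String value) {
        return value.replace("'", "''");
    }

    // Builds the single-axis query from Ard's reply: the predicate tests
    // cms:virtualPath on the matched node itself, with no child axis step
    // inside the where clause and no /jcr:root prefix.
    static String buildQuery(String nodeType, String path) {
        String vpath = escapeQuotes(path.toLowerCase(Locale.ROOT));
        return "//element(*," + nodeType + ")[fn:lower-case(@cms:virtualPath) = '"
                + vpath + "']";
    }

    public static void main(String[] args) {
        // "my:contenttype" is a placeholder node type name, not a real type.
        System.out.println(buildQuery("my:contenttype", "/Docs/Report.pdf"));
    }
}
```

With a query of this shape the result nodes would be the jcr:content nodes themselves, so the file node would be obtained by taking the parent of the result, as Ard describes.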
