OK. I tried

//*[@calais='http://d.opencalais.com/er/company/ralg-tr1r/2a2094a9-9a0b-3fb6-bb51-0c9e940deaaf']

instead of the original

//companies/company[@calais='http://d.opencalais.com/er/company/ralg-tr1r/2a2094a9-9a0b-3fb6-bb51-0c9e940deaaf']

and now query.execute + nodeIterator.getNext is 10 times faster. (As the property names we use are unique, I can use //* and get the same result. I must say it is quite unintuitive behaviour. Hopefully it will help others too.)

Thanks for the hint.

Marek

> ------------ Original message ------------
> From: <[email protected]>
> Subject: Re: Re: XPath query performance question
> Date: 03.2.2012 16:25:07
> ----------------------------------------
>
> No, property 'calais' is not used anywhere else. So if I use the query without path
> info it will return the same result.
>
> Marek
>
> > ------------ Original message ------------
> > From: Alessandro <[email protected]>
> > Subject: Re: XPath query performance question
> > Date: 03.2.2012 16:12:15
> > ----------------------------------------
> > If you were running the query without path restrictions, would it return more
> > than one node? In other words, outside the /companies tree, are there other
> > company nodes with the same calais attribute value?
> > Results are generated from the predicate, and then filtered by the path.
> >
> > Alessandro
> >
> > On Feb 3, 2012, at 7:13 AM, [email protected] wrote:
> >
> > > Hi,
> > >
> > > I have the following use case:
> > >
> > > I have about 2000 company nodes under the node companies:
> > > /companies/company[1]
> > > /companies/company[2]
> > > ....
> > > /companies/company[N]
> > >
> > > I query for one company by property value: exact match, no wildcards. The
> > > result should contain just one node. For example I use the query:
> > >
> > > //companies/company[@calais='http://d.opencalais.com/er/company/ralg-tr1r/2c970a55-e08d-3af8-ad1d-3c46f341e749']
> > >
> > > and then one call of NodeIterator.next to get the unique (or first, as there is no
> > > constraint on uniqueness) result. So there is no big result set.
> > >
> > > Property 'calais' is of string type and, when set, it is unique, i.e. a small
> > > number of company nodes may have this property either empty or missing. The
> > > property value can be up to 100 chars long, if that makes any difference for
> > > the index.
> > >
> > > When only one thread is running it takes 100-200 ms. When 4 threads are running
> > > it is about 500 ms on average. I used a profiler with sampling to get some
> > > profiling data. It seems to be too much, given that the number of nodes is not
> > > that high and it is using the Lucene index. Calls of query.execute and
> > > nodeIterator.next each take about the same time.
> > > When I checked thread dumps it uses the Lucene index, so it does not look like
> > > it scans all nodes.
> > >
> > > Question: Is there any way to speed up this kind of lookup? The only way I
> > > found so far is to incorporate the property most often used for lookup into
> > > the node path, as session.getNode(path) is much faster.
> > >
> > > I use Jackrabbit 2.2.9 and Postgres 9.1 for saving all data except the Lucene
> > > index. It runs on JBoss 7.
> > >
> > > I searched for Jackrabbit XPath performance but found no match for my use case:
> > > a) exact property match without like/wildcards
> > > b) small result set - just one result item
> > >
> > > Thanks
> > >
> > > Marek
> > >
>
> Marek Slama
> [email protected]

Marek Slama
[email protected]
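
For readers who want to see the two query forms from this thread side by side, here is a minimal sketch. It is not Jackrabbit code: it uses the JDK's built-in javax.xml.xpath over a small in-memory document, so it only illustrates that both expressions select the same node when the 'calais' value is unique. It says nothing about the timing difference reported above. The class name CalaisLookup, the sample XML, the 'name' attribute and the example.invalid URLs are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CalaisLookup {
    // Hypothetical sample data: three company nodes, one without the property.
    static final String SAMPLE_XML =
        "<root><companies>"
      + "<company name='a' calais='http://example.invalid/company/1'/>"
      + "<company name='b' calais='http://example.invalid/company/2'/>"
      + "<company name='c'/>"
      + "</companies></root>";

    // Evaluates the XPath expression against SAMPLE_XML and returns the
    // 'name' attribute of every matching element, in document order.
    static List<String> matchingNames(String xpathExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE_XML)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpathExpr, doc, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                names.add(((Element) hits.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + xpathExpr, e);
        }
    }

    public static void main(String[] args) {
        String id = "http://example.invalid/company/2";
        // Path-restricted form, as in the original question.
        String withPath = "//companies/company[@calais='" + id + "']";
        // Unrestricted form, as in the faster variant.
        String anyNode = "//*[@calais='" + id + "']";
        System.out.println(matchingNames(withPath));
        System.out.println(matchingNames(anyNode));
    }
}
```

Both calls select the same single element here because no node outside /companies carries that 'calais' value. If the property were also used elsewhere in the tree, the //* form could return additional nodes, which is the point of Alessandro's question above.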
