Re: Re: Re: XPath query performance question

mslama Fri, 03 Feb 2012 07:52:30 -0800

Ok. I tried

//*[@calais='http://d.opencalais.com/er/company/ralg-tr1r/2a2094a9-9a0b-3fb6-bb51-0c9e940deaaf']


instead of the original

//companies/company[@calais='http://d.opencalais.com/er/company/ralg-tr1r/2a2094a9-9a0b-3fb6-bb51-0c9e940deaaf']

and now query.execute + nodeIterator.next is about 10 times faster. (Since the
property names we use are unique, I can use //* and get the same result. I must
say this is quite unintuitive behaviour. Hopefully it will help others too.)
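
For anyone who wants to try the same comparison, here is a minimal sketch that only builds the two XPath statements as strings. The class and method names (CalaisXPath, quote, anyNode, underCompanies) are made up for illustration and are not part of Jackrabbit. The apostrophe doubling assumes the usual XPath string-literal escaping; our URIs contain no apostrophes, so it is only a precaution.

```java
// Sketch only: builds the two XPath statements compared above as plain strings.
// All names in this class are hypothetical helpers, not Jackrabbit API.
final class CalaisXPath {
    private CalaisXPath() {}

    /** Wraps a value in single quotes, doubling any apostrophe inside it. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Form without path restriction (the one that was faster in my test). */
    static String anyNode(String calaisUri) {
        return "//*[@calais=" + quote(calaisUri) + "]";
    }

    /** Original form: same predicate, restricted to companies/company. */
    static String underCompanies(String calaisUri) {
        return "//companies/company[@calais=" + quote(calaisUri) + "]";
    }

    public static void main(String[] args) {
        String uri = "http://d.opencalais.com/er/company/ralg-tr1r/"
                + "2a2094a9-9a0b-3fb6-bb51-0c9e940deaaf";
        System.out.println(anyNode(uri));
        System.out.println(underCompanies(uri));
    }
}
```

Either string would then be passed to session.getWorkspace().getQueryManager().createQuery(statement, Query.XPATH), followed by query.execute().getNodes() and a single nextNode() call. The two forms only return the same node when the property is not used outside the companies subtree, as is the case in our repository.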

Thanks for the hint.

Marek
> ------------ Original message ------------
> From:  <[email protected]>
> Subject: Re: Re: XPath query performance question
> Date: 03.2.2012 16:25:07
> ----------------------------------------
>
> No, the property 'calais' is not used anywhere else. So if I use the query
> without path info, it will return the same result.
>
> Marek
>
> > ------------ Original message ------------
> > From: Alessandro <[email protected]>
> > Subject: Re: XPath query performance question
> > Date: 03.2.2012 16:12:15
> > ----------------------------------------
> > If you were running the query without path restrictions, would it return
> > more than one node? In other words, outside the /companies tree, are there
> > other company nodes with the same calais attribute value?
> > Results are generated from the predicate, and then filtered by the path.
> >
> > Alessandro
> >
> > On Feb 3, 2012, at 7:13 AM, [email protected] wrote:
> >
> > > Hi,
> > >
> > > I have the following use case:
> > >
> > > I have about 2000 company nodes under the node companies:
> > > /companies/company[1]
> > > /companies/company[2]
> > > ....
> > > /companies/company[N]
> > >
> > > I query for one company by property value: exact match, no wildcards. The
> > > result should contain just one node. For example, I use the query:
> > >
> > > //companies/company[@calais='http://d.opencalais.com/er/company/ralg-tr1r/2c970a55-e08d-3af8-ad1d-3c46f341e749']
> > >
> > > and then one call of NodeIterator.next to get the unique result (or the
> > > first one, as there is no constraint on uniqueness). So there is no big
> > > result set.
> > >
> > > Property 'calais' is of string type and, when set, it is unique, i.e. a
> > > small number of company nodes may have this property either empty or
> > > missing. The property value can be up to 100 chars long, if that makes any
> > > difference for the index.
> > >
> > > When only one thread is running, a lookup takes 100-200ms. When 4 threads
> > > are running, it is about 500ms on average. I used a profiler with sampling
> > > to get some profiling data. This seems to be too much, given that the
> > > number of nodes is not that high and it is using the Lucene index. The
> > > calls of query.execute and nodeIterator.next both take about the same time.
> > > When I checked thread dumps, it uses the Lucene index, so it does not look
> > > like it scans all nodes.
> > >
> > > Question: Is there any way to speed up this kind of lookup? The only way I
> > > have found so far is to incorporate the property most often used for lookup
> > > into the node path, as session.getNode(path) is much faster.
> > >
> > > I use Jackrabbit 2.2.9 and Postgres 9.1 for saving all data except the
> > > Lucene index. It runs on JBoss 7.
> > >
> > > I searched for Jackrabbit XPath performance but found no match for my use case:
> > > a) exact property match without like/wildcards
> > > b) small resultset - just one result item
> > >
> > > Thanks
> > >
> > > Marek
> >
> >
> >
>
> Marek Slama
> [email protected]
>
>
>

Marek Slama
[email protected]
