Hi,

On Wed, Apr 1, 2009 at 09:10, daveg0 <[email protected]> wrote:
> I am trying to do "wildcard" queries that should return multiple nodes such
> as:
>
> /jcr:root/portal/wap/images//element(*,
> atom:Entry)[jcr:like(@atom:titletext,'soccer%']
>
> the performance has degraded over time with more entries to take nearly 8
> seconds which is unacceptable. I am aware that wildcard queries take longer,
> but shouldn't this type of query create a Lucene PrefixQuery which is much
> quicker. Most of our "wildcard" queries will be "prefix" queries as they
> will typically be searches for matching entries that start with a specific
> value eg "st*".
>
> I tried looking through the source code and I can't see any use of Lucene
> PrefixQuery only WildcardQuery, is this a design decision?
Yes, it is. There are basically two reasons:

- PrefixQuery is basically a boolean query that consists of optional
  TermQueries (one for each term that matches the prefix). This design has
  an inherent limit: as soon as more than 1024 distinct terms match the
  prefix, the BooleanQuery throws a TooManyClauses exception.
- Jackrabbit supports prefix queries in combination with lower- and
  upper-casing. This is not possible with the Lucene PrefixQuery.

In any case, the cost of a prefix query grows linearly with the number of
distinct terms in the index that match the prefix. Is it possible that your
prefix matches lots of distinct terms, i.e. the prefix is very short or very
common?

> Am I missing something or is it possible for Jackrabbit to perform a
> PrefixQuery for queries like this.
>
> I also tried to use "jcr:contains" e.g:
>
> /jcr:root/portal/wap/images//element(*,
> atom:Entry)[jcr:contains(@atom:titletext,'soccer']

That's not exactly the same, because it matches only terms that were indexed
as soccer. You could use:

/jcr:root/portal/wap/images//element(*, atom:Entry)[jcr:contains(@atom:titletext,'soccer*')]

but I'd say the performance is about the same.

> but this only returns the first matching entry. Am I
> misunderstanding/misusing "jcr:contains" in this way or would you expect it
> to return the same as the query with "jcr:like"

jcr:contains and jcr:like behave differently. See the specification for
details.

regards
 marcel
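
P.S. A rough sketch of the first point above, in case it helps to see it spelled out. This is a plain Python model, not Lucene or Jackrabbit code: the term index is assumed to be a sorted list of distinct terms, and the names `matching_terms`, `expand_prefix`, and the `TooManyClauses` class here are made up for illustration. It only shows why expanding a prefix into one clause per matching term costs time proportional to the number of matches, and where a 1024-clause limit would bite.

```python
from bisect import bisect_left

MAX_CLAUSES = 1024  # clause limit quoted in the reply above


class TooManyClauses(Exception):
    """Hypothetical stand-in for the exception named above, not Lucene's class."""


def matching_terms(sorted_terms, prefix):
    """Collect every distinct term in a sorted term list that starts with prefix."""
    i = bisect_left(sorted_terms, prefix)  # index of the first term >= prefix
    matches = []
    # Matching terms are contiguous in sorted order, so the scan visits each
    # match exactly once: the cost grows linearly with the number of matches.
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches


def expand_prefix(sorted_terms, prefix, max_clauses=MAX_CLAUSES):
    """Model a prefix query as one optional clause per matching term."""
    clauses = matching_terms(sorted_terms, prefix)
    if len(clauses) > max_clauses:
        raise TooManyClauses(f"{len(clauses)} terms match {prefix!r}")
    return clauses
```

With an index like `["goal", "soccer", "soccerball", "sock"]`, `expand_prefix(terms, "soccer")` yields two clauses. A short prefix such as "st" over an index with more than 1024 distinct "st..." terms raises the exception in this model, which is the failure mode a term-by-term expansion runs into.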
