Re: in jena text search, can't use wildcard as first character

Osma Suominen Fri, 13 Jun 2014 00:57:13 -0700

On 13/06/14 00:55, Andy Seaborne wrote:

void setAllowLeadingWildcard(boolean allowLeadingWildcard)
Set to true to allow leading wildcard characters.
When set, * or ? are allowed as the first character of a PrefixQuery and
WildcardQuery. Note that this can produce very slow queries on big
indexes.


Default: false.


I've just added that to jena-text in svn.

That's great, it might be useful for me as well, though I have to test
the performance. In my application, leading wildcards are currently
processed with regex (or simpler string functions which are slightly
faster), but it tends to be slow.

I can't say. I also would not try to, in this case. I expect performance
would be better using the filter.


A regex may well be better depending on the size and composition of the
Lucene index.
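
To make the "regex or simpler string functions" filtering concrete, here is a minimal Java sketch. It is only an illustration: the class and method names (LeadingWildcardFilter, toRegex, matches) are made up for this example and are not part of jena-text or Lucene, and the filter discussed in the thread presumably lives in the SPARQL query rather than in Java. It assumes * means any run of characters and ? means exactly one character.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a jena-text or Lucene API.
class LeadingWildcardFilter {
    // Translate a wildcard pattern (* = any run, ? = one character) into a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                // Quote literal characters so regex metacharacters stay literal.
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    static boolean matches(String wildcard, String value) {
        // Fast path: one leading '*' followed by literal text is a plain suffix test.
        if (wildcard.startsWith("*")) {
            String rest = wildcard.substring(1);
            if (rest.indexOf('*') < 0 && rest.indexOf('?') < 0) {
                return value.endsWith(rest);
            }
        }
        // General case: fall back to a full regex match.
        return toRegex(wildcard).matcher(value).matches();
    }
}
```

Either path still has to look at every candidate value, which is why this kind of filtering tends to be slow on large result sets.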

If you need to have fast suffix queries, I think it's possible with
jena-text + Solr, but I haven't tried. In Solr, you can configure (in
schema.xml) a ReversedWildcardFilterFactory that will store the terms
reversed in the index (this will double the index size) and use that for
fast suffix searches. See e.g. here:


http://docs.lucidworks.com/display/lweug/Wildcard+Queries
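
The idea behind storing reversed terms can be sketched in a few lines of plain Java. This is a toy model under stated assumptions, not Solr or Lucene code: a TreeSet stands in for the sorted term dictionary, and the class name ReversedTermIndex and its methods are invented for the example. The point is that a suffix query such as *ary becomes a prefix lookup for "yra" over the reversed terms, which a sorted structure can answer with a seek plus a short scan instead of visiting every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy stand-in for a sorted term dictionary; not Lucene or Solr code.
class ReversedTermIndex {
    // Terms are stored reversed, so "ends with X" becomes "starts with reverse(X)".
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    void add(String term) {
        reversedTerms.add(reverse(term));
    }

    // Answer the wildcard query "*suffix" with a range scan over the reversed terms.
    List<String> endingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> hits = new ArrayList<>();
        // tailSet seeks to the first candidate; stop at the first non-matching term.
        for (String rev : reversedTerms.tailSet(prefix, true)) {
            if (!rev.startsWith(prefix)) {
                break;
            }
            hits.add(reverse(rev));
        }
        return hits;
    }

    public static void main(String[] args) {
        ReversedTermIndex index = new ReversedTermIndex();
        for (String t : new String[] {"helsinki", "library", "binary", "solr"}) {
            index.add(t);
        }
        System.out.println(index.endingWith("ary")); // prints [binary, library]
    }
}
```

A real index would keep the forward terms as well, which is where the doubling of the index size comes from.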

-Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
