Re: in jena text search, can't use wildcard as first character

Osma Suominen Wed, 20 Aug 2014 23:48:53 -0700

Hi all,

I just wanted to say "thank you" to everyone for this discussion and feature - Willie for asking the question, Paul for contributing to the discussion, and Andy for implementing the change. We have just upgraded to Fuseki 1.1.0 which contains this fix, and now we can use wildcards as the first character in text search in our application (Skosmos). Previously this type of search was implemented using only a FILTER. Here are some response times for a typical query:


No text search, FILTER only:      1050ms
Text search with leading wildcard: 300ms
Text search with no wildcard:      150ms

So the query is not quite as fast as a normal text search (300ms vs 150ms), because Lucene has to do more work, but still considerably faster than without a text index (1050ms). Our users greatly appreciate this speedup!
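
As a rough illustration of why a leading wildcard means more work than a plain prefix search, here is a toy Python model of a sorted term dictionary. It is only a sketch under stated assumptions: the term list and the function names (prefix_search, leading_wildcard_search) are made up for this example, and Lucene's real term dictionary and wildcard matching differ in detail.

```python
import bisect
import fnmatch

# Hypothetical, tiny "term dictionary", kept sorted like an index would keep it.
# This is a toy stand-in, not Lucene's actual data structure.
TERMS = sorted([
    "cat", "catalog", "category", "concat",
    "dog", "education", "location", "wildcat",
])


def prefix_search(prefix):
    """Match 'prefix*': binary-search to the first candidate, then stop
    at the first term that no longer starts with the prefix."""
    start = bisect.bisect_left(TERMS, prefix)
    matches = []
    visited = 0
    for term in TERMS[start:]:
        visited += 1
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches, visited


def leading_wildcard_search(pattern):
    """Match a pattern such as '*cat': the sort order gives no starting
    point, so in this toy model every term is examined."""
    matches = [t for t in TERMS if fnmatch.fnmatchcase(t, pattern)]
    return matches, len(TERMS)


if __name__ == "__main__":
    print(prefix_search("cat"))
    print(leading_wildcard_search("*cat"))
```

In this model the prefix search touches only the terms around the matching range, while the leading-wildcard search touches the whole list. A FILTER-only query is slower still because it has to test every candidate literal in the data rather than the distinct terms in an index.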


See also: https://github.com/NatLibFi/Skosmos/issues/46

-Osma

On 13/06/14 10:56, Osma Suominen wrote:

On 13/06/14 00:55, Andy Seaborne wrote:

void setAllowLeadingWildcard(boolean allowLeadingWildcard)
Set to true to allow leading wildcard characters.
When set, * or ? are allowed as the first character of a PrefixQuery
and WildcardQuery. Note that this can produce very slow queries on big
indexes.

Default: false.


I've just added that to jena-text in svn.


That's great, it might be useful for me as well, though I have to test
the performance. In my application, leading wildcards are currently
processed with regex (or simpler string functions which are slightly
faster), but it tends to be slow.



--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
