I know it's documented that Lucene/Solr doesn't apply filters to queries with 
wildcards, but this seems to trip up a lot of users.  I can also see why 
wildcards break many filters, but some (e.g. charset mapping) could mostly or 
entirely still work.  The N-gram filter is another one that would be great to 
still run when there are wildcards.  If you indexed 4-grams and the query is 
"*testp*", you currently won't get any results; but the N-gram filter could 
have a wildcard mode that, in this case, would return just the first 4-gram as 
a token.
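
To make that concrete, here is a rough sketch in plain Java of the token such a 
wildcard mode could emit.  This is purely illustrative and does not correspond 
to any existing Lucene/Solr API; the class and method names are made up:

    // Illustrative only -- NOT existing Lucene/Solr behavior.
    // Given the literal fragment between the wildcards and the indexed gram
    // size, emit the first complete gram so the query still hits the n-gram index.
    public class WildcardNGramSketch {

        // "*testp*" with 4-grams: fragment "testp" -> "test",
        // which was indexed for "Supertestplan".
        static String firstGram(String fragment, int gramSize) {
            if (fragment.length() < gramSize) {
                return null; // fragment shorter than one gram; keep current behavior
            }
            return fragment.substring(0, gramSize);
        }

        public static void main(String[] args) {
            System.out.println(firstGram("testp", 4)); // prints "test"
        }
    }

A stricter variant could emit every complete gram of the fragment ("test" and 
"estp" here) and require all of them to match.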

Is this something you've considered?  It would have to be supported in the core 
framework, but disabled by default for existing filters; it could then be 
enabled one-by-one for individual filters.  Apologies if the dev list is a 
better place for this.

Scott


> -----Original Message-----
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: Thursday, November 21, 2013 8:40 AM
> To: solr-user@lucene.apache.org
> Subject: Re: search with wildcard
> 
> Hi Andreas,
> 
> If you don't want to use wildcards at query time, an alternative is to
> use NGrams at indexing time. This will produce a lot of tokens. For
> example, the 4-grams of your example: Supertestplan => supe uper pert
> erte rtes *test* estp stpl tpla plan
> 
> 
> Is that what you want? By the way, why do you want to search inside of words?
> 
> <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="4"/>
> 
> 
> 
> 
> On Thursday, November 21, 2013 5:23 PM, Andreas Owen <a...@conx.ch> wrote:
> 
> I suppose I have to create another field with different tokenizers and
> set the boost very low so it doesn't really mess with my ranking,
> because the word is now in 2 fields. What kind of tokenizer can do the
> job?
> 
> 
> 
> From: Andreas Owen [mailto:a...@conx.ch]
> Sent: Thursday, 21 November 2013 16:13
> To: solr-user@lucene.apache.org
> Subject: search with wildcard
> 
> 
> 
> I am querying "test" in Solr 4.3.1 over the field below and it's not
> finding all occurrences. It seems that if it is a substring of a word
> like "Supertestplan" it isn't found unless I use wildcards: "*test*".
> This is expected because of my tokenizer, but does someone know a way
> around this? I don't want to add wildcards because that messes up
> queries with multiple words.
> 
> 
> 
> <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="lang/stopwords_de.txt" format="snowball"
>             enablePositionIncrements="true"/> <!-- remove common words -->
>     <filter class="solr.GermanNormalizationFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="German"/>
>     <!-- remove noun/adjective inflections like plural endings -->
>   </analyzer>
> </fieldType>
