search returns matches for non-starting wildcard prefix queries

Rupert Fiasco Mon, 09 Feb 2009 09:46:51 -0800

(I think I have a horrible subject line but I wasnt sure how to
properly explain myself).


I have a text field that I store last names in (and everything is
lowercased prior to insertion, not sure if that matters).

The field is described as:

   <field name="last_name" type="text" indexed="true" stored="false"
multiValued="true"/>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>



When running a query such as

last_name:m*

I get data back like:

Pashman, Md
Maldonado
Manolidis
Fleisher, M.D., D.Ht., D.A.B.F.M.
Merino
Monroe
McLay
Maltsberger
McMurtray
Murphy Md
Loeb Md


As you can see most are perfect matches, but there are some that
*dont* start with the letter "M" but do have "M" at the beginning of
another "word" in the field.

Wouldnt the query "m*" just query for matches where the first letter
is "M" in the whole field and not within another "word" in that field?

Do I need to make another field to store last names and not perform
any analysis on that field (akin to a spell check field)?

Thanks in advance.

-Rupert

search returns matches for non-starting wildcard prefix queries

Reply via email to