Re: Not finding part of fulltext field when word ends in dot

Thomas Michael Engelke Wed, 29 Jan 2014 08:24:39 -0800

The fieldType definition is a tad on the longer side:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            catenateWords="1"
            catenateNumbers="1"
            generateNumberParts="1"
            splitOnCaseChange="1"
            generateWordParts="1"
            catenateAll="0"
            preserveOriginal="1"
            splitOnNumerics="0"
    />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="german/synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="german/german-common-nouns.txt"
            minWordSize="5"
            minSubwordSize="4"
            maxSubwordSize="15"
            onlyLongestMatch="true"
    />
    <filter class="solr.StopFilterFactory"
            words="german/stopwords.txt" ignoreCase="true"
            enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2"
            protected="german/protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            catenateWords="0"
            catenateNumbers="0"
            generateWordParts="1"
            splitOnCaseChange="1"
            generateNumberParts="1"
            catenateAll="0"
            preserveOriginal="1"
            splitOnNumerics="0"
    />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            words="german/stopwords.txt" ignoreCase="true"
            enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2"
            protected="german/protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
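
For reference, here is a rough Python sketch of what I understand the first steps of the index chain to do with the token in question: whitespace tokenizing, then the word delimiter filter with preserveOriginal="1", then lowercasing. This is a toy model and not Solr's implementation. The function names are made up for illustration, and it only trims leading and trailing non-alphanumeric characters; it ignores splitting inside a word, catenation, synonyms, compound splitting, stop words and stemming.

```python
def word_delimiter_sketch(token, preserve_original=True):
    """Toy stand-in for the word delimiter step (assumption, not Solr code).

    Only trims leading/trailing non-alphanumeric characters and, if asked,
    keeps the untouched original token as well.
    """
    start, end = 0, len(token)
    while start < end and not token[start].isalnum():
        start += 1
    while end > start and not token[end - 1].isalnum():
        end -= 1
    stripped = token[start:end]

    out = []
    if preserve_original and stripped != token:
        out.append(token)      # original token, e.g. "26KA."
    if stripped:
        out.append(stripped)   # trimmed token, e.g. "26KA"
    return out


def analyze_sketch(text):
    """Whitespace-tokenize, apply the toy delimiter step, then lowercase.

    Returns (position, token) pairs; tokens derived from the same input
    word share one position, as on the analysis screen.
    """
    tokens = []
    for position, raw in enumerate(text.split(), start=1):
        for tok in word_delimiter_sketch(raw):
            tokens.append((position, tok.lower()))
    return tokens


if __name__ == "__main__":
    for position, tok in analyze_sketch("in der Serie 26KA."):
        print(position, tok)
```

Under these assumptions the last word yields two tokens at the same position, one with the dot and one without, which matches what the analysis screen shows in the excerpt quoted below.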


Thank you for taking a look.


2014-01-29 Jack Krupansky <j...@basetechnology.com>

> What field type and analyzer/tokenizer are you using?
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Thomas Michael Engelke
> Sent: Wednesday, January 29, 2014 10:45 AM
> To: solr-user@lucene.apache.org
> Subject: Not finding part of fulltext field when word ends in dot
>
> Hello everybody,
>
> We have a legacy Solr installation, version 3.6.0.1. One of the indices
> defines a field named "content" as a fulltext field that holds a product
> description. One of the indexed records contains the following data
> (excerpt):
>
> z. B. in der Serie 26KA.
>
> My problem is that searching for the value "26KA" doesn't find anything.
> Using the analysis page of the administrative interface, with the full text
> as the field value and "26KA" as the query string, I can see how both are
> transformed by the configured filter factories. The WordDelimiterFilterFactory
> transforms "26KA." into "26KA", which is displayed like this (excerpt):
>
> 73 74  75    76
> in der Serie 26KA.
>              26KA
>
> It seems that it stripped the "26KA." of the dot. Using the option to
> highlight matches, an analysis search of "26KA" shows the lower of the two
> entries matches (after reaching the LowerCaseFilterFactory). However,
> querying the index using the query interface doesn't show any matches.
>
> I discovered that adding an asterisk to the search seems to work, as does
> adding the dot. I am puzzled by this, as I thought that the second added
> entry was the word actually indexed. I've tried looking up the definition
> of the administrative interface, but the documentation only specifies this
> for the latest version, where the display is different and (at least in the
> sample) doesn't show such "duplication".
>
> Can anybody shed some light onto this?
>
