Re: Not finding part of fulltext field when word ends in dot

Jack Krupansky Wed, 29 Jan 2014 08:56:52 -0800

You might want to add autoGeneratePhraseQueries="true" to your field type,
but I don't think that would cause a break when going from 3.6 to 4.x. The
default for that attribute changed in Solr 3.5. What release was your data
indexed using? There may have been some subtle word delimiter filter changes
between 3.x and 4.x.
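
For illustration, a minimal sketch of where that attribute would go, assuming
the "text" field type quoted further down in this thread. The analyzer chains
are elided here and would stay as they are:

```xml
<!-- Sketch only: the existing "text" type with autoGeneratePhraseQueries added. -->
<fieldType name="text" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <!-- index and query <analyzer> sections unchanged from the quoted definition -->
</fieldType>
```

Whether this changes the result for "26KA" is not established in this thread.
As far as I know the attribute only affects how the query parser builds
queries from the analyzed tokens, so it can be tried without reindexing.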


Read:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201202.mbox/%3cc0551c512c863540bc59694a118452aa0764a...@its-embx-03.adsroot.itcs.umich.edu%3E

-----Original Message-----
From: Thomas Michael Engelke
Sent: Wednesday, January 29, 2014 11:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Not finding part of fulltext field when word ends in dot

The fieldType definition is a tad on the longer side:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>

      <filter class="solr.WordDelimiterFilterFactory"
              catenateWords="1"
              catenateNumbers="1"
              generateNumberParts="1"
              splitOnCaseChange="1"
              generateWordParts="1"
              catenateAll="0"
              preserveOriginal="1"
              splitOnNumerics="0"
      />

      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory"
              synonyms="german/synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
              dictionary="german/german-common-nouns.txt"
              minWordSize="5"
              minSubwordSize="4"
              maxSubwordSize="15"
              onlyLongestMatch="true"
      />

      <filter class="solr.StopFilterFactory"
              words="german/stopwords.txt" ignoreCase="true"
              enablePositionIncrements="true"/>
      <filter class="solr.SnowballPorterFilterFactory"
              language="German2" protected="german/protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>

      <filter class="solr.WordDelimiterFilterFactory"
              catenateWords="0"
              catenateNumbers="0"
              generateWordParts="1"
              splitOnCaseChange="1"
              generateNumberParts="1"
              catenateAll="0"
              preserveOriginal="1"
              splitOnNumerics="0"
      />
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory"
              words="german/stopwords.txt" ignoreCase="true"
              enablePositionIncrements="true"/>
      <filter class="solr.SnowballPorterFilterFactory"
              language="German2" protected="german/protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>


Thank you for taking a look.


2014-01-29 Jack Krupansky <j...@basetechnology.com>

What field type and analyzer/tokenizer are you using?

-- Jack Krupansky

-----Original Message-----
From: Thomas Michael Engelke
Sent: Wednesday, January 29, 2014 10:45 AM
To: solr-user@lucene.apache.org
Subject: Not finding part of fulltext field when word ends in dot

Hello everybody,

we have a legacy solr installation in version 3.6.0.1. One of the indices
defines a field named "content" as a fulltext field where a product
description will reside. One of the records indexed contains the following
data (excerpt):

z. B. in der Serie 26KA.

I had the problem that searching for the value "26KA" didn't find anything.
Using the analyzer of the administrative interface, with the full text on
one hand and "26KA" as the query string, I can see how the search string is
transformed by the filter factories used. The WordDelimiterFilterFactory
transforms the "26KA." into "26KA", which is displayed like this (excerpt):


73 74  75    76
in der Serie 26KA.
             26KA

It seems that it stripped the "26KA." of the dot. Using the option to
highlight matches, an analysis search of "26KA" shows the lower of the two
entries matches (after reaching the LowerCaseFilterFactory). However,
querying the index using the query interface doesn't show any matches.

I discovered that adding an asterisk to the search seems to work, as does
adding the dot. I am puzzled by this, as I thought that the second added
entry was the word actually indexed. I've tried looking up the definition
of the administrative interface, but the documentation only specifies this
for the latest version, where the display is different and (at least in the
sample) doesn't show such "duplication".

Can anybody shed some light onto this?
