We have a fairly simple Solr schema:

<fields>
   <field name="DocId" type="long" indexed="true" stored="true" required="true" />
   <field name="DocTitle" type="string" indexed="true" stored="true" required="true" />
   <field name="Content" type="text_general" indexed="false" stored="true" required="true" />

   <field name="ContentSearch" type="text_general" indexed="true" stored="false" multiValued="true"/>
   <field name="ContentSearchStemming" type="text_stem" indexed="true" stored="false" multiValued="true"/>
   <field name="ContentSearchPhonetic" type="text_phonetic" indexed="true" stored="false" multiValued="true"/>
   <field name="ContentSearchSynonym" type="text_synonym" indexed="true" stored="false" multiValued="true"/>
   <field name="_version_" type="long" indexed="true" stored="true"/>
</fields>
 
<uniqueKey>DocId</uniqueKey>
<copyField source="Content" dest="ContentSearch"/>
<copyField source="Content" dest="ContentSearchStemming"/>
<copyField source="Content" dest="ContentSearchPhonetic"/>
<copyField source="Content" dest="ContentSearchSynonym"/>
 
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_stem" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="text_phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="Soundex" inject="false"/>
  </analyzer>
</fieldType>

<fieldType name="text_synonym" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>

We are indexing documents in Solr using SolrNet and need to support
phonetic search based on the Soundex algorithm. After indexing, we can run a
phonetic query in the Solr Admin panel and the relevant document is returned
in the search results, but the highlight collection is empty.

Example use case:
-----------------
We index a text document that contains the word "electromagnetic" (Soundex
code: E423).
We then run the following query in the Solr Admin panel:
    ContentSearchPhonetic:electing
("electing" also has the Soundex code E423.)
The search returns one document, but the highlight collection is empty.
Solr is clearly using the Soundex algorithm to locate the document, because
the word "electing" does not appear in it, yet it does not return any
highlighting data.
The same schema and configuration do return documents along with
highlighting data for other approximate searches such as synonym, fuzzy or
stemming queries. Only phonetic searches come back without highlighting data.
Here is a screenshot from the Solr Admin panel:
<http://lucene.472066.n3.nabble.com/file/n4075492/HighlightIssue.png> 
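To make the Soundex reasoning in the use case concrete, here is a minimal Python sketch of American Soundex. It is our own reimplementation for illustration, not Lucene's code: Solr's PhoneticFilterFactory delegates to Apache Commons Codec, which may treat some edge cases differently. The `phonetic_tokens` helper is hypothetical and only approximates the `text_phonetic` chain in the schema: whitespace tokenization followed by Soundex with inject="false", which replaces each token with its code, so ContentSearchPhonetic holds codes such as E423 rather than the original words.

```python
"""Illustrative American Soundex, approximating Solr's encoder="Soundex"."""

# Letter -> digit table for American Soundex; vowels, Y, H and W have no code.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of word ("" if it has no letters)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]  # the first letter is kept as-is
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = SOUNDEX_CODES.get(c, "")
        # Skip uncoded letters and repeats of the previous code.
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # H and W do not separate letters with the same code; vowels do.
        if c not in "HW":
            prev = code
    return result.ljust(4, "0")  # pad short codes with zeros


def phonetic_tokens(text: str) -> list[str]:
    """Hypothetical stand-in for the text_phonetic analyzer chain.

    Splits on whitespace only (like WhitespaceTokenizer) and replaces each
    token with its Soundex code (like inject="false"). Tokens without any
    letters are simply dropped here, which is a simplification.
    """
    codes = (soundex(tok) for tok in text.split())
    return [code for code in codes if code]


if __name__ == "__main__":
    # Both words encode to E423, matching the codes quoted above.
    print(soundex("electromagnetic"), soundex("electing"))
    print(phonetic_tokens("The electromagnetic spectrum"))
```

For the sample sentence "The electromagnetic spectrum", the sketch yields only the codes T000, E423 and S123; none of the original words end up in the phonetic field's index.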



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Phonetic-Search-returning-documents-but-not-Highlight-Information-tp4075492.html
Sent from the Solr - User mailing list archive at Nabble.com.