Re: Solr Phonetic Search Highlight issue in search results

Jan Høydahl Tue, 02 Apr 2013 14:39:50 -0700

If you want to highlight, you need to turn on highlighting for the actual field 
you search, and that field needs to be stored, i.e. &hl.fl=ContentSearchPhonetic


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

1. apr. 2013 kl. 14:16 skrev Erick Erickson <erickerick...@gmail.com>:

> Good question, you're causing me to think... about code I know very
> little about <G>.
> 
> So rather than spouting off, I tried it and.. it works fine for me, either 
> with
> or without using fast vector highlighter on, admittedly, a very simple test.
> 
> So I think I'd try peeling off all the extra stuff you've put into your 
> configs
> (sorry, I don't have time right now to try to reproduce) and get the very
> simple case working, then build the rest back up and see where the
> problem begins.
> 
> Sorry for the mis-direction!
> 
> Erick
> 
> 
> 
> On Mon, Apr 1, 2013 at 1:07 AM, Soumyanayan Kar
> <soumyanayan....@rebaca.com> wrote:
>> Hi Erick,
>> 
>> Thanks for the reply. But help me understand this: If Solr is able to
>> isolate the two documents which contain the term "fact" being the phonetic
>> equivalent of the search term "fakt", then why will it be unable to
>> highlight the terms based on the same logic it uses to search the documents.
>> 
>> Also, it is correctly highlighting the results in other searches which are
>> also approximate searches and not exact ones for eg. Fuzzy or Synonym
>> search. In these cases also the highlights in the search results are far
>> from the actual search term but still they are getting correctly
>> highlighted.
>> 
>> Maybe I am getting it completely wrong but it looks like there is something
>> wrong with my implementation.
>> 
>> Thanks & Regards,
>> 
>> Soumya.
>> 
>> 
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerick...@gmail.com]
>> Sent: 27 March 2013 06:07 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Phonetic Search Highlight issue in search results
>> 
>> How would you expect it to highlight successfully? The term is "fakt",
>> there's nothing built in (and, indeed couldn't be) to un-phoneticize it into
>> "fact" and apply that to the Content field. The whole point of phonetic
>> processing is to do a lossy translation from the word into some variant,
>> losing precision all the way.....
>> 
>> So this behavior is unsurprising...
>> 
>> Best
>> Erick
>> 
>> 
>> 
>> 
>> On Tue, Mar 26, 2013 at 7:28 AM, Soumyanayan Kar <soumyanayan....@rebaca.com
>>> wrote:
>> 
>>> When we are issuing a query with Phonetic Search, it is returning the
>>> correct documents but not returning the highlights. When we use
>>> Stemming or Synonym searches we are getting the proper highlights.
>>> 
>>> 
>>> 
>>> For example, when we execute a phonetic query for the term
>>> fakt(ContentSearchPhonetic:fakt) in the Solr Admin interface, it
>>> returns two documents containing the term "fact"(phonetic token
>>> equivalent), but the list of highlights is empty as shown in the
>>> response below.
>>> 
>>> 
>>> 
>>>    <response>
>>> 
>>>    <lst name="responseHeader">
>>> 
>>>    <int name="status">0</int>
>>> 
>>>    <int name="QTime">16</int>
>>> 
>>>    <lst name="params">
>>> 
>>>      <str name="q">ContentSearchPhonetic:fakt</str>
>>> 
>>>      <str name="wt">xml</str>
>>> 
>>>    </lst>
>>> 
>>>  </lst>
>>> 
>>>    <result name="response" numFound="2" start="0">
>>> 
>>>        <doc>
>>> 
>>>          <long name="DocId">1</long>
>>> 
>>>          <str name="DocTitle">Doc 1</str>
>>> 
>>>          <str name="Content">Anyway, this game was excellent and was
>>> well worth the time.  The graphics are truly amazing and the sound
>>> track was pretty pleasant also. The  preacher was in  fact a
>>> thief.</str>
>>> 
>>>          <long name="_version_">1430480998833848320</long>
>>> 
>>>        </doc>
>>> 
>>>        <doc>
>>> 
>>>          <long name="DocId">2</long>
>>> 
>>>          <str name="DocTitle">Doc 2</str>
>>> 
>>>          <str name="Content">stunning. The  preacher was in  fact an
>>> excellent thief who  had stolen the original manuscript of Hamlet
>>> from an exhibit on the  Riviera, where  he also  acquired his
>>> remarkable and tan.</str>
>>> 
>>>          <long name="_version_">1430480998841188352</long>
>>> 
>>>        </doc>
>>> 
>>>      </result>
>>> 
>>>      <lst name="highlighting">
>>> 
>>>        <lst name="1"/>
>>> 
>>>        <lst name="2"/>
>>> 
>>>      </lst>
>>> 
>>>    </response>
>>> 
>>> 
>>> 
>>> Relevant section of Solr schema:
>>> 
>>> 
>>> 
>>>    <field name="DocId" type="long" indexed="true" stored="true"
>>> required="true"/>
>>> 
>>>    <field name="DocTitle" type="string" indexed="false" stored="true"
>>> required="true"/>
>>> 
>>>    <field name="Content" type="text_general" indexed="false"
>> stored="true"
>>> required="true"/>
>>> 
>>> 
>>> 
>>>    <field name="ContentSearch" type="text_general" indexed="true"
>>> stored="false" multiValued="true"/>
>>> 
>>>    <field name="ContentSearchStemming" type="text_stem" indexed="true"
>>> stored="false" multiValued="true"/>
>>> 
>>>    <field name="ContentSearchPhonetic" type="text_phonetic"
>> indexed="true"
>>> stored="false" multiValued="true"/>
>>> 
>>>    <field name="ContentSearchSynonym" type="text_synonym" indexed="true"
>>> stored="false" multiValued="true"/>
>>> 
>>> 
>>> 
>>>    <uniqueKey>DocId</uniqueKey>
>>> 
>>>    <copyField source="Content" dest="ContentSearch"/>
>>> 
>>>    <copyField source="Content" dest="ContentSearchStemming"/>
>>> 
>>>    <copyField source="Content" dest="ContentSearchPhonetic"/>
>>> 
>>>    <copyField source="Content" dest="ContentSearchSynonym"/>
>>> 
>>> 
>>> 
>>>    <fieldType name="text_stem" class="solr.TextField" >
>>> 
>>>      <analyzer>
>>> 
>>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>> 
>>>         <filter class="solr.SnowballPorterFilterFactory"/>
>>> 
>>>      </analyzer>
>>> 
>>>    </fieldType>
>>> 
>>> 
>>> 
>>>    <fieldType name="text_phonetic" class="solr.TextField" >
>>> 
>>>      <analyzer>
>>> 
>>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>> 
>>>         <filter class="solr.PhoneticFilterFactory"
>>> encoder="DoubleMetaphone" inject="false"/>
>>> 
>>>      </analyzer>
>>> 
>>>    </fieldType>
>>> 
>>> 
>>> 
>>>    <fieldType name="text_synonym" class="solr.TextField" >
>>> 
>>>    <analyzer>
>>> 
>>>      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>> 
>>>      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>> ignoreCase="true" expand="true"/>
>>> 
>>>    </analyzer>
>>> 
>>>    </fieldType>
>>> 
>>> 
>>> 
>>> Relevant section of Solr config:
>>> 
>>> 
>>> 
>>>    <requestHandler name="/select" class="solr.SearchHandler">
>>> 
>>>    <!-- default values for query parameters can be specified, these
>>> 
>>>         will be overridden by parameters in the request
>>> 
>>>      -->
>>> 
>>>     <lst name="defaults">
>>> 
>>>       <str name="echoParams">explicit</str>
>>> 
>>>       <int name="rows">100</int>
>>> 
>>>       <str name="df">ContentSearch</str>
>>> 
>>>     <bool name="hl">true</bool>
>>> 
>>>    <str name="hl.fl">Content</str>
>>> 
>>>    <str name="f.Content.hl.fragsize">150</str>
>>> 
>>>      <str name="f.Content.hl.snippets">40</str>
>>> 
>>>     </lst>
>>> 
>>>    </requestHandler>
>>> 
>>>    <searchComponent class="solr.HighlightComponent" name="highlight">
>>> 
>>>    <highlighting>
>>> 
>>>    <!-- Configure the standard fragmenter -->
>>> 
>>>    <!-- This could most likely be commented out in the "default" case
>>> -->
>>> 
>>>    <fragmenter name="gap"
>>> 
>>>                default="true"
>>> 
>>>                class="solr.highlight.GapFragmenter">
>>> 
>>>      <lst name="defaults">
>>> 
>>>        <int name="hl.fragsize">100</int>
>>> 
>>>      </lst>
>>> 
>>>    </fragmenter>
>>> 
>>> 
>>> 
>>>    <!-- A regular-expression-based fragmenter
>>> 
>>>         (for sentence extraction)
>>> 
>>>      -->
>>> 
>>>    <fragmenter name="regex"
>>> 
>>>                class="solr.highlight.RegexFragmenter">
>>> 
>>>      <lst name="defaults">
>>> 
>>>        <!-- slightly smaller fragsizes work better because of slop
>>> -->
>>> 
>>>        <int name="hl.fragsize">70</int>
>>> 
>>>        <!-- allow 50% slop on fragment sizes -->
>>> 
>>>        <float name="hl.regex.slop">0.5</float>
>>> 
>>>        <!-- a basic sentence pattern -->
>>> 
>>>        <str name="hl.regex.pattern">[-\w
>>> ,/\n\&quot;&apos;]{20,200}</str>
>>> 
>>>      </lst>
>>> 
>>>    </fragmenter>
>>> 
>>> 
>>> 
>>> Has anyone experienced this kind of behaviour before? Need some
>>> direction for troubleshooting.
>>> 
>>> 
>>> 
>>> Soumya.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> The information contained in this electronic message and any attachments to 
>> this message are intended for the exclusive use of the addressee(s) and may 
>> contain proprietary, confidential or privileged information. If you are not 
>> the intended recipient, you should not disseminate, distribute or copy this 
>> e-mail. Please notify the sender immediately and destroy all copies of this 
>> message and any attachments.
>> 
>> WARNING: Computer viruses can be transmitted via email. The recipient should 
>> check this email and any attachments for the presence of viruses. The 
>> company accepts no liability for any damage caused by any virus transmitted 
>> by this email.

Re: Solr Phonetic Search Highlight issue in search results

Reply via email to