Hi Everyone,

I'm running into a problem where my Solr query returns highlighted snippets 
for some results, but not all of them.  For reference, I'm searching an index 
that contains web crawls of human-rights-related websites.  I'm running Solr as 
a webapp under Tomcat, and I've included the query's Solr params from the Tomcat 
log:

...
webapp=/solr-4.2
path=/select
params={facet=true&sort=score+desc&group.limit=10&spellcheck.q=Unangan&f.mimetype_code.facet.limit=7&hl.simple.pre=<code>&q.alt=*:*&f.organization_type__facet.facet.limit=6&f.language__facet.facet.limit=6&hl=true&f.date_of_capture_yyyy.facet.limit=6&group.field=original_url&hl.simple.post=</code>&facet.field=domain&facet.field=date_of_capture_yyyy&facet.field=mimetype_code&facet.field=geographic_focus__facet&facet.field=organization_based_in__facet&facet.field=organization_type__facet&facet.field=language__facet&facet.field=creator_name__facet&hl.fragsize=600&f.creator_name__facet.facet.limit=6&facet.mincount=1&qf=text^1&hl.fl=contents&hl.fl=title&hl.fl=original_url&wt=ruby&f.geographic_focus__facet.facet.limit=6&defType=edismax&rows=10&f.domain.facet.limit=6&q=Unangan&f.organization_based_in__facet.facet.limit=6&q.op=AND&group=true&hl.usePhraseHighlighter=true}
 hits=8 status=0 QTime=108
...

For the query above (which boils down to: find all documents that contain the 
word "unangan" and return facets, highlights, etc.), I get five grouped search 
results.  Only three of these come back with highlighted snippets.  Here's the 
"highlighting" portion of the Solr response (note: printed in Ruby notation 
because I'm receiving this response in a Rails app):

--------
"highlighting"=>
  {"20100602195444/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
    {},
   "20100902203939/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
    {},
   "20111202233029/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf"=>
    {},
   "20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
    {"contents"=>
      ["...actual snippet is returned here..."]},
   "20100902235358/http://www.komnasham.go.id/portal/files/39-99.pdf"=>
    {"contents"=>
      ["...actual snippet is returned here..."]},
   "20110302213056/http://www.komnasham.go.id/publikasi/doc_download/2-uu-no-39-tahun-1999"=>
    {"contents"=>
      ["...actual snippet is returned here..."]},
   "20110302213102/http://www.komnasham.go.id/publikasi/doc_view/2-uu-no-39-tahun-1999?tmpl=component&format=raw"=>
    {"contents"=>
      ["...actual snippet is returned here..."]},
   "20120303113654/http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf"=>
    {}}
--------

The response above has eight entries (as opposed to five) because I'm also 
doing a grouped query, grouping by a field called "original_url": the eight 
matching documents (hits=8 in the log) collapse into five grouped results.
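
To make the eight-versus-five count easier to see, here is a rough Ruby sketch that splits the "highlighting" hash into entries with and without snippets and then collapses the empty ones by URL.  The helper names are hypothetical, and it assumes each document id has the form "<capture timestamp>/<original_url>", which is only inferred from the ids in the response above.

```ruby
# Hypothetical helper: split the "highlighting" hash into ids that have at
# least one non-empty snippet list and ids that have none.
def partition_highlights(highlighting)
  with, without = highlighting.partition do |_id, fields|
    fields.any? { |_field, snippets| !snippets.empty? }
  end
  [with.map(&:first), without.map(&:first)]
end

# Assumption: a document id looks like "<capture timestamp>/<original_url>".
def original_url(doc_id)
  doc_id.split("/", 2).last
end

# Original URLs (i.e. groups) whose documents came back without any snippet.
def groups_without_highlights(highlighting)
  _with, without = partition_highlights(highlighting)
  without.map { |id| original_url(id) }.uniq
end

# Three entries taken from the response above, for illustration only.
sample = {
  "20100602195444/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf" => {},
  "20100902203939/http://www.kontras.org/uu_ri_ham/UU%20Nomor%2023%20Tahun%202002%20tentang%20Perlindungan%20Anak.pdf" => {},
  "20100618201646/http://www.komnasham.go.id/portal/files/39-99.pdf" =>
    { "contents" => ["...actual snippet is returned here..."] }
}

puts groups_without_highlights(sample)
```

Run against the full response, this kind of check should list the two groups that have no snippets.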

I've confirmed that the results lacking highlights DO contain the word 
"unangan", as expected.  The term appears in a text field that is indexed, 
stored, and searched for all text queries.  For example, one of the search 
results is for a crawl of this document: 
http://www.iwgia.org/iwgia_files_publications_files/0028_Utimut_heritage.pdf
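
For anyone who wants to repeat that kind of check, here is a minimal Ruby sketch that reports, for each field I pass in hl.fl (contents, title, original_url), the character offset of the first case-insensitive match of the term, or nil if the field doesn't contain it.  It assumes the stored document is available as a plain hash of field name to string (or array of strings); the helper name and the sample document are made up for illustration and are not real data from my index.

```ruby
# The fields requested for highlighting via hl.fl in the query above.
HL_FIELDS = %w[contents title original_url].freeze

# Hypothetical helper: for each field, return the offset of the first
# case-insensitive occurrence of the term, or nil when it is absent.
def term_offsets(doc, term, fields = HL_FIELDS)
  fields.each_with_object({}) do |field, result|
    # Array() copes with single-valued, multi-valued, and missing fields.
    value = Array(doc[field]).join(" ")
    result[field] = value.downcase.index(term.downcase)
  end
end

# Made-up stored document, for illustration only.
doc = {
  "title"        => "Heritage report",
  "contents"     => "An account of Unangan heritage.",
  "original_url" => "http://example.org/report.pdf"
}

offsets = term_offsets(doc, "unangan")
offsets.each { |field, offset| puts "#{field}: #{offset.inspect}" }
```

A field that reports nil here cannot produce a snippet for that term, so this narrows down which of the highlighted fields are worth looking at.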

And if you view that document on the web, you'll see that it does contain 
"unangan".

Has anyone seen this before?  And does anyone have any good suggestions for 
troubleshooting/fixing the problem?

Thanks!

- Eric
