[ http://issues.apache.org/jira/browse/SOLR-37?page=all ]

Andrew May updated SOLR-37:
---------------------------

    Attachment: patch.diff

New patch incorporating feedback (hopefully the diff is more usable this time).

* default fragsize now 100
* removed redundant defaults when getting fragsize and snippets
* fixed tests and added new tests
* added "Enable Highlighting" and "Fields to Highlight" to the advanced form in 
the admin pages

The other change, which is more complex, is a new "hl.exact" parameter 
(defaulting to false) that affects how the QueryScorer is created. The 
logic is now:

if using gradient formatter
    new QueryScorer(query, indexReader, fieldName)
else if hl.exact=true
    new QueryScorer(query, fieldName)
else
    new QueryScorer(query)

My understanding is that the GradientFormatter requires the scorer to be 
created with IndexReader and field name to work properly, so using a gradient 
formatter for any field overrides the hl.exact flag.
I've assumed that, when hl.exact=true, it's more efficient to create a 
QueryScorer that doesn't use an IndexReader. If not, then that case could be 
rolled in with the gradient formatter case.
Then the default behaviour is to create a QueryScorer without the field name 
and have less exact highlighting.
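
For illustration, the selection logic above could be sketched in Java roughly 
as follows. ScorerChoice, Kind and choose are hypothetical names, not part of 
the patch; the sketch only reports which QueryScorer constructor would be 
picked, so it runs without Lucene on the classpath.

```java
// Hypothetical stand-in for the scorer selection described above.
// It does not call Lucene; it only names the constructor that would be used.
final class ScorerChoice {
    enum Kind {
        QUERY_READER_FIELD, // new QueryScorer(query, indexReader, fieldName)
        QUERY_FIELD,        // new QueryScorer(query, fieldName)
        QUERY_ONLY          // new QueryScorer(query)
    }

    static Kind choose(boolean gradientFormatter, boolean hlExact) {
        if (gradientFormatter) {
            // The gradient formatter takes priority over hl.exact.
            return Kind.QUERY_READER_FIELD;
        } else if (hlExact) {
            return Kind.QUERY_FIELD;
        }
        // Default: no field name, so less exact highlighting.
        return Kind.QUERY_ONLY;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, true));
        System.out.println(choose(false, true));
        System.out.println(choose(false, false));
    }
}
```

Note that hl.exact is ignored whenever the gradient formatter is in use, which 
is the override described above.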

Does that sound like reasonable behaviour?

Ah - looks like I'm overlapping with a comment from Mike. I'm suggesting 
'exact' because of how adding the field name to the QueryScorer affects 
searches across multiple fields. For my use case, I don't want a value I 
searched for in the journal field appearing in the highlight for the title 
field (which was searched with something different), so I would want 
hl.exact=true. But you're right - this is probably an overly broad term, and 
it is all about the scorer.

As for removing the gradient highlighter - I still don't really know how it's 
supposed to work, and I can't get it to do anything useful when searching my 
data, but perhaps that's my configuration error (rather than a coding error in 
this patch). I'll probably end up using the simple formatter with custom pre 
and post values.

> Add additional configuration options for Highlighting
> -----------------------------------------------------
>
>                 Key: SOLR-37
>                 URL: http://issues.apache.org/jira/browse/SOLR-37
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Andrew May
>         Attachments: patch, patch, patch.diff
>
>
> As discussed in the mailing list, I've been looking at adding additional 
> configuration options for highlighting. 
> I've made quite a few changes to the properties for highlighting:
> Properties that can be set on request, or in solrconfig.xml at the top level:
>   highlight (true/false)
>   highlightFields
> Properties that can be set in solrconfig.xml at the top level or per-field
>   formatter (simple/gradient)
>   formatterPre (preTag for simple formatter)
>   formatterPost (postTag for simple formatter)
>   formatterMinFgCl (min foreground colour for gradient formatter)
>   formatterMaxFgCl (max foreground colour for gradient formatter)
>   formatterMinBgCl (min background colour for gradient formatter)
>   formatterMaxBgCl (max background colour for gradient formatter)
>   fragsize (if <=0 use NullFragmenter, otherwise use GapFragmenter with this 
> value)
> I've added variables for these values to CommonParams, plus there's a fields 
> Map<String,CommonParams> that is parsed from nested NamedLists (i.e. a lst 
> named "fields", with a nested lst for each field).
> Here's a sample of how you can mix and match properties in solrconfig.xml:
>   <requestHandler name="hl" class="solr.StandardRequestHandler" >
>     <str name="formatter">simple</str>
>     <str name="formatterPre">&lt;i></str>
>     <str name="formatterPost">&lt;/i></str>
>     <str name="highlightFields">title,authors,journal</str>
>     <int name="fragsize">0</int>
>     <lst name="fields">
>       <lst name="abstract">
>         <str name="formatter">gradient</str>
>         <str name="formatterMinBgCl">#FFFF99</str>
>         <str name="formatterMaxBgCl">#FF9900</str>
>         <int name="fragsize">30</int>
>         <int name="maxSnippets">2</int>
>       </lst>
>       <lst name="authors">
>         <str name="formatterPre">&lt;strong></str>
>         <str name="formatterPost">&lt;/strong></str>
>       </lst>
>     </lst>
>   </requestHandler>
> I've created HighlightingUtils to handle most of the parameter parsing, but 
> the highlighting is still done in SolrPluginUtils and the 
> doStandardHighlighting() method still has the same signature, but the other 
> highlighting methods have had to be changed (because highlighters are now 
> created per highlighted field).
> I'm not particularly happy with the code to pull parameters from 
> CommonParams, first checking the field then falling back, e.g.:
>          String pre = (params.fields.containsKey(fieldName) && 
> params.fields.get(fieldName).formatterPre != null) ?
>                params.fields.get(fieldName).formatterPre : 
>                   params.formatterPre != null ? params.formatterPre : "<em>";
> I've removed support for a custom formatter - just choosing between 
> simple/gradient. Probably that's a bad decision, but I wanted an easy way to 
> choose between the standard formatters without having to invent a generic way 
> of supplying arguments for the constructor. Perhaps there should be 
> formatterType=simple/gradient and formatterClass=... which overrides 
> formatterType if set at a lower level - with the formatterClass having to 
> have a zero-args constructor? Note: gradient is actually 
> SpanGradientFormatter.
> I'm not sure I properly understand how Fragmenters work, so supplying 
> fragsize to GapFragmenter where >0 (instead of what was a default of 50) may 
> not make sense.
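
One way the field-then-top-level lookup quoted in the description might be 
factored out is a small generic helper. This is only a sketch: HlParams is a 
hypothetical stand-in for the patch's CommonParams (reduced to two properties), 
resolve is an invented name, and it relies on java.util.function, which is 
newer than the Java version this patch was written against.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, cut-down stand-in for CommonParams with per-field overrides.
final class HlParams {
    String formatterPre;   // null means "not set at this level"
    Integer fragsize;      // null means "not set at this level"
    final Map<String, HlParams> fields = new HashMap<>();

    // Check the per-field value first, then the top level, then the fallback.
    static <T> T resolve(HlParams params, String fieldName,
                         Function<HlParams, T> getter, T fallback) {
        HlParams perField = params.fields.get(fieldName);
        if (perField != null) {
            T value = getter.apply(perField);
            if (value != null) {
                return value;
            }
        }
        T topLevel = getter.apply(params);
        return topLevel != null ? topLevel : fallback;
    }
}
```

With that in place, the quoted expression would shrink to something like 
HlParams.resolve(params, fieldName, p -> p.formatterPre, "<em>"), and the same 
call shape would cover fragsize and the other per-field properties.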

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
