[ 
http://issues.apache.org/jira/browse/SOLR-37?page=comments#action_12431590 ] 
            
Andrew May commented on SOLR-37:
--------------------------------

I've spent a bit of time trying to understand Gradient formatting and how 
QueryScorer is used. I didn't find any good documentation for this (I may 
have missed it), so I thought I'd share what I found.

It appears that GradientFormatter colours each term according to its weight 
within the index, so terms that appear less frequently in the index are 
coloured closer to the max foreground/background colour. The colour is 
therefore not related to the specific document or fragment being evaluated, 
and a given term is highlighted the same way across the entire result set. If 
two terms appear with similar frequency in the index they get similar colours, 
and this seems to happen a lot (perhaps because scaling is done between 0 and 
maxWeight rather than between minWeight and maxWeight).
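
To illustrate the guess about scaling, here is a minimal sketch. It is not 
the Lucene code: WeightScaling, fromZero and fromMin are hypothetical names 
made up for this example. It only shows why two terms with similar weights end 
up close together when the range starts at 0, and spread apart when it starts 
at minWeight.

```java
/** Illustration only: hypothetical helper, not part of Lucene or Solr. */
final class WeightScaling {
    private WeightScaling() {}

    /** Maps a term weight onto [0, 1] using the range 0..maxWeight. */
    static float fromZero(float weight, float maxWeight) {
        if (maxWeight <= 0f) {
            return 0f; // no usable range
        }
        return clamp(weight / maxWeight);
    }

    /** Maps a term weight onto [0, 1] using the range minWeight..maxWeight. */
    static float fromMin(float weight, float minWeight, float maxWeight) {
        if (maxWeight <= minWeight) {
            return 0f; // degenerate range: all terms share one weight
        }
        return clamp((weight - minWeight) / (maxWeight - minWeight));
    }

    private static float clamp(float v) {
        return Math.max(0f, Math.min(1f, v));
    }
}
```

With weights 6 and 8 (and maxWeight 8), fromZero gives 0.75 and 1.0, so both 
terms land in the top quarter of the gradient. fromMin with minWeight 6 gives 
0.0 and 1.0, which uses the whole gradient.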

There's also a fairly serious bug in the colouring that makes a lot of 
combinations give meaningless results (e.g. minBg=#FF0000, maxBg=#00FF00 will 
give results coloured #FFFF00) - see GradientFormatter.getColorVal().
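
For comparison, this is roughly what I would expect the colouring to do: 
interpolate each RGB channel from its min value towards its max value, 
whichever direction that is. This is a sketch under that assumption, not the 
Lucene implementation; GradientSketch, channel and blend are hypothetical 
names.

```java
/** Illustration only: hypothetical helper, not part of Lucene or Solr. */
final class GradientSketch {
    private GradientSketch() {}

    /**
     * Interpolates one 0-255 colour channel from min towards max.
     * Works whether the channel rises (min < max) or falls (min > max).
     */
    static int channel(int min, int max, float fraction) {
        float f = Math.max(0f, Math.min(1f, fraction));
        return min + Math.round((max - min) * f);
    }

    /** Blends two "#RRGGBB" colours; fraction 0 gives minHex, 1 gives maxHex. */
    static String blend(String minHex, String maxHex, float fraction) {
        int lo = Integer.parseInt(minHex.substring(1), 16);
        int hi = Integer.parseInt(maxHex.substring(1), 16);
        int r = channel((lo >> 16) & 0xFF, (hi >> 16) & 0xFF, fraction);
        int g = channel((lo >> 8) & 0xFF, (hi >> 8) & 0xFF, fraction);
        int b = channel(lo & 0xFF, hi & 0xFF, fraction);
        return String.format("#%02X%02X%02X", r, g, b);
    }
}
```

With minBg=#FF0000 and maxBg=#00FF00, blend(..., 1f) returns #00FF00 and 
blend(..., 0f) returns #FF0000, rather than #FFFF00.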

In other words, I now agree with Mike that we should not support Gradient 
formatting. Perhaps we still want to retain the hl.formatter= parameter in case 
we support values other than "simple" in the future, and keep hl.simple.pre 
and hl.simple.post as they are.

As for the QueryScorer, I think it makes sense to support all three ways it can 
be constructed:
1) hl.scoring=simple (the default) - constructed with Query only. May have some 
matches from other terms, but allows you to highlight fields other than the 
ones searched.
2) hl.scoring=field - constructed with Query and fieldName. Only highlights 
terms matched in this field by the query.
3) hl.scoring=fieldidx - constructed with Query, fieldName and IndexReader. I 
think the selection of the best fragment(s) will be improved because the terms 
will be weighted according to their frequency in the index, but this must be 
more costly as it calls IndexReader.docFreq for each term.
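
A rough sketch of how the proposed hl.scoring value could be mapped to the 
inputs each option needs. This is only an illustration: ScoringMode and its 
methods are hypothetical names, and the actual QueryScorer construction is left 
out so the example does not depend on Lucene.

```java
import java.util.Locale;

/** Illustration only: hypothetical mapping for the proposed hl.scoring values. */
enum ScoringMode {
    SIMPLE,    // scorer built from the Query only
    FIELD,     // scorer built from the Query and a field name
    FIELDIDX;  // scorer built from the Query, a field name and an IndexReader

    /** Parses the parameter value; a missing or blank value means SIMPLE. */
    static ScoringMode parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return SIMPLE;
        }
        // valueOf throws IllegalArgumentException for an unknown value
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** True when the scorer is restricted to a single field. */
    boolean needsFieldName() {
        return this != SIMPLE;
    }

    /** True when term weights come from index statistics (the costly option). */
    boolean needsIndexReader() {
        return this == FIELDIDX;
    }
}
```

The caller would check needsFieldName() and needsIndexReader() to decide which 
arguments to pass when constructing the scorer for each highlighted field.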

Does that sound reasonable?

> Add additional configuration options for Highlighting
> -----------------------------------------------------
>
>                 Key: SOLR-37
>                 URL: http://issues.apache.org/jira/browse/SOLR-37
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Andrew May
>         Attachments: patch, patch, patch.diff
>
>
> As discussed in the mailing list, I've been looking at adding additional 
> configuration options for highlighting. 
> I've made quite a few changes to the properties for highlighting:
> Properties that can be set on request, or in solrconfig.xml at the top level:
>   highlight (true/false)
>   highlightFields
> Properties that can be set in solrconfig.xml at the top level or per-field
>   formatter (simple/gradient)
>   formatterPre (preTag for simple formatter)
>   formatterPost (postTag for simple formatter)
>   formatterMinFgCl (min foreground colour for gradient formatter)
>   formatterMaxFgCl (max foreground colour for gradient formatter)
>   formatterMinBgCl (min background colour for gradient formatter)
>   formatterMaxBgCl (max background colour for gradient formatter)
>   fragsize (if <=0 use NullFragmenter, otherwise use GapFragmenter with this 
> value)
> I've added variables for these values to CommonParams, plus there's a fields 
> Map<String,CommonParams> that is parsed from nested NamedLists (i.e. a lst 
> named "fields", with a nested lst for each field).
> Here's a sample of how you can mix and match properties in solrconfig.xml:
>   <requestHandler name="hl" class="solr.StandardRequestHandler" >
>     <str name="formatter">simple</str>
>     <str name="formatterPre">&lt;i></str>
>     <str name="formatterPost">&lt;/i></str>
>     <str name="highlightFields">title,authors,journal</str>
>     <int name="fragsize">0</int>
>     <lst name="fields">
>       <lst name="abstract">
>         <str name="formatter">gradient</str>
>         <str name="formatterMinBgCl">#FFFF99</str>
>         <str name="formatterMaxBgCl">#FF9900</str>
>         <int name="fragsize">30</int>
>         <int name="maxSnippets">2</int>
>       </lst>
>       <lst name="authors">
>         <str name="formatterPre">&lt;strong></str>
>         <str name="formatterPost">&lt;/strong></str>
>       </lst>
>     </lst>
>   </requestHandler>
> I've created HighlightingUtils to handle most of the parameter parsing, but 
> the highlighting is still done in SolrPluginUtils. The 
> doStandardHighlighting() method still has the same signature, but the other 
> highlighting methods have had to be changed (because highlighters are now 
> created per highlighted field).
> I'm not particularly happy with the code to pull parameters from 
> CommonParams, first checking the field then falling back, e.g.:
>   String pre = (params.fields.containsKey(fieldName)
>                 && params.fields.get(fieldName).formatterPre != null)
>       ? params.fields.get(fieldName).formatterPre
>       : (params.formatterPre != null ? params.formatterPre : "<em>");
> I've removed support for a custom formatter - just choosing between 
> simple/gradient. Probably that's a bad decision, but I wanted an easy way to 
> choose between the standard formatters without having to invent a generic way 
> of supplying arguments for the constructor. Perhaps there should be 
> formatterType=simple/gradient and formatterClass=... which overrides 
> formatterType if set at a lower level - with the formatterClass having to 
> have a zero-args constructor? Note: gradient is actually 
> SpanGradientFormatter.
> I'm not sure I properly understand how Fragmenters work, so supplying 
> fragsize to GapFragmenter where >0 (instead of what was a default of 50) may 
> not make sense.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
