[ http://issues.apache.org/jira/browse/SOLR-37?page=all ]
Andrew May updated SOLR-37:
---------------------------
Attachment: patch.diff
New patch incorporating feedback (hopefully the diff is more usable this time).
* default fragsize now 100
* removed redundant defaults when getting fragsize and snippets
* fixed tests and added new tests
* added "Enable Highlighting" and "Fields to Highlight" to the advanced form in
the admin pages
The other change, which is more complex, is to add a new "hl.exact" parameter
(which defaults to false) which affects how the QueryScorer is created. The
logic is now this:
  if using gradient formatter
      new QueryScorer(query, indexReader, fieldName)
  else if hl.exact=true
      new QueryScorer(query, fieldName)
  else
      new QueryScorer(query)
My understanding is that the GradientFormatter requires the scorer to be
created with IndexReader and field name to work properly, so using a gradient
formatter for any field overrides the hl.exact flag.
I've assumed that it's more efficient to create a QueryScorer that doesn't use
an IndexReader in the case of hl.exact=true. If not then that could be rolled
in with the gradient formatter case.
Then the default behaviour is to create a QueryScorer without the field name
and have less exact highlighting.
Does that sound like reasonable behaviour?
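For reference, the selection logic above can be sketched in plain Java roughly
as follows. This is only an illustration: ScorerChoice, Kind and choose() are
hypothetical names, not code from the patch, and the enum merely stands in for
the three QueryScorer constructors so that the snippet runs without Lucene on
the classpath.

```java
class ScorerChoice {
    /** Hypothetical stand-ins for the three QueryScorer constructors listed above. */
    enum Kind {
        QUERY_READER_FIELD, // new QueryScorer(query, indexReader, fieldName)
        QUERY_FIELD,        // new QueryScorer(query, fieldName)
        QUERY_ONLY          // new QueryScorer(query)
    }

    /**
     * Picks a scorer variant. A gradient formatter wins over hl.exact,
     * because it is assumed to need the IndexReader and the field name.
     */
    static Kind choose(boolean gradientFormatter, boolean hlExact) {
        if (gradientFormatter) {
            return Kind.QUERY_READER_FIELD;
        }
        if (hlExact) {
            return Kind.QUERY_FIELD;
        }
        return Kind.QUERY_ONLY;
    }

    public static void main(String[] args) {
        // Walk through all four flag combinations.
        boolean[] flags = {true, false};
        for (boolean gradient : flags) {
            for (boolean exact : flags) {
                System.out.println("gradient=" + gradient + ", hl.exact=" + exact
                        + " -> " + choose(gradient, exact));
            }
        }
    }
}
```

If it turns out that the IndexReader variant is no more expensive, the second
branch could simply be folded into the first.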
Ah - looks like I'm overlapping with a comment from Mike. I'm suggesting
'exact' because of how adding the fieldname to the QueryScorer affects searches
across multiple fields - basically for what I'm using it for I don't want a
value I searched for in the journal field appearing in the highlight for the
title field (which was searched with something different) - so I would want
hl.exact=true. But you're right - this is probably an overly broad term, and it
is all about the scorer.
As for removing the gradient highlighter - I still don't really know how it's
supposed to work, and I can't get it to do anything useful when searching my
data, but perhaps that's my configuration error (rather than a coding error in
this patch). I'll probably end up using the simple formatter with custom pre
and post values.
> Add additional configuration options for Highlighting
> -----------------------------------------------------
>
> Key: SOLR-37
> URL: http://issues.apache.org/jira/browse/SOLR-37
> Project: Solr
> Issue Type: Improvement
> Components: search
> Reporter: Andrew May
> Attachments: patch, patch, patch.diff
>
>
> As discussed in the mailing list, I've been looking at adding additional
> configuration options for highlighting.
> I've made quite a few changes to the properties for highlighting:
> Properties that can be set on request, or in solrconfig.xml at the top level:
>   highlight (true/false)
>   highlightFields
> Properties that can be set in solrconfig.xml at the top level or per-field:
>   formatter (simple/gradient)
>   formatterPre (preTag for simple formatter)
>   formatterPost (postTag for simple formatter)
>   formatterMinFgCl (min foreground colour for gradient formatter)
>   formatterMaxFgCl (max foreground colour for gradient formatter)
>   formatterMinBgCl (min background colour for gradient formatter)
>   formatterMaxBgCl (max background colour for gradient formatter)
>   fragsize (if <=0 use NullFragmenter, otherwise use GapFragmenter with this
>     value)
> I've added variables for these values to CommonParams, plus there's a fields
> Map<String,CommonParams> that is parsed from nested NamedLists (i.e. a lst
> named "fields", with a nested lst for each field).
> Here's a sample of how you can mix and match properties in solrconfig.xml:
> <requestHandler name="hl" class="solr.StandardRequestHandler" >
>   <str name="formatter">simple</str>
>   <str name="formatterPre">&lt;i&gt;</str>
>   <str name="formatterPost">&lt;/i&gt;</str>
>   <str name="highlightFields">title,authors,journal</str>
>   <int name="fragsize">0</int>
>   <lst name="fields">
>     <lst name="abstract">
>       <str name="formatter">gradient</str>
>       <str name="formatterMinBgCl">#FFFF99</str>
>       <str name="formatterMaxBgCl">#FF9900</str>
>       <int name="fragsize">30</int>
>       <int name="maxSnippets">2</int>
>     </lst>
>     <lst name="authors">
>       <str name="formatterPre">&lt;strong&gt;</str>
>       <str name="formatterPost">&lt;/strong&gt;</str>
>     </lst>
>   </lst>
> </requestHandler>
> I've created HighlightingUtils to handle most of the parameter parsing, but
> the highlighting is still done in SolrPluginUtils. The
> doStandardHighlighting() method still has the same signature, but the other
> highlighting methods have had to be changed (because highlighters are now
> created per highlighted field).
> I'm not particularly happy with the code to pull parameters from
> CommonParams, first checking the field then falling back, e.g.:
>   String pre = (params.fields.containsKey(fieldName) &&
>                 params.fields.get(fieldName).formatterPre != null)
>       ? params.fields.get(fieldName).formatterPre
>       : params.formatterPre != null ? params.formatterPre : "<em>";
> I've removed support for a custom formatter - just choosing between
> simple/gradient. Probably that's a bad decision, but I wanted an easy way to
> choose between the standard formatters without having to invent a generic way
> of supplying arguments for the constructor. Perhaps there should be
> formatterType=simple/gradient and formatterClass=... which overrides
> formatterType if set at a lower level - with the formatterClass having to
> have a zero-args constructor? Note: gradient is actually
> SpanGradientFormatter.
> I'm not sure I properly understand how Fragmenters work, so supplying
> fragsize to GapFragmenter where >0 (instead of what was a default of 50) may
> not make sense.
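To illustrate the field-then-top-level fallback from the description quoted
above, here is one way the lookup could be factored so the nested ternary is
not repeated for every property. It is a rough sketch, not code from the
patch: HighlightParamLookup, Params, firstNonNull() and sample() are
hypothetical names, Params only loosely mirrors the highlighting fields of
CommonParams, and the fragmenter is represented by a plain string so the
example runs without Lucene. The default fragsize of 100 is taken from the
latest patch comment.

```java
import java.util.HashMap;
import java.util.Map;

class HighlightParamLookup {
    /** Hypothetical stand-in for the highlighting part of CommonParams. */
    static class Params {
        String formatterPre;   // null means "not set at this level"
        Integer fragsize;      // null means "not set at this level"
        final Map<String, Params> fields = new HashMap<>();
    }

    /** Returns the per-field value if set, else the top-level value, else the default. */
    static <T> T firstNonNull(T perField, T topLevel, T fallback) {
        if (perField != null) {
            return perField;
        }
        return topLevel != null ? topLevel : fallback;
    }

    static String formatterPre(Params params, String fieldName) {
        Params field = params.fields.get(fieldName);
        return firstNonNull(field == null ? null : field.formatterPre,
                params.formatterPre, "<em>");
    }

    static int fragsize(Params params, String fieldName) {
        Params field = params.fields.get(fieldName);
        return firstNonNull(field == null ? null : field.fragsize,
                params.fragsize, 100);
    }

    /** Names the fragmenter the description would pick for a given fragsize. */
    static String fragmenterFor(int fragsize) {
        return fragsize <= 0 ? "NullFragmenter" : "GapFragmenter(" + fragsize + ")";
    }

    /** Builds parameters matching part of the sample solrconfig.xml above. */
    static Params sample() {
        Params top = new Params();
        top.formatterPre = "<i>";
        top.fragsize = 0;

        Params abstractField = new Params();
        abstractField.fragsize = 30;
        top.fields.put("abstract", abstractField);

        Params authors = new Params();
        authors.formatterPre = "<strong>";
        top.fields.put("authors", authors);
        return top;
    }

    public static void main(String[] args) {
        Params params = sample();
        for (String field : new String[] {"title", "abstract", "authors"}) {
            System.out.println(field + ": pre=" + formatterPre(params, field)
                    + ", fragmenter=" + fragmenterFor(fragsize(params, field)));
        }
    }
}
```

With a helper like this, each property needs one line per lookup, and the
hard-coded defaults live in one place.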