Re: Payloads for multiValued fields?

Mike Klaas Thu, 25 Oct 2007 15:41:54 -0700

On 24-Oct-07, at 12:39 PM, Alf Eaton wrote:

Mike Klaas wrote:
On 24-Oct-07, at 7:10 AM, Alf Eaton wrote:
Yes, I was just trying that this morning and it's an improvement, though
not ideal if the field contains a lot of text (in other words it's still
a suboptimal workaround).
I do think it might be useful for the response to contain an element
saying which fields were matched by the query, including which
sub-sections of a multi-valued field were matched.
This isn't readily-accessible information. Text search engines work by
storing a list of documents and occurrence frequency for each document
_per term_. At that point, the information about the structure of the
document is not available.
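
To make the point above concrete, here is a toy inverted index in Python. It is only a sketch: the whitespace tokenizer and the `build_postings` name are assumptions for illustration, not how Lucene stores its postings. What it shows is that once terms are mapped to documents and frequencies, nothing records which value of a multi-valued field a term came from.

```python
from collections import defaultdict

def build_postings(docs):
    """Map each term to {doc_id: frequency}.

    `docs` maps a document id to the list of values of one multi-valued
    field. The values are flattened while indexing, so the postings do
    not remember which value a term occurred in.
    """
    postings = defaultdict(dict)
    for doc_id, values in docs.items():
        for value in values:
            # Naive analysis (assumption): lowercase and split on whitespace.
            for term in value.lower().split():
                postings[term][doc_id] = postings[term].get(doc_id, 0) + 1
    return dict(postings)

docs = {
    1: ["solr payloads", "lucene highlighter"],
    2: ["lucene payloads payloads"],
}
postings = build_postings(docs)
print(postings["payloads"])
```

The entry for "payloads" tells you that documents 1 and 2 match and how often, but not that the hit in document 1 was in its first value.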
The highlighting engine seems to know which fields were matched by the
query though - enough to be able to use hl.requireFieldMatch to only
return snippets from matched fields. The highlighter seems to have a
small problem with snippets reaching across multivalued fields, but if
that was sorted out then in theory the highlighter should be able to
tell you which field, and which of the multiple values, was matched, no?

In theory, sure. The contrib Highlighter (that Solr uses) doesn't work
based on a Lucene stored field; it is instead fed a single String. This
means that Solr has to piece together all the values in the field to do
highlighting, and in the process, the distinction among them is lost (or
at least muted---some effort is made to keep a position increment gap
between them). So, it isn't trivial to return this data.

Have you considered storing each section as a separate Solr Document?
I have considered this - in theory it would be easy enough to create a
separate index just for these items, but it adds an extra lump of
complexity to the search engine that I'd rather avoid. The workaround of
adding a marked-up value to the indexed field, setting hl.fragsize to 0
and parsing out the marked-up value from the highlighted fragment should
be good enough for now.
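
A rough Python sketch of the parsing half of that workaround follows. The `[[sec:N]]` marker format and the function names are hypothetical, and the sketch assumes the highlighter wraps matched terms in `<em>` tags and that the marker comes back intact in the fragment, which depends on how the field is analyzed.

```python
import re

# Hypothetical marker format: each value is prefixed with [[sec:N]] before indexing.
TOKEN = re.compile(r"\[\[sec:(\d+)\]\]|<em>")

def mark_values(values):
    """Prefix each value of a multi-valued field with its section marker."""
    return [f"[[sec:{i}]] {value}" for i, value in enumerate(values)]

def matched_sections(fragment):
    """Return the section ids whose text contains a highlighted term.

    Walks the fragment left to right, tracking the most recent marker,
    and records that marker's id whenever an <em> tag is seen.
    """
    ids = []
    current = None
    for match in TOKEN.finditer(fragment):
        if match.group(1) is not None:
            current = int(match.group(1))   # entered a new section
        elif current is not None and current not in ids:
            ids.append(current)             # highlight inside this section
    return ids

fragment = (
    "[[sec:0]] intro text "
    "[[sec:1]] about <em>payloads</em> here "
    "[[sec:2]] more <em>payloads</em>"
)
print(matched_sections(fragment))
```

Tracking the nearest preceding marker, rather than assuming one value per fragment, keeps the result usable even when a fragment reaches across several values.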

It is also important to note that the highlighter _reanalyzes_ the
document to find the matches. So, there is nothing stopping you from
writing a bit of code that accomplishes exactly the same thing, and
returns the data in a custom way. TermVector could be used to speed
this up further.
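
As an illustration of that idea, the sketch below re-analyzes each stored value separately and reports which values contain a query term. It is written in Python for brevity, and the regex-based `analyze` function is only a stand-in (an assumption) for the field's real analyzer. An actual implementation would live inside Solr and use the analyzer configured for the field; the term-vector speedup is not shown.

```python
import re

def analyze(text):
    """Stand-in analyzer (assumption): lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())

def matching_values(values, query):
    """Return (index, matched_terms) for every value containing a query term.

    Each value is analyzed on its own, so the index of the matching
    value within the multi-valued field is preserved.
    """
    query_terms = set(analyze(query))
    hits = []
    for index, value in enumerate(values):
        found = sorted(query_terms.intersection(analyze(value)))
        if found:
            hits.append((index, found))
    return hits

values = ["Intro to Solr", "Payloads in Lucene", "Solr payloads"]
print(matching_values(values, "payloads"))
```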


-Mike
