On 24-Oct-07, at 12:39 PM, Alf Eaton wrote:

Mike Klaas wrote:
On 24-Oct-07, at 7:10 AM, Alf Eaton wrote:
Yes, I was just trying that this morning and it's an improvement, though not ideal if the field contains a lot of text (in other words, it's still a suboptimal workaround).

I do think it might be useful for the response to contain an element
saying which fields were matched by the query, including which
sub-sections of a multi-valued field were matched.

This information isn't readily accessible. Text search engines work by storing, _per term_, a list of the documents containing that term along with its occurrence frequency in each document. At that point, the information about the structure of the document is not available.
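As a rough illustration of the point above, here is a minimal sketch of a per-term postings structure. It is a toy model, not Lucene's actual index format; the function name build_index and the sample documents are made up for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to {doc_id: term frequency}; nothing else is kept."""
    index = defaultdict(dict)
    for doc_id, values in docs.items():
        for value in values:  # the values of one multi-valued field
            for term in value.lower().split():
                index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

# Hypothetical sample data: document 1 has two values in the field.
docs = {
    1: ["solr highlights snippets", "lucene stores postings"],
    2: ["lucene scores documents"],
}
index = build_index(docs)
postings = dict(index["lucene"])
print(postings)
```

The postings for "lucene" record that document 1 contains the term once, but not that the hit came from the second value of the field.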

The highlighting engine seems to know which fields were matched by the
query though - enough to be able to use hl.requireFieldMatch to only
return snippets from matched fields. The highlighter seems to have a
small problem with snippets reaching across multivalued fields, but if
that was sorted out then in theory the highlighter should be able to
tell you which field, and which of the multiple values, was matched, no?

In theory, sure. The contrib Highlighter (that Solr uses) doesn't operate on a Lucene stored field; it is instead fed a single String. This means that Solr has to piece together all the values in the field to do highlighting, and in the process, the distinction among them is lost (or at least muted: some effort is made to keep a position increment gap between them). So, it isn't trivial to return this data.
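To make the position increment gap concrete, here is a small sketch of how token positions might be assigned across the values of a multi-valued field. It is a simplified model with whitespace tokenization; the function name and the gap of 100 are assumptions for illustration, not Solr's code.

```python
def positions_with_gap(values, gap=100):
    """Assign token positions across values, skipping `gap` positions between values."""
    out = []
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # the gap is the only trace of the value boundary
        for tok in value.split():
            out.append((tok, pos))
            pos += 1
    return out

values = ["first section", "second section"]
joined = " ".join(values)          # the single string a highlighter would see
positions = positions_with_gap(values)
print(joined)
print(positions)
```

In the joined string the boundary between the two values is gone; only the jump in positions hints at where one value ended and the next began.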

Have you considered storing each section as a separate Solr Document?

I have considered this - in theory it would be easy enough to create a
separate index just for these items, but it adds an extra lump of
complexity to the search engine that I'd rather avoid. The workaround of adding a marked-up value to the indexed field, setting hl.fragsize to 0, and parsing out the marked-up value from the highlighted fragment should be good enough for now.
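A sketch of the parsing half of that workaround might look like the following. The "[[id:...]]" marker format, the field name section_text, and the assumption that each returned fragment corresponds to one field value are all hypothetical; the <em> tags are the highlighter's default markup.

```python
import re

# Request parameters for the workaround (field name is hypothetical).
params = {
    "hl": "true",
    "hl.fl": "section_text",
    "hl.fragsize": "0",
    "hl.requireFieldMatch": "true",
}

# Hypothetical marker prepended to each value at index time, e.g. "[[id:methods]] ...".
MARKER = re.compile(r"^\[\[id:(?P<id>[^\]]+)\]\]\s*")

def matched_sections(fragments):
    """Return (section id, highlighted text) for each fragment that contains a hit."""
    results = []
    for frag in fragments:
        m = MARKER.match(frag)
        if m and "<em>" in frag:
            results.append((m.group("id"), frag[m.end():]))
    return results

fragments = ["[[id:methods]] We used <em>Solr</em> to index the corpus."]
sections = matched_sections(fragments)
print(sections)
```

The marker has to survive analysis and highlighting unchanged for this to work, which is part of why it remains a workaround.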

It is also important to note that the highlighter _reanalyzes_ the document to find the matches. So, there is nothing stopping you from writing a bit of code that accomplishes exactly the same thing, and returns the data in a custom way. Term vectors could be used to speed this up further.
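The idea of re-analyzing each value yourself could be sketched roughly as below. The analyze function is a stand-in (lowercased word tokens); a real version would need to mirror the field's actual analysis chain, and the names here are illustrative only.

```python
import re

def analyze(text):
    """Stand-in analyzer: lowercased word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def matching_values(field_values, query):
    """Return (value index, matched terms) for each value sharing a term with the query."""
    terms = analyze(query)
    hits = []
    for i, value in enumerate(field_values):
        found = terms & analyze(value)
        if found:
            hits.append((i, sorted(found)))
    return hits

values = [
    "Introduction to search",
    "Highlighting in Solr",
    "Solr and Lucene internals",
]
hits = matching_values(values, "solr highlighting")
print(hits)
```

Because each value is analyzed separately, the index of the matching value is preserved, which is exactly the information the concatenated-string approach loses.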

-Mike
