On 24-Oct-07, at 12:39 PM, Alf Eaton wrote:
Mike Klaas wrote:
On 24-Oct-07, at 7:10 AM, Alf Eaton wrote:
Yes, I was just trying that this morning and it's an improvement, though
not ideal if the field contains a lot of text (in other words it's still
a suboptimal workaround).
I do think it might be useful for the response to contain an element
saying which fields were matched by the query, including which
sub-sections of a multi-valued field were matched.
This isn't readily accessible information. Text search engines work by
storing a list of documents and occurrence frequency for each document
_per term_. At that point, the information about the structure of the
document is not available.
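
To illustrate the point about per-term storage, here is a toy inverted index in Python. It is only a sketch: the whitespace tokenizer, the `build_postings` name, and the sample documents are assumptions made for illustration, not Lucene's actual data structures. What it shows is that once values are folded into postings, nothing records which value of a multi-valued field a term came from.

```python
from collections import defaultdict


def build_postings(docs):
    """Map (field, term) to {doc_id: occurrence count}.

    docs is {doc_id: {field_name: [value, value, ...]}}.
    The index of the value inside a multi-valued field is never stored.
    """
    postings = defaultdict(dict)
    for doc_id, fields in docs.items():
        for field, values in fields.items():
            for value in values:
                # Naive analysis (assumption): lowercase and split on whitespace.
                for term in value.lower().split():
                    counts = postings[(field, term)]
                    counts[doc_id] = counts.get(doc_id, 0) + 1
    return postings


docs = {
    1: {"section": ["solr highlights text", "lucene stores terms"]},
    2: {"section": ["solr indexes documents"]},
}
postings = build_postings(docs)
print(postings[("section", "solr")])
```

Looking up `("section", "solr")` tells you which documents match and how often, but not whether the hit was in the first or the second value of `section` in document 1.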
The highlighting engine seems to know which fields were matched by the
query though - enough to be able to use hl.requireFieldMatch to only
return snippets from matched fields. The highlighter seems to have a
small problem with snippets reaching across multivalued fields, but if
that was sorted out then in theory the highlighter should be able to
tell you which field, and which of the multiple values, was matched, no?
In theory, sure. The contrib Highlighter (that Solr uses) doesn't
work based on a Lucene stored field; it is instead fed a single
String. This means that Solr has to piece together all the values in
the field to do highlighting, and in the process, the distinction
among them is lost (or at least muted---some effort is made to keep a
position increment gap between them). So, it isn't trivial to return
this data.
Have you considered storing each section as a separate Solr Document?
I have considered this - in theory it would be easy enough to create a
separate index just for these items, but it adds an extra lump of
complexity to the search engine that I'd rather avoid. The workaround of
adding a marked-up value to the indexed field, setting hl.fragsize to 0
and parsing out the marked-up value from the highlighted fragment should
be good enough for now.
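
The client-side half of that workaround might look roughly like the Python sketch below. The `[[sec:N]]` marker format and the function names are hypothetical, invented here for illustration, and the `<em>` tags are assumed to be the highlighter's pre/post markers. It assumes the fragment comes back whole because hl.fragsize is 0.

```python
import re

# Hypothetical marker prepended to each value before indexing.
MARKER = re.compile(r"\[\[sec:(\d+)\]\]")


def mark_up(values):
    """Prefix each value with a marker carrying its position in the field."""
    return [f"[[sec:{i}]] {v}" for i, v in enumerate(values)]


def matched_sections(fragment, pre="<em>"):
    """Return the marker indices of values that contain a highlighted term."""
    # Splitting on a pattern with one capture group yields
    # [text_before, index, text, index, text, ...].
    parts = MARKER.split(fragment)
    hits = []
    for index, text in zip(parts[1::2], parts[2::2]):
        if pre in text:
            hits.append(int(index))
    return hits


fragment = "[[sec:0]] intro text [[sec:1]] the <em>query</em> term"
print(matched_sections(fragment))
```

One thing to watch with this approach: the marker text is indexed along with the value, so a marker should be chosen that the analyzer will not turn into terms users are likely to search for.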
It is also important to note that the highlighter _reanalyzes_ the
document to find the matches. So, there is nothing stopping you from
writing a bit of code that accomplishes exactly the same thing, and
returns the data in a custom way. TermVector could be used to speed
this up further.
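
As a rough illustration of that idea, the Python sketch below reanalyzes each stored value separately and reports which values contain query terms. It is not Solr or Lucene code: the `analyze` function is a stand-in for the field's real analyzer, the query is treated as a plain bag of terms (no phrases or boolean operators), and all names are made up for the example.

```python
import re


def analyze(text):
    """Stand-in analyzer (assumption): lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matching_values(values, query):
    """Map the index of each matching value to the query terms found in it."""
    query_terms = set(analyze(query))
    result = {}
    for i, value in enumerate(values):
        # Reanalyze this value on its own, so value boundaries are preserved.
        found = sorted(query_terms.intersection(analyze(value)))
        if found:
            result[i] = found
    return result


values = [
    "Introduction to Solr",
    "Highlighting in Lucene",
    "Solr highlighting options",
]
print(matching_values(values, "highlighting solr"))
```

Because each value is analyzed on its own, the result keeps the per-value distinction that is lost when the values are concatenated into one String for the highlighter.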
-Mike