On 4/17/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> The current situation with XMLWriter actually pulling the Document
> from the index

Yeah, but seeing people ask for *all* matching documents (or sometimes
even all documents in the index) makes me think that we need to keep
streamability.

> coupled with the lack of access to the Query causes
> this to currently be a tricky situation.
> My hack is just within the
> handleRequest method of the request handler and makes a second pass
> over the DocList and re-retrieves the Document objects to highlight
> them,


There are a number of ways this could be handled, I think.

1) Preventing documents from being retrieved more than once:
  a) may not be a big deal with the document cache enabled, since the
documents should still be cached from the first retrieval
  b) could create a subclass of DocList or another class that contains
Document objects, not just the ids.  XMLWriter would need to be
changed to handle this type of class.

2) Access to the query for highlighting:
  a) I don't think streamability of results is important for
highlighting (I assume no one will ask for a million documents and
have them all highlighted), so it could be done ahead of time for all
the documents.
  b) More context (or even user-specified context) could be added to
the SolrRequest, and the Query(s) could go there.
  c) If we had a custom DocList class from 1.b, we could also have a
highlighting-specific variant of it that carried this extra info.
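To make 1.b, 2.a and 2.c a bit more concrete, here is a rough,
self-contained Java sketch. `StoredDoc` and `DocListWithDocs` are
hypothetical names, not Solr or Lucene classes, and the query is
simplified to a space-separated list of terms; a real version would
carry the Lucene Query and use a proper highlighter.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a stored document: field name -> stored text.
// (Not the real org.apache.lucene.document.Document API.)
final class StoredDoc {
    final int id;
    final Map<String, String> fields;

    StoredDoc(int id, Map<String, String> fields) {
        this.id = id;
        this.fields = Map.copyOf(fields);
    }
}

// Hypothetical result list (option 1.b) that carries the already-retrieved
// documents, plus the query needed to highlight them (option 2.c), so the
// response writer never has to fetch the documents a second time.
final class DocListWithDocs {
    private final List<StoredDoc> docs;
    private final String query; // null when no highlighting was requested

    DocListWithDocs(List<StoredDoc> docs, String query) {
        this.docs = List.copyOf(docs);
        this.query = query;
    }

    List<StoredDoc> docs() { return docs; }
    String query() { return query; }
    int size() { return docs.size(); }

    // Naive highlighting done ahead of time for every document (option 2.a):
    // each space-separated token of `field` that equals a query term
    // (case-insensitively) is wrapped in <em>...</em>. Returns one string
    // per document, or an empty list when there is no query.
    List<String> highlight(String field) {
        List<String> out = new ArrayList<>();
        if (query == null) return out;
        Set<String> terms = new HashSet<>();
        for (String t : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        for (StoredDoc d : docs) {
            String text = d.fields.getOrDefault(field, "");
            List<String> parts = new ArrayList<>();
            for (String tok : text.split(" ", -1)) {
                boolean hit = terms.contains(tok.toLowerCase(Locale.ROOT));
                parts.add(hit ? "<em>" + tok + "</em>" : tok);
            }
            out.add(String.join(" ", parts));
        }
        return out;
    }
}
```

Since highlighting wouldn't need to be streamed, doing it eagerly
like this for the (small) page of results seems acceptable.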

> and adds the highlighted text to additional XML elements in the
> response, not to the <doc>'s.  So my current hack is not worth
> contributing.

I'm not even sure what the ideal highlighter syntax would look like...
Do you have an example of what you would consider ideal?
Highlighting seems important and universal enough that I wouldn't be
opposed to adding special syntax for it if it's really needed.  We
would want to make it flexible/powerful enough to handle whatever Mark
Harwood is cooking up for future highlighting as well.

> Yonik additionally brought up some other very good points regarding
> term vectors and stored fields.  Stored fields would be necessary for
> highlighting in the general sense, certainly, but I envision some
> applications wanting to store the original text elsewhere and a
> custom highlighting hook used to retrieve the original text through
> other means.

Hmmm, some sort of callback interface for XMLWriter for classes it
doesn't know about?
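A rough sketch of what such a hook might look like, assuming a
registry of callbacks keyed by class. `XmlValueWriter`,
`PluggableXmlWriter` and their methods are hypothetical and much
simpler than the real XMLWriter; the point is only that the writer
handles the types it knows about and defers everything else to a
registered callback.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical callback: knows how to serialize one kind of value as XML.
interface XmlValueWriter<T> {
    void write(StringBuilder out, String name, T value);
}

// Hypothetical, much-simplified response writer with a plug-in point
// for classes it doesn't know about.
final class PluggableXmlWriter {
    private final Map<Class<?>, XmlValueWriter<?>> callbacks = new LinkedHashMap<>();

    <T> void register(Class<T> type, XmlValueWriter<T> writer) {
        callbacks.put(type, writer);
    }

    void writeVal(StringBuilder out, String name, Object val) {
        // Types the writer handles itself.
        if (val instanceof String) {
            out.append("<str name=\"").append(escape(name)).append("\">")
               .append(escape((String) val)).append("</str>");
            return;
        }
        if (val instanceof Integer) {
            out.append("<int name=\"").append(escape(name)).append("\">")
               .append(val).append("</int>");
            return;
        }
        // Anything else: first registered callback whose class matches.
        for (Map.Entry<Class<?>, XmlValueWriter<?>> e : callbacks.entrySet()) {
            if (e.getKey().isInstance(val)) {
                writeWith(e.getValue(), out, name, val);
                return;
            }
        }
        // Unknown and unregistered: fall back to an escaped toString().
        out.append("<str name=\"").append(escape(name)).append("\">")
           .append(escape(String.valueOf(val))).append("</str>");
    }

    // The cast is safe because isInstance() was checked against the
    // class the callback was registered for.
    @SuppressWarnings("unchecked")
    private static <T> void writeWith(XmlValueWriter<T> w, StringBuilder out,
                                      String name, Object val) {
        w.write(out, name, (T) val);
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

For example, `w.register(Double.class, (out, name, v) -> ...)` would
let a `Double` be written as a `<double>` element without touching
the writer itself; a highlighting component could register a callback
for its own result type the same way.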

> I'm not quite sure where to go with this highlighting issue from here
> given what seems to be a bit of an overhaul in where the Document
> objects are accessed, or in being able to get the full context of the
> Query (and filters, etc) down to the XMLWriter.

Ahh, just details... nothing that can't be fixed.

> Thoughts?

Focus on the interface:
 - how clients will specify what extra info they want
 - how clients typically parse and use the XML (extra bonus if we can
make it semi-friendly to stylesheets/XSLT), and the ideal syntax for
representing the extra info

Then it's just a small matter of implementing it :-)

-Yonik
