On 4/17/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> The current situation with XMLWriter actually pulling the Document
> from the index

Yeah, but seeing people ask for *all* matching documents (or sometimes
even all documents in the index) makes me think that we need to keep
streamability.

> coupled with the lack of access to the Query causes
> this to currently be a tricky situation.
> My hack is just within the
> handleRequest method of the request handler and makes a second pass
> over the DocList and re-retrieves the Document objects to highlight
> them,

There are a number of ways this could be handled, I think.

1) Preventing documents from being retrieved more than once:
  a) may not be a big deal with the document cache enabled, since they
     should still be there
  b) could create a subclass of DocList or another class that contains
     Document objects, not just the ids.  XMLWriter would need to be
     changed to handle this type of class.

2) Access to the query for highlighting:
  a) I don't think streamability of results is important for
     highlighting (I assume no one will ask for a million documents and
     have them all highlighted), so it could be done ahead of time for
     all the documents.
  b) More context (or even user-specified context) could be added to
     the SolrRequest, and the Query(s) could go there.
  c) If we had a custom DocList object from 1.b then it could also have
     a custom one for highlighting that carried this extra info.

> and adds the highlighted text to additional XML elements in the
> response, not to the <doc>'s.  So my current hack is not worth
> contributing.

I'm not even sure what the ideal highlighter syntax would look like...
Do you have an example of what you would consider ideal?
Highlighting seems important and universal enough that I wouldn't be
opposed to adding special syntax for it if it's really needed.  We
would want to make it flexible/powerful enough to handle whatever Mark
Harwood is cooking up for future highlighting as well.

> Yonik additionally brought up some other very good points regarding
> term vectors and stored fields.
> Stored fields would be necessary for
> highlighting in the general sense, certainly, but I envision some
> applications wanting to store the original text elsewhere and a
> custom highlighting hook used to retrieve the original text through
> other means.

Hmmm, some sort of callback interface for XMLWriter for classes it
doesn't know about?

> I'm not quite sure where to go with this highlighting issue from here
> given what seems to be a bit of an overhaul in where the Document
> objects are accessed, or in being able to get the full context of the
> Query (and filters, etc) down to the XMLWriter.

Ahh, just details... nothing that can't be fixed.

> Thoughts?

Focus on the interface:
 - how clients will specify what extra info they want
 - how clients typically parse and use the XML (extra bonus if we can
   make it semi-friendly to stylesheets/XSLT), and the ideal syntax for
   representing the extra info

Then it's just a small matter of implementing it :-)

-Yonik
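
P.S. To make options 1.b / 2.c and the callback idea a bit more concrete, here is a rough, self-contained sketch. None of these names exist in Solr: DocExtras, DocsWithContext, TermHighlighter and SketchWriter are all made up for illustration, a plain Map stands in for a Lucene Document, and a list of strings stands in for the Query. The <highlight> element is only a placeholder, not a proposal for the real syntax.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback the writer invokes for info it cannot produce itself. */
interface DocExtras {
    /** Extra name/value pairs for one document (here: highlighted field text). */
    Map<String, String> extrasFor(Map<String, String> doc);
}

/** Hypothetical result holder in the spirit of 1.b / 2.c: documents plus query context. */
final class DocsWithContext {
    // Each document is a field-name -> stored-text map, a stand-in for a Lucene Document.
    final List<Map<String, String>> docs = new ArrayList<>();
    // Plain strings stand in for the Query.
    final List<String> queryTerms = new ArrayList<>();
}

/** Toy highlighter: wraps exact, case-sensitive term matches in <em> tags. */
final class TermHighlighter implements DocExtras {
    private final String field;
    private final List<String> terms;

    TermHighlighter(String field, List<String> terms) {
        this.field = field;
        this.terms = terms;
    }

    @Override
    public Map<String, String> extrasFor(Map<String, String> doc) {
        Map<String, String> out = new LinkedHashMap<>();
        String text = doc.get(field);
        if (text == null) {
            return out;
        }
        // Escape first so the only markup in the result is the <em> we add.
        // Assumes terms contain no XML special characters and do not overlap.
        String escaped = SketchWriter.escape(text);
        String marked = escaped;
        for (String term : terms) {
            marked = marked.replace(term, "<em>" + term + "</em>");
        }
        if (!marked.equals(escaped)) {
            out.put(field, marked);
        }
        return out;
    }
}

/** Toy writer: emits each doc once, then asks the callback for extras. */
final class SketchWriter {
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String write(DocsWithContext results, DocExtras extras) {
        StringBuilder sb = new StringBuilder("<response>");
        for (Map<String, String> doc : results.docs) {
            sb.append("<doc>");
            for (Map.Entry<String, String> f : doc.entrySet()) {
                appendStr(sb, f.getKey(), escape(f.getValue()));
            }
            sb.append("</doc>");
            // The document is already in hand, so nothing is fetched a second time.
            Map<String, String> extra = extras.extrasFor(doc);
            if (!extra.isEmpty()) {
                sb.append("<highlight>");
                for (Map.Entry<String, String> e : extra.entrySet()) {
                    appendStr(sb, e.getKey(), e.getValue()); // callback already escaped it
                }
                sb.append("</highlight>");
            }
        }
        return sb.append("</response>").toString();
    }

    // Field names are written as-is; a real writer would escape them too.
    private static void appendStr(StringBuilder sb, String name, String body) {
        sb.append("<str name=\"").append(name).append("\">").append(body).append("</str>");
    }
}
```

The writer only knows about the DocExtras interface, so an application that keeps its original text somewhere else could plug in its own implementation without the writer changing.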