On 3-Dec-07, at 10:58 AM, Owens, Martin wrote:
> > You can tell Lucene to store token offsets using TermVectors
> > (configurable via schema.xml). Then you can customize the request
> > handler to return the token offsets (and/or positions) by retrieving
> > the TVs.
> I think that is the best plan of action. How do I create a custom
> request handler that will use the existing indexed fields? As I see
> it, there will be two requests: one for the search and one to
> retrieve the offsets when you view one of the found items. Any
> advice you can give me will be much appreciated, as I've had no luck
> with Google so far.
First, you need to store token offsets for the field.
See http://wiki.apache.org/solr/SchemaXml , under "Expert field options".
You definitely want termVectors="true" and termOffsets="true".
You do not necessarily need two requests; instead, you can override
or modify the request handler you are using (StandardRequestHandler,
DisMaxRequestHandler) to return the information. You'll have to
process the Query to extract its terms (as HighlightingUtils does),
then, for each matching document, fetch the term vector offset data
and look up the query terms in it. I haven't worked with term vectors
(a Lucene API) myself, so I'm not sure of the exact details.
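To make the matching step concrete, here is a minimal, language-neutral sketch in Python (not Solr or Lucene code). It assumes a document's term vector can be modelled as a map from term to a list of (start, end) character offsets, which is roughly what termOffsets="true" stores. The helper names build_term_vector and matching_offsets are hypothetical, and the regex tokenizer is only a stand-in for the field's real analyzer. In a real handler, the terms would come from the parsed Lucene Query and the offsets from the stored term vector.

```python
import re
from collections import defaultdict


def build_term_vector(text):
    """Simulate a per-document term vector with offsets (hypothetical helper).

    Assumption: a lowercase word tokenizer stands in for the field's analyzer;
    Lucene would store the offsets produced by whatever analyzer the field uses.
    """
    vector = defaultdict(list)
    for match in re.finditer(r"\w+", text):
        vector[match.group().lower()].append((match.start(), match.end()))
    return dict(vector)


def matching_offsets(term_vector, query_terms):
    """Return (term, start, end) for every query term found in the term vector,
    ordered by position in the original text."""
    hits = []
    for term in query_terms:
        for start, end in term_vector.get(term, []):
            hits.append((term, start, end))
    return sorted(hits, key=lambda hit: (hit[1], hit[2]))


if __name__ == "__main__":
    text = "Solr stores offsets; offsets need term vectors."
    tv = build_term_vector(text)
    for term, start, end in matching_offsets(tv, {"offsets", "vectors"}):
        print(term, start, end, text[start:end])
```

One design point worth noting: the query terms must be analyzed the same way as the indexed field (here, lowercased), or lookups into the term vector will silently miss.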
-Mike