Hi,

Looking more at the new DocValues for 4.0, they are only per-document, right?

So I'm thinking of using good old per-term Payloads to store this info
instead. Since a payload is a single byte[] per term position, we could pack
the chapter, paragraph, sentence and word numbers into it somehow.
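
As a rough sketch of what such an encoding could look like, here is one way to
pack the four numbers (chapter, paragraph, sentence, word) into a fixed 16-byte
array and read them back. It is plain Java with no Lucene calls; the class and
method names are made up for illustration, and fixed-width ints are just an
assumption (a variable-length encoding would be more compact).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: packs a structural position
// into a byte[] that could be attached to a term as its payload.
final class PositionPayload {

    // Encode four ints as 16 bytes, big-endian (ByteBuffer's default order).
    public static byte[] encode(int chapter, int paragraph, int sentence, int word) {
        return ByteBuffer.allocate(16)
                .putInt(chapter)
                .putInt(paragraph)
                .putInt(sentence)
                .putInt(word)
                .array();
    }

    // Decode the 16 bytes back into {chapter, paragraph, sentence, word}.
    public static int[] decode(byte[] payload) {
        if (payload.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + payload.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(payload);
        return new int[] { buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt() };
    }

    public static void main(String[] args) {
        byte[] payload = encode(1, 2, 3, 4);
        // prints "16 bytes -> [1, 2, 3, 4]"
        System.out.println(payload.length + " bytes -> " + Arrays.toString(decode(payload)));
    }
}
```

The client side would then only need decode() to turn each payload back into
a position it can highlight or jump to.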

But the crucial question is how to iterate through every matching term in a
field for a given document and pull out the payloads.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 10. okt. 2011, at 16:19, Jan Høydahl wrote:

> Hi,
> 
> We index structured documents, with numbered chapters, paragraphs and 
> sentences. After doing a (rather complex) search, we may get multiple matches 
> in each result doc. We want to highlight those matches in our front-end and 
> currently we do a simple string match of the query words against the raw text.
> 
> However, this highlights some words that do not satisfy the original query, 
> and also does not highlight other words where the match was on a stem, 
> synonym or wildcard. We thus need to improve this, and my plan was to utilize 
> DocValues (Payloads). Would the following work?
> 
> 1. For each term in the field "text", index DocValues with info about 
> chapter#, paragraph#, sentence# and word#.
>   This can be done in our application code, e.g. "foo|1,2,3,4" for chapter 1, 
> paragraph 2, sentence 3 and word 4.
> 
> 2. Then, for a specific document in the result list, retrieve a list of all 
> matches in field "text", and for each match,
>   retrieve the associated DocValues.
> 
> 3. The client application can now use this information to highlight matches, 
> as well as "jump to next match" etc,
>   and would highlight the correct words only, e.g. it would be able to 
> highlight "colour" even if the match was on the
>   synonym "color".
> 
> Another use case for this technique would be OCR applications where we store 
> with each term its x,y offsets for where it occurs in
> the original TIFF image scan.
> 
> What is already in place and what code needs to be written? I don't 
> currently see how to get a complete list of matches for a particular document.
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
> 
