Hi,

Looking more at the new DocValues for 4.0, they are only per-document, right?
So what I'm thinking is to use good old per-term payloads to store this info instead. Since a payload is a single value, we could encode the four numbers as a byte[] somehow. But the crucial question is: how do we iterate through every matching term in a field and pull out its payloads?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 10. okt. 2011, at 16:19, Jan Høydahl wrote:

> Hi,
>
> We index structured documents, with numbered chapters, paragraphs and
> sentences. After doing a (rather complex) search, we may get multiple matches
> in each result doc. We want to highlight those matches in our front-end, and
> currently we do a simple string match of the query words against the raw text.
>
> However, this highlights some words that do not satisfy the original query,
> and it also fails to highlight other words where the match was on a stem, a
> synonym or a wildcard. We thus need to improve this, and my plan was to
> utilize DocValues (Payloads). Would the following work?
>
> 1. For each term in the field "text", index DocValues with info about
> chapter#, paragraph#, sentence# and word#. This can be done in our
> application code, e.g. "foo|1,2,3,4" for chapter 1, paragraph 2, sentence 3
> and word 4.
>
> 2. Then, for a specific document in the result list, retrieve a list of all
> matches in field "text", and for each match, retrieve the associated
> DocValues.
>
> 3. The client application can now use this information to highlight matches,
> as well as to "jump to next match" etc., and would highlight the correct
> words only, e.g. it would be able to highlight "colour" even if the match was
> on the synonym "color".
>
> Another use case for this technique would be OCR applications, where we store
> with each term its x,y offsets for where it occurs in the original TIFF image
> scan.
>
> What is already in place, and what code needs to be written? I don't
> currently see how to get a complete list of matches for a particular document.
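A minimal, stdlib-only Java sketch of the "encode the values as byte[]" idea, assuming the application emits tokens in the "foo|1,2,3,4" form from step 1 and that a fixed layout of four big-endian 32-bit ints (16 bytes per position) is acceptable. PayloadCodec and Location are hypothetical names, not Lucene or Solr APIs; hooking this into an analyzer at index time, and reading the payloads back out of the index, is not shown. decode() takes an (array, offset, length) slice because that is how Lucene's BytesRef exposes its bytes.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical helper (not part of Lucene or Solr): turns an annotated token
 * such as "foo|1,2,3,4" into a term plus a fixed 16-byte payload holding
 * chapter, paragraph, sentence and word number, and decodes it again.
 */
final class PayloadCodec {

    /** Position of one token in the structured document. */
    static final class Location {
        final int chapter, paragraph, sentence, word;

        Location(int chapter, int paragraph, int sentence, int word) {
            this.chapter = chapter;
            this.paragraph = paragraph;
            this.sentence = sentence;
            this.word = word;
        }

        @Override
        public String toString() {
            return chapter + "," + paragraph + "," + sentence + "," + word;
        }
    }

    /** Assumed layout: four big-endian 32-bit ints. */
    static final int PAYLOAD_LENGTH = 4 * Integer.BYTES;

    private PayloadCodec() {}

    /** Returns the part before the last '|', e.g. "foo" for "foo|1,2,3,4". */
    static String term(String annotated) {
        int bar = annotated.lastIndexOf('|');
        return bar < 0 ? annotated : annotated.substring(0, bar);
    }

    /** Parses the "chapter,paragraph,sentence,word" part after the last '|'. */
    static Location location(String annotated) {
        int bar = annotated.lastIndexOf('|');
        if (bar < 0) {
            throw new IllegalArgumentException("no '|' in: " + annotated);
        }
        String[] parts = annotated.substring(bar + 1).split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 numbers in: " + annotated);
        }
        int[] n = new int[4];
        for (int i = 0; i < 4; i++) {
            n[i] = Integer.parseInt(parts[i].trim());
        }
        return new Location(n[0], n[1], n[2], n[3]);
    }

    /** Packs a location into the bytes that would be stored as the payload. */
    static byte[] encode(Location loc) {
        return ByteBuffer.allocate(PAYLOAD_LENGTH)
                .putInt(loc.chapter)
                .putInt(loc.paragraph)
                .putInt(loc.sentence)
                .putInt(loc.word)
                .array();
    }

    /** Unpacks a payload slice given as (array, offset, length). */
    static Location decode(byte[] bytes, int offset, int length) {
        if (length != PAYLOAD_LENGTH) {
            throw new IllegalArgumentException("unexpected payload length: " + length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, length);
        // Java evaluates arguments left to right, so the ints are read in order.
        return new Location(buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt());
    }
}
```

A fixed 16-byte layout keeps decoding trivial but adds 16 bytes of payload to every token position; a variable-length integer encoding would be much smaller if index size matters. The OCR case could use the same scheme with x,y offsets in place of the structural numbers.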
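For step 3, once the client has the decoded locations of all matches in a document, "jump to next match" reduces to sorting them in reading order and finding the first one after the current position. A hypothetical client-side sketch in Java, assuming each match arrives as a (chapter, paragraph, sentence, word) tuple; MatchNavigator and Hit are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical client-side helper for "jump to next match": keeps the
 * decoded match locations of one document sorted in reading order.
 */
final class MatchNavigator {

    /** One match: chapter, paragraph, sentence and word number. */
    static final class Hit {
        final int chapter, paragraph, sentence, word;

        Hit(int chapter, int paragraph, int sentence, int word) {
            this.chapter = chapter;
            this.paragraph = paragraph;
            this.sentence = sentence;
            this.word = word;
        }

        @Override
        public String toString() {
            return chapter + "," + paragraph + "," + sentence + "," + word;
        }
    }

    /** Reading order: chapter first, then paragraph, sentence and word. */
    static final Comparator<Hit> READING_ORDER =
            Comparator.<Hit>comparingInt(h -> h.chapter)
                    .thenComparingInt(h -> h.paragraph)
                    .thenComparingInt(h -> h.sentence)
                    .thenComparingInt(h -> h.word);

    private final List<Hit> hits;

    MatchNavigator(List<Hit> matches) {
        hits = new ArrayList<>(matches);
        hits.sort(READING_ORDER);
    }

    /**
     * Returns the first match strictly after {@code current}, wrapping around
     * to the first match at the end of the document; null if there are none.
     */
    Hit next(Hit current) {
        if (hits.isEmpty()) {
            return null;
        }
        for (Hit h : hits) {
            if (READING_ORDER.compare(h, current) > 0) {
                return h;
            }
        }
        return hits.get(0);
    }
}
```

The strict comparison in next() also skips duplicate locations, which can occur if, say, a synonym such as "color" is indexed at the same position as "colour" and both carry the same payload. For very long hit lists, a binary search would beat the linear scan.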