> What do you mean by "word location"? The number on the page? What
> purpose would this serve?

I mean the (x, y) coordinates of the word on the page. We want to be able to
highlight the word in the page image that the text was extracted from.

> I think that you might be confusing things:
> * If you have the full-text, you can highlight where the word was found. Solr
>   highlighting handles this for you, and there is no need to store word
>   location
> * You can have different images (presumably, individual scanned pages) linked
>   to different sections of text, and show the entire image. Highlighting in
>   the image is not possible, unless by "word location" you mean the (x, y)
>   coordinates of the word on the page. Even then:
>   - It will be prohibitively expensive to store the location of every word
>     in every image for a large number of documents
>   - Some image processing will be required to handle the highlighting after
>     the scanned image is retrieved

We will have the full text stored, but we want to highlight the text in the
original image. I expect to process the image after retrieval. We do plan on
storing the (x, y) coordinates of the words in a database - I suspected that it
would be too expensive to store them in Solr. What I'm still unsure about is
how to use Solr to index the document and then retrieve the (x, y) coordinates
of the search term from the database. Is this possible? If so, could you give
an example of how this can be done?
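
To make the question concrete, here is a rough sketch of the database side as
I currently picture it. Everything in it is an assumption for illustration:
the word_locations table, its columns, the helper names, and the use of SQLite
are hypothetical, and the list of matching document ids is a stand-in for
whatever ids the Solr query would return. It also matches words by exact
lowercase comparison, which would probably not line up with Solr's own
analysis (stemming, for example).

```python
import sqlite3


def create_schema(conn):
    # Hypothetical table: one row per word occurrence on a scanned page.
    conn.execute(
        """CREATE TABLE word_locations (
               doc_id TEXT, page INTEGER, word TEXT,
               x INTEGER, y INTEGER, width INTEGER, height INTEGER)"""
    )
    conn.execute("CREATE INDEX idx_doc_word ON word_locations (doc_id, word)")


def add_word(conn, doc_id, page, word, x, y, width, height):
    # Store the word lowercased so lookups are case-insensitive.
    conn.execute(
        "INSERT INTO word_locations VALUES (?, ?, ?, ?, ?, ?, ?)",
        (doc_id, page, word.lower(), x, y, width, height),
    )


def find_boxes(conn, doc_id, term):
    # Return (page, x, y, width, height) for each occurrence of the term.
    return conn.execute(
        "SELECT page, x, y, width, height FROM word_locations "
        "WHERE doc_id = ? AND word = ? ORDER BY page, y, x",
        (doc_id, term.lower()),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
add_word(conn, "doc1", 1, "Solr", 120, 340, 58, 18)
add_word(conn, "doc1", 1, "index", 190, 340, 70, 18)
add_word(conn, "doc1", 2, "solr", 40, 90, 58, 18)

# Stand-in for the document ids that the Solr search would return.
matching_doc_ids = ["doc1"]
boxes = {doc_id: find_boxes(conn, doc_id, "solr") for doc_id in matching_doc_ids}
```

The rectangles in boxes would then be drawn over the page image after it is
retrieved. What I don't know is whether this two-step lookup is the right way
to combine Solr with the database.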

Thank you!
