Hi again,

We have another program that will extract the text, along with the top right 
and bottom left corners of each word. You are right, I do expect to have a lot 
of data.

At what point would Solr start to have performance problems? Is it better to:

INDEX: 
- document metadata 
- words  

STORE: 
- document metadata
- words 
- coordinates 

in Solr rather than in the database? How would I set up the schema in order to 
store the coordinates?

If storing the coordinates in Solr is not recommended, what would be the best 
process to get the coordinates after indexing the words and metadata? Do I 
search in Solr first, and then use the documentID from each hit to query the 
database for the words and coordinates?
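
To make the second option concrete, here is a rough sketch of the flow I have 
in mind. It is only an illustration under my own assumptions: the word_boxes 
table, its column names, and the boxes_for_hits function are made up, SQLite 
stands in for whatever database we end up using, and the list of document IDs 
is hard-coded where the IDs returned by a Solr search would go.

```python
import sqlite3


def create_coordinate_table(conn):
    """Create a hypothetical table holding one bounding box per word occurrence."""
    conn.execute(
        """CREATE TABLE word_boxes (
               doc_id   TEXT,
               page     INTEGER,
               word     TEXT,     -- assumed to be stored lowercased
               x_left   INTEGER,  -- bottom left corner
               y_bottom INTEGER,
               x_right  INTEGER,  -- top right corner
               y_top    INTEGER
           )"""
    )
    # Index on (doc_id, word) so the per-hit lookup does not scan the table.
    conn.execute("CREATE INDEX idx_word_boxes ON word_boxes (doc_id, word)")


def boxes_for_hits(conn, doc_ids, term):
    """Return {doc_id: [(page, x_left, y_bottom, x_right, y_top), ...]}
    for every occurrence of `term` in the given documents."""
    if not doc_ids:
        return {}
    placeholders = ",".join("?" for _ in doc_ids)
    rows = conn.execute(
        "SELECT doc_id, page, x_left, y_bottom, x_right, y_top "
        "FROM word_boxes "
        f"WHERE word = ? AND doc_id IN ({placeholders}) "
        "ORDER BY doc_id, page, y_top, x_left",
        [term.lower(), *doc_ids],
    ).fetchall()
    result = {}
    for doc_id, page, *box in rows:
        result.setdefault(doc_id, []).append((page, *box))
    return result


# Small in-memory demo with made-up data.
conn = sqlite3.connect(":memory:")
create_coordinate_table(conn)
conn.executemany(
    "INSERT INTO word_boxes VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("doc1", 1, "solr", 10, 20, 50, 32),
        ("doc1", 2, "solr", 15, 40, 55, 52),
        ("doc2", 1, "index", 5, 5, 40, 17),
    ],
)

# Stand-in for step 1: the document IDs that a Solr search would return.
solr_hit_ids = ["doc1", "doc2"]

# Step 2: fetch the boxes to highlight for the search term in those documents.
hits = boxes_for_hits(conn, solr_hit_ids, "solr")
print(hits)
```

In this version Solr would only need to index the words and metadata and 
return the documentID, and the database would be queried once per results 
page rather than once per document.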

Thanks for your patience. I don't have much choice in the use case. 


> -----Original Message-----
> From: Gora Mohanty [mailto:g...@mimirtech.com]
> Sent: Sunday, December 29, 2013 2:48 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to use Solr in my project
> 
> On 29 December 2013 11:10, Fatima Issawi <issa...@qu.edu.qa> wrote:
> [...]
> > We will have the full text stored, but we want to highlight the text in the
> > original image. I expect to process the image after retrieval. We do plan on
> > storing the (x, y) coordinates of the words in a database - I suspected that
> > it would be too expensive to store them in Solr. I guess I'm still confused
> > about how to use Solr to index the document, but then retrieve the (x, y)
> > coordinates of the search term from the database. Is this possible? If it
> > can, can you give an example how this can be done?
> 
> Storing, and retrieving the coordinates from Solr will likely be faster than
> from the database. However, I still think that you should think more carefully
> about your use case of highlighting the images. It can be done, but is a
> significant amount of work, and will need storage, and computational
> resources.
> 1. For highlighting in the image, you will need to store two sets
>     of coordinates (e.g., top right and bottom left corners) as you
>     do not know the length of the word in the image. Thus, say with
>     15 words per line, 50 lines per page, 100 pages per document,
>     you will need to store:
>       4 x 15 x 50 x 100 = 300,000 coordinates/document
> 2. Also, how are you going to get the coordinates in the first
>     place?
> 
> Regards,
> Gora
