RE: How to use Solr in my project

Fatima Issawi Thu, 26 Dec 2013 02:18:52 -0800

Hi,

I should clarify. We have another application extracting the text from the 
documents. The full text of each document will be stored in a database, either 
at the document level or at the page level (this hasn't been decided yet). We 
will also store the location of each word on the page in the database.
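
A minimal sketch of how the per-word locations might be stored, assuming a 
relational table with one row per word occurrence and a pixel bounding box on 
the scanned page. The table name, column names, and helper functions are all 
hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3


def create_word_store(conn):
    """Create a hypothetical table holding one row per word occurrence."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS word_location (
               doc_id  TEXT    NOT NULL,
               page_no INTEGER NOT NULL,
               word    TEXT    NOT NULL,
               x       INTEGER NOT NULL,  -- left edge of the box, in pixels
               y       INTEGER NOT NULL,  -- top edge of the box, in pixels
               width   INTEGER NOT NULL,
               height  INTEGER NOT NULL
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_word_location "
        "ON word_location (doc_id, page_no, word)"
    )


def add_word(conn, doc_id, page_no, word, box):
    """Store one occurrence; box is an (x, y, width, height) tuple."""
    x, y, width, height = box
    conn.execute(
        "INSERT INTO word_location "
        "(doc_id, page_no, word, x, y, width, height) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (doc_id, page_no, word, x, y, width, height),
    )


def find_boxes(conn, doc_id, page_no, word):
    """Return every box for a word on one page, top to bottom."""
    rows = conn.execute(
        "SELECT x, y, width, height FROM word_location "
        "WHERE doc_id = ? AND page_no = ? AND word = ? "
        "ORDER BY y, x",
        (doc_id, page_no, word),
    ).fetchall()
    return [tuple(row) for row in rows]
```

With a layout like this, the viewer only needs a document ID, a page number, 
and the search term to get the rectangles to draw over the scanned image.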

What I'm having problems with is deciding on the schema. We want a user to be 
able to search for a word, get a list of the documents that contain it, and see 
where in each document the word occurs. When the user selects a search result, 
we want the scanned image to be displayed with that word highlighted on the 
page.

I want to index the documents using Solr, but I'm having trouble figuring out 
how to design the schema so that it returns the "word location" of a search 
term on the scanned image in order to highlight it.
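
One possible shape for such a schema, sketched under the assumption that each 
scanned page becomes its own Solr document and that the word coordinates stay 
in the database, to be looked up by document ID and page number once Solr has 
returned the matching page. All field names (`doc_id`, `page_no`, `page_text`, 
`image_url`) and the ID format are hypothetical; the JSON array produced below 
is meant for Solr's JSON update handler.

```python
import json


def build_page_docs(doc_id, pages):
    """Turn extracted pages into one Solr document per scanned page.

    pages is a list of dicts with keys page_no, text and image_url
    (hypothetical names chosen for this sketch).
    """
    docs = []
    for page in pages:
        docs.append({
            # Unique key: document ID plus page number, e.g. "ms42_p3".
            "id": f"{doc_id}_p{page['page_no']}",
            "doc_id": doc_id,
            "page_no": page["page_no"],
            "page_text": page["text"],        # the searchable field
            "image_url": page["image_url"],   # scan to show in the viewer
        })
    return docs


def to_update_payload(docs):
    """Serialize the documents as a JSON array, keeping Arabic text as-is."""
    return json.dumps(docs, ensure_ascii=False)


def parse_page_id(solr_id):
    """Split a page-level ID back into (doc_id, page_no)."""
    doc_id, page_no = solr_id.rsplit("_p", 1)
    return doc_id, int(page_no)
```

A search hit on `page_text` then identifies the page and its scan, and the 
document ID and page number recovered from the hit are enough to fetch the 
coordinates of the matching word from the database.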

Does this make more sense?

Fatima

-----Original Message-----
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Thursday, December 26, 2013 1:00 PM
To: solr-user@lucene.apache.org
Subject: Re: How to use Solr in my project

On 26 December 2013 10:54, Fatima Issawi <issa...@qu.edu.qa> wrote:
> Hello,
>
> First off, I apologize if this was sent twice. I was having issues 
> subscribing to the list.
>
> I'm a complete noob in Solr (and indexing), so I'm hoping someone can help me 
> figure out how to implement Solr in my project. I have gone through some 
> tutorials online and I was able to import and query text in some Arabic PDF 
> documents.
>
> We have some scans of Historical Handwritten Arabic documents that will have 
> text extracted into a database (or PDF). We would like the user to be able to 
> search the document for text, then have the scanned image show up in a viewer 
> with the text highlighted.

This will not work for scanned images which do not actually contain the text. 
If you have the text of the documents, the best that you can do is break the 
text into pages corresponding to the scanned images, and index into Solr the 
text from the pages and the scanned image that should be linked to the text. 
For a user search, you will need to show the scanned image for the entire page: 
Highlighting of the search term in an image is not possible without optical 
character recognition (OCR).

Similarly, if you are indexing from PDFs, you will need to ensure that they 
contain text, and not just images.

Regards,
Gora
