Hi!

Can anybody point me in the right direction? The text in these TIFF files seems to be stored in a special tag used by Microsoft and some other applications.

regards
eliott


On 10/03/2011 16:18, Eliott wrote:
Dear Users!

We are using Tika indirectly in a project based on Jackrabbit. During the final phase of this project, it came to my attention that TIFF files can also store an image and its OCR'd text in the same file, just as PDFs do. Since we have many such files, we have a business need to extract the text from these TIFFs so that we can run full-text searches. As I understand it, Tika does not support this for TIFFs, while PDFs work fine. Is there any special reason for this?

Has anybody written a text extractor, or does anybody know of a library, that can get the text layer from these files?

thanks in advance
eliott

