Re: extracting text from tiff files from jackrabbit

Eliott Fri, 11 Mar 2011 08:17:00 -0800

Hi!

Can anybody point me in the right direction? The text in these TIFF files seems to be stored in a special tag used by Microsoft and some other applications.


regards
eliott


On 10/03/2011 16:18, Eliott wrote:

Dear Users!
We are using Tika indirectly for a project based on Jackrabbit. During the final phase of this project it came to my attention that TIFF files can also store the image and the OCR-ed text in the same file, just like PDFs do. Since we have many such files, we have a business need to extract the text from these TIFFs so that we can run full-text searches. As I understand it, Tika does not support this functionality for TIFFs, while PDFs work fine. Is there any special reason for this?
Has anybody written a text extractor, or does anybody know of a library that can get the text layer from these files?
Thanks in advance
eliott
