Hi!
Can anybody point me in the right direction? The text in these TIFF
files seems to be stored in a special tag used by Microsoft and some
other applications.
regards
eliott
On 10/03/2011 16:18, Eliott wrote:
Dear Users!
We are using Tika indirectly for a project based on Jackrabbit. During
the final phase of this project it came to my attention that TIFF files
are also capable of storing the image and the OCR-ed text in the same
file, just like PDFs do. Since we have many such files, we have a
business need to extract text from these TIFFs to be able to do full
text searches. As I understand it, Tika does not support this
functionality for TIFFs, while PDFs work fine. Is there any
special reason for this?
Has anybody written a text extractor, or does anybody know of a library
that can get the text layer from these files?
thanks in advance
eliott