Dear Users!
We are using Tika indirectly in a project based on Jackrabbit. During
the final phase of this project it came to my attention that TIFF files
can also store the image and the OCR-ed text in the same
file, just like PDFs do. Since we have many such files, we have a
business need to extract the text from these TIFFs so that we can run
full-text searches. As I understand it, Tika does not support this
for TIFFs, while PDFs work fine. Is there any special reason
for this?
Has anybody written a text extractor, or does anybody know of a library,
that can get the text layer from these files?
Thanks in advance,
Eliott