extracting text from tiff files from jackrabbit

Eliott Thu, 10 Mar 2011 07:19:21 -0800

Dear Users!

We are using Tika indirectly for a project based on Jackrabbit. During the final phase of this project, it came to my attention that TIFF files are also capable of storing the image and the OCR-ed text in the same file, just like PDFs do. Since we have many such files, we have a business need to extract text from these TIFFs to be able to do full-text searches. As I understand it, Tika does not support this functionality for TIFFs, while PDFs work fine. Is there any special reason for this?

Has anybody written a text extractor, or does anybody know of a library, that can get the text layer from these files?


Thanks in advance,
eliott
