Hi,

there has been a similar question on the Tika mailing list recently:

http://mail-archives.apache.org/mod_mbox/tika-user/201505.mbox/%3cdm2pr09mb071346d01729fc9367308e94c7...@dm2pr09mb0713.namprd09.prod.outlook.com%3E

If you can get Tika to OCR the embedded images, the parse-tika
plugin will probably do so as well once its Tika jar is replaced.

Sebastian

On 10/06/2015 03:55 PM, [email protected] wrote:
> Hello,
> 
> I use Nutch v1.10 and I just want to know whether Nutch with Tika parser v1.8 can
> natively OCR images embedded in PDF files. I can OCR JPEG or PNG files, but Tika does
> not convert the images inside PDF files. I use Elastic to index.
> 
> Thank you
> 
