OK, this is one of those situations where I must be doing something stupid, but
I can't get Tika to process the attached file properly. It's an image-based
PDF, and Tika isn't extracting any text from it, even when I run with
OCRStrategy = ONLY_OCR.

It's definitely reaching the call to doOCROnCurrentPage(AUTO) in
AbstractPDF2XHTML, so it isn't a case of the character-count check preventing
OCR.

I don't think it has anything to do with the document being in German. I tried
setting the language to DEU but got the same result.
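For reference, here is a minimal sketch of the setup described above through the
Java API. It assumes Tika 2.x with the standard parser package on the classpath
and a local Tesseract install that has the German traineddata. The enum constant
in PDFParserConfig is spelled OCR_ONLY, and Tesseract language codes are
lowercase ("deu"). The class name ForceOcrGermanPdf and the helper buildContext
are hypothetical, chosen only for illustration.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.ocr.TesseractOCRConfig;
import org.apache.tika.parser.pdf.PDFParserConfig;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical class name; assumes Tika 2.x plus a local Tesseract install
// with the German (deu) traineddata available.
public class ForceOcrGermanPdf {

    // Builds a ParseContext that forces OCR on every PDF page and asks
    // Tesseract to use the given language.
    static ParseContext buildContext(String tesseractLanguage) {
        PDFParserConfig pdfConfig = new PDFParserConfig();
        // OCR_ONLY: ignore the embedded text layer and OCR each rendered page.
        pdfConfig.setOcrStrategy(PDFParserConfig.OCR_STRATEGY.OCR_ONLY);

        TesseractOCRConfig ocrConfig = new TesseractOCRConfig();
        // Tesseract language codes are lowercase, e.g. "deu" for German.
        ocrConfig.setLanguage(tesseractLanguage);

        ParseContext context = new ParseContext();
        context.set(PDFParserConfig.class, pdfConfig);
        context.set(TesseractOCRConfig.class, ocrConfig);
        return context;
    }

    public static void main(String[] args) throws Exception {
        Path pdf = Paths.get(args.length > 0 ? args[0] : "sample german image.pdf");
        Parser parser = new AutoDetectParser();
        // -1 disables the handler's default limit on extracted characters.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        try (InputStream in = Files.newInputStream(pdf)) {
            parser.parse(in, handler, metadata, buildContext("deu"));
        }
        System.out.println(handler.toString().trim());
    }
}
```

Pass the PDF path as the first argument; whatever text the parse produces is
printed to stdout.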

What is going on?

Peter Kronenberg  |  Senior AI Analytic ENGINEER
C: 703.887.5623
[Torch AI]<http://www.torch.ai/>
4303 W. 119th St., Leawood, KS 66209
WWW.TORCH.AI<http://www.torch.ai/>


Attachment: sample german image.pdf
