OK, this is one of those situations where I must be doing something stupid, but I can't get Tika to properly process the attached file. It's an image-based PDF, and Tika just isn't getting any text out of it, even when I run with OCRStrategy = ONLY_OCR.
It's definitely getting to the call to doOCROnCurrentPage(AUTO) in AbstractPDF2XHTML, so it's not a matter of the character counts preventing the OCR. I don't think it has anything to do with the fact that the document is in German. I tried setting the language to DEU, but got the same results. What is going on?

Peter Kronenberg | Senior AI Analytic ENGINEER
C: 703.887.5623
Torch AI <http://www.torch.ai/>
4303 W. 119th St., Leawood, KS 66209
WWW.TORCH.AI <http://www.torch.ai/>
Attachment: sample german image.pdf
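For anyone trying to reproduce this, here is a minimal sketch of the setup described above, assuming Tika 2.x (the standard parser package) on the classpath and a local Tesseract install. Note that in Tika's `PDFParserConfig` the enum constant is spelled `OCR_STRATEGY.OCR_ONLY`. The sketch passes `"deu"` because Tesseract's language data files are named in lowercase (e.g. `deu.traineddata`). The class name `OcrGermanPdf`, the helper `buildContext`, and the default file path are hypothetical, for illustration only.

```java
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.ocr.TesseractOCRConfig;
import org.apache.tika.parser.pdf.PDFParserConfig;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical reproduction class; assumes Tika 2.x on the classpath and
// Tesseract installed with German language data (deu.traineddata).
public class OcrGermanPdf {

    // Builds a ParseContext that forces OCR on every PDF page and asks Tesseract for the given language.
    static ParseContext buildContext(String tesseractLanguage) {
        PDFParserConfig pdfConfig = new PDFParserConfig();
        // Ignore the PDF text layer and OCR each rendered page.
        pdfConfig.setOcrStrategy(PDFParserConfig.OCR_STRATEGY.OCR_ONLY);

        TesseractOCRConfig ocrConfig = new TesseractOCRConfig();
        // Tesseract language data files are lowercase, e.g. deu.traineddata.
        ocrConfig.setLanguage(tesseractLanguage);

        ParseContext context = new ParseContext();
        context.set(PDFParserConfig.class, pdfConfig);
        context.set(TesseractOCRConfig.class, ocrConfig);
        return context;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default path; pass the real file as the first argument.
        Path pdf = Paths.get(args.length > 0 ? args[0] : "sample german image.pdf");

        AutoDetectParser parser = new AutoDetectParser();
        ParseContext context = buildContext("deu");
        // Register the parser so embedded resources are parsed as well.
        context.set(Parser.class, parser);

        // -1 disables the default limit on the number of extracted characters.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();

        try (InputStream stream = TikaInputStream.get(pdf)) {
            parser.parse(stream, handler, metadata, context);
        }

        String text = handler.toString().trim();
        System.out.println(text.isEmpty() ? "(no text extracted)" : text);
    }
}
```

Keeping the configuration in `buildContext` separate from `main` makes it easy to swap the OCR strategy or language string and compare results on the same file.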
