Any thoughts on why I can't get OCR to work on this PDF?

Peter Kronenberg | Senior AI Analytic ENGINEER C: 703.887.5623 [Torch AI]<http://www.torch.ai/> 4303 W. 119th St., Leawood, KS 66209 WWW.TORCH.AI<http://www.torch.ai/>
From: Peter Kronenberg <[email protected]>
Sent: Wednesday, September 22, 2021 9:33 PM
To: [email protected]
Cc: [email protected]
Subject: {EXTERNAL}Problem running OCR

OK, this is one of those situations where I must be doing something stupid, but I can't get Tika to properly process the attached file. It's an image-based PDF, and Tika just isn't getting any text out of it, even when I run with OCRStrategy = ONLY_OCR. It's definitely reaching the call to doOCROnCurrentPage(AUTO) in AbstractPDF2XHTML, so the character counts aren't what's preventing the OCR. I don't think it has anything to do with the document being in German: I tried setting the language to DEU and got the same results.

What is going on?
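For reference, here is a minimal sketch of the setup described above: forcing page-level OCR and passing a language to Tesseract through a `ParseContext`. It assumes Apache Tika (1.x or 2.x) is on the classpath and that Tesseract is installed with its German language data. The class name `OcrPdfExample` and the helper `buildContext` are hypothetical. In `PDFParserConfig.OCR_STRATEGY` the constant is spelled `OCR_ONLY`. The language string is handed to Tesseract, which uses lowercase three-letter codes such as `deu`.

```java
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.ocr.TesseractOCRConfig;
import org.apache.tika.parser.pdf.PDFParserConfig;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical example class; a sketch, not Tika's own sample code.
public class OcrPdfExample {

    // Hypothetical helper: builds a ParseContext that forces OCR on every page
    // and tells Tesseract which language data to use.
    static ParseContext buildContext(Parser parser, String tesseractLanguage) {
        PDFParserConfig pdfConfig = new PDFParserConfig();
        // OCR_ONLY: render each page to an image and OCR it, skipping the text layer.
        pdfConfig.setOcrStrategy(PDFParserConfig.OCR_STRATEGY.OCR_ONLY);

        TesseractOCRConfig ocrConfig = new TesseractOCRConfig();
        // Passed to Tesseract as its language code, e.g. "deu" for German.
        ocrConfig.setLanguage(tesseractLanguage);

        ParseContext context = new ParseContext();
        context.set(PDFParserConfig.class, pdfConfig);
        context.set(TesseractOCRConfig.class, ocrConfig);
        context.set(Parser.class, parser);
        return context;
    }

    public static void main(String[] args) throws Exception {
        Path pdf = Paths.get(args[0]);
        AutoDetectParser parser = new AutoDetectParser();
        ParseContext context = buildContext(parser, "deu");

        // -1 removes BodyContentHandler's default 100,000-character write limit.
        BodyContentHandler handler = new BodyContentHandler(-1);
        Metadata metadata = new Metadata();
        try (InputStream in = TikaInputStream.get(pdf)) {
            parser.parse(in, handler, metadata, context);
        }

        String text = handler.toString().trim();
        System.out.println(text.isEmpty() ? "No text extracted" : text);
    }
}
```

Run it with the path to the PDF as the only argument. Keeping the configuration in one helper makes it easy to confirm that the strategy and language actually reach the parser before looking elsewhere.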
