Any thoughts on why I can't get OCR to work on this PDF?

Peter Kronenberg  |  Senior AI Analytic ENGINEER
C: 703.887.5623
[Torch AI]<http://www.torch.ai/>
4303 W. 119th St., Leawood, KS 66209
WWW.TORCH.AI<http://www.torch.ai/>


From: Peter Kronenberg <[email protected]>
Sent: Wednesday, September 22, 2021 9:33 PM
To: [email protected]
Cc: [email protected]
Subject: {EXTERNAL}Problem running OCR


OK, this is one of those situations where I must be doing something stupid, but
I can't get Tika to properly process the attached file. It's an image-based
PDF, and Tika isn't extracting any text from it, even when I run with
OCRStrategy = OCR_ONLY.
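For anyone trying to reproduce this, the setup described above corresponds roughly to the minimal sketch below. It assumes Apache Tika (1.x or 2.x) is on the classpath and Tesseract is installed with the German language data. `OcrOnlyRepro` and `extractWithOcrOnly` are hypothetical names, and the PDF path is taken from the command line. In the Java API, the OCR-only strategy is `PDFParserConfig.OCR_STRATEGY.OCR_ONLY`.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.ocr.TesseractOCRConfig;
import org.apache.tika.parser.pdf.PDFParserConfig;
import org.apache.tika.sax.BodyContentHandler;

public class OcrOnlyRepro {

    /** Hypothetical helper: parses a PDF with OCR forced on every page and returns the text. */
    static String extractWithOcrOnly(Path pdf, String tesseractLanguage) throws Exception {
        AutoDetectParser parser = new AutoDetectParser();

        // Skip the PDF text layer and run OCR on every rendered page.
        PDFParserConfig pdfConfig = new PDFParserConfig();
        pdfConfig.setOcrStrategy(PDFParserConfig.OCR_STRATEGY.OCR_ONLY);

        // Tesseract language data is usually named with lowercase codes, e.g. "deu" for German.
        TesseractOCRConfig ocrConfig = new TesseractOCRConfig();
        ocrConfig.setLanguage(tesseractLanguage);

        // Make both configs (and the parser itself) visible to the PDF parser via the context.
        ParseContext context = new ParseContext();
        context.set(Parser.class, parser);
        context.set(PDFParserConfig.class, pdfConfig);
        context.set(TesseractOCRConfig.class, ocrConfig);

        BodyContentHandler handler = new BodyContentHandler(-1); // -1 disables the write limit
        try (InputStream in = Files.newInputStream(pdf)) {
            parser.parse(in, handler, new Metadata(), context);
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        String text = extractWithOcrOnly(Paths.get(args[0]), "deu");
        System.out.println(text.isBlank() ? "(no text extracted)" : text);
    }
}
```

Passing the configs through `ParseContext` keeps the settings per-call rather than global, which makes it easy to compare runs with different strategies or languages on the same file.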



It's definitely reaching the call to doOCROnCurrentPage(AUTO) in
AbstractPDF2XHTML, so the character counts aren't what's preventing the OCR.



I don't think it has anything to do with the document being in German. I tried
setting the language to DEU, but got the same results.

What is going on?
