I don't think the TesseractOCRParser is set up to parse this type of output. PRs are welcome if there's a generalizable use case for this(?).
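
If someone does want to pick this up, a rough sketch of the parsing side might look like the following. This is only an illustration: OsdOutputParser is a made-up name, not an existing Tika class, and it assumes the "key: value" layout shown in the command-line output quoted below. How the result would be wired into TesseractOCRParser or exposed as Tika metadata is left open.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of Tika: parses the "key: value" lines that
// `tesseract --psm 0 <image> stdout` prints for orientation and script detection.
final class OsdOutputParser {

    private OsdOutputParser() {}

    // Returns the fields in the order they appear in the output.
    static Map<String, String> parse(String osdOutput) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : osdOutput.split("\\R")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip blank lines and anything that is not "key: value"
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            fields.put(key, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample taken from the command-line output quoted in this thread.
        String sample = String.join("\n",
                "Page number: 0",
                "Orientation in degrees: 0",
                "Rotate: 0",
                "Orientation confidence: 8.75",
                "Script: Latin",
                "Script confidence: 2.86");
        Map<String, String> osd = parse(sample);
        System.out.println(osd.get("Orientation in degrees") + " / " + osd.get("Script"));
    }
}
```

Values are kept as strings here; whether to convert the confidences to numbers, and which metadata keys to use, would depend on the use case.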

On Fri, Jan 22, 2021 at 9:31 AM Peter Kronenberg <[email protected]> wrote:
>
> What is the expected behavior of Tika when using PSM 0? When using
> Tesseract directly from the command line, I get this
>
> c:\TestFiles>tesseract --psm 0 Dickens.png stdout
> Page number: 0
> Orientation in degrees: 0
> Rotate: 0
> Orientation confidence: 8.75
> Script: Latin
> Script confidence: 2.86
>
> But from Tika, I’m not getting any output. There’s obviously no OCR output,
> since PSM 0 doesn’t do OCR. It just does Orientation and Script detection.
> So where is that Tesseract output going?
