Hi all,
I am able to convert a PDF into images, and I then use Tesseract
to convert the JPG images into hOCR, but the hOCR output does not have any CSS. Is
there any way to get an exact copy of the image as an hOCR output file?
I am using pytesseract for the conversion.
Thanks in advance
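
For what it is worth, hOCR stores layout as "bbox x0 y0 x1 y1" values in each element's title attribute rather than as CSS, so any styling has to be derived from those boxes afterwards. Below is a minimal sketch of that idea, not a definitive implementation. It assumes Tesseract-style word spans (class ocrx_word, with the class attribute appearing before title), and the function name hocr_to_positioned_html is hypothetical. It only approximates word positions and sizes; fonts and graphics from the image are not reproduced.

```python
import re
from html import escape, unescape

# Assumption: words look like Tesseract's hOCR output, e.g.
# <span class='ocrx_word' id='word_1_1' title='bbox 10 20 70 40; x_wconf 96'>text</span>
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"]bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>(.*?)</span>",
    re.DOTALL,
)


def hocr_to_positioned_html(hocr: str) -> str:
    """Turn hOCR word boxes into absolutely positioned HTML spans (hypothetical helper)."""
    spans = []
    for x0, y0, x1, y1, inner in WORD_RE.findall(hocr):
        x0, y0, x1, y1 = map(int, (x0, y0, x1, y1))
        # Drop nested markup such as <strong>/<em> and decode entities.
        text = unescape(re.sub(r"<[^>]+>", "", inner)).strip()
        if not text:
            continue
        height = y1 - y0
        style = (
            f"position:absolute;left:{x0}px;top:{y0}px;"
            f"width:{x1 - x0}px;height:{height}px;"
            # Using the box height as the font size is a rough guess.
            f"font-size:{height}px;line-height:{height}px;white-space:nowrap"
        )
        spans.append(f'<span style="{style}">{escape(text)}</span>')
    return '<div style="position:relative">\n' + "\n".join(spans) + "\n</div>"
```

The hOCR string itself can come from whatever conversion step is already in place; the helper only post-processes that text, so it does not depend on how Tesseract was invoked.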
I have this simple image with a date:
[image: test.png]
Tesseract produces the output:
*$ tesseract test.png -*
*Estimating resolution as 233*
*03:41 pm*
In similar images, I have the problem that it confuses 1s with 7s
and vice versa. How can I help Tesseract to recognise