date:20210617

[tesseract-ocr] pdf to HTML conversion using tesseract

2021-06-17 Thread 'Madhu' via tesseract-ocr

Hi all, I am able to convert a PDF into images, and then I use Tesseract to convert the JPG images into hOCR, but the hOCR output does not have any CSS. Is there any way to get an exact copy of the image as an hOCR output file? I am using pytesseract for the conversion. Thanks in advance.
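One possible approach is sketched below. It rests on one assumption about the input: in hOCR, each recognised word is a `span` with class `ocrx_word` whose `title` attribute holds its position as `bbox x0 y0 x1 y1`. A small post-processing step can turn those boxes into absolutely positioned elements with inline CSS, so the text lands roughly where it sits in the page image. The names `HocrWordCollector`, `hocr_to_positioned_html` and `pdf_to_hocr_pages` are hypothetical helpers. The only third-party calls are pytesseract's `image_to_pdf_or_hocr(..., extension="hocr")` and `convert_from_path` from the separate pdf2image package, which is assumed to be installed. Fonts, images and exact glyph sizes are not reproduced, so the result is an approximation, not an exact copy.

```python
import html
from html.parser import HTMLParser


class HocrWordCollector(HTMLParser):
    """Collect (text, bbox) pairs from the ocrx_word spans of an hOCR document."""

    def __init__(self):
        super().__init__()
        self.words = []    # list of (text, (x0, y0, x1, y1))
        self._bbox = None  # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            # title looks like "bbox 36 92 96 116; x_wconf 93"
            bbox = None
            for part in (attrs.get("title") or "").split(";"):
                fields = part.split()
                if fields and fields[0] == "bbox":
                    bbox = tuple(int(v) for v in fields[1:5])
            self._bbox = bbox
            self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Word spans hold no nested spans, so the next </span> closes the word.
        if tag == "span" and self._bbox is not None:
            text = "".join(self._text).strip()
            if text:
                self.words.append((text, self._bbox))
            self._bbox = None


def hocr_to_positioned_html(hocr: str) -> str:
    """Render hOCR words as absolutely positioned spans (pixel coordinates)."""
    parser = HocrWordCollector()
    parser.feed(hocr)
    parser.close()
    spans = []
    for text, (x0, y0, x1, y1) in parser.words:
        # font-size = box height is only a rough heuristic.
        style = (f"position:absolute;left:{x0}px;top:{y0}px;"
                 f"width:{x1 - x0}px;height:{y1 - y0}px;font-size:{y1 - y0}px")
        spans.append(f'<span style="{style}">{html.escape(text)}</span>')
    return ('<!DOCTYPE html>\n<html><body style="position:relative">\n'
            + "\n".join(spans) + "\n</body></html>")


def pdf_to_hocr_pages(pdf_path: str, dpi: int = 300) -> list:
    """Render each PDF page to an image and OCR it to an hOCR string."""
    # Imported lazily so the parser above works without these packages installed.
    from pdf2image import convert_from_path
    import pytesseract
    pages = convert_from_path(pdf_path, dpi=dpi)
    return [pytesseract.image_to_pdf_or_hocr(page, extension="hocr").decode("utf-8")
            for page in pages]
```

For example, `hocr_to_positioned_html(pdf_to_hocr_pages("input.pdf")[0])` would give an HTML page for the first page. The coordinates are in pixels of the image rendered at `dpi`, so the layout scales with that setting.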

[tesseract-ocr] Tesseract does not recognise these numbers

2021-06-17 Thread Juanjo Gómez Navarro

I have this simple image with a date: [image: test.png]

Tesseract produces the output:

    $ tesseract test.png -
    Estimating resolution as 233
    03:41 pm

In similar images, I have the problem that it mistakes 1s for 7s and vice versa. How can I help Tesseract to recognise
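Two commonly suggested steps for short numeric strings like this are sketched below under stated assumptions. The first is to tell Tesseract the image is a single text line (page segmentation mode 7). The second is to restrict the recognisable characters with the `tessedit_char_whitelist` variable; this variable is reportedly ignored by the LSTM engine in Tesseract 4.0 and honoured again from 4.1. The "Estimating resolution as 233" message means the image carries no usable DPI information, so upscaling small text before OCR is also often tried. The helper names `build_tesseract_config` and `read_time` and the 3x scale factor are hypothetical choices, not settings confirmed to fix the 1/7 confusion.

```python
def build_tesseract_config(psm: int = 7, whitelist: str = "0123456789:apm") -> str:
    """Build a pytesseract config string for a single line of time text.

    --psm 7 treats the image as a single text line; the whitelist limits
    recognition to digits, the colon and the letters of "am"/"pm".
    """
    parts = [f"--psm {psm}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def read_time(image_path: str, scale: int = 3) -> str:
    """OCR a small time image after grayscale conversion and upscaling."""
    # Lazy imports: requires Pillow, pytesseract and a tesseract binary on PATH.
    from PIL import Image
    import pytesseract
    img = Image.open(image_path).convert("L")  # grayscale
    img = img.resize((img.width * scale, img.height * scale), Image.LANCZOS)
    return pytesseract.image_to_string(img, config=build_tesseract_config()).strip()
```

On the command line, the same options would be roughly `tesseract test.png - --psm 7 -c tessedit_char_whitelist=0123456789:apm`.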