You are using an old version of Tesseract (4.0.0-beta.1 is a pre-release). Please try a current release first.

On Sat, Jan 11, 2020 at 22:43, Matthew Getzin <[email protected]> wrote:
> Hello,
>
> I created an issue (see below) on Github. Not sure if it is a bug or
> something for discussion forum...
>
> ### Environment
>
> * **Tesseract Version**: tesseract 4.0.0-beta.1
>   leptonica-1.75.3
>   libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 :
>   libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
>
>   Found AVX2
>   Found AVX
>   Found SSE
>
> * **Platform**: Linux getzinmw-XPS-15-9550 4.15.0-72-generic #81-Ubuntu
>   SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
>
> ### Current Behavior:
> I am currently having issues with the hOCR output from tesseract as
> compared to the default .txt output. In the attached image, for example, my
> hOCR output does not register the majority of the numbers on the left side
> of the page, while they are registered in the .txt output file.
>
> Commands tried:
> tesseract input.png output -l eng --psm 6
> tesseract input.png output -l eng --psm 6 hocr
>
> ### Expected Behavior:
> I would expect that the recognition of text would be consistent between
> the two modes with the output format being the only difference.
>
> ### Suggested Fix:
> Ensuring consistent output from the various formats.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/62a1f36b-cfd2-4c4d-901d-337d6bbcc12d%40googlegroups.com
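To check whether the two outputs still disagree after upgrading, it may help to list exactly which words appear in the .txt output but not in the hOCR output. Below is a minimal Python sketch using only the standard library. It assumes the hOCR file marks each word with the class `ocrx_word`; the names `HocrWordParser`, `hocr_words`, `missing_from_hocr` and `compare_files` are made up for this illustration and are not part of Tesseract.

```python
from collections import Counter
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect the text of every element whose class includes 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._depth = 0   # > 0 while inside an ocrx_word element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Nested markup inside a word, e.g. <strong> or <em>.
            self._depth += 1
        elif "ocrx_word" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                word = "".join(self._buf).strip()
                if word:
                    self.words.append(word)

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def hocr_words(hocr_text):
    """Return the recognized words from an hOCR document, in document order."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


def missing_from_hocr(txt_text, hocr_text):
    """Return words that occur more often in the plain text than in the hOCR."""
    diff = Counter(txt_text.split()) - Counter(hocr_words(hocr_text))
    return sorted(diff.elements())


def compare_files(txt_path, hocr_path):
    """Read both output files and return the words missing from the hOCR."""
    with open(txt_path, encoding="utf-8") as f:
        txt_text = f.read()
    with open(hocr_path, encoding="utf-8") as f:
        hocr_text = f.read()
    return missing_from_hocr(txt_text, hocr_text)
```

With the two commands from the report, the files would be `output.txt` and `output.hocr`, so `compare_files("output.txt", "output.hocr")` returns the words to look at. The comparison is by whitespace-separated tokens only, so it says nothing about positions or bounding boxes.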

