If you are not sure whether you have a single line or a single block, use --psm 6.
See tesseract --help-extra.
--psm 6 generally works fine for single lines too.
If you have full pages and single lines mixed, you need a preprocessing
step (thresholding, morphology, etc.) to determine which psm is the
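
The advice above (try --psm 6 first, and fall back to a single-line mode when that returns nothing) can be sketched roughly as follows. This is only an illustration: `ocr_with_psm_fallback` is a hypothetical helper name, and it assumes the OCR function accepts a `config` keyword the way `pytesseract.image_to_string` does. The OCR callable is passed in so that the sketch itself does not depend on any particular wrapper.

```python
def ocr_with_psm_fallback(image, ocr, psm_modes=(6, 7)):
    """Try each page segmentation mode in order.

    `ocr` is any callable taking (image, config=...) and returning a string,
    e.g. pytesseract.image_to_string (assumption: a pytesseract-style API).
    Returns (text, psm) for the first non-empty result, or ("", None).
    """
    for psm in psm_modes:
        text = ocr(image, config=f"--psm {psm}").strip()
        if text:
            return text, psm
    return "", None


# Possible usage, assuming pytesseract and Pillow are installed:
#   import pytesseract
#   from PIL import Image
#   text, psm = ocr_with_psm_fallback(Image.open("unnamed.png"),
#                                     pytesseract.image_to_string)
```

Trying the block mode first and the single-line mode second follows the order suggested in the thread; other orders or modes may suit other images better.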
I realized that it also occurs for strings without the symbol. The image
given below, for example, returns an empty string as well. In this case,
however, it is recognized correctly with config='--psm 7'. Unfortunately, I
cannot generally assume that the text is only one line. Maybe
Please provide an image for testing.
Zdenko
On Thu, 19 Sep 2019 at 18:06, 'Sandra M.' via tesseract-ocr <
tesseract-ocr@googlegroups.com> wrote:
> But now I get empty strings, because a symbol occurs that
> tesseract does not know. I had this problem before as well, but could fix
> it
But now I get empty strings, because a symbol occurs that
tesseract does not know. I had this problem before as well, but could fix
it for whatever reason with config='--psm 7'. This no longer works...
Do you have an idea for this as well? I don't need to detect the
You were both right - updating to version 5 fixed the problem more or less!
Only in one case there is still a problem with lower and upper case
letters, but for the other cases it's working now!
On Thursday, 19 September 2019 at 12:49:43 UTC+2, zdenop wrote:
>
> Your tesseract version is old.
I tried to upscale, downscale, with and without the white border and I
always get Calibrations. I even tried a few psm modes.
I'm using:
tesseract 4.0.0
leptonica-1.76.0
libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib
1.2.11
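
Since the thread turns on whether the installed version is 4 or 5, a small check like the one below can help when reporting a problem. It is a sketch, not part of any Tesseract tooling: `tesseract_major_version` is a hypothetical helper that only parses the text printed by `tesseract --version`, assuming the first line looks like the output quoted in this thread.

```python
import re


def tesseract_major_version(version_output):
    """Return the major version from `tesseract --version` output, or None.

    Assumes a line of the form "tesseract 4.0.0" or
    "tesseract 5.0.0-alpha-416-g408d6" (an optional leading "v" is tolerated).
    """
    match = re.search(r"^tesseract\s+v?(\d+)\.", version_output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Possible usage, assuming the tesseract binary is on PATH:
#   import subprocess
#   result = subprocess.run(["tesseract", "--version"],
#                           capture_output=True, text=True)
#   # Concatenating both streams is a cautious choice in case a build
#   # prints the version to stderr.
#   major = tesseract_major_version(result.stdout + result.stderr)
```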
What I would do is this:
- prepare a test
Please provide more information (version info and how you run OCR; it seems
like you are using some code).
I just tried tesseract (tesseract 5.0.0-alpha-416-g408d6) on the command line
with tessdata_best and it works for me:
tesseract unnamed.png -
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating