Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-21 Thread Lorenzo Bolzani
If you are not sure whether you have a single line or a single block, use psm 6 (see tesseract --help-extra). Psm 6 generally works fine for single lines too. If you have full pages and single lines mixed, you need a preprocessing step (threshold, morphology, etc.) to understand what psm is the
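One way such a preprocessing step might look, as a rough sketch only: count the horizontal bands of ink in an already thresholded image and pick psm 7 for a single band, psm 6 otherwise. The names `count_text_bands` and `choose_psm` are hypothetical, and the input is assumed to be a list of rows of 0/1 values where 1 means a dark pixel; the thresholding itself is not shown.

```python
def count_text_bands(binary_rows, min_ink=1):
    """Count runs of consecutive rows holding at least `min_ink` dark pixels.

    `binary_rows` is assumed to be a list of rows of 0/1 values (1 = ink),
    e.g. the result of thresholding a grayscale image.
    """
    bands = 0
    in_band = False
    for row in binary_rows:
        has_ink = sum(row) >= min_ink
        if has_ink and not in_band:
            bands += 1  # a new band of text rows starts here
        in_band = has_ink
    return bands


def choose_psm(binary_rows):
    """Return 7 (single text line) for exactly one band, else 6 (single block)."""
    return 7 if count_text_bands(binary_rows) == 1 else 6
```

Noise specks would count as bands here, so a real image would likely need a larger `min_ink` or a morphology pass first.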

Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-20 Thread 'Sandra M.' via tesseract-ocr
I realized that it also occurs for strings without the symbol. The image given below, for example, returns an empty string as well. In this case, though, it is recognized correctly with config='--psm 7'. Unfortunately, I cannot generally assume that the text is only one line. Maybe
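When the layout is not known in advance, one possible workaround is to try several page segmentation modes in turn and keep the first non-empty result. This is a minimal sketch, not a recommendation from the thread: `ocr_with_fallback` is a hypothetical helper, the default order (6, then 7) is an assumption, and it relies on pytesseract's `image_to_string(image, config=...)` unless another callable is passed in as `run_ocr`.

```python
def ocr_with_fallback(image, psm_modes=(6, 7), run_ocr=None):
    """Try each psm in order; return (text, psm) for the first non-empty result.

    `run_ocr` is an optional stand-in for the OCR call (useful for testing);
    by default pytesseract.image_to_string is used.
    """
    if run_ocr is None:
        import pytesseract  # third-party; requires a Tesseract installation
        run_ocr = pytesseract.image_to_string
    for psm in psm_modes:
        text = run_ocr(image, config=f"--psm {psm}").strip()
        if text:
            return text, psm
    return "", None  # every mode returned an empty string
```

The returned psm value shows which mode produced the text, which can help when checking whether the fallback is being hit often.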

Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread Zdenko Podobny
Please provide an image for testing. Zdenko On Thu, 19 Sep 2019 at 18:06, 'Sandra M.' via tesseract-ocr <tesseract-ocr@googlegroups.com> wrote: > But therefore I get empty strings now, because it occurs a symbol that > tesseract does not know. I had this problem before as well, but could fix > it

Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread 'Sandra M.' via tesseract-ocr
But now I get empty strings, because a symbol occurs that Tesseract does not know. I had this problem before as well, but for whatever reason could fix it with config='--psm 7'. This no longer works. Do you have an idea for this as well? I don't need to detect the

Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread 'Sandra M.' via tesseract-ocr
You were both right - updating to version 5 fixed the problem more or less! Only in one case there is still a problem with lower and upper case letters, but for the other cases it's working now!

Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread 'Sandra M.' via tesseract-ocr
You were both right - updating to version 5 fixed the problem more or less! Only in one case there is still a problem with lower and upper case letters, but for the other cases it's working now! On Thursday, 19 September 2019 at 12:49:43 UTC+2, zdenop wrote: > > your tesseract version is old.

Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread Lorenzo Bolzani
I tried upscaling and downscaling, with and without the white border, and I always get Calibrations. I even tried a few psm modes. I'm using: tesseract 4.0.0, leptonica-1.76.0, libjpeg 8d (libjpeg-turbo 1.5.2), libpng 1.6.34, libtiff 4.0.9, zlib 1.2.11. What I would do is this: - prepare a test

Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread Zdenko Podobny
Please provide more information (version info, and how you run OCR; it seems you are calling it from code). I just tried the tesseract command line (tesseract 5.0.0-alpha-416-g408d6) with tessdata_best and it works for me: tesseract unnamed.png - Warning: Invalid resolution 0 dpi. Using 70 instead. Estimating