Re: [tesseract-ocr] Re: problems with upper-case character

Zdenko Podobny Thu, 19 Sep 2019 03:49:59 -0700

Your Tesseract version is old. The current version is 4.1 (the dev version is
5.0).
For 4.x and above you can use different tessdata: best, fast, or the one that
also contains the 3.x module.
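
As a rough sketch of how a different tessdata set could be selected from
Python with Tesseract 4.x or later: the helper names below are hypothetical,
the directory path is only a placeholder, and the choice of --oem 1 (LSTM
engine) and --psm 7 (single text line) is an assumption about the images, not
a recommendation from this thread. The command-line options --tessdata-dir,
--oem and --psm are not available in this form in 3.02.

```python
import shlex


def build_tesseract_config(tessdata_dir=None, oem=None, psm=None):
    """Assemble a string for pytesseract's `config` argument (hypothetical helper)."""
    parts = []
    if tessdata_dir is not None:
        # Quote the path so that directories containing spaces survive splitting.
        parts += ["--tessdata-dir", shlex.quote(str(tessdata_dir))]
    if oem is not None:
        parts += ["--oem", str(oem)]
    if psm is not None:
        parts += ["--psm", str(psm)]
    return " ".join(parts)


def ocr_with_tessdata(image, tessdata_dir, lang="eng"):
    """Run OCR with the traineddata files found in `tessdata_dir`."""
    import pytesseract  # imported here so the helper above works without it

    # Assumption: LSTM engine (--oem 1) and a single line of text (--psm 7).
    config = build_tesseract_config(tessdata_dir, oem=1, psm=7)
    return pytesseract.image_to_string(image, lang=lang, config=config)
```

For example, ocr_with_tessdata(gray, "/path/to/tessdata_best") would use the
"best" models, provided eng.traineddata from that set is in the directory.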


Zdenko


On Thu, 19 Sep 2019 at 11:55, 'Sandra M.' via tesseract-ocr <
[email protected]> wrote:

> I use Tesseract 3.02 with leptonica-1.68. What do you mean by tessdata_best?
> I'm new to this field and only know how to call Tesseract with the given
> line of code. How can the resolution be 0 dpi?
>
> I'm using this Python code:
>
> import argparse
> import os
> import cv2
> import pytesseract
> # construct the argument parse and parse the arguments
> ap = argparse.ArgumentParser()
> ap.add_argument("-i", "--image", required=True,
>     help="path to input image to be OCR'd")
> args = vars(ap.parse_args())
> # load the example image and convert it to grayscale
> image = cv2.imread(args["image"])
> gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
> # write the grayscale image to disk as a temporary file so we can
> # apply OCR to it
> filename = "{}.png".format(os.getpid())
> cv2.imwrite(filename, gray)
> # apply OCR to the grayscale array, then delete the temporary file
> text = pytesseract.image_to_string(gray)
> os.remove(filename)
> print("Output: " + text)
>
>
> On Thursday, 19 September 2019 at 11:23:50 UTC+2, zdenop wrote:
>>
>> Please provide more information (version info and how you run OCR; it
>> seems you are calling it from code).
>> I just tried the tesseract command line (tesseract 5.0.0-alpha-416-g408d6)
>> with tessdata_best and it works for me:
>> tesseract unnamed.png -
>> Warning: Invalid resolution 0 dpi. Using 70 instead.
>> Estimating resolution as 497
>> Calibrations
>>
>> Zdenko
>>
>>
>> On Thu, 19 Sep 2019 at 10:43, 'Sandra M.' via tesseract-ocr <
>> [email protected]> wrote:
>>
>>> [image: currentImage.png]
>>> @Lorenzo Blz: This is an example image. The output of my code is
>>> "calibrations". The letters do not all have the same height. Of course a
>>> lone "c" cannot be classified, but in the context of the other letters
>>> Tesseract should be able to tell whether it is a lower-case or a capital
>>> letter, I think. This image has no noise or anything else, so I don't
>>> understand the problem. Nevertheless, your suggestion to change the size
>>> helped! If I resize the image to 150% or 75%, for example, it works. I just
>>> don't know how to handle this later on, when I have no reference value. How
>>> do I decide which spelling is right, the one from the 100% image or the one
>>> from the 150% image? Or can one say that resizing the image in preprocessing
>>> always gives a more reliable result?
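
One possible way to cope without a reference value, sketched here under
assumptions and not proposed anywhere in this thread: run OCR at several
scales and keep the reading that most scales agree on. The function names,
the scale factors (0.75, 1.0, 1.5) and the tie-breaking rule are all
illustrative choices, and a majority vote is only a heuristic that can still
settle on the wrong case.

```python
from collections import Counter


def vote_on_readings(readings):
    """Return the most frequent non-empty reading; ties go to the earliest one."""
    cleaned = [r.strip() for r in readings if r and r.strip()]
    if not cleaned:
        return ""
    counts = Counter(cleaned)
    best = max(counts.values())
    for reading in cleaned:
        if counts[reading] == best:
            return reading


def ocr_at_scales(gray, scales=(0.75, 1.0, 1.5)):
    """OCR a grayscale image at several scales and vote on the result."""
    import cv2  # imported here so vote_on_readings works without OpenCV
    import pytesseract

    readings = []
    for scale in scales:
        # Resize by a relative factor; cubic interpolation is an arbitrary choice.
        resized = cv2.resize(gray, None, fx=scale, fy=scale,
                             interpolation=cv2.INTER_CUBIC)
        readings.append(pytesseract.image_to_string(resized))
    return vote_on_readings(readings)
```

If the 75% and 150% images both give "Calibrations" and only the 100% image
gives "calibrations", the vote returns "Calibrations".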
>>>
>>> On Wednesday, 18 September 2019 at 17:19:22 UTC+2, Sandra M. wrote:
>>>>
>>>> I'm using Tesseract with Python. I have an image with 1-6 words in it
>>>> and need to read the text. Sometimes the character "C", which looks the
>>>> same in upper and lower case, is detected as lower-case c instead of
>>>> upper-case C. I see why this is hard, but in the context of the following
>>>> letters it should be possible to detect the right case. Is there any
>>>> configuration option or something else to improve this?
>>>>
>>>> I had a look at the configuration option config='-psm x' with
>>>> different values for x, but nothing fits my problem.
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/e4ed704a-cee0-4bb2-80ae-9fc9b82ab55d%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/e4ed704a-cee0-4bb2-80ae-9fc9b82ab55d%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>

