Re: [tesseract-ocr] image_to_string() always delivers empty string (Python)

Zdenko Podobný Fri, 05 May 2017 09:38:22 -0700

Really? And do you think your image matches those examples?
E.g. the text runs in straight lines, there is no noise (just the text), the
DPI is OK, etc.?


You will never get good output from bad input.

Zdenko
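
The checks above (clean text, no noise, adequate DPI) can be approximated with
a little preprocessing before OCR. What follows is only a rough sketch under
stated assumptions, not a fix suggested in this thread: the helper names
otsu_threshold and prepare_for_ocr are hypothetical, the 2x upscale is an
arbitrary guess, and Otsu's method is one thresholding technique among several.
Only Pillow is needed, and it is imported inside the helper that uses it.

```python
def otsu_threshold(histogram):
    """Return the grey level (0..255) that maximises between-class variance.

    `histogram` is a list of 256 pixel counts, the shape Pillow's
    Image.histogram() returns for a mode "L" (greyscale) image.
    Pixels at or below the returned level form the dark class.
    """
    total = sum(histogram)
    weighted_total = sum(level * count for level, count in enumerate(histogram))
    best_level, best_variance = 0, -1.0
    dark_count = 0
    dark_sum = 0
    for level, count in enumerate(histogram):
        dark_count += count
        dark_sum += level * count
        light_count = total - dark_count
        if dark_count == 0 or light_count == 0:
            continue  # one class is empty, so no split exists at this level
        dark_mean = dark_sum / dark_count
        light_mean = (weighted_total - dark_sum) / light_count
        variance = dark_count * light_count * (dark_mean - light_mean) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def prepare_for_ocr(src_path, dst_path, scale=2):
    """Greyscale, upscale, stretch contrast, binarise, and save an image.

    Assumption: scale=2 is only a starting point; choose it so that the
    text in the result is large enough for the OCR engine.
    """
    from PIL import Image, ImageOps  # Pillow, imported lazily

    grey = Image.open(src_path).convert("L")
    grey = grey.resize((grey.width * scale, grey.height * scale), Image.LANCZOS)
    grey = ImageOps.autocontrast(grey)
    level = otsu_threshold(grey.histogram())
    binary = grey.point(lambda value: 255 if value > level else 0)
    binary.save(dst_path)  # a lossless format such as PNG avoids JPEG artefacts
    return binary
```

The image returned by prepare_for_ocr can be passed straight to
pytesseract.image_to_string(). If the photo still contains large areas without
text, cropping to the text region first is likely to matter more than the
threshold itself.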

On Fri, May 5, 2017 at 10:31 AM, anita josic <[email protected]> wrote:

> Hi
>
> I have read it now, but I still don't know what I need to use. I have
> already read a lot, but I still don't know which part is missing. I am
> hoping for real feedback and help. As you can see, I am not making much
> progress trying things on my own.
>
> On Friday, 5 May 2017 09:23:58 UTC+2, zdenop wrote:
>>
>> Did you read
>> https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality ?
>>
>> Zdenko
>>
>> On Fri, May 5, 2017 at 9:10 AM, anita josic <[email protected]> wrote:
>>
>>>
>>> <https://lh3.googleusercontent.com/-OmlROZ0oDU8/WQwkpyPuSiI/AAAAAAAAF0Y/K_vAR52DRMEfruiqxCObmEEk0HA1tuS3wCLcB/s1600/IMG_20170504_200627.jpg>
>>> Hello
>>>
>>> I am trying to extract text from a picture, but I always get an empty
>>> string.
>>> The picture passed to image_to_string('temp2.jpg') in the code is added
>>> below.
>>> I tried thresholding with OpenCV, but the result differed only slightly
>>> from the picture added below.
>>>
>>> Is there a step missing? Is the JPG picture format wrong? Is it
>>> impossible because of white and black fields appearing as text in the
>>> picture?
>>>
>>> I urgently need help and am hoping for a quick answer.
>>>
>>> #!/usr/bin/env python
>>> import os
>>> import subprocess
>>> from picamera.array import PiRGBArray
>>> from time import *
>>> from picamera import PiCamera
>>> from datetime import datetime, timedelta
>>> import cv2
>>> # Import from PIL directly: the legacy top-level "import Image" would
>>> # succeed on old installs and leave ImageEnhance undefined.
>>> from PIL import Image, ImageEnhance, ImageFilter
>>> import pytesseract
>>>
>>> # EXTRACT TEXT
>>> print('pytesser:')
>>> #img = Image.open('/home/pi/camera/IMAGE-2017-05-04_141433.png')
>>> img = Image.open('artikelbild-02.jpg')
>>> # boost the contrast, then reduce the image to 1-bit black and white
>>> im = img.convert('RGBA')
>>> enhancer = ImageEnhance.Contrast(im)
>>> im = enhancer.enhance(3)
>>> im = im.convert('1')
>>> im.save('temp2.jpg')
>>>
>>> # use the tesseract library to extract text from the saved image
>>> text = pytesseract.image_to_string(Image.open('temp2.jpg'))
>>>
>>> print("Text:" + text)
>>>
>>> # classify the courier by keywords found in the recognized text
>>> if "DHL" in text:
>>>     print('DHL Lieferant')  # "DHL courier"
>>> elif "Post" in text:
>>>     print('Postbote')  # "postman"
>>> elif "GLS" in text:
>>>
>>> ....
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/e97baa76-1ee5-49af-b824-766ab2ec0b03%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>>

