Hey guys
I have now got the picture into proper grayscale, so that not everything
is so "noisy" (image.convert('L') instead of image.convert('1')).
But there is still no output.
I think I really need to crop the image to the text and then remove the
background.
Maybe an expert can show me the best way to do this.
I think I could remove the background with a threshold once I have cropped
the text.
But I guess the crop always has to be done manually?
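
For reference, here is a minimal sketch of the crop-then-threshold idea using Pillow only. The helper name prepare_for_ocr, the crop box and the threshold value of 100 are made-up placeholders, not values tuned for the actual photo, and the synthetic demo image only stands in for a real picture.

```python
from PIL import Image


def prepare_for_ocr(img, box, threshold=128):
    """Crop to `box`, convert to grayscale, then binarize at `threshold`.

    `box` is (left, upper, right, lower) in pixels; both `box` and
    `threshold` are assumptions that must be tuned per image.
    """
    region = img.crop(box)        # keep only the area that holds the text
    gray = region.convert('L')    # 8-bit grayscale, values 0..255
    # Pixels brighter than the threshold become white, all others black.
    return gray.point(lambda p: 255 if p > threshold else 0)


# Synthetic stand-in for a photo: mid-gray background with a dark block
# where the "text" would be.
demo = Image.new('RGB', (100, 60), (160, 160, 160))
demo.paste((30, 30, 30), (20, 20, 80, 40))

clean = prepare_for_ocr(demo, box=(10, 10, 90, 50), threshold=100)
```

The returned image could then be handed to pytesseract.image_to_string(clean) directly, which would avoid the lossy round trip through a JPEG file.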
Cheers
On Friday, May 5, 2017 at 09:10:49 UTC+2, anita josic wrote:
>
>
> <https://lh3.googleusercontent.com/-OmlROZ0oDU8/WQwkpyPuSiI/AAAAAAAAF0Y/K_vAR52DRMEfruiqxCObmEEk0HA1tuS3wCLcB/s1600/IMG_20170504_200627.jpg>
> Hello
>
> I am trying to extract text from a picture, but I always get an empty
> text.
> The picture used in the code for image_to_string('temp2.jpg') is
> attached.
> I tried to threshold with OpenCV, but there was only a slight difference
> from the attached picture.
>
> Is there a step missing? Is the JPG picture format wrong? Is it impossible
> because of white and black fields appearing as text in the picture?
>
> I urgently need help and am hoping for a quick answer.
>
> #!/usr/bin/env python
> import os
> import subprocess
> from picamera.array import PiRGBArray
> from time import *
> from picamera import PiCamera
> from datetime import datetime, timedelta
> import cv2
> try:
>     # legacy PIL exposed its modules at top level
>     import Image, ImageEnhance, ImageFilter
> except ImportError:
>     from PIL import Image, ImageEnhance, ImageFilter
> import pytesseract
>
> # EXTRACT TEXT
> print('pytesser:')
> #img = Image.open('/home/pi/camera/IMAGE-2017-05-04_141433.png')
> img = Image.open('artikelbild-02.jpg')
> im = img.convert('RGBA')
> enhancer = ImageEnhance.Contrast(im)
> im = enhancer.enhance(3)  # boost contrast
> im = im.convert('1')  # 1-bit black and white
> im.save('temp2.jpg')
>
> # use the tesseract library to extract text from the saved image
> text = pytesseract.image_to_string(Image.open('temp2.jpg'))
>
> print("Text:" + text)
>
> # what the text contains
> if "DHL" in text:
>     print('DHL Lieferant')
> elif "Post" in text:
>     print('Postbote')
> elif "GLS" in text:
>
> ....