tesseract - reduce processing time

Inês Martins Thu, 18 Jun 2015 06:59:18 -0700

 

I am new to OCR, but so far I have achieved some good results: I am able to
extract text from images fairly well. What concerns me now is the time
required for processing. I am using tesseract 3.02.02 and
leptonica-1.71. My script does the following:

1. Receive a 2 MB JPEG image from a URL.
2. Resize the image to a width of 1000 pixels, with the height scaled
proportionally to the new width.
3. Convert the resized image to greyscale. NOTE: the image is now
only 60 KB.
4. Create 4 copies of the grey image and apply one PIL default filter to each:
'SHARPEN', 'SMOOTH', 'UnsharpMask'(radius=2, percent=150, threshold=3),
'AutoContrast'.
5. Binarize each filtered image like this:
image = image.point(lambda x: 0 if x<128 else 255, '1')
#refers to Convert RGB to black OR white
<http://stackoverflow.com/questions/18777873/convert-rgb-to-black-or-white>
6. Pass the images one by one to OCR with: text =
pytesseract.image_to_string(image)
7. Finally, do some text cleanup to verify valid tokens and apply
some forced replacements.
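
For reference, steps 2 to 6 above can be sketched roughly as follows. This is a minimal sketch, not my exact script: the function names (resize_to_width, binarize, prepare_variants, ocr_variants) and the optional ocr parameter are hypothetical, and the resize uses Pillow's default resampling because the steps do not name one.

```python
from PIL import Image, ImageFilter, ImageOps

TARGET_WIDTH = 1000  # width from step 2


def resize_to_width(image, width=TARGET_WIDTH):
    """Step 2: scale to a fixed width, keeping the aspect ratio."""
    height = max(1, round(image.height * width / image.width))
    return image.resize((width, height))


def binarize(image, threshold=128):
    """Step 5: map every pixel to pure black or pure white."""
    return image.point(lambda x: 0 if x < threshold else 255, '1')


def prepare_variants(image):
    """Steps 2-5: resize, convert to greyscale, filter four ways, binarize."""
    grey = resize_to_width(image).convert('L')  # step 3
    filtered = {  # step 4: one filter per copy of the grey image
        'SHARPEN': grey.filter(ImageFilter.SHARPEN),
        'SMOOTH': grey.filter(ImageFilter.SMOOTH),
        'UnsharpMask': grey.filter(
            ImageFilter.UnsharpMask(radius=2, percent=150, threshold=3)),
        'AutoContrast': ImageOps.autocontrast(grey),
    }
    return {name: binarize(img) for name, img in filtered.items()}


def ocr_variants(variants, ocr=None):
    """Step 6: run OCR on each variant, one at a time.

    The ocr argument is a hypothetical hook added for this sketch; by
    default it is pytesseract.image_to_string.
    """
    if ocr is None:
        import pytesseract
        ocr = pytesseract.image_to_string
    return {name: ocr(img) for name, img in variants.items()}
```

Note that with this structure every input image goes through OCR four times, once per filtered variant.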

What is taking so long? Where can I improve or speed things up a little? The
whole script takes 10 seconds to run and show results.
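
One way to see where the 10 seconds go would be to time each stage separately. The helper below is only a sketch under assumptions: time_stages is a hypothetical name, and each stage is assumed to be a function that takes the previous stage's result and returns the next one.

```python
import time


def time_stages(stages, value):
    """Run (name, function) stages in order, feeding each result to the next.

    Returns the final value and a list of (name, seconds) pairs.
    """
    timings = []
    for name, func in stages:
        start = time.perf_counter()
        value = func(value)
        timings.append((name, time.perf_counter() - start))
    return value, timings


def report(timings):
    """Print one line per stage, slowest first."""
    for name, seconds in sorted(timings, key=lambda t: t[1], reverse=True):
        print('%-12s %.3f s' % (name, seconds))
```

With the download, resize, filter, binarization, OCR and cleanup steps passed in as separate stages, the report would show which one dominates the total.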

