[tesseract-ocr] Recognition affected by blank space

AxB Sat, 05 Sep 2015 03:37:29 -0700

Hello everyone.

I just started using Tesseract-OCR 3.02 to recognise numbers only.


The numbers themselves are *probably* in Futura Bold, styled in a 
particular manner (see images).

Using the "digits" parameter, Tesseract-OCR either recognises the number 
perfectly or fails completely (returns a blank).
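
For reference, an invocation along these lines can be sketched in Python. The file names and the helper name are hypothetical, and the `-psm 7` option (treat the image as a single text line) is an assumption added here, not part of the setup described in this post.

```python
def build_tesseract_command(image_path, output_base, single_line=True):
    """Build a Tesseract 3.x command line that uses the "digits" config.

    Hypothetical helper: it only assembles the argument list and does not
    run anything.
    """
    cmd = ["tesseract", image_path, output_base]
    if single_line:
        # Assumption: the image holds one line of digits, so page
        # segmentation mode 7 (single text line) may be a better fit.
        cmd += ["-psm", "7"]
    # "digits" is the config file shipped with Tesseract that restricts
    # the character set to numbers.
    cmd.append("digits")
    return cmd


# Example (not executed here):
#   import subprocess
#   subprocess.call(build_tesseract_command("Test2.png", "result"))
#   # the recognised text would then be written to result.txt
```

Whether single-line mode changes the behaviour on loosely cropped images is not something tested here.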

After quite a bit of testing, it appears that the "crop" of the image 
is what makes or breaks recognition. For instance:

<https://lh3.googleusercontent.com/-I6vx1-5KxGY/VepwFvh_OmI/AAAAAAAAABw/kSXSI8qsJiU/s1600/Test1.png>
When poorly cropped as above, with quite a bit of horizontal and vertical 
blank space, the engine always fails to return anything.


<https://lh3.googleusercontent.com/-8IMD05QoIYY/VepweKPrTxI/AAAAAAAAAB4/EFfQGgoD4CM/s1600/Test2.png>
A crop like this, with some space for extra digits, fails in this 
particular example but succeeds at times.


<https://lh3.googleusercontent.com/--fH0jI8pEeQ/VepyLQAw6zI/AAAAAAAAACE/Qm22VlnbqGI/s1600/Test3.png>

A crop like this has so far always worked.

 
The problem is that I am capturing the image automatically and need to 
cover for a range of at least 5-7 digits. 

I would never need to crop as badly as the first example, but I do need 
more leeway than the last one allows.

Is there anything I could try to make something like the middle crop work 
better?
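
One direction that might be worth trying, given that the tight crop works, is to trim the blank margin automatically before handing the image to Tesseract. The sketch below is only an illustration under assumptions: the image is taken to be a grayscale grid (a list of rows of 0-255 values) with dark digits on a light background, and the function name, threshold and padding values are all hypothetical.

```python
def trim_blank_margin(pixels, threshold=128, padding=2):
    """Crop a grayscale image to the bounding box of its dark pixels.

    pixels    -- list of rows, each a list of 0-255 intensity values
    threshold -- values below this count as "ink" (assumed dark-on-light)
    padding   -- blank pixels to keep around the ink on every side

    Returns the cropped rows, or the input unchanged if no ink is found.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0

    # Rows and columns that contain at least one dark pixel.
    ink_rows = [y for y in range(height)
                if any(v < threshold for v in pixels[y])]
    if not ink_rows:
        return pixels
    ink_cols = [x for x in range(width)
                if any(row[x] < threshold for row in pixels)]

    # Expand the bounding box by the padding, clamped to the image edges.
    top = max(ink_rows[0] - padding, 0)
    bottom = min(ink_rows[-1] + padding, height - 1)
    left = max(ink_cols[0] - padding, 0)
    right = min(ink_cols[-1] + padding, width - 1)

    return [row[left:right + 1] for row in pixels[top:bottom + 1]]
```

The capture area could then stay wide enough for 5-7 digits, with the trimmed result saved through whatever image library is in use and passed to the engine. Noise or stray marks in the margin would defeat a simple bounding box like this, so some cleanup beforehand may be needed.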

Thanks.

