Everything Tom said, and I would also stress that I have had a lot of trouble 
with text that borders noise. The grey carton may be easy enough to remove, but 
the dark crease/join where the box closes and the proximity of the text to it 
will cause segmentation and recognition to fail. The angle matters too: in my 
tests Tesseract breaks at angles > 10 degrees. So yes, getting rid of these 
hard edges to allow the text to breathe is important.
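A rough way to check a crop against that ~10 degree limit before handing it to 
Tesseract is to estimate the dominant orientation of the ink pixels from their 
second-order image moments. This is a minimal sketch, not a Tesseract feature. 
It assumes an already binarized, tightly cropped region given as a 2D list where 
1 marks ink. The function names and the 10 degree default are illustrative 
only. Noise such as the carton crease will skew the estimate, so run it on the 
cleaned crop.

```python
import math


def skew_angle_degrees(ink):
    """Estimate text orientation from a binarized image (hypothetical helper).

    ink: 2D list of 0/1 values, 1 = ink pixel.
    Returns degrees; positive means the text rises to the right.
    """
    pts = [(x, y) for y, row in enumerate(ink) for x, v in enumerate(row) if v]
    n = len(pts)
    if n < 2:
        return 0.0  # not enough ink to estimate anything
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    # Central second-order moments of the ink distribution.
    mu20 = sum((x - mx) ** 2 for x, _ in pts) / n
    mu02 = sum((y - my) ** 2 for _, y in pts) / n
    mu11 = sum((x - mx) * (y - my) for x, y in pts) / n
    # Orientation of the principal axis. Image y grows downward,
    # so negate to report counter-clockwise (rising) text as positive.
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return -math.degrees(theta)


def needs_deskew(ink, limit_degrees=10.0):
    """True if the estimated skew exceeds the limit (assumed 10 degrees)."""
    return abs(skew_angle_degrees(ink)) > limit_degrees
```

For a single line of text the principal axis follows the baseline. For nearly 
square blobs (mu20 close to mu02 with mu11 near 0) the angle is unreliable, so 
treat the result as a hint rather than a measurement.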

> On 3 Jun 2016, at 17:44, Tom Morris <[email protected]> wrote:
> 
> 
>> On Friday, June 3, 2016 at 10:09:04 AM UTC-4, Cristian wrote:
>> 
>> I'm new to Tesseract. I'm working on an application that has to recognize 
>> the expiration date of products such as food. The input will be an image 
>> (very good resolution) with only the date on it.
>> Before putting my hands on the code, I'd appreciate it if some of you with 
>> more experience could give me some suggestions about the input image format, 
>> dimensions, colors, and also about possible Tesseract configurations, 
>> training data, etc.
>> The project is still in its early phase, so I can set up good starting 
>> conditions in order to get the best out of Tesseract.
> 
> Thanks for attaching an example image. That helps make the discussion more 
> concrete and productive.
> 
> The first thing that I'll note is that the image does not have only the date 
> on it, but also a textured carton with additional printing on top, a 
> background, etc. We often forget how much noise our human visual system 
> excludes automatically.
> 
> To compute a region of interest or crop box so that your image really does 
> only include the date and related text, look at using a text detection 
> algorithm such as the one included in OpenCV  
> http://docs.opencv.org/3.1.0/da/d56/group__text__detect.html
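Once a detector (for example the OpenCV text module linked above) has given you 
a bounding box, cropping to it and adding a plain margin keeps the text from 
sitting flush against a hard edge. This is a minimal pure-Python sketch, 
assuming a grayscale image as a 2D list of 0-255 values and a box in 
(x, y, w, h) form. The function name, the 10 pixel margin and the white (255) 
background are assumptions, not values from the OpenCV or Tesseract docs. With 
NumPy the same thing is array slicing followed by numpy.pad with 
mode="constant".

```python
def crop_with_margin(gray, box, margin=10, background=255):
    """Crop gray (2D list) to box = (x, y, w, h), then add a plain border.

    Hypothetical helper: the box is assumed to come from some text detector.
    """
    x, y, w, h = box
    height, width = len(gray), len(gray[0])
    # Clamp the box to the image bounds.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box does not overlap the image")
    new_width = (x1 - x0) + 2 * margin
    pad = [background] * margin
    # Top margin, cropped rows with left/right padding, bottom margin.
    out = [[background] * new_width for _ in range(margin)]
    for row in gray[y0:y1]:
        out.append(pad + row[x0:x1] + pad)
    out.extend([background] * new_width for _ in range(margin))
    return out
```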
> 
> Tesseract is going to work on a bitonal image. Since you have more knowledge 
> about the conditions the image was made under, the subject, etc., you can 
> probably do a better job of converting it to bitonal yourself. For resolution, 
> check the FAQ. There are some guidelines there about the height of characters, etc. 
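One common global way to do that conversion is Otsu's method, which picks the 
grey level that maximizes the between-class variance of the image histogram. 
This is a minimal pure-Python sketch, assuming an 8-bit grayscale image as a 2D 
list and dark text on a light background (invert the result for light-on-dark 
printing). OpenCV offers the same via cv2.threshold with the cv2.THRESH_OTSU 
flag. Because you know the lighting and the packaging, a fixed or locally 
adaptive threshold may well beat it.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold (0-255) for an 8-bit grayscale 2D list."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))
    w_bg = 0    # number of pixels at or below t
    sum_bg = 0  # sum of grey levels at or below t
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break  # every pixel is in one class; nothing left to split
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray, threshold=None):
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    t = otsu_threshold(gray) if threshold is None else threshold
    return [[255 if v > t else 0 for v in row] for row in gray]
```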
> 
> Good luck!
> 
> Tom
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/tesseract-ocr/d3563f4b-c8f0-4989-bbee-42cbeda10c1e%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/FD013438-519B-4D19-96BD-FAF6FB3BE2F1%40gmail.com.
