Hi, I am working on an iPhone application that recognizes ISBN numbers (e.g. 
ISBN: 978-83-7380-900-0). I use Tesseract for this, but it is not working 
very well. I have seen other applications that use the same engine and get 
better results.

To limit the characters, I use this config line: 
tess->SetVariable("tessedit_char_whitelist", "SN:0123456789X-"); so every "I" 
is read as "1" and every "B" as "8". This way it won't make mistakes on 
those letters, which are not important to me. After that I use a regular 
expression to find the relevant part of the recognized text.
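
A minimal sketch of what such a regular-expression step could look like, 
assuming C++11 std::regex and an ISBN-13 with a 978/979 prefix. The function 
name findIsbn13 and the exact pattern are illustrative assumptions, not the 
code from the app:

```cpp
#include <regex>
#include <string>

// Hypothetical helper (name and pattern are assumptions for illustration).
// Returns the 13 digits of the first ISBN-13 candidate found in the OCR
// output, or an empty string if no candidate is found.
std::string findIsbn13(const std::string& ocrText) {
    // Prefix 978 or 979, then ten more digits, each optionally preceded
    // by a hyphen or a space.
    static const std::regex pattern("97[89](?:[- ]?[0-9]){10}");
    std::smatch match;
    if (!std::regex_search(ocrText, match, pattern)) {
        return "";
    }
    // Keep only the digits, dropping hyphens and spaces.
    std::string digits;
    for (char c : match.str()) {
        if (c >= '0' && c <= '9') {
            digits += c;
        }
    }
    return digits;
}
```

With the whitelist above, the label "ISBN" would typically come out as 
something like "1S8N", so the pattern deliberately ignores the label and 
looks only for the digit groups.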

I also crop the image so that Tesseract only processes the part where the 
ISBN is visible (I placed a colored rectangle on the camera overlay, so the 
user has to place the code in the correct place). I also resize the image to 
a width of 1000 px (I have tried other sizes as well).

It works quite well when the light is excellent, but it is really hard to 
recognize correctly when the lighting isn't perfect.

The last digit of the ISBN is a check digit.
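
This means a recognized number can be verified before it is accepted. A 
minimal sketch of the standard ISBN-13 check (digits weighted 1, 3, 1, 3, ... 
must sum to a multiple of 10); the function name isValidIsbn13 is a 
hypothetical helper and it assumes hyphens have already been stripped:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (the name is an assumption for illustration).
// Expects exactly 13 digits with hyphens already removed.
bool isValidIsbn13(const std::string& digits) {
    if (digits.size() != 13) {
        return false;
    }
    int sum = 0;
    for (std::size_t i = 0; i < digits.size(); ++i) {
        const char c = digits[i];
        if (c < '0' || c > '9') {
            return false;  // reject anything that is not a digit
        }
        // Weights alternate 1, 3, 1, 3, ... starting from the first digit.
        const int weight = (i % 2 == 0) ? 1 : 3;
        sum += weight * (c - '0');
    }
    // The check digit is chosen so that the weighted sum is divisible by 10.
    return sum % 10 == 0;
}
```

For the example above, 9788373809000 gives a weighted sum of 140, so it 
passes. A misread digit usually breaks the sum, so a failed check could be 
used to reject the result and take another frame.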

What can I do to make it work better? Is there any way to tell Tesseract to 
recognize only text that matches a given regular expression? Or should I 
preprocess the image in some way first?

Sample images that are not recognized correctly:
http://img412.imageshack.us/i/img0367si.jpg/
http://img264.imageshack.us/i/img0361d.jpg/

At the moment I am also taking pictures at 2x zoom so the camera does not 
have to be so close to the object. It gives better results, but it is easier 
to move the camera and end up with a blurry image.
