You will find the PNG, the box file, and the apply_boxes messages in the attachment.

thanks in advance!

I think I know what the issue could be here. Refer to
http://code.google.com/p/tesseract-ocr/issues/detail?id=446&can=5.
Although you are using a different layout mode, that issue may still
apply.

In brief, for small images Tesseract confuses background and foreground
pixels. As a result it treats the inner holes of characters as characters
and recognizes them as such. To avoid this, you can try adding more
characters to the training image or patching the Tesseract
code; I've indicated what should be changed in the issue.

However, I might be wrong. To give more relevant advice I need to see
your images, command line, etc.

Warm regards,
Dmitri Silaev
www.CustomOCR.com





On Thu, May 26, 2011 at 5:30 AM, Joyse1<[email protected]>  wrote:
Hi,
   I have a small font (Microsoft Sans Serif, size 8, string to learn: " 0 1 2 3 4
5 6 7 8 9 . , : "). I can't train recognition of glyphs that are only a few
pixels in size (e.g. ".", ",", ":"). I get failures when generating the tr files.
I have two versions of Tesseract: one with the layout analyzer turned on, and
one with the one-word-only option turned on. The only difference between them
is that with the one-word option (PSM_ONE_WORD in Tesseract) it generates a box
for the comma and recognizes it. So with that option I get failures
("no blobs ...") only for "." and ":" (with the layout analyzer turned on I get
failures for all three: ". , :"). I don't think that changing the one-word
option to single-char would help here. Could somebody please tell me what the
solution is (without resizing the training images)?

Best
Jakub

--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en


0 7 198 13 208 0
1 16 198 20 208 0
2 25 198 31 208 0
3 34 198 40 208 0
4 43 198 49 208 0
5 52 198 58 208 0
6 61 198 67 208 0
7 70 198 76 208 0
8 79 198 85 208 0
9 88 198 94 208 0
. 97 198 99 200 0
, 103 197 106 200 0
: 109 198 111 205 0
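
To make the sizes in the box data above easier to see, here is a minimal Python sketch that parses the box lines (symbol, left, bottom, right, top, page, with the origin at the bottom-left of the image) and lists the glyphs whose box is narrower or shorter than a threshold. The names `Box`, `parse_box_lines`, `tiny_boxes` and the `min_side=3` threshold are illustrative assumptions for this thread, not part of Tesseract, and the threshold says nothing about Tesseract's internal limits.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One line of a Tesseract box file (coordinates in pixels)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.top - self.bottom


def parse_box_lines(text: str) -> List[Box]:
    """Parse 'symbol left bottom right top page' lines; skip anything else."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # blank or malformed line
        symbol, *numbers = parts
        boxes.append(Box(symbol, *map(int, numbers)))
    return boxes


def tiny_boxes(boxes: List[Box], min_side: int = 3) -> List[Box]:
    """Return boxes whose width or height is below min_side (hypothetical threshold)."""
    return [b for b in boxes if b.width < min_side or b.height < min_side]


# Box data copied from this message.
BOX_TEXT = """\
0 7 198 13 208 0
1 16 198 20 208 0
2 25 198 31 208 0
3 34 198 40 208 0
4 43 198 49 208 0
5 52 198 58 208 0
6 61 198 67 208 0
7 70 198 76 208 0
8 79 198 85 208 0
9 88 198 94 208 0
. 97 198 99 200 0
, 103 197 106 200 0
: 109 198 111 205 0
"""

if __name__ == "__main__":
    for b in tiny_boxes(parse_box_lines(BOX_TEXT)):
        print(f"{b.symbol!r}: {b.width}x{b.height} px")
```

With this arbitrary threshold the sketch reports "." (2x2 px) and ":" (2x7 px); the comma is 3x3 px and the digits are at least 4 px wide. This is only a geometric summary of the box file.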

<<attachment: apply_boxes_info.PNG>>

<<attachment: normal.PNG>>
