You will find the PNG, the box file, and the apply_boxes messages in the attachments.
Thanks in advance!
I think I know what the issue could be here. Refer to http://code.google.com/p/tesseract-ocr/issues/detail?id=446&can=5. Even though you are using a different layout mode, that issue may still apply. In brief, for small images Tesseract confuses background and foreground pixels. That is why it treats the inner holes of characters as characters and recognizes them as such. To avoid this, you can try adding more characters to the training image, or make corrections to the Tesseract code; I've indicated what should be done inside the issue.

However, I might be wrong. To give more relevant advice I need to see your images, command line, etc.

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Thu, May 26, 2011 at 5:30 AM, Joyse1<[email protected]> wrote:

Hi,

I have a small font (Microsoft Sans Serif, size 8; string to learn: " 0 1 2 3 4 5 6 7 8 9 . , : "). I can't train recognition of single-pixel characters (e.g. ".", ",", ":"). I get failures when generating the tr files.

I have two versions of Tesseract: one with layout analysis turned on, and one with the one_word_only option turned on. The only difference between them is that with the one-word option (PSM_ONE_WORD in Tesseract) it generates a box for the comma and recognizes it. So I get failures ("no blobs ...") only for "." and ":" (with layout analysis turned on I get failures for all three: ". , :"). I don't think that changing the one_word option to single_char would help here.

Could somebody please tell me what the solution is here (without resizing the training images)?

Best,
Jakub

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
0 7 198 13 208 0
1 16 198 20 208 0
2 25 198 31 208 0
3 34 198 40 208 0
4 43 198 49 208 0
5 52 198 58 208 0
6 61 198 67 208 0
7 70 198 76 208 0
8 79 198 85 208 0
9 88 198 94 208 0
. 97 198 99 200 0
, 103 197 106 200 0
: 109 198 111 205 0
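For anyone following along, here is a rough Python sketch (not part of Tesseract) that reads the box entries above and lists the glyphs whose boxes are only a few pixels across, which are the ones reported as failing. It assumes the usual box file layout of glyph, left, bottom, right, top, page. The names `Box`, `parse_box_lines`, `tiny_boxes` and the `MIN_SIDE` threshold of 4 pixels are made up for illustration and do not correspond to any Tesseract setting.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """One box file entry: glyph, left, bottom, right, top, page."""
    glyph: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.top - self.bottom


def parse_box_lines(text: str) -> List[Box]:
    """Parse box file text, skipping lines that do not have six fields."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue
        glyph, *numbers = parts
        boxes.append(Box(glyph, *(int(n) for n in numbers)))
    return boxes


# Hypothetical threshold, chosen only to separate the digits from the
# punctuation in this particular box file.
MIN_SIDE = 4


def tiny_boxes(boxes: List[Box], min_side: int = MIN_SIDE) -> List[Box]:
    """Return boxes whose width or height is below min_side pixels."""
    return [b for b in boxes if b.width < min_side or b.height < min_side]


# The entries posted in this thread.
BOX_TEXT = """\
0 7 198 13 208 0
1 16 198 20 208 0
2 25 198 31 208 0
3 34 198 40 208 0
4 43 198 49 208 0
5 52 198 58 208 0
6 61 198 67 208 0
7 70 198 76 208 0
8 79 198 85 208 0
9 88 198 94 208 0
. 97 198 99 200 0
, 103 197 106 200 0
: 109 198 111 205 0
"""

if __name__ == "__main__":
    for box in tiny_boxes(parse_box_lines(BOX_TEXT)):
        print(f"{box.glyph!r}: {box.width}x{box.height} px")
```

With this data the flagged glyphs are ".", "," and ":", the same three characters that fail with layout analysis turned on. This only measures the boxes; it does not explain or fix the "no blobs" failure itself.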
<<attachment: apply_boxes_info.PNG>>
<<attachment: normal.PNG>>

