The problem is probably that the textline finder is splitting your characters across multiple text lines. It is not supposed to do this, but it sometimes does. applybox needs a fix so that it still works in this situation.

Ray.
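One rough way to check for this is to count how many rows the box file itself implies and compare that with the row count apply_boxes reports (10 rows in the log quoted below). The following is a minimal Python sketch, assuming the box file uses one `symbol left bottom right top` entry per line. The function names, the sample coordinates, and the overlap rule are made up for illustration and are not how Tesseract's textline finder works.

```python
def parse_boxes(text):
    """Parse box-file text into (symbol, left, bottom, right, top) tuples."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append((parts[0], left, bottom, right, top))
    return boxes


def group_into_rows(boxes):
    """Group boxes whose vertical extents overlap (illustrative rule only)."""
    rows = []  # each row is [bottom, top, [symbols]]
    for symbol, _left, bottom, _right, top in sorted(boxes, key=lambda b: b[2]):
        for row in rows:
            if bottom < row[1] and top > row[0]:  # vertical ranges overlap
                row[0] = min(row[0], bottom)
                row[1] = max(row[1], top)
                row[2].append(symbol)
                break
        else:
            rows.append([bottom, top, [symbol]])
    return rows


# Hypothetical box-file content: two lines of two digits each.
sample = """\
1 10 100 30 140
2 40 102 60 141
3 10 40 30 80
4 40 38 60 79
"""

rows = group_into_rows(parse_boxes(sample))
print("rows implied by box file:", len(rows))
for bottom, top, symbols in rows:
    print(bottom, top, "".join(symbols))
```

If the box file implies far fewer rows than apply_boxes reports, that would be consistent with characters being split across text lines.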
On Thu, May 14, 2009 at 11:26 PM, Raj <[email protected]> wrote:
>
> Hi friends,
>
> I'm working on Tesseract to recognize a 7-segment display in a C#
> application.
>
> I have successfully trained Tesseract and it is working perfectly
> with the digital meter images.
>
> But I have one problem.
>
> I have one more digital meter image where the digits are
> broken/segmented, and I want the application to recognize it as well.
> When I tried to train on samples of the digits .,0,1,2,3,4,5,6,7,8,9,
> I got only "." (dot) and "7" in the "TR" file, and for the other
> digits I got the messages below in the Tesseract text file. I have
> checked the image and the box file, and the boxes for each of the
> digits are correct, i.e. the coordinates of each digit's box are
> correct.
>
> You can have a look at the sample image used for training Tesseract at
> http://www.flickr.com/photos/30806...@n02/3532294181/sizes/l/
>
> Tesseract Open Source OCR Engine
> Image has 24 bits per pixel and size (966,520)
> Resolution=0
> APPLY_BOXES: FATALITY - 0 labelled samples of "0" - target is 10
> APPLY_BOXES: FATALITY - 0 labelled samples of "1" - target is 10
> APPLY_BOXES: FATALITY - 0 labelled samples of "2" - target is 10
> APPLY_BOXES: FATALITY - 0 labelled samples of "3" - target is 10
> APPLY_BOXES: FATALITY - 0 labelled samples of "4" - target is 10
> APPLY_BOXES: FATALITY - 0 labelled samples of "5" - target is 10
> APPLY_BOXES: FATALITY - 0 labelled samples of "6" - target is 10
> APPLY_BOXES: FATALITY - 0 labelled samples of "8" - target is 10
> APPLY_BOXES: FATALITY - 0 labelled samples of "9" - target is 10
> APPLY_BOXES:
>    Boxes read from boxfile: 108
>    Initially labelled blobs: 18 in 10 rows
>    Box failures detected: 90
>    Duped blobs for rebalance: 0
>    "0" has fewest samples: 0
>    Total unlabelled words: 27
>    Final labelled words: 18
> Generating training data
> TRAINING ... Font name = UnknownFont.
> Generated training data for 18 blobs
>
> I have studied the documentation, which says:
>
> "If there are FATALITIES reported, then there is no point
> continuing with the training process until you fix the box file. A
> FATALITY usually indicates that this step failed to find any training
> samples of one of the characters listed in your box file. Either the
> coordinates are wrong, or there is something wrong with the image of
> the character concerned. If there is no workable sample of a
> character, it can't be recognized, and the generated inttemp file
> won't match the unicharset file later and Tesseract will abort."
>
> I just wanted to know whether it is possible to train Tesseract for
> segmented/broken digits?
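
To see at a glance which characters apply_boxes could not label, the log above can be summarized with a short script. This is a minimal Python sketch, assuming the log has been saved as text in the format quoted above. The regular expressions and the `summarize` helper are written against these lines only and may not match the output of other Tesseract versions.

```python
import re

# Patterns based only on the log lines quoted in this thread.
FATALITY_RE = re.compile(
    r'APPLY_BOXES: FATALITY - (\d+) labelled samples of "(.+?)" - target is (\d+)'
)
COUNTER_RE = re.compile(
    r"(Boxes read from boxfile|Initially labelled blobs|Box failures detected):\s*(\d+)"
)


def summarize(log):
    """Return (characters with a FATALITY, dict of summary counters)."""
    missing = [m.group(2) for m in FATALITY_RE.finditer(log)]
    counters = {name: int(value) for name, value in COUNTER_RE.findall(log)}
    return missing, counters


# Excerpt of the log from the message above.
LOG = """\
APPLY_BOXES: FATALITY - 0 labelled samples of "0" - target is 10
APPLY_BOXES: FATALITY - 0 labelled samples of "1" - target is 10
APPLY_BOXES:
   Boxes read from boxfile: 108
   Initially labelled blobs: 18 in 10 rows
   Box failures detected: 90
"""

missing, counters = summarize(LOG)
print("characters with no samples:", missing)
print("boxes read:", counters["Boxes read from boxfile"])
print("labelled:", counters["Initially labelled blobs"])
print("failures:", counters["Box failures detected"])
```

In this log the failure count equals the boxes read minus the initially labelled blobs (108 - 18 = 90), so most boxes were never matched to a blob.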

