Please post this as an issue at http://code.google.com/p/tesseract-ocr/issues/list. I am having similar problems too.
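The "row xheight=..., but median xheight = ..." lines in the quoted 3.02 log suggest that 3.02 estimated very different text heights for different rows of the training image. As a rough first check, a small script can list the boxes in the box file whose height is far from the median box height. This sketch assumes the 3.x box-file layout `<char> <left> <bottom> <right> <top> <page>`. `parse_box_lines` and `flag_height_outliers` are hypothetical helpers, not part of Tesseract, and the 1.5x threshold is an arbitrary assumption. Box heights are also not the same thing as Tesseract's own xheight estimate, so treat the output as a hint about where to look, not as a diagnosis.

```python
import statistics
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Box:
    """One line of a Tesseract 3.x box file (assumed layout:
    <char> <left> <bottom> <right> <top> <page>)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    @property
    def height(self) -> int:
        return self.top - self.bottom


def parse_box_lines(lines: Iterable[str]) -> List[Box]:
    """Hypothetical helper: parse box-file lines, skipping blank or malformed ones."""
    boxes = []
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # not a <char> + 5 integers line
        try:
            coords = [int(p) for p in parts[1:]]
        except ValueError:
            continue  # non-numeric coordinates
        boxes.append(Box(parts[0], *coords))
    return boxes


def flag_height_outliers(boxes: List[Box], ratio: float = 1.5) -> List[Box]:
    """Hypothetical helper: return boxes whose height is more than `ratio`
    times larger or smaller than the median box height (1.5 is arbitrary)."""
    if not boxes:
        return []
    median_h = statistics.median(b.height for b in boxes)
    return [
        b for b in boxes
        if b.height > median_h * ratio or b.height * ratio < median_h
    ]


if __name__ == "__main__":
    # Boxes rebuilt from three APPLY_BOXES failure lines in the 3.02 log
    # ((left,bottom),(right,top)); page number 0 is an assumption.
    sample = [
        "N 756 1844 766 1872 0",
        "' 1560 1648 1562 1656 0",
        "W 1668 1628 1678 1656 0",
    ]
    for box in flag_height_outliers(parse_box_lines(sample)):
        print(repr(box.char), "height", box.height)
```

On these three sample boxes, only the apostrophe (height 8 against a median of 28) is flagged, which is expected for punctuation. To scan the real file, open `01.box` with `encoding="utf-8"` and pass the file object to `parse_box_lines`. Then look for clusters of flagged capitals rather than isolated punctuation.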
Shree

Shree Devi Kumar
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Sun, Jun 30, 2013 at 3:21 AM, TedJ <[email protected]> wrote:
> Hello,
>
> Can anyone tell me why 3.01 works flawlessly for the command line
> 'tesseract 01.bmp 01 nobatch box.train' operating on the attached source
> bmp and box files, with this output:
>
> Tesseract Open Source OCR Engine v3.01 with Leptonica
> APPLY_BOXES:
>   Boxes read from boxfile: 4883
>   Boxes failed resegmentation: 0
>   Found 4883 good blobs and 0 unlabelled blobs in 0 words.
>   0 remaining unlabelled words deleted.
> TRAINING ... Font name = UnknownFont
> Generated training data for 1011 words
>
> but 3.02 produces this output:
>
> Tesseract Open Source OCR Engine v3.02 with Leptonica
> row xheight=11.3333, but median xheight = 19.4987
> row xheight=10, but median xheight = 19.4987
> row xheight=10, but median xheight = 19.4987
> row xheight=10, but median xheight = 19.4987
> row xheight=10, but median xheight = 19.4987
> row xheight=10, but median xheight = 19.4987
> row xheight=10.6667, but median xheight = 19.4987
> row xheight=16.5455, but median xheight = 19.4987
> row xheight=16.5455, but median xheight = 19.4987
> row xheight=16.5455, but median xheight = 19.4987
> row xheight=12, but median xheight = 19.4987
> row xheight=16.5455, but median xheight = 19.4987
> row xheight=16.5455, but median xheight = 19.4987
> row xheight=16.5455, but median xheight = 19.4987
> row xheight=16.5455, but median xheight = 19.4987
> row xheight=16.5455, but median xheight = 19.4987
> row xheight=13, but median xheight = 19.4987
> row xheight=16.5455, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> row xheight=28.5, but median xheight = 19.4987
> FAIL!
> APPLY_BOXES: boxfile line 38/N ((756,1844),(766,1872)): FAILURE! Couldn't find a matching blob
> FAIL!
> APPLY_BOXES: boxfile line 148/N ((1172,1808),(1182,1836)): FAILURE! Couldn't find a matching blob
> FAIL!
> APPLY_BOXES: boxfile line 233/N ((948,1772),(958,1800)): FAILURE! Couldn't find a matching blob
> FAIL!
> APPLY_BOXES: boxfile line 331/N ((948,1736),(958,1764)): FAILURE! Couldn't find a matching blob
> FAIL!
> APPLY_BOXES: boxfile line 414/N ((644,1700),(654,1728)): FAILURE! Couldn't find a matching blob
> FAIL!
> APPLY_BOXES: boxfile line 463/N ((1668,1700),(1678,1728)): FAILURE! Couldn't find a matching blob
> FAIL!
> APPLY_BOXES: boxfile line 627/M ((1204,1628),(1214,1656)): FAILURE! Couldn't find a matching blob
> FAIL!
> APPLY_BOXES: boxfile line 647/' ((1560,1648),(1562,1656)): FAILURE! Couldn't find a matching blob
> FAIL!
> APPLY_BOXES: boxfile line 653/W ((1668,1628),(1678,1656)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES:
>   Boxes read from boxfile: 4883
>   Boxes failed resegmentation: 173
> APPLY_BOXES: Unlabelled word at :Bounding box=(1622,1844)->(1630,1860)
>   Found 4710 good blobs.
>   Leaving 4 unlabelled blobs in 0 words.
>   1 remaining unlabelled words deleted.
> TRAINING ... Font name = UnknownFont
> Generated training data for 1272 words
>
> I think I've ruled out character and line spacing issues, and the idealized
> fonts appear perfect. What could be wrong? You'll notice that 3.02 fails
> first on a series of N's and then on other characters. Note that I changed
> the mix of uppercase and lowercase characters in the standard "than Phone:"
> training text to all uppercase. I changed nothing else between installing
> 3.01, running it to produce the language 01, then uninstalling 3.01 and
> installing 3.02 instead on Windows. I've removed most of the APPLY_BOXES
> errors for brevity.
>
> Can anyone tell me what the cryptic row xheight and median xheight output
> says about how Tesseract is interpreting the source bmp? Could it have to
> do with relative line length? Why is 3.02.02 failing so often when 3.01
> is not?
>
> Thanks,
> Ted
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.

