Your input image file is bad, and you will keep getting errors until you fix it. Tesseract uses binary images [1] for OCR (and for training). See the attachment, which was created via api->DumpPGM (I converted it from PGM to PNG).
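As a rough illustration of what "binary image" means here, this is a minimal pure-Python sketch that thresholds an 8-bit grayscale image with Otsu's method and flips the result if the background comes out dark. It is not what Tesseract does internally; the function names (otsu_threshold, binarize) and the rows-of-integers image representation are assumptions made for this example only, as is the choice to normalise to dark text on a light background.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold (0-255) for a flat list of 8-bit gray values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = 0      # number of pixels at or below the candidate threshold
    sum_bg = 0    # sum of their gray values
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance; Otsu picks the threshold that maximises it.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Binarize a grayscale image given as a list of rows of ints in 0..255.

    Assumption for this sketch: the output should be dark text (0) on a
    light background (255), so the result is inverted when most pixels
    end up dark.
    """
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    binary = [[0 if p <= t else 255 for p in row] for row in image]
    dark = sum(v == 0 for row in binary for v in row)
    if dark * 2 > len(flat):
        binary = [[255 - v for v in row] for row in binary]
    return binary


# Gray "text" (128) on a black background (0), similar to the hard case
# described in the original message.
sample = [
    [0, 0, 0, 0],
    [0, 128, 128, 0],
    [0, 0, 0, 0],
]
for row in binarize(sample):
    print(row)
```

In practice you would load the pixel values from the TIFF with an imaging library or tool of your choice, binarize, and save the result before running the box.train step again.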
[1] http://en.wikipedia.org/wiki/Binary_image

--
Zdenko

On Thu, Jun 7, 2012 at 5:11 AM, cchhsu <[email protected]> wrote:
> I'm new to tesseract. I'm trying to train a font using the following
> instructions.
> (http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3)
>
> I made the box file successfully:
> - tesseract moe.calibri.exp0.tif moe.calibri.exp0 -l eng batch.nochop makebox
> - Used the external jTessBoxEditor tool to adjust the coordinates in the box file.
>
> Then I tried to generate a box.train file with this command:
> C:\Program Files\Tesseract-OCR\001>tesseract moe.calibri.exp0.tif moe.calibri.exp0 nobatch box.train
>
> I get this unexpected output:
> ---
> Tesseract Open Source OCR Engine v3.01 with Leptonica
> Page 0
> APPLY_BOXES: boxfile line 3/r ((28,201),(39,223)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 4/e ((44,201),(59,223)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 5/m ((65,201),(93,223)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 6/i ((96,202),(103,231)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 7/u ((107,201),(125,224)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 8/m ((129,201),(158,224)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 9/G ((318,201),(337,231)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 10/e ((343,202),(357,223)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 11/n ((365,202),(379,223)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 12/r ((386,202),(396,223)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 13/e ((401,202),(415,223)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 14/S ((574,201),(589,231)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 15/e ((592,201),(607,223)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 16/n ((614,201),(629,223)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 17/s ((636,201),(647,223)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 18/M ((651,202),(680,231)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 19/e ((684,201),(700,223)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 20/T ((704,215),(713,227)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES: boxfile line 21/M ((716,215),(728,227)): FAILURE! Couldn't find a matching blob
> Box file format error on line 1; ignored
> APPLY_BOXES:
> Boxes read from boxfile: 33
> Boxes failed resegmentation: 19
> Found 14 good blobs and 0 unlabelled blobs in 0 words.
> 0 remaining unlabelled words deleted.
> TRAINING ... Font name = calibri
> LearnBLob: CharDesc was NULL. Aborting.
> Generated training data for 3 words
>
> C:\Program Files\Tesseract-OCR\001>
>
>
> After getting this unexpected output, I tried two things to solve it.
> 1. Change the hue/contrast values in the image file, then run the
> box.train command again.
> I get fewer errors.
> 2. Restore the box file to its initial state (that is, the box file as
> generated, without changing any values).
> Then I can generate the box.train file successfully.
> - I opened the box file in jTessBoxEditor to verify.
> I found that the black characters on a white background are
> identified, but the gray characters on a black background are hard to
> identify.
>
> I have uploaded the reference files to my SkyDrive space.
> You can download them if you are interested. (http://sdrv.ms/MIOzZF)
>
> But I don't know how to solve this issue. Please help me solve it.
> Thanks.
>
> tesseract 3.01
> OS: Windows XP
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
<<attachment: moe.calibri.exp0.png>>

