More information and a question:

- Overlaying the boxes that Tesseract found on the image shows that one box is correct, while the other is actually the inside of an enclosed region of the character.
- Does anyone know how to stop Tesseract from resegmenting and make it use the provided bounding box for training? (The bounding box is certain to be correct because it is generated, not manually marked or computer-detected.)

Regards,
W. K. Lo

On Tuesday, February 19, 2013 1:03:27 PM UTC+8, W. K. LO wrote:
> Dear Tesseract users/developers,
>
> I have a problem using Tesseract to train a Chinese OCR. Examples are
> described as follows:
>
> 1. Empty page, even though the TIF is not empty and the box file
> bounds the character tightly
>
> .\tesseract test.ming.24.tif test.ming.24 batch.nochop box.train
>
> === begin output ===
> Tesseract Open Source OCR Engine v3.02 with Leptonica
> Empty page!!
> Empty page!!
> === end output ===
>
> 2. Failed resegmentation (explicitly telling Tesseract that there is
> only one character)
>
> .\tesseract test.ming.24.tif test.ming.24 -psm 10 batch.nochop box.train
>
> === begin output ===
> Tesseract Open Source OCR Engine v3.02 with Leptonica
> Bounding box=(16,23)->(28,32)
> Bounding box=(16,15)->(28,24)
> APPLY_BOXES: boxfile line 0/??((8,14),(36,41)): FAILURE! Couldn't find a matching blob
> APPLY_BOXES:
> Boxes read from boxfile: 1
> Boxes failed resegmentation: 1
> APPLY_BOXES: Unlabelled word at :Bounding box=(16,15)->(28,32)
> APPLY_BOXES: Unlabelled word at :Bounding box=(8,14)->(36,41)
> Found 0 good blobs.
> 2 remaining unlabelled words deleted.
> Generated training data for 0 words
> === end output ===
>
> Can anyone help?
>
> Regards,
> W. K. Lo
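
For anyone who wants to repeat the overlay check numerically instead of by eye, here is a minimal Python sketch. It only does rectangle arithmetic on the coordinates printed in the log above and does not call Tesseract. The names `Box`, `parse_boxes`, `provided`, and `merged` are hypothetical helpers made up for this illustration, and the reading of each pair as (x0,y0)->(x1,y1) corner coordinates is an assumption based on how the log prints them.

```python
import re
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle given by two corners, as printed in the log."""
    x0: int
    y0: int
    x1: int
    y1: int

    def contains(self, other: "Box") -> bool:
        # True when `other` lies entirely inside this rectangle.
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)

    def union(self, other: "Box") -> "Box":
        # Smallest rectangle covering both boxes.
        return Box(min(self.x0, other.x0), min(self.y0, other.y0),
                   max(self.x1, other.x1), max(self.y1, other.y1))


# Matches lines such as "Bounding box=(16,23)->(28,32)".
BOX_RE = re.compile(r"Bounding box=\((\d+),(\d+)\)->\((\d+),(\d+)\)")


def parse_boxes(log: str) -> List[Box]:
    """Extract every 'Bounding box=' rectangle from a chunk of log text."""
    return [Box(*map(int, m.groups())) for m in BOX_RE.finditer(log)]


# The two blobs reported before the APPLY_BOXES failure.
LOG = """Bounding box=(16,23)->(28,32)
Bounding box=(16,15)->(28,24)
"""

# The box read from the box file, taken from the FAILURE line.
provided = Box(8, 14, 36, 41)

blobs = parse_boxes(LOG)
merged = blobs[0].union(blobs[1])

for blob in blobs:
    print(blob, "inside provided box:", provided.contains(blob))
print("union of the two blobs:", merged)
```

With the numbers from the log, both blobs lie inside the box-file rectangle (8,14)->(36,41), and their union is (16,15)->(28,32), the same rectangle that is later reported as an unlabelled word. That is consistent with the character being split into two pieces, but the sketch says nothing about why Tesseract rejects the match.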

