Hello friend, have you found a solution for this issue? I need to process an IBM 3270 terminal screen, but I have had no success training tesseract on slashed zeros.
Thanks in advance,
Carlos

On Thursday, February 10, 2011 at 17:48:24 UTC-2, [email protected] wrote:
>
> 2011-02-10 14:34:32 EST
> The log file below is the result of training with an image containing
> "slashed" zeros (zero with a diagonal line in it to differentiate it
> from Upper-case O.)
>
> If I edit out the diagonal, there are no errors in tesseract.log, but
> interpretation of zero and O are unreliable, even with a line in
> eng.unicharambigs.
>
> How can I get tesseract to accept the slashed zero? So far I have
> converted the image to black text on white background and scaled up to
> approx. 300 dpi.
>
> ----------------- tesseract.log -------------------------------------
> Found fonts: ['IA']
> Tesseract Open Source OCR Engine with Leptonica
> APPLY_BOXES: boxfile 1/51/0 ((2295,326),(2323,370)): FAILURE! box overlaps no blobs or blobs in multiple rows
> APPLY_BOXES: boxfile 3/51/0 ((2289,137),(2317,181)): FAILURE! box overlaps no blobs or blobs in multiple rows
> APPLY_BOXES: More than one block??
> APPLY_BOXES: FATALITY - 0 labelled samples of "0 [30 ]" - target is 2:
> APPLY_BOXES: Boxes read from boxfile: 226
> Initially labelled blobs: 224 in 4 rows
> Box failures detected: 2
> Duped blobs for rebalance: 0
> "0" has fewest samples: 0
> Total unlabelled words: 0
> Final labelled words: 224
> Generating training data TRAINING ... Font name = IA
> Generated training data for 224 blobs
>
>
> See tif image at: http://www.flickr.com/photos/59351419@N05/5434403800/

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
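To see which box-file entries are being rejected, the APPLY_BOXES failure lines in the quoted tesseract.log can be pulled out with a small script. This is only a sketch, not part of tesseract: the names `FailedBox` and `failed_boxes` are made up for illustration. The reading of the three slash-separated fields as box-file line number, character index, and character label is an assumption (it is at least consistent with both failures being for "0"), and the coordinates are kept as neutral `x0, y0, x1, y1` values rather than claiming a particular origin.

```python
import re
from typing import List, NamedTuple


class FailedBox(NamedTuple):
    """One rejected box-file entry (field meanings are assumptions)."""
    line_no: int   # assumed: line number within the box file
    char_no: int   # assumed: character index on that line
    label: str     # character the box was labelled with
    x0: int
    y0: int
    x1: int
    y1: int


# Matches lines such as:
# APPLY_BOXES: boxfile 1/51/0 ((2295,326),(2323,370)): FAILURE! ...
_FAILURE_RE = re.compile(
    r"APPLY_BOXES: boxfile (\d+)/(\d+)/(\S+) "
    r"\(\((\d+),(\d+)\),\((\d+),(\d+)\)\): FAILURE!"
)


def failed_boxes(log_text: str) -> List[FailedBox]:
    """Return every box reported as a FAILURE in the given log text."""
    results = []
    for m in _FAILURE_RE.finditer(log_text):
        line_no, char_no, label, x0, y0, x1, y1 = m.groups()
        results.append(
            FailedBox(int(line_no), int(char_no), label,
                      int(x0), int(y0), int(x1), int(y1))
        )
    return results


# The two failure lines from the log quoted above.
SAMPLE_LOG = """\
APPLY_BOXES: boxfile 1/51/0 ((2295,326),(2323,370)): FAILURE! box overlaps no blobs or blobs in multiple rows
APPLY_BOXES: boxfile 3/51/0 ((2289,137),(2317,181)): FAILURE! box overlaps no blobs or blobs in multiple rows
APPLY_BOXES: More than one block??
"""

if __name__ == "__main__":
    for box in failed_boxes(SAMPLE_LOG):
        width, height = box.x1 - box.x0, box.y1 - box.y0
        print(f"label {box.label!r} at line {box.line_no}: "
              f"{width}x{height} box at ({box.x0},{box.y0})")
```

On this log, both rejected boxes carry the label "0", which matches the later report of 0 labelled samples for that character.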

