Dear group: Is there a way to make tesseract print out box information upon/during recognition?
I am trying to recognize low-resolution images (mentioned in other threads), and tesseract chops the text into glyphs very well during recognition: the errors are mostly misrecognized individual glyphs, NOT two or more glyphs lumped into one.

Strangely, however, when I run the makebox command I get really bad chopping errors. In other words, two, three, or even more glyphs end up inside one box. I take it that the "batch.nochop" option decreases the chopping level? I tried makebox with the "batch.nochop" option and that didn't help: I still get multi-glyph lumps.

Is there a command option to force tesseract to chop more aggressively during makebox, as aggressively as it does during the actual recognition (which ends up with correct chopping)?

Alternatively, can tesseract simply print out a box file containing the box information for each glyph it identified during the actual recognition? For the documents I'm processing, that box file would have exactly the right choppings.

TIA

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

