Dear group:

Is there a way to make tesseract print out box information upon/during
recognition?

I am trying to recognize low-resolution images (mentioned in other
threads), and tesseract chops the text correctly: the errors are
mostly misrecognized individual glyphs, NOT two or more glyphs lumped
into one.

Strangely, however, when I run the makebox command, I get really bad
chopping errors -- in other words, two, three, or even more glyphs get
enclosed in one box.
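
One way to quantify the lumping is to scan the box file for unusually
wide boxes. Below is a minimal sketch in Python, assuming each box file
line has the form "glyph left bottom right top", optionally followed by
a page number. The function names, the sample data, and the 1.8 width
ratio are hypothetical, chosen only for illustration.

```python
from statistics import median


def parse_box_line(line):
    """Parse 'glyph left bottom right top [page]' into a tuple.

    Assumes whitespace-separated fields; a trailing page number,
    if present, is ignored.
    """
    parts = line.split()
    glyph = parts[0]
    left, bottom, right, top = (int(p) for p in parts[1:5])
    return glyph, left, bottom, right, top


def find_wide_boxes(lines, ratio=1.8):
    """Return boxes whose width exceeds ratio * median box width.

    The ratio is an arbitrary illustrative threshold, not a value
    taken from tesseract itself.
    """
    boxes = [parse_box_line(l) for l in lines if l.strip()]
    widths = [b[3] - b[1] for b in boxes]
    if not widths:
        return []
    limit = ratio * median(widths)
    return [b for b in boxes if b[3] - b[1] > limit]


# Made-up sample data: the last box spans two glyphs.
sample = [
    "T 10 20 22 40",
    "h 24 20 34 40",
    "e 36 20 46 32",
    "rn 48 20 80 32",
]
suspects = find_wide_boxes(sample)
print(suspects)
```

A width-based check like this only flags candidates. Naturally wide
glyphs such as "m" or "W" may need a looser ratio.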

I take it that the option "batch.nochop" decreases the chopping level?

I tried to makebox with "batch.nochop" option -- that didn't help --
still multi-glyph lumps.

Is there a command option to force tesseract to chop more
aggressively, as aggressively as it does during the actual
recognition (which ends up with correct chopping)?

 -- OR --

simply have tesseract print out a box file containing the box
information for each glyph it has identified during the actual
recognition. In my case, with the documents I'm processing, that box
file would have exactly the correct choppings...

TIA
