Hi!
My program parses a single line of text. In the following picture, I have drawn
the bounding box of each character as returned by the Tesseract result
iterator:
[image: Tesseract bounding boxes]
Tesseract apparently has trouble segmenting the last character ('5') in the
line: it detects three bounding boxes for it. That character is in fact
slightly larger than the others, but why would Tesseract segment it so
differently when its pixel blob is thresholded so cleanly?
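One way to double-check that the '5' really is a single blob is to count the connected components of the thresholded line independently of Tesseract and compare them with the iterator's boxes. Below is a minimal pure-Java sketch, assuming the binarized image is available as a boolean[][] (true = ink) and using 8-connectivity. The class, method, and box layout are hypothetical helpers, not part of Tesseract or its Java wrappers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

class BlobBoxes {
    /** Hypothetical box type: inclusive pixel bounds. */
    static final class Box {
        final int left, top, right, bottom;
        Box(int left, int top, int right, int bottom) {
            this.left = left; this.top = top; this.right = right; this.bottom = bottom;
        }
        @Override public String toString() {
            return "[" + left + "," + top + "," + right + "," + bottom + "]";
        }
    }

    /** Returns one bounding box per 8-connected ink blob, in row-major scan order. */
    static List<Box> blobBoxes(boolean[][] ink) {
        int h = ink.length, w = h == 0 ? 0 : ink[0].length;
        boolean[][] seen = new boolean[h][w];
        List<Box> boxes = new ArrayList<>();
        ArrayDeque<int[]> queue = new ArrayDeque<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!ink[y][x] || seen[y][x]) continue;
                // Breadth-first flood fill from this unvisited ink pixel.
                int l = x, r = x, t = y, b = y;
                seen[y][x] = true;
                queue.add(new int[]{x, y});
                while (!queue.isEmpty()) {
                    int[] p = queue.poll();
                    l = Math.min(l, p[0]); r = Math.max(r, p[0]);
                    t = Math.min(t, p[1]); b = Math.max(b, p[1]);
                    // Visit all 8 neighbours that are ink and not yet seen.
                    for (int dy = -1; dy <= 1; dy++) {
                        for (int dx = -1; dx <= 1; dx++) {
                            int nx = p[0] + dx, ny = p[1] + dy;
                            if (nx >= 0 && ny >= 0 && nx < w && ny < h
                                    && ink[ny][nx] && !seen[ny][nx]) {
                                seen[ny][nx] = true;
                                queue.add(new int[]{nx, ny});
                            }
                        }
                    }
                }
                boxes.add(new Box(l, t, r, b));
            }
        }
        return boxes;
    }

    public static void main(String[] args) {
        // Toy image (not the one from this post): a '5'-like shape and a separate stroke.
        String[] rows = { "###..#", "#....#", "###...", "..#...", "###..." };
        boolean[][] ink = new boolean[rows.length][rows[0].length()];
        for (int y = 0; y < rows.length; y++)
            for (int x = 0; x < rows[y].length(); x++)
                ink[y][x] = rows[y].charAt(x) == '#';
        System.out.println(blobBoxes(ink)); // prints [[0,0,2,4], [5,0,5,1]]
    }
}
```

If this reports one box where the iterator reports three, the split comes from Tesseract's character segmentation and classification, not from the thresholding.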
I have set these Tesseract variables:
tess.setVariable("save_blob_choices", "1");
tess.setPageSegMode(PageSegMode.PSM_SINGLE_LINE);
tess.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789");
and I set textord_min_xheight to the pixel height of the image above.
Any suggestions?
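A possible post-processing workaround, sketched under the assumption that the three fragments of the '5' overlap or nearly touch along the x axis: sort the iterator's character boxes by their left edge and merge any box that starts within a small gap of the previous one. The class, method, and maxGap parameter are hypothetical, and the coordinates in main are invented for illustration, not measured from the picture.

```java
import java.util.ArrayList;
import java.util.List;

class MergeFragments {
    /** Hypothetical box type: inclusive pixel bounds. */
    static final class Box {
        final int left, top, right, bottom;
        Box(int left, int top, int right, int bottom) {
            this.left = left; this.top = top; this.right = right; this.bottom = bottom;
        }
        @Override public String toString() {
            return "[" + left + "," + top + "," + right + "," + bottom + "]";
        }
    }

    /** Merges boxes whose x-ranges overlap or are at most maxGap pixels apart. */
    static List<Box> mergeByColumn(List<Box> boxes, int maxGap) {
        List<Box> sorted = new ArrayList<>(boxes);
        sorted.sort((a, b) -> Integer.compare(a.left, b.left));
        List<Box> merged = new ArrayList<>();
        for (Box b : sorted) {
            if (!merged.isEmpty()) {
                Box last = merged.get(merged.size() - 1);
                if (b.left <= last.right + maxGap) {
                    // Replace the previous box with the union of both.
                    merged.set(merged.size() - 1, new Box(
                            Math.min(last.left, b.left), Math.min(last.top, b.top),
                            Math.max(last.right, b.right), Math.max(last.bottom, b.bottom)));
                    continue;
                }
            }
            merged.add(b);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Box> boxes = List.of(
                new Box(0, 0, 8, 20),    // another digit
                new Box(12, 0, 20, 20),  // another digit
                new Box(24, 0, 30, 8),   // fragment of the '5'
                new Box(25, 9, 33, 20),  // fragment of the '5'
                new Box(31, 2, 35, 12)); // fragment of the '5'
        System.out.println(mergeByColumn(boxes, 1));
        // prints [[0,0,8,20], [12,0,20,20], [24,0,35,20]]
    }
}
```

maxGap has to stay small: with a large value, neighbouring digits that touch would be merged as well. Merging boxes only fixes the geometry; the merged region would still have to be recognized again to get the right character.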
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en