Question about Tesseract Bounding Boxes

Li Se Tue, 10 Dec 2013 04:35:36 -0800

Hi!
My program parses a line of text. In the following picture, I have drawn 
the bounding boxes around each character as returned by the Tesseract 
result iterator:


[image: Tesseract bounding boxes]

Tesseract appears to have trouble segmenting the last character ('5') in 
the line: it reports 3 bounding boxes for that one character. The last 
character is in fact a tad larger than the other characters, but why would 
Tesseract segment it so differently when the pixel blob is thresholded so 
clearly?
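
To make the symptom concrete, here is a minimal Java sketch of how the 
fragments can be spotted in the iterator output. It does not call Tesseract 
at all: the class name BoxFragments, the {left, top, right, bottom} array 
layout, the 0.6 ratio and the sample coordinates are all made up for 
illustration, not taken from my real image. It only flags boxes that are 
much narrower than the median box width, which is how the three pieces of 
the '5' stand out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BoxFragments {
    // Each box is {left, top, right, bottom} in pixels (assumed layout).
    static int width(int[] box) {
        return box[2] - box[0];
    }

    // Median of the box widths; robust as long as most boxes are whole characters.
    static double medianWidth(int[][] boxes) {
        int[] w = new int[boxes.length];
        for (int i = 0; i < boxes.length; i++) {
            w[i] = width(boxes[i]);
        }
        Arrays.sort(w);
        int n = w.length;
        return n % 2 == 1 ? w[n / 2] : (w[n / 2 - 1] + w[n / 2]) / 2.0;
    }

    // Indices of boxes narrower than ratio * median width.
    static List<Integer> narrowBoxes(int[][] boxes, double ratio) {
        double limit = medianWidth(boxes) * ratio;
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < boxes.length; i++) {
            if (width(boxes[i]) < limit) {
                out.add(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical data: five normal digits, then one slightly larger
        // digit that was split into three narrow boxes.
        int[][] boxes = {
            {0, 0, 20, 30}, {24, 0, 44, 30}, {48, 0, 68, 30},
            {72, 0, 92, 30}, {96, 0, 116, 30},
            {120, 0, 128, 32}, {128, 0, 136, 32}, {136, 0, 144, 32}
        };
        System.out.println(narrowBoxes(boxes, 0.6)); // prints [5, 6, 7]
    }
}
```

This only detects the over-segmentation; it does not explain why Tesseract 
splits the blob in the first place.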

I have configured Tesseract as follows:

tess.setVariable("save_blob_choices", "1");
tess.setPageSegMode(PageSegMode.PSM_SINGLE_LINE);
tess.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789");

I have also set textord_min_xheight to the pixel height of the image above.


Any suggestions?

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

