Restrict Character Size to be Detected (Filter based on character size)

Arun Raveendran Tue, 13 Mar 2012 23:42:36 -0700

Hi,

I am looking to improve detection accuracy in my application by
specifying the character size that Tesseract should detect.


There is an option, textord_min_xheight ("Min credible pixel height"),
that sets a lower bound on character size. This works well to filter
out smaller characters and noise, but I could not find a similar option
to set the maximum height or width (in pixels) of a character.

I also found a few similar-looking configuration parameters:
classify_min_norm_scale_x, 0.0, "Min char x-norm scale ..."
classify_max_norm_scale_x, 0.325, "Max char x-norm scale ..."
classify_min_norm_scale_y, 0.0, "Min char y-norm scale ..."
classify_max_norm_scale_y, 0.325, "Max char y-norm scale ..."

I assume these are normalized values, probably between 0 and 1.
However, I tried different values in the range 0.0 - 1.0 and saw
little difference in the detection result; no results were filtered out.
So I am wondering how these values are normalized. That is, relative to
what quantity are they scaled into the range 0-1?

I just want to recognize characters of a specific size and ignore all
other characters.
So if I could give Tesseract the dimensions of the characters, maybe
I could increase accuracy considerably in my application.
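
To make the goal concrete, here is a rough sketch of the filtering I
have in mind, done as post-processing rather than through a Tesseract
option. It assumes the character boxes are available as text lines in
the box-file layout (symbol left bottom right top page); the names
CharBox, parse_box_lines and filter_by_height are made up for this
illustration and are not part of Tesseract.

```python
from typing import Iterable, List, NamedTuple


class CharBox(NamedTuple):
    """One character box; coordinates are in pixels (assumed box-file layout)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        return self.top - self.bottom


def parse_box_lines(lines: Iterable[str]) -> List[CharBox]:
    """Parse lines of the form 'symbol left bottom right top [page]'."""
    boxes = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append(CharBox(parts[0], left, bottom, right, top))
    return boxes


def filter_by_height(boxes: Iterable[CharBox],
                     min_height: int, max_height: int) -> List[CharBox]:
    """Keep only boxes whose pixel height lies in [min_height, max_height]."""
    return [b for b in boxes if min_height <= b.height <= max_height]


if __name__ == "__main__":
    # Hypothetical sample data, not real Tesseract output.
    sample = [
        "H 10 20 30 50 0",     # height 30
        "i 32 20 38 48 0",     # height 28
        ". 40 20 43 23 0",     # height 3, likely noise
        "X 50 10 120 110 0",   # height 100, too large
    ]
    kept = filter_by_height(parse_box_lines(sample), 20, 40)
    print("".join(b.symbol for b in kept))
```

This only discards results after recognition, so it would not stop
oversized or undersized blobs from affecting recognition itself, which
is why I am hoping for a built-in setting instead.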

Also, is there any detailed documentation of the configuration
parameters other than the comments in the source code?
Any help would be much appreciated.

Regards,
Arun

