Hi, I am looking to improve the accuracy of detections in my application by telling Tesseract what character size it should detect.
There is an option, textord_min_xheight ("Min credible pixel height"), that sets a lower bound on character size. It works well for filtering out smaller characters and noise, but I could not find a similar option to set a MAXIMUM height or width (in pixels) for a character.

I did find a few related settings:

classify_min_norm_scale_x, 0.0, "Min char x-norm scale ..."
classify_max_norm_scale_x, 0.325, "Max char x-norm scale ..."
classify_min_norm_scale_y, 0.0, "Min char y-norm scale ..."
classify_max_norm_scale_y, 0.325, "Max char y-norm scale ..."

I assume these are normalized values, probably between 0 and 1. The problem is that I tried different values in the range 0.0-1.0 and saw little difference in the detection results; nothing was filtered out. So I am wondering how these values are normalized: relative to what quantity are they scaled into the 0-1 range?

I just want to recognize characters of a specific size and ignore all others. If I could give Tesseract the character dimensions, I think I could improve accuracy considerably in my application.

Also, is there any detailed documentation of the configuration variables other than the comments in the source code?

Any help would be really appreciated.

Regards,
Arun

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
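The post reports that no maximum counterpart to textord_min_xheight turned up and that the classify_*_norm_scale_* values did not filter anything. One possible workaround, sketched here as an assumption rather than a documented Tesseract feature, is to post-filter Tesseract's box-format output by pixel size. Box lines have the form `<symbol> <left> <bottom> <right> <top> <page>` with a bottom-left origin (for example, as written by the `makebox` config, if your Tesseract version ships it). The names `CharBox`, `parse_box_lines`, `filter_by_size`, `min_height` and `max_height` are hypothetical helpers for illustration, not Tesseract parameters.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CharBox:
    """One line of box-format output: a symbol plus its bounding box (hypothetical helper)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    @property
    def width(self) -> int:
        return self.right - self.left

    @property
    def height(self) -> int:
        # Box coordinates use a bottom-left origin, so top >= bottom.
        return self.top - self.bottom


def parse_box_lines(lines: Iterable[str]) -> List[CharBox]:
    """Parse lines of the form '<symbol> <left> <bottom> <right> <top> <page>'."""
    boxes = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split from the right so the symbol field is kept intact.
        symbol, left, bottom, right, top, page = line.rsplit(None, 5)
        boxes.append(CharBox(symbol, int(left), int(bottom), int(right), int(top), int(page)))
    return boxes


def filter_by_size(boxes: Iterable[CharBox], min_height: int, max_height: int,
                   min_width: int = 0, max_width: Optional[int] = None) -> List[CharBox]:
    """Keep only boxes whose pixel height (and optionally width) is in range.

    min_height/max_height/min_width/max_width are hypothetical parameters of this
    sketch; they are not Tesseract config variables.
    """
    kept = []
    for b in boxes:
        if not (min_height <= b.height <= max_height):
            continue  # too small (noise) or too large
        if b.width < min_width:
            continue
        if max_width is not None and b.width > max_width:
            continue
        kept.append(b)
    return kept


if __name__ == "__main__":
    sample = [
        "A 10 10 30 42 0",    # 20 wide, 32 high
        "b 40 10 55 30 0",    # 15 wide, 20 high
        ". 60 10 63 13 0",    # 3 wide, 3 high (noise-sized)
        "W 70 10 150 110 0",  # 80 wide, 100 high (oversized)
    ]
    kept = filter_by_size(parse_box_lines(sample), min_height=15, max_height=40)
    print("".join(b.symbol for b in kept))  # prints "Ab"
```

This filters after recognition, so it discards unwanted characters but does not change how Tesseract segments the page; textord_min_xheight still acts earlier, during layout analysis.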

