I have an image here: http://dl.dropbox.com/u/1531272/pg1-CROP-OCR.jpg When this image is run through Tesseract, it renders out three words:
05 04571 6

I have adjusted tosp_table_xht_sp_ratio to no avail. I cannot understand why the 6 is not included in the 04571 word.

Looking at the characters that are returned, the height of the 1 is 69px and the space to the next character, the 6, is 12px. Even with the default tosp_table_xht_sp_ratio of 0.33, the spacing threshold should be 69 * 0.33 ≈ 23px. The 12px gap is well below that, so I would expect the 6 to fall into the same grouping.

Can anyone help me understand why the 6 is not read as part of the word 045716?

Thanks
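
For reference, here is the arithmetic I am relying on, written out as a small Python sketch. It only encodes my assumption that a gap is treated as a word break once it reaches character height times the ratio. It is not Tesseract's actual spacing logic, and the function names are made up for illustration.

```python
# Sketch of the poster's expectation only: NOT how Tesseract decides word breaks.
# Assumption: a gap counts as a word break when it is at least
# (character height * ratio) pixels wide.

def space_threshold_px(char_height_px: float, ratio: float) -> float:
    """Assumed minimum gap, in pixels, that would separate two words."""
    return char_height_px * ratio


def is_word_break(gap_px: float, char_height_px: float, ratio: float) -> bool:
    """True if the gap is wide enough to split words under the assumption above."""
    return gap_px >= space_threshold_px(char_height_px, ratio)


char_height_px = 69   # measured height of the "1"
gap_px = 12           # measured gap between the "1" and the "6"
ratio = 0.33          # default tosp_table_xht_sp_ratio quoted above

threshold = space_threshold_px(char_height_px, ratio)
print(f"threshold = {threshold:.2f}px, gap = {gap_px}px")
print("word break expected:", is_word_break(gap_px, char_height_px, ratio))
```

Under that assumption the 12px gap is about half the threshold, which is why I expected a single word.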

