I have trained and retrained Tesseract using various fonts, with 1.5 line spacing at 300 dpi, and have tried to follow the instructions, but I have an issue that I cannot seem to get past. In many cases a bracket or underscore in a line or block of text simply sets Tesseract off, and from that point on, for the rest of that line or block, the output becomes unacceptably inaccurate. My guess is that it loses track of where the bottom and the top of each character are; looking at the output, that certainly seems to be the issue. I have attached a couple of photos where this has happened. In the first, a bracket around Hong Kong causes the issue (the text reads fine without brackets), and in the second, an underscore on the bottom line causes a problem (it is fine without the underscore). I have added brackets and underscores to the training document at various points (start of line, end of line, etc.), but to no effect. Any ideas?
<<attachment: example.TIF>>

