I have trained and retrained using various fonts, with 1.5 line spacing at 
300 dpi, and have tried to follow the instructions, but I have an issue that 
I cannot seem to get past. On many occasions a bracket or underscore in a 
line or block of text simply sets Tesseract off, and from that point on, for 
that line or that block of text, the output suddenly becomes unacceptably 
inaccurate. My guess is that it loses track of where the bottom of a 
character starts and where its top is. Certainly, looking at the output, 
that seems to be the issue. I have attached a couple of photos where this 
has happened. In the first, brackets around "Hong Kong" cause the issue (the 
text reads fine without the brackets), and in the second an underscore on 
the bottom line causes a problem (fine without the underscore). I have added 
brackets and underscores to the training document at various points (start 
of line, end of line, etc.), but to no effect.
 
Any ideas?   

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.

<<attachment: example.TIF>>
