I have some scanned, machine-typed documents that contain a lot of noise. I 
have already reduced the noise, but some of what remains is statistically 
indistinguishable from letters: it is as dark as the letters and as big as 
the letters, so I cannot simply filter it out.

I have tried training Tesseract only on Courier New. Accuracy went down, 
which I expected because I did not use enough training data, but letters 
were still detected in the noisy areas.

How can I keep Tesseract from detecting letters in noise? One simple rule 
would be to accept only characters of a single size, since this is 
machine-typed text.
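
To make the size rule concrete, here is a rough sketch of how it could be applied as a post-processing step on the recognition results. It is not a Tesseract option. It assumes word-level boxes with a pixel height are already available, for example from the `height` column of pytesseract's `image_to_data` output. The function name, the dictionary layout and the 30% tolerance are my own placeholders, not anything Tesseract defines.

```python
from statistics import median


def filter_by_height(words, tolerance=0.3):
    """Keep only words whose box height is close to the median height.

    `words` is a list of dicts with at least a "height" key (pixels).
    `tolerance` is the allowed relative deviation from the median;
    0.3 is an arbitrary starting value, not a recommended setting.
    """
    if not words:
        return []
    typical = median(w["height"] for w in words)
    low = typical * (1 - tolerance)
    high = typical * (1 + tolerance)
    return [w for w in words if low <= w["height"] <= high]


# Hypothetical boxes: three typed words, one large blob, one small speck.
sample = [
    {"text": "Hello", "height": 20},
    {"text": "world", "height": 21},
    {"text": "#", "height": 55},
    {"text": ".", "height": 4},
    {"text": "typed", "height": 19},
]
kept = [w["text"] for w in filter_by_height(sample)]
```

This would also throw away genuine punctuation and other short glyphs, and it cannot remove noise that is exactly the size of a letter, so at best it would be one filter among several.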
