I have some scanned, machine-typed documents that have a lot of noise. I have already reduced the noise as far as I can, but some of what remains is statistically indistinguishable from letters: it is as dark as the letters and as big as the letters, so I cannot simply filter it out.
I tried training Tesseract only on Courier New. Accuracy went down, which I expected because I did not use enough training data, but letters were still detected in the noisy areas. How can I keep Tesseract from detecting letters in noise? One simple rule would be to only detect characters of one size, since this is machine-typed text.
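
To make the "one size" rule concrete, here is a rough sketch of what I mean, written as a post-filter on Tesseract's output rather than a change to recognition itself. It assumes the word boxes come as a dict shaped like the one returned by pytesseract's `image_to_data(..., output_type=Output.DICT)`, with `text`, `conf`, and `height` lists. The function name `filter_by_height` and the `tolerance` and `min_conf` parameters are made up for illustration. Word heights vary with ascenders and descenders, so the tolerance has to be fairly loose, and noise that happens to match the letter height will still pass.

```python
from statistics import median


def filter_by_height(data, tolerance=0.25, min_conf=0.0):
    """Keep words whose box height is close to the median word height.

    `data` is assumed to look like pytesseract's Output.DICT result:
    parallel lists under the keys "text", "conf", and "height".
    `tolerance` and `min_conf` are hypothetical tuning knobs.
    """
    # Indices of real words: non-empty text and confidence at or above min_conf.
    # Non-word rows (pages, blocks, lines) have empty text and conf -1.
    words = [
        i for i, text in enumerate(data["text"])
        if text.strip() and float(data["conf"][i]) >= min_conf
    ]
    if not words:
        return []

    # Typewritten text should have one dominant height; use the median
    # so that a few oversized noise blobs do not shift the estimate.
    typical = median(data["height"][i] for i in words)

    # Keep only words within the allowed relative deviation from that height.
    return [
        data["text"][i]
        for i in words
        if abs(data["height"][i] - typical) <= tolerance * typical
    ]


# Possible usage, assuming pytesseract and an image are available:
#   import pytesseract
#   data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
#   clean_words = filter_by_height(data, tolerance=0.25, min_conf=30)
```

This only discards detections after the fact. What I am really asking is whether Tesseract can apply a constraint like this during recognition.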

