Maybe it would be good to provide some examples of input.

Zdenko
On Fri, 25 Sep 2020 at 7:57, Radu Stoicescu <[email protected]> wrote:
> I have some scanned, machine-typed documents that have a lot of noise. I can
> reduce the noise, and I have done so. But some of the noise is statistically
> indistinguishable from letters: it is as dark as the letters and as big as
> the letters, so I cannot simply remove it.
>
> I have tried training Tesseract only on Courier New, and although the
> accuracy went down, which I expected because I did not use enough data,
> letters were still detected in the noisy areas.
>
> How can I keep Tesseract from detecting letters in noise? One simple rule
> would be to detect only characters of one size, since this is machine-typed
> text.
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/24c0b6ae-e07b-443b-ba60-38470b852275n%40googlegroups.com
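The size rule proposed in the quoted message could be tried as a post-filter on Tesseract's character boxes rather than as a Tesseract setting. Below is a minimal sketch in Python. It assumes the boxes arrive in Tesseract's box format (`<char> <left> <bottom> <right> <top> <page>`, for example from `tesseract image.tif out makebox` or `pytesseract.image_to_boxes`). The helper names `parse_boxes` and `filter_by_height` and the 0.5/1.5 bounds are hypothetical and not part of any library.

```python
from statistics import median


def parse_boxes(box_text):
    """Parse Tesseract box-format lines: '<char> <left> <bottom> <right> <top> <page>'.

    Box coordinates use a bottom-left origin, so top > bottom.
    """
    boxes = []
    for line in box_text.splitlines():
        # rsplit keeps the leading symbol intact even if it is a space
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        symbol = parts[0]
        left, bottom, right, top, page = (int(p) for p in parts[1:])
        boxes.append((symbol, left, bottom, right, top, page))
    return boxes


def filter_by_height(boxes, low=0.5, high=1.5):
    """Keep boxes whose height lies within [low, high] times the median height.

    The bounds are hypothetical defaults and would need tuning per document.
    """
    heights = [top - bottom for _, _, bottom, _, top, _ in boxes]
    if not heights:
        return []
    med = median(heights)
    return [box for box, h in zip(boxes, heights) if low * med <= h <= high * med]


if __name__ == "__main__":
    sample = "\n".join([
        "T 10 100 30 130 0",
        "h 32 100 50 131 0",
        "e 52 100 70 120 0",
        "~ 80 90 140 160 0",   # tall blob, e.g. noise
        ". 72 100 76 104 0",   # tiny blob: a real period, dropped as well
    ])
    kept = filter_by_height(parse_boxes(sample))
    print("".join(box[0] for box in kept))  # The
```

The sample shows the limits of the rule. A genuine period is discarded along with the oversized blob. Noise that is as tall as the letters, as described in the question, passes straight through. So this may only help with outliers, and the bounds would likely need to be set per text line rather than per page.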

