On 6/14/2016 8:33 AM, Matus UHLAR - fantomas wrote:
that is just what I would like to know: If OCR produces results good enough
for BAYES and other rules.

I don't think there's a difference between bayes and other rules.
It's also possible that BAYES would have better results with misread
characters than other rules.
I've dealt with OCR in the past, and have always had to go back afterwards and manually proofread the results. I expect the impact on Bayes would be a massively increased dictionary of rare words that result from poor "keming" in the image. Some PDFs are written in extractable text instead of images, but those tend to use fractional-width spaces for kerning so it's not always easy to figure out what's a real word there either.

That said, Google seems to use OCR on images in their filtering (quoth Wikipedia), so maybe it works when you have a sufficiently enormous data set that the OCR glitches are no longer rare and a decent inference can be made from them.
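
To make the dictionary-growth concern concrete, here is a rough sketch of how a handful of character confusions multiplies the number of distinct tokens a Bayes-style tokenizer would see. The confusion pairs, the sample sentence, and the helper name ocr_variants are all made up for illustration; real OCR errors depend on the font, resolution, and engine, and this is not how SpamAssassin tokenizes mail.

```python
# Hypothetical confusion pairs (correct text -> misread text).
# Real OCR engines produce different and far more varied errors.
CONFUSIONS = [("rn", "m"), ("l", "1"), ("o", "0")]


def ocr_variants(word):
    """Return the word plus every single-substitution misreading of it."""
    variants = {word}
    for good, bad in CONFUSIONS:
        start = word.find(good)
        while start != -1:
            # Replace exactly one occurrence of the confusable sequence.
            variants.add(word[:start] + bad + word[start + len(good):])
            start = word.find(good, start + 1)
    return variants


# Made-up sample text standing in for the words rendered in an image.
clean_tokens = "please learn about modern kerning today".split()

clean_vocab = set(clean_tokens)
noisy_vocab = set()
for token in clean_tokens:
    noisy_vocab |= ocr_variants(token)

print(len(clean_vocab), "distinct tokens in the clean text")
print(len(noisy_vocab), "distinct tokens once single misreads are possible")
print(sorted(noisy_vocab - clean_vocab))
```

Even with only three confusion pairs and one error per word, the set of possible tokens is noticeably larger than the clean vocabulary, and most of the extras would be seen too rarely in a small corpus to carry a useful probability.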
