Re: How SA reacts to a bunch of garbage characters

Matus UHLAR - fantomas Tue, 14 Jun 2016 05:34:00 -0700

Sure, the OCR results are not very precise. But could they be pushed
into a part of the message that does not go through Bayes?

where do you want to push the OCR'ed text, if not back to SA to check other
rules like bayes?


On 14.06.16 13:50, Olivier wrote:

To a part that would do regexp rules, but not Bayes? I don't know if it
is possible.


someone who knows SA internals will have to answer this one, but I doubt
it's useful, see below.

the PDF is technically something different: a PDF (often) contains plain text
that does not have to be OCRed, and thus it will not be misinterpreted.


But doesn't it disturb the Bayes process if we inject the mail body plus
the text extracted from the PDF? Wouldn't it be better to submit only the
original message? I have no answer to that.


that is just what I would like to know: whether OCR produces results good
enough for BAYES and other rules.

I don't think there's a difference between bayes and other rules.
It's also possible that BAYES would have better results with misread
characters than other rules.
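
To illustrate why a token-based classifier might cope with misread
characters better than a fixed pattern, here is a toy sketch. It is not
SpamAssassin's Bayes implementation: the ToyBayes class, the training
strings and the VIAGRA_RULE pattern are all made up for illustration, and
the misread forms are only assumed to look like typical OCR output. The
idea is that once garbled tokens such as "v1agra" have been learned as
spam, they count as evidence, while a literal regexp never sees them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyBayes:
    """Tiny naive Bayes token scorer (illustration only, equal priors)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        for tok in tokenize(text):
            self.counts[label][tok] += 1
            self.totals[label] += 1

    def spam_log_odds(self, text):
        """Sum of per-token log(P(tok|spam) / P(tok|ham)), add-one smoothed."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        score = 0.0
        for tok in tokenize(text):
            p_spam = (self.counts["spam"][tok] + 1) / (self.totals["spam"] + vocab)
            p_ham = (self.counts["ham"][tok] + 1) / (self.totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score


# Hypothetical training data: the spam samples carry assumed OCR misreads.
bayes = ToyBayes()
bayes.learn("cheap v1agra 0nline n0w", "spam")
bayes.learn("buy v1agra cheap pi11s", "spam")
bayes.learn("meeting agenda for monday", "ham")
bayes.learn("please review the attached report", "ham")

# Hypothetical body rule that expects the correctly spelled word.
VIAGRA_RULE = re.compile(r"\bviagra\b", re.IGNORECASE)

ocr_text = "cheap v1agra pi11s 0nline"
regex_hit = VIAGRA_RULE.search(ocr_text) is not None
bayes_score = bayes.spam_log_odds(ocr_text)

if __name__ == "__main__":
    print("regexp rule matched:", regex_hit)
    print("toy bayes log-odds (positive leans spam):", bayes_score)
```

This only works if similar misreads were present in the training data, so
it says nothing about how well a real Bayes database would do on fresh OCR
output.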

I would skip gocr and ocrad, since tesseract behaves great now...
(the debian fuzzyocr package requires all of them, dunno why)


I'll take your advice, I just noticed that tesseract was not enabled by
default! I use FreeBSD, could it be required at install only, but
disabled later in your configuration of FuzzyOcr?


I believe so. if you have spam samples, try running all OCR engines on them
to decide which are useful...
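
A rough way to do that comparison, sketched in Python. The command lines
are assumptions based on each tool's usual invocation (tesseract writing
to stdout, gocr with -i, ocrad with a file argument); gocr and ocrad
generally expect PNM input, so samples may need converting first. The
wordlike_ratio heuristic is invented here as a crude stand-in for
"how readable is the output" and is not part of FuzzyOcr.

```python
import shutil
import subprocess
import sys

# Assumed command lines; check each tool's man page before relying on them.
ENGINES = {
    "tesseract": lambda path: ["tesseract", path, "stdout"],
    "gocr": lambda path: ["gocr", "-i", path],
    "ocrad": lambda path: ["ocrad", path],
}


def wordlike_ratio(text):
    """Fraction of whitespace-separated tokens that look like plain words.

    A token counts as word-like if it has at least 3 characters and is
    purely alphabetic once surrounding punctuation is stripped.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    good = sum(
        1 for tok in tokens
        if len(tok) >= 3 and tok.strip(".,;:!?").isalpha()
    )
    return good / len(tokens)


def run_engine(name, image_path, timeout=60):
    """Run one OCR engine and return its stdout, or None if unavailable."""
    if shutil.which(name) is None:
        return None  # engine not installed on this machine
    try:
        result = subprocess.run(
            ENGINES[name](image_path),
            capture_output=True,
            text=True,
            errors="replace",
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout


def main(paths):
    for path in paths:
        for name in ENGINES:
            text = run_engine(name, path)
            if text is None:
                print(f"{path}\t{name}\tnot available")
            else:
                print(f"{path}\t{name}\t{wordlike_ratio(text):.2f}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it with the sample image files as arguments; an engine that
consistently scores near zero on your samples is probably not worth
enabling.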

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.

Honk if you love peace and quiet.
