Jim Maul wrote:
Marc Perkel wrote:
Jim Maul wrote:
Marc Perkel wrote:
Perhaps what I need to do is to get rid of autolearn and write my
own learning system that strips out the body of messages with
images and just learns the headers. My problem is that when users
get image spam they put it in the spam folders and they get
learned. But the text in image spam causes ham-like words to be
learned as spam, and that pushes the scores of real ham up.
Are you sure of this? Have you also trained these ham messages to
counter the effect? Not long ago we were in the same situation. I
have autolearn enabled, but I have adjusted the thresholds to avoid
learning false positives/negatives. We were getting ham (arguably
newsletter-type ham) that was hitting BAYES_99. As soon as I started
training those messages as ham, the problem went away. Spam is still
detected correctly by Bayes, and these newsletters no longer hit
BAYES_99.
-Jim
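The threshold gating described above can be modelled roughly as follows. This is a simplified sketch. The parameter names mirror SpamAssassin's bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam settings, and the defaults shown (0.1 and 12.0) are assumed to match SpamAssassin's. The real autolearn plugin applies further checks and may compare against the thresholds differently, so treat the function as illustrative only.

```python
from typing import Optional


def autolearn_decision(score: float,
                       nonspam_threshold: float = 0.1,
                       spam_threshold: float = 12.0) -> Optional[str]:
    """Simplified model of threshold-gated autolearning.

    Returns "ham" for clearly low scores, "spam" for clearly high ones and
    None for everything in between, which is left unlearned. The real
    SpamAssassin plugin applies additional checks not modelled here.
    """
    if score < nonspam_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None
```

Widening the gap between the two thresholds means autolearn learns fewer messages but mislearns fewer of them. Manual `sa-learn --ham` training then corrects what still slips through.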
I think my problem might be that I have done so much prescreening of
messages with Exim that what's left isn't good stock for autolearn. I
think what I need is a separate, dedicated learner server that is
selective and smart about what it learns.
This is quite possible. I have heard other stories of people rejecting
so much at SMTP time with greylisting and RBLs that the few messages
that eventually reached SA were too limited a sample, which produced
odd results from Bayes. In my experience, the more you throw at Bayes,
the better it gets. The more selective you are, the less it has to
work with.
Jim
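A worked example of why a thin training set skews token probabilities. It uses a simplified Graham-style estimate rather than SpamAssassin's actual Bayes calculation, which is smoothed and combined differently. The counts are invented for illustration. A token seen in 40 of 1,000 trained spams and none of 200 trained hams scores 1.0. After 50 hams containing it are trained, it drops to about 0.17.

```python
def token_spam_probability(spam_hits: int, ham_hits: int,
                           nspam: int, nham: int) -> float:
    """Graham-style estimate of how spammy a single token looks.

    spam_hits / ham_hits: trained spam / ham messages containing the token.
    nspam / nham: total spam / ham messages trained.
    Simplified for illustration; SpamAssassin smooths these figures.
    """
    spam_freq = spam_hits / nspam if nspam else 0.0
    ham_freq = ham_hits / nham if nham else 0.0
    if spam_freq + ham_freq == 0.0:
        return 0.5  # never seen in training: neutral
    return spam_freq / (spam_freq + ham_freq)


# Token picked up from image-spam text, never seen in the thin ham corpus.
before = token_spam_probability(spam_hits=40, ham_hits=0, nspam=1000, nham=200)
# Same token after 50 ham newsletters containing it are trained.
after = token_spam_probability(spam_hits=40, ham_hits=50, nspam=1000, nham=250)
print(f"before: {before:.2f}, after: {after:.2f}")  # before: 1.00, after: 0.17
```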
Yes, I think that's what's happening to me. I also created an
automatic whitelisting system that lets about half of all ham bypass
SA. What I need to do is fork off a copy of much of the email that
bypasses SA and feed it to the learner. As I said originally, Bayes
used to be my best tool, and I'd like to get that back.
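One way to do the forking described above, sketched in Python under assumptions that are not in the thread. Whitelisted messages are handed to a hypothetical hook as raw bytes. A random sample is copied into a dedicated Maildir, and a periodic job runs `sa-learn --ham` on that Maildir's new/ directory. The function names, the 10% sample rate, and the queue path are all illustrative.

```python
import mailbox
import os
import random
from typing import List, Optional


def maybe_queue_for_ham(raw: bytes, queue_dir: str, sample_rate: float = 0.1,
                        rng: Optional[random.Random] = None) -> bool:
    """Hypothetical hook for mail that bypasses SA via the whitelist.

    Copies roughly `sample_rate` of the messages it sees into a Maildir
    used only for ham training. Returns True if the message was queued.
    """
    rng = rng or random.Random()
    if rng.random() >= sample_rate:
        return False
    # create=True builds queue_dir with its tmp/new/cur subdirectories
    # if queue_dir does not exist yet.
    box = mailbox.Maildir(queue_dir, create=True)
    box.add(raw)  # plain bytes land in queue_dir/new/
    return True


def ham_training_command(queue_dir: str) -> List[str]:
    """Command a periodic job could run to train the queued messages as ham."""
    return ["sa-learn", "--ham", os.path.join(queue_dir, "new")]
```

A cron job might run `subprocess.run(ham_training_command(path), check=True)` and then empty the queue. Sampling is only one way to limit how much whitelisted mail gets trained, and the rate is a tuning knob, not a recommendation.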
- Re: Is Bayes Dead? Have the spammers won? Marc Perkel