Jim Maul wrote:
Marc Perkel wrote:


Perhaps what I need to do is to get rid of autolearn and write my own learning system that strips out the body of messages with images and just learns the headers. My problem is that when users get image spam they put it in the spam folders and it gets learned. But the text that accompanies the image spam causes ham-like text to be learned as spam. That causes ham to get higher scores.
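As a rough illustration of the header-only idea, here is a minimal Python sketch using only the standard library `email` package. The function name `headers_only_for_image_mail` is hypothetical, and treating any `image/*` MIME part as the trigger is an assumption, not something SpamAssassin does itself.

```python
from email import message_from_bytes


def headers_only_for_image_mail(raw: bytes) -> bytes:
    """Return the message with its body removed if it carries an image part.

    Messages without any image/* MIME part are returned unchanged.
    """
    msg = message_from_bytes(raw)

    # walk() yields the message itself and every nested MIME part.
    has_image = any(
        part.get_content_maintype() == "image" for part in msg.walk()
    )
    if not has_image:
        return raw

    # Drop the headers that describe the old multipart body, so the
    # stripped message does not claim a structure it no longer has.
    for name in ("Content-Type", "Content-Transfer-Encoding",
                 "Content-Disposition"):
        del msg[name]

    # Replace the whole MIME tree with an empty text body.
    msg.set_payload("")
    msg["Content-Type"] = "text/plain"
    return msg.as_bytes()
```

The stripped output could then be handed to the learner in place of the original message, so that only header tokens from image spam are learned.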



Are you sure of this? Have you also trained these ham messages to counter this effect? Not too long ago we were in the same situation. I have autolearn enabled, but I have adjusted the thresholds to avoid learning false positives and false negatives. We were getting ham (arguably, since it was newsletter-type mail) that was hitting BAYES_99. As soon as I started training those messages as ham, the problem went away. Spam is still detected correctly by Bayes, and these newsletters no longer hit BAYES_99.

-Jim


What I think my problem might be is that I have done so much work prescreening messages with Exim that what's left isn't good stock for autolearn. I think what I need is a separate dedicated learner server that is selective and smart about what it learns.



This is quite possible. I have heard other stories of people using things like greylisting and RBLs to reject so much at SMTP time that what eventually made it to SA was too limited a sample, and it produced odd results for Bayes. From my experience, the more you throw at Bayes, the better it gets. The more selective you are, the less it has to work with.

Jim


Yes - I think that's what's happening to me. I also created an automatic whitelisting system that shaves off about half of the ham, which bypasses SA. What I need to do is fork off a copy of a lot of the email that's bypassing SA and stuff it into the learner. Like I said originally, Bayes used to be my best tool. I'd like to get that back.
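One way the "fork off a copy" step might look, sketched in Python under stated assumptions: the function name `maybe_learn_ham`, the 10% sampling fraction, and the injectable `rng` and `run` parameters are all hypothetical; it also assumes `sa-learn` is on the PATH and reads the message from standard input when no file is given.

```python
import random
import subprocess


def maybe_learn_ham(raw: bytes, fraction: float = 0.1,
                    rng=random, run=subprocess.run) -> bool:
    """Feed a random sample of whitelisted ham to the Bayes learner.

    Returns True if this message was passed to sa-learn, False if skipped.
    """
    # Sample only a fraction of the bypassing mail to limit learner load.
    if rng.random() >= fraction:
        return False

    # Pipe the raw message to sa-learn on stdin and train it as ham.
    run(["sa-learn", "--ham"], input=raw, check=True)
    return True
```

The mail system would call this on a copy of each message that the whitelist lets through, so the learner sees ham that SA itself never scans.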
