I've been trying to 'teach' SA to tell spam from ham in my mail system. I've made it through two main learning sessions, running around 450 msgs each time through sa-learn --spam / sa-learn --ham, and yet SA still gets it right no more than about 40% of the time, maybe less. I'm not sure how to measure that exactly.
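One way to put an actual number on that would be to count messages rather than guess. Below is a minimal sketch, assuming the mail ends up in mbox files and that SA adds its usual X-Spam-Status header (starting with "Yes" or "No", with the rule names listed after tests=). The function name tally_mbox and the counter keys are made up for illustration. It also counts how many messages hit any BAYES_* rule, since a count of zero would suggest the trained database is not being consulted at all.

```python
import mailbox
import re
import sys
from collections import Counter


def tally_mbox(path):
    """Count SA verdicts and Bayes rule hits in one mbox file."""
    counts = Counter()
    # create=False: fail loudly instead of creating an empty mailbox.
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            counts["total"] += 1
            # The header is often folded over several lines; collapse it.
            status = " ".join(str(msg.get("X-Spam-Status", "")).split())
            if status.startswith("Yes"):
                counts["flagged_spam"] += 1
            elif status.startswith("No"):
                counts["flagged_ham"] += 1
            else:
                counts["no_header"] += 1
            # BAYES_00 ... BAYES_99 appear in tests= when Bayes was used.
            if re.search(r"\bBAYES_\d+\b", status):
                counts["bayes_hit"] += 1
    finally:
        box.close()
    return counts


if __name__ == "__main__":
    # Usage (paths are whatever your procmail recipes deliver to):
    #   python tally.py ham spam
    for mbox_path in sys.argv[1:]:
        print(mbox_path, dict(tally_mbox(mbox_path)))
```

Comparing these counts against a hand-sorted sample would give a real error rate instead of an estimate from file sizes.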
My incoming mail is probably no more than 10-12% ham, maybe not even that. So major spam is coming in.

Now, after the above-mentioned amount of training, I've run 1100 messages through my sandbox setup. These are the last 1100 messages that have come in. I'm using only 2 rules in procmailrc, spam and ham, following the call to SA. Look at the (mbox style) files that resulted:

  -rw-------[...] 10045521 Sep 15 22:26 ham
  -rw-------[...]  6372824 Sep 15 22:26 spam

That is about:

  9.6 MB ham
  6.1 MB spam

So truly massive amounts of spam are STILL being seen as ham by SA. That should be something like:

  2.0 MB ham
  13.5 MB spam

Even more aggravating is that many of the spam msgs are just like the messages that SA was 'trained' on.

Does this seem unreasonable enough that it must mean I'm doing this all wrong?

Can anyone post some figures of what to expect with default SA 3.3.2, and what to expect after some specific amount of training?
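For reference, the size figures above can be reproduced with a few lines of arithmetic. This is only a sketch: the function names are made up for illustration, 1 MB is taken as 2**20 bytes, and byte counts are a rough stand-in for message counts, since spam and ham messages are not the same average size.

```python
MIB = 2 ** 20  # bytes per MB as used in the figures above


def summarize(ham_bytes, spam_bytes):
    """Return mbox sizes in MB and the share of bytes filed as ham."""
    total = ham_bytes + spam_bytes
    return {
        "ham_mib": ham_bytes / MIB,
        "spam_mib": spam_bytes / MIB,
        "total_mib": total / MIB,
        "ham_share": ham_bytes / total,
    }


def expected_ham_mib(total_bytes, ham_fraction):
    """Size the ham file would have if ham_fraction of the bytes were ham."""
    return total_bytes * ham_fraction / MIB


# Byte sizes taken from the ls listing above.
HAM_BYTES = 10045521
SPAM_BYTES = 6372824

stats = summarize(HAM_BYTES, SPAM_BYTES)
print("ham  %.1f MB" % stats["ham_mib"])
print("spam %.1f MB" % stats["spam_mib"])
print("share filed as ham: %.0f%%" % (100 * stats["ham_share"]))

# What the ham file would weigh if only 10-12% of the mail were ham.
for fraction in (0.10, 0.12):
    print("at %.0f%% ham: %.1f MB" % (
        100 * fraction,
        expected_ham_mib(HAM_BYTES + SPAM_BYTES, fraction),
    ))
```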