Jennifer Fountain <[EMAIL PROTECTED]> wrote:
> I cannot get it to do anything - no bayes files are created (I
> have autolearn on), etc. I have read everything I can read
> but just can't get it to run.

Do you get any useful info from sa-learn --rebuild -D or --dump? Presumably you're well above the 200 message threshold for bayes, right? Are you setting up per-user or site-wide?

> Someone mentioned bogofilter with spam. I would like more
> information on that. I run SA, Qmailscanner (qmail) and clam.
> Email is relayed to an internal box.

I'm tinkering with several bayes checkers in parallel, including bogofilter, spamprobe (http://spamprobe.sourceforge.net/index.html) and ifile (http://www.nongnu.org/ifile/). I think I recall someone on this list mentioning that a future SA will incorporate the bogofilter engine... or am I imagining things?

I do like the bogofilter "tri-state" option, which reports spam as yes, no or unsure. I can do the same testing with various bayes test levels in SA, but "unsure" helps with procmail rules (a tiny bit). There's some VERY GOOD material on training in the bogofilter documentation.

Ifile is interesting because it uses bayes to sort mail into various _user-defined_ categories. So, with different databases and configs, 1st level filtering could detect spam, 2nd level could separate bulk from personal email, and 3rd level could sort mailing list mail into folders by category/topic. I've only set up "ham/spam" so far.

spamprobe works on two-word phrases, which interested me. It apparently "caught on" quicker, though with additional training I find bogofilter doing about as well at detecting the spam I get.

In any case, on my little test setup, I'm very happy with the existing (2.63) bayes setup.

One complaint about SA on "another" list (usually disputed) is about speed. My questions:

1.) From previous posts on this list, it sounds like SA is used with great success in high-volume environments. What are the high-end numbers?

2.) What is the impact of additional rule sets on performance?

I find SA excellent at accurately scoring messages, and have used it to quickly train the other bayes tools.
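
To show what I mean by "tri-state", here is a minimal Python sketch. It is not bogofilter's actual code: the function names, the two cutoff values and the folder names are all made up for illustration, and it assumes the filter hands back a spam probability between 0 and 1.

```python
# Hypothetical tri-state classifier: two cutoffs split the score
# range into "no" (ham), "unsure" and "yes" (spam).
# The cutoff values below are illustrative assumptions, not
# defaults taken from any particular tool.

def classify(score, ham_cutoff=0.45, spam_cutoff=0.99):
    """Return "yes", "no" or "unsure" for a spam probability in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= spam_cutoff:
        return "yes"
    if score <= ham_cutoff:
        return "no"
    return "unsure"


# Hypothetical folder names, standing in for what a set of
# procmail recipes might do with each verdict.
FOLDERS = {"yes": "spam", "no": "inbox", "unsure": "review"}


def folder_for(score):
    """Map a spam probability to a delivery folder."""
    return FOLDERS[classify(score)]


if __name__ == "__main__":
    for s in (0.02, 0.70, 0.995):
        print(s, classify(s), folder_for(s))
```

The point of the middle band is that borderline mail lands in its own folder for a quick manual look, and those messages are also the most useful ones to feed back into training.
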
I'm no expert on this, but it seems to me one could use SA to accurately train another, faster bayes tool, then use that (with periodic retraining) in a high-volume environment. Then again, I'd just be trading complexity for speed, so I'm not convinced it's a good idea.

- Bob
