Re: Identifying Source of False Positives -- RESOLVED

2009-06-05 Thread Rich Shepard
On Tue, 2 Jun 2009, Rich Shepard wrote: I started doing this today. Each of the false positive messages was exported from alpine to a file, and I ran sa-learn on that file telling it the text is ham. Today the mail and logwatch summary reports appeared in my inbox and there were no false

Re: Identifying Source of False Positives -- RESOLVED

2009-06-05 Thread Bowie Bailey
Rich Shepard wrote: The empty body problem is a more difficult problem. Have procmail save a copy of the raw message somewhere and take a look at it. Make sure there is a blank line between the headers and the body. Run 'spamassassin -D' on this saved message and look for anything unusual
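
The blank-line check suggested above can be sketched in a few lines of Python. This is only an illustration, not something posted in the thread: the function name `split_headers_body` is hypothetical, and it assumes the raw message that procmail saved has been read into memory as bytes.

```python
from typing import Optional, Tuple


def split_headers_body(raw: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Split a raw message at the first blank line.

    Returns (headers, body), or None when no blank line separates
    the headers from the body.
    """
    # Normalise CRLF line endings so one check covers both conventions.
    normalised = raw.replace(b"\r\n", b"\n")
    headers, separator, body = normalised.partition(b"\n\n")
    if not separator:
        return None
    return headers, body
```

A result of `None` means the separator is missing; an empty `body` means the message really does reach this point with nothing after the headers.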

Re: Identifying Source of False Positives -- RESOLVED

2009-06-05 Thread Rich Shepard
On Fri, 5 Jun 2009, Bowie Bailey wrote: In that case, you should be able to track down the issue by comparing the two files. Is the EMPTY_BODY rule defined in the old local.cf file? If so, what does it say? Bowie, Yes, it was in the old local.cf: # for empty message bodies: body

Re: Identifying Source of False Positives

2009-06-04 Thread Rich Shepard
On Mon, 1 Jun 2009, Bowie Bailey wrote: The empty body problem is a more difficult problem. Have procmail save a copy of the raw message somewhere and take a look at it. Make sure there is a blank line between the headers and the body. Bowie, et al.: Progress is being made. I discovered

Re: Identifying Source of False Positives

2009-06-03 Thread Rich Shepard
On Tue, 2 Jun 2009, Charles Gregory wrote: This *really* suggests that one of two things MUST be occurring: 1) What you are seeing is NOT what spamassassin sees. Charles, Quite possible. 2) A character (null/ascii-zeros?) has been injected into the e-mail somewhere in the headers,
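
One way to test the second possibility is to scan the saved raw message for NUL bytes. The sketch below is an assumption-laden illustration rather than anything from the thread: `find_nul_offsets` is a hypothetical helper, and it expects the message as bytes (for example, read from the copy procmail saved).

```python
from typing import List


def find_nul_offsets(raw: bytes) -> List[int]:
    """Return the byte offsets of every NUL (0x00) character in raw."""
    return [offset for offset, byte in enumerate(raw) if byte == 0]
```

An empty list means no NUL bytes were found; any offsets that fall before the first blank line point at the headers.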

Re: Identifying Source of False Positives

2009-06-02 Thread Rich Shepard
On Mon, 1 Jun 2009, Bowie Bailey wrote: Your biggest problems here are BAYES_99 and EMPTY_BODY. To fix the Bayes problem, sa-learn some of these messages as ham. Make sure you are learning as the right user... Bowie, I started doing this today. Each of the false positive messages was

Re: Identifying Source of False Positives

2009-06-02 Thread Charles Gregory
On Tue, 2 Jun 2009, Rich Shepard wrote: This morning not only was the mail log report and logwatch report falsely flagged as spam, but so were several messages posted to the google group mail list for an application I use. What is interesting to me is that every one had a +2.5 score for

Re: Identifying Source of False Positives

2009-06-01 Thread McDonald, Dan
On Mon, 2009-06-01 at 09:28 -0700, Rich Shepard wrote: I'm running SA-3.2.5 on Slackware-12.2 and encountering false positives on messages that have not before been seen as spam by SA. Specifically, the daily postfix mail log summary report and the daily logwatch report are marked as spam;

Re: Identifying Source of False Positives

2009-06-01 Thread Charles Gregory
On Mon, 1 Jun 2009, Rich Shepard wrote: messages that have not before been seen as spam by SA. Specifically, the daily postfix mail log summary report and the daily logwatch report are marked as spam; Well, firstly, examine the mail full headers. There should be an X-Spam-Status header listing

Re: Identifying Source of False Positives

2009-06-01 Thread John Hardin
On Mon, 1 Jun 2009, Rich Shepard wrote: I'm running SA-3.2.5 on Slackware-12.2 and encountering false positives on messages that have not before been seen as spam by SA. Specifically, the daily postfix mail log summary report and the daily logwatch report are marked as spam; they are sent by

Re: Identifying Source of False Positives

2009-06-01 Thread Rich Shepard
On Mon, 1 Jun 2009, Charles Gregory wrote: Well, firstly, examine the mail full headers. There should be an X-Spam-Status header listing the tests that matched on the e-mail. Charles/Dan/John: I certainly managed to forget this. I just ran /etc/cron.daily/1pflogsumm and looked at the
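
Listing the matched tests from the X-Spam-Status header can be sketched as follows. This is a hedged illustration only: `matched_tests` is a hypothetical helper, and it assumes the header uses SpamAssassin's usual `tests=RULE_A,RULE_B,...` layout, possibly folded across several lines.

```python
import email
import re
from typing import List


def matched_tests(raw_message: str) -> List[str]:
    """Return the rule names listed in the X-Spam-Status header."""
    message = email.message_from_string(raw_message)
    status = message.get("X-Spam-Status")
    if status is None:
        return []
    # Folded headers put whitespace after a comma; remove it so the
    # rule list becomes one unbroken token.
    status = re.sub(r",\s+", ",", status)
    found = re.search(r"tests=(\S+)", status)
    if found is None or found.group(1) == "none":
        return []
    return found.group(1).split(",")
```

Each returned name can then be compared against the rules defined in local.cf to see which ones account for the score.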

Re: [sa] Re: Identifying Source of False Positives

2009-06-01 Thread Charles Gregory
On Mon, 1 Jun 2009, Rich Shepard wrote: * 2.5 EMPTY_BODY BODY: Message has subject but no body There is certainly body content in the message; it's not empty so I don't understand the 2.5 on that third test. I also don't know where the 3.5 on the second test arises. Just to be

Re: Identifying Source of False Positives

2009-06-01 Thread John Hardin
On Mon, 1 Jun 2009, Rich Shepard wrote: Here are the headers: From r...@salmo.appl-ecosys.com Mon Jun 1 11:25:44 2009 Return-Path: r...@salmo.appl-ecosys.com X-Spam-Flag: YES X-Spam-Checker-Version: SpamAssassin 3.2.5-ph20040310.0 (2008-06-10) on salmo.appl-ecosys.com X-Spam-Level:

Re: [sa] Re: Identifying Source of False Positives

2009-06-01 Thread Rich Shepard
On Mon, 1 Jun 2009, Charles Gregory wrote: Just to be clear, are you looking at the body in the actual rejected message, Charles, Yes. The body consists of the mail log summary. First guess, look at the procmail code that 'chooses' to run spamassassin. Have you used an 'h' where you

Re: Identifying Source of False Positives

2009-06-01 Thread Rich Shepard
On Mon, 1 Jun 2009, John Hardin wrote: If these are system-generated messages, something is improperly training SA that they are spam. Do you use autolearn? John, No. Once a week or so I run sa-learn specifying spam on the spam-uncaught mbox file. Less frequently I run it on mail list

Re: Identifying Source of False Positives

2009-06-01 Thread Bowie Bailey
Rich Shepard wrote: Here are all headers from the mail log summary: From r...@salmo.appl-ecosys.com Mon Jun 1 11:25:44 2009 Return-Path: r...@salmo.appl-ecosys.com X-Spam-Flag: YES X-Spam-Checker-Version: SpamAssassin 3.2.5-ph20040310.0 (2008-06-10) on salmo.appl-ecosys.com X-Spam-Level:

Re: Identifying Source of False Positives

2009-06-01 Thread John Hardin
On Mon, 1 Jun 2009, Rich Shepard wrote: On Mon, 1 Jun 2009, John Hardin wrote: If these are system-generated messages, something is improperly training SA that they are spam. Do you use autolearn? John, No. Once a week or so I run sa-learn specifying spam on the spam-uncaught mbox file.

Re: [sa] Re: Identifying Source of False Positives

2009-06-01 Thread Charles Gregory
First guess, look at the procmail code that 'chooses' to run spamassassin. Have you used an 'h' where you meant to use an 'H', thereby feeding *only* the header to spamassassin? ## Call SpamAssassin :0fw: spamassassin.lock * < 256000 | spamassassin Is there anywhere in the procmail

Re: [sa] Re: Identifying Source of False Positives

2009-06-01 Thread Rich Shepard
On Mon, 1 Jun 2009, Charles Gregory wrote: Is there anywhere in the procmail recipe *above* this one that some special condition has been specified as: :0fwh ...which has the effect of 'filtering' the message down to just its headers? It wouldn't necessarily have to be a recent change to

Re: Identifying Source of False Positives

2009-06-01 Thread Rich Shepard
On Mon, 1 Jun 2009, Bowie Bailey wrote: Your biggest problems here are BAYES_99 and EMPTY_BODY. To fix the Bayes problem, sa-learn some of these messages as ham. Make sure you are learning as the right user... Bowie, I just did this on a run from this morning. I'll do so again tomorrow

Re: Identifying Source of False Positives

2009-06-01 Thread Theo Van Dinter
fwiw, even if there isn't a blank line, SA will figure it out (though it'll trigger a MISSING_HB_SEP rule hit). As for the debug output ... it depends, how did you run the command (ie: what was the command you tried). My guess is you did something like spamassassin -D filename, where filename

Re: Identifying Source of False Positives

2009-06-01 Thread Rich Shepard
On Mon, 1 Jun 2009, John Hardin wrote: And I assume you look at the spam-uncaught file before learning it? Yes. The messages in there are those I deliberately move there after they've ended up in my inbox because neither the postfix filters nor the spamassassin rules caught them. If some

Re: Identifying Source of False Positives

2009-06-01 Thread Rich Shepard
On Mon, 1 Jun 2009, Theo Van Dinter wrote: My guess is you did something like spamassassin -D filename, where filename gets treated as the argument to -D, so then it was waiting for input. Theo, Yes, this is what I did. If this is the case, try spamassassin -D < filename > /dev/null. :)

Re: Identifying Source of False Positives

2009-06-01 Thread John Hardin
On Mon, 1 Jun 2009, Rich Shepard wrote: On Mon, 1 Jun 2009, John Hardin wrote: Have you kept your spam and ham corpora? I'm not sure. The spam comes from the spam-uncaught file which is cleared each time it's run. Pity. If you're manually training it's a very good idea to retain your