At 4:37 PM -0400 06/14/2013, Alex wrote:
> On Fri, Jun 14, 2013 at 4:18 PM, Amir 'CG' Caspi <ceph...@3phase.com> wrote:
>> I wonder if there's some
>> difference between running spamassassin manually on the message versus
>> running spamd.
> I think the only difference would be if spamd somehow didn't recognize
> all the locations for your rules.
OK, I've got some more weirdness here. I just received two
false-negative (FN) spams: one had bayes00, the other bayes50. To
test what the heck might be going on, I ran both emails through spamc
manually, which SHOULD reproduce what happens when sendmail delivers
the email and spamc gets run automatically.

The first email, which was bayes00 originally, hit bayes99 when I ran
it manually through spamc. There were only a few minutes between the
first and second run (see timestamps below), and nothing very
important happened to the Bayes DB between those two runs. The second
email, bayes50, stayed exactly the same (bayes50 again). I looked
through /var/log/maillog to see if I could find some difference
between the two runs, but they look basically identical.

The only difference I can think of is that the second (manual) run
uses mail source that I copied and pasted from my email program, so
it's pure text. The first (automatic) run is on the mail as it enters
the system, which might be formatted differently somehow. All of my
sa-learn training is done directly on my mbox files, which is perhaps
more akin to copy/paste than anything else...
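
One way to check the formatting theory would be to compare the structure of the two copies directly. What follows is only a minimal Python sketch using the standard `email` package; `summarize_structure` and `diff_structure` are hypothetical helper names, and the two inputs are assumed to be the raw bytes of the message as delivered and of the copy/pasted version.

```python
import email
from email import policy


def summarize_structure(raw: bytes) -> dict:
    """Collect a few structural facts about a raw message (hypothetical helper)."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    return {
        "size": len(raw),
        "content_type": msg.get_content_type(),
        "is_multipart": msg.is_multipart(),
        # Content type of the top-level message followed by every sub-part.
        "parts": [part.get_content_type() for part in msg.walk()],
        # Parser defects recorded on the top-level message only.
        "defects": [type(d).__name__ for d in msg.defects],
    }


def diff_structure(a: bytes, b: bytes) -> dict:
    """Return only the fields that differ, as (value_in_a, value_in_b) pairs."""
    sa, sb = summarize_structure(a), summarize_structure(b)
    return {key: (sa[key], sb[key]) for key in sa if sa[key] != sb[key]}
```

If the pasted copy has lost its multipart structure or its header/body separator, that should show up in `content_type`, `is_multipart`, or `parts`.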
Anyone know what the hell is going on here? For reference, here is
the maillog entry for the bayes00 message when it went through
automatically:
Jun 18 05:00:32 kismet sendmail[27721]: r5I90WGI027721:
from=<junekohlssur...@stetacusesse.us>, size=16502, class=0,
nrcpts=1,
msgid=<nnnnnnnnn19483006nnnnnn...@efeo6h8pf.stetacusesse.us>,
proto=ESMTP, relay=root@localhost
Jun 18 05:00:32 kismet sendmail[27707]: r5I90U4N027657:
to=<u...@domain.com>, delay=00:00:01, xdelay=00:00:00,
mailer=virthostmail, pri=136089, relay=domain.com, dsn=2.0.0,
stat=Sent (r5I90WGI027721 Message accepted for delivery)
Jun 18 05:00:32 kismet spamd[27586]: spamd: connection from
localhost.localdomain [127.0.0.1] at port 53424
Jun 18 05:00:32 kismet spamd[27586]: spamd: setuid to u...@domain.com succeeded
Jun 18 05:00:32 kismet spamd[27586]: spamd: processing message
<nnnnnnnnn19483006nnnnnn...@efeo6h8pf.stetacusesse.us> for
u...@domain.com:22001
Jun 18 05:00:33 kismet spamd[27586]: spf: lookup failed: Can't locate
object method "new_from_string" via package "Mail::SPF::v1::Record"
at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SPF/Server.pm line 524.
Jun 18 05:00:37 kismet spamd[27586]: pyzor: [27730] error:
TERMINATED, signal 15 (000f)
Jun 18 05:00:37 kismet spamd[27586]: spamd: clean message (-1.1/5.0)
for u...@domain.com:22001 in 5.0 seconds, 16781 bytes.
Jun 18 05:00:37 kismet spamd[27586]: spamd: result: . -1 -
BAYES_00,HTML_EXTRA_CLOSE,HTML_IMAGE_RATIO_08,HTML_MESSAGE,RDNS_NONE
scantime=5.0,size=16781,user=u...@domain.com,uid=22001,required_score=5.0,rhost=localhost.localdomain,raddr=127.0.0.1,rport=53424,mid=<nnnnnnnnn19483006nnnnnn...@efeo6h8pf.stetacusesse.us>,
bayes=0.000000,autolearn=no
And here is the entry from when I ran it through manually:
Jun 18 05:05:45 kismet spamd[27984]: spamd: connection from
localhost.localdomain [127.0.0.1] at port 53447
Jun 18 05:05:45 kismet spamd[27984]: spamd: setuid to u...@domain.com succeeded
Jun 18 05:05:45 kismet spamd[27984]: spamd: processing message
<nnnnnnnnn19483006nnnnnn...@efeo6h8pf.stetacusesse.us> for
u...@domain.com:22001
Jun 18 05:05:45 kismet spamd[27984]: spf: lookup failed: Can't locate
object method "new_from_string" via package "Mail::SPF::v1::Record"
at /usr/lib/perl5/vendor_perl/5.8.8/Mail/SPF/Server.pm line 524.
Jun 18 05:05:47 kismet spamd[27984]: spamd: identified spam (6.0/5.0)
for u...@domain.com:22001 in 2.2 seconds, 16223 bytes.
Jun 18 05:05:47 kismet spamd[27984]: spamd: result: Y 6 -
BAYES_99,MISSING_MIME_HB_SEP,RDNS_NONE,T_MIME_NO_TEXT,URIBL_BLACK
scantime=2.2,size=16223,user=u...@domain.com,uid=22001,required_score=5.0,rhost=localhost.localdomain,raddr=127.0.0.1,rport=53447,mid=<nnnnnnnnn19483006nnnnnn...@efeo6h8pf.stetacusesse.us>,bayes=1.000000,autolearn=no
So... what the heck is going on? Apart from the rules that hit and
the size (16781 bytes on the automatic run versus 16223 on the manual
one), I see basically no difference between the two maillog entries.
The only other differences between the two runs, as far as I can
tell, are that pyzor died on the first one (I don't know why, but
that shouldn't have ANY effect on the Bayes score) and that the
manual run used the copy/paste from my mail program.
But, as mentioned, the bayes50 spam looked identical for both the
automatic and manual runs.
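
To make the comparison between the two runs less of an eyeball exercise, the `result:` lines can be parsed and diffed. This is a rough Python sketch, not anything SpamAssassin ships: `parse_result` is a hypothetical helper, the regular expression is an assumption based only on the two log lines above, and the sample lines are abbreviated (wrapped lines rejoined, with the user, host, and mid fields dropped).

```python
import re

# Assumed layout: "spamd: result: <Y or .> <score> - <RULE,RULE,...> key=value,key=value"
RESULT_RE = re.compile(
    r"spamd: result: (?P<flag>[Y.])\s+(?P<score>-?\d+)\s+-\s+"
    r"(?P<rules>\S*)\s+(?P<fields>.*)$"
)


def parse_result(line: str) -> dict:
    """Split one spamd result line into flag, score, rule set, and key=value fields."""
    match = RESULT_RE.search(line)
    if match is None:
        raise ValueError("not a spamd result line")
    fields = {}
    for item in match.group("fields").split(","):
        key, sep, value = item.strip().partition("=")
        if sep:
            fields[key] = value
    return {
        "spam": match.group("flag") == "Y",
        "score": int(match.group("score")),
        "rules": set(filter(None, match.group("rules").split(","))),
        "fields": fields,
    }


# Abbreviated copies of the two result lines quoted above.
AUTO = ("Jun 18 05:00:37 kismet spamd[27586]: spamd: result: . -1 - "
        "BAYES_00,HTML_EXTRA_CLOSE,HTML_IMAGE_RATIO_08,HTML_MESSAGE,RDNS_NONE "
        "scantime=5.0,size=16781,bayes=0.000000,autolearn=no")
MANUAL = ("Jun 18 05:05:47 kismet spamd[27984]: spamd: result: Y 6 - "
          "BAYES_99,MISSING_MIME_HB_SEP,RDNS_NONE,T_MIME_NO_TEXT,URIBL_BLACK "
          "scantime=2.2,size=16223,bayes=1.000000,autolearn=no")

auto, manual = parse_result(AUTO), parse_result(MANUAL)
only_auto = auto["rules"] - manual["rules"]      # rules hit only on the automatic run
only_manual = manual["rules"] - auto["rules"]    # rules hit only on the manual run
size_delta = int(auto["fields"]["size"]) - int(manual["fields"]["size"])
```

For these two entries the diff is not limited to the Bayes rule: the HTML rules appear only on the automatic run and the MIME-related ones only on the manual run, and the sizes differ as well.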
Does anyone have any idea what the heck is going on, and how I can
fix it? Is my Bayes DB worthless because I've been training it on
mbox files (i.e. ASCII), while the first (automatic) scan runs on
binary (MIME) instead? If so, how can I fix this -- do I need to
store my mail in some format other than mbox? (Except that
sendmail delivers my mail in mbox format...)
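
Rather than copy/pasting from the mail program, the manual test could be fed the message exactly as it is stored in the mbox. Below is a minimal Python sketch, assuming the message can be found by its Message-ID; `raw_message_from_mbox` and `rescan` are hypothetical helpers, the mbox path is whatever the local setup uses, and `spamc -R` (report only) is assumed to be on the PATH and talking to the same spamd.

```python
import mailbox
import subprocess


def raw_message_from_mbox(mbox_path: str, message_id: str) -> bytes:
    """Return the stored bytes of the first message with this Message-ID."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key in box.iterkeys():
            if str(box[key].get("Message-ID", "")).strip() == message_id:
                # Bytes as stored in the mbox, without the "From " separator line.
                return box.get_bytes(key)
    finally:
        box.close()
    raise KeyError(message_id)


def rescan(raw: bytes, command=("spamc", "-R")) -> str:
    """Pipe the raw message into the scanner command and return its stdout."""
    result = subprocess.run(list(command), input=raw,
                            stdout=subprocess.PIPE, check=False)
    return result.stdout.decode("utf-8", errors="replace")
```

Calling `rescan(raw_message_from_mbox(path, message_id))` on the stored copy and comparing the report against the copy/paste run would at least show whether the pasted text is what changes the result.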
Thanks.
--- Amir