On 6/18/2013 5:31 AM, Amir 'CG' Caspi wrote:
> At 4:37 PM -0400 06/14/2013, Alex wrote:
>> On Fri, Jun 14, 2013 at 4:18 PM, Amir 'CG' Caspi <ceph...@3phase.com>
>> wrote:
>> > I wonder if there's some
>> > difference between running spamassassin manually on the message versus
>> > running spamd.
>>
>> I think the only difference would be if spamd somehow didn't recognize
>> all the locations for your rules.
>
> OK, I've got some more weirdness here. I just received two FN spams...
> one had bayes00, another bayes50. To test what the heck might be going
> on, I ran both of the emails through spamc manually... this SHOULD
> recreate the same thing that occurs when sendmail delivers the email and
> spamc gets run automatically.
>
> The first email, which was bayes00 originally, hit with bayes99 when I
> ran it manually through spamc. There were only a few minutes between
> the first and second run (see timestamps below)... nothing very
> important happened to the Bayes DB between those two runs. The second
> email, bayes50, stayed exactly the same (also bayes50). I looked
> through the /var/log/maillog to see if I could figure out some
> difference between the two runs, but they look basically identical.
>
> The only difference I can figure is that the second (manual) run happens
> on mail source that I copy/paste from my email program... that is, it's
> pure text, copied and pasted. The first (automatic) run is on the mail
> as it enters the system, which might be somehow formatted differently.
> All of my sa-learn training is done directly on my mbox files, which
> perhaps is more akin to copy/paste than anything else...
>
> Anyone know what the hell is going on here? For reference, here is the
> maillog entry for the bayes00 message when it went through automatically:
>
> Jun 18 05:00:32 kismet sendmail[27721]: r5I90WGI027721:
> from=<junekohlssur...@stetacusesse.us>, size=16502, class=0, nrcpts=1,
> msgid=<nnnnnnnnn19483006nnnnnn...@efeo6h8pf.stetacusesse.us>,
> proto=ESMTP, relay=root@localhost
> Jun 18 05:00:32 kismet sendmail[27707]: r5I90U4N027657:
> to=<u...@domain.com>, delay=00:00:01, xdelay=00:00:00,
> mailer=virthostmail, pri=136089, relay=domain.com, dsn=2.0.0, stat=Sent
> (r5I90WGI027721 Message accepted for delivery)
> Jun 18 05:00:32 kismet spamd[27586]: spamd: connection from
> localhost.localdomain [127.0.0.1] at port 53424
> Jun 18 05:00:32 kismet spamd[27586]: spamd: setuid to u...@domain.com
> succeeded
> Jun 18 05:00:32 kismet spamd[27586]: spamd: processing message
> <nnnnnnnnn19483006nnnnnn...@efeo6h8pf.stetacusesse.us> for
> u...@domain.com:22001
> Jun 18 05:00:33 kismet spamd[27586]: spf: lookup failed: Can't locate
> object method "new_from_string" via package "Mail::SPF::v1::Record" at
> /usr/lib/perl5/vendor_perl/5.8.8/Mail/SPF/Server.pm line 524.
> Jun 18 05:00:37 kismet spamd[27586]: pyzor: [27730] error: TERMINATED,
> signal 15 (000f)
> Jun 18 05:00:37 kismet spamd[27586]: spamd: clean message (-1.1/5.0) for
> u...@domain.com:22001 in 5.0 seconds, 16781 bytes.
> Jun 18 05:00:37 kismet spamd[27586]: spamd: result: . -1 -
> BAYES_00,HTML_EXTRA_CLOSE,HTML_IMAGE_RATIO_08,HTML_MESSAGE,RDNS_NONE
> scantime=5.0,size=16781,user=u...@domain.com,uid=22001,required_score=5.0,rhost=localhost.localdomain,raddr=127.0.0.1,rport=53424,mid=<nnnnnnnnn19483006nnnnnn...@efeo6h8pf.stetacusesse.us>,
> bayes=0.000000,autolearn=no
>
>
> And here is when it went through manually:
>
> Jun 18 05:05:45 kismet spamd[27984]: spamd: connection from
> localhost.localdomain [127.0.0.1] at port 53447
> Jun 18 05:05:45 kismet spamd[27984]: spamd: setuid to u...@domain.com
> succeeded
> Jun 18 05:05:45 kismet spamd[27984]: spamd: processing message
> <nnnnnnnnn19483006nnnnnn...@efeo6h8pf.stetacusesse.us> for
> u...@domain.com:22001
> Jun 18 05:05:45 kismet spamd[27984]: spf: lookup failed: Can't locate
> object method "new_from_string" via package "Mail::SPF::v1::Record" at
> /usr/lib/perl5/vendor_perl/5.8.8/Mail/SPF/Server.pm line 524.
> Jun 18 05:05:47 kismet spamd[27984]: spamd: identified spam (6.0/5.0)
> for u...@domain.com:22001 in 2.2 seconds, 16223 bytes.
> Jun 18 05:05:47 kismet spamd[27984]: spamd: result: Y 6 -
> BAYES_99,MISSING_MIME_HB_SEP,RDNS_NONE,T_MIME_NO_TEXT,URIBL_BLACK
> scantime=2.2,size=16223,user=u...@domain.com,uid=22001,required_score=5.0,rhost=localhost.localdomain,raddr=127.0.0.1,rport=53447,mid=<nnnnnnnnn19483006nnnnnn...@efeo6h8pf.stetacusesse.us>,bayes=1.000000,autolearn=no
>
>
>
> So... what the heck is going on? I see basically no difference between
> the two maillog entries. The only difference between the two runs, as
> far as I can tell, is that pyzor died on the first one (and I don't know
> why, but that shouldn't have ANY effect on the Bayes score), and the
> manual run was using the copy/paste from my mail program.
>
> But, as mentioned, the bayes50 spam looked identical for both the
> automatic and manual runs.
>
> Anyone have any idea what the heck is going on, and how I can fix it?
>
> Is my Bayes DB worthless because I've been training it on MBOX format
> (i.e. ASCII), but when it runs the first time around, it's running on
> binary (MIME) instead? If so, how can I fix this -- do I need to store
> my mail in some different format instead of MBOX? (Except that sendmail
> delivers my mail in MBOX format...)
>
> Thanks.
>
> --- Amir
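Comparing the two `spamd: result:` lines field by field makes the divergence easier to see than reading the whole log. The following is a minimal Python sketch, assuming each result has been rejoined onto a single line in the format shown above (flag, score, `-`, comma-separated rule names, then comma-separated `key=value` fields). `parse_result` and `compare` are hypothetical helper names, and the sample lines are the logged results with most fields trimmed:

```python
import re

# Matches the tail of a spamd "result:" log line, assuming the format above:
#   result: <flag> <score> - <RULE,RULE,...> <key=value,key=value,...>
RESULT_RE = re.compile(r"result:\s+(\S)\s+(-?\d+)\s+-\s+(\S+)\s+(\S+)")


def parse_result(line):
    """Return (rules, fields) from one spamd result line (hypothetical helper)."""
    m = RESULT_RE.search(line)
    if m is None:
        raise ValueError("not a spamd result line: %r" % line)
    rules = set(m.group(3).split(","))
    fields = {}
    # Split the key=value list on commas; the values in these logs contain none.
    for item in m.group(4).split(","):
        key, sep, value = item.partition("=")
        if sep:
            fields[key] = value
    return rules, fields


def compare(auto_line, manual_line):
    """Show which rules and key fields differ between two runs of one message."""
    auto_rules, auto_fields = parse_result(auto_line)
    manual_rules, manual_fields = parse_result(manual_line)
    return {
        "only_auto": sorted(auto_rules - manual_rules),
        "only_manual": sorted(manual_rules - auto_rules),
        "size": (auto_fields.get("size"), manual_fields.get("size")),
        "bayes": (auto_fields.get("bayes"), manual_fields.get("bayes")),
    }


# The two logged results, rejoined onto one line each, with most fields trimmed.
auto = ("spamd: result: . -1 - "
        "BAYES_00,HTML_EXTRA_CLOSE,HTML_IMAGE_RATIO_08,HTML_MESSAGE,RDNS_NONE "
        "scantime=5.0,size=16781,bayes=0.000000,autolearn=no")
manual = ("spamd: result: Y 6 - "
          "BAYES_99,MISSING_MIME_HB_SEP,RDNS_NONE,T_MIME_NO_TEXT,URIBL_BLACK "
          "scantime=2.2,size=16223,bayes=1.000000,autolearn=no")

print(compare(auto, manual))
```

On these two lines the rule sets differ in both directions, and the `size` field differs too (16781 vs. 16223 bytes). A size mismatch for what should be the same message hints that the pasted copy is not byte-for-byte what spamd saw at delivery time, though the sketch cannot say why.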
While my setup is slightly different (I use AMaViS), I had a similar problem (discrepancies in Bayes scores for the same message), and with the help of this list we went through the entire setup, rather exhaustively. Here is that thread:

http://mail-archives.apache.org/mod_mbox/spamassassin-users/201301.mbox/%3c50edebad.2030...@indietorrent.org%3E

Basically, it sounds as though:

a.) You are copying/pasting the body of the email, but not the headers. I made the same mistake. I use Thunderbird, and to view the actual message source there, one presses Ctrl+U. *That's* the text you would want to copy and paste.

b.) You are running Bayes as two different users when you perform your tests. It's possible that SpamAssassin has its own user for executing Bayes-related tasks, but you're using your own system account, for example, which would explain the observed behavior. (By default, each user has his own Bayes DB; it is possible to "hard-code" the Bayes user, which is exactly what I had to do, for more than one reason.)

I sincerely doubt that this is a problem with your mailbox format. Have a look at the thread I cited and see if anything jumps out at you.

-Ben
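Point a.) can be sanity-checked before a saved copy is piped into spamc. Below is a rough sketch using Python's standard `email` parser, assuming the pasted source has been saved as text. A paste that starts at the body instead of the headers leaves the expected headers missing, and the parser records a `MissingHeaderBodySeparatorDefect` when the first line does not look like a header. The `check_source` helper, the header list it checks, and the sample messages are illustrative assumptions; this is not what SpamAssassin does internally.

```python
from email import message_from_string

# Headers a full message source (e.g. Thunderbird's Ctrl+U view) would normally
# carry. This list is an illustrative assumption, not a SpamAssassin requirement.
EXPECTED_HEADERS = ("Received", "Message-ID", "From", "Content-Type")


def check_source(raw):
    """Report expected headers that are absent and any defects the parser recorded."""
    msg = message_from_string(raw)
    missing = [h for h in EXPECTED_HEADERS if msg.get(h) is None]
    defects = [type(d).__name__ for d in msg.defects]
    return {"missing_headers": missing, "defects": defects}


# Hypothetical sample inputs: a complete source and a body-only paste.
full = (
    "Received: from mx.example.org by localhost\n"
    "Message-ID: <abc@example.org>\n"
    "From: sender@example.org\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello there.\n"
)
pasted = "Hello there.\nThis is only the body.\n"

print(check_source(full))    # no missing headers, no defects
print(check_source(pasted))  # all four headers missing, plus a separator defect
```

A clean result does not prove the copy matches what was delivered; it only catches the headers-missing case from point a.).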