Re: bug 4234 - MIME_HTML_ONLY + MPART_ALT_DIFF both firing on html-only email
Karsten Bräckelmann wrote:
> On Sat, 2009-09-19 at 10:22 +0200, Per Jessen wrote:
>> Karsten Bräckelmann wrote:
>>> So, yes, the scores are higher again -- and have been for over 2 years now. ;) However, I wouldn't say creeping back in, cause it never has been manually fixed or adjusted, but always generated.
>> Mea culpa, I didn't concern myself with _how_ the scores had been changed back. Nonetheless, perfectly legitimate mails are now given 2.8 (1.1+1.7) points purely for consisting of only an HTML part. Seems a bit excessive.
> That's still ham. Did you see a FP due to these rules plus others (which ones?), or are you merely concerned about the cumulative score for these on their own?

I noticed an FP on a mail from networksolutions, and judging from my logs, on 18 Sep I had 9 emails that scored just above 5 including these two rules.

> Either way, it'd be interesting to see what the next GA run returns for them. Could you keep an eye on this and get back later with the scores for 3.3?

I'll try.

/Per Jessen, Zürich
Re: bug 4234 - MIME_HTML_ONLY + MPART_ALT_DIFF both firing on html-only email
Per Jessen wrote:
>> That's still ham. Did you see a FP due to these rules plus others (which ones?), or are you merely concerned about the cumulative score for these on their own?
> I noticed an FP on a mail from networksolutions, and judging from my logs, on 18 Sep I had 9 emails that scored just above 5 including these two rules.

Of which at least three were FPs (from a Vietnamese bank).

/Per Jessen, Zürich
spamassassin and plesk
Hi,

I've searched the FAQ but found no help, at least none that I understand. I have an Ubuntu 8.04.1 server running plesk. When I first rented the server, I forgot to buy the spamassassin key for plesk. Now they tell me I have to change to a different server, paying much more, to use spamassassin.

I've installed spamassassin by running apt-get install spamassassin, but now I can't make sure it is working correctly. If I run spamd -D it starts working, but after that, if I examine a received email, there's no indication that any spam filter examined it. Is there any way I can test whether spamassassin is working with qmail? I didn't change anything in the cfg files.

Running ps aux |grep spamass I only get this line:

root /usr/sbin/spamd --username=popuser --daemonize --nouser-config --helper-home-dir=/var/qmail --max-children 5 --create-prefs --virtual-config-dir=/var/qmail/mailnames/%d/%l/.spamassassin --pidfile=/var/run/spamd/spamd_full.pid --socketpath=/tmp/spamd_full.sock

Is this enough for spamassassin to work with qmail? I have 4 domains; do I have to run a spamd for each?

Thanks for your time.

Regards
Centeno
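One quick way to look for "any indication" that a filter touched a message is to check the delivered mail for X-Spam-* headers, which SpamAssassin normally adds when it processes a message. The following is only a minimal sketch of that check: the sample message, the addresses, and the version string in it are made up for illustration, and a missing header does not by itself prove that spamd is not running.

```python
from email import message_from_string


def spam_headers(raw_message):
    """Return a dict of the X-Spam-* headers found in a raw message."""
    msg = message_from_string(raw_message)
    return {
        name: value
        for name, value in msg.items()
        if name.lower().startswith("x-spam-")
    }


# Hypothetical sample message; a real test would read a delivered mail file.
sample = (
    "From: sender@example.com\n"
    "To: user@example.com\n"
    "Subject: test\n"
    "X-Spam-Checker-Version: SpamAssassin 3.2.5 (illustrative value)\n"
    "X-Spam-Status: No, score=0.1 required=5.0\n"
    "\n"
    "Hello\n"
)

found = spam_headers(sample)
if found:
    print("scanned, headers:", sorted(found))
else:
    print("no X-Spam-* headers: the message was probably not scanned")
```

If a freshly delivered message shows no such headers, the likely conclusion is that the mail flow never hands messages to spamd, regardless of whether the daemon itself is up.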
Re-running SA on an mbox
Hi, I have an mbox with about 100 messages in it from a few days ago. The mbox is a combination of spam and ham. What is the best way to run SA through these messages again, so I can catch the ones that have URLs in them that weren't on the blacklist at the time they were received? Must I break them all apart to do this, or can SA somehow parse the whole mbox? If not, what program do you suggest I use to accomplish this? Thanks, Alex
Re: Re-running SA on an mbox
Hi,

> Do you just want to re-scan the whole mbox and see what rules hit now for research reasons?

That's a good start, but I'd like to see if I can break out the ham to train bayes.

> There's no way to (directly) get SA to modify email that's already in an mbox file. The mass-check and sa-learn tools can read them, but nothing in SA can write to that. However, there might be a utility out there to do this (although I'm not aware of any).

Yeah, that's kind of what I thought. Maybe a program that can split each message back into an individual file? Would procmail even help here? Or even a simple shell script that looks for '^From ', redirects it to a file, runs spamassassin -d on it, then re-runs SA on each file? I could then concatenate each of them back together and pass it through sa-learn.

Thanks, Alex
Re: Re-running SA on an mbox
MySQL Student wrote:
> Hi, I have an mbox with about 100 messages in it from a few days ago. The mbox is a combination of spam and ham. What is the best way to run SA through these messages again, so I can catch the ones that have URLs in them that weren't on the blacklist at the time they were received? Must I break them all apart to do this, or can SA somehow parse the whole mbox? If not, what program do you suggest I use to accomplish this?

Do you just want to re-scan the whole mbox and see what rules hit now for research reasons? You could probably abuse the mass-check tool for that purpose:

http://svn.apache.org/repos/asf/spamassassin/branches/3.2/masses/

It's normally used to generate logs we feed into the score generation process, but it can be run on a single mbox. The downside is that all it does is generate a report, one line per message, with a list of hits.

There's no way to (directly) get SA to modify email that's already in an mbox file. The mass-check and sa-learn tools can read them, but nothing in SA can write to that. However, there might be a utility out there to do this (although I'm not aware of any).
Re: Re-running SA on an mbox
You probably want spamassassin --mbox. :) It won't modify the messages in-place, but you can do something like spamassassin --mbox infile outfile. If you're talking about sa-learn, though, it also knows --mbox.

On Sun, Sep 20, 2009 at 9:46 PM, MySQL Student mysqlstud...@gmail.com wrote:
> Yeah, that's kind of what I thought. Maybe a program that can split each message back into an individual file? Would procmail even help here? Or even a simple shell script that looks for '^From ', redirects it to a file, runs spamassassin -d on it, then re-runs SA on each file? I could then concatenate each of them back together and pass it through sa-learn.
Re: Re-running SA on an mbox
MySQL Student wrote:
> Hi,
>> Do you just want to re-scan the whole mbox and see what rules hit now for research reasons?
> That's a good start, but I'd like to see if I can break out the ham to train bayes.
>> There's no way to (directly) get SA to modify email that's already in an mbox file. The mass-check and sa-learn tools can read them, but nothing in SA can write to that. However, there might be a utility out there to do this (although I'm not aware of any).
> Yeah, that's kind of what I thought. Maybe a program that can split each message back into an individual file? Would procmail even help here? Or even a simple shell script that looks for '^From ', redirects it to a file, runs spamassassin -d on it, then re-runs SA on each file? I could then concatenate each of them back together and pass it through sa-learn.

That sounds like a good plan. If you google around for mbox split or mbox splitter you can find some sample code out there that does it. It's all just simple code looking for the ^From boundary.
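As an illustration of the "mbox splitter" idea, here is a minimal sketch that uses Python's standard mailbox module, which does the '^From ' boundary detection itself. The function name split_mbox, the msg-NNNN.eml file naming, and the two-message sample mbox are all assumptions made for this example, not anything prescribed by SpamAssassin.

```python
import mailbox
import os
import tempfile


def split_mbox(mbox_path, out_dir):
    """Write every message of an mbox to its own file; return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    box = mailbox.mbox(mbox_path, create=False)
    paths = []
    try:
        for index, key in enumerate(box.keys()):
            path = os.path.join(out_dir, "msg-%04d.eml" % index)
            with open(path, "wb") as handle:
                # get_bytes() omits the "From " separator line by default.
                handle.write(box.get_bytes(key))
            paths.append(path)
    finally:
        box.close()
    return paths


# Demo on a throwaway two-message mbox (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.mbox")
    with open(src, "w", newline="\n") as handle:
        handle.write(
            "From alice@example.com Sun Sep 20 21:46:00 2009\n"
            "Subject: first\n\nbody one\n\n"
            "From bob@example.com Sun Sep 20 21:47:00 2009\n"
            "Subject: second\n\nbody two\n"
        )
    files = split_mbox(src, os.path.join(tmp, "out"))
    count = len(files)
    with open(files[0]) as handle:
        first = handle.read()

print(count, "messages written")
```

Each resulting file holds one message without its mbox separator line, so it can be handed to a scanner one at a time.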
Re: Re-running SA on an mbox
Hi,

> You probably want spamassassin --mbox. :) It won't modify the messages in-place, but you can do something like spamassassin --mbox infile outfile.

My apologies if it wasn't clear, but these messages have already been marked by SA. Some are ham, and the rest are FPs that I'd like to re-run through SA, in hopes of it now properly detecting them as spam.

Thank you all for your help. The mbox split suggestion is a good one. I'll follow that route and post my experience later.

Thanks again, Alex
Re: Re-running SA on an mbox
Hi,

> You probably want spamassassin --mbox. :) It won't modify the messages in-place, but you can do something like spamassassin --mbox infile outfile.
> My apologies if it wasn't clear, but these messages have already been

Wait, my mistake. I read that too fast. Does that work, and rewrite the X-Spam-Status header?

Guess I could find out for myself, but it just contradicts my experience and info I've learned previously.

Thanks again, Alex
Re: Re-running SA on an mbox
On Sep 20, 2009, at 20:45, MySQL Student mysqlstud...@gmail.com wrote:
> Thank you all for your help. The mbox split suggestion is a good one. I'll follow that route and post my experience later.

formail -s is the way to go.
Re: Re-running SA on an mbox
Theo Van Dinter wrote:
> You probably want spamassassin --mbox. :) It won't modify the messages in-place, but you can do something like spamassassin --mbox infile outfile. If you're talking about sa-learn, though, it also knows --mbox.

Yes, but he's got mixed spam and nonspam in one mbox. You've got to split that before you can feed sa-learn.

> On Sun, Sep 20, 2009 at 9:46 PM, MySQL Student mysqlstud...@gmail.com wrote:
>> Yeah, that's kind of what I thought. Maybe a program that can split each message back into an individual file? Would procmail even help here? Or even a simple shell script that looks for '^From ', redirects it to a file, runs spamassassin -d on it, then re-runs SA on each file? I could then concatenate each of them back together and pass it through sa-learn.
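To make the "split before sa-learn" step concrete, here is a rough sketch that sorts a mixed mbox into a ham mbox and a spam mbox with Python's standard mailbox module. It assumes, purely for illustration, that each message already carries a trustworthy verdict in the form of an "X-Spam-Flag: YES" header on spam (for example after a fresh re-scan and a manual review); the function name partition_mbox, the file names, and the sample messages are made up.

```python
import mailbox
import os
import tempfile


def partition_mbox(src_path, ham_path, spam_path):
    """Copy each message to ham_path or spam_path; return per-class counts."""
    src = mailbox.mbox(src_path, create=False)
    ham = mailbox.mbox(ham_path)
    spam = mailbox.mbox(spam_path)
    counts = {"ham": 0, "spam": 0}
    try:
        for message in src:
            # Assumption: "X-Spam-Flag: YES" marks spam, anything else is ham.
            is_spam = message.get("X-Spam-Flag", "").strip().upper() == "YES"
            (spam if is_spam else ham).add(message)
            counts["spam" if is_spam else "ham"] += 1
    finally:
        # close() flushes pending writes to disk.
        src.close()
        ham.close()
        spam.close()
    return counts


# Demo on a throwaway mixed mbox (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    mixed = os.path.join(tmp, "mixed.mbox")
    with open(mixed, "w", newline="\n") as handle:
        handle.write(
            "From a@example.com Sun Sep 20 21:46:00 2009\n"
            "Subject: newsletter\n\nhello\n\n"
            "From b@example.com Sun Sep 20 21:47:00 2009\n"
            "X-Spam-Flag: YES\nSubject: pills\n\nbuy now\n\n"
            "From c@example.com Sun Sep 20 21:48:00 2009\n"
            "Subject: invoice\n\nattached\n"
        )
    counts = partition_mbox(
        mixed, os.path.join(tmp, "ham.mbox"), os.path.join(tmp, "spam.mbox")
    )

print(counts)
```

The two output files could then be fed to sa-learn separately (sa-learn --ham --mbox for one, sa-learn --spam --mbox for the other). Training on verdicts that nobody has checked by hand risks teaching bayes the very mistakes being investigated.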
Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting
On 19/09/2009 3:33 PM, Warren Togami wrote:
> On 09/16/2009 11:47 AM, Warren Togami wrote:
>> On 09/04/2009 10:51 AM, Justin Mason wrote:
>>> OK, if you're planning to send us mass-check logs for the 3.3.0 rescoring, now's the time! http://wiki.apache.org/spamassassin/RescoreDetails has all the details. cheers! --j.
>>
>> -rw-r--r-- 174911850 2009/09/16 01:03:40 ham-bayes-net-hege.log
>> -rw-r--r-- 36909774 2009/09/11 20:39:47 ham-bayes-net-mmartinec.log
>> -rw-r--r-- 3179193 2009/09/14 23:16:15 ham-bayes-net-wt-en1.log
>> -rw-r--r-- 1591286 2009/09/14 23:24:19 ham-bayes-net-wt-en2.log
>> -rw-r--r-- 5687443 2009/09/14 23:53:41 ham-bayes-net-wt-en3.log
>> -rw-r--r-- 354 2009/09/14 23:56:00 ham-bayes-net-wt-en4.log
>> -rw-r--r-- 575780 2009/09/14 22:13:01 ham-bayes-net-wt-jp1.log
>> -rw-r--r-- 2139873 2009/09/14 22:23:07 ham-bayes-net-wt-jp2.log
>> -rw-r--r-- 40760753 2009/09/16 01:04:24 spam-bayes-net-hege.log
>> -rw-r--r-- 35666309 2009/09/11 20:52:01 spam-bayes-net-mmartinec.log
>> -rw-r--r-- 4341537 2009/09/14 23:16:16 spam-bayes-net-wt-en1.log
>> -rw-r--r-- 1576 2009/09/14 23:24:20 spam-bayes-net-wt-en2.log
>> -rw-r--r-- 310 2009/09/14 23:53:42 spam-bayes-net-wt-en3.log
>> -rw-r--r-- 494742 2009/09/14 23:56:00 spam-bayes-net-wt-en4.log
>> -rw-r--r-- 79101 2009/09/14 22:13:02 spam-bayes-net-wt-jp1.log
>> -rw-r--r-- 311 2009/09/14 22:23:08 spam-bayes-net-wt-jp2.log
>>
>> One day from the deadline for spamassassin-3.3.0 scoring and we currently have only three people reporting.
> The deadline has been extended until Monday, September 21st. But at this moment the number of logs reporting for the rescore masscheck has not changed. Are the uploaded corpora being processed?

They'll all be processed together when it's declared that the time to submit has expired.

> Who else is still working on their own corpus?

Due to memory leaks in haldaemon on my machines (unrelated to SA), and me not noticing and instead fighting with Perl to build modules, I'm just starting my mass-check now. I imagine that it will be sometime Tuesday after work before I have results submitted.

Daryl
Re: Re-running SA on an mbox
On Mon 21 Sep 2009 04:47:23 CEST, MySQL Student wrote:
> Wait, my mistake. I read that too fast. Does that work, and rewrite the X-Spam-Status header?

IMHO spamassassin always removes its own known headers, and adds its own just once, so yes, the trick is to retest; then you will see whether it is still listed in an RBL :)

But this will invalidate DKIM signatures if these headers are signed. Is spamassassin aware of this problem (in general)?

> Guess I could find out for myself, but it just contradicts my experience and info I've learned previously.

mutt -f mbox, then in mutt save the message to another folder if it is misclassified.

--
xpoint
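For readers who want to see what "removing its own headers" amounts to before a retest, here is a rough approximation using Python's standard email module. It is not SpamAssassin's own code and does not claim to match what spamassassin -d does in detail (it ignores, for instance, any subject rewriting or report wrapping); the function name strip_spam_headers and the sample message are made up for illustration.

```python
from email import message_from_string


def strip_spam_headers(raw_message):
    """Return the message text with all X-Spam-* headers removed."""
    msg = message_from_string(raw_message)
    names = {name for name in msg.keys() if name.lower().startswith("x-spam-")}
    for name in names:
        # Deleting by name removes every occurrence of that header.
        del msg[name]
    return msg.as_string()


# Hypothetical previously scanned message.
scanned = (
    "From: sender@example.com\n"
    "Subject: hello\n"
    "X-Spam-Flag: YES\n"
    "X-Spam-Status: Yes, score=6.2 required=5.0\n"
    "\n"
    "body text\n"
)

cleaned = strip_spam_headers(scanned)
print(cleaned)
```

On the DKIM question: removing or rewriting a header can only break a signature if that header is among the signed ones, so whether a rescan invalidates anything depends on what the sender chose to sign.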