Re: bug 4234 - MIME_HTML_ONLY + MPART_ALT_DIFF both firing on html-only email

2009-09-20 Thread Per Jessen
Karsten Bräckelmann wrote:

 On Sat, 2009-09-19 at 10:22 +0200, Per Jessen wrote:
 Karsten Bräckelmann wrote:
  So, yes, the scores are higher again -- and have been for over 2
  years
  now. ;)  However, I wouldn't say creeping back in, cause it never
  has been manually fixed or adjusted, but always generated.
 
 Mea culpa, I didn't concern myself with _how_ the scores had been
 changed back.  Nonetheless, perfectly legitimate mails are now given
 2.8 (1.1+1.7) points purely for consisting of only an HTML part. 
 Seems a bit excessive.
 
 That's still ham.  Did you see a FP due to these rules plus others
 (which ones?), or are you merely concerned about the cumulative score for these
 on their own?

I noticed an FP on a mail from networksolutions, and judging from my
logs, on 18 Sep I had 9 emails that scored just above 5, including
these two rules. 
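
For reference, the arithmetic behind the complaint can be sketched as below. This is only an illustration: the 5.0 threshold is an assumption taken from the "scored just above 5" remark, and the thread does not say which of the two rules carries which score, so they are kept as a plain list.

```python
from decimal import Decimal

# Scores quoted in this thread for the two rules that fire together
# (MIME_HTML_ONLY and MPART_ALT_DIFF). Which rule has which score is
# not stated in the thread, so no mapping is attempted here.
RULE_SCORES = [Decimal("1.1"), Decimal("1.7")]

# Assumption: the spam threshold is 5.0, as implied by "just above 5".
THRESHOLD = Decimal("5.0")


def headroom(scores, threshold=THRESHOLD):
    """Return (combined score, points left before the threshold is reached)."""
    total = sum(scores, Decimal("0"))
    return total, threshold - total


total, remaining = headroom(RULE_SCORES)
print(f"combined: {total}, remaining before threshold: {remaining}")
```

In other words, an HTML-only message hitting both rules needs only a couple more points from other rules to be tagged.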

 Either way, it'd be interesting to see what the next GA run returns
 for them. Could you keep an eye on this and get back later with the
 scores for 3.3?

I'll try. 


/Per Jessen, Zürich



Re: bug 4234 - MIME_HTML_ONLY + MPART_ALT_DIFF both firing on html-only email

2009-09-20 Thread Per Jessen
Per Jessen wrote:

 That's still ham.  Did you see a FP due to these rules plus others
 (which ones?), or are you merely concerned about the cumulative score for these
 on their own?
 
 I noticed an FP on a mail from networksolutions, and judging from my
 logs, on 18 Sep I had 9 emails that scored just above 5, including
 these two rules.

Of which at least three were FPs (from a Vietnamese bank). 


/Per Jessen, Zürich



spamassassin and plesk

2009-09-20 Thread Luis Centeno
Hi,
I've searched the FAQ but found no help, at least none that I understand.
I have an Ubuntu 8.04.1 server running Plesk.
When I first rented the server, I forgot to buy the SpamAssassin key for
Plesk. Now they tell me I have to change to a different server, paying much
more, to use SpamAssassin.
I've installed SpamAssassin by running apt-get install spamassassin, but now
I can't make sure it is working correctly.
If I run spamd -D it starts working, but after that, if I examine a received
email, there's no indication that any spam filter has examined it.
Is there any way I can test whether SpamAssassin is working with qmail?
I didn't change anything in the config files.

Running ps aux | grep spamass
I get only this line:
root /usr/sbin/spamd --username=popuser --daemonize --nouser-config
--helper-home-dir=/var/qmail --max-children 5 --create-prefs
--virtual-config-dir=/var/qmail/mailnames/%d/%l/.spamassassin
--pidfile=/var/run/spamd/spamd_full.pid --socketpath=/tmp/spamd_full.sock
Is this enough for SpamAssassin to work with qmail?
I have 4 domains; do I have to run a spamd for each?
Thanks for your time.
Regards
Centeno
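
One way to look for the "indication" mentioned above is to check a delivered message for X-Spam-* headers. The following is a minimal sketch only: `spam_headers` is a hypothetical helper, and it assumes the scanner tags mail with the usual X-Spam-* header names, which a Plesk/qmail setup may or may not do.

```python
from email import message_from_string


def spam_headers(raw_message):
    """Return all (name, value) pairs for X-Spam-* headers in a raw message."""
    msg = message_from_string(raw_message)
    return [(name, value) for name, value in msg.items()
            if name.lower().startswith("x-spam-")]


# Illustrative message only; the header value is made up for the example.
sample = (
    "From: someone@example.com\n"
    "Subject: test\n"
    "X-Spam-Status: No, score=0.1 required=5.0\n"
    "\n"
    "body\n"
)

for name, value in spam_headers(sample):
    print(f"{name}: {value}")
```

An empty result for a freshly delivered message would suggest that the mail never passed through the filter, although it does not say why.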


Re-running SA on an mbox

2009-09-20 Thread MySQL Student
Hi,

I have an mbox with about 100 messages in it from a few days ago.
The mbox is a combination of spam and ham. What is the best way to run
SA through these messages again, so I can catch the ones that have
URLs in them that weren't on the blacklist at the time they were
received?

Must I break them all apart to do this, or can SA somehow parse the
whole mbox? If not, what program do you suggest I use to accomplish
this?

Thanks,
Alex


Re: Re-running SA on an mbox

2009-09-20 Thread MySQL Student
Hi,

 Do you just want to re-scan the whole mbox and see what rules hit now
 for research reasons?

That's a good start, but I'd like to see if I can break out the ham to
train bayes.

 There's no way to (directly) get SA to modify email that's already in an
 mbox file. The mass-check and sa-learn tools can read them, but nothing
 in SA can write to that. However, there might be a utility out there to
 do this (although I'm not aware of any).

Yeah, that's kind of what I thought. Maybe a program that can split
each message back into an individual file? Would procmail even help
here? Or even a simple shell script that looks for '^From ', redirects
it to a file, runs spamassassin -d on it, then re-runs SA on each
file? I could then concatenate each of them back together and pass it
through sa-learn.

Thanks,
Alex


Re: Re-running SA on an mbox

2009-09-20 Thread Matt Kettler
MySQL Student wrote:
 Hi,

 I have an mbox with about 100 messages in it from a few days ago.
 The mbox is a combination of spam and ham. What is the best way to run
 SA through these messages again, so I can catch the ones that have
 URLs in them that weren't on the blacklist at the time they were
 received?

 Must I break them all apart to do this, or can SA somehow parse the
 whole mbox? If not, what program do you suggest I use to accomplish
 this?
   
Do you just want to re-scan the whole mbox and see what rules hit now
for research reasons?

You could probably abuse the mass-check tool for that purpose:

http://svn.apache.org/repos/asf/spamassassin/branches/3.2/masses/

It's normally used to generate logs we feed into the score generation
process, but it can be run on a single mbox.

The downside is that all it does is generate a report, one line per message,
with a list of hits.

There's no way to (directly) get SA to modify email that's already in an
mbox file. The mass-check and sa-learn tools can read them, but nothing
in SA can write to that. However, there might be a utility out there to
do this (although I'm not aware of any).




Re: Re-running SA on an mbox

2009-09-20 Thread Theo Van Dinter
You probably want spamassassin --mbox. :)
It won't modify the messages in-place, but you can do something like
spamassassin --mbox < infile > outfile.

If you're talking about sa-learn, though, it also knows --mbox.
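
As a rough illustration, the same idea (mbox in on standard input, re-scanned copy out on standard output) can be driven from a script. This is a sketch under assumptions: `rescan_mbox` is a hypothetical helper name, and the only SpamAssassin-specific part is the `spamassassin --mbox` command line mentioned above.

```python
import subprocess


def rescan_mbox(infile, outfile, command=("spamassassin", "--mbox")):
    """Pipe infile through `command` and write the result to outfile.

    Equivalent to the shell form: command < infile > outfile
    The original mbox is left untouched.
    """
    with open(infile, "rb") as src, open(outfile, "wb") as dst:
        # check=True raises CalledProcessError if the command fails,
        # so a broken run does not silently leave a partial outfile.
        subprocess.run(list(command), stdin=src, stdout=dst, check=True)


# Example call (file names are placeholders):
# rescan_mbox("infile", "outfile")
```

The `command` parameter exists only so the helper can be tried with a stand-in program on a machine without SpamAssassin installed.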


On Sun, Sep 20, 2009 at 9:46 PM, MySQL Student mysqlstud...@gmail.com wrote:
 Yeah, that's kind of what I thought. Maybe a program that can split
 each message back into an individual file? Would procmail even help
 here? Or even a simple shell script that looks for '^From ', redirects
 it to a file, runs spamassassin -d on it, then re-runs SA on each
 file? I could then concatenate each of them back together and pass it
 through sa-learn.


Re: Re-running SA on an mbox

2009-09-20 Thread Matt Kettler
MySQL Student wrote:
 Hi,

   
 Do you just want to re-scan the whole mbox and see what rules hit now
 for research reasons?
 

 That's a good start, but I'd like to see if I can break out the ham to
 train bayes.

   
 There's no way to (directly) get SA to modify email that's already in an
 mbox file. The mass-check and sa-learn tools can read them, but nothing
 in SA can write to that. However, there might be a utility out there to
 do this (although I'm not aware of any).
 

 Yeah, that's kind of what I thought. Maybe a program that can split
 each message back into an individual file? Would procmail even help
 here? Or even a simple shell script that looks for '^From ', redirects
 it to a file, runs spamassassin -d on it, then re-runs SA on each
 file? I could then concatenate each of them back together and pass it
 through sa-learn.
   

That sounds like a good plan.

If you google around for "mbox split" or "mbox splitter" you can find
some sample code out there that does it. It's all just simple code
looking for the "^From " boundary.
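
A minimal sketch of such a splitter, for illustration only. Instead of a hand-written search for the "From " boundary, it swaps in Python's standard `mailbox` module, which does the boundary detection itself; `split_mbox` and the `msg-NNNN.eml` file naming are hypothetical choices, not anything prescribed by SA.

```python
import mailbox
import os


def split_mbox(mbox_path, out_dir):
    """Write each message of an mbox file to its own file in out_dir.

    Returns the list of paths written, in mbox order.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    # create=False: fail instead of creating an empty mbox on a typo.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for index, key in enumerate(box.iterkeys()):
            path = os.path.join(out_dir, f"msg-{index:04d}.eml")
            with open(path, "wb") as out:
                # get_bytes() returns the message without its
                # "From " separator line.
                out.write(box.get_bytes(key))
            paths.append(path)
    finally:
        box.close()
    return paths
```

Each resulting file can then be handled on its own, for example stripped of old markup and scanned again.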



Re: Re-running SA on an mbox

2009-09-20 Thread MySQL Student
Hi,

 You probably want spamassassin --mbox. :)
 It won't modify the messages in-place, but you can do something like
 spamassassin --mbox < infile > outfile.

My apologies if it wasn't clear, but these messages have already been
marked by SA. Some are ham, and the rest are false negatives (missed spam)
that I'd like to re-run through SA, in hopes of it now properly detecting
them as spam.

Thank you all for your help. The mbox split suggestion is a good
one. I'll follow that route and post my experience later.

Thanks again,
Alex


Re: Re-running SA on an mbox

2009-09-20 Thread MySQL Student
Hi,

 You probably want spamassassin --mbox. :)
 It won't modify the messages in-place, but you can do something like
 spamassassin --mbox < infile > outfile.

 My apologies if it wasn't clear, but these messages have already been

Wait, my mistake. I read that too fast. Does that work, and rewrite
the X-Spam-Status header?

Guess I could find out for myself, but it just contradicts my
experience and info I've learned previously.

Thanks again,
Alex


Re: Re-running SA on an mbox

2009-09-20 Thread LuKreme

On Sep 20, 2009, at 20:45, MySQL Student mysqlstud...@gmail.com wrote:

Thank you all for your help. The mbox split suggestion is a good
one. I'll follow that route and post my experience later.


formail -s is the way to go.



Re: Re-running SA on an mbox

2009-09-20 Thread Matt Kettler
Theo Van Dinter wrote:
 You probably want spamassassin --mbox. :)
 It won't modify the messages in-place, but you can do something like
 spamassassin --mbox < infile > outfile.

 If you're talking about sa-learn, though, it also knows --mbox.
   
Yes, but he's got mixed spam and nonspam in one mbox. You've got to
split that before you can feed sa-learn.
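
A rough sketch of one way to do that split, assuming the mbox has just been re-scanned and that SA's own verdict headers (X-Spam-Flag, X-Spam-Status) are trusted as the sorting key. That is a big assumption for Bayes training, so a manual look at both output files is still advisable. `partition_mbox` is a hypothetical helper built on Python's standard `mailbox` module.

```python
import mailbox


def partition_mbox(src_path, ham_path, spam_path):
    """Copy messages from src_path into ham_path or spam_path.

    A message counts as spam when its X-Spam-Flag header is "YES" or its
    X-Spam-Status header starts with "Yes"; everything else goes to ham.
    Returns (ham_count, spam_count).
    """
    src = mailbox.mbox(src_path, create=False)
    ham = mailbox.mbox(ham_path)    # created if it does not exist
    spam = mailbox.mbox(spam_path)  # created if it does not exist
    ham_count = spam_count = 0
    try:
        for msg in src:
            flag = str(msg.get("X-Spam-Flag", "")).strip().upper()
            status = str(msg.get("X-Spam-Status", "")).strip().lower()
            if flag == "YES" or status.startswith("yes"):
                spam.add(msg)
                spam_count += 1
            else:
                ham.add(msg)
                ham_count += 1
        ham.flush()
        spam.flush()
    finally:
        src.close()
        ham.close()
        spam.close()
    return ham_count, spam_count
```

The two output files could then be fed separately to sa-learn with --mbox, one with --ham and one with --spam.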


 On Sun, Sep 20, 2009 at 9:46 PM, MySQL Student mysqlstud...@gmail.com wrote:
   
 Yeah, that's kind of what I thought. Maybe a program that can split
 each message back into an individual file? Would procmail even help
 here? Or even a simple shell script that looks for '^From ', redirects
 it to a file, runs spamassassin -d on it, then re-runs SA on each
 file? I could then concatenate each of them back together and pass it
 through sa-learn.
 


   



Re: NOTICE: SpamAssassin 3.3.0 mass-checks now starting

2009-09-20 Thread Daryl C. W. O'Shea
On 19/09/2009 3:33 PM, Warren Togami wrote:
 On 09/16/2009 11:47 AM, Warren Togami wrote:
 On 09/04/2009 10:51 AM, Justin Mason wrote:
 OK, if you're planning to send us mass-check logs for the
 3.3.0 rescoring, now's the time!

 http://wiki.apache.org/spamassassin/RescoreDetails has all the details.

 cheers!

 --j.

 -rw-r--r-- 174911850 2009/09/16 01:03:40 ham-bayes-net-hege.log
 -rw-r--r-- 36909774 2009/09/11 20:39:47 ham-bayes-net-mmartinec.log
 -rw-r--r-- 3179193 2009/09/14 23:16:15 ham-bayes-net-wt-en1.log
 -rw-r--r-- 1591286 2009/09/14 23:24:19 ham-bayes-net-wt-en2.log
 -rw-r--r-- 5687443 2009/09/14 23:53:41 ham-bayes-net-wt-en3.log
 -rw-r--r-- 354 2009/09/14 23:56:00 ham-bayes-net-wt-en4.log
 -rw-r--r-- 575780 2009/09/14 22:13:01 ham-bayes-net-wt-jp1.log
 -rw-r--r-- 2139873 2009/09/14 22:23:07 ham-bayes-net-wt-jp2.log
 -rw-r--r-- 40760753 2009/09/16 01:04:24 spam-bayes-net-hege.log
 -rw-r--r-- 35666309 2009/09/11 20:52:01 spam-bayes-net-mmartinec.log
 -rw-r--r-- 4341537 2009/09/14 23:16:16 spam-bayes-net-wt-en1.log
 -rw-r--r-- 1576 2009/09/14 23:24:20 spam-bayes-net-wt-en2.log
 -rw-r--r-- 310 2009/09/14 23:53:42 spam-bayes-net-wt-en3.log
 -rw-r--r-- 494742 2009/09/14 23:56:00 spam-bayes-net-wt-en4.log
 -rw-r--r-- 79101 2009/09/14 22:13:02 spam-bayes-net-wt-jp1.log
 -rw-r--r-- 311 2009/09/14 22:23:08 spam-bayes-net-wt-jp2.log

 One day from the deadline for spamassassin-3.3.0 scoring and we
 currently have only three people reporting.
 
 The deadline has been extended until Monday, September 21st.  But at
 this moment the number of logs reporting for the rescore masscheck has
 not changed.
 
 Are the uploaded corpora being processed?

They'll all be processed together when it's declared that the time to submit
has expired.

 Who else is still working on their own corpus?

Due to memory leaks in haldaemon on my machines (unrelated to SA), and me
not noticing and instead fighting with Perl to build modules, I'm just
starting my mass-check now.

I imagine that it will be sometime Tuesday after work before I have
results submitted.

Daryl



Re: Re-running SA on an mbox

2009-09-20 Thread Benny Pedersen

On Mon 21 Sep 2009 04:47:23 CEST, MySQL Student wrote:


Wait, my mistake. I read that too fast. Does that work, and rewrite
the X-Spam-Status header?


IMHO spamassassin always removes its own known headers, so they are only
ever added once. So yes, the trick is to retest; you will then see whether
it is still listed in an RBL :)


But this will invalidate DKIM signatures if these headers are signed. Is
spamassassin aware of this problem (in general)?



Guess I could find out for myself, but it just contradicts my
experience and info I've learned previously.


mutt -f mbox

In mutt, save a message to another folder if it is misclassified.

--
xpoint