Forwarded spam

2008-07-31 Chris Lear
I'm trying to improve the effectiveness of a spamassassin installation, 
and there's one user who gets a lot of spam that is forwarded from 
another address, which effectively kills the network tests and in some 
cases messes with the BAYES score as well. I want to get rid of it.


My solution to the problem was originally to add the forwarding mtas to 
trusted_networks (seems ironic, but I think this is appropriate).


Unfortunately, this doesn't work, because the headers look like this 
(with apologies for the munging, but it's not my e-mail):


Received: from mta3.iomartmail.com ([62.128.193.153])
	by smtp.DOMAIN.com with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.69)
	(envelope-from [EMAIL PROTECTED])
	id 1KOUZB-0001Xq-Eb
	for [EMAIL PROTECTED]; Thu, 31 Jul 2008 10:35:29 +0100
Received: from mta3.iomartmail.com (localhost.localdomain [127.0.0.1])
	by mta3.iomartmail.com (8.12.11.20060308/8.12.11) with ESMTP id m6V9ZOVc018574
	for [EMAIL PROTECTED]; Thu, 31 Jul 2008 10:35:24 +0100
Received: from p548AAE80.dip0.t-ipconnect.de
	(p548AB09B.dip0.t-ipconnect.de [84.138.176.155])
	by mta3.iomartmail.com (8.12.11.20060308/8.12.11) with SMTP id m6V9ZNUK018506
	for [EMAIL PROTECTED]; Thu, 31 Jul 2008 10:35:24 +0100

[EMAIL PROTECTED] is the original address, which is handled by 
mta[X].iomartmail.com, and it's forwarded to [EMAIL PROTECTED], which is 
handled by smtp.DOMAIN.com.


I can put 62.128.193.153 into trusted_networks, which should make 
spamassassin look at the next header back, but that's another 
iomartmail.com machine (presumably a virus/spam checker), and I'm fairly 
sure adding 127.0.0.1 to trusted_networks would be a mistake.


Question one: Is there a way of getting the network tests working on 
these forwarded e-mails?



My next idea is just to add a load of score to messages to 
ORIGINALDOMAIN.com. Looking in the wiki at 
http://wiki.apache.org/spamassassin/WritingRules#head-36104467608e64f77e1878ec3201073b8180c728 
I see this:


===
Checking the From: line, or any other header, works much the same:

header LOCAL_DEMONSTRATION_FROM From =~ /test\.com/i
score LOCAL_DEMONSTRATION_FROM  0.1

Now, that rule is pretty silly, as it doesn't do much that a 
blacklist_from can't.

===

What I want to do is blacklist_to [EMAIL PROTECTED], but with a 
score of 3 (ie, it's not really a blacklisting). The quote above seems 
to suggest I can do that, but I can't see it in the docs. Question two: 
is it possible to set a score on a blacklisted address?


Finally, I can use header ToCC, and that'll probably do, but I wanted to 
know if there's a better way.


Thanks,
Chris


Re: Forwarded spam

2008-07-31 Chris Lear

* Matt Kettler wrote (31/07/08 11:25):

Chris Lear wrote:
I'm trying to improve the effectiveness of a spamassassin 
installation, and there's one user who gets a lot of spam that is 
forwarded from another address, which effectively kills the network 
tests and in some cases messes with the BAYES score as well. I want to 
get rid of it.


My solution to the problem was originally to add the forwarding mtas 
to trusted_networks (seems ironic, but I think this is appropriate).


Unfortunately, this doesn't work, because the headers look like this 
(with apologies for the munging, but it's not my e-mail):


Received: from mta3.iomartmail.com ([62.128.193.153])
	by smtp.DOMAIN.com with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.69)
	(envelope-from [EMAIL PROTECTED])
	id 1KOUZB-0001Xq-Eb
	for [EMAIL PROTECTED]; Thu, 31 Jul 2008 10:35:29 +0100
Received: from mta3.iomartmail.com (localhost.localdomain [127.0.0.1])
	by mta3.iomartmail.com (8.12.11.20060308/8.12.11) with ESMTP id m6V9ZOVc018574
	for [EMAIL PROTECTED]; Thu, 31 Jul 2008 10:35:24 +0100
Received: from p548AAE80.dip0.t-ipconnect.de
	(p548AB09B.dip0.t-ipconnect.de [84.138.176.155])
	by mta3.iomartmail.com (8.12.11.20060308/8.12.11) with SMTP id m6V9ZNUK018506
	for [EMAIL PROTECTED]; Thu, 31 Jul 2008 10:35:24 +0100

[EMAIL PROTECTED] is the original address, which is handled by 
mta[X].iomartmail.com, and it's forwarded to [EMAIL PROTECTED], which is 
handled by smtp.DOMAIN.com.


I can put 62.128.193.153 into trusted_networks, which should make 
spamassassin look at the next header back, but that's another 
iomartmail.com machine (presumably a virus/spam checker), and I'm 
fairly sure adding 127.0.0.1 to trusted_networks would be a mistake.
Why would adding 127.0.0.1 to trusted_networks be a mistake? Since trust 
is a path, this won't lead to spammers being able to forge trust, as 
they'd have to first get to your system from a trusted IP address (or 
manage to do a TCP blind-spoofing attack and make it look like it came 
from one).


OK, you've persuaded me. It seemed fishy, but I wasn't being logical. 
I'll do that and keep an eye on it. Don't worry - I'm not going to 
obsess about TCP spoofing.
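A minimal sketch of the change agreed above, assuming SpamAssassin 3.x local.cf syntax. 192.0.2.0/24 is a hypothetical placeholder for your own network (once trusted_networks is set explicitly, your own relays need listing too); the other two addresses are taken from the Received headers quoted above.

```
# local.cf (sketch). trusted_networks lines are additive.
# Hypothetical placeholder: replace with your own MX/internal network.
trusted_networks 192.0.2.0/24
# The forwarder's outbound relay, mta3.iomartmail.com.
trusted_networks 62.128.193.153
# The forwarder's loopback hop (its filter re-injecting the message).
trusted_networks 127.0.0.1
```

With this in place, running spamassassin -D on one of the forwarded messages should show which relay ends up treated as the first untrusted one (84.138.176.155 if the trust path works as intended).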




Question one: Is there a way of getting the network tests working on 
these forwarded e-mails?



My next idea is just to add a load of score to messages to 
ORIGINALDOMAIN.com. Looking in the wiki at 
http://wiki.apache.org/spamassassin/WritingRules#head-36104467608e64f77e1878ec3201073b8180c728 
I see this:


===
Checking the From: line, or any other header, works much the same:

header LOCAL_DEMONSTRATION_FROM From =~ /test\.com/i
score LOCAL_DEMONSTRATION_FROM  0.1

Now, that rule is pretty silly, as it doesn't do much that a 
blacklist_from can't.

===

What I want to do is blacklist_to [EMAIL PROTECTED], but with a 
score of 3 (ie, it's not really a blacklisting). The quote above seems 
to suggest I can do that, but I can't see it in the docs. Question 
two: is it possible to set a score on a blacklisted address?

No, unless you reset the score for all blacklist_to's
 score USER_IN_BLACKLIST_TO 3.0

When I said it doesn't do much that a blacklist_from can't, I didn't 
mean to say there's nothing it can do that a blacklist_from/to can't; 
there's just not much. Custom per-address scoring, using a full regex 
instead of a file-glob, and per-address combinations with other rules in 
a meta are things blacklist_from/to can't do that a rule can.
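To illustrate the three things listed above, here is a sketch of a rule-based alternative to blacklist_to. The rule names are hypothetical, ORIGINALDOMAIN.com is the placeholder domain from the original question, and pairing the recipient test with BAYES_50 is only an example of a meta combination. ToCc is SpamAssassin's combined To:/Cc: pseudo-header (written ToCC in the question).

```
# Hypothetical local rules; names and scores are illustrative only.
# Full regex on the combined To:/Cc: pseudo-header.
header   __LOCAL_TO_ORIGDOMAIN     ToCc =~ /\@ORIGINALDOMAIN\.com\b/i

# Per-address scoring, independent of the global USER_IN_BLACKLIST_TO score.
meta     LOCAL_TO_ORIGDOMAIN       __LOCAL_TO_ORIGDOMAIN
describe LOCAL_TO_ORIGDOMAIN       Sent to the forwarded ORIGINALDOMAIN.com address
score    LOCAL_TO_ORIGDOMAIN       3.0

# Combination with another rule: an extra point only when Bayes is unsure.
meta     LOCAL_ORIGDOMAIN_BAYES50  (__LOCAL_TO_ORIGDOMAIN && BAYES_50)
describe LOCAL_ORIGDOMAIN_BAYES50  Forwarded ORIGINALDOMAIN.com mail, neutral Bayes
score    LOCAL_ORIGDOMAIN_BAYES50  1.0
```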




Thanks. That all makes sense. I was reading too much into the remark. As 
a side note, in my perusal of the documentation, I didn't stumble easily 
on the link between the blacklist_to option and the USER_IN_BLACKLIST_TO 
rule.




Finally, I can use header ToCC, and that'll probably do, but I wanted 
to know if there's a better way.
That's the best way I know of. Also, be aware that unless your MTA drops 
hints about the recipient in the Received: headers with a for clause, 
SA won't know who the real recipient is when a message is BCC'ed. This 
is important, as lots of spam is effectively BCC'ed (i.e.: actual 
recipient is in the envelope, but not the To: or Cc:), so your ToCC may 
not match spam.


Understood. That's part of the reason I didn't take to this solution 
originally. I assumed that the blacklist_to option would fetch the real 
recipient out of the received headers (which, as you can see above, do 
contain the for clause).
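Since these particular Received headers do carry a for clause, a rule can look for the original address there instead of in To:/Cc:. A sketch only: the rule name is hypothetical, ORIGINALDOMAIN.com is the placeholder domain, and the optional angle bracket allows for both "for user@host" and "for <user@host>" styles, since the munged headers above don't show which is used.

```
# Hypothetical rule. A header test on Received sees all Received: headers
# together, so a for clause on any hop can match.
header   LOCAL_RCVD_FOR_ORIGDOMAIN  Received =~ /\bfor\s+<?[^\s<>\@]+\@ORIGINALDOMAIN\.com\b/i
describe LOCAL_RCVD_FOR_ORIGDOMAIN  A Received: for clause names an ORIGINALDOMAIN.com address
score    LOCAL_RCVD_FOR_ORIGDOMAIN  3.0
```

Received headers below the trust boundary can be forged by the sender, so this is a heuristic rather than proof of the envelope recipient.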


Thanks for the help.

Chris


Re: Forwarded spam

2008-07-31 Chris Lear

* Matus UHLAR - fantomas wrote (31/07/08 14:07):

On 31.07.08 11:05, Chris Lear wrote:
I'm trying to improve the effectiveness of a spamassassin installation, 
and there's one user who gets a lot of spam that is forwarded from 
another address, which effectively kills the network tests and in some 
cases messes with the BAYES score as well. I want to get rid of it.


many tests (e.g. those that check for dynamic IPs) use the last external IP, which
means some network checks will still be killed by such a forwarder.
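Whether those last-external checks see the forwarder or the original client depends on internal_networks as well as trusted_networks. A sketch, assuming SpamAssassin 3.x: every host in internal_networks must also be in trusted_networks, and if internal_networks is left unset it takes the value of trusted_networks. 192.0.2.0/24 is a hypothetical placeholder for your own hosts.

```
# Sketch only; 192.0.2.0/24 is a hypothetical placeholder.
# Without this line, internal_networks defaults to trusted_networks, so a
# trusted forwarder also counts as internal and "last external" tests (such
# as the DUL lists) look past it to the original client, 84.138.176.155.
# With it, the forwarder stays trusted but external, and those tests see
# the forwarder's relay, 62.128.193.153, instead.
internal_networks 192.0.2.0/24
```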


I seem to remember someone saying a while ago that it's not clear to the 
average spamassassin admin (eg me) which rules use trusted and which use 
external. Is there either a place that explains it all - or is there 
some logic that anyone can tell me? Not crucial, but I'm interested.




I think it's the forwarder who has to take care of spam... any further
forwarding blurs the difference between ham and spam...


I agree entirely.

Chris


Re: PDF rule not matching -- split line content type?

2007-08-16 Chris Lear

* Jo Rhett wrote (16/08/07 07:41):

Since nobody is paying attention


Or they're asleep. Your messages were at 23:44 and 07:41 here.

, let me clarify.  The current rule is 
wrong:


mimeheader __TVD_MIME_ATT_AP Content-Type =~ /^application\/pdf/i
mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ /^application\/octet-stream.*\.pdf/i


meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && __TVD_MIME_ATT && !__TVD_BODY


This evaluates to exactly the same as this:

meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && !__TVD_BODY

I believe that the original rule's intent was this:

meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY


I don't think you're right.

The rule looks like this to me:

meta TVD_PDF_FINGER01
       __TVD_MIME_CT_MM          # content-type is multi-part mixed
    && __TVD_MIME_ATT_TP         # and has a text-plain part
    && __TVD_MIME_ATT            # and has an attachment that is either
           __TVD_MIME_ATT_AP     #   application/pdf
           __TVD_MIME_ATT_AOPDF  #   or application/octet-stream.*.pdf
    && !__TVD_BODY               # and has no non-whitespace text content

Your rule would seem to match anything with no non-whitespace text 
content regardless of whether or not a pdf was attached.
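For reference, the logic read this way can be written out as valid one-line rules (SpamAssassin config lines don't continue across line breaks). The __TVD_MIME_ATT definition follows the description of it as an OR of the two attachment tests; __TVD_MIME_CT_MM, __TVD_MIME_ATT_TP and __TVD_BODY are defined elsewhere in the TVD ruleset and not reproduced here, so treat this as a sketch rather than the published rule.

```
mimeheader __TVD_MIME_ATT_AP     Content-Type =~ /^application\/pdf/i
mimeheader __TVD_MIME_ATT_AOPDF  Content-Type =~ /^application\/octet-stream.*\.pdf/i
# An attachment that is either a PDF or an octet-stream with a .pdf name.
meta       __TVD_MIME_ATT        (__TVD_MIME_ATT_AP || __TVD_MIME_ATT_AOPDF)
# multipart/mixed AND a text/plain part AND such an attachment AND no body text.
meta       TVD_PDF_FINGER01      (__TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && __TVD_MIME_ATT && !__TVD_BODY)
```

mimeheader tests look at the headers of each MIME part, so __TVD_MIME_ATT_TP (from the text/plain part) and __TVD_MIME_ATT (from the PDF part) can both be true of the same multipart message; requiring both doesn't exclude the octet-stream case.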


I was looking into this very rule about 3 days ago, because of false 
positives (client mailing out auto-generated pdfs which are being 
rejected by messagelabs), and I found that spamassassin -D told me all I 
needed to know about why some e-mail hit this rule and some didn't.


Chris


Re: PDF rule not matching -- split line content type?

2007-08-16 Chris Lear

Jo Rhett wrote:

Chris Lear wrote:

* Jo Rhett wrote (16/08/07 07:41):

Since nobody is paying attention


Or they're asleep. Your messages were at 23:44 and 07:41 here.


, let me clarify.  The current rule is wrong:

mimeheader __TVD_MIME_ATT_AP Content-Type =~ /^application\/pdf/i
mimeheader __TVD_MIME_ATT_AOPDF Content-Type =~ /^application\/octet-stream.*\.pdf/i


meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && __TVD_MIME_ATT && !__TVD_BODY


This evaluates to exactly the same as this:

meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT_TP && !__TVD_BODY


I believe that the original rule's intent was this:

meta TVD_PDF_FINGER01 __TVD_MIME_CT_MM && __TVD_MIME_ATT && !__TVD_BODY


I don't think you're right.

The rule looks like this to me:

meta TVD_PDF_FINGER01
       __TVD_MIME_CT_MM          # content-type is multi-part mixed
    && __TVD_MIME_ATT_TP         # and has a text-plain part
    && __TVD_MIME_ATT            # and has an attachment that is either
           __TVD_MIME_ATT_AP     #   application/pdf
           __TVD_MIME_ATT_AOPDF  #   or application/octet-stream.*.pdf
    && !__TVD_BODY               # and has no non-whitespace text content

Your rule would seem to match anything with no non-whitespace text 
content regardless of whether or not a pdf was attached.


I did a full analysis of why the rule is broken, line by line in the 
message you replied to.  But I'll do it again.


(dropping __TVD_MIME_ for ease of typing)

ATT is a meta of ATT_AP *or* ATT_AOPDF.

But the PDF_FINGER01 requires ATT_TP as well as ATT.  This means that 
really it will only work if ATT_TP matches.  If ATT_AOPDF matches then 
it won't match.


Now go back up and read the text I quoted at the top.  Because if this is 
the author's intent then you can shorten the rule, but I somehow don't 
think so.


I read it. I think you got it wrong. The author's intent seems to accord 
with my analysis.




I was looking into this very rule about 3 days ago, because of false 
positives (client mailing out auto-generated pdfs which are being 
rejected by messagelabs), and I found that spamassassin -D told me all 
I needed to know about why some e-mail hit this rule and some didn't.


Perhaps.  But maybe you have difficulty reading the line by line 
analysis I posted below, hm?  I have ~200 messages here that are 100% 
spam that would match the fixed rule, which seems to be the author's intent.




As I say, I read it. It was clear from the start that you didn't 
understand why the rule wasn't firing (and TVD, the rule author, 
explained that). It also appeared to me that your rewrite of the rule 
was the result of a misreading of the logic (or a misunderstanding of 
multipart mime). I thought I could elucidate. I stand by my comments, 
except that I misread your rewrite and thought it was looking only for 
text/plain, whereas it's looking only for pdf mime parts. Theo has 
explained it all now anyway, so there's no more to add.


But forgive me. I should have known better than to step in to a Jo Rhett 
thread. I'll try not to do it again.


Chris


Re: URIBL_BLACK matching on messages with no URLs in them...

2007-07-02 Chris Lear

Jo Rhett wrote:
Note: yes, uribl has their own mailing list.  That server has been down 
for quite some time, so I gave up and posted it here in case someone is 
dual listed and can fix it.


There's no URL in this message.  What is it mis-matching against?


This has been answered, but, if you're still interested, also see 
http://marc.info/?l=spamassassin-users&m=113533589419731&w=2 with 
details of a similar problem.


Chris


Re: Rules report

2007-04-19 Chris Lear

* Matt Kettler wrote (19/04/07 14:49):

Matt Kettler wrote:

If you try to build it off a live feed and use SA's marking as the spam
criteria, your statistics are useless. Any rule with a high enough score
would get perfect results.. all the mail it matched would be spam, and
no nonspam. You have, essentially, created a self fulfilling prophecy.
The higher-scoring a rule is, the more likely messages that match it
will be tagged as spam, even if they're not really spam.
  

Self correction. Such stats aren't useless, it depends on what you
want out of them.

If you want to know how accurate a particular rule is, by comparing the
spam vs nonspam hit rates, those stats are useless, because of the bias.
You need a manually sorted corpus to get this kind of information.

If you want to see which rules are getting used a lot, vs those that are
rarely getting used, these stats are quite useful.

If you want a top x rules list, sa-stats can do that for you:

http://www.rulesemporium.com/programs/sa-stats.txt


http://www.rulesemporium.com/programs/sa-stats-1.0.txt is probably a bit 
better in this case.




It will parse a spamd logfile and report the most-frequently used spam
and nonspam rules (and you can configure how many it will list for each)


The 1.0 version can do per-domain and per-user info, given a 3.1 log.

Chris


Re: New stock spam (2/14/07)

2007-02-15 Chris Lear

* Jonathan Nichols wrote (15/02/07 05:19):

Maciej Friedel wrote:

On 02/14/07 Jonathan wrote:


http://www.pbp.net/~jnichols/spam2.txt

0.0 BOTNET_NORDNS IP address has no PTR record
0.1 HTML_50_60 BODY: Message is 50% to 60% HTML  
0.0 HTML_MESSAGE BODY: HTML included in message
1.0 BAYES_50 BODY: Bayesian spam probability is 40 to 60% [score: 0.5002]

5.0 BOTNET The submitting mail server looks like part of a Botnet

i think botnet is a good idea

maciek



I thought botnet was unstable.. is it working ok now?


It's not (in my experience) unstable. It's excellent. But the default 
score of 5 is way too high. It gets a lot of false positives, especially 
(again, in my experience) from small mail-order operations who don't 
understand DNS (Exchange users, I rather uncharitably assume). I score 
botnet at 2 and I'm very happy with it.
I reckon better network tests are the future of spam filtering, now that 
spammers are sending blocks of text from Harry Potter books along with 
undetectable URLs containing spaces etc.


Chris


Re: complete false hits for BASE64 and LW_STOCK_SPAM4

2007-02-09 Chris Lear
* Loren Wilton wrote (08/02/07 19:46):
 As for LW_STOCK_SPAM4, it's being triggered by the fact that the message
 is base-64 encoded text AND has a Date: header that's missing a proper
 timezone. Apparently a batch of stock spam went out at some point with
 both of these abnormal features. I have to admit, it's a pretty rare
 combination.

 Date: February 6, 2007 9:52:29 AM PST

 That should, properly, read something like this:
   Date: Wed, 06 Feb 2007 09:52:29 -0800
 
 Actually LW_STOCK_SPAM4 was written on 02/19/2006, and is looking for a 
 Base64 encoded message that has a valid timezone that is specifically 
 \s\+, not an invalid time zone.
 
 Internally I have it scored at 5 points and haven't had a problem with it, 
 but people don't send me messages from Blackberrys.
 
 I suppose a blackberry might not have a clock, and so sends all messages as though 
 they came from London regardless of where they are.  That would somewhat 
 surprise me, since cell phones certainly know where they are and what time 
 it is.  But if Verizon is involved then it is certainly possible that the 
 software has been deliberately crippled in a number of ways, and creating a 
 proper date header might be one of those deliberate malfunctions.


Just to confirm that this unmodified rule does hit some legit blackberry
e-mail, here's an example (apologies for the obfuscation, but I've only
messed with addresses. It's not my e-mail):

Return-path: someone's address
Envelope-to: my wife
Delivery-date: Wed, 07 Feb 2007 17:21:42 +
Received: from smtp02.bis.eu.blackberry.com ([216.9.253.49])
by mail.barcombe.net with esmtp (Exim 4.63)
(envelope-from the sender)
id 1HEqUG-0008Ku-IV
for my wife's address; Wed, 07 Feb 2007 17:21:41 +
Message-ID:
[EMAIL PROTECTED]
Content-Transfer-Encoding: base64
Reply-To: the sender
References: [EMAIL PROTECTED]
In-Reply-To: [EMAIL PROTECTED]
Sensitivity: Normal
Importance: Normal
To: My Wife Her address
Subject: Re: 25th august
From: the sender
Date: Wed, 7 Feb 2007 17:22:58 +
Content-Type: text/plain; charset=Windows-1252
MIME-Version: 1.0
X-AntiVirus: Clean
X-Spam-Score: 2.1
X-Spam-Level: ++
X-Spam-Report: Barcombe.net spam report: Score = 2.1.
Tests=BAYES_00=-2.599,LW_STOCK_SPAM4=1.66,MIME_BASE64_NO_NAME=0.224,MIME_BASE64_TEXT=1.885,NO_REAL_NAME=0.961

A bit of grepping suggests that LW_STOCK_SPAM4 has hit 5 ham and 3 spam
(all scoring 20+) on that server since about November. So its usefulness
is perhaps questionable. Normal disclaimer applies: this is only one
low-traffic server. I live in the UK which might make the + timezone
more likely.
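On a server seeing the same pattern (more ham than spam hits), one option is a local score override. A sketch only; the values are illustrative and not something proposed in this thread.

```
# local.cf (sketch): a score of 0 disables the rule entirely...
score LW_STOCK_SPAM4 0
# ...or, alternatively, keep it as a weak signal.
# score LW_STOCK_SPAM4 0.5
```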

[Also see the thread Blackberry email]

Chris (whose mail from blackberries has all been received OK)


Re: Techworld says spam shows 'sudden slide'?

2007-01-12 Chris Lear

Tony Finch wrote:

On Thu, 11 Jan 2007, Michael Scheidell wrote:


I don't think I see any sudden drop, was the world's #1 spammer in that
hut in fluga that got bombed last night?


I haven't seen any drop recently either. For my systems (daily legit
volume 300,000 and spam 10x that) the spam peak was in the first half of
November and levels have been fairly constant (but with a level slightly
lower than the peak) since then.


I noticed a significant (absolute) drop towards the end of November. I 
put it down to a change of tactics: a reduction in the number of 
repeat-the-same-message-with-small-differences spam. These were 
previously skewing our stats upwards, because effectively the same spam 
from the same machine was being sent ~10-15 times to the same user with 
small text changes (we were rate-limiting connections to reduce the SA 
cost). This seems to be rarer now, or maybe even abandoned as a 
technique by spammers.


Chris


Re: Easyjet e-mail scoring very high

2007-01-08 Chris Lear
* Chris Lear wrote (01/12/06 16:57):
 * Adam Stephens wrote (01/12/06 16:10):
 Chris Lear wrote:
 * Loren Wilton wrote (01/12/06 14:54):
   
 The html contains this sort of thing:
 http://www&#46;easyjet&#46;com/EN/Members/

 Which looks like the culprit. In fact, every full stop in the html is
 represented as &#46; for some reason.

 Still wondering though... how do you solve a problem like EasyJet?
   
 Sure looks like spam to me.  ;-)

 Which also looks like just about every airline message I've seen from any 
 airline.  :-(  Apparently they hired spammers to design their marketing 
 campaign mail.

 You could try sending to postmaster or whatever at whichever marketing 
 company is really sending that mail and see if you can get any attention 
 from them.  Probably not, but it might be worth trying.
 

 The trouble is, it's not marketing. It's a confirmation of a flight
 booking, which I paid for. The airline doesn't issue tickets. So it's
 something I genuinely want in my inbox. It looks like it's generated
 directly by the easyjet.com web server.
   
 
 I had some complaints about that this week; it's obviously a new issue, 
 and it looks like it only applies to the ticket confirmations. Since 
 people really need these booking confirmations I've whitelisted it - 
 using a whitelist_from_rcvd rule seems to catch the booking 
 confirmations only as the marketing material is sent from a different 
 machine.
 
 Thanks for all the advice. I've reluctantly whitelisted them and written
 a polite message to [EMAIL PROTECTED] It doesn't seem to have
 bounced, so maybe someone will read it. I'll let you know if I get a
 response.
 Meanwhile, I suppose this is something for others to be aware of if you
 run an mta that rejects on high SA scores (and have users that might
 want to fly EasyJet).

This thread is ancient now, but here's a followup: I never got a
response from Easyjet, but I did get (today) a replica of the original
e-mail. It's almost identical (same appalling html, still from
savvis.net, but from a different ip), but missing a chunk of advertising
(hotels, car rental, etc), and with some very slightly different wording
about hand luggage.

The new version hits these rules:

DNS_FROM_RFC_ABUSE,
FORGED_RCVD_HELO, [this is new]
HTML_FONT_FACE_BAD,
HTML_MESSAGE,
HTML_TINY_FONT,
MIME_HTML_MOSTLY,
SARE_OBFU_AMP2B,
SARE_SPEC_LEO_LINE03a,
USER_IN_WHITELIST [because I whitelisted them]

The original version hit these:

DNS_FROM_RFC_ABUSE
HTML_FONT_FACE_BAD
HTML_MESSAGE
HTML_TINY_FONT
MARKETING_PARTNERS [This has gone]
MIME_HTML_MOSTLY
MPART_ALT_DIFF [This has gone]
SARE_OBFU_AMP2B
SARE_SPEC_LEO_LINE03a

Chris


Re: Botnet 0.6 plugin for Spam Assassin available

2006-12-18 Chris Lear
* Oliver Schulze L. wrote (18/12/06 15:42):
 Nice stats!
 How do you generate them in SA 3.1.7 ?

I use this: http://www.rulesemporium.com/programs/sa-stats-1.0.txt

Chris

 
 Thanks
 Oliver
 
 Chris Lear wrote:
 Here's some sa-stats output:

 TOP SPAM RULES FIRED
 ---------------------------------------------------------
 RANK  RULE NAME           COUNT  %OFMAIL  %OFSPAM  %OFHAM
 ---------------------------------------------------------
    1  BOTNET               1381    66.37    90.86    6.44
    2  BAYES_99             1274    59.50    83.82    0.00
    3  HTML_MESSAGE         1184    75.06    77.89   68.12
    4  BOTNET_CLIENT        1048    50.21    68.95    4.35
    5  BOTNET_IPINHOSTNAME   962    45.45    63.29    1.77
    6  URIBL_BLACK           751    35.12    49.41    0.16
    7  RCVD_IN_SORBS_DUL     725    33.96    47.70    0.32
    8  URIBL_JP_SURBL        688    32.13    45.26    0.00
    9  BOTNET_CLIENTWORDS    608    29.61    40.00    4.19
   10  URIBL_SC_SURBL        524    24.47    34.47    0.00

   
 



Re: MSRBL

2006-12-15 Chris Lear

Bret Miller wrote:
 I'm more interested in the Image signatures it has.  If they're really
 useful and reliable.  I expect that keeping up with image spam wouldn't
 be very scalable, but it might at least help reduce some load (since we
 do virus scanning before letting Spam Assassin see a message) for
 whichever images are known.


 I ran about half a day yesterday with both images and spam signatures.
 Images hit a whopping 4 messages and spam hit about 40 with 3 FPs, both
 a very, very low percentage (way under 1%) of spam. ImageInfo does a
 much better job IMO.

I'm using http://www.sanesecurity.com/clamav/ (on my home domain only at 
the moment) which saves sa some work (clamav runs before sa). About a 
third of the spam that was previously caught by sa is now caught by 
clamav instead. I tried MSRBL, but got very few hits. Sorry - no info 
about false positives, because anything that hits is rejected. I haven't 
heard from anyone, though.

I'm surprised by how effective it is.

Chris


Re: Botnet 0.6 plugin for Spam Assassin available

2006-12-08 Chris Lear
* John Rudd wrote (07/12/06 18:33):
 (I had a bout of insomnia last night, and got more done than I had 
 pre-announced yesterday...)
 
 
 The next version of the Botnet plugin for Spam Assassin is ready.  The 
 install instructions are in the Botnet.txt file, and in the INSTALL file.
 
 For those who don't know what Botnet is, it's a plugin which tries to 
 identify whether or not the message has been submitted by a 
 botnet/spam-zombie type host by looking at its DNS characteristics (no 
 reverse DNS, reverse DNS that doesn't resolve, or doesn't resolve back 
 to the relay's IP, or reverse DNS that contains things that look like an 
 ISP's client address).  The places I've been using it, and the people I 
 hear about who are using it, have seen a high degree of success.
 
 It can be downloaded from:
 
   http://people.ucsc.edu/~jrudd/spamassassin/Botnet.tar
 
 
 As usual, feedback, statistics, bug reports, feature suggestions, are 
 all welcome.

I've been running the BOTNET rules for a little while now. It's the
most-hit rule on the machine (above BAYES_99 even). But I get a
significant number of false positives.

Here's some sa-stats output:

TOP SPAM RULES FIRED
---------------------------------------------------------
RANK  RULE NAME           COUNT  %OFMAIL  %OFSPAM  %OFHAM
---------------------------------------------------------
   1  BOTNET               1381    66.37    90.86    6.44
   2  BAYES_99             1274    59.50    83.82    0.00
   3  HTML_MESSAGE         1184    75.06    77.89   68.12
   4  BOTNET_CLIENT        1048    50.21    68.95    4.35
   5  BOTNET_IPINHOSTNAME   962    45.45    63.29    1.77
   6  URIBL_BLACK           751    35.12    49.41    0.16
   7  RCVD_IN_SORBS_DUL     725    33.96    47.70    0.32
   8  URIBL_JP_SURBL        688    32.13    45.26    0.00
   9  BOTNET_CLIENTWORDS    608    29.61    40.00    4.19
  10  URIBL_SC_SURBL        524    24.47    34.47    0.00
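The columns can be cross-checked. The figures are consistent with COUNT being the number of spam messages hitting each rule, %OFSPAM and %OFHAM being per-class hit rates, and %OFMAIL covering both classes. On that reading (inferred from these numbers, not from sa-stats documentation), and using BAYES_99, which hit no ham, to get the total:

$$
N_{\mathrm{spam}} \approx \frac{1381}{0.9086} \approx 1520, \qquad
N_{\mathrm{mail}} \approx \frac{1274}{0.5950} \approx 2141, \qquad
N_{\mathrm{ham}} \approx 2141 - 1520 = 621
$$

$$
\text{BOTNET ham hits} \approx 0.0644 \times 621 \approx 40, \qquad
\frac{1381 + 40}{2141} \approx 0.6637 = 66.37\%
$$

So BOTNET fired on about 40 of roughly 621 ham messages in this sample, which is the false-positive rate behind the comments above.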

I think the default score of 5 is far too high. I'm scoring it at 2 at
the moment, which seems OK.

I'd quite like to be able to give more score to BOTNET_IPINHOSTNAME than
BOTNET_CLIENTWORDS, because it seems to give fewer false positives [I
think this will probably improve in 0.6, though]. But this isn't a very
big deal. So that's a mild vote against the __ prefix.
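A sketch of the local overrides described above. The BOTNET score of 2 is the one mentioned; the subrule values are hypothetical, and scoring the subrules only works while they are ordinary rules (as in the version that produced these stats). Rules renamed with a __ prefix can't be scored directly. The overrides must be read after the plugin's own .cf file, since the last score line for a rule wins.

```
# local.cf (sketch): override the Botnet plugin's default scores.
score BOTNET               2.0
# Hypothetical values: weight the lower-FP subtest above the noisier one.
score BOTNET_IPINHOSTNAME  0.5
score BOTNET_CLIENTWORDS   0.1
```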

I added p0f to my arsenal recently, hoping it would work to lower the
false-positive rate of BOTNET by checking for Windows machines, but it
seems that almost all the BOTNET false positives are Exchange servers,
so p0f aggravates rather than mitigates that.

Hope this feedback is useful. Thanks for the plugin. I take the view
that network tests and RBLs (especially URIBLs), rather than body
checks, are the best long-term spam-fighting tools.

Chris


Re: SV: Help with understanding a rule

2006-12-07 Chris Lear

* [EMAIL PROTECTED] wrote (07/12/06 12:03):

The list managers are the first ones who have to change.



Yes, you are probably right. But: there must be a reason why the
rule no_real_name exists? And if there is a rule (written or not)
that From: headers should contain a real name, I want to follow it.

And to follow it I need to convince my IT staff somehow...

So, what is the reason behind no_real_name?


Most MUAs, most of the time, put a real name into mail they send. It's 
standard setup. So not having a real name is, perhaps, a spam sign. This 
isn't the same as contravening RFCs. Remember that there's a rule called 
HTML_MESSAGE as well, which might be a spam sign. Both of these are 
bound to hit ham a lot of the time, so scoring them high would be, at 
best, an unusual decision. Scoring them high enough to reject would be 
very unusual.


As it happens, on a server I manage NO_REAL_NAME hits 5% of spam, and 
25% of ham (much of which is not MUA-originated). So it's not a rule I'd 
like to reject on.


But if a mailing list or a user has a "you must provide a real name" 
policy, spamassassin's flexible enough to be able to enforce it.
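A minimal sketch of per-user enforcement, assuming per-user preferences are enabled and the default required score of 5. The path and the value 5.0 are illustrative; scoring NO_REAL_NAME this high site-wide would be a bad idea given the ham hit rate above.

```
# ~/.spamassassin/user_prefs for the user (or list address) with the policy.
# Setting NO_REAL_NAME to the required score makes a missing real name
# enough, or nearly enough (negative rules such as BAYES_00 still count),
# to tag a message for this user only.
score NO_REAL_NAME 5.0
```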


Chris


Easyjet e-mail scoring very high

2006-12-01 Chris Lear
I got an EasyJet confirmation E-mail that scored like this:

BAYES_00=-2.599
DNS_FROM_RFC_ABUSE=0.2
FORGED_RCVD_HELO=0.135
HTML_FONT_FACE_BAD=0.156
HTML_MESSAGE=0.001
HTML_TINY_FONT=2.324
MARKETING_PARTNERS=1.765
MIME_HTML_MOSTLY=1.102
SARE_OBFU_AMP2B=2.555
SARE_SPEC_LEO_LINE03a=0.408

Which adds to 6.0, and only the Bayes score stopped it being rejected
(I'm rejecting at 6.5). [SA 3.1.3 with recent sa-update+SARE rules]
What's the recommended practice here? Whitelist? Lower the SARE scores?
Remove some less-safe SARE rules? Lower the HTML_TINY_FONT score [which
looks right, but if it's right for me, why not everyone else]? I'd like
all ham to score under 2, ideally. And almost all of it does. But I'd
prefer not to whitelist if possible. I like to feel I can trust SA
without introducing special cases.
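Spelling out the sum: the non-Bayes rules alone exceed the 6.5 reject threshold, and BAYES_00 brings the total back under it.

$$
0.2 + 0.135 + 0.156 + 0.001 + 2.324 + 1.765 + 1.102 + 2.555 + 0.408 = 8.646 > 6.5
$$

$$
8.646 - 2.599 = 6.047 < 6.5
$$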

Here are the received headers:

Received: from s217124rg180-p.uklond6.savvis.net ([213.174.202.180]
helo=easyjet.com)
by mail.barcombe.net with esmtp (Exim 4.60)
(envelope-from [EMAIL PROTECTED])
id 1GpoFF-0007fV-Ne
for [EMAIL PROTECTED]; Thu, 30 Nov 2006 15:54:47 +
Received: from mail pickup service by easyjet.com with Microsoft SMTPSVC;
 Thu, 30 Nov 2006 15:54:50 +

I think the Received: from mail pickup service line is causing the
SARE_OBFU_AMP2B rule to fire. Am I right? If so, isn't this likely to be
a reasonably common cause of false positives?

Chris


Re: Easyjet e-mail scoring very high

2006-12-01 Chris Lear
* Loren Wilton wrote (01/12/06 13:57):
 HTML_FONT_FACE_BAD=0.156
 HTML_MESSAGE=0.001
 HTML_TINY_FONT=2.324
 MARKETING_PARTNERS=1.765
 MIME_HTML_MOSTLY=1.102
 SARE_OBFU_AMP2B=2.555
 SARE_SPEC_LEO_LINE03a=0.408

 I think the Received: from mail pickup service line is causing the
 SARE_OBFU_AMP2B rule to fire. Am I right? If so, isn't this likely to be
 
 Nope.  All of the rules above are effectively body rules, dealing mostly 
 with various forms of HTML obfuscation.

Thanks for pointing that out. I was being rather dim.

The html contains this sort of thing:
http://www&#46;easyjet&#46;com/EN/Members/

Which looks like the culprit. In fact, every full stop in the html is
represented as &#46; for some reason.

Still wondering though... how do you solve a problem like EasyJet?

Chris


Re: Easyjet e-mail scoring very high

2006-12-01 Chris Lear
* Loren Wilton wrote (01/12/06 14:54):
 The html contains this sort of thing:
 http://www&#46;easyjet&#46;com/EN/Members/

 Which looks like the culprit. In fact, every full stop in the html is
 represented as &#46; for some reason.

 Still wondering though... how do you solve a problem like EasyJet?
 
 
 Sure looks like spam to me.  ;-)
 
 Which also looks like just about every airline message I've seen from any 
 airline.  :-(  Apparently they hired spammers to design their marketing 
 campaign mail.
 
 You could try sending to postmaster or whatever at whichever marketing 
 company is really sending that mail and see if you can get any attention 
 from them.  Probably not, but it might be worth trying.

The trouble is, it's not marketing. It's a confirmation of a flight
booking, which I paid for. The airline doesn't issue tickets. So it's
something I genuinely want in my inbox. It looks like it's generated
directly by the easyjet.com web server.


Re: Easyjet e-mail scoring very high

2006-12-01 Chris Lear
* Adam Stephens wrote (01/12/06 16:10):
 Chris Lear wrote:
 * Loren Wilton wrote (01/12/06 14:54):
   
 The html contains this sort of thing:
 http://www&#46;easyjet&#46;com/EN/Members/

 Which looks like the culprit. In fact, every full stop in the html is
 represented as &#46; for some reason.

 Still wondering though... how do you solve a problem like EasyJet?
   
 Sure looks like spam to me.  ;-)

 Which also looks like just about every airline message I've seen from any 
 airline.  :-(  Apparently they hired spammers to design their marketing 
 campaign mail.

 You could try sending to postmaster or whatever at whichever marketing 
 company is really sending that mail and see if you can get any attention 
 from them.  Probably not, but it might be worth trying.
 

 The trouble is, it's not marketing. It's a confirmation of a flight
 booking, which I paid for. The airline doesn't issue tickets. So it's
 something I genuinely want in my inbox. It looks like it's generated
 directly by the easyjet.com web server.
   
 
 I had some complaints about that this week; it's obviously a new issue, 
 and it looks like it only applies to the ticket confirmations. Since 
 people really need these booking confirmations I've whitelisted it - 
 using a whitelist_from_rcvd rule seems to catch the booking 
 confirmations only as the marketing material is sent from a different 
 machine.

Thanks for all the advice. I've reluctantly whitelisted them and written
a polite message to [EMAIL PROTECTED] It doesn't seem to have
bounced, so maybe someone will read it. I'll let you know if I get a
response.
Meanwhile, I suppose this is something for others to be aware of if you
run an MTA that rejects on high SA scores (and have users that might
want to fly EasyJet).

Chris


Re: How do I stop these?

2006-11-21 Thread Chris Lear
* John Rudd wrote (20/11/06 15:46):
 John Tice wrote:
 
 On Nov 20, 2006, at 10:00 AM, Nathan Zabaldo wrote:
 
 I am getting pounded by these types of emails.  Does anyone else get 
 these? What rule can I apply to have them killed.  It's driving me 
 nuts.  Please help!!!
 
 These are scoring at about 4X my threshold without the SARE stock 
 ruleset. You may need to tweak your scoring. I find bayes_99 to be reliable.
 
 FROM_LOCAL_NOVOWEL
 FORGED_RCVD_HELO
 BAYES_99
 RCVD_IN_SORBS_DUL
 RCVD_IN_NJABL_DUL
 
 
 
 RelayCatcher is doing a fine job of keeping me from seeing most of the 
 spam that's out there, lately.  See any messages on this list with 
 RelayCatcher in the subject.  Particularly RelayCatcher 0.3 in the 
 subject.

...or RelayChecker 0.3.

Chris


Re: Amazon / RFCI false positives

2006-11-06 Thread Chris Lear
* Tony Finch wrote (05/11/06 17:43):
 On Sat, 4 Nov 2006, Michael Scheidell wrote:
 
 So? Build something better. It's open source. Don't use the RFCI scores,
 drop them, stop bitching about something YOU can change.
 
 Well, I've added a -2 for email from Amazon, but I thought other people
 might like a warning.

Thanks. Warning appreciated.

I think that the people who made derogatory claims about Tony's logic,
or claimed that "you don't understand", had failed to appreciate what
"These messages are wanted by their recipients so should not be
scored as spam by SpamAssassin" means. Anyone who disagrees with that
piece of logic would appear to be using Spamassassin for a purpose that
its designers didn't think of.

Chris


Re: Amazon / RFCI false positives

2006-11-06 Thread Chris Lear

jdow wrote:

From: Chris Lear [EMAIL PROTECTED]

* Tony Finch wrote (05/11/06 17:43):

On Sat, 4 Nov 2006, Michael Scheidell wrote:


So? Build something better. It's open source. Don't use the RFCI scores,
drop them, stop bitching about something YOU can change.


Well, I've added a -2 for email from Amazon, but I thought other people
might like a warning.


Thanks. Warning appreciated.

I think that the people who made derogatory claims about Tony's logic,
or claimed that "you don't understand", had failed to appreciate what
"These messages are wanted by their recipients so should not be
scored as spam by SpamAssassin" means. Anyone who disagrees with that
piece of logic would appear to be using Spamassassin for a purpose that
its designers didn't think of.


Tony's phrasing implied that he thought the scoring was so wrong
that it should be modified by the people who wrote the rule and ran
it against mass checks. That logic is dead wrong.


That logic, right or wrong, is yours, not Tony's.



The correct phrasing might have indicated there is a problem for some
sites with Amazon failing RFCi requiring a special rule to negate
Amazon.com's negative scores on RFCi.


I think that the correct phrasing was exactly what was given, in that 
case. I understood it, anyway.




Demanding that the RFCi rules vanish into the night just is not going
to fly. And it indicates flawed thought processes.


Which, again, may or may not be true, but certainly wasn't even vaguely 
hinted at by Tony. These flawed thought processes appear (to me, but 
maybe I'm unusually pedantic) to be imaginary.


Chris


Re: I'm thinking about suing Microsoft

2006-10-25 Thread Chris Lear
* Marc Perkel wrote (25/10/06 05:22):
 Europeans have sued Microsoft many times.

For anti-competitive behaviour, maybe. For copyright infringement, perhaps.
But for attracting crime? For discriminating against owners of illegal
software? I hope not.
If you win, of course, you might take on php, perl and other easy-to-use
web scripting languages that allow people to write crime-attracting
sites that are easy targets for IRC bots etc. Plenty of scope for the
Perkel suing machine. Unless your real gripe is simply that Microsoft a)
is successful and b) insists on licensing software. Unfortunately,
neither of these things is illegal in any country as far as I can tell.

 
 Chris Lear wrote:
 * Marc Perkel wrote (23/10/06 19:34):
 I'm considering filing a lawsuit against Microsoft to try to get an 
 order to make them make public security updates for Windows to 
 everyone, registered or not.

 The idea is that their product Windows creates a toxic byproduct 
 (spam,ddos zombies) that interfere with everyone else's internet 
 usage and that they have a responsibility to clean it up. It would be 
 similar to a suit where a business that is otherwise legitimate 
 attracts crime in a neighborhood or a manufacturer dumping toxic 
 waste into a stream.

 Virus-infected spam zombies are a toxic byproduct of their business 
 model and it affects all of us and they have a duty to the public to 
 fix it. I'm somewhat of a legal expert, not a lawyer though. But just 
 wanted to get some feedback on the idea.



 Only in America...




Re: score=0.0 tests=none -- how can that be???

2006-10-25 Thread Chris Lear
* Debbie D wrote (25/10/06 04:48):
 Matt Kettler [EMAIL PROTECTED] wrote in message 
 news:[EMAIL PROTECTED]
 Debbie D wrote:
 I'm just not getting it.. I have a whole list of custom rules, I use
 RulesDuJour, I have custom scores to mark stuff higher.. I have reasonable
 limits set.. the users do not adjust things here, I do..  I use lint when I
 add scores and rules..

 So tell me.. how in the past week or so I have 11 mails in *my* box that
 show:

 X-Spam-Status: No, score=0.0 required=4.5 tests=none

 Usually that means a timeout, or your milter was configured to skip SA
 for the message.

 How do you call SA? mimedefang? spamc call in procmail.rc?

 
 Exim 4.52 with SA and ClamAV I use spamc

In that case, the header is (I'm fairly sure) not added by SA, but by
exim. Try stopping spamd. Does exim still add the headers? If so, then
the occasional occurrence is because spamd is overloaded.
Look in the exim mail log for the mail in question. It might give the
answer.

Chris


Re: I'm thinking about suing Microsoft

2006-10-24 Thread Chris Lear

* Marc Perkel wrote (23/10/06 19:34):
I'm considering filing a lawsuit against Microsoft to try to get an 
order to make them make public security updates for Windows to everyone, 
registered or not.


The idea is that their product Windows creates a toxic byproduct 
(spam,ddos zombies) that interfere with everyone else's internet usage 
and that they have a responsibility to clean it up. It would be similar 
to a suit where a business that is otherwise legitimate attracts crime 
in a neighborhood or a manufacturer dumping toxic waste into a stream.


Virus-infected spam zombies are a toxic byproduct of their business model 
and it affects all of us and they have a duty to the public to fix it. 
I'm somewhat of a legal expert, not a lawyer though. But just wanted to 
get some feedback on the idea.





Only in America...


Re: Psst!

2006-10-20 Thread Chris Lear
* Chris Santerre wrote (20/10/06 15:30):
 
 
 -Original Message-
 From: David B Funk [mailto:[EMAIL PROTECTED]
 Sent: Friday, October 20, 2006 1:20 AM
 To: users@spamassassin.apache.org
 Subject: Re: Psst!


 On Thu, 19 Oct 2006, Matt Kettler wrote:

  Another thing I've been noticing recently.. some idiot has been culling
  the web archives of mailing lists, and is trying to send spam emails to
  MESSAGE ID's of posts I've made. Check your mail logs!
 
  One or more of those would make a great spamtrap.

 Actually this kind of thing has been going on for some time. I still
 occasionally see spam sent to a Message-ID address derived from
 a machine that died years ago. The last owner of it was an active
 Usenet poster and is probably in all kinds of news archives.
 
 Just curious, but how many people see spam being sent to usernames with
 the first letter dropped? I see a ton in my logs. I believe spammers
 figure [EMAIL PROTECTED] will also have a [EMAIL PROTECTED]  Too bad for
 them...they do not. :)

Loads. Also with a variety of other manglings. One local part is
dwoodhouse, and some rejected variations are:
8jwoodhouse
8odhouse
dhouse
oodhouse
woodhousejwoodhouse
ydoodhouse

I can't see why they bother. Or maybe the address harvester is broken.


Re: ALL_TRUSTED creating a problem

2006-10-19 Thread Chris Lear

* Jo Rhett wrote (19/10/06 08:55):

Mark wrote:

We cannot really say SA's autodetection is broken, because SA is designed
to be called post-SMTP. Nor that a milter is broken per se for not adding
a Received: header, as that is the responsibility of the MTA itself. But a
milter using SA *can* be said to be broken if it's not providing SA
with the required post-SMTP view of things. Instead of patching SA, or
trying to fix it even, any milter using SA should simply DTRT (Do The
Right Thing): which is: add a pseudo Received: header before handing it
over to SA.


You'all are way behind the boat.  We've already patched it to support 
the undocumented requirement.  That's not an issue.


Perhaps SA being focused on post-SMTP is the problem here.  Why is 
this the focus?  In the modern world, you want to reject during SMTP not 
send backscatter to the poor folks whose e-mail got forged.


Frankly, a milter environment is the only possible right way to run SA. 
  So why the constant comments as if this is some one-off weird config?




Frankly, anyone who considers the way they do things to be the only 
possible right way is in danger of being Just Plain Wrong.


[further spleen-venting withheld]


Re: SA 3.1.7 children hang but don't die

2006-10-19 Thread Chris Lear

* David B Funk wrote (19/10/06 03:47):

On Wed, 18 Oct 2006, Sandy S wrote:


Daryl -
I switched back to 3.1.5 after my last post, and am sorry to report that I'm
still seeing the same issue under 3.1.5.  After running a while, the
processes in a state of K start building up until I manually kill them.

Regretfully (VERY regretfully) turning off FuzzyOCR.

Sandy


I'll second this, SA 3.1.5 and FuzzyOCR on RHEL-AS4

I've been seeing this off and on ever since I added FuzzyOCR.
Logs seem to correlate to FuzzyOCR processing a gif image during a
peak of messages. Get FuzzyOcr.log message:
 FuzzyOcr received timeout after running 10 seconds.




I'm running SA 3.1.5 with FuzzyOCR. I'm seeing errors in the FuzzyOCR 
log, like this:



[2006-10-18 09:34:24] FuzzyOcr received timeout after running 10 seconds.
[2006-10-18 09:49:14] FuzzyOcr received timeout after running 10 seconds.
[2006-10-18 10:09:26] Unexpected error in pipe to external programs. 
   Please check that all helper programs are installed 
and in the correct path.
   (Pipe Command /usr/bin/gifasm -d 
/tmp/.spamassassin2589Eye8ALtmp/out, Pipe exit code 1 (), Temporary 
file: /tmp/.spamassassin25893ZSX3Ltmp)



But I'm no longer getting children in the K state, since I put a spamd 
restart into the logrotate script. I haven't turned off FuzzyOCR which 
is doing an excellent job for me.


This isn't particularly conclusive, I'm afraid, because when I was 
seeing the problem it was sporadic and occasional, so it might just be 
luck, though it's been OK for a few days.


Chris


Re: tmp files being left over from FuzzyOCR?

2006-10-19 Thread Chris Lear

* Bill wrote (19/10/06 14:03):

Since I installed FuzzyOCR I've noticed I'm having a lot of files named
similar to  .spamassassin8932mZBFrtmp  left in my /tmp folder. These are
from FuzzyOCR, correct? The content of these files has lots of spaces,
hyphens, commas with a few readable words and the word "picture" a few
times.

Is there something I need to do to ensure these files are removed? After
I manually remove them I see new tmp files being created and removed but
sometimes a file is NOT removed.


I suspect that if you look in your FuzzyOCR log, you will find errors 
that match the unremoved temp files.


Eg from my FuzzyOCR.log:

[2006-10-18 10:10:47] Unexpected error in pipe to external programs.
  Please check that all helper programs are 
installed and in the correct path.
  (Pipe Command /usr/bin/gifasm -d 
/tmp/.spamassassin2591CHsvrEtmp/out, Pipe exit code 1 (), Temporary 
file: /tmp/.spamassassin2591dNqOn7tmp)


I see that /tmp/.spamassassin2591CHsvrEtmp/ is still there, but 
/tmp/.spamassassin2591dNqOn7tmp isn't.


And another example:

[2006-10-18 09:34:24] FuzzyOcr received timeout after running 10 seconds.

# ls -l /tmp/.spamassassin* | grep 09:34
-rw-------  1 spamd users     0 Oct 18 09:34 /tmp/.spamassassin2589Wc3z7Gtmp
-rw-------  1 spamd users 23579 Oct 18 09:34 /tmp/.spamassassin2589yvpP1Htmp


Looks like when gifasm fails, you get a dir left over. If there's a 
timeout, you get a file left over.
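Matching timeouts to leftovers by eye gets tedious, so here is a rough Python sketch of the same check. It is not part of FuzzyOCR; it assumes the log line format quoted above ("[YYYY-MM-DD HH:MM:SS] FuzzyOcr received timeout ...") and the /tmp/.spamassassin* naming from the ls output, and the function names are hypothetical. It pairs each timeout minute with temp entries whose modification time falls in the same minute.

```python
import re
from datetime import datetime
from pathlib import Path

# Matches lines like:
# [2006-10-18 09:34:24] FuzzyOcr received timeout after running 10 seconds.
TIMEOUT_RE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\] FuzzyOcr received timeout"
)


def timeout_minutes(log_lines):
    """Return the set of 'YYYY-MM-DD HH:MM' strings at which a timeout was logged."""
    minutes = set()
    for line in log_lines:
        m = TIMEOUT_RE.match(line)
        if m:
            minutes.add(m.group(1))
    return minutes


def matching_leftovers(log_lines, entries):
    """entries: iterable of (path, mtime datetime); return paths modified in a timeout minute."""
    minutes = timeout_minutes(log_lines)
    return sorted(
        str(path)
        for path, mtime in entries
        if mtime.strftime("%Y-%m-%d %H:%M") in minutes
    )


def tmp_leftovers(tmpdir="/tmp"):
    """Collect (path, mtime) pairs for .spamassassin* files and directories in tmpdir."""
    return [
        (p, datetime.fromtimestamp(p.stat().st_mtime))
        for p in Path(tmpdir).glob(".spamassassin*")
    ]
```

Feeding it the lines of the FuzzyOCR log plus `tmp_leftovers()` should list the leftovers that line up with timeouts; gifasm failures would need a second pattern for the "Unexpected error in pipe" lines.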


Chris


Re: tmp files being left over from FuzzyOCR?

2006-10-19 Thread Chris Lear

* Bill wrote (19/10/06 15:29):

I'm using FuzzyOcr-2.3b and I can't find any reference to this option in
any of the FuzzyOCR software I downloaded.

focr_keep_bad_images 0

Here's a sample of the items in my /tmp folder. You said yours were
folders, mine's not. All of these files are left behind as at the time I
made this sample it was 9:25.


Look in your FuzzyOCR log. If it's like mine, you will see timeouts like 
this:


[2006-10-18 09:49:14] FuzzyOcr received timeout after running 10 seconds.

If the times on these timeouts match the times on the temp files, then 
that's what's causing them. That logic works for what I'm seeing.




===
CIRCULAR 230 DISCLOSURE: Pursuant to Regulations Governing Practice Before
the Internal Revenue Service, any tax advice contained herein is not
intended or written to be used and cannot be used by a taxpayer for the
purpose of avoiding tax penalties that may be imposed on the taxpayer.
===


Shame. I was hoping to get out of paying some tax.


CONFIDENTIALITY NOTICE:
This electronic mail message and any attached files contain information
intended for the exclusive use of the individual or entity to whom it is
addressed and may contain information that is proprietary, privileged,
confidential and/or exempt from disclosure under applicable law.  If you are
not the intended recipient, you are hereby notified that any viewing,
copying, disclosure or distribution of this information may be subject to
legal restriction or sanction.  Please notify the sender, by electronic mail
or telephone, of any unintended recipients and delete the original message
without making any copies.


I hope I was the intended recipient, but I'm not sure how I can know.


Re: Spamd not killing children

2006-10-17 Thread Chris Lear

* Chris Lear wrote (16/10/06 10:32):
 The problem I'm having is that spamd doesn't seem to be able to clean up
 unwanted idle child processes.

[...]
I've had a look in the spamd code, and I'm now wondering whether my 
problem is related to logging bugs (eg 
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4237). I've set 
logrotate to restart spamd after syslog restarts as per the advice in 
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4316. Hopefully 
this will fix it.

I'm still unsure whether this is a spamd bug or not.

Chris


Spamd not killing children

2006-10-16 Thread Chris Lear
Subject sounds unpleasantly like incitement to filicide, for which I
apologise.

The problem I'm having is that spamd doesn't seem to be able to clean up
unwanted idle child processes.

Here's the logfile evidence:

Oct 16 00:12:59 marvin spamd[6351]: prefork: child states: III
Oct 16 00:13:09 marvin spamd[18043]: spamd: connection from localhost
[127.0.0.1] at port 35720
Oct 16 00:13:09 marvin spamd[18043]: spamd: setuid to spamd succeeded
Oct 16 00:13:09 marvin spamd[18043]: spamd: checking message
[EMAIL PROTECTED] for spamd:210
Oct 16 00:13:12 marvin spamd[25627]: spamd: connection from localhost
[127.0.0.1] at port 35722
Oct 16 00:13:12 marvin spamd[25627]: spamd: setuid to spamd succeeded
Oct 16 00:13:12 marvin spamd[25627]: spamd: checking message
[EMAIL PROTECTED] for spamd:210
Oct 16 00:13:14 marvin spamd[18043]: spamd: identified spam (29.7/5.0)
for spamd:210 in 5.3 seconds, 1545 bytes.
Oct 16 00:13:14 marvin spamd[18043]: spamd: result: Y 29 -
BAYES_99,RAZOR2_CF_RANGE_51_100,RAZOR2_CF_RANGE_E4_51_100,RAZOR2_CF_RANGE_E8_51_100,RAZOR2_CHECK,URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL
scantime=5.3,size=1545,user=spamd,uid=210,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=35720,mid=[EMAIL PROTECTED],bayes=0.891,autolearn=spam
Oct 16 00:13:15 marvin spamd[6351]: prefork: child states: IBK
                                                             ^
[...] Time passes, and spamd continues to work [...]

Oct 16 10:18:00 marvin spamd[6351]: prefork: child states: IIKK
                                                             ^^

spamd seems to be trying to kill child processes to get the number of
threads down to 2. But for some (apparently unreported) reason the
threads don't die, and the server is slowly collecting children marked
as K.
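To watch how quickly K children pile up, one could tally the "prefork: child states:" lines in the spamd log. A minimal Python sketch, assuming the syslog format shown above; reading I and B as idle and busy, and K as children spamd has tried to kill, follows the description here rather than any documentation. The function names are hypothetical.

```python
import re
from collections import Counter

# Matches spamd lines such as:
# Oct 16 10:18:00 marvin spamd[6351]: prefork: child states: IIKK
STATES_RE = re.compile(r"^(\w{3} +\d+ [\d:]{8}) .*prefork: child states: ([A-Z]+)")


def child_state_history(log_lines):
    """Yield (timestamp, Counter of state letters) for each prefork status line."""
    for line in log_lines:
        m = STATES_RE.match(line)
        if m:
            yield m.group(1), Counter(m.group(2))


def stuck_children(log_lines):
    """Return (timestamp, number of K children) for each status line that has any."""
    return [
        (ts, counts["K"])
        for ts, counts in child_state_history(log_lines)
        if counts["K"] > 0
    ]
```

A steadily rising K count across the day, as between 00:13 and 10:18 above, is the pattern described here.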

I recently upgraded spamassassin to 3.1.5, and I also installed
FuzzyOcr, which I suspect might be part of the problem.

Can anyone tell me a) what logs to look in to work out why this has
happened? (I've looked in the FuzzyOcr log, which does show some errors
and timeouts, but apparently none at relevant times), b) whether there's
anything I can do about it (I'll start by disabling FuzzyOcr, but I'd
like to use it), or c) whether there's a spamassassin bug?

I looked at the code in SpamdForkScaling.pm, and I see that there are 2
places where child processes are killed. In one place (sub
child_error_kill, line 134), there is a warn line if the kill fails. In
the other (sub need_to_del_server, line 732) there isn't.

Chris


Re: DEAR_SOMETHING rule scoring issue

2006-08-09 Thread Chris Lear
* Gregory T Pelle wrote (09/08/06 15:14):
 What is the procedure to have a rule score reviewed?
 
 I have been looking over the scoring for version 3.1.x at
 
   http://spamassassin.apache.org/tests_3_1_x.html
 
 and think that a score of 1.6 is high for the DEAR_SOMETHING rule.  I
 know that our customer support emails have the first line as "Dear
 <customer's name>". It would seem to me that any business that is
 trying to sound professional would have emails that hit this rule.

Where I work I'm always trying to persuade the people who write bulk
e-mail to customers *not* to start it with "Dear <customer's name>",
because I think it does the opposite of sounding professional. But maybe
it's just me. They are indeed trying to sound professional, and think
that personalising the e-mail with "Dear" will do that, and I don't seem
to win the argument. It hasn't made me lower the DEAR_SOMETHING score,
though.

Chris


Re: Allowing IMAP/POP to Send Email

2006-08-03 Thread Chris Lear

* Marc Perkel wrote (03/08/06 14:39):


Tony Finch wrote:

The reason that message submission is done with SMTP is because of the
number of SMTP extensions that the MUA will want to use, in particular
DSNs, deliver-by, deliver-after, message tracking, and whatever else may
be invented in the future. If you want to make message submission a part
of IMAP and POP then you'll have to re-do all these SMTP extensions twice,
which is a colossal waste of time.


  


Not really - what I'm proposing is that the IMAP connection just pipe 
the message into an SMTP server. The IMAP is acting only as an 
authenticated connection back to SMTP. I'm not suggesting replacing 
SMTP. What I'm suggesting is that POP/IMAP can be used as a transport to 
get the mail there because it's an existing connection, is already 
established, is already authenticated with the credentials of the email 
account, and it isn't a port that people would block like port 25 is.


I'm not trying to replace SMTP. I'm just trying to suggest a better way 
for end users to get outgoing email to the SMTP server.




What if I set up an SMTP server at home behind my ADSL router, collect 
my vanity-domain mail there, and access it via IMAP or POP3? It seems I 
only have one option, which is to send my mail via IMAP to my home 
server. Which then sends via SMTP to... the Internet (or via a 
smarthost). And the home server sending via SMTP is going to look a bit 
like a MUA sending via SMTP. How would you tell the difference? Is a 
home mail server outlawed in the brave new world? Or does my SMTP server 
have to learn to talk IMAP to make message submissions to the ISP's server?


Chris


Re: exim4 + forwarding + spamassassin

2006-07-27 Thread Chris Lear

* Zinski, Steve wrote (27/07/06 02:50):

Not sure how to get exim to pass the initial scan to spamd using a
different user. I've gone through my exim.conf file and changed every
single "user =" entry to a known user and it still insists on using
"nobody" for the first pass.

Another thing that intrigues me is the wording of the log entries.

In the first pass, spamd says that it's "checking" the message. In the
second pass it says "processing" the message.


I think exim only puts the message through spamassassin once (then 
subsequently caches the result, if required), and uses the username set 
up in the acl:


# Reject messages with a SpamAssassin score > 7
deny message   = Rejected: Flagged as spam ($spam_score).
     spam      = nobody:true
                 ^^^^^^ - **here**
     condition = ${if >{$spam_score_int}{70}{1}{0}}

I have a similar setup, except that I run spamc as a user called spamd. 
This gives site-wide bayes, and works fine.
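In Exim, $spam_score_int is the spam score multiplied by ten as an integer, which is why a threshold of 7 appears as 70 in the condition (compared with ">", which the archive seems to have stripped). A small Python model applied to the two scores in Steve's log shows why the SMTP-time pass let the message through. The truncation of fractional tenths is an assumption; the exact rounding Exim uses isn't shown in this thread and doesn't affect these scores.

```python
def spam_score_int(score):
    """Model of Exim's $spam_score_int: score times ten, as an integer.

    Assumption: fractional tenths are truncated toward zero.
    """
    return int(score * 10)


def acl_rejects(score, threshold_int=70):
    """Mirror of the ACL condition ${if >{$spam_score_int}{70}{1}{0}}."""
    return spam_score_int(score) > threshold_int


# First (SMTP-time, user nobody) pass and the later per-user pass from the log.
for score in (2.6, 7.5):
    print(score, spam_score_int(score), acl_rejects(score))
# prints:
# 2.6 26 False
# 7.5 75 True
```

The ACL only ever sees the 2.6 from the first pass; the 7.5 came from the later run as szinski, with that user's Bayes data.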


Is it possible that the second run through spamd is from you running 
spamc after the message is delivered? Ie, not from exim?


There's an exim-users mailing list that's probably a better place for 
these questions.


Chris




-Original Message-
From: Stuart Johnston [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, July 26, 2006 3:05 PM

To: users@spamassassin.apache.org
Subject: Re: exim4 + forwarding + spamassassin

Your first scan is running as nobody (that's bad) but the second is
running as szinski.  That would explain the BAYES_99.  I'm not sure
about the FORGED_RCVD_HELO and HTML_50_60 though.


Zinski, Steve wrote:

I need some help trying to figure out why spamassassin scores the same
message differently.

I am using an ACL with exim4 to scan email during the actual smtp
connection (so I can reject spam before my server accepts it). It's
pretty straightforward. My ACL looks like this:
 
# Reject messages with a SpamAssassin score > 7

deny message   = Rejected: Flagged as spam ($spam_score).
     spam      = nobody:true
     condition = ${if >{$spam_score_int}{70}{1}{0}}

Everything works just fine for mail destined to local accounts, but
there seems to be a discrepancy in spamassassin when mail is delivered
to a forwarded account (the forwarder directs mail to another local
account; i.e., [EMAIL PROTECTED] -- [EMAIL PROTECTED]). What
happens is that spamassassin scores the message low (non-spam) when it
accepts it from the Internet, but then scores it higher (as spam) when
the message is rerouted to the local mailbox. Here is a snippet from
maillog that illustrates this:

Jul 26 07:58:20 vps spamd[7361]: spamd: connection from localhost
[127.0.0.1] at port 56458
Jul 26 07:58:20 vps spamd[7361]: spamd: setuid to nobody succeeded
Jul 26 07:58:20 vps spamd[7361]: spamd: checking message
[EMAIL PROTECTED] for nobody:99
Jul 26 07:58:20 vps spamd[7361]: spamd: clean message (2.6/5.0) for
nobody:99 in 0.1 seconds, 2230 bytes.
Jul 26 07:58:20 vps spamd[7361]: spamd: result: . 2 -
HTML_MESSAGE,URIBL_SBL,URIBL_WS_SURBL
scantime=0.1,size=2230,user=nobody,uid=99,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=56458,mid=[EMAIL PROTECTED]
8,autolearn=no
Jul 26 07:58:20 vps spamd[26587]: prefork: child states: II
Jul 26 07:58:21 vps spamd[7361]: spamd: connection from localhost
[127.0.0.1] at port 56459
Jul 26 07:58:21 vps spamd[7361]: spamd: setuid to szinski succeeded
Jul 26 07:58:21 vps spamd[7361]: spamd: processing message
[EMAIL PROTECTED] for szinski:503
Jul 26 07:58:21 vps spamd[7361]: spamd: identified spam (7.5/5.0) for
szinski:503 in 0.6 seconds, 2183 bytes.
Jul 26 07:58:21 vps spamd[7361]: spamd: result: Y 7 -
BAYES_99,FORGED_RCVD_HELO,HTML_50_60,HTML_MESSAGE,URIBL_SBL,URIBL_WS_SURBL
scantime=0.6,size=2183,user=szinski,uid=503,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=56459,mid=[EMAIL PROTECTED]
hn8,bayes=0.97051713734,autolearn=no

As you can see, during the initial smtp pass (accepting from remote
host) the message is deemed clean with a score of 2.6. Then, when the
same message is delivered to the local account, it's identified as spam
with a score of 7.5. Unfortunately, my ACL only kicks in during the
first pass so the message gets accepted and delivered instead of
rejected. Anyone know what I might be doing wrong here?

Any help would be greatly appreciated.

Steve Zinski
University of Richmond






Re: The best way to use Spamassassin is to not use Spamassassin

2006-07-13 Thread Chris Lear

* Marc Perkel wrote (12/07/06 18:30):

Catchy subject line eh?

OK - so what I mean by this is that I now use SA for about 5% of all 
incoming email. The rest of the spam is rejected before I get to SA through 
a fairly large number of tricks that allow me to determine with near 
100% accuracy things that are spam. It is done mostly through behavior 
and karma related lists. Being host blacklisted or URI blacklisted.


I don't know if it's relevant to Marc's point, but it seems to me that 
if SA was reduced to network checks only it would still be a very good 
blocker of spam. And perhaps what Marc is doing is, more or less, moving 
SA's network checks into the MTA and using them to reject rather than 
just score.


I suppose something similar would be to score all the URIBL rules and 
RCVD_IN rules high, and abandon the traditional regex rules.


Network checks are easily the most hit spam rules in SA anyway. Here's a 
bit of sa-stats for spam on a machine I look after (the MTA blocks based 
on sbl-xbl.spamhaus.org before anything gets to SA, so that's not 
represented here):


   1  BAYES_99
   2  URIBL_BLACK
   3  URIBL_SBL
   4  URIBL_JP_SURBL
   5  URIBL_OB_SURBL
   6  RCVD_IN_SORBS_DUL
   7  RCVD_IN_NJABL_DUL
   8  HTML_MESSAGE
   9  FORGED_RCVD_HELO
  10  URIBL_SC_SURBL
  11  URIBL_WS_SURBL
  12  SARE_MLB_Stock6
  13  URIBL_AB_SURBL
  14  SARE_MLB_Stock1
  15  STOCK_NAME_FVGT1



Of course that 5% is very important because that is where I get the
data for the other tests that allow me to bypass filtering.


Even this isn't necessarily so. Data for network tests can be collected 
automatically, by trapping spammers who trawl the web/usenet for 
addresses, those who scan for open port 25s, or those who try high MX's. 
So at least some useful data can be collected without SA, or even human 
intervention.



But - I
want you all to start thinking of a new way to look at spam
filtering.


I'm not sure this is a new way to look at spam filtering, but I agree 
that content testing against regular expressions is increasingly looking 
like a crude and easily-outwitted technique compared to dns tests. Bayes 
is still good, though.


Re: sa-learn script

2006-07-11 Thread Chris Lear

* Nicholas Payne-Roberts wrote (11/07/06 11:58):
Does anybody know a good way to script sa-learn to daily check on junk 
e-mail folders? I'm currently trying the following line in a cron.daily 
script, but it's throwing up an error:


find /home/vpopmail/domains -name ".Junk E-mail" -exec sa-learn --showdots --spam cur {} \;


Your -exec subcommand is the problem. The {} expands to the full path 
of the found directory, and find doesn't change into that directory, so a 
bare "cur" is looked up relative to wherever the script happens to run. 
A version that might work is


find /home/vpopmail/domains -name ".Junk E-mail" -exec sa-learn --showdots --spam {}/cur \;


There's not much point using --showdots in cron, I would have thought, 
but it's probably useful for testing.


To make sure your find command is right, you can do something like this:

find /home/vpopmail/domains -name ".Junk E-mail" -exec echo sa-learn --showdots --spam {}/cur \;


which will simply echo a list of commands that would get executed.
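If the quoting of ".Junk E-mail" (the name contains a space) keeps biting in a cron script, the same job can be sketched in Python. This is an illustrative alternative, not a tested cron job: it walks the tree, and for each directory named ".Junk E-mail" that has a cur subdirectory it builds an `sa-learn --spam` command, passed as an argument list so no shell quoting is involved. By default it only prints, like the echo trick above. The root path is the one from the original post; `dry_run` and the function names are hypothetical, and --showdots is left out since it isn't much use in cron.

```python
import os
import subprocess


def junk_learn_commands(root):
    """Build one sa-learn command per '.Junk E-mail' Maildir folder under root."""
    commands = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        if os.path.basename(dirpath) == ".Junk E-mail":
            cur = os.path.join(dirpath, "cur")
            if os.path.isdir(cur):
                # An argument list avoids shell quoting of the space in the name.
                commands.append(["sa-learn", "--spam", cur])
    return sorted(commands)


def learn_junk(root="/home/vpopmail/domains", dry_run=True):
    """Print the commands (dry run) or actually run sa-learn on each folder."""
    for cmd in junk_learn_commands(root):
        if dry_run:
            print(" ".join(cmd))  # show what would run, like the echo version
        else:
            subprocess.run(cmd, check=True)
```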

Chris


Yahoo! SpamGuard spam

2006-07-11 Thread Chris Lear
I was entertained by this. A score of 5.491 added to an e-mail because 
of a Yahoo! advert stuck on the bottom by the Yahoo! MTA.

And the advert is for SpamGuard.


[... headers chopped... ]
X-Spam-Score: 2.9
X-Spam-Level: ++
X-Spam-Report: Spam report: Score = 2.9. 
Tests=BAYES_00=-2.599,DRUGS_ERECTILE=0.493,DRUGS_ERECTILE_OBFU=2.408,FUZZY_VPILL=0.924,SARE_OBFU_VIAGRA=1.666


[... email body chopped ...]
___
All New Yahoo! Mail - Tired of [EMAIL PROTECTED]@! come-ons? Let our SpamGuard 
protect you. http://uk.docs.yahoo.com/nowyoucan.html
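The 5.491 is just the four drug rules from the X-Spam-Report header added together; BAYES_00 is the only thing keeping the total down. A quick Python check of the arithmetic, using only the rule scores copied from that header:

```python
# Rule hits copied from the X-Spam-Report header above.
hits = {
    "BAYES_00": -2.599,
    "DRUGS_ERECTILE": 0.493,
    "DRUGS_ERECTILE_OBFU": 2.408,
    "FUZZY_VPILL": 0.924,
    "SARE_OBFU_VIAGRA": 1.666,
}

# Everything except the Bayes rule was triggered by the appended advert text.
advert_rules = [name for name in hits if name != "BAYES_00"]
advert_score = round(sum(hits[n] for n in advert_rules), 3)
total = round(sum(hits.values()), 1)

print(advert_score)  # 5.491
print(total)         # 2.9
```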



Chris


Re: Lots of missed spam

2006-06-29 Thread Chris Lear

* Leigh Sharpe wrote (29/06/06 03:03):


This was my first suspicion. I turned off Bayes tests temporarily and
it had little effect. I'm seriously considering resetting the bayes
and starting again


I can recommend that. I had a situation a while ago where the bayes 
database got mysteriously corrupted (sa-learn --dump magic suddenly showed 
nspam way way less than nham). I deleted the whole bayes database, did a 
bit of manual training, let it carry on with the automatic training, and 
it was all fine again in a day or so.


If spam hits BAYES_00 (which carries a negative score), you're better 
off without bayes at all.


But with good bayes, most of the spam you've posted will be blocked. The 
difference between BAYES_00 and BAYES_99 is +6.099, so a message with a 
small negative score (above about -1.1) with BAYES_00 would be pushed 
over 5 by BAYES_99.
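The +6.099 follows from the two scores that appear elsewhere in this archive: BAYES_00 at -2.599 (in the Yahoo! report) and BAYES_99 at 3.5 (its default, as quoted). A short Python check of the swing and of the break-even point against a threshold of 5; the function name is illustrative only:

```python
BAYES_00 = -2.599  # as shown in an X-Spam-Report quoted in this archive
BAYES_99 = 3.5     # default score quoted in this archive
THRESHOLD = 5.0

swing = BAYES_99 - BAYES_00      # about 6.099
break_even = THRESHOLD - swing   # about -1.099


def rescore_with_bayes_99(score_with_bayes_00):
    """Score the same message would get if it hit BAYES_99 instead of BAYES_00."""
    return score_with_bayes_00 - BAYES_00 + BAYES_99
```

So a message that scored -1.0 with BAYES_00 would come out at about 5.1 with BAYES_99, while one at -1.2 would stay just under 5.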


Chris


Re: Suing Spammers

2006-05-15 Thread Chris Lear
* jdow wrote (14/05/06 02:09):
 From: Gary W. Smith [EMAIL PROTECTED]
 
 On another paw, Craig, do consider who is the injured party. Marc is
 not. The final recipient, the addressee, is an injured party for the
 spam in her mailbox. The addressee's ISP is also an injured party due
 to the (vastly) increased mail volume her servers must handle. They
 have a tort for filing suit. The person who filters the spam is, one
 can argue, benefiting from the spam. So it is hard for him to sue
 and win anything.
 
 
 
 I disagree.  As a provider you are paying for the acceptance,
 processing, storage and re-transmission of that spam.  It is costing you
 resources which can be quantified.  My boxes have been running at about
 15% on average, 24x7.  Knowing that spam is 80% of that then you might
 be able to prove in a court of law that it is indeed damaging you
 financially to process this.
 
 But the burden gets turned back to you to prove this damage.  So the
 question is what the return will be versus the cost of proving it.
 Unless you are processing millions of spams per day from a single
 spammer then more than likely you will be hard pressed to see any type
 of return.
 
  jdow  Waitaminit - Marc heavily implied that he was offering a
 spam filtering service. If that is true then Marc is not being injured.
 The spam is his bread and butter, regardless of how much he wishes to
 be put out of that business.

What if he's not providing a spam filtering service, but a clean
e-mail service? Then the spammer is the enemy, not the bread and
butter. And it's the same service even if all spammers boycott his
servers. Indeed, I imagine he would get more customers if all spammers
boycotted his servers.

 
 jdow: That is why I made comment of three cases, the actual end
 recipient, the actual end recipient's ISP, and the spam filtering
 service provider. Of the three the first can sue and win something
 nominal. In the second case the ISP has so much bulk that the costs
 of the filtering and extra machinery are demonstrable injuries that
 amount to big money. The third case is a person actually making the
 spam filtering his business. In what way is that third person being
 injured?

In just the same way as the ISP, it seems to me. He's trying to provide
a service (delivering legit E-mail), and incurs demonstrable costs.

Chris


Can spamassassin stop this?

2006-05-12 Thread Chris Lear
I run a fairly uncompromising spamassassin, which rejects mail scoring
5.5 or above (and in my own mailbox, I treat anything scoring over 0 as
suspect). I find that almost all false negatives that slip through are
the result of a not-perfectly-trained site-wide bayes database
[Basically, I train it, so it works well for me. Hardly anyone else
bothers]. I run lots of network tests, which work really well.
But this e-mail looks like it would never get blocked. Does sa have a
hope against this, or have the spammers finally come up with something
that can't be filtered? Even with BAYES_99 (default score 3.5) it would
score just under 5.5.

This is the first time I've noticed a spam e-mail that I can't see how
spamassassin could kill.

Chris



Return-path: [EMAIL PROTECTED]
Envelope-to: [EMAIL PROTECTED]
Delivery-date: Fri, 12 May 2006 04:52:03 +0100
Received: from bzq-88-155-227-248.red.bezeqint.net ([88.155.227.248])
by marvin.thomasmurray.com with smtp (Exim 4.54)
id 1FeOh7-0001os-6a
for [EMAIL PROTECTED]; Fri, 12 May 2006 04:52:03 +0100
From: kalyn kari [EMAIL PROTECTED]
To: dacia katelin [EMAIL PROTECTED]
Subject: Was it love, or was it the thought of being in love?
Date: Fri, 12 May 2006 03:52:03 +
Message-ID: [EMAIL PROTECTED]
MIME-Version: 1.0
Content-Type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Mailer: PHP/4.4.0
X-Marvin-Spam-Score: 1.9
X-Marvin-Spam-Level: +
X-Marvin-Spam-Report: Marvin spam report: Score = 1.9.
Tests=BAYES_50=0.001,HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.001,RCVD_IN_NJABL_DUL=1.946
X-Marvin-AntiVirus: Clean

!DOCTYPE HTML PUBLIC -//W3C//DTD HTML 4.01 Transitional//EN
html
head
meta http-equiv=Content-Type content=text/html; charset=us-ascii
/head
body

Hullo!brbr
[E]rectilebr
[D]ysfunction?brbr

We can help! Our site: bochhorfando/b[dot]bcom/b ;) Don't forget
to replace b[dot]/b to b./bbrbr
---br
cigarette after another and extinguishing them on the edge of a
full ash tray, with Dolly, and with the old prince, where there
was talk about dinner, about politics, about Marya Petrovna's
illness, and where Levin suddenly forgot for a minute what was
happening, and felt as though he had waked up from sleep; the
other was in her presence, at her pillow, where his heart seemed
breaking and still did not break from sympathetic suffering, and
he prayed to God without ceasing.  And every time he was brought
back from a moment of oblivion by a scream reaching him from the

/body/html
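The "just under 5.5" figure can be checked against the X-Marvin-Spam-Report header above. This Python sketch parses the Tests= list, drops the BAYES_50 hit and substitutes BAYES_99 at the default 3.5 quoted in the post. It assumes every other rule keeps the score shown in the report.

```python
def parse_tests(report: str) -> dict[str, float]:
    """Turn 'RULE=score,RULE=score,...' into {rule: score}."""
    scores = {}
    for item in report.split(","):
        name, _, value = item.partition("=")
        scores[name.strip()] = float(value)
    return scores


def swap_bayes_rule(scores: dict[str, float], rule: str, score: float) -> dict[str, float]:
    """Drop whichever BAYES_* rule hit and substitute another Bayes rule."""
    swapped = {name: s for name, s in scores.items() if not name.startswith("BAYES_")}
    swapped[rule] = score
    return swapped


def total(scores: dict[str, float]) -> float:
    return round(sum(scores.values()), 3)


# Copied from the X-Marvin-Spam-Report header of the sample message.
REPORT = "BAYES_50=0.001,HTML_MESSAGE=0.001,MIME_HTML_ONLY=0.001,RCVD_IN_NJABL_DUL=1.946"
REJECT_AT = 5.5  # rejection threshold mentioned in the post

if __name__ == "__main__":
    scores = parse_tests(REPORT)
    print(total(scores))  # 1.949, shown rounded as 1.9 in the header
    with_99 = total(swap_bayes_rule(scores, "BAYES_99", 3.5))
    print(with_99, with_99 >= REJECT_AT)  # 5.448 False
```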


Re: Could you scan your logs for me?

2006-02-03 Thread Chris Lear
* Ole Nomann Thomsen wrote (03/02/06 09:27):
 Hi, can I ask a small favor from some of you running SA with Bayes enabled:
 Please run the following perl-oneliner on your SA-log (mine is current):
 
 perl -ne 'if (/result:/) {$n++; $b++ if (/BAYES/);} } print $b/$n,"\n"; {' current
 
 (I promise it's not a rootkit :-)
 
 I get:
 0.710109622411693
 
 I suspect you really ought to see 1, always. What do you get?

0.960777058279371

In my case, the difference is attributable to this in local.cf:

bayes_ignore_to users@spamassassin.apache.org
whitelist_to users@spamassassin.apache.org
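For anyone without Perl to hand, here is a rough Python equivalent of the quoted one-liner. Like the original, it assumes each spamd verdict is logged on a single line containing "result:" followed by the names of the rules that hit; the log path on the command line is a placeholder.

```python
import sys
from collections.abc import Iterable


def bayes_fraction(lines: Iterable[str]) -> float:
    """Share of spamd 'result:' lines that also name a BAYES_* rule."""
    results = 0
    with_bayes = 0
    for line in lines:
        if "result:" in line:
            results += 1
            if "BAYES" in line:
                with_bayes += 1
    if results == 0:
        raise ValueError("no 'result:' lines found")
    return with_bayes / results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python bayes_fraction.py /path/to/spamd.log  (placeholder path)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        print(bayes_fraction(log))
```

Unlike the one-liner, it raises a clear error instead of dividing by zero when the log has no result lines.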

Chris


Re: Another URL obfuscation

2006-01-10 Thread Chris Lear
* Jeff Chan wrote (10/01/2006 15:42):
 On Tuesday, January 10, 2006, 6:17:38 AM, Larry Rosenbaum wrote:
 I found this obfuscated URL in a drug spam:
 
 A href=3Dhttp://gozifo .upze5otbbutzanbb655k685ys5nn%2Eridgykh=
 comFONT SIZE=3D2/FONT
 
 Good grief, does any mail client actually parse that as a
 functional URI?

Yes. In your e-mail, my Thunderbird created a clickable link to
http://gozifo
My IE gives a DNS error when it tries that address.
My Firefox redirects to
http://www.google.com/search?btnI=I%27m+Feeling+Lucky&ie=UTF-8&oe=UTF-8&q=gozifo
which in turn redirects to http://www.vojir.com/other/basic-myebol.html
which gives a 404 error. It's probably possible to turn this
(mis)feature off in Firefox, but it is on by default.

I have no idea whether this is the original intention of the
obfuscation. I would guess not - and if it's viewed as html to start
with that might make a difference.
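Two layers of encoding are involved in that fragment: quoted-printable (=3D stands for "=") and URL percent-escaping (%2E is "."). The Python sketch below undoes both on the first line of the quoted fragment and contrasts a client that ends the link at the first space (as Thunderbird did) with one hypothetical lenient reading that drops the space. It is only an illustration; which reading a given browser or mail client actually picks is not something this code establishes.

```python
import quopri
from urllib.parse import unquote

# First line of the quoted-printable fragment from the post.
RAW_LINE = b"A href=3Dhttp://gozifo .upze5otbbutzanbb655k685ys5nn%2Eridgykh"


def href_value(raw: bytes) -> str:
    """Undo quoted-printable (=3D -> '=') and return the text after 'href='."""
    text = quopri.decodestring(raw).decode("ascii")
    return text.partition("href=")[2]


def linkify_until_space(href: str) -> str:
    """What a client that stops a URL at the first whitespace would link to."""
    return href.split()[0]


def squash_and_unescape(href: str) -> str:
    """One possible lenient reading: drop the space and percent-decode %2E."""
    return unquote(href.replace(" ", ""))


if __name__ == "__main__":
    href = href_value(RAW_LINE)
    print(linkify_until_space(href))  # http://gozifo
    print(squash_and_unescape(href))  # http://gozifo.upze5otbbutzanbb655k685ys5nn.ridgykh
```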

Chris


Re: SARE_URI_EQUALS false positives

2006-01-03 Thread Chris Lear
* Loren Wilton wrote (24/12/2005 00:23):
 Does anyone have any suggestions, apart from simply reducing the score
 for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to
 guarantee that only real uris are parsed as such?
 
 Several.

Hi. Thanks for the response. I'm replying rather late due to pressures
of Christmas.

 
 1. Change your report generator to remove the extraneous dot between
 updated and by.  Or change it to the more common underscore, if you insist
 on these words being connected for some reason.
 
 2. Put spaces around the equal sign.

These are fine suggestions, but sadly not practical. The e-mails are
auto-generated diffs from CVS commits. The files being committed are
Java properties files. In particular, the updated.by property contains
internationalised versions of the phrase "Updated by". The "more common"
underscore would be unusual in a Java properties file, and expecting
the developers to change the way they work to avoid SARE misfires is a
slightly overzealous reaction to the spam problem, I think. However, it
is possible if there's no sensible alternative.
The second suggestion is only a workaround, not a fix, anyway, because
spamassassin will still check http://updated.by as a uri.

 
 3. If you are reluctant for the correct fix, drop the score on the
 uri_equals rule to 4 or maybe 3, depending on what else your report manages
 to hit.

I am reluctant to use the "correct" fix. Actually, I'm inclined to think
that the word "correct" is being misapplied here. I've changed the
scores appropriately, though.

 
 4. You could submit a Bugzilla on the parsing of that phrase.  But
 frankly I consider the bug in the report generation, not SA's parsing of
 strange syntax.

The reason I didn't submit a bug was that I was not sure there was one -
hence the original query. And I'm still not going to submit a bug,
because I'm persuaded that there is not one. What bothered me (and still
does a bit) was that the string updated.by=anything matches a rule
that looks for uris of the form http(s)://*=*. I.e. the http(s) is
conjured out of nowhere for schemeless uris. I can see the point, but I
thought it would be worth bringing a possible problem to light. It's a
possible problem, not a bug per se, and the subsequent discussion shows
that people take different views on the seriousness of this kind of
parsing issue. One thing that hasn't been mentioned in respect of this
is that if spamassassin is looking aggressively for schemeless uris, it
could in some cases create quite a lot of unwanted uri checking traffic.

I'm happy to stick with what I've got now. I've sent some examples off
as indicated so that the SARE corpus will contain my mail in future.

Chris


SARE_URI_EQUALS false positives

2005-12-23 Thread Chris Lear
I'm getting false positives for SARE_URI_EQUALS, which scores 5 and is
therefore skewing the scoring of some mail quite badly.
The weird thing is that the uris that spamassassin is complaining about
aren't uris at all. The mail in question is auto-created reports of cvs
diffs, so it's slightly unusual.
I've tried to condense the debug information. Here it is:

This is some of the output from spamassassin -D false_positive

[16733] dbg: uri: parsed uri found, updated.by=Mis
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Mis
[16733] dbg: uri: cleaned parsed uri, updated.by=Mis
[16733] dbg: uri: parsed uri found, http://updated.by=Mis
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Mis
[16733] dbg: uri: parsed uri found, updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Updated
[16733] dbg: uri: parsed uri found, http://updated.by=Updated
[16733] dbg: uri: cleaned parsed uri, http://updated.by=Updated

These parsed uris are not links in the e-mail. They are just text.

I've had a bit of a look at the regexps that spamassassin uses to work
out what is a uri, and it seems that updated.by=Updated is treated as
a uri because .by is a valid tld and spamassassin looks for schemeless
uris, then prepends http:// for the tests.
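The behaviour described above can be imitated in a few lines of Python. This is not SpamAssassin's code: the regex and the tiny TLD set are simplified assumptions, and uri_has_equals is a toy stand-in for what SARE_URI_EQUALS looks for. It shows how a bare token such as updated.by=Updated becomes http://updated.by=Updated once a known TLD is spotted and a scheme is prepended.

```python
import re

# Hypothetical, heavily cut-down TLD set; the real list is much longer.
KNOWN_TLDS = {"com", "net", "org", "uk", "by"}

# A bare dotted token with no scheme, plus whatever non-space text follows it.
CANDIDATE = re.compile(r"\b([a-z0-9-]+(?:\.[a-z0-9-]+)+)(\S*)", re.IGNORECASE)


def schemeless_uris(text: str) -> list[str]:
    """Treat dotted tokens ending in a known TLD as URIs and prepend http://."""
    found = []
    for match in CANDIDATE.finditer(text):
        host, rest = match.group(1), match.group(2)
        tld = host.rsplit(".", 1)[1].lower()
        if tld in KNOWN_TLDS:
            found.append("http://" + host + rest)
    return found


def uri_has_equals(uri: str) -> bool:
    """Toy check in the spirit of SARE_URI_EQUALS: does the URI contain '='?"""
    return "=" in uri


if __name__ == "__main__":
    for uri in schemeless_uris("updated.by=Updated"):
        print(uri, uri_has_equals(uri))  # http://updated.by=Updated True
```

A token like Messages.properties is left alone only because "properties" is not in the TLD set; any text that happens to end in a real TLD is fair game for every URI rule.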

I'm running spamassassin 3.1.0 on perl 5.8.2.

Does anyone have any suggestions, apart from simply reducing the score
for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to
guarantee that only real uris are parsed as such?

Chris


Re: SARE_URI_EQUALS false positives

2005-12-23 Thread Chris Lear
* jdow wrote (23/12/05 11:26):
 From: Chris Lear [EMAIL PROTECTED]
 
 I'm getting false positives for SARE_URI_EQUALS, which scores 5 and is
 therefore skewing the scoring of some mail quite badly.
 The weird thing is that the uris that spamassassin is complaining about
 aren't uris at all. The mail in question is auto-created reports of cvs
 diffs, so it's slightly unusual.

[...]
 
 I've had a bit of a look at the regexps that spamassassin uses to work
 out what is a uri, and it seems that updated.by=Updated is treated as
 a uri because .by is a valid tld and spamassassin looks for schemeless
 uris, then prepends http:// for the tests.
 
 I'm running spamassassin 3.1.0 on perl 5.8.2.
 
 Does anyone have any suggestions, apart from simply reducing the score
 for SARE_URI_EQUALS? Is this a spamassassin bug, or is there no way to
 guarantee that only real uris are parsed as such?
 
 Before you drop the score precipitously check if there is some other
 characteristic of the emails that trigger falsely which can be used to
 apply a negative score. If there is such a characteristic then generate
 the appropriate negative score. If not weigh how effective the rule is
 for you. The version of sa-stats.pl that is on the SARE site helps
 figure this out nicely.
 
 That said it's close to a 50/50 rule that hits on very few messages
 here so should have a low score. (It hit on 6 messages out of 75000.)
 Cutting it out completely here seems like it would be effective TODAY.
 That could change. At one time it was quite necessary. Spammer fads
 change.)

I've reduced the score, and a quick check shows that the rule hits
almost nothing anyway, so it's not a big problem. The bayes rules were
already keeping the false positives from doing much damage.
But spamassassin uses uris for lots of things, and if it's commonly
parsing (reasonably) normal text as uris, I would expect that to be a
problem in more rules than just SARE_URI_EQUALS.

Chris


Re: SARE_URI_EQUALS false positives

2005-12-23 Thread Chris Lear
* jdow wrote (23/12/05 12:06):
 From: Chris Lear [EMAIL PROTECTED]
 [...]
 But spamassassin uses uris for lots of things, and if it's commonly
 parsing (reasonably) normal text as uris, I would expect that to be a
 problem in more rules than just SARE_URI_EQUALS.
 
 That is a standalone rule.
 
 And I do note that many of the SARE rules have severe problems in very
 specific cases. There are some mailing lists that are not well filtered
 for spam which have postings which trigger some of the too effective
 to toss SARE rules. I've developed some massive meta rules to at least
 partially get a handle on the problem. (A number of times XXX hit option
 would be nice to have for this.)

Sorry to go on, but I wonder whether you've missed my point. The
SARE_URI_EQUALS rule is working fine. It just looks in the uris that
spamassassin gives it, and complains when they contain "=".
The problem is that spamassassin is treating things that aren't uris as
uris. So SARE_URI_EQUALS is working on dud data.

In this specific case, the e-mail contains the text
updated.by=Updated. This is not a uri, and nor should it be treated as
one. But spamassassin thinks it is (because .by is a valid tld), so, as
far as I can tell, *all* uri rules will check it. It so happens that
SARE_URI_EQUALS hits in this case, but other uri rules are vulnerable to
false positives if the uri parsing is wrong, aren't they?

Chris


Re: How can i block this?

2005-10-12 Thread Chris Lear
* Matt Kettler wrote (10/11/05 19:37):
 Alessio wrote:
 I have received this mail, the heading from is blank! Is possible? 
 
 Yes, it's quite normal and is called a message with a null return path.

Is it? I thought the return path (or envelope sender) was quite distinct
from the From: header in the message itself.
Bounce messages usually have From: headers (normally showing
[EMAIL PROTECTED]).

A blank From: header is possible, but it's unusual in normal mail from MUAs.
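The distinction is easy to see in a parsed message. In this Python sketch (the message is made up), the null envelope sender shows up as "Return-Path: <>", which the delivering MTA writes from the SMTP envelope, while the From: header inside the message is still filled in, as it usually is for bounces.

```python
from email import message_from_string
from email.message import Message

# Made-up bounce as it might look after delivery. The delivering MTA adds
# Return-Path from the SMTP envelope sender, which is null ("<>") here.
RAW = """\
Return-Path: <>
From: Mail Delivery System <MAILER-DAEMON@mail.example.com>
To: user@example.org
Subject: Mail delivery failed

This message was created automatically by mail delivery software.
"""


def envelope_sender_is_null(msg: Message) -> bool:
    """Null return path: the envelope sender recorded at delivery is empty."""
    return (msg.get("Return-Path") or "").strip() == "<>"


def header_from_is_blank(msg: Message) -> bool:
    """Blank From: header inside the message itself (a different thing)."""
    return not (msg.get("From") or "").strip()


if __name__ == "__main__":
    msg = message_from_string(RAW)
    print(envelope_sender_is_null(msg), header_from_is_blank(msg))  # True False
```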

Chris


Re: How can i block this?

2005-10-12 Thread Chris Lear
* mouss wrote (10/12/05 13:13):
 Chris Lear wrote:
 
 [...]

 While the OP seems confused (he said "heading from"), his logs show he 
 is talking about the envelope sender (from= of his sendmail or whatever).

I see. Sorry.

Chris


Bayes expiry/oddity

2005-09-23 Thread Chris Lear
I'm running a reasonably small site-wide spamassassin, and I use a
site-wide bayes db. Spamassassin runs as the user spamd.

I noticed that I got spam last night with no BAYES_XX markup. I looked
into it this morning, and discovered that the bayes db only has 47 spam
messages in it (nspam from sa-learn --dump magic). It has about 69000
ham. It must have gone from 200 spams at around 11pm last night to 50
this morning, and the only explanation I can think of is that the spam
has been expired, but on the other hand this seems odd.

Spamassassin learnt 143 messages as spam yesterday (according to my
logs). In the same period it learnt 291 as ham. These figures are
reasonably representative of the traffic (on weekdays, anyway).

Can anyone explain what happened to the bayes db? It's now steadily
auto-learning itself back to normal, but we are going to get many more
false negatives today I think.

Any information/explanation appreciated.

Chris

PS I think it's extremely unlikely that there's been a concerted
attack/mistake by users using sa-learn the wrong way and re-learning the
spam as ham. For one thing, spamassassin is called by exim during the
smtp phase, and if the e-mail is marked as spam it's never delivered to
anyone. For another thing, there's nobody else around that knows what
sa-learn is.


Re: Bayes expiry/oddity

2005-09-23 Thread Chris Lear
* Chris Lear wrote (09/23/05 10:34):
 [...]
 Any information/explanation appreciated.

None forthcoming, so I'm putting this down to a freak bayes database
corruption. sa-learn --dump magic now shows 161 spam and 69310 ham
learnt, and I'm letting it sort itself out. In about 3 months I guess it
will be back to normal :-).
Spamassassin works fairly well without bayes, so I don't mind too much,
but I would feel happier if I thought that what happened was understandable.
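One possible reading of the missing BAYES_XX markup (an assumption; nobody confirmed it in the thread): SpamAssassin does not use Bayes for classification until the database holds a minimum number of learned spam and ham, 200 of each by default (bayes_min_spam_num / bayes_min_ham_num in versions that have those options). Both 47 and 161 are below that. The Python sketch parses the nspam and nham lines from sa-learn --dump magic output, assuming the usual four-column layout shown in the sample, and applies that check.

```python
import re

# Hypothetical excerpt in the layout sa-learn --dump magic normally prints.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        161          0  non-token data: nspam
0.000          0      69310          0  non-token data: nham
"""

# Assumed layout: four columns, then "non-token data: <name>".
MAGIC_LINE = re.compile(r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+?)\s*$")

# Assumed stock minimums before Bayes results are used.
MIN_SPAM = 200
MIN_HAM = 200


def parse_dump_magic(output: str) -> dict[str, int]:
    values = {}
    for line in output.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


def bayes_usable(nspam: int, nham: int, min_spam: int = MIN_SPAM, min_ham: int = MIN_HAM) -> bool:
    """True if both counts reach the minimums, so BAYES_XX rules can fire."""
    return nspam >= min_spam and nham >= min_ham


if __name__ == "__main__":
    counts = parse_dump_magic(SAMPLE)
    print(counts["nspam"], counts["nham"], bayes_usable(counts["nspam"], counts["nham"]))  # 161 69310 False
```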

Chris


Re: Unsubscribing

2005-07-15 Thread Chris Lear
* Duane Hill wrote (07/15/05 10:49):
 On Friday, July 15, 2005 at 9:45:17 AM, [EMAIL PROTECTED] confabulated:
 
 I am shortly to go on hols for 2 weeks and so was planning to
 unsubscribe until I get back. I notice on the web page at
 http://wiki.apache.org/spamassassin/MailingLists
 
 it tells you how to subscribe
 
 And in the headers of all messages to the list state this:
 
 list-help: mailto:[EMAIL PROTECTED]
 list-unsubscribe: mailto:[EMAIL PROTECTED]
 List-Post: mailto:users@spamassassin.apache.org

Which helps. The OP's suggestion was...

 [...] I would like to suggest that
 unsubscribe details be added to the page.

I think this is a reasonably sensible suggestion.

 I also notice that I seem to
 be subscribed to two spamassassin lists, not sure how that happened,

And you seem to have sent mail to both at once, resulting in a
duplicate. I think that spamassassin-users@incubator.apache.org is out
of date.

 probably user stupidity knowing me. Is there information somewhere else
 that tells people how to unsubscribe from the list.

See the headers (as mentioned above)

--
Chris


Re: How can I correct this FalsePositive?

2005-07-15 Thread Chris Lear
* Loren Wilton wrote (07/15/05 12:02):
 X-Spam-Status: Yes, score=2.2 required=2.0
 tests=HTML_BACKHAIR_8,HTML_MESSAGE,
 HTML_OBFUSCATE_05_10,MIME_HTML_ONLY autolearn=no version=3.0.4
 
 The easiest way to eliminate this FP would be to take your spam threshold
 back to 5, or at least something close to that.  The rules that hit on this
 mail have nothing whatever to do with the site - they are related to the
 mail message formatting.
 
 Since it only got 2.2 points, nobody should really notice this.  But since
 you have set your spam cutoff way too low, it FPs for you.

...and the cheapest fix, as I see it, is to get them to change the
message so it doesn't hit this rule:

1.2 MIME_HTML_ONLY BODY: Message only has text/html MIME parts

Which should also make the message more friendly to non-HTML mail
readers, which is worthwhile anyway. And it will take the score down to 1.0.

--
Chris


SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251

2005-05-20 Thread Chris Lear
I've been running quite a lot of SARE rules on a site-wide SA
installation for a month or two now. I've been keeping a fairly close
eye on it, and there have been few false positives generally.

But today I noticed that several e-mails are hitting both
SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251. These are ham, sent from
(one specific address in) Ukraine to a Ukrainian in England, written in
English.
The scoring is such that the e-mail gets a score of 3.333 PLUS 4.0 - so
only bayes saves it from being rejected (we reject at 5.5).

I can re-score these rules (or remove sare_header0, which will lower the
scores anyway), but I have 2 questions:
- Is this a slightly unfair double-scoring?
- Are there any other similar rules I should worry about, given that
some Russian mail to this server is ham?

--
Chris


Re: SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251

2005-05-20 Thread Chris Lear
* John Wilcock wrote (05/20/05 10:51):
 Chris Lear wrote:
 But today I noticed that several e-mails are hitting both
 SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251. These are ham, sent from
 (one specific address in) Ukraine to a Ukrainian in England, written in
 English.
 The scoring is such that the e-mail gets a score of 3.333 PLUS 4.0 - so
 only bayes saves it from being rejected (we reject at 5.5).
 
 I can re-score these rules (or remove sare_header0, which will lower the
 scores anyway), but I have 2 questions:
 - Is this a slightly unfair double-scoring?
 - Are there any other similar rules I should worry about, given that
 some Russian mail to this server is ham?
 
 These are actually in the header1 file, not header0, but surely they 
 ought to be moved to the 70_sare_header_eng.cf as they hit non-English 
 ham. Bob?

They're in my header0.cf from sare/rules du jour. And in header.cf with
a lower score as well. Have I got the wrong files?

RulesDuJour $ grep SARE_FROM_CHAR_W1251 *
70_sare_header.cf:header    SARE_FROM_CHAR_W1251  From:raw =~ /\=\?Windows-1251\?/i
70_sare_header.cf:describe  SARE_FROM_CHAR_W1251  Displays in unexpected charset
70_sare_header.cf:score     SARE_FROM_CHAR_W1251  1.666
70_sare_header.cf:#ham      SARE_FROM_CHAR_W1251  Found in some Russian ham
70_sare_header.cf:#hist     SARE_FROM_CHAR_W1251  Created by Bob Menschel May 17 2004
70_sare_header.cf:#counts   SARE_FROM_CHAR_W1251  245s/4h of 238550 corpus (112525s/126025h RM) 02/28/05
70_sare_header.cf:#counts   SARE_FROM_CHAR_W1251  640s/0h of 54176 corpus (16997s/37179h JH-3.01) 02/01/05
70_sare_header.cf:#counts   SARE_FROM_CHAR_W1251  0s/0h of 17050 corpus (14617s/2433h MY) 08/08/04
70_sare_header0.cf:header    SARE_FROM_CHAR_W1251  From:raw =~ /\=\?Windows-1251\?/i
70_sare_header0.cf:describe  SARE_FROM_CHAR_W1251  Displays in unexpected charset
70_sare_header0.cf:score     SARE_FROM_CHAR_W1251  4.000
70_sare_header0.cf:#stype    SARE_FROM_CHAR_W1251  spamgg
70_sare_header0.cf:#hist     SARE_FROM_CHAR_W1251  Created by Bob Menschel May 17 2004
70_sare_header0.cf:#counts   SARE_FROM_CHAR_W1251  180s/0h of 66979 corpus (41757s/25222h RM) 09/04/04
70_sare_header0.cf:#counts   SARE_FROM_CHAR_W1251  209s/0h of 38398 corpus (14914s/23484h JH) 08/14/04 TM2 SA3.0-pre2
70_sare_header0.cf:#counts   SARE_FROM_CHAR_W1251  0s/0h of 17050 corpus (14617s/2433h MY) 08/08/04
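The #counts comments record spam and ham hits against a corpus of known size; the parenthesis splits that corpus into spam and ham (112525 + 126025 = 238550). From those numbers a rough S/O figure can be computed in the style of SpamAssassin's mass-check reports: the hit rate in spam divided by the sum of the hit rates in spam and ham. This Python sketch assumes the exact "NNNs/NNNh of N corpus (NNNs/NNNh ...)" layout shown above.

```python
import re
from typing import Optional

# Matches e.g. "245s/4h of 238550 corpus (112525s/126025h RM)".
COUNTS = re.compile(r"(\d+)s/(\d+)h of (\d+)\s+corpus \((\d+)s/(\d+)h")


def s_over_o(counts_line: str) -> Optional[float]:
    """Spam hit rate / (spam hit rate + ham hit rate), or None if the rule never hit."""
    match = COUNTS.search(counts_line)
    if match is None:
        raise ValueError("not a #counts line")
    spam_hits, ham_hits, _total, spam_corpus, ham_corpus = map(int, match.groups())
    spam_rate = spam_hits / spam_corpus
    ham_rate = ham_hits / ham_corpus
    if spam_rate + ham_rate == 0:
        return None
    return spam_rate / (spam_rate + ham_rate)
```

On the first 70_sare_header.cf line this gives roughly 0.986, which looks excellent in aggregate; it says nothing about a site where Windows-1251 mail is routinely ham.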


--
Chris


Re: SARE_CHARSET_W1251 and SARE_FROM_CHAR_W1251

2005-05-20 Thread Chris Lear
* John Wilcock wrote (05/20/05 12:15):
 Chris Lear wrote:
 They're in my header0.cf from sare/rules du jour. And in header.cf with
 a lower score as well. Have I got the wrong files?
 
 Methinks you have an old header0.cf that is no longer being updated - 
 these rules aren't in the current header0 on rulesemporium.com.

OK, thanks. I'll try to find out what's wrong with my Rules du Jour.

 
 And in any case you shouldn't be using header and header0 together...

I didn't know that. I'll fix that as well.

Thanks for your help.

--
Chris


Re: how to config SA to scan mail from localhost

2005-05-10 Thread Chris Lear
* Evan Platt wrote (10/05/2005 05:21):
 At 09:16 PM 5/9/2005, you wrote:
I'm testing the SA but my server can't connect to outside world. Thus, 
i've to send mail from localhost to myself to find how accurate SA is.
Unfortunately, SA don't scan mails that sent from localhost.

how can I reconfig it to scan every mail.
 
 You don't. You tell spamassassin what mail to scan. How are you calling 
 spamassassin, and what is your mail configuration? 
 

The original question is a restatement of yesterday's "how to force SA
to scan mail that send from php post".

My reading of the situation (which might be wrong) is this:

The Original Poster wants to do some sort of project that will give
statistics on the accuracy of spamassassin. He has followed a recipe
that installs qmail with qmail-scanner, and has got a php script that
will send mail to the mail server. But the mail server appears to skip
the scan for local messages, so the project is getting no statistics.

The solution to this problem is to work out how qmail-scanner decides
what to scan, and change it. Unfortunately, I can't help there. I would
try doing a manual SMTP connection from the local machine (telnet
localhost 25) and take it from there.
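If typing the SMTP dialogue by hand gets tedious, the same test can be scripted. This Python sketch uses the standard smtplib module to hand one message to the local MTA on port 25, with debug output switched on so the SMTP conversation is printed much as a telnet session would show it. The addresses are placeholders, and whether qmail-scanner then passes the message to SpamAssassin still depends on its own configuration.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_local_smtp(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Submit msg over plain SMTP and print the dialogue to stderr."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.set_debuglevel(1)
        smtp.send_message(msg)


# Usage (placeholder addresses; needs an MTA listening on localhost:25):
# send_via_local_smtp(build_test_message("tester@localhost", "me@localhost", "SA test", "Hello"))
```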

But my worry is that sending a load of e-mail via a php form will
produce hopeless project results, because it will effectively only test
the value of spamassassin's body checks. But perhaps that's part of the
plan.

--
Chris


Re: OT: Confession and rage

2005-05-06 Thread Chris Lear
* Stewart, John wrote (05/06/05 15:55):

[... excellent story chopped ...]

 Do I:
 
 - Never go there again, as I said would be the case in my previous email?
 
 - Show up and try to convince her what a horrible thing she is doing?
 
 - Just screw with their (horribly insecure) online site, signing up for
 appointments all day for Elmer Fudd, etc?
 
 - Simply ban their domain from my mailserver and report them to the RBLs?

Or...

- Offer them some consultancy, in return for a haircut (is this the same
as option 2?)

-- Chris


Re: Simply don't run spam for Mailing Liste

2005-04-28 Thread Chris Lear
* arnaud wrote (27/04/2005 23:06):
 Kris Deugau wrote:
 
[...]
 In my case, for instance, SA is called from procmail just before the
 message is written to a mailbox.  In my .procmailrc file, I have a
 number of procmail recipes that look something like this:
 
 # SATalk
 :0:
 * ^List-Id: users.spamassassin.apache.org
 /home/kdeugau/mail/spam-stomping
 
 This one files messages from this list in the spam-stomping folder
 before SA even sees the message.  I have quite a long list of similar
 entries for other mailing lists.
 
 -kgd
 
 Ok Thank you. As your can see, i haven't understand this option. I use 
 exiscan with exim. It would be better i suppose to perform spamassassin 
 with procmail that i use too.

Or use exim configuration rules to prevent scanning of certain messages.
If you are using exim's ACLs (either exim 4.50+ or older exim with the
exiscan-acl patch), something like this should work:


[in main config]
acl_smtp_rcpt = acl_check_rcpt
acl_smtp_data = acl_check_content

[in acls]
acl_check_rcpt:
[...]
  # Set acl_m0 variable to tell the later acl not to use SA
  accept hosts      = veronyk.net : freetelecom.com
         set acl_m0 = dontcheckdata

[...]

acl_check_content:
  # Skip all content checks if acl_m0 variable set
  accept condition = ${if eq{$acl_m0}{dontcheckdata}{1}{0}}
[...]
  # $spam_score_int is the spam score multiplied by 10, so 80 means 8.0
  deny   message   = I don't like your nasty spam
         spam      = spamd:true/defer_ok
         condition = ${if >{$spam_score_int}{80}{1}{0}}
[...]