Re: thanks to thinking people.

2010-07-22 Thread Charles Gregory

On Thu, 22 Jul 2010, Benny Pedersen wrote:

On ons 21 jul 2010 19:09:55 CEST, Alexandre Chapellon wrote

You can have forged return-path and/or stolen credentials... in both
cases you look like a backscatter source.
i believe postfix is smart to change forged sender to something that is 
not fqdn before it bounce :)


A forged sender looks no different than a legitimate sender. Postfix would 
have no way to be 'smart' about this (except for some instances of SPF 
fail, but then why 'bounce'? Why not reject?).


- C


Re: [sa] Re: thanks to thinking people.

2010-07-22 Thread Charles Gregory

On Thu, 22 Jul 2010, Benny Pedersen wrote:

On tor 22 jul 2010 20:03:18 CEST, Charles Gregory wrote
A forged sender looks no different than a legitimate sender. Postfix would 
have no way to be 'smart' about this (except for some instances of SPF 
fail, but then why 'bounce'? Why not reject?).


and why not show logs ?


Sorry. Not OP. Just noting that the opinion that postfix should be smart 
enough to rewrite a forged sender just doesn't make sense.


bounces are never external, since postfix changes the sender to mailer-daemon, 
which will end up in some local mailbox if it was sent from a local ip


Postfix doesn't change the sender. Mailer Daemon is the 'sender' for 
all bounces. But it will be sent TO the original sender listed in the 
'From' header. If postfix has generated the From header based on 
transaction authentication, then a 'bounce' would indeed go back to the 
originating mail account. But if you are merely going by IP, then the 
'sender' that postfix tries to 'bounce' mail to will be the forged sender. 
And postfix has no way to know it is forged.


- C


Re: [sa] Re: thanks to thinking people.

2010-07-20 Thread Charles Gregory

On Tue, 20 Jul 2010, LuKreme wrote:
We are talking about Checking OUTBOUND messages. It is perfectly ok to 
bounce internal messages.


Caveat: As long as proper care is taken to send the bounce to the 
authenticated sender of the mail and NOT just lamely use the 'From' 
header! Still prefer an SMTP reject over a bounce!


- C


Re: First run score: 25.7 Second: 2.6

2010-07-16 Thread Charles Gregory

On Fri, 16 Jul 2010, Emin Akbulut wrote:

X-Spam-Status: No, score=2.6 required=6.3 tests=HTML_IMAGE_ONLY_32,
X-Spam-Status: No, score=2.6 required=6.3 tests=HTML_IMAGE_ONLY_32,
X-Spam-Status: No, score=5.5 required=6.3 tests=HTML_IMAGE_ONLY_32,
X-Spam-Status: Yes, score=24.4 required=6.3 tests=HTML_IMAGE_ONLY_32,

(liberally snipped)

There are commas at the end of these lines, implying you have trimmed the 
rest of the list of tests that account for the different scores. Go back 
and assemble the FULL logs, so that we can see the difference in what 
tests fire and what tests don't.


Now if I have to GUESS on insufficient data, I would suspect that the
'port' of spamd to Windows(?) does not properly tidy up its children when 
finished. The fact that it crashes certainly points in this direction.

May I presume that you did a 'full' memory test?

To verify this situation, try running the same test as before, but leave a 
one minute gap between each run/test (and with no other spamd calls during 
that time interval!) so that we can see what happens when the spamd 
children have time to properly terminate.


- C

Ps. I'm not researching this deeply, so I may trip over some minor aspect 
of spamd coding/behaviour that the developers will call me on, I'm sure. 
:)


Re: [sa] How to block a network

2010-07-16 Thread Charles Gregory

On Fri, 16 Jul 2010, Igor Chudov wrote:

I receive a large number of spams from network IPs belonging to
SharkTech, 70.39.69.99 or so and so on.


Does Ubuntu use the 'iptables' firewall? Throw it in there, and
forget even the wasted initial SMTP connections.
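As a rough illustration of what "throw it in there" could look like, here is a small Python sketch that only builds the iptables command for a network. The /24 prefix is an assumption (the OP gave one address "or so", not a range), so check the provider's real allocation before using anything like this. The helper name is hypothetical.

```python
import ipaddress

def drop_rule(cidr):
    """Build an iptables command that drops SMTP connections from one network."""
    # strict=False normalises a host address such as 70.39.69.99/24
    # to its network address (70.39.69.0/24).
    net = ipaddress.ip_network(cidr, strict=False)
    return ["iptables", "-I", "INPUT", "-s", str(net),
            "-p", "tcp", "--dport", "25", "-j", "DROP"]

if __name__ == "__main__":
    # The /24 here is a guess for illustration, not the real SharkTech range.
    print(" ".join(drop_rule("70.39.69.99/24")))
```

The sketch prints the command rather than running it, so it can be reviewed before it goes anywhere near a live firewall.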

- C


Re: First run score: 25.7 Second: 2.6

2010-07-15 Thread Charles Gregory

On Wed, 14 Jul 2010, Matt Kettler wrote:

On 7/14/2010 11:27 AM, Emin Akbulut wrote:
  I noticed randomly while I was testing SA. All I did is below:
WinSpamC  realspam.txt  result1.txt
NET STOP Spamassassin
NET START Spamassassin
WinSpamC  realspam.txt  result2.txt
WinSpamC  realspam.txt  result3.txt

result1: under 6.3
result2: very high
result3: under 6.3

That is quite strange.. sounds like you've got DNS timeout problems.


No, it's something more than that. Go back to the original test and there 
are other tests that stop firing like that FROM_IN_SUBJECT.


It almost seems like some of his spamd children are failing to load all 
their parameters. Noting the frequent crashes mentioned in another post, I 
would say that there is something to it.


I suggest to OP that he try the spamassassin executable, to see if this 
score anomaly repeats itself. If it is only happening on spamd, then 
somehow those crashes point to a problem. If the load is not too high, he 
could even use spamassassin for production. I do. And 99% of the time it 
works fine.


(Footnote for people who will inevitably ask: My glue doesn't seem to like 
the way spamc returns the original mail.)


- C


Re: [sa] Re: First run score: 25.7 Second: 2.6

2010-07-15 Thread Charles Gregory

On Thu, 15 Jul 2010, Emin Akbulut wrote:

spamassassin.exe always calculates the same/correct score.


Good... Goood.

spamd second run reports only a few tests. Is it OK? I mean spamd runs 
all test but only adds which one increases score to its report? Or 
these tests are processed tests list only? First run has tons of tests, 
second run has only 5 tests.


I am presuming, by your description that the exact same *unmodified* file 
is passing through spamc/spamd all three times, and that there are no 
other variables. The spamc calls are literally one after the other, with 
no change of userid or other change that would possibly lead to a different 
set of configuration files being read.


So this means that it is spamd itself that is 'different' on the second 
execution. You are going to need to enable verbose logging for spamd and 
do these three tests and see what messages appear in the logs (presumably) 
showing a failure to load config files on the second run.


Is it possible that the file LOCKING on your system prevents spamd from 
accessing certain files under certain circumstances?


What happens if you run ANY other message through spamc as the 'second' 
run, and then run the third one on the original file? Is spamd sensitive to 
it being the same message or does it just mess up on *whatever* the second 
message would happen to be? Timing or content?


- C


Re: First run score: 25.7 Second: 2.6

2010-07-14 Thread Charles Gregory

On Wed, 14 Jul 2010, Bowie Bailey wrote:

First run:
---
X-Spam-Status: Yes, score=25.7 required=6.3 tests=HTML_IMAGE_ONLY_32,
HTML_IMAGE_RATIO_02,HTML_MESSAGE,LOCALPART_IN_SUBJECT


What sticks out to me is that most of the missing score hits on the
second run are from blacklists.


Quite true. What also sticks out to me is that test LOCALPART_IN_SUBJECT 
disappears, which means that the headers on the second run are 
substantially different from the headers on the first run.


SOMETHING is severely mangling the mail between the two runs, and quite 
obviously this degrades spamassassin's capability to detect spam.


I suppose I should ask (of the OP) WHY there are two runs at all?

- C


Re: How to stop weird From: crap?

2010-07-12 Thread Charles Gregory

On Mon, 12 Jul 2010, Michelle Konzack wrote:

From: Coupon Dept. CouponDeptdOS_V`CcOP 
IW^GIdATOn2PbJK_/v...@perezcentral.com


I realize that the spammers will soon recognize that you are filtering 
them, but for the moment, why not score heavily on the 'unusual' 
characters inside these coded addresses?


header LOC_WEIRD_FROM From =~ /[...@\]*[\^\`\ ]...@\]*@/
score LOC_WEIRD_FROM 2
# not too high a score, just enough to tip them over...
# note: the '[...@\]*' confines the match to within a local address part
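The archive has mangled the regex above, so the following is not a reconstruction of it. It is only a Python sketch of the same idea, under the assumption that the goal is to look at the local part of every angle-bracketed address in the From value and flag a caret, backtick, or space there. The names and sample addresses are hypothetical.

```python
import re

# Characters the rule scores on: caret, backtick and space.
UNUSUAL = re.compile(r"[\^` ]")
# Local part of every <local@domain> address in the header value.
LOCAL_PART = re.compile(r"<([^<>@]*)@")

def weird_from(header_value):
    """True if any address local part contains one of the unusual characters."""
    return any(UNUSUAL.search(local) for local in LOCAL_PART.findall(header_value))

if __name__ == "__main__":
    print(weird_from("Coupon Dept. <Coupon^Dept`x@example.com>"))
    print(weird_from("User Name <user.name@example.com>"))
```

Because only the text between '<' and '@' is examined, a space in the display name does not trigger a hit.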

- C


Re: [sa] Re: How to stop weird From: crap?

2010-07-12 Thread Charles Gregory

On Mon, 12 Jul 2010, Karsten Bräckelmann wrote:

header LOC_WEIRD_FROM From =~ /[...@\]*[\^\`\ ]...@\]*@/
# note: the '[...@\]*' confines the match to within a local address part

Using From:addr instead is better and more accurate.


Provided the spammer doesn't use more than one address on the From 
header. :)


That RE is more complicated than it needs to be, yet might even match the 
real name. From is not From:raw.


From:raw, according to docs, only prevents decoding of quoted printable 
and base 64 strings, and preserves whitespace. So the RE, as given, looks 
for the angle bracket at the beginning of ALL possible addresses, and 
scans for the undesirable characters. I don't see any unnecessary 
complexity in the RE (except that yes, you could use From:addr and 
eliminate the sections that pin-down address, but I've already explained I 
prefer an RE that captures ALL addresses, not just the first).


As a footnote to OP, these characters ARE 'legal' even though rarely used.
That's why you can't score too high...


But I posted that solution yesterday already. Coming late to the show,
eh? ;)


1) Sysadmin's New Year's Resolution: I will read all posts before 
responding.


2) Sorry, I got used to seeing so much *discussion* trying to dissect what 
was, to me, an obvious problem that I got fed up with it, and figured no 
one else was posting a rule, so I would.


Great minds, and all that? :)

- C

Re: Move SPAM to directory and notify user

2010-07-09 Thread Charles Gregory

On Fri, 9 Jul 2010, Jose Luis Marin Perez wrote:

In a CentOS 4.7 server I installed qmail + simscan + ClamAV + Spamassassin
3.3.0 that is working properly.
Now my intention is that when a mail is considered SPAM this is moved to a
folder called SPAM and in turn notifies the user (via email) so you can
review it.
Is it possible?


If your Spamassassin is properly adding a header showing the Spam 'score' 
as a row of asterisks, then you can check for that header in procmail and 
deliver accordingly. I would suggest, to avoid having a notice for every 
piece of spam, try instead to have a cron job that checks the spam folder 
for *new* mail, once nightly, and sends a single message to the user. Of 
course, if the user is always getting spam, then that notice gets ignored 
pretty quickly, so you may want to decide whether it is worth the trouble.
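The reply suggests procmail for the actual delivery. Purely to illustrate the header check itself, here is a Python sketch, assuming the asterisk header is the usual X-Spam-Level and that five stars is the cut-off (both are assumptions about the local setup). The function names are made up.

```python
from email import message_from_string

def spam_stars(raw_message):
    """Count the asterisks in the X-Spam-Level header (0 if the header is absent)."""
    msg = message_from_string(raw_message)
    return msg.get("X-Spam-Level", "").count("*")

def belongs_in_spam_folder(raw_message, threshold=5):
    """True if the message has at least `threshold` stars (threshold is an assumed value)."""
    return spam_stars(raw_message) >= threshold

if __name__ == "__main__":
    sample = "X-Spam-Level: ******\nSubject: hi\n\nbody\n"
    print(spam_stars(sample), belongs_in_spam_folder(sample))
```

A nightly cron job could run a check like this over the day's new mail and send a single summary instead of one notice per spam.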


- C


Re: SA checking of authenticated users' messages

2010-07-08 Thread Charles Gregory

On Wed, 7 Jul 2010, Louis Guillaume wrote:

   (spamass-milter doesn't tell SA about auth) ==   [
   rbl checks run against authenticated user's IP address
   lack of ALL_TRUSTED for authenticated user's mail
  That last one seems to be my problem. Does the patch fix this? I'll
  try updating and see what happens.

Hi Again!
I just need to clarify one thing that's not clear to me in re-reading our 
thread from the other day: Is there a work-around for this?
My users are getting restless. Everytime their ISP changes their IP address I 
have to whitelist them!


Uh, I missed the original thread, so maybe this was explained, but why 
aren't the users sending mail through their ISP's SMTP server?


Presuming there is a good answer for this, then, have you considered just 
whitelisting based on the user's From: header? There's a trick to it: 
90% of the time, spammers have a harvested address, but *don't* have the 
NAME portion of the user's From: header.


So build a rule that matches their WHOLE 'From:' header, like this:

header  LOC_FROMOURUSER  From =~ /^User Name theiraddr...@example.com/

Notice the absence of the commonly used 'i' flag on the regex.
If they have quotes around their name, include them in the regex.
The entire line should *exactly* match what the user's MUA generates.
The only thing that messes this up is when users have the annoying habit 
of changing their 'name' on their mail.


Naturally, there is a small risk of having a spammer send a message with 
exactly that header, but really, how many of those will there be?
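To see why leaving off the 'i' flag matters, here is a small Python sketch of the same kind of exact, case-sensitive match. The user name and address are hypothetical; a real pattern would have to mirror exactly what that user's MUA writes.

```python
import re

# Hypothetical user. Quotes, spacing and capitalisation must match the MUA's output.
OUR_USER = re.compile(r'^"Jane Q\. Public" <jane@example\.com>$')

def from_our_user(from_value):
    """True only if the whole From value matches, case-sensitively."""
    return OUR_USER.search(from_value) is not None

if __name__ == "__main__":
    print(from_our_user('"Jane Q. Public" <jane@example.com>'))
    print(from_our_user('"jane q. public" <jane@example.com>'))
    print(from_our_user('jane@example.com'))
```

A spammer who only has the harvested address, or who gets the capitalisation of the name wrong, fails the match.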


- Charles



Re: Problems with File::Scan::ClamAV

2010-07-03 Thread Charles Gregory

On Sat, 3 Jul 2010, sebast...@debianfan.de wrote:

i have a debian Lenny system with SpamAssassin version 3.3.1
  running on Perl version 5.10.0.


Is it running properly?

I had installed clamav and i got a problem by installing 
file::scan::clamav.


How is this connected to spamassassin? My first feeling on this is that 
you would have better luck posting this to the Debian lists. Or perhaps 
contact the author


I presume you have already googled for your errors :)

- C


Re: Whitelist programmatically

2010-06-26 Thread Charles Gregory

On Sat, 26 Jun 2010, Massimiliano Giovine wrote:

What does it do? How can i read the documentation of the spamassassin
behavior with whitelisting?


Firstly, the behaviour of the various whitelist options is described in 
the Mail::SpamAssassin::Conf documentation. There is a copy on the web at:

http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html

Now I have to ask what functionality you are trying to achieve that is not 
already in SA? Are you simply trying to give your users a 'friendly' way 
to add whitelist entries to their spamassassin config?


If so, and the volume of entries is not large, I would suggest you use an 
'include whitelistfile' command in the user's .spamassassin/user_prefs and 
then use whatever user interface you like to put a listing of whitelist 
commands into that file. Using a separate file would avoid issues with 
a script error corrupting the main user_prefs file.
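A minimal Python sketch of the "separate file" idea follows, assuming the front end hands over a plain list of addresses and that whitelist_from is the directive wanted. The function name and the validation rule are assumptions for illustration. The file is written to a temporary name and then renamed, so a crash halfway through never leaves a half-written whitelist behind.

```python
import os
import tempfile

def write_whitelist(path, addresses):
    """Write one 'whitelist_from' line per address, replacing the file atomically."""
    for address in addresses:
        # Refuse anything with whitespace so a bad entry cannot inject extra config lines.
        if not address or any(ch.isspace() for ch in address):
            raise ValueError("suspicious address: %r" % address)
    lines = ["whitelist_from %s\n" % a for a in sorted(set(addresses))]
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.writelines(lines)
    os.replace(tmp_name, path)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "whitelistfile")
        write_whitelist(target, ["friend@example.com", "boss@example.org"])
        print(open(target).read(), end="")
```

The user_prefs file itself is never touched by the script; it only needs the one include line.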


- Charles



2010/6/22 Martin Gregorie mar...@gregorie.org:

On Tue, 2010-06-22 at 07:28 +0200, Massimiliano Giovine wrote:

Really thanks for the answers.
So, i need to configure my spamassassin installation to use the
running database (i'm already using a mysql database for other reason)
for whitelisting or i have to write the logic of a whitelist using my
database installation?


You can do it all in SA. The steps are:

1)add another table to the database. This need only have a single column
 that contains the list of e-mail addresses you want to whitelist.
 The column needs to be the primary key, which is normally indexed.
 The e-mail address needs to be indexed for good performance.

2)you need a way of adding addresses to the table. If you're happy to
 use SQL you can use the MySQL interactive SQL tool or wrap it
 in a shell script to implement a shell command like

       whitelist someb...@example.com

3)of course you need some form of backup, but MySQL's standard database
 backup and restore tools should do just fine.  If you already have
 a whitelist, you can easily load it into the database with the MySQL
 bulk loader.

4)you need to write or otherwise obtain a Spamassassin plugin to access
 the database and a rule to call the plugin.

My whitelisting plugin interrogates a database view containing a
moderately complex query. This appears to the plugin as the sort of
table I've just described. If I was implementing your plugin I'd:

- define a table that uses my view name as the table name and contains
 the same column name. This way I could use my existing plugin to
 access the whitelist table without any SQL changes, i.e.

       create table whitelist ( email varchar(80) primary key );

- Modify my plugin to work with MySQL. I use PostgreSQL as my database
 but I think the changes would be minimal - possibly little more than
 configuration changes. I've never used MySQL, so can't be more
 definite.


Is there something i can read to go deeper into this topic?


There isn't a lot. There's an SA document about writing plugins, which
is quite helpful. I found it was easy enough to read that and then grab
a plugin that accessed a database and modify that, but I do know some
Perl and understand object-oriented programming. You need both to
successfully create a plugin without too much trial and error. I found
that figuring out the database access was easy enough, but the SA
facility for configuring a plugin, i.e. telling it what sort of database
to access and where to find it, was poorly documented and did need quite
a bit of experimentation to get right.

Caveat: As I've never used MySQL the preceding description assumes that
it has all the tools that come as standard with every other SQL database
I've used.
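The table and lookup described above can be tried out without a MySQL server. This Python sketch uses the standard-library sqlite3 module purely as a stand-in for MySQL or PostgreSQL. The table definition follows the one quoted above; the helper names and the lower-casing of addresses are assumptions. A real SpamAssassin plugin would be written in Perl, so this only illustrates the data side.

```python
import sqlite3

def open_whitelist(path=":memory:"):
    """Open (or create) the single-column whitelist table."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS whitelist "
               "(email VARCHAR(80) PRIMARY KEY)")
    return db

def add_address(db, email):
    """Add an address; the primary key makes repeated inserts harmless."""
    db.execute("INSERT OR IGNORE INTO whitelist (email) VALUES (?)",
               (email.lower(),))
    db.commit()

def is_whitelisted(db, email):
    """Look the address up by primary key."""
    row = db.execute("SELECT 1 FROM whitelist WHERE email = ?",
                     (email.lower(),)).fetchone()
    return row is not None

if __name__ == "__main__":
    db = open_whitelist()
    add_address(db, "Friend@Example.com")
    print(is_whitelisted(db, "friend@example.com"))
    print(is_whitelisted(db, "stranger@example.com"))
```

The lookup is the only query the plugin side would need, which is why a view with the same column name can stand in for the table.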


Martin








--
-Massimiliano Giovine
Aksel Peter Jørgensen dice: Why make things difficult, when it is
possible to make them cryptic and totally illogic, with just a little
bit more effort?
Blog: http://opentalking.blogspot.com
Linus Torvalds doesn't die, he simply returns zero.


Re: Whitelist programmatically

2010-06-26 Thread Charles Gregory

On Sat, 26 Jun 2010, Massimiliano Giovine wrote:

You guessed right!
It's a little bit more complicated but the target is what you said!
If i write into user_prefs i have to restart spamassassin service?


Hmm... Not sure about that one. I know you have to restart spamd for 
changes to the site-wide config, but it wouldn't make sense to have to 
restart for every user change.


Easy enough to test out... Make some changes and see if they take.

So, what are the complicated bits? :)

-C





Re: [sa] Re: NO_RELAYS spam

2010-06-18 Thread Charles Gregory

On Fri, 18 Jun 2010, Randy Ramsdell wrote:
I have no problem going over there but I am not convinced that the 
Amavis program is the problem. The header field is changed by 
spamassassin. Doesn't the email simply get handed to Spamassassin by 
Amavis where the headers are modified by spam report etc...?


The headers are missing.
Spamassassin records this fact, but is not responsible for it.
So find out what happens to your message BEFORE spamassassin is called.
Amavis is just a suggested starting place. And if it is to blame, someone 
on their list will recognize your query as soon as you post it.


Suggestion: After each step of your mail processing, if you can, save a 
copy of the mail to a log file. At least that way you get a quick overview 
of *which* component removes those headers.
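If copies are saved after each stage, finding the culprit is a matter of counting Received headers in each copy. Here is a minimal Python sketch, assuming the snapshots are available as raw message text in pipeline order; the stage names and function names are hypothetical.

```python
from email import message_from_string

def received_count(raw_message):
    """Number of Received headers in a raw message."""
    return len(message_from_string(raw_message).get_all("Received", []))

def first_stage_without_received(snapshots):
    """snapshots: list of (stage_name, raw_message) in pipeline order.

    Returns the name of the first stage whose copy has no Received
    headers at all, or None if every copy still has some.
    """
    for name, raw in snapshots:
        if received_count(raw) == 0:
            return name
    return None

if __name__ == "__main__":
    intact = ("Received: from mx.example.net by mail.example.com\n"
              "Subject: test\n\nbody\n")
    stripped = "Subject: test\n\nbody\n"
    stages = [("after-postfix", intact), ("after-amavis", stripped)]
    print(first_stage_without_received(stages))
```

The stage named in the result is the first place the headers were already gone, so the component that ran just before that snapshot is the one to look at.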


- C


Re: NO_RELAYS spam

2010-06-17 Thread Charles Gregory

On Thu, 17 Jun 2010, Randy Ramsdell wrote:
The original email did not hit the NO_RELAYS rule but subsequent runs 
through do hit this rule and it isn't on all email.


This sounds to me like you are 'resending' the mail from a local address 
to your mail server, rather than 'feeding' the original mail back into 
spamassassin. If this is the case, then you would naturally produce a new 
set of headers, and there would be no external relays, thus triggering the 
NO_RELAYS rule.



Original rules hit.
X-Spam-Status: No, score=-0.394 tagged_above=- required=5
tests=[BAYES_00=-2.599, HTML_MESSAGE=0.001, RCVD_IN_SORBS_WEB=0.619,
URG_BIZ=1.585]


Right there, we see 'RCVD_IN_SORBS'. This would not happen even if your 
own server was blacklisted with SORBS. There *was* a Received header for a 
relay, and somehow you have 'removed' it, either via a filtering mechanism 
outside SA, or by 'resending' or 'forwarding' the mail.



After running spamassassin -D


If this is what you used, then the forwarding and header rewriting must 
have occurred prior to this. Did someone 'forward' the spam to you as a 
complaint? Users often fail to properly forward with full headers 
enclosed.


- C


Re: NO_RELAYS spam

2010-06-17 Thread Charles Gregory

On Thu, 17 Jun 2010, Randy Ramsdell wrote:
Hmmm, this mail came in and went straight to the user's inbox.  1. Postfix 
--- 2. Amavis ( Spamd/Clamd)  --- 3. Postfix --- 4. Dovecot-deliver
So the problem is somewhere during 2 --- 3, or in step 3 or 4. Step 4 is 
unlikely, since Deliver simply sends the file to a directory location.


I'm afraid I'm going to have to side with the people who suggested 
that something in the above steps is deleting headers. Postfix is pretty 
much guaranteed to add at least one Received header, even if it is just 
'Received from localhost'. So if you can guarantee that Step 1 is being 
done, then something in a later Step is removing headers.


Good luck with finding it! :)

- C


Re: Please Help with SA Rule: FH_HOST_IN_ADDRARPA

2010-06-17 Thread Charles Gregory

On Thu, 17 Jun 2010, gwilodailo wrote:


I've discovered that some mail between two of my clients (on separate hosts)
is getting flagged as spam, because of this rule (FH_HOST_IN_ADDRARPA). I'm
not at all an expert with spamassassin, and I'm having some difficulty
finding what this rule is about and what to do about it.


Your reverse DNS lookup for the hostname resolves to a string containing
'in-addr.arpa'. This can be corrected by setting your reverse DNS zone to 
a real hostname for the IP. If you are not in control of the DNS, you may 
have to talk to your upstream provider.


If you are only doing this internally, and never send external mail from 
that host, you can just add a whitelist entry for that hostname.


-Charles


Re: SpamAssassin Integration

2010-06-16 Thread Charles Gregory

On Wed, 16 Jun 2010, Gnanam wrote:

I want to integrate SpamAssassin in my web-based application to test spam
score of the email content...


If this is your own custom web software, then it is as simple as adding a 
call to spamassassin (or spamc) in the same area of the script that 
validates things like the format of e-mail addresses. You can keep it 
simple and just report spamassassin's exit code, or you could parse the 
results from SA and pass them back to your user, so that they know what 
rules were triggered, and how to correct their e-mail.
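As a sketch of the "keep it simple" variant in Python: as I understand the spamc man page, `spamc -c` prints `score/threshold` and exits with status 1 when the message is spam. The function names below are made up, and the wrapper assumes spamc is on the PATH with a spamd it can reach.

```python
import subprocess

def parse_check(output):
    """Turn spamc -c output such as '24.4/6.3' into (score, threshold)."""
    score, threshold = output.strip().split("/")
    return float(score), float(threshold)

def check_message(raw_bytes):
    """Run the message through 'spamc -c'; return (is_spam, score, threshold)."""
    proc = subprocess.run(["spamc", "-c"], input=raw_bytes,
                          stdout=subprocess.PIPE)
    score, threshold = parse_check(proc.stdout.decode("ascii"))
    # Exit status 1 means spam; 0 means not spam (or that processing failed).
    return proc.returncode == 1, score, threshold

if __name__ == "__main__":
    print(parse_check("24.4/6.3\n"))
```

To tell the user which rules fired, the web application would have to request the full report and parse it instead of relying on the exit code alone.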


If your web interface is a pre-packaged piece of software, then it likely 
sends mail via your local SMTP server by calling 'sendmail' or an 
equivalent function that mimics that command. As long as the web client 
handles SMTP rejections and notifies users of problems sending, you 
should be able to run spamassassin normally in the context of your 
outgoing mail server.


- Charles


Re: More large spam....

2010-06-13 Thread Charles Gregory

On Sat, 12 Jun 2010, Karsten Bräckelmann wrote:

Please do not hijack a thread. Please do not hit Reply, if you do not
intend to reply and contribute to that thread. Removing all quoted text
and changing the Subject does *not* make it a new thread or post.
(Hint: In-Reply-To and References headers.)


(grumble grumble) Stupid mail programs (grumble grumble)
Yeah okay. Not so stupid. I'll comply.

Footnote: and I was refraining from commenting on another thread on how 
people 'complain' about features of SA that don't work in ways that match 
*their* style of thinking... Oh, the irony :)



Has there been any progress...

No changes since this has been asked the last time.


(nod) Alright. So far this is still a less than once a week phenomenon, 
for me personally. I just raise it occasionally to put a data point into 
the archives. If my inquiry had shaken loose a bunch of 'me too' comments, 
it might have led somewhere. But it hasn't, so the issue remains on the 
far back burner :)



There are just a very few rules scanning non-textual parts of a mail.
Large-ish binary attachments don't have much of an impact on
performance. Large-ish textual attachments potentially do.


Now THAT is a curious comment. All the usage guidelines I have ever read 
implied or outright stated that scanning mails over a certain size was a 
significant degradation to system performance. Am I confusing the 
guidelines for antivirus programs with those for SA? Would it be 'safe' to 
run SA on messages with larger attachments? Anyone ever tested this?


- C

Re: Set for Whitelist Only?

2010-06-13 Thread Charles Gregory

On Sat, 12 Jun 2010, andrewj wrote:
I am migrating to a new server with SpamAssassin. I have a well-known 
email address which is a common spam target, and I want to set it up so 
that only addresses on my whitelist are allowed, everything else is 
automatically blacklisted. How do I set this up?


Other advice on whitelisting aside, if your statement implies that you 
are starting to use spamassassin on mail that was previously unfiltered, 
you might want to see how much spam actually still arrives in that mailbox 
once SA is doing its job. I found that even some of my hardest hit 
mailboxes suddenly dropped down to a manageable 3-4 spams delivered per day 
when I got SA working on them.


- C


More large spam....

2010-06-12 Thread Charles Gregory


I got another 1MB spam today.

I still don't want to kill my system by attempting to scan every large 
mail that comes in.


Has there been any progress on an 'option' to scan only text portions of 
mail past a certain size limit and/or scan only the first X bytes? The 
former is preferable because it avoids any issues with incomplete mail, or 
text sections being last.


- Charles


Re: Performance problem body tests

2010-06-03 Thread Charles Gregory

On Thu, 3 Jun 2010, Helmut Schneider wrote:

I then started from scratch and tried with SA 3.2.5. The particular
body_tests take only 5 seconds (instead of 30).


As I mentioned before, I noticed this difference myself, and presumed it 
was just a characteristic of the 'improved' logic for deep-scanning the 
body of emails, and perhaps just a larger number of rules than before. 
Though I am still intrigued by your comment that this happens only on 
'some' e-mails, not all. Apologies if I missed a response, but was there 
any difference noticeable for the mails that process quicker?


- Charles


Re: Performance problem body tests

2010-06-03 Thread Charles Gregory

On Thu, 3 Jun 2010, Mark Martinec wrote:

Here is one common problem of 'certain mail messages'
taking a long time to process - unresolvable for now:
 https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5590


Sorry, but that bug has been around since 3.2.3 - it would not explain a 
sudden sixfold increase in processing time from 3.2.5 to 3.3.1.


- C


Re: [sa] Performance problem body tests

2010-06-02 Thread Charles Gregory

On Wed, 2 Jun 2010, Helmut Schneider wrote:

with certain mails on FreeBSD 8.0 and SA 3.3.1 I have a performance
problem:


What distinguishes 'certain mails'? Length? Content? MIME attachments?


So the body tests take ~ 30 of 37 seconds. It's not a load problem,


I noticed a significant increase in processing time when I upgraded from 
3.2 to 3.3, but it was pretty much for all messages.


You might want to raise the level of debugging so that you see the tests 
which did NOT match, so that you can truly assess how long each body rule 
takes to process. Your logs show 6 second 'gaps' but they may have just 
been filled with non-hitting rule tests.


- C


Re: SPF_HELO_PASS on a spam message?

2010-05-28 Thread Charles Gregory

On Fri, 28 May 2010, theTree wrote:

I received a spam email that scored zero on the SpamAssassin score. I think
it may be to do with the SPF_HELO_PASS that it scored - would someone be
able to give me some pointers?


I can't be certain with the munged headers, but it looks like
you are FORWARDING your mail internally from one server to another, and 
then doing an SPF check on the 'helo' between your two servers.


You might want to see if you can put SA on your gateway mail server. 
Otherwise, be sure that 'trusted_networks' is set properly, so that SA has 
a better chance of examining the received header from the first external 
connection.


- Charles


Re: Arabic Spam

2010-05-24 Thread Charles Gregory

On Mon, 24 May 2010, Jason Bertoch wrote:
A user reported the following FN to me which is written in an Arabic 
character set.  I have ok_locales en set, but I don't see any rules hitting 
that appear language related.  I also found the normalize_charset option, but 
don't know if it will help or hurt my ability to detect these messages. 
Ideas or thoughts?

http://pastebin.com/KtQSvZ5w


At a guess I would say the bulk of your score is attributed to the
URI in the body that has been flagged as being on the SURBL blocklist.

Beyond that, the issue seems to be that they have used a body 'type' of 
text/html without actually using HTML. So SpamAssassin is complaining about 
various aspects of the improper use of HTML... Though I can't see how it 
decided that a large font was in use.


The solution here seems to be a combination of getting rid of the 'bad' 
URI from the text and getting the sender to fix their (web based?) mail 
client so that all those HTML problems don't occur.


- C


Re: percentage off spam

2010-05-18 Thread Charles Gregory


I agree that full samples are needed.
The % Subject alone is not enough.
But I would expect there is something 'common' to the body
that would combine in a meta rule for decent score with minimal fp...

So throw some examples up on pastebin.

- C


Re: percentage off spam

2010-05-18 Thread Charles Gregory

On Tue, 18 May 2010, Kenneth Porter wrote:

 So throw some examples up on pastebin.

Here's some:
http://sewingwitch.com/ken/Stuff/foo.txt

I'm currently catching them with this:
header   KP_PERCENT Subject =~ /\b-?[78][0-9]%/
describe KP_PERCENT 70-89 percent in subject
score    KP_PERCENT 1.0


Given how high these spams score already, this will work quite well.
I also noticed that the 'view in a browser' line repeats consistently.
I see some hits on RBL's and URIBL's. Perhaps those should score just a 
little bit higher?
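
To see which subjects the quoted KP_PERCENT pattern would hit, the regex can be exercised outside SA. This is only a sketch in Python, whose `re` module handles this particular pattern the same way Perl does; the sample subjects and the helper name `hits` are made up for illustration.

```python
import re

# Same pattern as the quoted KP_PERCENT rule: 70-89 percent in the subject.
KP_PERCENT = re.compile(r"\b-?[78][0-9]%")

def hits(subject: str) -> bool:
    """Return True if the subject would trigger the pattern."""
    return KP_PERCENT.search(subject) is not None

for subject in ["Save 75% today", "Now -80% off", "50% off everything", "175% growth"]:
    print(f"{subject!r}: {hits(subject)}")
```

The `\b` is what keeps a three-digit value such as 175% from matching on its last two digits.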


- C


Re: [sa] Re: Custom rules - escape characters

2010-05-07 Thread Charles Gregory

On Fri, 7 May 2010, Daniel Lemke wrote:

Am I seeing ghosts or is this the third time you asked the same question on
this list? Your first mail was already replied so I suggest you have a look
there to get your answers.

Daniel


Oh, good, it's not my mail server acting up again! (smile)

To OP: SpamAssassin uses Perl regular expressions - see 'man perlre'

- C


Re: [sa] odd FPs

2010-05-05 Thread Charles Gregory

On Tue, 4 May 2010, Greg Troxel wrote:
Thanks - I did pretty much understand the tests.  What I'm boggled about 
is that they suddenly started firing, and then now suddenly do not.


This is perfectly consistent with the explanation I offered at the 
beginning of this thread. A legitimate Google MX was temporarily 
blacklisted. Given that it was hitting dialup lists, I would guess that 
maybe Google was (re)assigned an IP block that was previously dynamic.


- Charles


Re: Scanning Outbound emails

2010-05-05 Thread Charles Gregory

On Wed, 5 May 2010, Bernd Petrovitsch wrote:

Why shouldn't it be possible?
SpamAssassin doesn't care where the mail comes from


Well, actually, it DOES. The test DOS_DIRECT_TO_MX being an example.

Which brings me back to the slightly confused feeling that I still get 
over 'trusted_networks' (which is what the OP should specify so that his 
outbound clients do not trigger RBL rules) and internal networks.


In particular, I find these two paragraphs from Mail::SpamAssassin::Conf
to be contradictory:

Trusted relays that accept mail directly from dial-up connections
(i.e. are also performing a role of mail submission agents - MSA)
should not be listed in internal_networks. List them only in
trusted_networks.

If trusted_networks is set and internal_networks is not, the
value of trusted_networks will be used for this parameter.

So my mail server handles ALL mail, incoming and outgoing. According to 
the first paragraph, I should not list my mail server under 
'internal_networks' because it is an MSA. Because I have no other MTA to 
list as 'internal' I have NO setting for 'internal_networks'.


But according to the second paragraph, this makes my MSA 'default' to 
being an internal_network because its value is lifted from 
'trusted_networks'?

I don't think our dialup IP's are triggering the direct-to-mx rules, but 
that may only be because our dynamic IP's are not listed on the 
appropriate RBL's. So is the second paragraph *wrong* about the default 
usage? Or am I lucky? Should I specify a 'not' rule for internal networks, 
just to preserve the trusted-only status of my dialups?


- Charles




Re: Scanning Outbound emails

2010-05-05 Thread Charles Gregory

On Wed, 5 May 2010, Jari Fredriksson wrote:

There is one special group that will suffer from that decision: namely
SpamAssassin users within your network.
If they do report their spam to SpamCop using SpamAssassin's own report
mechanism, they are screwed


Why not just add a negative-scoring rule for mail sent to spamcop?
I have to do the same for mail from this list, to avoid FP'ing on every 
post that quotes a bit of spam :)


- C


Re: [sa] odd FPs

2010-05-04 Thread Charles Gregory


On Tue, 4 May 2010, Greg Troxel wrote:

I use spamass-milter and reject at about 8 points.  Normally this is
fine.  I just got a few false positives.
BAYES_40,DKIM_FORGED,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DOS_OUTLOOK_TO_MX,HELO_NO_DOMAIN,RCVD_IN_PBL,RCVD_IN_SORBS_DUL,RDNS_NONE,UNPARSEABLE_RELAY


Your list of 'matched' rules includes several DNS blacklists (PBL, SORBS).
Occasionally a gmail or earthlink server gets abused, and is temporarily 
blacklisted. If they don't fix it themselves, you may have to write to the 
blacklist maintainers and request removal of the IP.


- C


How many Froms?

2010-04-28 Thread Charles Gregory

Hiyo!

Occasionally I see an e-mail with multiple addresses on the 'From:' 
header. (not the envelope)


Can anyone think of legitimate uses for multiple From: addresses?
Or could I just use a rule like:

header From =~ /\...@.*\@/

- C


Re: [sa] Re: How many Froms?

2010-04-28 Thread Charles Gregory

On Wed, 28 Apr 2010, David B Funk wrote:

There's an easy fix for that FP, just use the 'From:addr =~ '
variant of the header rule. That ignores the comment part
of the 'From:' address and only examines the stuff inside
the 'b...@blah.blah' part.


Avoid FP, yes, but also avoid the live header that is triggering the rule, 
which was *not* formatted with 


I guess I'll just test for *3* '@'s
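
A rough sketch of the 'three @ signs' idea, written in Python only so that it can be run standalone. The assumption (not confirmed in the thread) is that a display name repeating the address accounts for at most two '@' characters, so three or more suggests more than one address. The pattern and the sample headers are illustrative, not a tested SA rule.

```python
import re

# Hypothetical pattern: at least three '@' characters anywhere in the header value.
THREE_ATS = re.compile(r"@.*@.*@")

def looks_multi_from(from_header: str) -> bool:
    """Return True if the From: value contains three or more '@' signs."""
    return THREE_ATS.search(from_header) is not None

samples = [
    "Bob <bob@example.com>",
    '"alice@example.com" <alice@example.com>',
    "a@x.example, b@y.example, c@z.example",
]
for value in samples:
    print(f"{value!r}: {looks_multi_from(value)}")
```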

- C


Re: Score overriding and behaviour

2010-04-27 Thread Charles Gregory

On Tue, 27 Apr 2010, Giampaolo Tomassoni wrote:

Also, why
body  __SOMMA   m'\Wsomma\W'i

doesn't fire? I have the Rule2XSBody plugin active. Maybe somehow it wasn't
compiled? But why, then?


Do ANY of the rules in your local.cf fire? Try putting a test rule that 
will 'always' fire (like 'header From =~ /\@/') at the end of local.cf, 
then if it doesn't fire, start moving it up, to see if you can home in on 
a line that is perhaps aborting further reading of local.cf


- C




Re: [sa] RE: Score overriding and behaviour

2010-04-27 Thread Charles Gregory

On Tue, 27 Apr 2010, Giampaolo Tomassoni wrote:

Do ANY of the rules in your local.cf fire?

Yes, they do. The __IN_ITALIAN rule referred by SOMMA and SOMMA2, in
example.


Just a side thought, but are we checking for SOMMA or SOMA? One 'm' or 
two? FRT_SOMA2


Try 'retyping' the __SOMMA rule without the m' 

body __SOMMA /\Wsomma\W/i

Also, look for a 'runaway' unclosed quote on a prior rule (though I would 
expect such a condition to barf error messages like crazy)


- C


Re: Whitelisting local domain (spamassassin qmail)

2010-04-26 Thread Charles Gregory

On Mon, 26 Apr 2010, Martin Caine wrote:

Received: from host[my_ip_address].in-addr.btopenworld.com (HELO
?192.168.32.10?) (mar...@[my_domain_dot_com]@[my_ip_address])
 by [our_servers_hostname].memset.net with SMTP; 26 Apr 2010 09:26:45 -


If 'my_ip_address' is truly 'internal' then you should be able to add it 
to 'trusted_networks'. But that allows *all* mail from that internal IP.


- C


Re: Whitelisting local domain (spamassassin qmail)

2010-04-26 Thread Charles Gregory


You used the phrase 'internal' to describe the IP from which you are 
sending your mail. If you are trying to send mail by connecting from an 
untrusted (external) dynamic IP address (including blackberries) then you 
need to use some form of SMTP authentication on the connection to verify 
that the mail is really legitimate mail from your domain. In which case


If your MSA properly inserts the auth information into the 
headers, SpamAssassin should react appropriately.


- Charles


On Mon, 26 Apr 2010, Martin Caine wrote:

Thanks for the reply. Unfortunately where I put my ip it's actually showing
the IP I have here at work, it's the IP assigned for our internet connection
in the office and is dynamic (and even if it was static, whitelisting it
would only fix the problem if we were emailing from the office and wouldn't
whitelist emails sent from blackberries, iphones and other locations).



Re: [sa] Re: Match returned message headers on any NDR

2010-04-15 Thread Charles Gregory

On Wed, 14 Apr 2010, Kris Deugau wrote:
 I have yet to figure out why people think it's a good idea to relay 
mail from your domain host to your ISP account (especially when the two are 
different companies)


Do not mistake the following statement for any form of approval :)

To many of our users, Outlook Express et al. are mysterious black boxes 
into which they have 'entered their user information' following the 
'instructions' provided by their ISP. They are completely unaware that the 
mail client can handle more than one address/account. Or they may be dimly 
aware of the capability, but feel seriously overwhelmed by the options and 
accounts screens.


But forwarding? That is simple in concept. You fill in ONE form with your 
ISP and it is done. AND the ISP will provide help and support for using 
the form to setup forwarding. Most ISP's tend to shove off the onerous 
task of teaching their users how to use Windows.


So given the choice between filling in a form and 'using my mail the way I 
always have' and 'what if I do something wrong, mess up my mail and my ISP 
won't help me?', well, guess what they are going to choose?
Again, don't confuse this for approval of any sort. For my part, I try my 
best to help users make intelligent use of their software. :)


Personally, I *hate* forwarding because too many 'big players' setup 
'reputation based' filtering strategies. So for every False Negative that 
I forward, there is one more chance that some dimwit user will click the 
button that says this is spam and lower *our* reputation. (sigh)


/RANT :)

- Charles


Re: skipping dynamic tests for ISP's own dynamic networks?

2010-04-15 Thread Charles Gregory

On Thu, 15 Apr 2010, Royce Williams wrote:

I will also file a bug to suggest updates to the *_networks language
that is in direct contradiction to the advice in other parts of this
thread.


One thing I might add: It seemed to me that at certain points in the 
discussion there was confusion as to whether the status of the mail server 
running spamassassin was influenced by the msa_networks setting. I believe 
some language in the docs would be appropriate to distinguish any 
'*_networks' settings that would be appropriate for the server *running* 
SA.


I.e. the language about 'msa_networks' makes sense when you realize it is 
about 'trusting' *another* server on your network that has handled the 
dynamic IP's, as opposed to having SA simply 'trust' (as in 
'trusted_networks') the dynamic IP's from which it directly receives 
outgoing mail.


- C


Re: FROM_STARTS_WITH_NUMS matches on text-to-email

2010-04-13 Thread Charles Gregory

On Mon, 12 Apr 2010, Ted Mittelstaedt wrote:

Seriously, you shouldn't be asking that question.  The fundamental flaw
here is in the assumption that an all-number mailbox user ID is virtually 
certain to be spam.  It is not.  Clearly, the default score assignment to 
that rule is too high.


Well, firstly, the rule name says STARTS with nums. That would imply 
that the original condition for which this rule was created was NOT an 
'all numeric' user part, but perhaps some 'jumble' of characters that 
merely *starts* with numbers.


I would PROPOSE (to those with a nice testing rig) that the rule be 
modified so that there has to be at least one non-numeric character after 
the first 6 digits, i.e. /^\d{6,}\S*[^\d\s]\S*@/


This will reduce the 'hits' on phone numbers, while possibly still hitting 
the 'bad' usernames that it was intended to hit?
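
The effect of the proposed pattern can be checked against a few addresses. The sketch below uses Python's `re`, which treats this pattern like Perl does, and tests it against the bare address only (an assumption: in SA the `^` would anchor to whatever header text the rule is actually given). The sample addresses are invented.

```python
import re

# Proposed pattern: six or more leading digits, then at least one
# non-digit, non-space character somewhere before the '@'.
PROPOSED = re.compile(r"^\d{6,}\S*[^\d\s]\S*@")

def hits(address: str) -> bool:
    """Return True if the address would trigger the proposed pattern."""
    return PROPOSED.search(address) is not None

for addr in ["5551234567@txt.example.net",   # all-numeric phone gateway
             "123456abc@example.com",        # digits followed by letters
             "12345abc@example.com"]:        # only five leading digits
    print(f"{addr}: {hits(addr)}")
```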


- C


Re: FROM_STARTS_WITH_NUMS matches on text-to-email

2010-04-13 Thread Charles Gregory

On Tue, 13 Apr 2010, Martin Gregorie wrote:

header FROM_STARTS_WITH_NUMS From =~ /\d{6,}[a-z._-][a-z0-9._-]{0,50}@/i


This regex requires that the 7th character be non-numeric.
Look at the regex I posted. It covers all cases with six leading 
digits that are not purely numeric addresses.


/^\d{6,}\S*[^\s\d]\S*@/

As an aside, let's not forget that the high score that is causing concern 
is only used when there is no bayes and no network testing


- Charles


Re: [sa] Re: FROM_STARTS_WITH_NUMS matches on text-to-email

2010-04-13 Thread Charles Gregory

On Tue, 13 Apr 2010, Martin Gregorie wrote:

header FROM_STARTS_WITH_NUMS From =~ /\d{6,}[a-z._-][a-z0-9._-]{0,50}@/i

This regex requires that the 7th character be non-numeric.



Nope - only that a character after the first six is a legal address
character but non-numeric.


Hmmm My bad.
I forgot that the '{6,}' would match more than 6 digits...
Silly me. :-}
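
For anyone who wants to confirm the point, here is a quick Python check of the quoted rule's pattern (a sketch only; the addresses are invented). Because `\d{6,}` can absorb any number of digits, the first non-numeric character does not have to be the 7th.

```python
import re

# Pattern from the quoted FROM_STARTS_WITH_NUMS rule, with the /i flag.
QUOTED = re.compile(r"\d{6,}[a-z._-][a-z0-9._-]{0,50}@", re.I)

def hits(address: str) -> bool:
    """Return True if the quoted pattern matches the address."""
    return QUOTED.search(address) is not None

for addr in ["123456x@example.com",       # non-digit is the 7th character
             "12345678x@example.com",     # non-digit is the 9th character
             "5551234567@example.com"]:   # purely numeric user part
    print(f"{addr}: {hits(addr)}")
```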

- C


Re: CLAMAV 0.95 to be disabled

2010-04-09 Thread Charles Gregory


Realize this is OT, and that even the instigation is OT :)
But I'm hoping someone here just KNOWS 'rpm'. and can help...
(Or can point me to the best forum for a quick answer)

While attempting to use rpm on RH9 to update to a newer set of clamav 
packages, the rpm process locked up, and I had to kill it, and now rpm 
does not seem to be working at all


I'm currently trying 'rpm --rebuilddb' but it's just sitting there, and 
I've got a feeling it has locked-up too


- C


Re: CLAMAV 0.95 to be disabled

2010-04-09 Thread Charles Gregory


OT - RPM

On Fri, 9 Apr 2010, Daniel McDonald wrote:

I'm currently trying 'rpm --rebuilddb' but it's just sitting there, and
I've got a feeling it has locked-up too

You've got to delete the __db.* files in /var/lib/rpm before you run
--rebuilddb


I'm trying that now, but don't have much hope. None of the db files
were modified since 2007. So I suspect the corruption is in one of the 
other files :(


- C


Re: [sa] Re: CLAMAV 0.95 to be disabled

2010-04-09 Thread Charles Gregory

On Fri, 9 Apr 2010, Daniel McDonald wrote:

You've got to delete the __db.* files in /var/lib/rpm before you run
--rebuilddb


That worked. Thanks! (wiping brow with relief)

- C




Re: Domain specific configuration files??

2010-04-07 Thread Charles Gregory

Rajesh M wrote:

if your standard score is say : 5.0
you can write a header rule to allocate a positive or negative score if
the to field contains the specific domain

example
required_score 5
header header1 To =~ /example1\.com/i
score header1 -1


Your rule would not work with Bcc mail (for example, mail from this list).

You might get the desired result by using a 'Received:' or 'Delivered-To:' 
header. This will vary depending on MTA, so examine your own mail and 
test for consistent performance.
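
To illustrate why the To: header is not enough, here is a small Python sketch that parses a made-up list message and compares the two headers. It assumes the MTA adds a 'Delivered-To:' header carrying the real recipient, which is not true of every MTA; the message, the addresses and the function names are all hypothetical.

```python
import re
from email import message_from_string

# Hypothetical recipient domain, taken from the quoted example rule.
DOMAIN_RE = re.compile(r"@example1\.com\b", re.I)

RAW = (
    "Delivered-To: user@example1.com\n"
    "From: someone@lists.example.org\n"
    "To: users@lists.example.org\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

def header_hits(raw: str, name: str) -> bool:
    """Return True if any instance of the named header mentions the domain."""
    msg = message_from_string(raw)
    return any(DOMAIN_RE.search(value) for value in msg.get_all(name, []))

print("To:", header_hits(RAW, "To"))
print("Delivered-To:", header_hits(RAW, "Delivered-To"))
```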


- C


Re: [sa] Re: Confused about how to use sa-update

2010-04-01 Thread Charles Gregory

On Thu, 1 Apr 2010, Phill Edwards wrote:

actually posting to the right place! Is this the official spamassassin
mailing list?


Your own spam filter might be eating a lot of the messages?
Try setting a rule to score -100 on mail received from apache.org...

- C


Re: Limit SA to scan messages 100k and below

2010-03-31 Thread Charles Gregory

On Wed, 31 Mar 2010, Keith De Souza wrote:

Sorry as I'm new to SA can you elaborate on what you mean by glue?


Geek terminology for the program, script or other mechanism that 
'connects' your MTA and your SA. Ie. The calling MTA or its script must do 
the size check, then decide *whether* to call SA
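
As a very rough illustration of what such glue does, here is a Python sketch. Everything in it is hypothetical: the 100 KB limit comes from the subject of this thread, and `scan` stands in for whatever mechanism actually hands the message to SA and returns the marked-up result.

```python
MAX_SCAN_BYTES = 100 * 1024  # hypothetical limit: scan only messages up to 100 KB

def should_scan(message: bytes, limit: int = MAX_SCAN_BYTES) -> bool:
    """Decide whether the message is small enough to be worth scanning."""
    return len(message) <= limit

def filter_message(message: bytes, scan, limit: int = MAX_SCAN_BYTES) -> bytes:
    """Pass small messages through `scan`; return large ones untouched.

    `scan` is any callable that takes the raw message and returns the
    (possibly marked-up) message.
    """
    if should_scan(message, limit):
        return scan(message)
    return message

def fake_scan(message: bytes) -> bytes:
    """Stand-in scanner that just adds a header."""
    return b"X-Spam-Checked: yes\n" + message

small = b"Subject: hi\n\nhello\n"
large = b"x" * (200 * 1024)
print(filter_message(small, fake_scan).startswith(b"X-Spam-Checked"))
print(filter_message(large, fake_scan) == large)
```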



I'm trying to understand why is it taking 300.0 seconds to scan a message
only 24Kb in size??


1) Server is overloaded. Your load only has to go 10-20% over your 
system's 'maximum capacity' to cause processing times to jump from 20 
seconds up to five minutes or more


2) Something that SA relies upon, like your DNS server, is taking way 
too long to do its job. Check that your DNS has a reasonable timeout 
value. Otherwise it could be waiting for a non-existent domain

This would be the case if the problem occurs for certain addresses,
or more often on spam (which comes from 'unknown' systems) than on 
legitimate mail


3) There may be a 'locking' issue with any databases (Bayes?) that SA 
uses. Again, this may only become a problem under heavy load, with too 
many concurrent SA processes



My thoughts so far are to perhaps reduce the file size that SA takes to scan
and see if the scan time reduces.


It is a better idea to try and reduce the number of emails that SA will 
process at the same time.


- C


Re: Scanning large-body spam

2010-03-31 Thread Charles Gregory

On Wed, 31 Mar 2010, Henrik K wrote:

SA 3.3 has special handling for truncated messages


Excuse me for not *thinking* earlier, but it occurs to me that there is a 
very big drawback to *truncating* a message before passing it to SA, as 
opposed to my original request/suggestion to *flag* (or set a config 
param?) to tell SA to *ignore* parts of a message past a certain size.


I believe it is fairly common practice for MTA's to expect SA to return 
the *entire* message, complete with X-Spam header 'markup', from SA's 
standard output stream. This is particularly important where mail 
classified as *slightly* spammy is delivered to a special spam folder 
based upon the headers added by SA. Or on a system where all mail tagged 
as spam is quarantined. Having SA's markup/explanations is critical to 
analysing false positives/negatives.


So SA needs to read and write the *entire* message, but then be given a 
parameter to keep it from thrashing over the really large ones.


- Charles


Re: Scanning large-body spam

2010-03-31 Thread Charles Gregory

On Wed, 31 Mar 2010, Mark Martinec wrote:
 and let it handle arbitrary size messages by avoiding its current 
paradigm of keeping the entire message in memory.


Is there really a problem with the in-memory size? I would have thought 
the major concern was the processing time for evaluating 'full' (and 
rawbody?) rules on a large message



Anyway, the amavisd glue to SpamAssassin does just that: let SpamAssassin
see only the first 400 kB (configurable) of a large message, then edit
the original message based on results obtained from SpamAssassin.


Good for amavis-d, but not for those of us relying on SA to do the whole 
job, and not have our MTA's perform any further message modification


I would be interested in having some of the developers offer an opinion on 
this. Where is the real 'cost' in running SA against a large message? Is 
it just the memory used? Or is it, as I suspect, the use of 'full' rules?


- Charles


Re: Mega-Spam

2010-03-30 Thread Charles Gregory


(Subject line changed to remove the 'flag' to developers)

On Mon, 29 Mar 2010, Karsten Bräckelmann wrote:

.. But then again, this is a topic for the dev
list [1] to start a discussion, not here.


Uh, no, I'm not a developer. And the description of that list
specifically says...

For those involved in the development effort to discuss their work on
 the project. Unless you are working on a patch to SpamAssassin, this
 is probably not a list you need to use. If you're not already on the
 general users list, you should probably go there first...

THIS is the list for users to ask questions *and* make suggestions, and 
has been used this way many times in the past. And the list description of 
this list says (and personal experience has proven) that developers 
monitor this list as time permits.


So with respect, please stop telling people to clutter up a working list 
with (possibly) dumb ideas. And this one was mine, so I can call it dumb 
if I want to. I'm thinking it's not, but open to the idea it may be. :)


As a side note, I had not intended to open a discussion, but merely drop 
my suggestion in the 'suggestion box', but I am glad I posted here, 
because the discussion among users has clarified the extent of the 
problem. :)



[1] Also note your very own Subject.


Intended to grab the attention of time-constrained developers, though 
honestly I am regretting it now, because I don't like having that 'all 
caps' flag attached to a discussion. I've removed it from this post


- Charles

ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Charles Gregory


Literally, Mega-Spam. I just got a spam with 1MB of images.

My suggestion has been made before, but I would like to ask that it now 
be taken a bit more seriously. SA needs an option to allow efficient
'partial' scanning of large e-mails, so that, for example, we can 
perform all the valuable header checks, and maybe even scan for URIBL hits 
within the first few hundred K of the body?


Is it possible (and easy!) to set a flag that tells SA to stop testing 
against the body when it reaches a certain byte count? Or perhaps, if 
I understand the docs correctly, most rules only trigger on textual 
message parts anyway, so by simply disabling 'full' rules and possibly
'rawbody', we could get the desired result without too much of a 
processing hit?


- C


Re: ATTN DEVELOPERS: Mega-Spam

2010-03-29 Thread Charles Gregory

On Mon, 29 Mar 2010, Karsten Bräckelmann wrote:

You did read the entire thread, right? :)  There's nothing new about
this. Moreover, this still is a rare occurrence. Note even Charles, who
started this thread, claims to have received *one* such spam. And it
appears to be his first. ;)


Last September the number of spams exceeding 256KB became frequent enough 
that I bumped up my limit. Now I'm starting to see spams past the new 
limit (400KB). But when they jump up to 1MB, maybe it's time for a 
different solution, and maybe regain some of system efficiency by adding 
the suggested mechanism to SA and only doing significant body scans on 
messages less than 256KB again :)



Now, if this starts to become a more general pattern...


The spams I've seen so far look more 'amateur' than 'pro'. Easily traceable 
IP's. Blacklistable domains. I'm just throwing my idea into the queue now 
so that it can be smoothly integrated with a future release. We've got 
plenty of time, but I suggest not waiting until it becomes a big problem 
before desperately rushing to fix it :)


My 0.02 dollars

- C

razor default in SA 3.3.1?

2010-03-25 Thread Charles Gregory

Hallo!

Follow-up on SA 3.3.1 upgrade yesterday

My system changes log reported the addition of several files
named .razor/... which brought to my attention that 'RAZOR2' tests
are now enabled by default in SA 3.3.1 

Is there anything that I should be concerned about? It seems to be 
functioning well, and I like the stats for the rules on rulesqa :)


- Charles


add_header + report_safe 0 positioning in 3.3.1

2010-03-25 Thread Charles Gregory


In case anyone else uses a script to scan the SA injected message headers 
to build log records (to detail matched tests, etc), and that script cares 
about the *order* of the headers, then please take note that in 3.3.1 the 
position of the 'report_safe 0' command in your .cf files relative to the 
add_header command(s) determines the position in which X-Spam-Report will 
appear in the headers, relative to the others.

This is a minor difference from 3.2.5 - strictly speaking it gives 3.3.1
superior behaviour, with more control/flexibility. So no complaints. :)

Just wanted to mention this in case anyone else notes anomalies in 
their custom logging


- Charles



Re: razor default in SA 3.3.1?

2010-03-25 Thread Charles Gregory

On Thu, 25 Mar 2010, Michael Scheidell wrote:

(you using the freebsd SA port?)


CentOS 4 (RHEL 4) rpm from rpmforge

- C


Re: WARNING CENTOS USERS! BEWARE AUTO YUM INSTALL OF 3.3.1!

2010-03-25 Thread Charles Gregory

On Thu, 25 Mar 2010, fakessh wrote:

I have different problems with latest spamassassin from rpmforge. it does
not start


Did you run sa-update as per my warning?

- C


WARNING CENTOS USERS! BEWARE AUTO YUM INSTALL OF 3.3.1!

2010-03-24 Thread Charles Gregory


Had a nice HEART-STOPPING moment this morning! Logged in and
found my mailbox had no new mail! WTF!??

Checked the logs and discovered that my nightly automatic updates via YUM 
had pulled in the new SA 3.3.1-3.


WARNING: Centos does NOT run the required sa-update to get all the files
into shape to run with the new SA engine! SA will ERROR.

In my particular case, it turns out the 'Mail Avenger' MTA doesn't handle 
the error condition the way I expected and was dropping the mails on the 
floor! OUCH! :(


Fortunately I have been reading all the posts about 3.3.1 with a view to 
installing it as soon as I was sure there were no major bugs. So I knew 
what to try first, and thankfully, yes, it was as simple as running 
sa-update. Mail is flowing again! Yay!


But if anyone is running CentOS and runs yum manually, be warned that SA 
3.3.1 will come in on the next update and you will have to run sa-update 
manually as soon as it is installed.


- Charles, HWCN


Re: [sa] correction: was: WARNING CENTOS USERS! BEWARE AUTO YUM INSTALL OF 3.3.1!

2010-03-24 Thread Charles Gregory

On Wed, 24 Mar 2010, R P Herrold wrote:

 WARNING: Centos does NOT run the required sa-update to get all the files
 into shape to run with the new SA engine! SA will ERROR.
rather: ... some third-party repository packagings, oriented to be used on 
CentOS, do not ...


Correct.
My warning more specifically applies to RPMFORGE rpm of SA-3.3.1-3...

The CentOS provided packages are fine -- the independent packager 
aftermarket has the unexpected behaviour


(nod)

Thanks for the clarification.

- C


Re: [sa] Re: Yahoo/URL spam

2010-03-23 Thread Charles Gregory

On Tue, 23 Mar 2010, Alex wrote:

This is what I have:
/^[^a-z]{0,10}(http:\/\/|www\.)(\w+\.)+(com|net|org|biz|cn|ru)\/?[^ ]{0,20}[a-z]{0,10}$/msi


My bad. I got an option wrong. Please remove the 'm' above.
I always get it backwards. According to 'man perlre' (the definitive 
resource for SA regexes!) the 'm' makes '^' match every newline!

We want it to only match the beginning of the body.

So just remove it, and, as noted by others, add the '^' that was 
missing... like so


... ]{0,20}[^a-z]{0,10}$/si
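
The difference the 'm' flag makes can be seen with a quick experiment. This sketch uses Python's `re` (its M, S and I flags behave like Perl's /m, /s and /i for this pattern); the two sample bodies are invented, and it uses the corrected pattern with the '^' restored.

```python
import re

# Corrected pattern: a body consisting of nothing but a single URI.
PATTERN = (r"^[^a-z]{0,10}(http://|www\.)(\w+\.)+(com|net|org|biz|cn|ru)"
           r"/?[^ ]{0,20}[^a-z]{0,10}$")

with_m = re.compile(PATTERN, re.M | re.S | re.I)   # like /msi
without_m = re.compile(PATTERN, re.S | re.I)       # like /si

uri_only = "\nhttp://www.example.com/page\n\n"
long_body = ("hi there,\n"
             "this is an ordinary message with plenty of text in it.\n"
             "http://www.example.com/page\n"
             "see you next week\n")

# With 'm', '^' and '$' match at every line, so a URI on a line of its own
# inside a long message still hits. Without it, only the URI-only body hits.
print("with m, long body:   ", with_m.search(long_body) is not None)
print("without m, long body:", without_m.search(long_body) is not None)
print("without m, URI only: ", without_m.search(uri_only) is not None)
```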

- Charles


Re: Yahoo/URL spam

2010-03-22 Thread Charles Gregory

On Mon, 22 Mar 2010, Alex wrote:

rawbody __BODY_ONLY_URI /^[^a-z]{0,10}(http:\/\/|www\.)(\w+\.)+(com|net|org|biz|cn|ru)\/?[^ ]{0,20}[^a-z]{0,10}$/msi
This allows for some amount (up to ten chars?) of text before and
after the URI if I'm reading that right, correct?


Nope. With the /ms flags ^ and $ at beginning and end match the *whole* 
body as a single 'string' and permit 'any character' (. or [^x]) matches 
to also match newlines. So the above regex translates to:


/^ - Beginning of body
[^a-z]{0,10} - match 0-10 non-alpha characters *including* newlines
(http:\/\/|www\.) - match a uri beginning with http *or* www
(\w+\.)+ - match multiple occurrences of word followed by .
(this will match 'domain.' *or* 'www.domain.')
(com|net|biz|org|cn|ru) - match TLD (adjust to fit your mail)
\/? - match a slash if there is one
[^ ]{0,20} - match 0-20 non-blank characters (page name, if given)
[^a-z]{0,10} - match 0-10 non-alpha chars including newlines
 (did I TYPO in my OP and leave out the '^'?)
$ - match end of body
/msi


Is it possible to determine the beginning of the line with a body rule?


Insert '\n' into the above regex where you want to match newline.

I didn't think that was possible. I believe this is also what this is 
trying to do?


It's possible, but NOT what this regex does. Essentially this regex 
matches against a complete body that consists of nothing more than a 
single URI on a line, with possible blank lines before or after.
Rather than test for newlines, I test for non-alpha so that a stray space 
or tab or LF code does not fail to match.


This simple regex can also be 'dressed up' with elements of the form
(\[^\\]+\ +)+ to match any HTML code inserted before or after the 
URI. A regex could also check for a link consisting of text 
enclosed by <a href=...> ... </a>


The key is to be sure that you don't use '*' or '+' in any context where 
it could 'run away' and try to match large message bodies. This way, as 
soon as the body exceeds 40 characters on either side of an unbroken 
string of characters it stops the test. Relatively efficient for a rawbody 
test.

- C


Re: Yahoo/URL spam

2010-03-19 Thread Charles Gregory

On Thu, 18 Mar 2010, Ned Slider wrote:
If that's not an option, how about a meta rule for FROM_YAHOO and 
__HAS_ANY_URI (this rule exists in SA).


Lots of ham may contain a URI, but how much ham contains ONLY a URI?

Rough outline of rule, untested.

rawbody  __BODY_ONLY_URI /^[^a-z]{0,10}(http:\/\/|www\.)(\w+\.)+(com|net|org|biz|cn|ru)\/?[^ ]{0,20}[a-z]{0,10}$/msi

Combine that with 'frequent abusers' like Yahoo, and you've got something 
you can give a few points


There will probably need to be a variant on this to account for HTML mail 
and/or the 'standard' footers inserted by free mail agents. Which 
incidentally, surprises me here. I thought Yahoo always added a tagline?


- C


Re: Hijacked thread :) (was: ruleset for German...)

2010-03-16 Thread Charles Gregory

On Mon, 15 Mar 2010, Karsten Bräckelmann wrote:

The TextCat plugin. Even part of stock SA, though not enabled by
default. Supports per-user settings.


(nod) For reasons specific to my MTA, I can't run SA 'per user', but I can 
choose the most common languages (en fr) in our system's mail and flag 
when neither of them are used (assigning UNWANTED_LANGUAGE_BODY a minimal 
score) - then the user can set a procmail delivery rule (quarantine when 
that rule is present in the X-Spam headers). It will do. :)

But you just forked (to avoid the word hijacked) this thread, which is 
about a very specific, on-going spam run. The OP really doesn't want to 
identify German spam for scoring, cause that's likely his first 
language. ;)


My bad. :)

But my compliments on the OP's excellent English! :)

- C

Re: [sa] Re: ruleset for German Bettchen and Schlafzimmer spam

2010-03-15 Thread Charles Gregory

On Sun, 14 Mar 2010, Jörg Frings-Fürst wrote:

take a look at http://wiki.apache.org/spamassassin/CustomRulesets
and search to German Language Ruleset.


H. I guess this goes back to my inquiry about the Brazilian spam

I'm still looking for a way (hopefully) to simply identify the *language* 
of the mail (when not determined from CHARSET_FARAWAY rules), so that our 
users may opt-in for additional filtering based on language


- Charles

Re: [sa] Re: Bogus mails from hijacked accounts

2010-03-12 Thread Charles Gregory

On Fri, 12 Mar 2010, Dennis B. Hopp wrote:

describe FORGED_YAHOO Yahoo with non-Yahoo Reply-to address
header   __FORGED_YH1 From =~ /\...@yahoo\.com/i
header   __FORGED_YH2 Reply-to =~ /\...@yahoo\.com/i
meta     FORGED_YAHOO (__FORGED_YH1 && !__FORGED_YH2)


The problem with this is that the !__FORGED_YH2 matches
when there is *NO* Reply-To header at all!

You need something like this:

header __FORGED_YH2 Reply-To =~ /\@([^y]|y[^a]|ya[^h]|yah[^o])/i
meta FORGED_YAHOO (__FORGED_YH1 && __FORGED_YH2)

(remove the negation from the meta)
This directly tests for an existing Reply-To specifically to a domain
that does not begin with 'yaho'.
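
The logic can be checked outside SA with a small Python sketch. The Reply-To pattern is the one given above; the From pattern `@yahoo\.com` is an assumption (the original is obscured in the quoted rule), and the function name and sample addresses are invented. A missing Reply-To is modelled as `None`.

```python
import re

FROM_YAHOO = re.compile(r"@yahoo\.com", re.I)  # assumed From: test
# Reply-To domain that does NOT begin with 'yaho'.
REPLY_NOT_YAHO = re.compile(r"@([^y]|y[^a]|ya[^h]|yah[^o])", re.I)

def forged_yahoo(from_hdr: str, reply_to) -> bool:
    """Return True only when From is Yahoo and an existing Reply-To is not."""
    if reply_to is None:          # no Reply-To header at all: never fire
        return False
    return (FROM_YAHOO.search(from_hdr) is not None
            and REPLY_NOT_YAHO.search(reply_to) is not None)

print(forged_yahoo("someone@yahoo.com", None))
print(forged_yahoo("someone@yahoo.com", "someone@yahoo.com"))
print(forged_yahoo("someone@yahoo.com", "other@example.net"))
```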

However, keep in mind that the headers for *this* mailing list would 
trigger your rule. So you will also need to meta this with a rule that 
tests for yahoo mail server being the sending SMTP client


Gets tricky, doesn't it?

- C



Re: SMTP REJECT after DATA (was: SpamAssassin Milter Plugin...)

2010-03-10 Thread Charles Gregory

On Wed, 10 Mar 2010, R-Elists wrote:

Charles Gregory Quote:Re: [sa] Re: SMTP REJECT after DATA
The only efficiency to be gained is to reject as much as possible after the
RCPT_TO, before accepting DATA. But for systems like mine, with lousy user
cooperation, rejecting some of the mail after DATA is still the best
option.

i would say you are arguing both sides and that it might be the issue.


I'm arguing that with such a strong component of YMMV there is NO side 
in this debate that is so woefully wrong as to be labelled 'misguided', 
which is what I was responding to in my first post in this thread.



i would tend to believe that most have made the choice not to straddle the
fence


I made my own choice, as outlined above, but 'sit on the fence' with 
regard to my opinion on 'best practice' or 'misguided decisions', because 
I don't believe there really is any one 'good' or 'bad' decision (except 
maybe the decision to backscatter, but we all agree that is 'bad').



are you blaming the users for your administration?  ;-)


Naturally. All good administration is customer driven. |-D

- C


Re: [sa] Inconsistent Application of Rules?

2010-03-10 Thread Charles Gregory

On Wed, 10 Mar 2010, Stephen Carville wrote:

I've been seeing several emails lately that are being scored low that,
from what I know of the SA rules should be scored higher.  A recent
example was a typical spam message:
FROM_STARTS_WITH_NUMS,RCVD_IN_DNSWL_LOW,URIBL_AB_SURBL,URIBL_JP_SURBL,
URIBL_OB_SURBL,URIBL_SC_SURBL,URIBL_WS_SURBL autolearn=no
The second message invoked a larger number of body check rules than
the first but I don't understand why.  Is that normal or do I have
something configured incorrectly?


The extra rules are all 'SURBL' blocklist tests which check the embedded 
URI against internet blocklists. It is not uncommon for the first few 
spams using a new URI to get through before the blocklists are updated.
By the time you reran your tests, they had been updated, and so it scored 
higher.
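
To make the timing effect concrete, here is a rough sketch in Python. It assumes the usual DNS blocklist convention of appending the list's zone to the URI's domain; the zone name, the flat per-hit score, and both function names are illustrative assumptions, not SA's actual scoring.

```python
def surbl_query_name(domain, zone="multi.surbl.org"):
    """Build the DNS name an SURBL-style lookup would query for a URI's domain."""
    return domain.strip(".").lower() + "." + zone

def score_uris(uri_domains, listed_now, points_per_hit=1.0):
    """Sum a flat, made-up score for each URI domain present in the current list snapshot."""
    return sum(points_per_hit for d in uri_domains if d.lower() in listed_now)

if __name__ == "__main__":
    uris = ["spamvertised.example"]
    print(surbl_query_name(uris[0]))
    # First scan: the domain is brand new, so no list knows it yet.
    print(score_uris(uris, listed_now=set()))
    # Rescan later: the lists have caught up, so the same mail scores higher.
    print(score_uris(uris, listed_now={"spamvertised.example"}))
```

Same message, same rules, different answer, simply because the list contents changed between the two scans.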


- C



Re: [sa] Re: End of Thread [Was: [Emerging-Sigs] SIG: SpamAssassin Milter Plugin Remote Arbitrary Command Injection Attempt]

2010-03-09 Thread Charles Gregory

On Tue, 9 Mar 2010, Ned Slider wrote:
It's clear you either haven't read or haven't understood what Kai wrote, 
which btw was spot on.


More attitude. Yeesh. Kai has an opinion. And in fairness, I give his 
arguments some serious weight. It's not black-n-white. But this attitude 
that he/you have the 'best' solution is just... yeah... YAWN.



End of Thread.


Hope so.


Re: [sa] Re: [Emerging-Sigs] SIG: SpamAssassin Milter Plugin Remote Arbitrary Command Injection Attempt

2010-03-09 Thread Charles Gregory

On Tue, 9 Mar 2010, Brian wrote:

I'm happy to stay on the Postfix 'merry-go-round' for an answer, or we
can just agree Postfix can't easily do this and move on and stop
flogging this dead horse :-)


I use Mail Avenger for a front-end SMTP... Says it all.

- Charles


SMTP REJECT after DATA (was: SpamAssassin Milter Plugin...)

2010-03-09 Thread Charles Gregory

On Tue, 9 Mar 2010, Kai Schaetzl wrote:

Second: you are completely misguided in your wish to reject mail after
SMTP data stage.


You may certainly argue for YOUR preference (and I emphasise *preference*)
for the most 'efficient' way to run an SMTP server, but there is nothing 
sufficiently 'wrong' with rejecting mail after DATA that you can use the 
term 'misguided'. All this term implies is your attitude


Apart from this, you make some nice arguments, but again, you seem to 
have a bias that weighs them too heavily.


It does not make any sense to process a complete message and then 
reject it.


If this were true, no one would have added 'header' and 'body' checks to 
the postfix configuration and no one would have been jumping through 
hoops to find ways to integrate SA into the front end of MTA's


Indeed, it makes far LESS sense to have a system accept mail but send it 
to a spam folder. That practice leaves the sender with the mistaken 
impression that their mail was successfully delivered. And argue as you 
will, there is simply no way to get a broad user base to adopt the habit 
of reviewing a spam folder. I mean the whole point of filtering is that 
the user no longer has to sift through a pile of junk, right?


Processing a message takes CPU power and precious SMTP time. Doing that 
at SMTP stage means you cannot take in as much mail as you could. It 
also means that the sending MTA cannot send as much mail as it could.


Think about that statement twice. It IS correct, but it is an argument FOR 
processing mail at SMTP time. A legitimate outbound SMTP server is *never* 
as busy as an incoming mail server. So a legitimate server will not 
suffer *any* penalty from my system introducing a 5-6 second delay into 
the SMTP transaction. But a spammer's zombie is trying to pump out mail as 
fast as it can. The spambot will be slowed down. That is a GOOD thing. 
Yes? :)



There are other reasons not to do this, for instance legal ones.


Again, you are quoting arguments that favor SMTP reject. It is better to 
reject a mail, so that legitimate senders know it, rather than have them 
believe it was delivered when it was sent into a spam folder, perhaps 
suffer consequences and then sue the recipient. Sure, OUR butts will be 
covered by our user agreements, but only if we have jumped through hoops 
so that the user cannot claim they did not know about their spam 
folder. But in the real world, even if we don't get sued, we get a lot of 
people complaining that they didn't know about the optional spam folder 
on our system that the user turned ON themselves! Now we use a spam 
folder for 'borderline' spams that score 5-10. The rest get rejected at 
SMTP time. But still I get these occasional complaints... It's just the 
way users are. LOL
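
The policy described above boils down to a three-way decision on the SA score. A minimal sketch in Python, assuming that exactly 5 counts as borderline and that only scores above 10 are rejected (the post does not say which way the boundaries fall, and the function name is made up):

```python
def disposition(score, folder_at=5.0, reject_above=10.0):
    """Map a SpamAssassin score to what the front end does with the message."""
    if score > reject_above:
        return "reject"        # refused during the SMTP transaction, sender's MTA knows
    if score >= folder_at:
        return "spam-folder"   # accepted, but filed as borderline
    return "deliver"           # accepted into the inbox

if __name__ == "__main__":
    for s in (1.2, 5.0, 7.5, 10.0, 23.4):
        print(s, disposition(s))
```

Only the middle band is silently diverted; everything above it produces an SMTP error the legitimate sender will actually see.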



The idea is not to punish the other side because it sends spam.


If they send spam, I'm happy to see them punished. If they send 
legitimate mail, they should not be punished for the actions of spammers 
by having to GUESS whether their mail made it through.



The idea behind a rejection at SMTP stage is twofold: avoid unnecessary
processing and avoid unnecessary traffic. None of that is achieved if you 
take a whole message, scan it and reject it at SMTP stage.


Well, firstly, ALL of that is achieved *regardless* of these arguments 
because the helo/rbl checks are done BEFORE the DATA stage. The only 
'loss' of time is on mail that you were going to have to fully process 
anyway because it made it past those checks. No loss to me. A few seconds 
delay on the SMTP connection that saves a legitimate sender worry without 
incurring the 'cost' of backscatter, and actually might slow spammers down 
a bit. Maybe I personally don't gain any time. But maybe by the end of 
the day the spammer doesn't get to send quite as many e-mails, and someone 
out there enjoys less traffic on their server!


- Charles


Re: [sa] Re: SMTP REJECT after DATA (was: SpamAssassin Milter Plugin...)

2010-03-09 Thread Charles Gregory

On Tue, 9 Mar 2010, Kai Schaetzl wrote:

and you find it doesn't make sense to spam-scan messages and
reject them in/after DATA stage in a real world scenario.


You ignore my arguments. Hardly surprising.
You reword yours, but say nothing new.

It makes only sense if you are die-hard spam-fighter who wants to 
retaliate...


I stated my objectives and they have nothing to do with this pathetic 
straw-man argument.


Most if not all of your arguments are arguments for spam-filtering 
mail, not in favor of rejection at DATA stage.


How is that English-as-a-second-language class coming along?
I refuse to bore this group by repeating arguments that you so grossly 
mis-categorize in a feeble attempt to promote your point of view.


Last, keep in mind that filtering mechanisms in whatever stage are not 
solely meant for rejecting or spam-fighting, they are for *filtering* 
and then assigning appropriate actions - which often have nothing to do 
with spam/malware detection at all.


Now THAT is off-topic. We are discussing the use of SA at SMTP time.
Please stay on-topic for this group, and for this thread.

If you actually care to continue, I expect a reasonable response to my 
arguments about rejection being better than bouncing or silent diversion.
Geez, you didn't even try to advocate a system of notices to the user to 
overcome the 'silent' portion of that argument. Do I have to argue both 
sides for you? :)


- C


Re: [sa] Re: SMTP REJECT after DATA

2010-03-09 Thread Charles Gregory

On Tue, 9 Mar 2010, Andy Dorman wrote:
So even if we can decide an email is spam before the DATA stage, it 
makes no difference since we have to store the thing for a while anyway 
in case the user wants to look for something caught that shouldn't be.


(nod) To rely on this methodology requires that you *rely* upon your
users to apply a conscientious and consistent system of reviewing their
spam trap/folder on a regular basis. If you have this, then without 
sarcasm I would say you are very fortunate.


But in a system like mine where educating ignorant users is difficult at 
best, it feels a bit too dangerous to allow (too much) mail to be received 
and held without notice to the sender. And unfortunately SMTP protocols do 
not contain a code to tell the sender that mail was 'accepted but held for 
review'. The only way to do that is with a separate mail, and that leads 
back to the backscatter horrorshow, which I am quite sure you would never 
advocate :)


So for us (and we recognize not for everyone), the policy/practice we have 
chosen is the most workable and efficient. I think the only reason I 
leaped into this thread was because of the overbearing attitudes that 
seemed to completely ignore the fundamental notion of YMMV


- C


Re: [sa] Re: SMTP REJECT after DATA

2010-03-09 Thread Charles Gregory

On Tue, 9 Mar 2010, David Morton wrote:

Charles Gregory wrote:

Indeed, it makes far LESS sense to have a system accept mail but send it
to a spam folder.

Maybe in your particular situation, but you can hardly apply that to
everyone


(nod) It was subject to the conditions I consider 'wide spread' but by no 
means universal: the failure of users to review spamtraps.



- since we are supporting several large companies that find it
more acceptable to quarantine mail than to reject it, and *have* trained
their employees to look in a spam folder in the rare case that it is needed.


Stop it! You're making me jealous! LOL


If postfix and amavisd-new have improvements lately that allow for
efficient rejecting at SMTP time, that's great!


The only efficiency to be gained is to reject as much as possible after 
the RCPT_TO, before accepting DATA. But for systems like mine, with lousy 
user cooperation, rejecting some of the mail after DATA is still the best 
option.


Again, I emphasise 'some', and only speak out because someone is 
describing any approach other than their own as 'misguided'.
You are not misguided, and neither am I. We just have different 
situations.



Hmm... policy.  Sounds a lot like a feature of postfix, doesn't it?


LOL... And not at all 'misguided' :)

- C


Re: [sa] Re: SMTP REJECT after DATA

2010-03-09 Thread Charles Gregory

On Tue, 9 Mar 2010, Ted Mittelstaedt wrote:

  There are other reasons not to do this, for instance legal ones.
 Again, you are quoting arguments that favor SMTP reject. It is better to
 reject a mail, so that legitimate senders know it, rather than have them
 believe it was delivered when it was sent into a spam folder...


This is one of the stupidest arguments in this thread


Well, hey, now that we've got *that* off our chest

NOBODY is legally required to accept e-mail.  That is a crock of 
baloney.


Well then it's a good thing I didn't say that, isn't it?


It is NOT illegal to break a contract.


It's called 'fraud'. Look it up.

- C


Re: [sa] Re: SMTP REJECT after DATA

2010-03-09 Thread Charles Gregory

On Tue, 9 Mar 2010, Ted Mittelstaedt wrote:

  It is NOT illegal to break a contract.
 It's called 'fraud'. Look it up.
No, sorry, it's NOT fraud.  Fraud requires proving an intentional 
misrepresentation.


Well duh. Did you think I meant something else?


Breaking a contract does not imply that the
contract was entered into with an intent to break it.


But sending back an SMTP 'delivered' response when the mail was diverted 
to a spam folder could be PERCEIVED as misrepresentation (and therefore 
fraud, because clearly the decision to divert is based in policies 
established long before the implicit 'contract' of accepting a mail).
But again, I stress this is only true for the STUPID USER who does not 
understand that the spam folder is an alternate form of delivery TO THEM. 
My responsibility is complete (and legal) when that mail is delivered to 
either location.


It's all about the hassle and misperceptions. The fewer times I have to 
explain to users how their mail 'disappeared', the easier my life :)


And please remember that my entire context was only to stress that my weak 
definition of 'something illegal' was in CONTRAST to the utterly 
ridiculous notion that rejecting a mail at SMTP DATA time had anything 
illegal to it at all!


- C


Spanish/Brazilian/Mexican spam

2010-03-08 Thread Charles Gregory


Hello!

I think I asked about this once before. I keep getting foreign language
spams with no obvious (to me) indicators that I could test for.

Can anyone take a look at this crud and see a header or flag/type that I 
could score in SA?


http://pastebin.com/3gGiaZVK

(Note: post is set to expire at 3pm Tues Mar 9)

Thanks!

- Charles


Re: UPS Delivery Problems

2010-03-03 Thread Charles Gregory

On Wed, 3 Mar 2010, twofers wrote:
I have been getting bombarded for weeks with these and even tho I have 
created specific rules in LOCAL.cf, Spamassassin refuses to even check


The only reason for SA to 'refuse' to check a mail is if it exceeds the 
SIZE LIMIT for scanning. This limit is most often not within SA itself, 
but a parameter in whatever script/shell calls SA.


If you are using 'spamc' as your client, make sure the -s (max size) 
parameter is a good size to catch jpg and virus spam. I use 40.


- C


Re: [sa] Re: is this right? uribl_dbl seems to have a very odd number

2010-03-03 Thread Charles Gregory

On Wed, 3 Mar 2010, Bill Landry wrote:

Yeah. You shouldn't be using it like that on 3.3.0. Go to
http://www.spamhaus.org/dbl and look for SpamAssassin on the FAQ page.

The DBL entries were added via sa-update yesterday, not added manually -
at least for me.


Anytime someone uses a new concept, like the URI checker that doesn't take 
IP's, shouldn't a new syntax be used, or a check for a new plugin?


- C


Re: [sa] Putting your dead domains to use

2010-03-02 Thread Charles Gregory

On Mon, 1 Mar 2010, Marc Perkel wrote:
For what it's worth - if any of you have domains you don't use you can 
point them to my virus harvesting server for spam harvesting.

(SNIP)
The sender has to do 
several other things in order to be blacklisted.


Simple question: Does your 'harvester' have the smarts to detect 
(possible) correspondence from domain *registrars* (or ARIN) to the owners 
of a domain name? I can't guarantee that someone somewhere doesn't have 
our old domain as a 'contact' even though the MX has been a non-existent 
server for the last several years.


Subject to this important consideration for the one possible form of 
'legitimate' mail, I have a domain that used to be excessively spammed,
which would be *perfect* to feed to your harvester... (unless the domain 
is in fact so old that it has dropped from spammers lists).


- Charles


Re: [sa] Setting Blacklist_from and whitelist_to

2010-03-01 Thread Charles Gregory

On Sun, 28 Feb 2010, damuz wrote:

Secondly, it occurred to me that all the (legit) mail to us will only be to
a handful of email addresses and much of the spam still getting through is
sent to spurious recipie...@mydomain.com.
So with this in mind, is it useful or advisable to setup those legit email
addresses as  whitelist_to  and if so, what becomes of the 'rest' of the
mail or do you have to define only receive to whitelist_to?


You have to 'fine tune' this kind of test. Keep in mind that the visible 
'To:' header is hardly more than a *comment* on the mail. It may contain 
a mailing list name, or another *valid* recipient on another domain, 
while the mail was sent to *your* domain as a 'Bcc' hidden recipient.


At the first stage of the SMTP transaction your MTA (should have) already 
rejected any mail that was actually 'addressed' to an invalid address.
So the issue you are dealing with can be described as 'mail to a 
legitimate recipient with a suspicious To: header'.


So it quickly devolves to the fact that the *only* thing you can reject is 
mail that has a 'To:' address that is @your.domain but which is not a 
valid (now or at any time in the past!) recipient on your domain. You 
can't flag mail that is 'To:' another domain. That could be valid!


Now you need to be careful that when you invoke a 'whitelist' you do so 
for the 'To:' header, and NOT for the envelope recipient, which, by 
definition will always be a 'hit'. Unfortunately, the standard 
'whitelist_to' will 'hit' on any embedded headers that your MTA adds to 
show the envelope recipient. You could essentially end up whitelisting all 
mail. So you need to whitelist on the visible headers *manually*


So, if your list of internal recipients is not overly large, you may want 
to try the following:


header   __VALID_MYDOMAIN  ToCc =~ /(validuser1|validuser2|...)\...@yourdomain.com/i
header   __TO_MYDOMAIN     ToCc =~ /\...@yourdomain.com/i
meta     LOC_INVALID_MYDOMAIN ( __TO_MYDOMAIN && ( ! __VALID_MYDOMAIN ) )
describe LOC_INVALID_MYDOMAIN Address in To or Cc header to invalid address on our domain
score    LOC_INVALID_MYDOMAIN 1

Obviously, score modestly until we are sure there are no false positives. 
The big 'problem' with this scheme is that *any* change to the list of 
valid users requires the first rule to be updated. So I only recommend 
this approach if you have absolute control over your mail system.
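
If you want to sanity-check the logic before touching your SA config, here is a small Python sketch of the same idea: flag a To/Cc value that mentions our domain but none of the known-good users. The archive has mangled the address portions of the patterns above, so the regexes below are my own assumption of what was intended, and the user names and domain are the same placeholders.

```python
import re

DOMAIN = "yourdomain.com"                    # placeholder from the post
VALID_USERS = ("validuser1", "validuser2")   # placeholder names from the post

# Any address at our domain (counterpart of __TO_MYDOMAIN).
TO_MYDOMAIN = re.compile(r"@" + re.escape(DOMAIN), re.IGNORECASE)

# One of the known-good users at our domain (counterpart of __VALID_MYDOMAIN).
VALID_MYDOMAIN = re.compile(
    r"(" + "|".join(re.escape(u) for u in VALID_USERS) + r")@" + re.escape(DOMAIN),
    re.IGNORECASE,
)

def loc_invalid_mydomain(to_cc):
    """True when To/Cc names our domain but no valid user (logical AND of the two tests)."""
    return bool(TO_MYDOMAIN.search(to_cc)) and not VALID_MYDOMAIN.search(to_cc)

if __name__ == "__main__":
    for value in ("bogus@yourdomain.com",
                  "validuser1@yourdomain.com",
                  "someone@elsewhere.org",
                  "validuser1@yourdomain.com, bogus@yourdomain.com"):
        print(value, "->", loc_invalid_mydomain(value))
```

Note the last case: a header that mixes one valid and one bogus local address does not fire, because the 'valid' test hits. That is another reason to keep the score modest.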


- Charles


Re: [sa] Re: Finding URLs in html attachments

2010-03-01 Thread Charles Gregory

On Sun, 28 Feb 2010, LuKreme wrote:
Your best bet is to check if mail claiming to be from paypal is, in fact, 
from paypal.


Actually, I think his problem is that the reference to paypal has been 
buried in an attachment, described as 'type' of 'octet/binary' so that SA 
won't think it is text and scan it, and thus he doesn't *have* any 
'visible' cue that the mail claims to be from paypal. And yes, I think 
that is a pretty serious problem.


Looks like he may have to use a 'full' test to look for the references to 
paypal


- C


Re: [sa] Re: Finding URLs in html attachments

2010-03-01 Thread Charles Gregory

On Mon, 1 Mar 2010, David B Funk wrote:

Looks like he may have to use a 'full' test to look for the references to
paypal

Been there, done that, doesn't work.
AFAIK SA ignores 'octet/binary' attachments for the rule engine. None of
the rules that I tried (uri, body, full, rawbody) saw anything that was
known to be in one of those attachments.


You may have to examine the 'raw' message and look for 'encoding' that 
disguises the URI's in the attachment. The whole thing might be encoded as 
base64 or something... A real mess to work with. You might have more 
success making a rule that looks for mime headers that are type 'octet' 
but named 'html'. You won't be able to score that too high on its own, but 
it might combine well in a meta rule with certain buzz phrases from the 
text portions of the e-mail.
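
To show what that MIME-header test is looking for, here is a rough sketch using Python's standard email package rather than SA rules. It assumes the 'octet' type in question is application/octet-stream and that the attachment name ends in .html or .htm; the function name is made up.

```python
from email import message_from_string

def octet_parts_named_html(raw_message):
    """Return filenames of parts declared as generic binary but named like HTML."""
    msg = message_from_string(raw_message)
    hits = []
    for part in msg.walk():
        filename = part.get_filename() or ""
        if (part.get_content_type() == "application/octet-stream"
                and filename.lower().endswith((".html", ".htm"))):
            hits.append(filename)
    return hits

if __name__ == "__main__":
    sample = (
        "From: a@example.com\n"
        "MIME-Version: 1.0\n"
        'Content-Type: multipart/mixed; boundary="XX"\n'
        "\n"
        "--XX\n"
        "Content-Type: text/plain\n"
        "\n"
        "Please open the attached form.\n"
        "--XX\n"
        'Content-Type: application/octet-stream; name="form.html"\n'
        'Content-Disposition: attachment; filename="form.html"\n'
        "Content-Transfer-Encoding: base64\n"
        "\n"
        "PGh0bWw+\n"
        "--XX--\n"
    )
    print(octet_parts_named_html(sample))
```

The mismatch between declared type and file name is the signal; on its own it is weak, which is why it belongs in a meta with something from the text parts.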


- C


Re: Off-topic? Off-list!

2010-02-26 Thread Charles Gregory

On Fri, 26 Feb 2010, Karsten Bräckelmann wrote:

I know I'm tired from repeatedly deleting clearly off-topic posts
without even caring to open them. Wonder how the majority of subscribers
feels about it.


Well, there was a posting with some spam-related SPF stats the other day 
that proved very interesting. And relevant to how I might want to score 
SPF in my SA config. But yeah, it's otherwise getting a bit opinion-heavy 
and repetitive. Let's drop it and move on.


- C

tflags userconf

2010-02-26 Thread Charles Gregory

Hallo!

Back on topic :)

I happened to notice that 'tflags userconf' was specified for a few tests 
that, as far as I could tell, have no user-configurable parameters.


Example (3.2.5):

25_spf.cf:tflags SPF_PASS   nice userconf

So what 'user configuration' is needed for SPF_PASS that is NOT needed
for SPF_FAIL? In general, what does a 'userconf' specification 'look for' 
before permitting a test to run?


- C


Re: tflags userconf

2010-02-26 Thread Charles Gregory

On Fri, 26 Feb 2010, RW wrote:

I'm guessing it's also used to exclude rules from score optimization.
There is a comment in 25_spf.cf:
# these are userconf so that scores are set by hand
tflags SPF_PASS nice userconf net
tflags SPF_HELO_PASS nice userconf net


Ah. I didn't see that because I was grepping * for 'SPF'... :)
Thanks.

- C



Re: Off-topic? Off-list!

2010-02-26 Thread Charles Gregory

On Fri, 26 Feb 2010, Karsten Bräckelmann wrote:

Don't make me stomp my foot (Homer Simpson).


LOL... would you believe that someone in my girlfriend's computer class 
actually *said* to the instructor that famous Homerism, "Where is the
ANY key?" Yes, really. And they are old enough to vote. Brrr...

- C

Re: Off Topic - SPF - What a Disaster

2010-02-26 Thread Charles Gregory

On Fri, 26 Feb 2010, Benny Pedersen wrote:

On Fri 26 Feb 2010 06:50:12 PM CET, Marc Perkel wrote

And - SPF was originally introduced as a spam fighting solution.

alot of lies out there


Okay, this is getting stupid. Everyone on this thread, go to:

 http://www.openspf.org/Introduction

Spammers are explicitly identified as one of the problems addressed.
And even if this were somehow a 'lie', the original intent of the authors 
does not change whether SPF is *effective* for a given role. So this 
petulant arguing over its purpose is... (ad hominems snipped).


Take it off list, PLEASE.

- C




Re: Is there any Plugin to parse the “quoted email text” part in a mail (replied mail part)

2010-02-26 Thread Charles Gregory

On Fri, 26 Feb 2010, LuKreme wrote:

On 26-Feb-10 11:31, Karsten Bräckelmann wrote:

 Uhm, what's with your real name? (Rewritten in RE style.) How do you
 pronounce *82* f's in a row?

Fff for 8.2 seconds.


That's ten fs a second? Wow. Fast little F'er. ;)

- C

Re: [sa] Re: Bogus Dollar Amounts

2010-02-25 Thread Charles Gregory

On Thu, 25 Feb 2010, John Hardin wrote:

 i still see lot of junk mail coming with different charecters, i do not
 even read them clearly
 how can i stop those kind of emails

Reject languages you can't read at SMTP time?


I've been noticing more 'foreign language' spams that do not use
a 'foreign' character set and therefore do not trigger the 'faraway' 
rules. I don't suppose anyone has developed a generic rule that would 
spot 'foreign language usage in non-foreign charset'?


- C


Re: SA on outgoing SMTP

2010-02-17 Thread Charles Gregory

On Wed, 17 Feb 2010, Kris Deugau wrote:
My experience has been that Outlook in particular (not Outlook Express 
or its descendant Windows (Live) Mail) does NOT in fact display SMTP 
error messages exactly as the server spits them out.  :(


Sorry. You've heard that old phrase "goes without saying"?
Well, I didn't say it. (smile)

Where Microsoft and error messages are concerned, I consider
it par for the course that what is reported to the user will be a 
miserable distortion of whatever actual error occurred. But just the same, 
the user will know that *something* has gone wrong with their mail.


Obviously the fewer FP's the better when dealing with confusing error 
messages :)


- C


Re: SA on outgoing SMTP

2010-02-16 Thread Charles Gregory


Slightly OT. To get 'control' of what my MX does at SMTP time I installed 
a simple SMTP daemon called 'Mail Avenger', which acts as a front end to 
my spamassassin and postfix. Its scripting capabilities allow for such 
interesting things as tracking the volume of mail sent by any one IP over 
a given time period. Stuff like that. Primarily designed for use as an MX, 
but no reason it couldn't help monitor/limit outgoing mail.


http://www.mailavenger.org
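
To give an idea of the per-IP volume tracking, here is a minimal sliding-window counter in Python. This is not Mail Avenger's scripting interface, just a sketch of the bookkeeping; the class name, window, and limit are all assumptions for illustration.

```python
from collections import defaultdict, deque

class PerIpVolume:
    """Count messages per client IP inside a sliding time window."""

    def __init__(self, window_seconds, limit):
        self.window = window_seconds
        self.limit = limit
        self.seen = defaultdict(deque)   # ip -> timestamps of recent messages

    def record(self, ip, now):
        """Record one message from ip at time now (seconds); True if ip is over the limit."""
        times = self.seen[ip]
        times.append(now)
        # Drop timestamps that have aged out of the window.
        while times and times[0] <= now - self.window:
            times.popleft()
        return len(times) > self.limit

if __name__ == "__main__":
    tracker = PerIpVolume(window_seconds=60, limit=3)
    for t in (0, 1, 2, 3, 100):
        print(t, tracker.record("192.0.2.25", now=t))
```

A front end could temp-fail or defer a client once record() returns True, which throttles a zombie without ever scanning content.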

- C


On Tue, 16 Feb 2010, Alexandre Chapellon wrote:

I have a quite buggy customer network, full of zombie PCs that spends all
days sending spam and wasting the whole reputation of my networks.
As a result it sometimes become quite hard to delivers queues for specific
domains such as Yahoo!'s hosted ones. Indeed they have some temp fail
(blacklist) mechanism that forbid my servers to send messages to them during
hours.
That's why I would like to set up some outgoing filtering to avoid sending too
much spam through my mail relays. I think SA can help me in doing so, but I
know too it's not really intended to work this way. I guess SA expects to
work on MX hosts more than on smtp relays.

My prerequisites are mainly:
    - STOP as much spam as possible at SMTP time (before queuing)
    - Have NO (or very few) false positives cause I could not manage telling
thousands of users that they should *always_have_a_subject*,
*shouldn't_write_the_subject_in_CAPS* or anything else.

Furthermore, I can't rely on RBLs because a lot of my dynamic IP addresses are
regularly listed on different blacklists.

Has anyone already set up something like that, and what specific
config/tools/plugins could be useful for me?
If someone has already done it, do they have any statistics about the
efficiency of this setup?

Best regards.


Re: [sa] Re: MTX - How does it stop spam?

2010-02-16 Thread Charles Gregory

On Tue, 16 Feb 2010, Kris Deugau wrote:
*nod*  This is the biggest question I still see remaining;  who maintains the 
blacklist?  How many spams can come from an MTX-approved IP before it 
can/should be blacklisted?


Why do we need any new/special blacklist at all? If the spamming from a 
given IP is sufficiently large, the regular internet blacklists will 
capture this IP and do a far better job of blacklisting, managing removes, 
etc, etc. Why reinvent the wheel?


- C


Re: MTX public blacklist implemented Re: MTX plugin functionally complete?

2010-02-15 Thread Charles Gregory

On Sun, 14 Feb 2010, Jonas Eckerman wrote:
1: The participation record is optional, so you only use it if you want 
everything else to be rejected.


This is why I would support mtamark... It permits the sysadmin to 
determine the default behaviour for his IP range, rather than defining a 
dangerous default in the client.


And I quote:
   This subdomain MAY be inserted at any level in the DNS tree for IPv4
   IN-ADDR.ARPA reverse zones.  For IPv6, to limit the number of DNS
   queries, _srv is only queried at the /128 (host), /64 (subnet) and /
   32 (site) level.  That way it can either provide information for a
   specific IP address or for a whole network block.  More specific
   information takes precedence over information found closer to the top
   of the tree.

The beauty of this mechanism is that we can 'sell' large ISP's on it by 
saying you only need to create one 'allow' entry for each legitimate MTA 
and one 'deny' entry for each netblock.
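
Here is how I read the 'more specific takes precedence' part for IPv4, as a Python sketch. It only shows where an '_srv' subdomain could sit in the reverse tree; the draft's full record names and TXT values are deliberately left out, and the 'allow'/'deny' strings plus the records dict are stand-ins for real DNS lookups.

```python
import ipaddress

def srv_candidates_v4(ip):
    """Names where an '_srv' subdomain could sit for an IPv4 address, most specific first."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    reversed_octets = octets[::-1]   # 192.0.2.25 -> ['25', '2', '0', '192']
    return [
        "_srv." + ".".join(reversed_octets[i:]) + ".in-addr.arpa"
        for i in range(len(reversed_octets))
    ]

def effective_mark(ip, records):
    """Return the mark from the most specific level that publishes one, else None."""
    for name in srv_candidates_v4(ip):
        if name in records:
            return records[name]
    return None   # nothing published: no opinion either way

if __name__ == "__main__":
    records = {
        "_srv.2.0.192.in-addr.arpa": "deny",      # one entry for the whole netblock
        "_srv.25.2.0.192.in-addr.arpa": "allow",  # one entry for the real MTA
    }
    for ip in ("192.0.2.25", "192.0.2.99", "198.51.100.1"):
        print(ip, effective_mark(ip, records))
```

Two records cover the whole block: the mail server is allowed, every other address in the /24 is denied, and networks that publish nothing are simply not judged.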


And for SA there is no need to give it 'starting' scores, like SPF, the 
mechanism is effective as soon as it is used, and ignorable if not...


- C


Re: MTX public blacklist implemented Re: MTX plugin functionally complete?

2010-02-15 Thread Charles Gregory

On Tue, 16 Feb 2010, Jonas Eckerman wrote:

  1: The participation record is optional, so you only use it if you
  want everything else to be rejected.
 This is why I would support mtamark... It permits the sysadmin to
 determine the default behaviour for his IP range, rather than defining a
 dangerous default in the client.

In what way does the above define a dangerous default?


It doesn't. My comment refers to early messages where the author of 
'mtx' said that the 'standard' behaviour in the absence of any mtx 
record as being equivalent to a 'deny' condition. That is, the domain 
would be scored as 'spammish' if it did not participate.


The default in the statement above is to consider a domain as *not* 
participating unless otherwise stated by whoever manages the DNS for the 
domain.


Correct. And my comment was that this was a much better alternative to 
the 'dangerous default' of having 'not participating' mean 'spammy'.


If the domain does not participate it should not be punished when a MTX 
record isn't found.


You got it. Exactly. And that's why I gave up on MTX. Because the author 
was insisting that exactly that should happen.


- C


Re: bayes learning '0 messages found'

2010-02-13 Thread Charles Gregory

On Sat, 13 Feb 2010, smfabac wrote:

Now that we're all on the same page. How do I find out why sa-learn
is not processing the legal not-spam file?  To re-cap, sa-learn --spam
--mbox isspam works but sa-learn --ham --mbox not-spam is not
working.


Well, I would expect if this suggestion were right you would have had all 
sorts of warning messages about syntax, but just in case


Maybe linux is interpreting the dash in the filename as a switch 
indicator? Try enclosing the file name in single quotes or use a filename 
without a dash...


- C




Re: MTX plugin created (Re: Spam filtering similar to SPF, less breakage)

2010-02-13 Thread Charles Gregory

On Sat, 13 Feb 2010, Per Jessen wrote:

Justin Mason wrote:

It might be useful to compare with MTA MARK and see what the status of
that proposal currently is:
http://tools.ietf.org/draft/draft-stumpf-dns-mtamark/

Amazing.  Justin, you must have known about that one - you can't
possibly have just googled it?


Well, I certainly had never heard of this one. And I think that with one 
minor variation in concept it could be useful to scoring systems like 
SA...


Because of the threat of hacks, any system that 'favors' an MTA is simply 
giving spammers a target for exploitation. But an explicit 'disallow' 
record (MTA=0) created by the sysadmin would have a similar impact to 
deliberately naming PTR records as 'dynamic'. SA could 'detect' the 
explicit MTA=0 and add a score (or block outright at MTA level). The 
only thing I would *not* do, given the general laziness of the internet, 
is apply any default meaning to the absence of this TXT record. Only 
explicit identification of an IP or subnet as 'not permitted to send mail' 
would have significance to SA or a blocking MTA.


Hmm. Could work. No impact for non-implementation. Disables an 
unauthorized IP for any case where it is used. I like it...


- C

