Re: Bayes always reject.

2023-12-13 Thread Jeff Mincy
 > From: Pierluigi Frullani 
 > Date: Wed, 13 Dec 2023 07:49:24 +0100
 > 
 > Hello all,
 >  I'm facing a strange problem.

...
 > tests=BAYES_95,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS,T_SCC_BODY_TEXT_LINE

How did you feed this message into SpamAssassin?
Did you do something to strip off all of the email headers?

For the BAYES_99, as already mentioned you probably need to retrain
bayes, making sure to correct any incorrectly trained email messages.

-jeff


Re: BAYES scores

2023-02-28 Thread Jeff Mincy
 > From: joe a 
 > Date: Tue, 28 Feb 2023 11:37:34 -0500
 > 
 > Curious as to why these scores, apparently "stock" are what they are. 
 > I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
 > 
 > Noted in a header this morning:
 > 
 > *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
 > *  [score: 1.]
 > *  0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
 > *  [score: 1.]
 > 
 > Was this discussed recently?  I added a local score to mollify my sense 
 > of propriety.

Those two rules overlap.   A message with bayes >= 99.9% hits both
rules.   BAYES_99 ends at 1.00 not .999.
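
A minimal Python sketch of the overlap, assuming the two ranges and scores shown in the quoted header. Whether each boundary is inclusive is an assumption of the sketch, and `rules_hit` and `total_score` are made-up helpers, not SpamAssassin code:

```python
# Illustrative only: ranges and scores come from the header quoted above;
# inclusive boundaries are an assumption of this sketch.
RULES = {
    "BAYES_99": (0.99, 1.00, 3.5),
    "BAYES_999": (0.999, 1.00, 0.2),
}

def rules_hit(probability):
    """Return the names of all Bayes rules whose range contains probability."""
    return [name for name, (low, high, _score) in RULES.items()
            if low <= probability <= high]

def total_score(probability):
    """Sum the scores of every rule that fires for this probability."""
    return sum(RULES[name][2] for name in rules_hit(probability))

print(rules_hit(0.995))   # only BAYES_99 fires
print(rules_hit(1.0))     # both rules fire, so their scores add up
print(total_score(1.0))
```

So the small BAYES_999 score is a bonus on top of BAYES_99 rather than a replacement for it.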
-jeff



Re: Hits on item with " No description available"

2022-01-20 Thread Jeff Mincy
Greg Troxel writes:
 > From: Greg Troxel 
 > Date: Thu, 20 Jan 2022 16:32:53 -0500
 > 
 > I followed my own advice about egrep -R and found this immediately
 > 
 > it's in
 > 
 > 3.004006/updates_spamassassin_org/72_active.cf
 > 
 > and it is
 > 
 > ##{ FSL_HELO_NON_FQDN_1
 > header  FSL_HELO_NON_FQDN_1 X-Spam-Relays-External =~ /^[^\]]+ 
 > helo=[a-zA-Z0-9-_]+ /i
 > ##} FSL_HELO_NON_FQDN_1
 > 
 > with score
 > 
 > score FSL_HELO_NON_FQDN_1 2.361 0.001 1.783 0.001

BTW: You can create tags (using Exuberant ctags) for spamassassin rules:

I create the tags using:

ctags -f SPAMASSASSIN_TAGS --langdef=CF --langmap=CF:.cf --languages=CF \
  --regex-CF='/^[ \t]*(header|mimeheader|describe|body|rawbody|full|meta|uri|urirhssub|uridnsbl|urirhsbl|tflags|score|replace_rules)[ \t]+([^ \t]+)/\2/' \
  ~/.spamassassin /var/lib/spamassassin /usr/share/spamassassin

So, I can do Meta-. in Emacs and it goes directly to the
'header  FSL_HELO_NON_FQDN_1' definition.
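
To see what that tag pattern picks up, here is a rough Python translation of the same regular expression. It is only a sketch for checking the pattern against sample lines; `rule_name` is a made-up helper and ctags itself is not involved:

```python
import re

# Same keyword list as the --regex-CF option above, translated to Python syntax.
RULE_DEF = re.compile(
    r"^[ \t]*(header|mimeheader|describe|body|rawbody|full|meta|uri|urirhssub|"
    r"uridnsbl|urirhsbl|tflags|score|replace_rules)[ \t]+([^ \t]+)"
)

def rule_name(line):
    """Return the rule name a .cf line defines, or None for other lines."""
    match = RULE_DEF.match(line)
    return match.group(2) if match else None

print(rule_name("header  FSL_HELO_NON_FQDN_1 X-Spam-Relays-External =~ /.../i"))
print(rule_name("score FSL_HELO_NON_FQDN_1 2.361 0.001 1.783 0.001"))
print(rule_name("##{ FSL_HELO_NON_FQDN_1"))
```

The comment markers (`##{` and `##}`) are skipped because they do not start with one of the listed keywords.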

-jeff


Re: DCC whitelisting

2015-06-11 Thread Jeff Mincy
   From: sha...@shanew.net
   Date: Thu, 11 Jun 2015 10:02:59 -0500 (CDT)
   
   On Wed, 10 Jun 2015, John Hardin wrote:
   
On Wed, 10 Jun 2015, Shane Williams wrote:
   
 Two examples that I know are legitimate senders, but get caught by DCC
 (and pyzor in some cases) and other rules that push them over the
 threshold are the SourceForge.net Project of the Month list and
 various Netflix emails to customers (New Arrivals or we just added a
 show you might like).  In both those cases, the user part of the
 env_from changes, and as I understand it, the DCC Whitelist doesn't
 allow wildcards, so I can't have an entry that matches the server
 part.  Maybe I could be using the substitute List-ID: syntax, but
 neither of those has List-ID as a specific header.
   
Can you reliably identify those at the MTA level and tell the SA glue to 
skip them entirely?
   
   I probably could, but that also seems kludgy.  DCC has a whitelisting
   capability, so why not use it?
   
   Am I misunderstanding what DCC's whitelist is intended for?
   
There are numerous ways to whitelist messages in DCC.
The easiest is to whitelist by mail_host, e.g.
  ok substitute mail_host ecerts.americanexpress.com
Put the entries in /var/dcc/whiteclnt (or wherever you have the files
installed).

The mail_host is the stuff after the @ in the return-path header.
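
As a quick way to work out what to put in the whitelist entry, here is a small Python sketch that pulls the host part out of a Return-Path value. `mail_host` is a hypothetical helper name and the local part of the address is invented:

```python
from email.utils import parseaddr

def mail_host(return_path):
    """Return the part after the @ in a Return-Path header value."""
    _name, address = parseaddr(return_path)
    _local, _at, host = address.rpartition("@")
    return host.lower()

# The local part is a made-up example; only the host matters for the whitelist.
print(mail_host("<bounce-12345@ecerts.americanexpress.com>"))
```

Because only the host is used, the entry keeps matching when the user part of the envelope sender changes from message to message.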

You can test the entry by calling dccproc with the full email
message, eg:
/usr/local/bin/dccproc -d -H -Q -S mail_host -S Sender -S List-ID -S From \
  -l ~/.dcc -w /var/dcc/whiteclnt -R put_your_email_message_filename_here

You may need to change dcc_conf to make sure that mail_host is
included at startup
  DCCIFD_ARGS=-SHELO -Smail_host -SSender -SList-ID -SFrom


You can also look at the proof of concept dcc scripts on 
http://www.rhyolite.com/dcc/

  CGI Demonstration
There is a demonstration of the proof of concept CGI scripts that
allow users to maintain individual whitelists and monitor individual
logs of rejected mail at http://www.rhyolite.com/dcc-demo-cgi-bin/
or http://cgi-demo:cgi-d...@www.rhyolite.com/dcc-demo-cgi-bin/. It
requires a user name of cgi-demo and a password of cgi-demo the same
as the user name.

-jeff


Re: effectiveness of DCC checks?

2015-04-14 Thread Jeff Mincy
   From: Quanah Gibson-Mount qua...@zimbra.com
   Date: Tue, 14 Apr 2015 10:59:28 -0700
   
   I've noticed that DCC_CHECK is flagging on tons of items that are clearly 
   not spam.  The most recent hit for me today was a release announcement from 
   the mariadb folks.  Overall, it's a trend I'm routinely seeing where it is 
   flagging a lot of email that clearly isn't spam.  Are others who use DCC 
   seeing similar issues?
   --Quanah

You need to whitelist bulk senders in DCC.   See the DCC manpage:

dcc(8) - Ubuntu Manpage
  Whitelists are the responsibility of DCC clients, since only they know
  which bulk mail they solicited. The only false positives (mail marked
  as bulk by a DCC ...

-jeff


Re: SpamRATS RBL?

2015-03-18 Thread Jeff Mincy
   From: Kevin A. McGrail kmcgr...@pccc.com
   Date: Wed, 18 Mar 2015 10:21:39 -0400
   
   Anyone use this RBL or familiar with it? Pros/cons? Efficacy data? 
   regards, KAM
   
I get 5% spam hits on DYNA and 10% on NOPTR.  The SPAM list isn't that
great (1% spam and some false hits).

-jeff


Re: Rule to match a blacklist of email addresses.

2015-01-10 Thread Jeff Mincy
   From: Steve spamassassin_st...@shic.co.uk
   Date: Sat, 10 Jan 2015 14:23:36 +
   
   
   I have a domain for which (for historic reasons) I want a catch-all rule 
   to accept email. Until recently, Spamassassin has done a great job of 
   separating the ham from the spam.  Recently, I've been receiving a large 
   number of spam emails which have been misclassified as ham.   These 
   annoying spam emails tend to be addressed to a relatively small number 
   of email addresses at my domain - addresses which have never been 
   used/provided, so should be a very strong indicator of spam.
   
   If I were to have a list of a few dozen email addresses of the form:
   
   bogus_us...@mydomain.com
   onlyspample...@mydomain.com
   ...
   unwantedrubb...@mydomain.com
   
   
   What is the easiest way to implement a rule that checks against such a 
   list - and ups the spam-score if matched?  Would I have to implement a 
   separate rule for each address?

Use blacklist_to, for example:

  blacklist_to bogus_us...@mydomain.com ...

This will lead to hits on USER_IN_BLACKLIST_TO.
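
If the list runs to a few dozen addresses, the lines can be generated rather than typed. A minimal Python sketch, assuming several addresses per blacklist_to line are acceptable; the addresses and the helper name `blacklist_to_lines` are made up:

```python
def blacklist_to_lines(addresses, per_line=3):
    """Render blacklist_to directives, a few addresses per line."""
    lines = []
    for start in range(0, len(addresses), per_line):
        chunk = addresses[start:start + per_line]
        lines.append("blacklist_to " + " ".join(chunk))
    return lines

# Hypothetical addresses; substitute the real unused addresses at your domain.
unused = ["never-used-1@example.com", "never-used-2@example.com",
          "never-used-3@example.com", "never-used-4@example.com"]
print("\n".join(blacklist_to_lines(unused)))
```

The output can be pasted into local.cf or a user_prefs file, so no separate rule per address is needed.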

-jeff


Re: Spam messages bypassing SA

2014-10-28 Thread Jeff Mincy
   From: Bob Proulx b...@proulx.com
   Date: Mon, 27 Oct 2014 18:37:35 -0600
   
   In the first email:
   
 # The lock file ensures that only 1 spamassassin invocation happens
 # at 1 time, to keep the load down.
 #
 :0fw: spamassassin.lock
 *  40
 | spamc -x
   
   Kevin A. McGrail wrote:
geoff.spamassassin140903 wrote:
 Kevin A. McGrail wrote:
  Using procmail without MTA glue is OK for many uses.  I am wondering how
  many spamd connections you allow and if you have checked your logs?
 
  I also cannot remember but the uses of a lock file seem odd for
  something that can thread.  Any one know if that is a good idea to
  remove?

 I wonder if you could explain in simple terms what the lockfile achieves
 in this situation? Is it even possible that it could cause messages to
 bypass SA?
   
I don't think a lockfile achieves anything because it's a call to a program.
Procmail has some weird syntax so hopefully someone with some procmail-fu
can tell us if a lock on a procmail system call does anything.
   
   Well...  The comment in the example explains what the lock is
   attempting to do.  I think that comment got missed in the follow-ups.
   The lock will restrict spamassassin invocations to one at a time to
   prevent a high system load average running too many spamassassin
   processes all at once.  It will serialize spamassassin invocations to
   one at a time instead of many in parallel.
   
   Normally the MTA will receive incoming messages and will fork a
   process for each incoming connection.  If the outside world connects
   and sends 100 messages all at once then there will be 100 MTA
   processes running in parallel.  If 10,000 all at once then probably
   some MTA process limit will prevent forking that many depending upon
   your configuration.  Each of those will try to send the message
   through procmail and spamassassin in parallel too.  Running 10,000
   procmail processes in parallel probably won't be a problem since it is
   light weight.  However running perl spamassassin 100 or 1,000 times in
   parallel all at once can be quite a resource hit to a moderate system!
   
   By putting the lock in the procmail rule it prevents more than one
   perl spamassassin process from running at a time.  This keeps the
   system from being overloaded due to a spike from the outside world.  I
   want to emphasize that the outside world impacts the system and can
   have an effect of a DDoS just by overwhelming the system with external
   connections.  The MTA has limits to prevent this but while those are
   tuned for normal delivery the MTA maintainers won't know if you are
   running each message through spamassassin and causing a higher load
   because of it.  The default MTA limits are probably too high when
   considering running the message through spamassassin too.
   
   The procmail example comes from the wiki page example:
   
 http://wiki.apache.org/spamassassin/UsedViaProcmail
   
   The wiki page example is launching spamassassin not spamc.  That
   is an important difference to this case.  Someone has changed that to
   spamc in the above and preserved all else including the serialization
   lock.  The spamc talks to a spamd and so the number of parallel
   processes spamd can handle depends upon the spamd configuration.  In
   the spamc use I would be inclined to remove the serialization lock.
   Let it be throttled at the spamd side of things instead.  That would
   make the most sense to me.  Then tune spamd's limits as needed.
   
   In summary I suggest removing the serialization lock from the spamc
   recipe.  Give it a try and monitor system resource utilization.  Start
   tuning at spamd.  Tune other things as needed afterward.
   
 :0fw
 | spamc -x
   
 :0e
 {
   EXITCODE=$?
 }
   
   Bob


I agree with everything you wrote but only when bayes autolearning is
turned off.  Bayes learning holds an exclusive lock to the bayes
database particularly during expiration.

If spamc does bayes autolearning and starts an expiration then other
spamc runs for that user will be locked out of bayes.  At some point
you start getting timeouts at different points in the email delivery
chain.

I have a separate sa-learn (or spamc -L) procmail recipe that has a
serialization lock.
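
The difference between the two throttles can be sketched in Python. This is an analogy only, not procmail or spamd code: a limit of 1 stands in for the lock file (what I want around learning), and a larger limit stands in for a cap on spamd children (what I want around scanning). `peak_concurrency` is a made-up helper:

```python
import threading
import time

def peak_concurrency(limit, jobs=8):
    """Run jobs worker threads behind a semaphore and report the peak overlap."""
    gate = threading.Semaphore(limit)   # limit=1 behaves like the lock file
    state = {"running": 0, "peak": 0}
    guard = threading.Lock()            # protects the counters only

    def scan():
        with gate:
            with guard:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            time.sleep(0.01)            # stand-in for handling one message
            with guard:
                state["running"] -= 1

    threads = [threading.Thread(target=scan) for _ in range(jobs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["peak"]

print(peak_concurrency(1))   # serialized: never more than one at a time
print(peak_concurrency(4))   # bounded pool: at most four at a time
```

Serializing only the learning recipe keeps Bayes writers from piling up on the database lock while ordinary scans still run in parallel.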

-jeff


Re: Philosophical question on Bayes (was Re: 23_bayes_ignore_header.cf)

2014-10-14 Thread Jeff Mincy
   From: Axb axb.li...@gmail.com
   Date: Tue, 14 Oct 2014 23:37:36 +0200
   
   On 10/14/2014 11:08 PM, Adam Katz wrote:
On Tue, 14 Oct 2014 16:10:52 +0200 Axb axb.li...@gmail.com wrote:
and to avoid further discussions of what header may pollute bayes or
not, I've removed all header entries which are not directly related
to AV/filter products.
   
On 10/14/2014 07:17 AM, David F. Skoll wrote:
I'm not sure I agree with being too clever about Bayes.  Surely by its
very nature, the Bayes algorithm will itself indicate which tokens
are relevant and which are not?  Isn't that the whole point of Bayes?
   
I think being too clever about massaging the data that gets fed to
Bayes may be counter-productive.  For sure, *some* massaging is in order;
a token should be a semantic unit, so something like www.example.com
should probably be one token rather than three, but beyond that I wonder
if it's good or not to massage the data?
   
The purpose of bayes_ignore_header is twofold:
   
  1. Prevent inheriting other systems' false positives (ensure better
 independence)
  2. Prevent relying upon headers that won't exist at delivery time (e.g.
 added by the mailbox server)
   
This is why it's so important to ignore other spam engines, which
basically fit into both of those categories.
   
   I'd love to have the option (switch) to use Bayes on msg bodies ONLY, 
   though I doubt anybody would be a taker for such a project.
   (I'd even be willing to $pon$or such an addition to SA)
   
Wouldn't that be fairly easy to implement  by intercepting the call to
_tokenize_headers in Plugin/Bayes.pm?

  # Tokenize the headers
  my %hdrs = $self->_tokenize_headers ($msg);
  while( my($prefix, $value) = each %hdrs ) {
    push(@tokens, $self->_tokenize_line ($value, "H$prefix:", 0));
  }
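
To make the idea concrete, here is a toy Python tokenizer with a switch for skipping headers. It is a sketch of the concept only: the token format, the `include_headers` flag, and the `tokenize` function are all assumptions and are not how Plugin/Bayes.pm is written:

```python
import re
from email import message_from_string

def tokenize(raw_message, include_headers=True):
    """Split a message into Bayes-style tokens; header tokens get an H prefix."""
    msg = message_from_string(raw_message)
    tokens = []
    if include_headers:
        for name, value in msg.items():
            # Mirrors the "H$prefix:" idea above; the exact format is illustrative.
            tokens += [f"H{name}:{word}" for word in re.findall(r"\S+", value)]
    body = msg.get_payload()
    tokens += re.findall(r"\S+", body)
    return tokens

raw = "Subject: cheap pills\nX-Filter: scanned\n\nbuy now\n"
print(tokenize(raw))
print(tokenize(raw, include_headers=False))
```

With the flag off, nothing from the headers reaches the token list, which is the body-only behaviour being asked for.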

-jeff


Re: Bayes Problem

2014-08-28 Thread Jeff Mincy
   From: Julian Brown jlbp...@gmail.com
   Date: Thu, 28 Aug 2014 10:46:55 -0500
   
   I work for a company that has lots of mail users.  We use Exim with
   Spamassassin.   My job is to track down this problem.
   
   We are getting complaints of too much spam and have tracked it down, using
   Google, to our bayes files not working correctly.  I do not know if they
   are poisoned or just not working.
   
   When bad spam gets through it is always the same, BAYES_00 -1.9 in the
   headers.   According to what I have googled there is only one thing we can
   do and that is to clear the bayes filters and either allow it to start
   again and possibly retrain.   Each individual has their own bayes filters,
   /home/user/.spamassassin/bayes_*.
   
   Exim version 4.82 #2 built 17-Jul-2014 13:21:53
   SpamAssassin Server version 3.3.2
   CentOS 6.5 64bit
   
   But we are getting a lot of it, not all accounts, so I think this means we
   are getting poisoned or something they are doing is rendering the bayes
   filters non functional.
   
   Here is from one of them from a week or 2 ago:
   
   sa-learn --dump magic
   0.000  0476  0  non-token data: nspam
   0.000  0  40270  0  non-token data: nham
...
   
   I don't know the significance of the above readout, but all the discussions
   talk about this.
   
   Julian

You need to learn way more spam messages.   You will get the best results
by learning from essentially all messages, as long as the messages are
learned correctly.   In addition to not having enough spam messages
you probably have learned various spam messages as ham.
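
For anyone unsure how to read the --dump magic output, here is a short Python sketch that pulls out the nspam and nham counters. It assumes the usual four-column layout with the value in the third column and uses SpamAssassin's default minimum of 200 learned messages per class; the sample counts are made up, and `parse_dump_magic` is a hypothetical helper:

```python
def parse_dump_magic(text):
    """Map each 'non-token data' name to its integer value (third column)."""
    values = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        columns, name = line.split("non-token data:")
        fields = columns.split()
        if len(fields) >= 3:
            values[name.strip()] = int(fields[2])
    return values

# Made-up counts in the usual four-column layout, not the poster's real data.
sample = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        150          0  non-token data: nspam
0.000          0      40000          0  non-token data: nham
"""
counts = parse_dump_magic(sample)
print(counts["nspam"], counts["nham"])
print("enough spam learned:", counts["nspam"] >= 200)
```

A large gap between nham and nspam like this one is the pattern to look for when spam keeps landing on BAYES_00.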

-jeff


Re: New at SpamAssassin - how to not get headers

2014-08-05 Thread Jeff Mincy
   From: RobertGrimes gri...@rgconsulting.com
   Date: Tue, 5 Aug 2014 08:50:44 -0700 (PDT)
   
   I don't know if this is fair to ask, but would you (or anyone) care to see
   if the message I am posting should be rated higher than 1.9? I apologize if
   this is not appropriate.
   
   The message is at http://pastebin.com/UZeDtLWZ
   
You need to save the complete original message.   Many of the headers
are missing:
  MISSING_DATE=0.1,MISSING_MID=0.497,NO_RECEIVED=-0.001,NO_RELAYS=-0.25
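
A quick way to check a saved message before feeding it to SpamAssassin is to look for the headers behind those rules. This Python sketch is a simplification: the mapping in `CHECKS` and the helper `missing_header_rules` are assumptions for illustration, and the real rules test more than header presence:

```python
from email import message_from_string

# Header names paired with the rule names seen in the report above.
CHECKS = {"Date": "MISSING_DATE", "Message-ID": "MISSING_MID", "Received": "NO_RECEIVED"}

def missing_header_rules(raw_message):
    """List the rules from CHECKS whose header is absent from the message."""
    msg = message_from_string(raw_message)
    return [rule for header, rule in CHECKS.items() if msg[header] is None]

stripped = "Subject: hello\n\nbody only\n"
complete = ("Received: from a.example by b.example; Tue, 5 Aug 2014 08:50:44 -0700\n"
            "Date: Tue, 5 Aug 2014 08:50:44 -0700\n"
            "Message-ID: <1@a.example>\nSubject: hello\n\nbody\n")
print(missing_header_rules(stripped))
print(missing_header_rules(complete))
```

If the first kind of result shows up for a message copied out of a mail client, the copy has lost its headers and the score will not reflect what the live message would get.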

With sufficient training you should be able to get BAYES_99 +
BAYES_999

-jeff


Re: getting tons of SPAM

2014-07-02 Thread Jeff Mincy
   From: John Hardin jhar...@impsec.org
   Date: Wed, 2 Jul 2014 14:45:07 -0700 (PDT)
   
   On Wed, 2 Jul 2014, motty cruz wrote:
   
Bayesian filter is not running: according to header,
   
X-Virus-Scanned: amavisd-new at fqdn.com
X-Spam-Flag: NO
X-Spam-Score: -0.009
X-Spam-Level:
X-Spam-Status: No, score=-0.009 tagged_above=-999 required=5.3
   tests=[HTML_MESSAGE=0.001, T_RP_MATCHES_RCVD=-0.01]
   autolearn=unavailable
Received: from
   
# sa-learn --dump magic
Error Opening file /usr/local/share/GeoIP/GeoIPv6.dat
0.000  0  3  0  non-token data: bayes db version
0.000  0   3338  0  non-token data: nspam
0.000  0784  0  non-token data: nham
   
any ideas?
   
Note the autolearn=unavailable part.
The Bayes database is probably locked doing an expire.

Also, the GeoIP data file should be fixed:
 Error Opening file /usr/local/share/GeoIP/GeoIPv6.dat

   You need to post samples (to pastebin). We can't make comments on what 
   *should* be hitting unless we can see the message itself.

Yep.
-jeff


Re: whitelist_from_spf dbg

2014-05-19 Thread Jeff Mincy
   From: Matus UHLAR - fantomas uh...@fantomas.sk
   Date: Mon, 19 May 2014 15:44:30 +0200
   
On 17.05.14 14:11, Jeff Mincy wrote:
It would have been easier to figure out why it was matching if the
matching spf entry was printed out, for example something like this:

May  8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com matches ^.*\@.*buy\.com$ entry
May  8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com is in user's WHITELIST_FROM_SPF and passed SPF check
   
From: Matus UHLAR - fantomas uh...@fantomas.sk
Date: Sun, 18 May 2014 18:22:49 +0200
 According to the documentation, they are not regexp's (as one could/should
 expect):
   
   Whitelist and blacklist addresses are now file-glob-style patterns,
   
   On 18.05.14 13:44, Jeff Mincy wrote:
   The matching whitelist_from_spf entry *@*buy.com is a file glob pattern
   which matched.  I'm not sure why you are quoting the manual here.  The
   whitelist entry *@*buy.com is turned into a regexp by add_to_addrlist
   in SpamAssassin/Conf/Parser.pm which among other things does s/\*+/\.\*/g
   
   I wanted to point out that you (and many other people) could be surprised
   by what you see in the regexp, because what you enter into the
   blacklist/whitelist directive is a glob-style pattern.
   
   Maybe the directive content, rather than the RE, could be shown in the
   debug output...

Sure, printing out the original glob would be better.   The original
glob isn't currently saved - it would be a little more work.
I could come up with other ideas - such as returning the information
in a tag that could be added to a header.
   
  I assume the contents of *_networks is modified before RE matching, so
  you'd wonder what is the content...
   
   Ok, you lost me.  What does the contents of *_networks have to do with
   the suggestion to print the matching whitelist regexp entry?  Nothing
   matching *buy.com has been added to *_networks if that is what you are
   wondering.
   
   sorry, that had to be (black|white)list_*, not *_networks.

Ah.  Yes, the glob style whitelist was modified into a regexp before matching.
   
   -- 
   Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
   Warning: I wish NOT to receive e-mail advertising to this address.
   Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
   Remember half the people you know are below average. 
-jeff


Re: whitelist_from_spf dbg

2014-05-18 Thread Jeff Mincy
   From: Matus UHLAR - fantomas uh...@fantomas.sk
   Date: Sun, 18 May 2014 18:22:49 +0200
   
   On 17.05.14 14:11, Jeff Mincy wrote:
   I just got some spam that was erroneously spf whitelisted hitting
   WHITELIST_FROM_SPF.
   It took me a while to figure out why it was getting WHITELIST_FROM_SPF
   but I eventually tracked it down to this whitelist entry:
  whitelist_from_spf *@*buy.com
   The *@*buy.com (obviously) matches *@odysseyshop.ribsbuy.com.
   
   It would have been easier to figure out why it was matching if the
   matching spf entry was printed out, for example something like this:
   
   May  8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com matches ^.*\@.*buy\.com$ entry
   May  8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com is in user's WHITELIST_FROM_SPF and passed SPF check
   
   According to the documentation, they are not regexp's (as one could/should
   expect):
   
Whitelist and blacklist addresses are now file-glob-style patterns,
   
The matching whitelist_from_spf entry *@*buy.com is a file glob pattern
which matched.  I'm not sure why you are quoting the manual here.  The
whitelist entry *@*buy.com is turned into a regexp by add_to_addrlist
in SpamAssassin/Conf/Parser.pm which among other things does s/\*+/\.\*/g
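
Roughly, the rewrite looks like this in Python. This is an approximation of the steps described above, not a copy of add_to_addrlist; `glob_to_regex` is a made-up name and the test address is invented:

```python
import re

def glob_to_regex(glob):
    """Approximate the glob-to-regexp rewrite described above."""
    # Escape everything except *, ?, underscore, and alphanumerics.
    escaped = re.sub(r"([^*?_a-zA-Z0-9])", r"\\\1", glob)
    escaped = escaped.replace("?", ".")        # ? matches one character
    escaped = re.sub(r"\*+", ".*", escaped)    # * matches any string
    return "^" + escaped + "$"

pattern = glob_to_regex("*@*buy.com")
print(pattern)
print(bool(re.search(pattern, "someone@odysseyshop.ribsbuy.com", re.I)))
print(bool(re.search(glob_to_regex("*@buy.com"), "someone@odysseyshop.ribsbuy.com", re.I)))
```

The second `*` is what lets ribsbuy.com through; a glob without it only matches the exact domain.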


   sub _wlcheck {
     my ($self, $scanner, $param) = @_;
     if (defined ($scanner->{conf}->{$param}->{$scanner->{sender}})) {
       return 1;
     } else {
       study $scanner->{sender};
       foreach my $regexp (values %{$scanner->{conf}->{$param}}) {
         if ($scanner->{sender} =~ qr/$regexp/i) {
           ##New dbg output here:
           dbg("spf: $param: $scanner->{sender} matches $regexp entry");
           return 1;
   
   I assume the contents of *_networks is modified before RE matching, so you'd
   wonder what is the content...

Ok, you lost me.  What does the contents of *_networks have to do with
the suggestion to print the matching whitelist regexp entry?  Nothing
matching *buy.com has been added to *_networks if that is what you are
wondering.

-jeff


whitelist_from_spf dbg

2014-05-17 Thread Jeff Mincy


I just got some spam that was erroneously spf whitelisted hitting
WHITELIST_FROM_SPF.
It took me a while to figure out why it was getting WHITELIST_FROM_SPF
but I eventually tracked it down to this whitelist entry:
   whitelist_from_spf *@*buy.com
The *@*buy.com (obviously) matches *@odysseyshop.ribsbuy.com.   

It would have been easier to figure out why it was matching if the
matching spf entry was printed out, for example something like this:

May  8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com matches ^.*\@.*buy\.com$ entry
May  8 18:21:27.859 [22058] dbg: spf: whitelist_from_spf: amandarodriq...@odysseyshop.ribsbuy.com is in user's WHITELIST_FROM_SPF and passed SPF check

sub _wlcheck {
  my ($self, $scanner, $param) = @_;
  if (defined ($scanner->{conf}->{$param}->{$scanner->{sender}})) {
    return 1;
  } else {
    study $scanner->{sender};
    foreach my $regexp (values %{$scanner->{conf}->{$param}}) {
      if ($scanner->{sender} =~ qr/$regexp/i) {
        ##New dbg output here:
        dbg("spf: $param: $scanner->{sender} matches $regexp entry");
        return 1;
      }
    }
  }
  return 0;
}

-jeff


Re: help with regex

2014-02-26 Thread Jeff Mincy
   From: Kevin A. McGrail kmcgr...@pccc.com
   Date: Wed, 26 Feb 2014 19:06:34 -0500
   
   On 2/26/2014 6:53 PM, Webmaster wrote:
I need a regex to match an alphanumeric string with letters and numbers.
   
example:  48HQZBF404TY2298D1414BB8050022YQ3872444
   
The pattern is defined as:
   
A sequence of alphanumeric characters, letters are upper or lower 
case, at least 30 chars long, containing at least 10 numbers.
   
This part is easy enough:  [a-zA-Z0-9]{30,}
   
But I can't figure out how to match only if the string contains at
least 10 numbers.
   Hmm, I think you might need a plugin for that one.

Can't you do something like this using a look ahead regexp?

(?=[A-Z0-9]{30,})(?:[A-Z]*[0-9]){10,}

The look ahead gets the 30 chars.   Then the next part gets the 10 or
more numbers.   You probably don't need unbounded {10,} but you do need
the {30,} part to be unbounded.
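
Here is the pattern tried out in Python against the example string and two strings that should not match. The case-insensitive flag is my assumption to cover lower-case letters, and `looks_like_token` is a made-up helper:

```python
import re

# The pattern from above; the case-insensitive flag covers lower-case letters.
TOKEN = re.compile(r"(?=[A-Z0-9]{30,})(?:[A-Z]*[0-9]){10,}", re.IGNORECASE)

def looks_like_token(text):
    """True if text contains a 30+ character alphanumeric run with 10+ digits."""
    return TOKEN.search(text) is not None

print(looks_like_token("48HQZBF404TY2298D1414BB8050022YQ3872444"))
print(looks_like_token("A" * 30 + "123"))
print(looks_like_token("A1B2C3D4E5F6G7H8I9J0"))
```

The second string is long enough but has too few digits, and the third has ten digits but is too short, so the two conditions are checked independently.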

Is the 10 number part really important?

-jeff


Re: re-learning ? was - bayes - large message

2013-04-20 Thread Jeff Mincy
   From: Joe Acquisto-j4 j...@j4computers.com
   Date: Sat, 20 Apr 2013 09:10:26 -0400
   
On 4/19/2013 at 8:33 PM, Joe Acquisto-j4 j...@j4computers.com wrote:
On 4/19/2013 at 8:26 PM, Joe Acquisto-j4 j...@j4computers.com wrote:
I thought I had corrected this issue, with someone's assistance, a while 
ago:

Apr 19 20:21:02.477 [23670] dbg: bayes: expiry completed
Apr 19 20:21:02.477 [23670] info: archive-iterator: skipping large message
Learned tokens from 0 message(s) (0 message(s) examined)

Please ignore.  As much as possible.   I was testing manually and forgot
--mbox on the command line.

However, I can see something is amiss as it is happily accepting spam I
thought had been previously submitted.

joe a.
   
   Ok, I am officially puzzled.   
   
   I set up email addresses on my SA box, to which I and others (they say) send
   ham/spam.  Then I have cron tasks that feed those emails twice daily to bayes
   and email the output to my admin mailbox.
   
   I can review those admin messages and see "Learned tokens from n message(s)
   (n message(s) examined)".  Yet, if I resend the bayes food from those dates,
   it appears to re-learn them.  I would expect "Learned tokens from 0
   message(s) (n message(s) . . ." if it had already seen them.
   
   I have tried this for several dates and get the same result.  What could it
   be?  Not Operator Trouble, surely . . .
   
   joe a

Bayes uses the message id from the email message to remember which
messages it has seen.  If you are really emailing the messages then
you are getting a new message-id which is then learned.  You need to
train on the unadulterated original email message.  You can do this by
attaching the complete email message.  Otherwise you are training
bayes to recognize tokens added by your users during the forwarding
process as a spam indicator.
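
A simplified Python sketch of why the forwarded copies get learned again. It assumes the Message-ID header is the only key used to remember a message, which is a simplification of what Bayes really stores; `SeenTracker` and the sample messages are made up:

```python
from email import message_from_string

class SeenTracker:
    """Remember learned messages by Message-ID, as described above."""

    def __init__(self):
        self.seen = set()

    def learn(self, raw_message):
        """Return True if the message was newly learned, False if already seen."""
        msgid = message_from_string(raw_message)["Message-ID"]
        if msgid in self.seen:
            return False
        self.seen.add(msgid)
        return True

original = "Message-ID: <abc@sender.example>\nSubject: offer\n\nbuy now\n"
# Forwarding creates a new message with its own Message-ID and extra text.
forwarded = "Message-ID: <xyz@user.example>\nSubject: Fwd: offer\n\nFYI\nbuy now\n"

tracker = SeenTracker()
print(tracker.learn(original))   # first time: learned
print(tracker.learn(original))   # same Message-ID: skipped
print(tracker.learn(forwarded))  # new Message-ID: learned again
```

Sending the original as an attachment and extracting it before learning keeps the original Message-ID and the original tokens intact.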

-jeff


Re: rdns in received header

2013-02-21 Thread Jeff Mincy
   From: Kevin A. McGrail kmcgr...@pccc.com
   Date: Thu, 21 Feb 2013 08:46:40 -0500
   
   On 2/20/2013 8:51 PM, Jeff Mincy wrote:
...
   
This leads to various bad things (RDNS_NONE, broken WHITELIST_FROM_RCVD).
   
Is there anything in SpamAssassin that can deal more elegantly with
this particular problem?  Perhaps some sort of please_fill_in_rcvd_rdns
type option?

   Off the cuff, the point of trusted networks is to say you trust that 
   network's headers.  However, in this case, you don't... I don't really 
   know a fix for this because we have enough issues parsing received 
   headers, let alone re-writing them.

Well, I trust the network not to lie.  This is more of an omission.

   How good is your perl and maybe you can solve it in MIMEDefang before 
   it's sent to SA?

Yea, I expected this was going to be the answer.   It would have to be
a procmail filter that calls out to a script.  Yuck.

Thanks for confirming my suspicion.

I could always whine to Rcn about it, maybe they'll fix it.

-jeff 


Re: rdns in received header

2013-02-21 Thread Jeff Mincy
   From: Kevin A. McGrail kmcgr...@pccc.com
   Date: Thu, 21 Feb 2013 11:07:20 -0500
   
   On 2/21/2013 10:36 AM, Matus UHLAR - fantomas wrote:
And how is this ISP's issue related to RFCs? The RFC does not mention the
word "trusted".
   A fair point that I didn't explain clearly enough.
   
   The RFCs cover received headers for SMTP and RFCs strive to be black and 
   white.  Discussing things as gray area is an argument that Bill Clinton 
   was famous for but doesn't really hold a place in discussing technology 
   covered by

Which RFC talks about Received headers having rDNS or what information
is supposed to be in the received header?
   
   The point of SA's trusted configuration is that you trust the 
   headers.  In this case, he's saying he doesn't trust the headers because 
   they are omitting important information but that they aren't lying, just 
   lying by omission.   To me, this says I can't trust those headers 
   and you need to pull back your trust circle which in this case will ruin 
   much of the rules SA uses for pathway analysis (RBLs, rDNS, etc.)
   
   Fixing those headers outside SA or fixing the ISP creating those headers 
   are the real solutions.

There is of course a third option for me - I could turn off the spam
filtering on Rcn email.  Most of the spam is blocked by Rcn, there's
almost no point in trying to filter what little spam is left.


-jeff


Re: rdns in received header

2013-02-21 Thread Jeff Mincy
   From: Matus UHLAR - fantomas uh...@fantomas.sk
   Date: Thu, 21 Feb 2013 16:36:18 +0100
   
   On 2/21/2013 9:03 AM, Jeff Mincy wrote:
   Well, I trust the network not to lie.  This is more of an omission
   
   On 21.02.13 10:26, Kevin A. McGrail wrote:
   Your Clinton-esque logic likely doesn't apply here ;-).  The land of 
   RFC's works to avoid this type of logic in a language I call 
   RFC-eeze.
   
   as long as I understand Jeff's original mail, the issue is that his ISP
   stopped providing DNS information in the Received: headers.
   SA does not do lookups on the IPs in Received: (there's iirc one exemption
   related to a buggy software) and if it's not there, it assumes the rDNS does
   not exist, while it does. 

Actually the ISP added a completely new hop, and that hop is not
adding rDNS to the received header.   I had to add the new hop to
trusted_networks and internal_networks.   The new hop looks like it
is scanning the messages using Cloudmark:
 X_CMAE_Category: ...
 X-CNFS-Analysis: ...
 X-CM-Score: ...
 X-Scanned-by: Cloudmark Authority Engine
   
   
   I could always whine to Rcn about it, maybe they'll fix it.
   
   I think that's a good move to at least try!  It truly sounds more 
   like a DNS error that they might not be aware is occurring.
   
   if the error repeats, I assume Jeff's guess is correct and the ISP just
   turned rDNS lookups off.

Or neglected to turn on the lookups in the first place...

-jeff


rdns in received header

2013-02-20 Thread Jeff Mincy


My local ISP (rcn.com) reconfigured their email servers.  The
69.168.97.77 hop does not seem to be doing rdns lookups on the
previous hop.  For example, I get these two received headers at the
trust boundary:

...
Received: from mx.rcn.com ([69.168.97.77])
  by mx06.atw.mail.rcn.net with ESMTP; 20 Feb 2013 17:07:22 -0500
...trust/internal boundary...
Received: from [216.33.63.216] ([216.33.63.216:56326] helo=bigfootinteractive.com)
  by mx.rcn.com (envelope-from 1709130a2layfovcia3kqqzqabnxydzhs2jc2h4yaa...@mail.ameriprise.com)
  (ecelerity 2.2.3.49 r(42060/42061)) with ESMTP
  id 29/DB-26250-A1945215; Wed, 20 Feb 2013 17:07:22 -0500
...

and the relays are parsed as

  X-Spam-Relay:
   Trusted= ...[ ip=69.168.97.77 rdns=mx.rcn.com helo=mx.rcn.com by=mx06.atw.mail.rcn.net ident= envfrom= intl=1 id= auth= msa=0 ]
   Untrusted=[ ip=216.33.63.216 rdns= helo=bigfootinteractive.com by=mx.rcn.com ident= envfrom=1709130a2layfovcia3kqqzqabnxydzhs2jc2h4yaa...@mail.ameriprise.com intl=0 id=29/DB-26250-A1945215 auth= msa=0 ] ...
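
To spot the affected hops in bulk, the relay metadata can be picked apart with a few lines of Python. This is a sketch that assumes the `[ key=value ... ]` layout shown above; `parse_relays` is a hypothetical helper and the sample entry is a shortened copy of the untrusted relay:

```python
import re

def parse_relays(metadata):
    """Turn '[ key=value ... ]' relay entries into a list of dicts."""
    relays = []
    for entry in re.findall(r"\[([^\]]*)\]", metadata):
        relays.append(dict(re.findall(r"(\w+)=(\S*)", entry)))
    return relays

# Shortened copy of the untrusted entry above; the envfrom field is left out.
untrusted = ("[ ip=216.33.63.216 rdns= helo=bigfootinteractive.com "
             "by=mx.rcn.com ident= intl=0 id=29/DB-26250-A1945215 auth= msa=0 ]")
for relay in parse_relays(untrusted):
    if relay.get("rdns") == "":
        print("no rDNS recorded for", relay["ip"])
```

An empty rdns= field on the first untrusted relay is exactly the condition that ends up as RDNS_NONE.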


This leads to various bad things (RDNS_NONE, broken WHITELIST_FROM_RCVD).

Is there anything in SpamAssassin that can deal more elegantly with
this particular problem?  Perhaps some sort of please_fill_in_rcvd_rdns
type option?

I'm still on 3.2.5 (yes I know it is old).

-jeff


Re: X-Relay-Countries

2013-02-12 Thread Jeff Mincy
   From: Mike Grau m.g...@kcc.state.ks.us
   Date: Tue, 12 Feb 2013 14:18:33 -0600
   
Hmm  I would do something like this (untested):

header RELAY_NOT_US X-Relay-Countries =~ /\b(?!US)[A-Z]{2}\b/
   
   I've had to use, IIRC.
   X-Relay-Countries =~ /\b(?!US|XX)([A-Z]{2})\b/

XX means unknown, mostly due to stale database.  You can update the
IP::Country database.  See:
   http://wiki.apache.org/spamassassin/RelayCountryPlugin
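
The two patterns behave like this in Python. The header values are made up, assuming X-Relay-Countries holds space-separated two-letter codes:

```python
import re

NOT_US = re.compile(r"\b(?!US)[A-Z]{2}\b")
NOT_US_OR_UNKNOWN = re.compile(r"\b(?!US|XX)([A-Z]{2})\b")

# Example header values are made up; XX marks a relay with an unknown country.
print(bool(NOT_US.search("US US")))             # all relays in the US
print(bool(NOT_US.search("US CN")))             # one foreign relay
print(bool(NOT_US.search("XX US")))             # XX counts as foreign here
print(bool(NOT_US_OR_UNKNOWN.search("XX US")))  # XX is ignored
```

Without the XX exclusion, every relay missing from a stale database would trip the rule.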

-jeff


Re: BAYES_00

2012-10-06 Thread Jeff Mincy
   From: Arthur Dent misc.li...@blueyonder.co.uk
   Date: Sat, 06 Oct 2012 11:03:18 +0100
   
   Hello all,
   
   Following a hard drive crash I am rebuilding my small home server on a
   Fedora17 platform.
   
   One of the casualties of the HD crash was my spam corpus. I had a (very
   old) backup which happened to include a previous spam corpus so I used
   that to sa-learn.
   
   All my messages hit BAYES_00. 
   
   I don't have many fresh spams. I do not run a SMTP server, I simply
   collect mail for my family and myself from my ISP and other sources
   using fetchmail. My ISP seem to filter most of the really bad stuff so I
   get just a trickle of spams (about 1 per day - if that) but even those
   hit BAYES_00 despite sometimes being identical to a previous FN that had
   already been learned with sa-learn.
   
   Here is my --dump magic: ...
   
   What - if anything - can I do to improve bayes performance?

Get more spam?  Bayes really isn't going to do well with a limited
amount of spam.  It does great when correctly trained using lots of
spam.  But with limited data, not so much.

You could try starting over.  It will take 6 months or so to get to
200 spam messages if you are really getting about 1 per day.  Or you
could just turn Bayes off.  I'm almost at the same point with my home
email, for the same reason.

-jeff


Re: Very spammy messages yield BAYES_00 (-1.9)

2012-08-15 Thread Jeff Mincy
   From: Ben Johnson b...@indietorrent.org
   Date: Wed, 15 Aug 2012 13:36:08 -0400
   
   Some 99% of the spam that I receive, which is grossly spammy (we're
   talking auto loans, cash advances, dink pills, the whole lot) contains
   BAYES_00=-1.9 in the tests portion of the X-Spam-Status header.
   
   Might anyone know why? This is a stock installation (Ubuntu package on
   10.04).
   
Most likely you've let autolearn learn a large number of spam messages
as ham.  Any autolearn mistakes need to be corrected.

One or two spam messages with BAYES_00 is not a problem, but a large
number of them indicates a serious problem with learning.   If you
have the old spam messages then you can retrain correctly.  Otherwise
it would probably be best to start over by deleting the bayes database.

   local.cf contains
   
   #   Bayesian classifier auto-learning (default: 1)
   #
   # bayes_auto_learn 1
   
   and I have not overridden the default elsewhere. So, presumably,
   auto-learning is enabled (if that's even relevant).
   
   While I have not trained the Bayesian filter manually to date, how is it
   that the spammiest of the spam is being classified with BAYES_00
   (thereby receiving the score -1.9)? Doesn't BAYES_00 imply that the
   message is almost certainly not spam?

Yes, BAYES_00 says the spam probability is between 0 and 1%.
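
To make the bucket idea concrete, here is a small Python sketch that maps
a Bayes probability to a BAYES_xx rule name.  Only the BAYES_00 range
(0 to 1%) is stated in this message; the other boundaries are what the
stock rules are believed to use and should be treated as an assumption
(check 23_bayes.cf on your own install).  How the endpoints are handled
is also an assumption.

```python
# (rule name, lower bound, upper bound) for the Bayes spam probability.
# Only BAYES_00 is confirmed by this thread; the rest are assumed stock
# values and should be checked against your own 23_bayes.cf.
BAYES_BUCKETS = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
]


def bayes_rule(probability):
    """Return the bucket name for a probability in the range 0.0 to 1.0."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for name, low, high in BAYES_BUCKETS:
        # Assumption: half-open buckets, with 1.0 folded into the last one.
        if low <= probability < high:
            return name
    return BAYES_BUCKETS[-1][0]


print(bayes_rule(0.003))  # a message Bayes considers almost certainly ham
```

Spam that lands in the lowest bucket means the database has learned
spammy tokens as ham, which is the mistraining described above.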

   http://forums.eukhost.com/f38/problems-spamassassin-bayes-filter-16948/
   
   Outside of the above forum post, search query results for this issue are
   scant.

There have been numerous posts on BAYES.

-jeff


Re: USER_IN_WHITELIST and SPF_FAIL

2012-06-19 Thread Jeff Mincy
   From: John Hardin jhar...@impsec.org
   Date: Tue, 19 Jun 2012 14:44:29 -0700 (PDT)
   
   On Tue, 19 Jun 2012, Benny Pedersen wrote:
   
Den 2012-06-19 22:39, Kevin A. McGrail skrev:
   
 I think that's the concept behind the whitelist_from_spf
   
but some use whitelist_from, its nothing new there :=)
   
can user_in_whitelist be changed to not have -100 as default score, or is 
whitelist_from planned for removements ?
   
   It's needed for when none of the other more-strict whitelist options will 
   work, so we can't just get rid of it.
   
True.

   I'd suggest instead a lint warning if it is used, alerting the admin that 
   it's discouraged and that it has problems like this and is very easy to 
   spoof.
   
How about creating a different score for whitelist_from that is
separate from whitelist_from_rcvd?   For example, whitelist_from could
trigger USER_IN_SIMPLE_WHITELIST (or some other variation).   The
description of the test could include warnings about how easy
it is to spoof whitelist_from.

-jeff


Re: USER_IN_WHITELIST and SPF_FAIL

2012-06-19 Thread Jeff Mincy
   From: RW rwmailli...@googlemail.com
   Date: Tue, 19 Jun 2012 23:43:57 +0100
   
   On Tue, 19 Jun 2012 18:02:28 -0400
   Jeff Mincy wrote:
   
   From: John Hardin jhar...@impsec.org
   Date: Tue, 19 Jun 2012 14:44:29 -0700 (PDT)
   
   On Tue, 19 Jun 2012, Benny Pedersen wrote:
   
Den 2012-06-19 22:39, Kevin A. McGrail skrev:
   
 I think that's the concept behind the whitelist_from_spf
   
but some use whitelist_from, its nothing new there :=)
   
can user_in_whitelist be changed to not have -100 as default
score, or is whitelist_from planned for removements ?
   
   It's needed for when none of the other more-strict whitelist
options will work, so we can't just get rid of it.
   
True.

   I'd suggest instead a lint warning if it is used, alerting the
admin that it's discouraged and that it has problems like this and is
very easy to spoof.
   
How about creating a different score for whitelist_from that is
separate from whitelist_from_rcvd?   For example, whitelist_from could
trigger USER_IN_SIMPLE_WHITELIST (or some other variation).   The
description of the test could include warnings about how easy
it is to spoof whitelist_from.
   
   If used sensibly USER_IN_WHITELIST is probably the most reliable rule we
   have, for the overwhelming majority of addresses it's far more accurate
   than spf based whitelisting. It's not always right to treat users as
   idiots.

Huh?  What do you mean by "used sensibly"?  whitelist_from_rcvd is very
reliable.  whitelist_from is trivial to spoof.  whitelist_from_rcvd
and whitelist_from both trigger USER_IN_WHITELIST.

It is easy to get into trouble using whitelist_from - having a
separate score just for whitelist_from would make identifying the
problem easier for the user.

-jeff


Re: Whitelisting with DKIM

2011-10-31 Thread Jeff Mincy
   From: Alex mysqlstud...@gmail.com
   Date: Mon, 31 Oct 2011 12:18:33 -0400
   I have a fedora15 system with sa-3.3.2 and amavisd-2.6.6 and would
   like to whitelist messages like these:
   
   Oct 31 11:19:42 mail02 amavis[3518]: (03518-01-20) SPAM-TAG,
   esc1108418484939_1103604989289_9473_...@in.constantcontact.com -
   50...@example.com, No, score=-4.555 tagged_above=-100 required=5
   tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_IMAGE_RATIO_04=0.61,
   HTML_MESSAGE=0.001, KHOP_RCVD_TRUST=-1.75, LOC_SHORT=0.6,
   
   I've enabled dkim in amavisd.conf:
   
   $enable_dkim_verification = 1;  # enable DKIM signatures verification
   $enable_dkim_signing = 1;       # load DKIM signing code, keys defined by dkim_key
   
...

   Oct 31 11:29:04.733 [7571] info: rules: meta test L_UNVERIFIED_GMAIL
   has dependency 'DKIM_VERIFIED' with a zero score
   Oct 31 11:29:04.837 [7571] dbg: check:
   
tests=DKIM_SIGNED,DKIM_VALID,HTML_IMAGE_RATIO_04,HTML_MESSAGE,KHOP_RCVD_TRUST,LOC_SHORT,RCVD_IN_DNSWL_NONE,RCVD_IN_HOSTKARMA_W,RCVD_IN_HOSTKARMA_WL,RCVD_IN_IADB_DK,RCVD_IN_IADB_LISTED,RCVD_IN_IADB_OPTIN,RCVD_IN_IADB_RDNS,RCVD_IN_IADB_SPF,RCVD_IN_UCEPROTECT2,RELAYCOUNTRY_US,RP_MATCHES_RCVD,T_REMOTE_IMAGE,URIBL_GREY
   
   Why does DKIM_VERIFIED have a zero score in 50_scores.cf?

Anybody, including spammers, can do DKIM.  You could give it
a small negative score like -0.5 or so.
   
   I've added the following entries to local.cf, but I suspect this is
   what I'm doing wrong. I don't mean to whitelist all of constant
   contact.
   
   whitelist_from_dkim *@in.constantcontact.com
   whitelist_from_dkim *@bertolini-sales.com
   
   There is a copy of the full message here:
   
   http://pastebin.com/raw.php?i=pmyFn9f9
   
   Thanks so much for any ideas.
   Alex

I think you want 
  whitelist_from_dkim *@bertolini-sales.com  auth.ccsend.com

The auth.ccsend.com comes from the signature line
  DKIM-Signature: ... d=auth.ccsend.com
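
As a rough illustration of where the second argument comes from, the
Python sketch below pulls the d= tag out of a DKIM-Signature header value
and builds the matching whitelist_from_dkim line.  The sample signature is
a shortened, hypothetical one (only d=auth.ccsend.com is taken from the
message above), and the helper names are made up for this example.

```python
import re


def signing_domain(dkim_signature):
    """Extract the d= tag (signing domain) from a DKIM-Signature value."""
    match = re.search(r"(?:^|;)\s*d=([^;\s]+)", dkim_signature)
    return match.group(1) if match else None


def whitelist_line(from_pattern, dkim_signature):
    """Build a whitelist_from_dkim line for the given From pattern."""
    domain = signing_domain(dkim_signature)
    if domain is None:
        raise ValueError("no d= tag found in signature")
    return f"whitelist_from_dkim {from_pattern} {domain}"


# Hypothetical, shortened signature; only the d= value comes from the thread.
sample = "v=1; a=rsa-sha1; c=relaxed/relaxed; d=auth.ccsend.com; h=from:to"
print(whitelist_line("*@bertolini-sales.com", sample))
```

The point is that the whitelist entry has to name the domain that signed
the message, which need not be the domain in the From address.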

-jeff


Disposition deleted

2011-08-08 Thread Jeff Mincy


Can somebody clue me in on how to match 'Disposition: 
automatic-action/MDN-sent-automatically; deleted'
in a disposition-notification mime attachment?

   --_=_NextPart_001_01CC55E0.440F392C
   Content-Type: message/disposition-notification
   Content-Transfer-Encoding: 7bit

   Final-Recipient: RFC822; kathy.du...@ca.com
   Disposition: automatic-action/MDN-sent-automatically; deleted
   X-MSExch-Correlation-Key: 1CORJJTUYkSeBj5kXwFqLQ==

   --_=_NextPart_001_01CC55E0.440F392C--

I've tried body, rawbody and mimeheader without success:
   mimeheader LOCAL_AUTOMATIC_ACTION Disposition =~ /automatic-action\/MDN-sent-automatically; deleted/

This appears to be some new MS Exchange bounce message.

I'm running 3.2.5 if it matters.

thanks.  
-jeff


RE: SA and Spear Phishing

2011-03-18 Thread Jeff Mincy
   From: Hamad Ali crownco...@hotmail.com
   Date: Sat, 19 Mar 2011 00:46:08 +0400
   
   ## back on topic ##
   Anyway, I would highly appreciate any help on spear phishing. A solution, a 
guess, or just if you know whether you get spear phish at all is good 
information for me (I started to think that 99% of mail admins never know that 
they get spear phish because of the extremely high success rate of spear phish).
   PS: Spear Phishing is a problem that I noticed many commercial 
appliances struggle at. This thread is not meant to promote or demote SA, but 
to address a cutting-edge problem that many software classifiers fail to 
address.
   --H

Either I haven't gotten any spear phishing spam, or the spear phishing
spam is being blocked by SpamAssassin.  I'll assume the latter.

If there's some particular type of email that you're having trouble with
the easiest way to get help is to post a complete sample including all
the headers using some pastebin and send the link and the x-spam-status
line that you get on your SpamAssassin to the group.

Otherwise all you're going to get is vague platitudes like "train Bayes".

-jeff


Re: new rules - where do i activate them?

2011-03-02 Thread Jeff Mincy
   From: John Hardin jhar...@impsec.org
   Date: Wed, 2 Mar 2011 07:50:38 -0800 (PST)
   
   On Wed, 2 Mar 2011, tr_ust wrote:
   
   
This is what my rules look like now:
   
uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/index\/form1.html/
score LOCAL_URI_EXAMPLE 200
uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/nana\/form1.html/
score LOCAL_URI_EXAMPLE 100
uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/ontokoros\/form1.html/
score LOCAL_URI_EXAMPLE 100
uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/tbt\/form1.html/
score LOCAL_URI_EXAMPLE 200
uri LOCAL_URI_EXAMPLE /zynetsw.com\/forms\/use\/webadmin\/form1.html/
score LOCAL_URI_EXAMPLE 200
   
I took out the last / as you suggested...thanks.
   
   You may also want to escape the periods so they are literal matches rather 
   than match any single character:
   
  uri LOCAL_URI_EXAMPLE /zynetsw\.com\/forms\/use\/webadmin\/form1\.html/
   
   Also, you only have one rule there. Every time you put in another uri 
   LOCAL_URI_EXAMPLE you overwrite the previous definition. Change the name 
   of each rule, for example by appending _00 _01 _02, etc.
   
Also, the rules could be combined into a single rule (untested) using
the regexp alternation (?:index|nana|ontokoros|tbt|webadmin):

uri LOCAL_URI_EXAMPLE /zynetsw\.com\/forms\/use\/(?:index|nana|ontokoros|tbt|webadmin)\/form1\.html/
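
The alternation can be tried out before it goes into local.cf.  This
Python sketch uses the re module as a stand-in for Perl's regexes, with
the periods escaped as suggested earlier in the thread; the http://
prefix on the test URLs is an assumption.

```python
import re

# One pattern covering all five paths; the dots are escaped so they only
# match a literal period.
COMBINED = re.compile(
    r"zynetsw\.com/forms/use/(?:index|nana|ontokoros|tbt|webadmin)/form1\.html"
)


def hits(url):
    """Return True if the combined pattern matches anywhere in the URL."""
    return COMBINED.search(url) is not None


# Test URLs built from the five directory names in the original rules.
for name in ("index", "nana", "ontokoros", "tbt", "webadmin", "other"):
    url = f"http://zynetsw.com/forms/use/{name}/form1.html"
    print(url, "->", hits(url))
```

A single rule also avoids the problem of five definitions sharing one
name, where each definition overwrites the previous one.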


-jeff


Re: Trouble whitelisting domain users with whitelist_from_rcvd

2010-07-28 Thread Jeff Mincy
   From: keithcommins keith.comm...@windmilllane.com
   Date: Wed, 28 Jul 2010 07:57:43 -0700 (PDT)
   
   Hi there , 
   
   Having some trouble getting this to work correctly , it would seem..
   
   Firstly,  here is my whitelist_from rcvd config from my local.cf file.
   
You can't use whitelist_from_rcvd on internal email.   You don't have
an external relay to match against.   It doesn't matter if your
machine ends in .local or not.

Note the FH_DATE_PAST_20XX.   You probably need to run sa-update sometime
this year.

The ALL_TRUSTED should be enough by itself.   If you need to have a
separate whitelisting you could try something like the following:

meta   __TRUSTED_NETWORKS  (NO_RELAYS || ALL_TRUSTED)
header __LOCAL_SENDER      From =~ /\...@mydomain\.com/i
meta   FORGED_LOCAL_SENDER (__LOCAL_SENDER && !__TRUSTED_NETWORKS)
score  FORGED_LOCAL_SENDER 0.1
meta   VALID_LOCAL_SENDER  (__LOCAL_SENDER && __TRUSTED_NETWORKS)
score  VALID_LOCAL_SENDER  -0.1
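
The logic of those meta rules can be written out as a small truth table.
This is a sketch in Python, assuming the two meta rules are meant as
"local sender AND NOT trusted" and "local sender AND trusted"; the
function name is made up for the example.

```python
def evaluate(local_sender, no_relays, all_trusted):
    """Mirror the meta rules: which of the two local-sender rules would hit."""
    trusted = no_relays or all_trusted  # __TRUSTED_NETWORKS
    return {
        "FORGED_LOCAL_SENDER": local_sender and not trusted,
        "VALID_LOCAL_SENDER": local_sender and trusted,
    }


# Walk through every combination of the three inputs.
for local in (False, True):
    for no_relays in (False, True):
        for all_trusted in (False, True):
            print(local, no_relays, all_trusted,
                  evaluate(local, no_relays, all_trusted))
```

At most one of the two rules can hit for a given message, and neither
hits when the From address is not in the local domain.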

-jeff


   whitelist_from_rcvd  *...@mydomain.com mydomain.local
   trusted_networks 172.16.1/24 172.16.2/24 172.16.3/24 172.16.5/24 xx.xx.xx.xx
   internal_networks 172.16.1/24 172.16.2/24 172.16.3/24 172.16.5/24
   xx.xx.xx.xx
   
   ( xx.xx.xx.xx represents the outward facing IP of my mail server )
   
   Secondly, below is a header from a test email I sent to myself..
   
   Return-Path: some.u...@mydomain.com
   Received: by mydomain.com (CommuniGate Pro PIPE 5.2.12)
 with PIPE id 18275900; Wed, 28 Jul 2010 11:31:13 +0100
   X-TFF-CGPSA-Version: 1.5
   X-TFF-CGPSA-Filter: Scanned
   X-Spam-DCC: wuwien: mail.mydomain.com 1290; Body=1 Fuz1=2 Fuz2=6
   X-Spam-Checker-Version: SpamAssassin 3.2.5 ( 2008-06-10 ) on
mail.mydomain.com
   X-Spam-Level: ***
   X-Spam-Status: No, score=3.8 required=8.0
   tests=ALL_TRUSTED,FH_DATE_PAST_20XX,
HTML_IMAGE_ONLY_20,HTML_MESSAGE autolearn=no version=3.2.5
   X-Spam-Pyzor: 
   Received: from [172.16.3.150] (account some.user [172.16.3.150] verified)
 by mydomain.com (CommuniGate Pro SMTP 5.2.12)
 with ESMTPA id 18275888 for some.u...@mydomain.com; Wed, 28 Jul 2010
   11:31:04 +0100
   Message-ID: 4c500626.7010...@mydomain.com
   Date: Wed, 28 Jul 2010 11:27:50 +0100
   From: Some User some.u...@mydomain.com
   User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
   MIME-Version: 1.0
   To: Some User some.u...@mydomain.com
   Subject: (no subject)
   Content-Type: multipart/alternative;
boundary=020906000403080006070205
   X-EsetId: 90695D289D6435708F6F5D7C933375
   
   This is a multi-part message in MIME format.
   --020906000403080006070205
   Content-Type: text/plain; charset=ISO-8859-1; format=flowed
   Content-Transfer-Encoding: 7bit
   
   Couple of things to note , we use Active Directory which means the FQDN name
   of all our machines end in *.local rather than *.com. Should the
   whitelist_rcvd reflect this in any way??
   Its my understanding that all mails should get a Spam Assassin score of -100
   or thereabouts , thus permanently whitelisting all our domain users. However
   , as you can see this isn't happening??
   
   Is there anything else I should be doing to whitelist my domain users??
   
   
   Thanks in advance for all your help..
   Keith
   -- 
   View this message in context: 
http://old.nabble.com/Trouble-whitelisting-domain-users-with-whitelist_from_rcvd-tp29287372p29287372.html
   Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
   


Re: flat file bayes locking issue and difference errors depending on file locking method

2010-04-14 Thread Jeff Mincy
   From: R-Elists list...@abbacomm.net
   Date: Wed, 14 Apr 2010 08:43:21 -0700
   
   having spent the better part of a two days searching as well as trying
   different configs and SA restarts

   we do not have a hardware horsepower resource starvation issue
   
   in reference to the error
   
   spamd[30339]: bayes: cannot open bayes databases
   /home/spamd/.spamassassin/bayes_* R/W: lock failed: Interrupted system call

I'd guess that you have a bayes expire running that is either taking
too long or not finishing and leaving lock files around.

Turn off bayes_auto_expire and use bayes_learn_to_journal.
Add a cron job to run sa-learn --sync periodically (say hourly)
and another cron job to run sa-learn --force-expire (daily/weekly).
-jeff


Re: Limit SA to scan messages 100k and below

2010-03-31 Thread Jeff Mincy
   From: Keith De Souza kbdeso...@googlemail.com
   Date: Wed, 31 Mar 2010 14:10:50 +0100
   
   Hi
   
   * You need to change whatever glue you are using to pass messages to SA,
   and skip the scanning for messages larger than your desired threshold.
   
   *Sorry as I'm new to SA can you elaborated what you mean by glue?
   *
   That said, IMHO 100k is rather low. Why do you want that particular
   threshold?*
   
   Judging from your response, I may be wrong in what I need to do:
   
   Basically I'm having a few errors in my Exim logs from legitimate senders
   not coming through:

300 seconds looks like a timeout.   Something is giving up after
waiting 300 seconds.

Note the autolearn=unavailable.   I'd guess that you are getting
locked out from the Bayes database.   You probably had a Bayes expire
running at the same time.   There should be messages about this in a
log file.

If this is the case you can turn off bayes_auto_expire and run expire
from cron.  You could also try learning to the journal and doing
sa-learn --sync periodically from cron.
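
To find out how often this happens, the spamd result lines can be scanned
for long scan times.  The Python sketch below parses the key=value part of
a result line like the one quoted in this message (shortened here); using
300 seconds as the cut-off is an assumption based on this one log entry,
and the function names are made up.

```python
import re


def parse_result(line):
    """Pull scantime, size and autolearn out of a spamd 'result:' log line."""
    fields = dict(re.findall(r"(\w+)=([^,\s]*)", line))
    return {
        "scantime": float(fields["scantime"]),
        "size": int(fields["size"]),
        "autolearn": fields.get("autolearn"),
    }


def looks_like_timeout(info, limit=300.0):
    """Flag scans that ran for at least `limit` seconds (assumed cut-off)."""
    return info["scantime"] >= limit


# Shortened copy of the log line from this thread.
line = ("spamd: result: . -4 - GENESIS_PHONENUMBER07 "
        "scantime=300.0,size=24337,user=nobody,uid=8,"
        "required_score=3.2,autolearn=unavailable")
info = parse_result(line)
print(info, looks_like_timeout(info))
```

Lines that combine a long scantime with autolearn=unavailable are the
ones worth correlating with Bayes expiry runs.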

-jeff

   
   ===
   2010-03-31 01:22:25 1Nwlbc-0001QS-Ua H=
   host81-136-197-86.in-addr.btopenworld.com (mail.duke.tv) [81.136.197.86] F=
   l...@dukeandearl.com temporarily rejected after DATA
   ===
   
   And after checking my SA logs:
   
   ===
   Mar 31 01:25:51 mailserver spamd[5379]: spamd: result: . -4 -
   GENESIS_PHONENUMBER07 *scantime=300.0,size=24337*,
   
user=nobody,uid=8,required_score=3.2,rhost=localhost,raddr=127.0.0.1,rport=42308,mid=
   c7d27527.8a78%l...@dukeandearl.com c7d27527.8a78%25l...@dukeandearl.com
   ,autolearn=unavailable
   ==
   
   I'm trying to understand why it is taking 300.0 seconds to scan a message
   only 24Kb in size??
   I'm beginning to think that because SA is taking so long to scan the
   message, it is timing out
   and hence Exim is returning a temporary reject after DATA.
   
   My thoughts so far are to perhaps reduce the message size that SA will scan
   and see if the scan time reduces.
   I may be wrong in my troubleshooting methods but I'm not sure why this is
   happening at present.
   
   Many Thanks
   
   
   
   
   
   
   2010/3/31 Karsten Bräckelmann guent...@rudersport.de
   
On Wed, 2010-03-31 at 13:24 +0100, Keith De Souza wrote:
 My current sysadmin has now left the company and I'm new to SA and
 Exim. [...]
   
 I've read somewhere that the default setting for SA to scan a message
 is 500k.
   
That's actually the default for spamc. Messages exceeding the threshold
just won't be passed to spamd. SA (and spamd) will check everything it
gets passed.
   
 Can I reduce this, so that SA scans messages 100k and below?
   
You need to change whatever glue you are using to pass messages to SA,
and skip the scanning for messages larger than your desired threshold.
   
That said, IMHO 100k is rather low. Why do you want that particular
threshold?
   
 guenther
   
   
--
char *t=\10pse\0r\0dtu...@ghno
\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8?
c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0;
}}}
   
   


Re: Off Topic - SPF - What a Disaster

2010-02-23 Thread Jeff Mincy
   From: Martin Gregorie mar...@gregorie.org
   Date: Tue, 23 Feb 2010 22:04:07 +
   
   On Tue, 2010-02-23 at 16:17 -0500, Bowie Bailey wrote:
   
The only exception is if you have a strict SPF policy for your own
domain, you can use it to reject spam pretending to be from your users.
   Agreed. That's all I use it for. 

The SPF checks in SpamAssassin will score SPF_FAIL without adding
enough points to block the email by itself.   I'm not ready to
outright block email that fails SPF.

   I installed SPF during a backscatter
   storm, which immediately decreased in volume. Since then the periodic
   backscatter showers have got steadily smaller, so it looks as though
   mailservers configured to check SPF before bouncing undeliverable mail have
   been getting steadily more common. 
   
Either that or spammers tend to avoid forging domains that have SPF.

-jeff


Re: X-Relay-Countries can stick?

2010-02-12 Thread Jeff Mincy
   From: Robert Nicholson robert.nichol...@gmail.com
   Date: Fri, 12 Feb 2010 19:32:00 -0600
   
   Perhaps my confusion lies in the fact that it looks like headers != metadata?
   
   Is there a way or setting that allows metadata to result in headers in the 
message?

Did you try add_header?

ifplugin Mail::SpamAssassin::Plugin::RelayCountry
add_header all Relay-Country _RELAYCOUNTRY_
endif


Re: MTX plugin created (Re: Spam filtering similar to SPF, less breakage)

2010-02-11 Thread Jeff Mincy
   From: Charles Gregory cgreg...@hwcn.org
   Date: Thu, 11 Feb 2010 11:55:10 -0500 (EST)
   
   On Wed, 10 Feb 2010, dar...@chaosreigns.com wrote:
http://www.chaosreigns.com/mtx/
   
   You know, just for a moment I thought I would take a look, just for 
   curiosity sake, and instead got this moronic jack-ass ATTITUDE page.

Heh.  Using IE 7.0 I get:

  Your browser cannot handle the 9 year old standard required by the
  web page you attempted to access. ...

IE 7.0 displays the page fine, but you have to save the file out as a
plain html file.

-jeff


Re: Rules for not passing SPF

2010-02-02 Thread Jeff Mincy
   From: dar...@chaosreigns.com
   Date: Tue, 2 Feb 2010 18:38:20 -0500
   
   On 02/02, Marc Perkel wrote:
Why would you want to catch domains without SPF as SPF has no  
relationship to detecting spam?
   
   SPF is entirely about spam.

Actually, SPF is about forgery and forgery is part of the spam problem.
You can still have genuine spam that passes SPF.  Messages that get
SPF_FAIL are forged spam and can be scored or blocked.

   http://www.openspf.org/Introduction
   
   If everyone uses SPF, all we need to block all spam is these rules
   (SPF_NOT_PASS alone should do it), and a blacklist of domains that have
   SPF records including IPs that send spam.

Good luck.   All you need is to get everybody to use SPF and then have
a very large blacklist of spam sending domains.
http://www.rhyolite.com/anti-spam/you-might-be.html
   
   SPF is easy, there's a wizard http://www.openspf.org/, then you paste
   the results into the DNS TXT record for your domain).

SPF is great for what it does.

-jeff


Re: How should this tricky spam be filtered?

2010-01-30 Thread Jeff Mincy
   From: Kārlis Repsons karlis.reps...@gmail.com
   Date: Sat, 30 Jan 2010 13:35:26 +
   
   People,
   perhaps it's simple to do, but I personally would like to know the ways to
   get rid of something like this:

Use pastebin and save the entire message including the headers instead
of forwarding messages like this.

   --  Forwarded Message  --
   ...
   ---
   
   Obviously, the only useful part of all that was the From: name field.

   SA gives just X-Spam-Status: No, score=-0.7 required=4.0 tests=BAYES_20 
   autolearn=ham version=3.2.5-gr2.
   
   Hopefully a valid question here...

Retrain the message correctly in Bayes.  Bayes will catch on to this
after a few times.  The subject alone should be a strong enough clue
for bayes (I get BAYES_80 on this partial sample), so it looks like
you are doing only autolearn and not correcting messages that were
learned incorrectly.

-jeff


Re: How should this tricky spam be filtered?

2010-01-30 Thread Jeff Mincy
   From: Kārlis Repsons karlis.reps...@gmail.com
   Date: Sat, 30 Jan 2010 14:07:16 +
   
   On Saturday 30 January 2010 13:54:14 Jeff Mincy wrote:
Retrain the message correctly in Bayes.  Bayes will catch on to this
after a few times.  The subject alone should be a strong enough clue
for bayes (I get BAYES_80 on this partial sample), so it looks like
you are doing only autolearn and not correcting messages that were
learned incorrectly.
-jeff
   
I couldn't figure out how to get an unadulterated version of the
message from the spamalyser.com link you posted in a previous message.
I tried this
 wget -O - -q http://spamalyser.com/v/5cbffujq/original.txt
pastebin has a simple way to download the original.
Anyway, I eventually got something.

   Hmm, well, I just started with SA, so my filters aren't much trained yet. 
   The thing is, I didn't believe its the Bayes filter to be used for that 
case! 

Bayes is an incredible tool, but only if you let it.  The worst thing
you can do to bayes is mistrain it by learning spam messages as ham.
The other bad thing is to limit the number of messages that it learns from.

   Because I still think, that its not correct to train SA filter on that 
letter 
   as spam! It can contain words, which simply should not contribute to be more 
   spam, no? Thats not a problem?

No, that is not a problem.
Yes, spam contains words, some of those words will also occur in ham.
Bayes will figure out which words are spammy and which are hammy and
which occur in both.

First start with training Bayes and then check if DCC and network
tests are enabled.

Anyway, I get the following.   
   
BAYES_99,DCC_CHECK,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_FIVETEN_SPAM,RCVD_IN_NIXSPAM,RCVD_IN_UCEPROTECT1,RCVD_IN_UCEPROTECT2,RCVD_IN_UCEPROTECT3,BOTNET,BOTNET_BADDNS

Botnet/FIVETEN/NIXSPAM/UCEPROTECT are additional rules added.

-jeff


Re: How should this tricky spam be filtered?

2010-01-30 Thread Jeff Mincy
   From: Ralph Bornefeld-Ettmann ilike...@bornefeld-ettmann.de
   Date: Sat, 30 Jan 2010 18:14:10 +0100
   
   Am 30.01.2010 16:48, schrieb Jeff Mincy:
   From: Kārlis Repsons karlis.reps...@gmail.com
   Date: Sat, 30 Jan 2010 14:07:16 +
   
   On Saturday 30 January 2010 13:54:14 Jeff Mincy wrote:
Retrain the message correctly in Bayes.  Bayes will catch on to this
after a few times.  The subject alone should be a strong enough clue
for bayes (I get BAYES_80 on this partial sample), so it looks like
you are doing only autolearn and not correcting messages that were
learned incorrectly.
-jeff
   
I couldn't figure out how to get an unadulterated version of the
message from the spamalyser.com link you posted in a previous message.
I tried this
 wget -O - -q http://spamalyser.com/v/5cbffujq/original.txt
pastebin has a simple way to download the original.
Anyway, I eventually got something.

   in the Raw Message tab you can get the plain message
   (http://spamalyser.com/v/5cbffujq/raw)
   
Sorry.   Looks more like html here.

  % wget -O - -q  http://spamalyser.com/v/5cbffujq/raw | head
  <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
  <html lang="en-GB">
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

To get the raw email message, I'd have to write something like
  wget -O - -q http://spamalyser.com/v/5cbffujq/raw | w3m -dump -T text/html
followed by sed scripts to keep only the numbered lines and then strip
the line numbers.

I guess http://spamalyser.com is looking at the User-Agent: Wget/1.10.2
header.

Maybe there could be a really-raw-without-line-numbers-and-no-html target.

-jeff


Re: How should this tricky spam be filtered?

2010-01-30 Thread Jeff Mincy
   From: Kārlis Repsons karlis.reps...@gmail.com
   Date: Sat, 30 Jan 2010 17:20:23 +
   
   On Saturday 30 January 2010 15:48:36 Jeff Mincy wrote:
 BAYES_99,DCC_CHECK,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_FIVETEN_SPAM,RCVD_IN_NIX
SPAM,RCVD_IN_UCEPROTECT1,RCVD_IN_UCEPROTECT2,RCVD_IN_UCEPROTECT3,BOTNET,BOT
NET_BADDNS

Botnet/FIVETEN/NIXSPAM/UCEPROTECT are additional rules added.
-jeff
   
   Thanks, just about DCC: why its said to be not opensource and commented 
out 
   in a spamassassin default config? Are there any closed-source binaries on a 
   client machine from it? Any such binaries related to SA exist?

DCC is a separately managed project with its own license.  DCC has to be
installed and configured (dccproc and dccifd) outside of SpamAssassin.
After DCC is installed then SpamAssassin has to be configured to use DCC
by loading the plugin.  You can install DCC from source or from various
repositories.   Same is true for razor and pyzor.
-jeff


Re: About upgrading

2010-01-11 Thread Jeff Mincy
   From: Alex mysqlstud...@gmail.com
   Date: Sat, 9 Jan 2010 21:13:24 -0500
   
  sa-learn --dump magic gives:
      0.000          0          3          0  non-token data: bayes db version
      0.000          0      57538          0  non-token data: nspam
      0.000          0      74876          0  non-token data: nham
      0.000          0     166338          0  non-token data: ntokens
      0.000          0 1257478501          0  non-token data: oldest atime
      0.000          0 1263049426          0  non-token data: newest atime
      0.000          0 1263049538          0  non-token data: last journal sync atime
      0.000          0 1263044805          0  non-token data: last expiry atime
      0.000          0    5529600          0  non-token data: last expire atime delta
      0.000          0       1868          0  non-token data: last expire reduction count
   
Your database has 166338 tokens which is larger than the default
bayes_expiry_max_db_size of 150000.  The last expiration ran this morning
at 8:46.  You could try letting the bayes database get larger and turn
off bayes_auto_expire.  If you turn off bayes_auto_expire you'll have
to add something to cron to periodically expire tokens.
bayes_auto_expire is fine for lower volumes of email, but can get in
the way with higher volumes.
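
Those numbers can be read straight out of the --dump magic output.  The
Python sketch below parses a few of the lines quoted above, compares
ntokens with 150000 (the default bayes_expiry_max_db_size) and converts
the last expiry atime to a UTC timestamp; the function name is made up
for this example.

```python
from datetime import datetime, timezone

# A few lines copied from the `sa-learn --dump magic` output in this thread.
DUMP = """\
0.000          0     166338          0  non-token data: ntokens
0.000          0 1263044805          0  non-token data: last expiry atime
0.000          0    5529600          0  non-token data: last expire atime delta
"""


def parse_magic(text):
    """Map each 'non-token data' label to its integer value (third column)."""
    values = {}
    for line in text.splitlines():
        head, sep, label = line.partition("non-token data: ")
        if sep:
            values[label.strip()] = int(head.split()[2])
    return values


magic = parse_magic(DUMP)
over_limit = magic["ntokens"] > 150000  # default bayes_expiry_max_db_size
last_expiry = datetime.fromtimestamp(magic["last expiry atime"], tz=timezone.utc)
delta_days = magic["last expire atime delta"] / 86400
print(over_limit, last_expiry.isoformat(), delta_days)
```

The atime values are Unix timestamps, so the UTC result has to be shifted
to local time before comparing it with the time mentioned above.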
   
   Also, what is the drawback with using auto_expire on larger volumes?
   Is it the locking delay and preventing learning new messages during
   that time? If you were to put it in cron to manually do an expiry, how
   often should it be run?
   
You have an exclusive lock when doing expiration.  Expiration presumably
takes longer on larger volumes, but it is still pretty fast.  
Running expiration daily or weekly should be more than sufficient.

   Is there anything that should be tested prior to making this change,
   or is it pretty benign?

Yes - turning off bayes_auto_expire is pretty benign.
You may not need to make this type of change.   The default options
for bayes work fine for lower email volumes.

   I suppose you could take the ntokens value before, and subtract it
   from the after value to see how many tokens were expired, right? It
   would be interesting to see how many tokens are expired on a regular
   basis, but not sure that's very useful, just interesting.

sa-learn tells you how many tokens were deleted when you run --force-expire,
for example:
 expired old bayes database entries in 152 seconds
 1516428 entries kept, 115692 deleted
 token frequency: 1-occurrence tokens: 73.76%
 token frequency: less than 8 occurrences: 16.19%
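
If you want to track this over time, the "entries kept" line is easy to
parse.  A minimal Python sketch, using the sample output above as input;
the function name is made up for this example.

```python
import re


def expiry_stats(output):
    """Return (kept, deleted, fraction deleted) from --force-expire output."""
    match = re.search(r"(\d+) entries kept, (\d+) deleted", output)
    if match is None:
        raise ValueError("no 'entries kept' line found")
    kept, deleted = map(int, match.groups())
    return kept, deleted, deleted / (kept + deleted)


# Sample output copied from the example above.
sample = (
    "expired old bayes database entries in 152 seconds\n"
    "1516428 entries kept, 115692 deleted\n"
)
kept, deleted, fraction = expiry_stats(sample)
print(f"{deleted} of {kept + deleted} tokens expired ({fraction:.1%})")
```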

-jeff


Re: About upgrading

2010-01-09 Thread Jeff Mincy
   From: Cecil Westerhof ce...@decebal.nl
   Date: Sat, 09 Jan 2010 14:39:59 +0100
   
   Cecil Westerhof ce...@decebal.nl writes:
   
I did the upgrade. It took some time and there was a slight problem with
permissions, but it looks like a successful upgrade. I only changed
/dev/null to a real mailbox, because of the 2010 problem. When something
like this happens again I now can recover those e-mails.
   
   I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn takes
   more time with 3.2.5 as it took with 3.0.4. Can this be true?
   
   It is not a problem, because it is done by cron-tab, but I am just
   curious.

You can use spamc -L spam/ham to learn messages.  Spamc -L is faster
than sa-learn.  The spamd daemon needs to be started with --allow-tell.

You can try using bayes_learn_to_journal - and do a separate sa-learn
--sync job in cron.   Learning to the journal is faster.

Also, what is the size of your database?   Maybe you are spending lots
of time doing expires or something.

-jeff


Re: About upgrading

2010-01-09 Thread Jeff Mincy
   From: Cecil Westerhof ce...@decebal.nl
   Date: Sat, 09 Jan 2010 16:24:56 +0100
   
   Jeff Mincy j...@delphioutpost.com writes:
   
   I upgraded from 3.0.4 to 3.2.5. I have the feeling that sa-learn takes
   more time with 3.2.5 as it took with 3.0.4. Can this be true?
   
   It is not a problem, because it is done by cron-tab, but I am just
   curious.
   
You can use spamc -L spam/ham to learn messages.  Spamc -L is faster
than sa-learn.  The spamd daemon needs to be started with
--allow-tell.
   
   That is not really an answer on my question. ;-)

I doubt that bayes learning has slowed down significantly.
I would expect that choice of bayes_store_module, learning to
journal, whether auto expiration runs, and lock contention
matters more than the version.

   But it does not seem to be interesting in my situation.
   First my code has to grow from:
   sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
   to:
   for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
   spamc -L ${typeStr} ${i}
   done
   
   Which is not even enough, because I need to take care of the situation
   that the directory is empty and I need to implement code to show the
   messages delivered by sa-learn.

Oh.  You're learning all of the messages in a directory.  spamc -L is
faster than sa-learn for learning single messages because sa-learn is
a perl script that has to load Mail::SpamAssassin each time.  For a
large directory the slower startup of sa-learn is less of an issue.
sa-learn is fine for doing directories.
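
If you do want to try the spamc route for a whole directory, a minimal
sketch of the loop might look like the one below.  It assumes spamd was
started with --allow-tell.  The learn_dir function and the SPAMC variable
are names made up for this example, and the [ -f ] test is just one way
of coping with an empty directory.

```shell
#!/bin/sh
# SPAMC is a hypothetical override so the loop can be dry-run without spamd.
SPAMC=${SPAMC:-spamc}

# learn_dir TYPE DIR: feed every message file in DIR to "spamc -L TYPE".
learn_dir() {
    learn_type=$1
    dir=$2
    count=0
    for msg in "$dir"/*; do
        # With an empty directory the glob stays literal, so skip non-files.
        [ -f "$msg" ] || continue
        # spamc reads the message on stdin.
        "$SPAMC" -L "$learn_type" < "$msg"
        count=$((count + 1))
    done
    echo "$count message(s) sent to spamc -L $learn_type"
}

# Example call (the path is an assumption, adjust to your Maildir):
# learn_dir spam "$HOME/Maildir/.SpamDir.Spam/cur"
```

spamc prints its own per-message result, so the summary line only counts
how many files were handed over.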

   With a low level of spam it works, but if it becomes bigger, it does not
   work:
   date
   echo ${echoStr}
   sa-learn --${typeStr} ${HOME}/Maildir/.SpamDir.${dirStr}/cur/
   date
   for i in ${HOME}/Maildir/.SpamDir.${dirStr}/cur/*; do
   spamc -L ${typeStr} < ${i}
   done
   echo learned in the new way
   date
   gives:
   za jan  9 16:09:25 CET 2010
   Increase
   Learned tokens from 0 message(s) (45 message(s) examined)
   za jan  9 16:09:40 CET 2010
   learned in the new way
   za jan  9 16:10:00 CET 2010
   
   So sa-learn takes 15 seconds and spamc -L 20 seconds. (And I need more
   code. Beside taking care of an empty directory, I also need to implement
   the feedback given by sa-learn.)
   
You learned tokens from 0 messages and looked at 45 messages.
You've already previously learned from those 45 messages, which is
just timing how fast it can do nothing.

You can try using bayes_learn_to_journal - and do a separate sa-learn
--sync job in cron.   Learning to the journal is faster.
   
   I'll look into that.
   
   
Also, What is the size of your database?   Maybe you are spending lots
of time doing expires or something.
   
   sa-learn --dump magic gives:
   0.000  0          3  0  non-token data: bayes db version
   0.000  0      57538  0  non-token data: nspam
   0.000  0      74876  0  non-token data: nham
   0.000  0     166338  0  non-token data: ntokens
   0.000  0 1257478501  0  non-token data: oldest atime
   0.000  0 1263049426  0  non-token data: newest atime
   0.000  0 1263049538  0  non-token data: last journal sync atime
   0.000  0 1263044805  0  non-token data: last expiry atime
   0.000  0    5529600  0  non-token data: last expire atime delta
   0.000  0       1868  0  non-token data: last expire reduction count
   
Your database has 166338 tokens which is larger than the default
bayes_expiry_max_db_size of 150000.  The last expiration ran this morning
at 8:46.  You could try letting the bayes database get larger and turn
off bayes_auto_expire.  If you turn off bayes_auto_expire you'll have
to add something to cron to periodically expire tokens.
bayes_auto_expire is fine for lower volumes of email, but can get in
the way with higher volumes.
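
A rough sketch of such a cron job, assuming local.cf already has
bayes_auto_expire 0 (and optionally bayes_learn_to_journal 1).  The
bayes_maintenance function, the SA_LEARN variable and the script path in
the crontab comment are made up for this example; --sync and
--force-expire are the stock sa-learn options.

```shell
#!/bin/sh
# SA_LEARN is a hypothetical override, handy for a dry run.
SA_LEARN=${SA_LEARN:-sa-learn}

bayes_maintenance() {
    # Flush the learn journal into the database, then expire old tokens.
    "$SA_LEARN" --sync && "$SA_LEARN" --force-expire
}

# A script meant for cron would end with a call to bayes_maintenance.
# Hypothetical crontab entry, once a night:
#   30 3 * * *  /usr/local/bin/bayes-maintenance.sh
```

Run it as the same user that owns the bayes database.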
-jeff


RE: [sa] Re: FH_DATE_PAST_20XX

2010-01-02 Thread Jeff Mincy
   From: R-Elists list...@abbacomm.net
   Date: Sat, 2 Jan 2010 08:33:42 -0800
   
 
/20[1-9][0-9]/   -- /20[2-9][0-9]/
   

   we changed it to this before the update and still had the issue.
   
   so we changed back to the older version and then zero'd the score.
   
   waitied for the update
   
   after the update, changed the score to a small positive value to re-enable
   yet the rule is still *hitting* for some reason...
   
   since it is a header rule, what should i start looking at to see where the
   issue is coming from?
   
   somewhere in SA? should i enable special logging?
   
   or, should i check the MTA and it's assigns that deal with the header?

The rule is probably also defined in some other file.
Are you using 00_FVGT_File001.cf?  If so check there.

-jeff


RE: [sa] Re: FH_DATE_PAST_20XX

2010-01-01 Thread Jeff Mincy
   From: R-Elists list...@abbacomm.net
   Date: Fri, 1 Jan 2010 15:48:13 -0800
   
Cc: Spamassassin users list
Subject: Re: [sa] Re: FH_DATE_PAST_20XX

Damn -- mea culpa.  When we fixed the bug in SVN trunk in bug 
5852, I should have immediately backported it to the 3.2.x 
sa-update channel when I commited that patch, but I didn't.

It's now fixed in updates, but that won't help the admins 
who've been paged to deal with high FP rates on a holiday.  
:(  Sorry folks...

--j.
   
   what should the new rule look like?
   
   i mean, i get it, and i think i know, and i even tested it and it was still
   failing even after a restarts...
   
   s...

   seriously, i disabled the rule early AM yet when the update came through 4
   or so hours later, i believe it looks exactly the same as when i first
   viewed it early on...

The easiest way to see what is being changed since your last sa-update
is to first run sa-update into a scratch directory under /tmp and then
diff.  The change is trivial but significant...

   root% sa-update -D --updatedir /tmp/updates
   root% diff -r -U 0 /var/lib/spamassassin/3.002005/updates_spamassassin_org /tmp/updates/updates_spamassassin_org

   diff -u -w --minimal -r -U 0 /var/lib/spamassassin/3.002005/updates_spamassassin_org/72_active.cf /tmp/updates/updates_spamassassin_org/72_active.cf
   --- /var/lib/spamassassin/3.002005/updates_spamassassin_org/72_active.cf 2009-07-20 17:01:55.0 -0400
   +++ /tmp/updates/updates_spamassassin_org/72_active.cf   2010-01-01 18:51:10.0 -0500
   @@ -527,7 +527,7 @@
##{ FH_DATE_PAST_20XX
   -header   FH_DATE_PAST_20XX  Date =~ /20[1-9][0-9]/ [if-unset: 2006]
   +header   FH_DATE_PAST_20XX  Date =~ /20[2-9][0-9]/ [if-unset: 2006]
describe FH_DATE_PAST_20XX  The date is grossly in the future.
##} FH_DATE_PAST_20XX
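
To see why the one-character change matters, here is a small sketch that
runs both patterns against a 2010 Date header using grep -E.  The
rule_hits helper and the sample header are made up for the illustration;
SpamAssassin itself uses Perl regexes, but these two patterns behave the
same way in grep.

```shell
#!/bin/sh
# rule_hits REGEX DATE_HEADER: succeed if the regex matches anywhere.
rule_hits() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

old='20[1-9][0-9]'   # pattern before the update: matches 2010-2099
new='20[2-9][0-9]'   # pattern after the update: matches 2020-2099
hdr='Fri, 1 Jan 2010 15:48:13 -0800'

rule_hits "$old" "$hdr" && echo "old pattern hits a 2010 date"
rule_hits "$new" "$hdr" || echo "new pattern does not hit a 2010 date"
```

Note that the new pattern only moves the problem: it starts matching
ordinary dates again in 2020.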


-jeff


Re: dkim whitelisting

2009-12-16 Thread Jeff Mincy
   From: LuKreme krem...@kreme.com
   Date: Wed, 16 Dec 2009 08:23:23 -0700

   I'm adding address book users into the user_prefs files, but without
   the signing domain this is useless and emails for my users are still
   getting tagged up as spam (these in particular score 7-10 points
   without the whitelist). Is there a better way, or do I just have to
   go in and find a DKIM-Signature for each address book entry and then
   parse out the d= field?
   
Yes, you need the d= part.  Note that you should only do this for messages
from domains that are signed and pass DKIM with DKIM_VERIFIED.  Adding
whitelist_from_dkim won't do any good if you don't have DKIM_SIGNED
and DKIM_VERIFIED.

   grep -r ^DKIM-Signature: $HOME/Maildir | awk  '{print $4}' | sed 's/d=//' | sed 's/;//' | sort -u
   
   I dunno, doesn't seem that efficient (oh, and it doesn't work since the d=
   doesn't appear in the same location in all the headers).
   
If you are going to use sed, you need the entire DKIM-Signature header
as one line.  Use formail to extract the header, for example:
  formail -c -x DKIM-Signature:


NAME
   formail - mail (re)formatter

...
   -c   Concatenate continued fields in the header.  Might be convenient
        when postprocessing mail with standard (line oriented) text
        utilities.
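
Once the header is on one line, pulling out d= no longer depends on its
position.  A minimal sketch, assuming the input is what
"formail -c -x DKIM-Signature:" prints (one unfolded header value per
line); dkim_domains is a name made up for this example.

```shell
#!/bin/sh
# dkim_domains: read unfolded DKIM-Signature values on stdin and print
# the value of the d= tag from each one.
dkim_domains() {
    # Put every tag on its own line, then keep only the d= tag.
    tr ';' '\n' |
        sed -n 's/^[[:space:]]*d[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p'
}

# Possible use over a Maildir (the path is an assumption):
# for f in "$HOME"/Maildir/cur/*; do
#     formail -c -x DKIM-Signature: < "$f"
# done | dkim_domains | sort -u
```

This only lists signing domains.  It does not check that the signature
verified, so look for DKIM_VERIFIED before whitelisting anything.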
-jeff


Re: HABEAS_ACCREDITED SPAMMER

2009-11-24 Thread Jeff Mincy
   From: LuKreme krem...@kreme.com
   Date: Mon, 23 Nov 2009 17:08:11 -0700
   
   On Nov 23, 2009, at 7:39, Matus UHLAR - fantomas uh...@fantomas.sk  
   wrote:
   
Yes, why to differ between non-abusing and abusing marketers...
   
   We've been through this before. On my mail, habeas is a very strong  
   indicator of spam. It does not appear in legitimate mail.
   
I find it a little hard to believe that your spam is so much different from
my spam.  On my mail, not one single spam message (out of 228k total) hit
HABEAS for all of 2009.  The few messages (480 out of 11k) that hit HABEAS
were all ham, either professional organizations/newsletters, transactions
from places like Vanguard or retail stores that I have a relationship with.

   I don't know who these legitimate marketers are, but I don't feel I'm  
   missing anything.
   
You WILL 'block' legitimate mail.  However, it's your email, so you
can do anything you want.  If you think HABEAS is so bad just set the
HABEAS scores to zero and save the network bandwidth.

-jeff


Re: Timeouts: pyzor and razor2

2009-11-09 Thread Jeff Mincy
   From: Art Greenberg a...@eclipse.net
   Date: Mon, 9 Nov 2009 17:58:48 -0500 (EST)
   
   Lately I'm seeing a fairly consistent timeout for checks sent to pyzor and 
   razor2 by SA. Up until a couple of days ago this was a very rare 
   concurrence. Seems odd that both of these would have this trouble at the 
   same time. Has anyone else noticed this? Perhaps I changed something here 
   that is causing it 

Pyzor is currently timing out:
  % /usr/bin/pyzor ping
   public.pyzor.org:24441   TimeoutError: 

Razor is fine.
You can increase the timeout if razor is running slow:
   ifplugin Mail::SpamAssassin::Plugin::Razor2
   # How many seconds you wait for razor to complete before you go on
   # without the results
   razor_timeout 15
   endif

-jeff


Re: just enabled DCC

2009-10-13 Thread Jeff Mincy
   From: Dan Schaefer d...@performanceadmin.com
   Date: Tue, 13 Oct 2009 08:54:29 -0400
   
   Jason Bertoch wrote:
Dan Schaefer wrote:
I just enabled DCC yesterday and everything appears to be working 
(DCC is registered).  Just to make sure, can someone post an email to 
pastebin that has a DCC hit? Thanks.
   
IIRC, a message with test in the subject and body will match, 
although your logs should tell you what rules are hitting anyway.
   
   Is DCC_CHECK the only DCC rule? Because I didn't find that in my logs 
   yesterday. test in the subject and test in the body only triggered 
   TVD_SPACE_RATIO and BAYES_00 from my personal email address to my work 
   address. Any other suggestions?
   
Use
   spamassassin --test-mode --debug dcc < somespammsg

Should print out stuff like:

   08:58:51.617 0.375 0.375 [28903] dbg: dcc: network tests on, registering DCC
   08:58:54.405 3.164 0.943 [28903] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd
   08:58:54.585 3.343 0.179 [28903] dbg: dcc: dccifd got response: X-DCC--Metrics: pinky 1356; bulk Body=3 Fuz1=4384 Fuz2=many
   08:58:54.585 3.343 0.000 [28903] dbg: dcc: listed: BODY=3/20 FUZ1=4384/20 FUZ2=99/20


-jeff


Re: just enabled DCC

2009-10-13 Thread Jeff Mincy
   From: Dan Schaefer d...@performanceadmin.com
   Date: Tue, 13 Oct 2009 09:18:44 -0400
   
   Jeff Mincy wrote:
   From: Dan Schaefer d...@performanceadmin.com
   Date: Tue, 13 Oct 2009 08:54:29 -0400
   
   Jason Bertoch wrote:
Dan Schaefer wrote:
I just enabled DCC yesterday and everything appears to be working 
(DCC is registered).  Just to make sure, can someone post an email to 
pastebin that has a DCC hit? Thanks.
   
IIRC, a message with test in the subject and body will match, 
although your logs should tell you what rules are hitting anyway.
   
   Is DCC_CHECK the only DCC rule? Because I didn't find that in my logs 
   yesterday. test in the subject and test in the body only triggered 
   TVD_SPACE_RATIO and BAYES_00 from my personal email address to my work 
   address. Any other suggestions?
   
Use
   spamassassin --test-mode --debug dcc < somespammsg
   
Should print out stuff like:
   
   08:58:51.617 0.375 0.375 [28903] dbg: dcc: network tests on, registering DCC
   08:58:54.405 3.164 0.943 [28903] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd
   08:58:54.585 3.343 0.179 [28903] dbg: dcc: dccifd got response: X-DCC--Metrics: pinky 1356; bulk Body=3 Fuz1=4384 Fuz2=many
   08:58:54.585 3.343 0.000 [28903] dbg: dcc: listed: BODY=3/20 FUZ1=4384/20 FUZ2=99/20
   
   
-jeff
  
   I followed your instructions and received the following:
   
   [1486] dbg: dcc: network tests on, registering DCC
   [1486] dbg: dcc: dccifd is not available: no r/w dccifd socket found
   [1486] dbg: dcc: dccproc is not available: no dccproc executable found
   [1486] dbg: dcc: dccifd and dccproc are not available, disabling DCC
   
   After seeing that, I NAT-ed 1023 local to 6277 remote and 6277 remote to 
   1023 to my mail server in my firewall. I ran the test again and received 
   the same message.

Your firewall is not the problem shown here.  SpamAssassin can't find
the dcc socket and executable.  Do you have DCC installed?  If so,
where is the dccproc executable?  Did you start dccifd?  Where is the
dccifd socket?  SpamAssassin needs to know where they are.  You can
use various configuration options to tell SpamAssassin where to look,
for example:
  ## DCC options (Admin only)
  dcc_home /var/lib/dcc
  dcc_dccifd_path /var/lib/dcc/dccifd
  dcc_path /usr/bin/dccproc
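
Before touching the configuration it can help to confirm by hand that
the two things SpamAssassin looks for actually exist.  The sketch below
does the same kind of check from the shell; check_dcc is a name made up
for this example and the paths are simply the ones from the
configuration above.

```shell
#!/bin/sh
# check_dcc DCCPROC_PATH DCCIFD_SOCKET
# Report whether the dccproc executable and a read/write dccifd socket exist.
check_dcc() {
    if [ -x "$1" ]; then
        echo "dccproc executable found: $1"
    else
        echo "no dccproc executable at $1"
    fi
    if [ -S "$2" ] && [ -r "$2" ] && [ -w "$2" ]; then
        echo "dccifd socket found: $2"
    else
        echo "no r/w dccifd socket at $2"
    fi
}

# Paths taken from the example configuration; adjust to your install.
# check_dcc /usr/bin/dccproc /var/lib/dcc/dccifd
```

Run it as the user that SpamAssassin runs as, since the socket
permissions are what matter.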

-jeff


Re: just enabled DCC

2009-10-13 Thread Jeff Mincy
   From: Dan Schaefer d...@performanceadmin.com
   Date: Tue, 13 Oct 2009 10:17:43 -0400
   
   Jeff Mincy wrote:
   From: Dan Schaefer d...@performanceadmin.com
   Date: Tue, 13 Oct 2009 09:18:44 -0400
   
   Jeff Mincy wrote:
   From: Dan Schaefer d...@performanceadmin.com
   Date: Tue, 13 Oct 2009 08:54:29 -0400
   
   Jason Bertoch wrote:
Dan Schaefer wrote:
I just enabled DCC yesterday and everything appears to be working 
(DCC is registered).  Just to make sure, can someone post an email to 
pastebin that has a DCC hit? Thanks.
   
IIRC, a message with test in the subject and body will match, 
although your logs should tell you what rules are hitting anyway.
   
   Is DCC_CHECK the only DCC rule? Because I didn't find that in my logs 
   yesterday. test in the subject and test in the body only triggered 
   TVD_SPACE_RATIO and BAYES_00 from my personal email address to my work 
   address. Any other suggestions?
   
Use
   spamassassin --test-mode --debug dcc < somespammsg
   
Should print out stuff like:
   
   08:58:51.617 0.375 0.375 [28903] dbg: dcc: network tests on, registering DCC
   08:58:54.405 3.164 0.943 [28903] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd
   08:58:54.585 3.343 0.179 [28903] dbg: dcc: dccifd got response: X-DCC--Metrics: pinky 1356; bulk Body=3 Fuz1=4384 Fuz2=many
   08:58:54.585 3.343 0.000 [28903] dbg: dcc: listed: BODY=3/20 FUZ1=4384/20 FUZ2=99/20
   
   
-jeff
  
   I followed your instructions and received the following:
   
   [1486] dbg: dcc: network tests on, registering DCC
   [1486] dbg: dcc: dccifd is not available: no r/w dccifd socket found
   [1486] dbg: dcc: dccproc is not available: no dccproc executable found
   [1486] dbg: dcc: dccifd and dccproc are not available, disabling DCC
   
   After seeing that, I NAT-ed 1023 local to 6277 remote and 6277 remote to 
   1023 to my mail server in my firewall. I ran the test again and received 
   the same message.
   
Your firewall is not the problem shown here.  SpamAssassin can't find
the dcc socket and executable.  Do you have DCC installed?  If so,
where is the dccproc executable?  Did you start dccifd?  Where is the
dccifd socket?  SpamAssassin needs to know where they are.  You can
use various configuration options to tell SpamAssassin where to look,
for example:
  ## DCC options (Admin only)
  dcc_home /var/lib/dcc
  dcc_dccifd_path /var/lib/dcc/dccifd
  dcc_path /usr/bin/dccproc
   
-jeff
  
   I did just install DCC, but I don't know if it is installed correctly. 
   And of course, DCC's website is down 
   (http://www.rhyolite.com/anti-spam/dcc/). I used the instructions here 
   instead: http://www.freespamfilter.org/FC4.html#_Toc110999211
   
   Now when I run:
   spamassassin -t -D dcc < spam_message
   I get:
   [2955] dbg: dcc: network tests on, registering DCC
   [2955] dbg: dcc: dccifd is not available: no r/w dccifd socket found
   [2955] dbg: dcc: dccproc is available: /usr/bin/dccproc
   [2955] dbg: dcc: opening pipe: /usr/bin/dccproc -H -x 0 -a 74.86.146.6 < /tmp/.spamassassin2955q6p1Yatmp
   [2955] dbg: dcc: got response: X-DCC-SIHOPE-DCC-3-Metrics: pony.performanceadmin.com 1085; Body=2 Fuz1=2 Fuz2=many
   
   and
   2.2 DCC_CHECK  Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
   in the report
   
   Even though the dccifd socket cannot be found, does this appear to be 
   working correctly?

Yes, dccproc is working.  You got a hit on DCC_CHECK.

You should use dccifd if possible.  It is faster.

-jeff


Re: Another dcc question

2009-10-13 Thread Jeff Mincy
   From: Rick Knight rick_kni...@rlknight.com
   Date: Tue, 13 Oct 2009 08:53:21 -0700
   
   Just following this thread because I recently got dcc working also. In 
   my case I didn't have dcc installed. After installing dcc everything  
   seems to be working but now I'm wondering about dccifd. On my system 
   dccproc is in /usr/local/bin but dccifd is in /var/dcc/libexec/. I also 
   have start-dccifd in /var/dcc/libexec. I assume I need to add 
   dcc_dccifd_path to my local.cf and then run start-dccifd before starting 
   spamassassin. Is that correct?
   
Run spamassassin  --test-mode.   If spamassassin finds dccifd it will
say 'dccifd is available':

  % spamassassin --test-mode --debug dcc < MESSAGE 2>&1 | fgrep dccifd
  134:[14145] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd
  135:[14145] dbg: dcc: dccifd got response: X-DCC-sonic.net-Metrics: pinky 1156; bulk Body=1 Fuz1=many Fuz2=many

If you get 'dccifd is not available':
  ... dbg: dcc: dccifd is not available: no r/w dccifd socket found

then you need to use dcc_dccifd_path or dcc_home.
-jeff


Re: Another dcc question

2009-10-13 Thread Jeff Mincy
   From: Rick Knight rick_kni...@rlknight.com
   Date: Tue, 13 Oct 2009 09:42:18 -0700
   
   Jeff Mincy wrote:
   From: Rick Knight rick_kni...@rlknight.com
   Date: Tue, 13 Oct 2009 08:53:21 -0700
   
   Just following this thread because I recently got dcc working also. In 
   my case I didn't have dcc installed. After installing dcc everything  
   seems to be working but now I'm wondering about dccifd. On my system 
   dccproc is in /usr/local/bin but dccifd is in /var/dcc/libexec/. I also 
   have start-dccifd in /var/dcc/libexec. I assume I need to add 
   dcc_dccifd_path to my local.cf and then run start-dccifd before starting 
   spamassassin. Is that correct?
   
Run spamassassin  --test-mode.   If spamassassin finds dccifd it will
say 'dccifd is available':
   
  % spamassassin --test-mode --debug dcc < MESSAGE 2>&1 | fgrep dccifd
  134:[14145] dbg: dcc: dccifd is available: /var/lib/dcc/dccifd
  135:[14145] dbg: dcc: dccifd got response: X-DCC-sonic.net-Metrics: pinky 1156; bulk Body=1 Fuz1=many Fuz2=many
   
If you get 'dccifd is not available':
  ... dbg: dcc: dccifd is not available: no r/w dccifd socket found
   
then you need to use dcc_dccifd_path or dcc_home.
-jeff
  
   Thanks Jeff,
   
   When I run test-mode I just get this
   
   bash: MESSAGE: No such file or directory
   
   I'm sure I'm just using the command wrong.

Create a file called MESSAGE that contains a complete spam message
with full headers.


Re: Incresing numbers of DCC_CHECK in ham

2009-10-09 Thread Jeff Mincy
   From: Jari Fredriksson ja...@iki.fi
   Date: Fri, 9 Oct 2009 17:58:06 +0300
   
   This looks worrying. I have it at 2.2 pts, and not caused any false
   positives, but still, odd. Or is it? I know it is a SPAM indicator
   but a bulk indicator.

Auto correct: That should be 'I know it is *not* a spam indicator but a bulk 
indicator.'

Yes - it indicates bulk.  Lots of people have seen the email message.
DCC will hit spam, mailing lists, and retail email such as amazon, and
various extremely short email messages.
   
   But it is triggered for example by some mailing list posts which are genuine 
and not bulk.

What is a genuine mailing list post that is not bulk?  If lots of
people are on the mailing list then the message is, by definition, bulk.
   
   Is someone trying to poison DCC?

Yes, you are(:-)   If you haven't whitelisted the mailing list then
you are reporting the email from the mailing list to DCC, which will
increase the DCC count.   Eventually somebody will report the mailing
list as spam to DCC and you will get a DCC match on the default
many=99.

You have to whitelist the mailing list in the dcc whiteclnt file.

-jeff


Re: Incresing numbers of DCC_CHECK in ham

2009-10-09 Thread Jeff Mincy
   From: Jari Fredriksson ja...@iki.fi
   Date: Fri, 9 Oct 2009 19:25:15 +0300
   
  Is someone trying to poison DCC?

Yes, you are(:-)   If you haven't whitelisted the
mailing list then 
you are reporting the email from the mailing list to DCC,
which will 
increase the DCC count.
   
   Me? But I do report to DCC/Razor2/SpamCop only spam. I do not report ALL my 
email.

Using spamassassin --report reports the spam message to dcc with a -t
target count of many.

   How does DCC actually work? Is any query a report somehow for DCC?

If you ask the DCC network about a message you are also reporting it,
unless the query is made with -Q.  From the dccproc man page:
 -Q   only queries the DCC server about the checksums of messages instead
  of reporting and then querying.  This is useful when dccproc is used
  to filter mail that has already been reported to a DCC server by
  another DCC client such as dccm(8).  This can also be useful when
  applying a private white or black list to mail that has already been
  reported to a DCC server.  No single mail message should be reported
  to a DCC server more than once per recipient, such as would happen
  if dccproc is not given -Q when processing a stream of mail that has
  already been seen by a DCC client.  Additional reports of a message
  increase its apparent bulkness.
-jeff


Re: Incresing numbers of DCC_CHECK in ham

2009-10-09 Thread Jeff Mincy
   From: Jari Fredriksson ja...@iki.fi
   Date: Fri, 9 Oct 2009 20:44:09 +0300
   
DCC identifies mail that has been sent often. That's what
the rule checks for, if other recipients have seen it,
too. 

You voluntarily installed DCC, knowing SA will use it.
This was on your discretion, and it's your duty to
evaluate if it actually is, what you want.

[1] Once, mind you. Which is what DCC does, counting. The
   report spam option in SA reports it differently as
many. 
   
   1. So what is DCC good for?

DCC is extremely good at detecting bulk messages.  All or nearly all
spam messages are bulk.

   2. Why does SpamAssassin use it?
   
DCC is a separately configured plugin that does not run unless
configured to do so at each SpamAssassin site.

   3. Should I uninstall DCC if I want to get bulk but not Spam?
   
You should whitelist legitimate bulk email in the DCC whiteclnt file.
Or you could bypass SpamAssassin for mailing lists.  You could lower
the DCC_CHECK score.   Or you could disable or uninstall DCC.

   4. Question 2. again. SpamAssassin is about Spam, but I really need
  to receive bulk, as in mailing lists and newspaper posts. Are
  there people do not want any mail but what their friends send
  them, and that is the purpose of DCC?

If you use DCC you have to whitelist legitimate sources of bulk email.

   5. What special does the Report to DCC SpamAssassin function do for our 
good?
   
Using Report to DCC reports the message to DCC with a count of many.
After that everybody else querying the same message will get a count
of many.

-jeff


Re: Problems with whitelist_from_rcvd

2009-10-02 Thread Jeff Mincy
   From: Igor Bogomazov b...@hl.ru
   Date: Fri, 2 Oct 2009 12:34:55 +0400
   
   When I add the string like:
   whitelist_from s...@domain.mail
   it works OK.
   
   But:
   whitelist_from_rcvd s...@domain.mail prefix.domain.mail
   doesn't work.
   
   I've checked rDNS of the prefix.domain.mail with 'host' utility - it's
   all right.
   
   And the appropriate mail header seems to be correct:
   Received: from prefix.domain.mail (unknown [12.12.12.12])
   
   What's the matter?

It is hard to say for sure without seeing actual received headers.

You need to use the last external relay used by the email.

From man Mail::SpamAssassin::Conf:

   whitelist_from_rcvd ...

   This string is matched against the reverse DNS lookup used during
   the handover from the internet to your internal network's mail
   exchangers.  It can either be the full hostname, or the domain
   component of that hostname.  ...

The easiest way to figure out which one to use is to add a Relay
header using:
   add_header all Relay trusted=_RELAYSTRUSTED_, untrusted=_RELAYSUNTRUSTED_

Then get the RDNS from the first untrusted=[ip=... rdns=RDNS ...] relay.
If the RDNS is blank then the whitelist_from_rcvd won't work.
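
As a sketch of that last step, the function below pulls the rdns= value
of the first untrusted relay out of the added header.  It assumes the
header has already been unfolded onto one line (for example with
formail -c) and that the relay fields appear as
"[ ip=... rdns=... helo=... ]"; first_untrusted_rdns is a name made up
for this example.

```shell
#!/bin/sh
# first_untrusted_rdns: read an unfolded X-Spam-Relay header on stdin and
# print the rdns= value of the first relay listed after "untrusted=".
# An empty line of output means the rdns was blank.
first_untrusted_rdns() {
    sed -n 's/.*untrusted=\[[^]]*rdns=\([^ ]*\) .*/\1/p' | head -n 1
}

# Possible use (msg is a hypothetical file holding the scanned message):
# formail -c -x X-Spam-Relay: < msg | first_untrusted_rdns
```

Whatever it prints is the name to try in whitelist_from_rcvd.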

Your internal_networks and trusted_networks need to be set up correctly.

-jeff


Re: Re-running SA on an mbox

2009-09-22 Thread Jeff Mincy
   From: MySQL Student mysqlstud...@gmail.com
   Date: Tue, 22 Sep 2009 15:38:47 -0400
   
Try using a local SA setup for stripping the headers. By local, I mean
don't use your main production SA - run a separate copy with its own
(cut down) configuration and all data base accesses and UBL calls etc
turned off.
   
   Much better idea, thanks. Thanks for the script, too.
   Alex

formail can be used to remove headers, for example:

   To remove all Received: fields from the header:
  formail -I Received:

The following should do what you wanted to remove the X-Spam headers:
  formail -I X-Spam < msg

-jeff


Re: Problem with whitelist_from_rcvd and forged reverse lookup

2009-07-30 Thread Jeff Mincy
   From: Sebastian Wiesinger spamassassin.us...@ml.karotte.org
   Date: Thu, 30 Jul 2009 17:48:09 +0200
   
   * John Hardin jhar...@impsec.org [2009-07-30 17:39]:
Sendmail - Procmail - SA (spamc)
   
Cool, that should be simple.
   
Can you send:
   
(1) the Received: headers from an email generated on that box, and
   
(2) the procmail stanza where you call SA?
   
   I could create a procmail rule that excludes local mail from SA, but I
   would much rather like to whitelist this in spamassassin. Nevertheless
   thanks for your offer to help with procmail.
   
Processing locally generated email that contain spam URLs through
SpamAssassin is not a particularly good idea.  If you have Bayes
enabled then you are training your Bayes that spam URLs and whatever
else is in the log files are hammy tokens.

You really do want to skip SpamAssassin processing on messages like
this in your procmail.

-jeff


Re: Pyzor or DCC

2009-07-23 Thread Jeff Mincy
   From: Jonas Eckerman jonas_li...@frukt.org
   Date: Thu, 23 Jul 2009 15:37:11 +0200
   
   Michael Hutchinson wrote:
   
I saw a test
message with just the word test in the subject hit DCC once.
   
That's really strange, I don't see how DCC would fire on the subject..
the checksum of the message must have somehow matched some Spam.. 
   
   That's perfectly normal. DCC doesn't just match spam, it matches things 
   that have been seen before. That means it matches bulk, but also anything 
   that happens to be very common for other reasons.

yep.
   
   I imagine that an empty message with the subject test is pretty 
   common, so it's perfectly reasonable for DCC to have seen such messages 
   many times before.
   
   I don't know if DCC cares about the subject at all. If it doesn't, it's 
   even more likely that it would hit on an empty test message.
   
   /Jonas

DCC does hit on empty messages.   The empty messages can be
whitelisted.   The DCC distribution includes a fetch-testmsg-whitelist
script:

% head /usr/src/dcc-1.3.111/misc/fetch-testmsg-whitelist
#!/bin/sh

# Fetch a list of empty mail messages for whitelisting.  Many free mail
#   service providers add HTML or other text to mail.  That causes empty
#   and nearly empty mail messages to have valid DCC checksums and not be
#   ignored by DCC clients.

# The fetched file can be included in whiteclnt files.  For example, the
#   following line in /var/dcc/whiteclnt would whitelist many common
#   empty messages


Re: Pyzor or DCC

2009-07-22 Thread Jeff Mincy
   From: RW rwmailli...@googlemail.com
   Date: Wed, 22 Jul 2009 03:45:50 +0100
   
   On Wed, 22 Jul 2009 13:42:52 +1200
   Michael Hutchinson mhutchin...@manux.co.nz wrote:
   
If you get an E-Mail scoring in both Pyzor and DCC, the chances are
very high that the message is Spam. We only deal with around 90,000
incoming delivery attempts per day - but have not had a false
positive from Pyzor or DCC yet, and have been using both for some
years.
   
   That's odd, I get quite a lot of DCC FPs and a few Pyzor FPs on a
   relatively small amount of email. They tend to hit on bulk mail, like
   newsletters, automated mail and very generic mails. I saw a test
   message with just the word test in the subject hit DCC once. 

DCC identifies 'bulk' email.  You have to whitelist desired bulk
email senders in the DCC whiteclnt (etc) file.  The DCC distribution
includes sample scripts like edit-whiteclnt.
   
Pyzor and Razor are easier to use because they do not need the
whitelisting that DCC does.
Razor and DCC are both highly effective (80%), and Pyzor is good (40%).

-jeff


Re: Underscores

2009-07-16 Thread Jeff Mincy
   From: Matt Kettler mkettler...@verizon.net
   Date: Thu, 16 Jul 2009 08:52:50 -0400
   
   twofers wrote:
How can I pattern match when every word has an underscore after it.
Example:
This_sentenance_has_an_underscore_after_every_word
   
I'm not really good at Perl pattern matching, but \w and \W see an
underscore as a word character, so I'm just not sure what might work.
   
body =~ /^([a-z]+_+)+/i
   
Is that something that will work effectively?

Is this for a spam rule?

   I'd do something like this:
   
   body  MY_UNDERSCORES  /\S+_+\S+_+\S+/
   
   Unless you really want to restrict it to A-Z.
   
   Regardless, ending any regex in + in a SA rule is redundant. Since +
   allows a one-instance match, it will devolve to that. You don't need to
   match the entire line with your rule, so the extra matches are
   redundant. It will match the first instance, and that's all it needs to
   be a match.
   
   Also any regex ending in * should just have its last element removed,
   as that will devolve to a zero-count match.

The /\S+_+\S+_+\S+/ rule will hit lots of technical email, for example
discussions on shell environment variables like LD_LIBRARY_PATH.
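
A quick way to see this is to try the pattern with grep -E on a line of
ordinary technical text.  In the sketch below the "loose" pattern is the
rule above rewritten with POSIX classes; the "strict" pattern, which
wants at least five words joined by underscores, is only a hypothetical
alternative for comparison, and hits is a helper made up for this
example.

```shell
#!/bin/sh
# hits PATTERN TEXT: succeed if the extended regex matches anywhere in TEXT.
hits() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

# Same idea as /\S+_+\S+_+\S+/ using POSIX character classes.
loose='[^[:space:]]+_+[^[:space:]]+_+[^[:space:]]+'
# Hypothetical tighter pattern: five or more underscore-joined words.
strict='([A-Za-z]+_+){4,}[A-Za-z]+'

hits "$loose"  'export LD_LIBRARY_PATH=/usr/lib' && echo "loose pattern hits LD_LIBRARY_PATH"
hits "$strict" 'export LD_LIBRARY_PATH=/usr/lib' || echo "strict pattern skips LD_LIBRARY_PATH"
hits "$strict" 'This_sentence_has_an_underscore_after_every_word' && echo "strict pattern hits the sample"
```

Any such rule should still be checked against a ham corpus before it
gets a real score.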

-jeff


Re: rbl/dnsbl seems to use wrong ip sometimes

2009-07-11 Thread Jeff Mincy
   From: dmy i...@dwsa.de
   Date: Sat, 11 Jul 2009 14:27:34 -0700 (PDT)
   
   So is there a way to configure that ALL DNS tests just use the last external
   ip address (or at least NOT the first one?). Because to me it doesn't make
   any sense to test the ip people use to deliver messages to their smarthost
   and it produces quite a few false positives on my system...

The smarthost presumably requires authenticated senders.
The smarthost should then add a Received: header that shows that the
sender was authenticated (eg ESMTPSA).   If the smarthost is trusted
then the sender will be trusted.   Various tests are not run on
trusted hosts.
-jeff


   RW-15 wrote:
On Sat, 11 Jul 2009 12:52:56 -0700 (PDT)
dmy i...@dwsa.de wrote:

As far as I understand SpamAssassin is supposed to just check the ip
that directly delivered the email to my server but not the IP the
email is originally from (as that woundn't make any sense as almost
everyone is using dyn ips...). 

It depends on the test. Most of them run on all addresses outside the
trusted network, except for DUL tests and Spamhaus PBL + XBL which run
on the last external.

   


Re: USER_IN_WHITELIST Not Scoring

2009-07-10 Thread Jeff Mincy
   From: Karsten Bräckelmann guent...@rudersport.de
   Date: Fri, 10 Jul 2009 23:43:03 +0200
   
   On Fri, 2009-07-10 at 06:53 -0700, an anonymous Nabble user wrote:
My local root user sends me nightly emails with mail/spam statistics and
information.  Because of the spam information contained in the email, it
is sometimes flagged as spam itself.

In my local.cf, I have put the root user's email address in the
whitelist_from line, however whenever I send an email as the root user to
my legitimate email account, it is not getting scored.
   
 whitelist_from r...@myphonydomain.com
   
   Don't use the un-constrained whitelist_from, unless as a last resort, if
   there's no other way and you cannot use the proper constrained ones,
   like whitelist_from_rcvd.
   
A local root sender should be getting ALL_TRUSTED.  whitelist_from_rcvd
won't work on local email - you need at least one external hop to get the
'rcvd' part.  You could write SpamAssassin rules to look for the messages,
but you probably don't want to AUTOLEARN the messages since any tokens in
the email are probably spam hosts.  As pointed out earlier, this type of
email should bypass SpamAssassin in procmail (etc).

   Anyway, no sample -- no way to point out your issue. Do paste at least
   the headers of such a mail.
   
Yep.

-jeff


Re: Controlling spamd logging from spamc

2009-06-04 Thread Jeff Mincy
   From: Martin Gregorie mar...@gregorie.org
   Date: Tue, 02 Jun 2009 16:54:11 +0100
   
   How difficult would it be to let spamc control spamd's logging output on
   a per-message basis? 
   
   My reason for asking is this: I maintain a body of spam that I use to
   develop and regression test local rules and, during rule development,
   use spamc to pass the test messages through my only copy of spamd. This
   is useful because I can keep the test messages in a normal user on a
   different host from the one running spamd and avoid local configuration
   ambiguities. However, as part of my logwatch environment I run a perl
   program to collect the day's spam stats. I find that the stats are
   meaningless any day I develop and/or regression test rules because, of
   course, spamd is logging these as well as actual mail. I should add
   that, since my ISP introduced greylisting, the 'spam' logged during
   regression testing is at least 12 times the volume of genuine spam
   received that day, so the day's stats are meaningless and so are any
   stats generated by scanning the whole of /var/log/maillog* 
   
   It would be useful for me to be able to disable spamd logging during
   rule testing. 
   
Wouldn't it be easier to run another spamd on a different machine for
rule development and testing?  Or perhaps just running as a different
'test' user, and then ignore log messages for that user in the statistics.

   Would anybody else find this a useful feature too?

I've sometimes wanted the other way - eg get more debugging output for
a particular message.

-jeff


Re: AWL functionality messed up?

2009-05-28 Thread Jeff Mincy
   From: Linda Walsh sa-u...@tlinx.org
   Date: Wed, 27 May 2009 17:28:35 -0700
   
   Jeff Mincy wrote:
   From: Linda Walsh sa-u...@tlinx.org
   Date: Wed, 27 May 2009 12:48:43 -0700
   
   Bowie Bailey wrote:  
   At face value, this seems very counter productive.
   
You still aren't understanding the wiki or the AWL scoring or what AWL
is trying to do.
   
Ah, but it only seems I'm daft, today...:-)
   
   If I get spam from 1000 senders, they all end up in my
   AWL???
   
yes.   every email+ip address pair that sends you email winds up in
your AWL with an average score for that pair.  This is ok.
   
GRRRnot so ok in my mindset, but ... and ... errr..
   well that only makes it more confusing, in a way...since I was
   only 99% certain that I'd never gotten any HAM from hostname
   '518501.com' (thinking for a short period that AWL might classify
   things by hosts as reliable or not, instead of, or in addition to
   by email-addr), but I'm 99.97% certain I've never gotten any HAM
   from user 'paypal.notify' (at) hostname '5185
   
It is using the relay IP address, not the hostname...
You've most likely received some other spam from this email+ip pair
that was scored as ham.  Hard to tell without seeing the original
scores.
   
   AWL should only be added to by emails judged to be 'ham' via
   the feed back mechanisms --, spammers shouldn't get bonuses for
   being repeat senders...
   
You are getting too attached to the 'whitelist' part of the name.
Pretend AWL stands for average weighting list.
   =
Aw...come on.  Isn't the world difficult enough without
   changing white to black or white to weighing?  I mean, we humans
   have enough trouble agreeing on what our symbols, words mean in
   relation to concepts and all without ya goin' and redefining perfectly
   good acceptable symbols to mean something else completely and still
   claim it to be some semblance of English.   No wonder most of the
   non-techno-literate humans on this world regard us techies with
   a hint of suspicion regarding the difficulty of problems.  We go around
   redefining words to suit reality and catch the heat when the rest of
   the world doesn't understand our meaning:
   
I don't think AWL is the best possible name for the functionality,
simply because it is easy to misinterpret.

AWL isn't whitelisting spammers.   It is pushing the score to the
average for that sender.   The sender can have a high average or a low
average.   
   ---
An average?  So it keeps the scores of all the past emails of every
   email we ever got sent?  Must just store a weighted average -- otherwise
   the space (hmm...someone said something about 80MB+ auto-whitelist DB
   files?)
   
AWL tracks the total score and the number of messages.
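
As a rough illustration of the averaging described here, the sketch below keeps a running total and a message count per sender and pulls each new score toward the stored mean. The class name, the dictionary storage and the choice to store the score before adjustment are assumptions made for illustration; the factor of 0.5 is, as far as I know, the default of auto_whitelist_factor. This is not SpamAssassin's actual code.

```python
class AwlSketch:
    """Toy model of AWL: per-sender running total and count (hypothetical names)."""

    def __init__(self, factor=0.5):
        self.factor = factor      # how strongly a score is pulled toward the mean
        self.entries = {}         # sender key -> (total_score, message_count)

    def adjust(self, key, score):
        """Return the score after pulling it toward the sender's stored mean."""
        total, count = self.entries.get(key, (0.0, 0))
        if count > 0:
            mean = total / count
            delta = (mean - score) * self.factor
        else:
            delta = 0.0           # first message from this sender: nothing to average
        # Assumption: the unadjusted score is what gets added to the history.
        self.entries[key] = (total + score, count + 1)
        return score + delta
```

With this model a sender whose history averages 2.0 and who then sends a message scoring 10.0 ends up at 6.0, which is why a wrong history (FP or FN) pushes later mail the wrong way.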

Why not call it the Historically Based Score Normalizer or
   HBSN module?  Db file could be historical-norms or something.
   
Call it BOB if that will help ...
   
If the previous email from a particular sender was FP or FN then AWL
will have an incorrect average and will wind up doing or trying to do
the wrong thing with subsequent email for that sender.
   
Maybe it shouldn't add in the 'average' unless it exceeds
   the 'auto-learning threshold'??  I.e. something like the
   'bayes_auto_learn_threshold_nonspam' for HAM and the
   'bayes_auto_learn_threshold_spam' for SPAM.  Assuming it doesn't
   already do such a thing, it would make a little sense...so as
   not to train it on 'bad data'...
   
Perhaps.   I don't have a particularly strong opinion.

When I run sa-learn --spam email over a message, can I
   assume (or is it the case) that telling SA, a message was 'spam'
   would assign a sufficiently large value to the 'HBSN' value for that
   sender to reduce any effect of having falsely (if it is likely to happen)
   incorrect value?
   
Nope.

Or might I at least assume that each sa-learn over a message
   will modify its AWL score appropriately?
   
no.  You shouldn't assume.  sa-learn doesn't modify the AWL entry.
You can use spamassassin --add-to-blacklist.

You can remove addresses using spamassassin --remove-from-whitelist
   
Yes...saw that after visiting the wiki.  Is there a
   --show-whitelist-with-current-scores-and-their-weight switch as well
   (as opposed to one that only showed the addr's in the white list, or only
   showed the non-weighted scores)?
   
If I understand what you are asking for here, you can add an X-Spam-AWL
header that gives you the current scores:
  add_header all AWL awl=_AWL_, mean=_AWLMEAN_, count=_AWLCOUNT_, prescore=_AWLPRESCORE_
The awl scores are stored in a database file.  You can do db type
things with the awl file.

Thanks...and um...
How difficult would it be to have the name of the module reflect
   what it's

Re: AWL functionality messed up?

2009-05-27 Thread Jeff Mincy
   From: Linda Walsh sa-u...@tlinx.org
   Date: Wed, 27 May 2009 12:48:43 -0700
   
   Bowie Bailey wrote:
Linda Walsh wrote:
   
I got a really poorly scored piece of spam -- one thing that stood out
as weird was report claimed the sender was in my AWL.

Any sender who has sent mail to you previously will be in your AWL.  
This is probably the most misunderstood component of SA.  Read the wiki.

http://wiki.apache.org/spamassassin/AutoWhitelist
   
   
   At face value, this seems very counter productive.
   
You still aren't understanding the wiki or the AWL scoring or what AWL
is trying to do.

   If I get spam from 1000 senders, they all end up in my
   AWL???
   
yes.   every email+ip address pair that sends you email winds up in
your AWL with an average score for that pair.  This is ok.

   WTF?
   
   AWL should only be added to by emails judged to be 'ham' via
   the feed back mechanisms --, spammers shouldn't get bonuses for
   being repeat senders...
   
You are getting too attached to the 'whitelist' part of the name.
Pretend AWL stands for average weighting list.

   How do I delete spammer addresses from my 'auto-white-list'?
   
   (That's just insane..whitelisting spammers?!?!)

AWL isn't whitelisting spammers.   It is pushing the score to the
average for that sender.   The sender can have a high average or a low
average.   

If the previous email from a particular sender was FP or FN then AWL
will have an incorrect average and will wind up doing or trying to do
the wrong thing with subsequent email for that sender.

You can remove addresses using spamassassin --remove-from-whitelist

-jeff


Re: spamassassin runs razor spamc not

2009-05-22 Thread Jeff Mincy
   From: Mester mes...@freemail.hu
   Date: Fri, 22 May 2009 14:52:08 +0200
   
Check in the ~/.spamassassin/user_prefs file for the user that runs
amavisd-new.  I know the Mandriva package has that set to 'use_razor2
0', so I always have to hunt it down and fix it.
I had no use_razor2 line in the ~amavis/.spamassassin/user_prefs file
but after appending these lines to the file:
use_razor2
razor_config /var/lib/amavis/.razor/razor-agent.conf
and restarting both amavis and spamassassin nothing has changed.

Then, you need to run some of the amavisd-new debugs

I believe the syntax is

[amav...@foo]$ /usr/sbin/amavisd debug-sa plugin
   
   It worked. And now I found the error: amavis user couldn't read the 
   /var/log/razor-agent.log file. I modified the owner of that file to 
   amavis and now I see the check lines in that file.
   
   Is there a way to instruct spamassassin to write the razor, pyzor and 
   dcc check's result to every e-mail's header and not only for spams?

SpamAssassin has add_header that can be used for Pyzor and DCC.

  add_header all Pyzor _PYZOR_
  add_header all DCC _DCCB_; _DCCR_

I don't know how headers are added in amavis.
-jeff


Re: learning from IMAP spam collection

2009-05-19 Thread Jeff Mincy
   From: Michael Monnerie michael.monne...@is.it-management.at
   Date: Tue, 19 May 2009 09:34:53 +0200
   
   On Sonntag 17 Mai 2009 Michael Monnerie wrote:
Why is it so extremely
slow and CPU consuming just to remove any existing markups?
   
   There really seems to be no other way than calling spamassassin -d to 
   remove existing markups. I guess I will create an account where a script 
   takes all messages from folder X, removes markup, and stores to Y. Like 
   this, I don't mind too much how long it takes. It's still a PITA that 
   there's no quick spamc like way to remove markups.
   
You can use formail to remove headers.  It is way faster than spamassassin -d.
The only trick is listing all of the headers that can be added by
SpamAssassin.

formail -b -t -I X-Spam-Status: -I X-Spam-Flag: -I X-Spam-Checker-Version: \
  -I X-Spam-Rbl: -I X-Spam-Pyzor: -I X-Spam-DCC: -I X-Spam-Level: \
  -I X-Spam-Bayes: -I X-Spam-Relay: -I X-Spam-Report: -I X-Spam-AWL: \
  -I X-Spam-Karma: -I X-Spam-ASN: -I X-Spam-CRM114: \
  -I X-Spam-Relay-Country: < msg
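
If listing every header by hand is a nuisance, the same idea can be sketched in Python by dropping every header whose name starts with X-Spam-. The function name is made up for illustration, and the prefix match is an assumption about which headers you want gone. Like the formail command, this only strips headers; it does not try to undo any other markup such as a rewritten Subject or a report_safe wrapper.

```python
import email


def strip_spam_headers(raw_message):
    """Return the message text with all X-Spam-* headers removed (hypothetical helper)."""
    msg = email.message_from_string(raw_message)
    # Collect the names first so we are not deleting while iterating the headers.
    names = {name for name in msg.keys() if name.lower().startswith("x-spam-")}
    for name in names:
        del msg[name]   # removes every occurrence of that header
    return msg.as_string()
```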

-jeff


Re: whitelist_from_spf

2009-05-14 Thread Jeff Mincy
   From: Alvaro Marín alv...@hostalia.com
   Date: Thu, 14 May 2009 13:30:49 +0200

   It seems that there is a problem resolving DNS records of that domain so I
   want to whitelist it. If I add:
   
   whitelist_from_spf *...@orange.es
   
   It's ignored by SA, as the log says.
   Reviewing code of SPF.pm from SpamAssassin, I see:
   
 # if the message doesn't pass SPF validation, it can't pass an SPF
 ...
   
   So, which is the purpose of this whitelist feature? If the SPF check fails,
   it can't do whitelist?
   
Yes.  The whitelist check is done after the SPF check.  Anybody can
have a SPF record.  SPF just means that the message is genuine = not
forged.  You can get genuine spam.  If you aren't getting SPF_PASS on
the message then whitelist_from_spf won't do anything.
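
The ordering described above can be sketched as follows. This is only an illustration of the logic (SPF result first, address match second); the function name and the use of shell-style globbing via fnmatch are assumptions, not the code in SPF.pm.

```python
from fnmatch import fnmatchcase


def spf_whitelisted(sender, spf_passed, patterns):
    """Whitelist only when SPF passed AND the sender matches a pattern (sketch)."""
    if not spf_passed:
        # No SPF pass means the whitelist entry is never consulted.
        return False
    sender = sender.lower()
    return any(fnmatchcase(sender, pattern.lower()) for pattern in patterns)
```

For example, spf_whitelisted("user@example.com", False, ["*@example.com"]) is False even though the address matches, which is the behaviour being asked about.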

If you are getting SPF_PASS on email from other domains then the
domain you are trying to whitelist probably does not have SPF set up.

-jeff


Re: Properly integrating clamAV into SpamAssassin

2009-05-04 Thread Jeff Mincy
   From: Adam Katz antis...@khopis.com
   Date: Sun, 03 May 2009 18:47:21 -0400

   I am under the impression that virus checking is *not* that much easier
   than a fully-loaded SA implementation, so therefore spam detection
   should run first.  Counter-point:  online lookups cost bandwidth and
   latency, virus detection doesn't (yet) require any.

Have you timed ClamAV?  It is essentially free.  On my machine I
get 100 ClamAV virus scans per second, which is *way* faster than
SpamAssassin.

   Pause.  Constructive comments and criticisms?

I disagree with your premise...

Time ClamAV and your fully-loaded SA implementation on a set of
messages.   You can time SpamAssassin with and without network tests
for a more complete picture.
   
   Don't get too caught up in the above part, it is all illustrative in
   getting to my question below.
   
   Mail that passes SpamAssassin but gets caught by ClamAV would add value
   to SA's Bayesian and AWL databases and thus the message stands a chance
   at getting caught in the future regardless of its viral content.
   
Feeding virus email into SpamAssassin Bayes seems like a bad idea to
me.  The bayes tokens aren't going to be all that useful for catching
non virus spam.

Adding the virus email into AWL seems somewhat reasonable since any
further email from the same IP address is likely to be another virus
or botnet spam.  However, in practice any botnet spam will use
different random email addresses so you probably won't get any awl
hits on the AWL addresses learned from virus email.

-jeff


Re: Almost no score

2009-05-01 Thread Jeff Mincy
   From: Charles Gregory cgreg...@hwcn.org
   Date: Fri, 1 May 2009 10:48:00 -0400 (EDT)
   
   Uh, what do these 'ratware' rules trigger on? 

The rules trigger on spam with a particular Message-Id and boundary pattern.

   How effective are they, and what are the chances of false positives?

For last month the KB_RATWARE_OUTLOOK_08 rule hits 
21% of spam (4665 hits out of 21748 spam).   It works great here.
I haven't seen any FP.  Your mileage may vary.

I got the rules from Karsten's sandbox:
http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/kb/70_misc.cf

I would imagine that these rules will eventually show up in sa-update.
-jeff

   
   On Thu, 30 Apr 2009, LuKreme wrote:
(single lines)
header  KB_RATWARE_OUTLOOK_16  ALL =~ /^Message-Id: ([0-9a-f]{8})\$([0-9a-f]{8})\$.{100,400}boundary==_NextPart_000__\1\.\2/msi
# 

header  KB_RATWARE_OUTLOOK_12  ALL =~ /^Message-Id: ([0-9a-f]{8})\$([0-9a-f]{4})[0-9a-f]{4}\$.{100,400}boundary==_NextPart_000__\1\.\2/msi
# 

header  KB_RATWARE_BOUNDARY    ALL =~ /^Message-Id: ([0-9a-f]{8})\$[0-9a-f]{8}\$.{100,400}boundary==_NextPart_000__\1\./msi
 
# 
   
score KB_RATWARE_BOUNDARY 2.0
score KB_RATWARE_OUTLOOK_16 0.1
   
   
-- 
Exit, pursued by a bear.
   


Re: 'anti' AWL

2009-04-29 Thread Jeff Mincy
   From: Charles Gregory cgreg...@hwcn.org
   Date: Wed, 29 Apr 2009 14:31:22 -0400 (EDT)
   
   
   I just turned off my AWL today, because of FP issues but
   
f...@example.com sends me lots of mail.  Say it's over 100.  It's all ham
and it all comes from mail.example.com. The AWL for this email couplet is,
say, -2.1.  An email comes in from f...@example.com but sent from
spam.spammer.tld and scores 7.0.  It gets an additional, say, .42 (20% of
the AWL) to score 7.42 instead. Now, another mail from f...@example.com
comes in from mail.spam2.tld, this one scores 4.3. It gets a +.42 for
missing the match on mail.example.com, and gets a +.288 for missing the
match on spam.spammer.tld
   
   This sounds like an attempt to mimic the effects of SPF records by noting 
   which servers send most of the mail for a given address. Sadly, this 
   logic breaks down when the spammers 'get there first' and/or send a 
   greater volume of mail than the genuine sender. Admittedly the latter 
   situation is a low probability for any single sender, but in the big 
   picture, *someone* is getting their AWL reputation trashed every time a 
   spammer forges their e-mail.

AWL stores the IP/16 address with the email address.   So your awl
reputation is not being trashed by forged e-mail that comes from a
different IP address.
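
To make the keying concrete, here is a small sketch of building a lookup key from the sender address plus the /16 of the relay address. The function name and the tuple layout are assumptions for illustration, and it only handles IPv4; the real AWL database format is not shown here.

```python
import ipaddress


def awl_key(email_address, relay_ip):
    """Combine the sender address with the /16 network of the relay (IPv4 sketch)."""
    network = ipaddress.ip_network(f"{relay_ip}/16", strict=False)
    return (email_address.lower(), str(network.network_address))
```

Two relays in the same /16 share one entry, while forged mail from an unrelated network gets its own entry and does not touch the genuine sender's average.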
   
   Just this Monday I had a phishing attack against my clients, with *dozens* 
   of e-mails, all purporting to come from ME that came from the *same* 
   server! In this case, as I only send a half dozen messages per month from 
   that account, the spammer would get the favored rating?

Only if the spammer uses the same server that you do.
-jeff


Re: 'anti' AWL

2009-04-28 Thread Jeff Mincy
   From: LuKreme krem...@kreme.com
   Date: Tue, 28 Apr 2009 08:43:46 -0600
   
   OK, working on my first cup of coffee this morning, so maybe this has  
   potential.
   
   The way the AWL works is by keeping track of the origin of emails,  
   both the address and the server (the top line Received header?) that
   send the email.  So, let's say that I have a lot of email from
   f...@example.com and that foo's email is sent to me via mail.example.com.
   
   Now, I get an email claiming to be from f...@example.com but sent to me  
   from suspiciousserver.tld, so the AWL is not applied.
   
Your idea will FP anytime anybody adds a new email device or the ISP
changes (etc).

You could use the sagrey plugin to add a point to email from new
email address+IP pairs.

-jeff


Re: AWL and FP's....

2009-04-22 Thread Jeff Mincy
   From: Charles Gregory cgreg...@hwcn.org
   Date: Wed, 22 Apr 2009 15:56:53 -0400 (EDT)
   
   Just curious if anyone has ever found a 'clean' way to handle the 'damage' 
   done to the AWL when someone's mail is blocked by a false positive, and 
   the sender is stupid enough to keep retrying the offending mail?

Meaning that the first message from the sender was incorrectly marked
as spam and AWL then made sure that all subsequent messages from the
same sender were also marked as spam?
   
The easiest way to fix it is to smash the AWL entry with spamassassin
--add-to-whitelist or remove the AWL entry using --remove-from-whitelist.

   I would rather not turn off AWL. I like the way it gives a negative score 
   bias to frequent correspondents. But is there a (sub)setting to allow me 
   to permit the negative bias, but *not* allow it to add a positive one?
   
Nope - the only thing you can do is set the factor which acts on both
positive and negative scores.

   And while I'm at it, can anyone verify whether 'constantcontact' is really 
   a legit mail service or a spam haven? That's the FP that caused this 
   issue
   
they do email for various organizations.

-jeff


Re: use_auto_whitelist error in lint

2009-04-09 Thread Jeff Mincy
   From: realshock wael.alt...@gmail.com
   Date: Thu, 9 Apr 2009 06:56:05 -0700 (PDT)
   
   Matt Kettler-3 wrote:
Find out where else you've got use_auto_whitelist 0 in your config,
and remove it. 
On the plus side, it does confirm you've correctly disabled the plugin.
   
   I searched all over the place, and following your directions, do you think
   this command will find where it is?
   # grep -iR use_auto_whitelist /*

spamassassin -D --lint prints out the config files, eg:
  spamassassin -D --lint 2>&1 | fgrep 'config: read file'

The use_auto_whitelist is in one of those config files.
-jeff


Re: need help - procmail spamassassin

2009-04-04 Thread Jeff Mincy
   From: sebast...@debianfan.de sebast...@debianfan.de
   Date: Sun, 05 Apr 2009 01:56:38 +0200
   
   Hello,
   
   i am filtering mails with spamassassin & procmail.
   
This is more of a procmail question, so it doesn't actually belong here.

   The header of message
   
   X-Spam-Level: **
   
   I want to sort mails into some different directories.
   
   10 or more -- directory 10
   9 -- directory 9
   
   and so on
   
Do you really want that many different mail folders?   Wouldn't low=5,
mid=10 and high=15 be sufficient?

   But - nothing happens - the mails are all in the /Maildir/new directory
   why ?
   
The .*\( part.

   :0:
   * ^X-Spam-Level: .*\(\*\*\*\*\*\*\*\*\*\*
   Maildir/10/new

You don't need the .* and you don't want the \(

* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*

Also, You can use the numeric score directly.

For example, you can set X_SPAM_SCORE in a procmail recipe to be the
number following score= on the X-Spam-Status line.

 X_IS_SPAM=Unknown
 X_SPAM_SCORE=
 :0
 * ^X-Spam-Status: \/.*
 {
   :0
   * ^X-Spam-Status: \/(Yes|No|YES|NO|Skipped)
   { X_IS_SPAM=$MATCH }

   :0
   * ^X-Spam-Status: (Yes|No|YES|NO)[, ]+(hits|score)=\/([-0-9.]+)
   { X_SPAM_SCORE=$MATCH }
 }

Then you can write a recipe like this one, which matches spam scoring 12.5 or higher.

 SPAM_CUTOFF=12.499
 :0
 * X_IS_SPAM ?? (Yes|YES)
 *$ -$SPAM_CUTOFF ^0
 *$  $X_SPAM_SCORE ^0
 somefolder
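
For anyone who finds procmail's weighted scoring hard to read, the same two steps (pull the verdict and the number out of X-Spam-Status, then compare against a cutoff) can be sketched in Python. The function names are made up for illustration and the regular expression mirrors the recipe above; treat it as a sketch of the logic, not a replacement for the recipe.

```python
import re

# Same shape as the procmail condition: verdict, then hits= or score=, then a number.
STATUS_RE = re.compile(r"^(Yes|No|YES|NO)[, ]+(?:hits|score)=(-?[0-9]+(?:\.[0-9]+)?)")


def parse_spam_status(value):
    """Return (is_spam, score) from an X-Spam-Status value, or None if it doesn't parse."""
    match = STATUS_RE.match(value.strip())
    if match is None:
        return None
    return match.group(1).lower() == "yes", float(match.group(2))


def is_high_scoring_spam(value, cutoff=12.499):
    """True when the message is marked spam and its score is above the cutoff."""
    parsed = parse_spam_status(value)
    return parsed is not None and parsed[0] and parsed[1] > cutoff
```

The procmail version gets the same comparison by adding -12.499 and the score as weights and matching when the total is positive.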
   
   :0:
   * ^X-Spam-Level: .*\(\*\*\*\*\*\*\*\*\*\*
   Maildir/10/new
   
   :0:
   * ^X-Spam-Level: .*\(\*\*\*\*\*\*\*\*\*
   X-Spam-Level: ***
   Maildir/9/new

You don't want the extra 'X-Spam-Level: ***' line here.

-jeff


Re: New kind of spam

2009-03-31 Thread Jeff Mincy
   From: Arvid Ephraim Picciani a...@exys.org
   Date: Tue, 31 Mar 2009 12:33:49 +0200
   
What do you mean it's impossible to train bayes?
   
   i was assuming the random text at the end is what causes my bayes db to 
   behave randomly.
   
Random text that occurs only in spam rapidly becomes a spam sign.  Random
spam text that also occurs in ham requires a period of adjustment for
Bayes, but eventually Bayes figures it out.

Bayes really can be trained to deal with this message.
For example, I get BAYES_95:
   
   well i get 00
   
An occasional spam getting a low bayes score is ok, but lots
of spam getting BAYES_00 is a problem.

Train Bayes with more spam messages and correct any incorrectly learned
messages.

After I learn this message the probability increases to BAYES_99
   
   yes, for that specific message.  what exactly is the point of learning 
   specific messages when the next one will be different anyway.

Perhaps you are missing the point of bayes.  I got bayes_95 on the
message before training on the message.  My SpamAssassin hadn't seen
the message before, but it had trained on similar spams.
Bayes breaks the message up into various tokens; some of the tokens from
this or any spam message will be repeated in other spam messages.
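
A toy model may help show why training on one message still helps with the next, different one. The sketch below counts tokens seen in spam and ham and averages per-token estimates; the class, the tokenizer and the averaging are all simplifications invented for illustration and are not how SpamAssassin's Bayes combines probabilities.

```python
import re
from collections import Counter


class ToyBayes:
    """Very simplified token counter; NOT SpamAssassin's Bayes implementation."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    @staticmethod
    def tokens(text):
        # Keep dots and colons so something like spaces.live.com stays one token.
        return set(re.findall(r"[a-z0-9.:]+", text.lower()))

    def learn(self, text, is_spam):
        (self.spam if is_spam else self.ham).update(self.tokens(text))

    def token_prob(self, token):
        s, h = self.spam[token], self.ham[token]
        # Add-one smoothing: a token never seen before stays at a neutral 0.5.
        return (s + 1) / (s + h + 2)

    def score(self, text):
        toks = self.tokens(text)
        if not toks:
            return 0.5
        return sum(self.token_prob(t) for t in toks) / len(toks)
```

After learning one spam that mentions spaces.live.com, a new message with completely different filler text but the same URL host still scores above neutral, because the shared tokens carry over while the random ones stay at 0.5.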

  % wget -O - -q http://codepad.org/W53onqK9/raw.txt | spamc | /bin/fgrep --text X-Spam-Bayes
  X-Spam-Bayes: bayes=1., N=50(47-2+29), ham=(sort, doing), 
spam=(UD:spaces.live.com, UD:live.com, UD:entry, dawn, 
HX-Mozilla-Status2:)
   
   interestingly i dont have that header.
   i'll check docs.

The X-Spam-Bayes header was added with
  add_header all Bayes bayes=_BAYES_, N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_), ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_)

-jeff


Re: New kind of spam

2009-03-30 Thread Jeff Mincy
   From: Arvid Ephraim Picciani a...@exys.org
   Date: Wed, 25 Mar 2009 16:59:58 +0100
   
   http://codepad.org/W53onqK9
   
   i gave up on this kind of spam.  it's impossible to train bayes and it's
   changing too fast to make custom rules. ...
   
What do you mean it's impossible to train bayes?
Bayes really can be trained to deal with this message.
For example, I get BAYES_95:

  wget -O - -q http://codepad.org/W53onqK9/raw.txt | spamc | /bin/fgrep --text X-Spam-Bayes
  X-Spam-Bayes: bayes=0.9679, N=50(29-2+11), ham=(sort, doing), 
spam=(UD:spaces.live.com, UD:live.com, UD:entry, dawn, 
HX-Mozilla-Status2:)

After I learn this message the probability increases to BAYES_99

  % wget -O - -q http://codepad.org/W53onqK9/raw.txt | sa-learn --spam
  Learned tokens from 1 message(s) (1 message(s) examined)
  % sa-learn --sync
  % wget -O - -q http://codepad.org/W53onqK9/raw.txt | spamc | /bin/fgrep --text X-Spam-Bayes
  X-Spam-Bayes: bayes=1., N=50(47-2+29), ham=(sort, doing), 
spam=(UD:spaces.live.com, UD:live.com, UD:entry, dawn, 
HX-Mozilla-Status2:)

Note that Bayes has determined that UD:spaces.live.com is a spam sign.

The X-Spam-Bayes header is added with
  add_header all Bayes bayes=_BAYES_, N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_), ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_)

-jeff


RE: Server overload, queuing for SA possible?

2009-03-26 Thread Jeff Mincy
   From: Bowie Bailey bowie_bai...@buc.com
   Date: Thu, 26 Mar 2009 08:48:30 -0500
   
   Brian J. Murrell wrote:
On Wed, 2009-03-25 at 15:01 -0400, Michael Scheidell wrote:
 
 Match your MTA processes to the spamd children.  Your MTA will send
 4xx 'busy now, come back to play later' message.  Let the sending
 MTA queue it back up (or zombies will just go away)

I don't really see that as a socially responsible action.  If my
mailserver was completely loaded to the point of not even being able
to queue a message, I'd buy pushing back on the sender with a 4xx,
but the reality is that while I may have maxed out my spamd children,
I can likely still receive and queue mail locally.

The queueing up of mail to spamd really belongs on the local server,
and should not become a burden on sending MTAs.
   
   This really depends on where you are running SA in the delivery process.
I'm kinda gathering that this is not possible within spamassassin
itself.  Probably in fact it is for at least some MTAs but how to
achieve it becomes MTA specific and OT here.
   
   SA is not capable of any sort of queuing.  If you need that, you will
   have to make your MTA do it one way or another.

The spamassassin executable doesn't queue - it just starts up a new
process each time it scans a message.

However, spamd queues connections when all of the children are busy
processing messages.

From the spamd man page:

   -m number , --max-children=number
   This option specifies the maximum number of children to spawn.
   Spamd will spawn that number of children, then sleep in the
   background until a child dies, wherein it will go and spawn a new
   child.

   Incoming connections can still occur if all of the children are
   busy, however those connections will be queued waiting for a free
   child.  The minimum value is 1, the default value is 5.

As long as messages are processed reasonably quickly everything will
be fine.  If spamd takes too long to process messages then the MTA
will start timing out (like 2-10 minutes).  What happens then is up to
the MTA.
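
The effect of a fixed number of children can be imitated with a small simulation. This is only an analogy: a thread pool stands in for spamd's child processes, the sleep stands in for scanning time, and all names and numbers are made up. It does not show how spamd itself holds waiting connections.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def simulate(max_children=5, messages=12, scan_seconds=0.01):
    """Scan `messages` items with at most `max_children` running at once."""
    lock = threading.Lock()
    state = {"busy": 0, "peak": 0}

    def scan(n):
        with lock:
            state["busy"] += 1
            state["peak"] = max(state["peak"], state["busy"])
        time.sleep(scan_seconds)      # stand-in for the time spent scanning
        with lock:
            state["busy"] -= 1
        return n

    # Work submitted while all workers are busy simply waits for a free one.
    with ThreadPoolExecutor(max_workers=max_children) as pool:
        done = list(pool.map(scan, range(messages)))
    return done, state["peak"]
```

Every message is eventually scanned, but never more than max_children at a time; the ones that arrive while all workers are busy just wait, and how long a caller is willing to wait is its own timeout problem.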
-jeff


RE: Server overload, queuing for SA possible?

2009-03-26 Thread Jeff Mincy
   From: Bowie Bailey bowie_bai...@buc.com
   Date: Thu, 26 Mar 2009 09:55:45 -0500
   
   Jeff Mincy wrote:
   From: Bowie Bailey bowie_bai...@buc.com
   Date: Thu, 26 Mar 2009 08:48:30 -0500

   Brian J. Murrell wrote:
On Wed, 2009-03-25 at 15:01 -0400, Michael Scheidell wrote:

 Match your MTA processes to the spamd children.  Your MTA will send
 4xx 'busy now, come back to play later' message.  Let the sending
 MTA queue it back up (or zombies will just go away)

I don't really see that as a socially responsible action.  If my
mailserver was completely loaded to the point of not even being able
to queue a message, I'd buy pushing back on the sender with a 4xx,
but the reality is that while I may have maxed out my spamd children,
I can likely still receive and queue mail locally.

The queueing up of mail to spamd really belongs on the local server,
and should not become a burden on sending MTAs.

   This really depends on where you are running SA in the delivery process.
I'm kinda gathering that this is not possible within spamassassin
itself.  Probably in fact it is for at least some MTAs but how to
achieve it becomes MTA specific and OT here.

   SA is not capable of any sort of queuing.  If you need that, you will
   have to make your MTA do it one way or another.

The spamassassin executable doesn't queue - it just starts up a new
process each time it scans a message.

However, spamd queues connections when all of the children are busy
processing messages.

From the spamd man page:

   -m number , --max-children=number
   This option specifies the maximum number of children to
   spawn. Spamd will spawn that number of children, then
   sleep in the background until a child dies, wherein it
   will go and spawn a new child.

   Incoming connections can still occur if all of the
   children are busy, however those connections will be
   queued waiting for a free child.  The minimum value is 1,
the default value is 5. 

As long as messages are processed reasonably quickly everything will
be fine.  If spamd takes too long to process messages then the MTA
will start timing out (like 2-10 minutes).  What happens then is up to
the MTA.
-jeff
   
   Ok, it does queue connections, but that is very limited.  This thread is
   specifically talking about what happens when spamd is taking too long.
   
Yes.   We were getting away from that issue.

The machine may not have enough resources to run the number of spamd
children.  A caching name server helps with throughput.   Some more
details about the machine could be useful as well as details on what
else is happening on the machine when the spamd queue backs up.

   If I'm reading the spamc man page correctly, it will wait 5 minutes for
   spamd to process the message, but it will only wait about 3 seconds for
   a connection to spamd (3 tries with 1 second sleep between them).
   That's not much of a queue.  Or am I missing something?

The --connect-retries=retries and --retry-sleep=sleep options control
connection attempts.   The connection attempt was successful, you are
just waiting for spamd to get around to the message.   If spamd
refuses the connection then spamc will retry a few times.

-jeff


RE: Server overload, queuing for SA possible?

2009-03-26 Thread Jeff Mincy
   From: Bowie Bailey bowie_bai...@buc.com
   Date: Thu, 26 Mar 2009 12:07:23 -0500
   
   Jeff Mincy wrote:

   If I'm reading the spamc man page correctly, it will wait 5
   minutes for spamd to process the message, but it will only wait
   about 3 seconds for a connection to spamd (3 tries with 1 second
   sleep between them). That's not much of a queue.  Or am I missing
   something? 

The --connect-retries=retries and --retry-sleep=sleep options control
connection attempts.   The connection attempt was successful, you are
just waiting for spamd to get around to the message.   If spamd
refuses the connection then spamc will retry a few times.
   
   Ok, so spamd will accept the connection and hold onto it until a child
   process is available.  How many connections can spamd queue?

I dunno.  As I recall, on linux the maximum number of connections is
controlled by some kernel limit, probably 4000.  You'll run out of
something else before you get anywhere near this number.  Of course,
messages will start timing out in spamc if they are not processed fast
enough.

-jeff


Re: Blacklisting Cyrillic

2009-03-26 Thread Jeff Mincy
   From: Kenneth Porter sh...@sewingwitch.com
   Date: Thu, 26 Mar 2009 17:22:21 -0700
   
   I'd like to score anything in Windows-1251 fairly high, as I don't expect 
   to get anything legitimate in that charset. How can I read the charset 
   declared in a Subject header, or in a MIME part, for matching in a rule?
   
   The only tools I see are ok_locales and CHARSET_FARAWAY, but those seem 
   like heavy hammers as they blacklist everything and then require me to 
   whitelist what I want. I'd rather the reverse: let me list which codepages 
   to reject.
   
   I tried this rule but it's not firing and I'm not sure why:
   
   describe KP_CYRILLIC Cyrillic code page
   header   KP_CYRILLIC Subject =~ /Windows-1251/
   score    KP_CYRILLIC 0.1
   
Try Subject:raw to inhibit decoding?
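
The reason the raw form matters can be shown outside SpamAssassin: once an encoded Subject is decoded, the charset name is no longer part of the text. The sketch below builds a sample encoded-word (the helper name and the sample text are made up) and compares the raw and decoded forms using Python's email.header module.

```python
import base64
from email.header import decode_header


def encode_subject(text, charset="windows-1251"):
    """Build a base64 encoded-word for a Subject header (hypothetical helper)."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


raw_subject = encode_subject("Привет")

# decode_header returns (bytes, charset) pairs for encoded words.
data, charset = decode_header(raw_subject)[0]
decoded_subject = data.decode(charset)

print(raw_subject)       # the charset name is visible here
print(decoded_subject)   # only the Cyrillic text is left here
```

Note also that mailers often write the charset in lower case, so a case-insensitive match (an /i on the rule's regex) may be worth trying as well.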

-jeff


Re: Spam Assassin White List

2009-03-24 Thread Jeff Mincy
   From: Matus UHLAR - fantomas uh...@fantomas.sk
   Date: Tue, 24 Mar 2009 15:30:23 +0100
   
   On 23.03.09 21:58, dsh979 wrote:
I did not realise that items listed on the white list or the black list
would still be subject to the operation/analysis of the SpamAssassin 
Rules.  
   
   All rules are processed unless you play with the ShortCircuit plugin.  Beware
   of that: it may render SA useless if you don't know what you are doing.
   
You have asked why I have set the required score to 100.  Lengthy
explanation (sorry).  I have done this to prevent SpamAssassin from
inserting SpamWarnings into the header/body of the relevant email.
   
   There's report_safe option to configure that.
   
Also rewrite_header 
   
Q: How can I list items/users on a white list or a black list without the
lists (and items) being the subject of further analysis by the SpamAssassin
Rules (and therefore obtaining the same score for each item on the relevant
list, irrespective of the operation of the SpamAssassin Rules, that is
-100 = white list items, +100 = black list items)?
   
   I somehow do not understand this question.

He wants the white/black lists to run first and then short circuit.
So anybody in the whitelist gets a score of -100 and anybody in the
blacklist gets a score of +100.  This can probably be done with the
ShortCircuit plugin and setting the priority of the rules so that they
run first.
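
A minimal sketch of what that might look like, assuming SpamAssassin 3.2 or later with the Shortcircuit plugin available. The script only writes a candidate snippet to a scratch file (the path is a placeholder) so it can be reviewed before being copied into the site config; the priority values are an assumption chosen so that the list rules run before everything else.

```shell
#!/bin/sh
# Write a candidate snippet to a scratch file for review. The scratch path is
# a placeholder; site rules normally live in a directory such as
# /etc/mail/spamassassin.
sc_cf="${TMPDIR:-/tmp}/shortcircuit-sketch.cf"

cat > "$sc_cf" <<'EOF'
# Assumes the Shortcircuit plugin is already loaded from a .pre file:
#   loadplugin Mail::SpamAssassin::Plugin::Shortcircuit

# Stop scanning as soon as a whitelist or blacklist rule hits.
# With "on", the rule keeps its own score (-100 / +100 by default).
shortcircuit USER_IN_WHITELIST on
shortcircuit USER_IN_BLACKLIST on

# Lower priority values run earlier, so these rules are evaluated first.
priority USER_IN_WHITELIST -1000
priority USER_IN_BLACKLIST -900
EOF

echo "wrote $sc_cf"
```

Running spamassassin --lint after copying the snippet into place is a reasonable way to catch typos.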

Black lists aren't all that useful for stopping spam.   The email
addresses are forged in spam.

-jeff


Re: negative scores for spam

2009-03-23 Thread Jeff Mincy
   From: Chris Barnes ch...@txbarnes.com
   Date: Mon, 23 Mar 2009 11:14:37 -0500
   
   Jeff Mincy wrote:
   
Yow.  The negative scoring bayes rules are extremely reliable when well
trained.  Ham messages are not trying to evade the filter.  Defeating
bayes with poison is mostly a myth.  The random garbage might work the
first time but not the second time as long as you are training these
messages as spam.  If you are getting lots of BAYES_00 hits on spam
then the problem is almost certainly incorrect training where spam
messages were incorrectly learned as ham.
   
   Fair enough.
   
   But the problem remains.  A simple glance at this list shows that this 
   happens often enough to be a fairly common problem.
   
   The question is:  How does one fix the problem after it occurs?

The way to fix the problem is to relearn any incorrectly learned
messages.  So any spam message that was incorrectly learned as ham,
either automatically or manually, needs to be correctly relearned as
spam using sa-learn.  You should also learn as spam any spam message
that hits BAYES_00, or anything less than BAYES_50.  Do the same in the
other direction for ham: learn as ham any ham message hitting BAYES_50
through BAYES_99.

The more messages that you correctly train the more accurate and
definitive bayes will be.

If you don't have the incorrectly learned messages to retrain then you
can always start over by removing the bayes database files in your
.spamassassin directory.
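
The retraining steps above can be sketched as a few shell wrappers around sa-learn. The function names are made up for illustration, and sa-learn --clear is used here in place of deleting the database files by hand; nothing runs until one of the functions is called.

```shell
#!/bin/sh
# Thin wrappers around sa-learn; nothing runs until a function is called.
# Arguments are the files or directories that hold your saved messages.

# Spam that was learned as ham, or that scored below BAYES_50.
relearn_as_spam() { sa-learn --spam "$@"; }

# Ham that was learned as spam, or that scored BAYES_50 or higher.
relearn_as_ham() { sa-learn --ham "$@"; }

# Show token and message counts to confirm that training registered.
bayes_counts() { sa-learn --dump magic; }

# Last resort when the mislearned messages are gone: wipe the database.
bayes_start_over() { sa-learn --clear; }
```

For example, relearn_as_spam ~/Mail/missed-spam (a hypothetical folder) followed by bayes_counts shows whether the spam count went up.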

-jeff


Re: negative scores for spam

2009-03-20 Thread Jeff Mincy
   From: Hoover Chan c...@sacredsf.org
   Date: Fri, 20 Mar 2009 13:55:08 -0700 (PDT)
   
   The threshold was set to 6.6 (cf. required=6.6). The message this
   was attached to was very definitely junk. This kind of situation got
   me curious about the whole thing where any positive spam score is
   set as the threshold but seeing junk mail coming in with negative
   scores.
   
Train BAYES.  The message hit BAYES_00.  You want BAYES_99.  So either
you have incorrectly learned similar messages or you haven't trained
enough.
-jeff
   
   
   -- 
   Hoover Chan c...@sacredsf.org 
   Technology Director 
   Schools of the Sacred Heart 
    Broadway St. 
   San Francisco, CA 94115
   
   
   - Rick Macdougall ri...@ummm-beer.com wrote:
   
Hoover Chan wrote:
 Can someone point me to what I can do to my Spam Assassin config for
a situation like the following?
 
 X-Spam-Status: No, score=-1.496 tagged_above=-10 required=6.6
  tests=[AWL=-1.103, BAYES_00=-2.599, HTML_MESSAGE=0.001,
  URIBL_BLACK=1.955, URIBL_GREY=0.25]
 
 That is, a positive score criterion with a spam message that comes
out with a negative number.
 

Errr

-1.103 - 2.599 + 0.001 + 1.955 + 0.25 = -1.49600

Where do you see that it should be positive ?

Regards,

Rick


Re: negative scores for spam

2009-03-20 Thread Jeff Mincy
   From: Jesse Stroik jstr...@ssec.wisc.edu
   Date: Fri, 20 Mar 2009 16:14:39 -0500
   
   Hoover Chan wrote:
The threshold was set to 6.6 (cf. required=6.6). The message this was 
attached to was very definitely junk. This kind of situation got me curious 
about the whole thing where any positive spam score is set as the threshold but 
seeing junk mail coming in with negative scores.
   
   You are getting negative scores for auto white list and for bayes_00. 
   It's a matter of taste and what you believe makes sense, but I don't 
   consider bayes to be all that accurate (since there are methods for 
   defeating bayes, poisoning bayes, etc).  As such, I don't allow Bayes to 
   assign negative scores or positive scores within a couple of points of 
   the threshold.  You can do so by assigning scores like this:
   
   score BAYES_00  0
   score BAYES_05  0
   score BAYES_20  0
   score BAYES_40  0
   
Yow.  The negative scoring bayes rules are extremely reliable when well
trained.  Ham messages are not trying to evade the filter.  Defeating
bayes with poison is mostly a myth.  The random garbage might work the
first time but not the second time as long as you are training these
messages as spam.  If you are getting lots of BAYES_00 hits on spam
then the problem is almost certainly incorrect training where spam
messages were incorrectly learned as ham.

   I also disable AWL since a lot of spam, especially the stuff most likely 
   to be tested against spamassassin, will likely use known good email 
   addresses from your domain as the from address.  This is fairly likely 
   to hit on the AWL.

Yow again.   AWL uses the email address and the IP address together.  So a
forged email address used in spam is not going to have the same EMAIL+IP
pair as legitimate email using the same email address.
   
   Again, it's just a matter of taste and it all depends on how you've set 
   up your scoring.  I'm pretty cautious to ensure there aren't false 
   positives as that would decrease the value of spamassassin greatly for 
   us, but I otherwise avoid AWL and Bayes negative scores.
   
   If you sent us a copy of the spam, we could test it and show you what 
   should be hitting.

Use pastebin instead.

-jeff


Re: SpamAssassins bayes mechanism and message headers

2009-03-18 Thread Jeff Mincy
   From: Matt Kettler mkettler...@verizon.net
   Date: Tue, 17 Mar 2009 21:30:02 -0400
   
   fl...@pbartels.info wrote:
Hello,
   
instead of disabling a lot of possibly set message headers using
bayes_ignore_header and ending up in strange configs like:
   
bayes_ignore_header Return-Path
   ...
(found on the net)
   Where?
   
shouldn't SpamAssassins bayes mechanism just ignore the complete
message header and just look at the body?
This seems useful in my opinion.
   It seems like a very misguided idea to me.
   
   Is there any reason to think headers make bad tokens?
   Do you have any test data showing this improves your bayes accuracy?

Yes - I think some headers make extremely bad tokens for bayes, for
example the X-Mailer/User-Agent headers.   40% of the spam I get
claims to have Microsoft Outlook as an X-Mailer.   So bayes rapidly
determines that *UAMicrosoft (etc) is an extremely strong token.
These *UA tokens were enough to push a short ham message to BAYES_99.
When I added a bayes_ignore_header the score dropped to ~BAYES_40.

Obfuscated words like 'st0ck' are 100% indications of spam (or of
messages that discuss spam), so these words work great for bayes.
An 'X-Mailer: Microsoft Office Outlook' header doesn't really tell you
anything about the message, at least not to the extent that bayes
treats these tokens.

The Message-ID tokens are also low quality tokens.  Most of these
tokens are hapaxes that are never used by other messages.  These just
fill up the bayes database.  If the Message-ID tokens were processed
further they might be more useful for bayes - eg - replace 1234.56789
with a format %4d.%5d, or throw out all of the timestamp numbers and
keep just the stuff after the @.
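
To make the two Message-ID ideas concrete, here is a small shell sketch. It is not SpamAssassin's tokenizer, only an illustration of the transformations; the function names and the sample Message-ID are made up.

```shell
#!/bin/sh
# Replace every run of digits with a printf-style width marker, so that
# <1234.56789@host> and <9876.54321@host> yield the same token.
msgid_digit_shape() {
  awk '{
    out = ""; rest = $0
    while (match(rest, /[0-9]+/)) {
      out = out substr(rest, 1, RSTART - 1) "%" RLENGTH "d"
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}

# Keep only what follows the last "@", dropping the closing angle bracket.
msgid_domain() {
  sed -E 's/^.*@([^>]*)>?$/\1/'
}

echo '<1234.56789@mail.example.com>' | msgid_digit_shape
echo '<1234.56789@mail.example.com>' | msgid_domain
```

The first call prints <%4d.%5d@mail.example.com> and the second prints mail.example.com, so either form would repeat across messages from the same sender software instead of producing a new hapax every time.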
-jeff


Re: SpamAssassins bayes mechanism and message headers

2009-03-18 Thread Jeff Mincy
   From: Greg Troxel g...@ir.bbn.com
   Date: Wed, 18 Mar 2009 15:33:31 -0400
   
   Jeff Mincy j...@delphioutpost.com writes:
   
   From: Matt Kettler mkettler...@verizon.net
   Date: Tue, 17 Mar 2009 21:30:02 -0400
   
shouldn't SpamAssassins bayes mechanism just ignore the complete
message header and just look at the body?
This seems useful in my opinion.
   It seems like a very misguided idea to me.
   
   Is there any reason to think headers make bad tokens?
   Do you have any test data showing this improves your bayes accuracy?
   
Yes - I think some headers make extremely bad tokens for bayes, for
example the X-Mailer/User-Agent headers.   40% of the spam I get
   
   I think I'm having a similar problem, where I get spam via a
   mailinglist, and bayes gives the spam credit for having similar headers
   to the ham which arrives on the list.  I'm not so concerned about
   including the headers as they arrive at the list server, but all the
   headers added from receipt by the list server seem inappropriate.
   
   I'll try bayes_ignore_header.

Scanning mailing list email is more trouble than it's worth.  It can
be done, but you have to be very motivated and it is a lot of work to
maybe catch a few mailing list spam messages.

Bayes needs to ignore any headers and any special footer tokens added by
the mailing list postings.  You need to extend trusted_networks to the
mailing list so that various tests are done on the submitter instead of
the mailing list.  DCC should be whitelisted for most mailing lists
since the email messages are bulk.  Any automatic reporting needs to be
turned off.  I'm sure there are other things that I'm forgetting.

If the mailing list has reasonably good spam filtering then just skip
running SpamAssassin.

-jeff


Re: SpamAssassins bayes mechanism and message headers

2009-03-18 Thread Jeff Mincy
   From: Matt Kettler mkettler...@verizon.net
   Date: Wed, 18 Mar 2009 19:49:53 -0400
   
   Jeff Mincy wrote:
   From: Matt Kettler mkettler...@verizon.net
   Date: Tue, 17 Mar 2009 21:30:02 -0400
   
   fl...@pbartels.info wrote:
Hello,
   
instead of disabling a lot of possibly set message headers using
bayes_ignore_header and ending up in strange configs like:
   
bayes_ignore_header Return-Path
   ...
(found on the net)
   Where?
   
shouldn't SpamAssassins bayes mechanism just ignore the complete
message header and just look at the body?
This seems useful in my opinion.
   It seems like a very misguided idea to me.
   
   Is there any reason to think headers make bad tokens?
   Do you have any test data showing this improves your bayes accuracy?
   
Yes - I think some headers make extremely bad tokens for bayes, for
example the X-Mailer/User-Agent headers.   40% of the spam I get
claims to have Microsoft Outlook as an X-Mailer.   So bayes rapidly
determines that *UAMicrosoft (etc) is an extremely strong token.
These *UA tokens were enough to push a short ham message to BAYES_99.
When I added a bayes_ignore_header the score dropped to ~BAYES_40.
  
   That seems rather extraordinarily strange. Did the messages match no
   other tokens at all?  (ie: did you run it through spamassassin -D bayes
   before and after?)
   
This was the X-Spam-Bayes header that was added at the time:
   X-Spam-Bayes: bayes=1., N=27(19-0+13), ham=(), spam=(HTo:U*mincy, 
HTo:D*com, HTo:D*rcn.com, H*F:D*net, H*UA:Build)

This header was added using:
   add_header all Bayes bayes=_BAYES_, N=_BAYESTC_(_BAYESTCLEARNED_-_BAYESTCHAMMY_+_BAYESTCSPAMMY_), ham=(_HAMMYTOKENS(5,short)_), spam=(_SPAMMYTOKENS(5,short)_)


So, there are 27 tokens, 0 hammy, 13 spammy.

   I'd be very interested in what's going on there, because it makes very
   little sense unless the message really matched very, very little other
   existing training.
   
3 of the top 5 spammy tokens eg: HTo:U*mincy, HTo:D*com, HTo:D*rcn.com
come from the To: mi...@rcn.com header.  The  H*UA:Build came from a
  'X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)'
header.  As I recall, there were various H*UA:Outlook etc headers.

Bayes was 100.000% sure that this message was spam based on the To,
X-Mailer, and From headers.  The envelopes on all email messages that I
read at home are addressed to mi...@rcn.com (ignoring for the moment
that mi...@starpower.net also happens to get to me).  The 'To:' header
is either going to be mi...@rcn.com or some made up email address that
will never be repeated or it is my email address. So Bayes will see my
email address in both spam and ham.  At the time more than 80% of
email I was getting at rcn.com was spam so, To: mi...@rcn.com was
turned into three strong spam tokens.  My real mi...@rcn.com email
address in the To header says nothing about the spamminess of the
message.  This is in contrast to the mi...@starpower.net email address,
mail to which is almost certainly spam (and which has been added to
blacklist_to).  So my solution was to add 'bayes_ignore_header To
From' and use blacklist_to/blacklist_from for the suspect email
addresses.  I came up with similar justification for adding
'bayes_ignore_header X-Mailer'.

The body of the message was a single sentence asking me about my
primary music software.

If you want to see more detail let's take it off the public mailing
list.

-jeff


Re: Some emails pass spamassassin unprocessed

2009-02-20 Thread Jeff Mincy
   From: Monky promil...@yahoo.de
   Date: Fri, 20 Feb 2009 03:31:14 -0800 (PST)
   
   Hello,
   I am running the Spamd Daemon version 3.2.5 on my Linux web and mail server
   and in general it works well. From time to time (somewhere in between 1-10%
   of all emails) spam passes the filter - but not because spamassassin decides
   that it is ham but because the email never gets processed by spamassassin
   (the header shows no X-Spam at all).

Look in the mail log files to see what was happening when messages are
passed through unprocessed.  SpamAssassin could be waiting on lock
files.  For example, Bayes files are locked while an automatic Bayes
expiry runs.

-jeff


Re: vbounce and out of office messages

2009-02-01 Thread Jeff Mincy
   From: Kai Schaetzl mailli...@conactive.com
   Date: Sun, 01 Feb 2009 14:31:17 +0100
   
   Karsten Bräckelmann wrote on Fri, 30 Jan 2009 19:42:16 +0100:
   
FWIW, and to make Michael happy, I just caught one today -- hit another
rule, __BOUNCE_OOO_3. Sadly, it also hit __BOUNCE_AUTO_REPLY. So there's
more to disable...
   
   why? Why disable a rule because of a few FPs? If that rule isn't scored in 
   any way that makes it a threat that is perfectly acceptable. It's the 
   overall behavior of a rule that makes it worth or not worth using it, not 
   a few FPs. Nobody, at least not me, expects these rules to be free of FPs.
   
I use vbounce rules to detect bounce messages that were missed by
various procmail filtering rules.  Any message identified as a bounce
is processed and delivered differently in procmail rules.  So, any
vbounce FP is rather painful.  If you aren't doing anything special
delivering bounce messages then a FP in this rule wouldn't matter very
much.

-jeff


Re: vbounce and out of office messages

2009-02-01 Thread Jeff Mincy
   From: Kai Schaetzl mailli...@conactive.com
   Date: Sun, 01 Feb 2009 17:40:00 +0100
   
   Jeff Mincy wrote on Sun, 1 Feb 2009 10:01:49 -0500:
   
I use vbounce rules to detect bounce messages that were missed by
various procmail filtering rules.  Any message identified as a bounce
is processed and delivered differently in procmail rules.  So, any
vbounce FP is rather painful.
   
   No, it is not, unless you score these rules too high or unless you use the 
   single rules for triggering other actions. That's what SA is all about: 
   scoring. ...

Huh?   You don't want bounces to be processed as regular spam.
If you train bayes on bounces then you are training bayes to detect
bounces and pretty soon SpamAssassin will flag all bounces,
including valid bounces, as spam.

This comment is taken from the 20_vbounce.cf file:
 # If you use this, set up procmail or your mail app to spot the
 # ANY_BOUNCE_MESSAGE rule hits in the X-Spam-Status line, and move
 # messages that match that to a 'vbounce' folder.

   ... If you try to (mis-)use it in other ways problems are to be 
   expected. That's not the fault of the vbounce rules.

The purpose of 20_vbounce is to detect and identify bounces so that
you may process bounce messages differently.
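
As a stand-in for the procmail condition, the check itself can be sketched in plain shell: unfold the X-Spam-Status header and look for ANY_BOUNCE_MESSAGE. The function name is made up, and this is only an illustration of the test, not the procmail setup described here.

```shell
#!/bin/sh
# Read one message on stdin. Succeed if the (possibly folded)
# X-Spam-Status header lists ANY_BOUNCE_MESSAGE; the body is ignored.
is_vbounce() {
  awk '
    /^$/      { exit }                                      # blank line ends the headers
    /^[ \t]/  { if (in_status) status = status $0; next }   # folded continuation line
              { in_status = (tolower($0) ~ /^x-spam-status:/)
                if (in_status) status = $0 }
    END       { exit (status ~ /ANY_BOUNCE_MESSAGE/ ? 0 : 1) }
  '
}
```

A delivery script could then do something like "is_vbounce < message && deliver to the vbounce folder", where the delivery step depends on the local setup. Because the action hangs on a single rule hit, any false positive in that rule misroutes a real message.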

So I disagree, any FP in the vbounce rules is the fault of vbounce
rules and prevents these rules from being used as designed.

   AFAIK, the default score for all the BOUNCE rules is 0.1

Right.  If you aren't going to use the vbounce rules for extra processing
then there really isn't any point in running the rules.  The low default
score pretty much guarantees that message classification will not change
one way or the other.

-jeff


Re: profile the various tests being done

2009-01-21 Thread Jeff Mincy
   From:  Brian J. Murrell br...@interlinx.bc.ca
   Date: Wed, 21 Jan 2009 19:15:19 + (UTC)
   
   I'm trying to figure out why in some cases, spamd is taking in excess of 
   1200s to process messages.  Is there any way to profile (i.e. time, or 
   timestamp) each of the tests that spamd is doing so I can see where the 
   longest ones are?

   Even enabling the kind of debug that spamassassin -D produces, along 
   with timestamps for each line of debug would be useful.
   
Somebody else posted this a while back.

Do spamassassin -D < email.txt 2>&1 | timestamp

where timestamp is a function defined in .bashrc:

  function timestamp()
  { perl -MPOSIX -MTime::HiRes -n -e '
      # columns: clock time, seconds since start, seconds since previous line
      BEGIN {$|=1; $dp=0; $t0=Time::HiRes::time};
      $t=Time::HiRes::time; $dt=$t-$t0; printf("%s%06.3f %4.3f %4.3f %s",
        POSIX::strftime("%H:%M:",localtime($t)), $t-int($t/60)*60,
        $dt, $dt-$dp, $_); $dp=$dt' "$@"
  }

Or pipe it directly to the one liner:

spamassassin -D < email.txt 2>&1 | perl -MPOSIX ...

-jeff


Re: Spam with clean URI's which forward to DNSBListed URL (by HTML redirect header)

2009-01-07 Thread Jeff Mincy
   From: Theo Van Dinter felic...@apache.org
   Date: Wed, 7 Jan 2009 11:36:18 -0500
   
   On Wed, Jan 07, 2009 at 04:46:44PM +0100, Florian Lagg wrote:
So - if possible - I want spamassassin to:
1. Request the links in the mail body and check them for http-error 302 or
meta redirects
2. Check the links we got by doing this against some DNSBL's
 
Is this possible? Is there a reason why we shouldn't do this?

You can look at the WebRedirect plugin on 
http://wiki.apache.org/spamassassin/CustomPlugins
   
   Possible?  Sure.
   Should?  Not unless you want to turn your (and anyone else running that
   code's) machine into a DDoS client.

   In other words, while it's possible to shoot yourself in the face, it's
   really not a good idea to do so.

There are various WARNING: PRIVACY AND TECHNICAL ISSUES listed in the
plugin.   I used the plugin for a while, but stopped using it when the
number of hits dropped off.

-jeff


Re: sa-update damages existing SA installation

2008-12-18 Thread Jeff Mincy
   From: Marcin Krol mrk...@gmail.com
   Date: Thu, 18 Dec 2008 18:37:12 +0100
   
   Hello everyone,
   
   When I run sa-update -D --gpgkey 6C6191E3 --channel 
   sought.rules.yerp.org, it damages my SA installation!
   
sa-update puts rules in /var/lib/spamassassin/VER.  Once this directory
exists all site rules are expected to come from this directory.  The
previous installation directory (eg /usr/local/share/spamassassin) is
ignored.

Try doing sa-update of the normal rules before you use sa-update of
additional rule sets.
   ...

   And my SA doesn't score any mails anymore! I have to purge the existing 
   SA (dpkg -P spamassassin), reinstall it from scratch, restore conf files 
   from backups and then it works.
   
   WTF! Does anybody know what goes wrong?
   
Use -D to see which config files are being read by spamassassin:

   % spamassassin --lint -D 2>&1 | fgrep 'config: using'
   [31869] dbg: config: using /etc/mail/spamassassin for site rules pre files
   [31869] dbg: config: using /var/lib/spamassassin/3.001007 for sys rules pre files
   [31869] dbg: config: using /var/lib/spamassassin/3.001007 for default rules dir
   [31869] dbg: config: using /etc/mail/spamassassin for site rules dir
   [31869] dbg: config: using /home/jeff/.spamassassin/user_prefs for user prefs file
   [31869] dbg: config: using /var/lib/spamassassin/3.001007/updates_spamassassin_org/empty.pre for included file
   [31869] dbg: config: using /var/lib/spamassassin/3.001007/updates_spamassassin_org/10_misc.cf for included file
   [31869] dbg: config: using /var/lib/spamassassin/3.001007/updates_spamassassin_org/20_advance_fee.cf for included file

-jeff


Re: White List From RCVD

2008-12-11 Thread Jeff Mincy
   From: mouss mo...@netoyen.net
   Date: Thu, 11 Dec 2008 19:55:44 +0100
   
   Asif Iqbal a écrit :
I have this in local.cf in qmail.here.net's /etc/mail/spamassassin dir

  whitelist_from_rcvd joe.sm...@here.com  qtdenexmbm24.AD.HERE.COM

But email from that address still tagged as spam. What am I doing wrong?

   
   you should run the message through spamassassin -D to see which relays
   are trusted.
   
   or you could get lucky with:
   
   always_trust_envelope_sender 1
   
   
If you add a Relay header eg: 
  add_header all Relay trusted=_RELAYSTRUSTED_, untrusted=_RELAYSUNTRUSTED_

Then you want the rdns= from the first untrusted relay.

In this case it is probably:
  whitelist_from_rcvd joe.sm...@here.com here.com

The whitelist probably won't work for here.com
because of lack of reverse dns.
  Received: from NO?REVERSE?DNS (HELO sudnp799.here.com)

The debug output should confirm this.


RE: about fake mails

2008-12-07 Thread Jeff Mincy
   From: Giampaolo Tomassoni [EMAIL PROTECTED]
   Date: Sun, 7 Dec 2008 15:52:10 +0100
   
-Original Message-
From: Yavuz Maslak [mailto:[EMAIL PROTECTED]
Sent: Sunday, December 07, 2008 3:02 PM

Ok
I have started to use dkim verification.  I defined whitelists in
local.cf.
it works.
But I could not find how to give a high score to a spammer who doesn't
use gmail's mail servers.

Although a domain has domain keys, how can I give a positive score to a
mail which comes from a fake smtp server?
   
   There is no direct way (to my knowledge) to do this.
   
   You have to apply a positive score to all mail claiming to be From: a
   gmail address, then apply a negative score voiding the first one to the
   DKim-verified ones. 
   
You can write a meta rule for email that claims to be from gmail that
does not have DKIM.  

   # add some penalty points to mail from yahoo and gmail.com which
   # does not carry a valid signature; exempt mail from mailing lists
   header __L_ML1   Precedence =~ m{\b(list|bulk)\b}i
   header __L_ML2   exists:List-Id
   header __L_ML3   exists:List-Post
   header __L_ML4   exists:Mailing-List
   header __L_HAS_SNDR  exists:Sender
   meta   __L_VIA_ML    (__L_ML1 || __L_ML2 || __L_ML3 || __L_ML4 || __L_HAS_SNDR)
   header __L_FROM_Y1   From:addr =~ [EMAIL PROTECTED]
   header __L_FROM_Y2   From:addr =~ [EMAIL PROTECTED](ar|br|cn|hk|my|sg)$}i
   header __L_FROM_Y3   From:addr =~ [EMAIL PROTECTED](id|in|jp|nz|uk)$}i
   header __L_FROM_Y4   From:addr =~ [EMAIL PROTECTED](ca|de|dk|es|fr|gr|ie|it|pl|se)$}i
   meta   __L_FROM_YAHOO (__L_FROM_Y1 || __L_FROM_Y2 || __L_FROM_Y3 || __L_FROM_Y4)
   header __L_FROM_GMAIL From:addr =~ [EMAIL PROTECTED]
   meta     L_UNVERIFIED_YAHOO  (!DKIM_VERIFIED && !DK_VERIFIED && __L_FROM_YAHOO && !__L_VIA_ML)
   priority L_UNVERIFIED_YAHOO  500
   score    L_UNVERIFIED_YAHOO  2.5
   meta     L_UNVERIFIED_GMAIL  (!DKIM_VERIFIED && __L_FROM_GMAIL && !__L_VIA_ML)
   priority L_UNVERIFIED_GMAIL  500
   score    L_UNVERIFIED_GMAIL  2.5

I got these rules from this list.  I added !DK_VERIFIED to
L_UNVERIFIED_YAHOO.

-jeff


Re: Whitelist Dynamic List of IP's

2008-12-04 Thread Jeff Mincy
   From: John Hardin [EMAIL PROTECTED]
   Date: Thu, 4 Dec 2008 13:31:05 -0800 (PST)
   
   On Thu, 4 Dec 2008, Matt wrote:
   
Is there a way to tell Spamassassin to whitelist a dynamic list of
IP's in a file?  I have a dynamic list of IP's called ./pop_hosts
that have checked email by pop3 within last 15 minutes and I would
like to white list them all if that's possible.  The IP's in the file
are constantly changing though.
   
   Perhaps your MTA can append a pop-auth header (assuming that's how this is 
   being used)...

I have a question related to MTA auth headers.

Our Exim4 MTA adds an auth header, for example:
 Received: from cpe-24-25-182-74.maine.res.rr.com ...
by pinky.delphioutpost.com with esmtps 
(TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32)

The code from Received.pm looks for either ESMTPA or ESMTPSA, but not
ESMTPS.

3.1.7 code branch
  # try to catch authenticated message identifier
  #
  # with ESMTPA, ESMTPSA, LMTPA, LMTPSA should cover RFC 3848 compliant MTAs
  # with ASMTP (Authenticated SMTP) is used by Earthlink, Exim 4.34, and others
  # with HTTP should only be authenticated webmail sessions
  if (/ by .*? with (ESMTPA|ESMTPSA|LMTPA|LMTPSA|ASMTP|HTTP)\;? /i) {
    $auth = $1;
  }

3.2.5 has similar code:
  if (/ by / && / with (ESMTPA|ESMTPSA|LMTPA|LMTPSA|ASMTP|HTTP)(?: |$)/i) {
    $auth = $1;
  }

Looking at http://www.rfc-archive.org/getrfc.php?rfc=3848

   o  The new keyword ESMTPS indicates the use of ESMTP when STARTTLS
  [1] is also successfully negotiated to provide a strong transport
  encryption layer.

   o  The new keyword LMTPS indicates the use of LMTP when STARTTLS is
  also successfully negotiated to provide a strong transport
  encryption layer.

Shouldn't both ESMTPS and LMTPS be acceptable and included in the regexp?
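
To show what the question is about, here is a shell sketch that runs a header through an approximation of the 3.2.5 pattern and through a variant with ESMTPS and LMTPS added. grep -E stands in for the Perl regex, the sample header is made up in the shape of the Exim one, and the variable and function names are hypothetical. It only shows which keywords each pattern accepts; it does not settle whether a TLS-only keyword ought to count as authenticated.

```shell
#!/bin/sh
# ERE approximation of the 3.2.5 pattern quoted above ("(?: |$)" becomes "( |$)").
auth_re=' with (ESMTPA|ESMTPSA|LMTPA|LMTPSA|ASMTP|HTTP)( |$)'

# Same pattern with the two RFC 3848 TLS keywords added (the proposal).
auth_re_tls=' with (ESMTPA|ESMTPSA|ESMTPS|LMTPA|LMTPSA|LMTPS|ASMTP|HTTP)( |$)'

# Succeeds when the header line ($2) contains " by " and matches the pattern ($1).
received_matches() {
  printf '%s\n' "$2" | grep -F -q ' by ' &&
    printf '%s\n' "$2" | grep -Eiq -- "$1"
}

# Made-up header in the shape of the Exim example.
hdr='Received: from client.example.net by mx.example.org with esmtps (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32)'

if received_matches "$auth_re" "$hdr"; then
  echo "current pattern: match"
else
  echo "current pattern: no match"
fi

if received_matches "$auth_re_tls" "$hdr"; then
  echo "extended pattern: match"
else
  echo "extended pattern: no match"
fi
```

With the sample header, the current pattern does not match "esmtps" and the extended one does.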

Thanks.

-jeff

