Re: recent update to __STYLE_GIBBERISH_1 leads to 100% CPU usage

2019-05-29 Thread Karsten Bräckelmann
On Wed, 2019-05-29 at 12:47 +0200, Stoiko Ivanov wrote:
> On Wed, 29 May 2019 11:31:42 +0200 Matthias Egger  wrote:
> > On 28.05.19 10:31, Stoiko Ivanov wrote:
> > > with a recent update to the ruleset, we're encountering certain
> > > mails, which cause the rule-evaluation to use 100% cpu.

Thanks for the report, Stoiko.


> > Your sample just triggered the error and therefore the system started 
> > blowing off partially :-) So next time, please paste that example to 
> > e.g. pastebin or github or some website and link to it ;-)
> 
> Aye - sorry for that! I first wanted to open a bug-report at bugzilla,
> but since the one which dealt with a similar issue contained the
> suggestion to contact the user-list with problems for single rules - I
> did just that - without considering those implications!
> 
> Next time I'll definitely take the pastebin-option!

Both are good advice: filing a bug report, as well as generally using
pastebin or a similar external method to provide samples...

I see this has been filed in bugzilla by now.


> > But anyway, can you tell me how you found out __STYLE_GIBBERISH_1 is
> > the culprit? I have no clue how to isolate that, since a strace does
> > not really help... Or is there some strace for perl which i do not
> > know?
> 
> hmm - in that case the way to go was to enable a commented out
> debug-statement in the SpamAssassin source, which lists which rule is
> evaluated. (on 3.4.2 installed on a Debian this is
> in /usr/share/perl5/Mail/SpamAssassin/Plugin/Check.pm - in
> do_rawbody_tests - just comment out the if-condition for would_log)
> 
> Then you see it in the debug-output

Hmm, curious why that would be commented out.

It's the rules-all debug area feature that should generally be
available since the 3.4 branch, IIRC.

  spamassassin -D rules-all

will then announce regex rules *before* evaluating them, so even long-
running regex rules that do not match are easy to identify.
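To narrow such a hang down on a captured sample, something along these
lines is usually enough (sample.eml is a placeholder name here):

  spamassassin -D rules-all < sample.eml 2> rules-debug.log

The last rule announced in rules-debug.log once the process starts
spinning is the culprit.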


-- 
Karsten Bräckelmann  -- open source. hacker. assassin.



Re: recent update to __STYLE_GIBBERISH_1 leads to 100% CPU usage

2019-05-29 Thread Karsten Bräckelmann
On Wed, 2019-05-29 at 08:27 +0200, Markus Benning wrote:
> Hi,
> 
> seems to work.
> 
> Had to add
> 
> score __STYLE_GIBBERISH_1 0

That's a non-scoring sub-rule, setting its score to 0 has no effect.
Redefining the rule to disable it is the way to go:

  meta __STYLE_GIBBERISH_1  0

> to my SA config to make your mail pass.
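For completeness, a minimal sketch of that override in context, assuming
it goes into the site-wide config (typically
/etc/mail/spamassassin/local.cf):

  # neutralize the runaway sub-rule by redefining it as a constant-false meta
  meta __STYLE_GIBBERISH_1  0

Any meta rules referencing the sub-rule then simply see it as never
hitting.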


-- 
Karsten Bräckelmann  -- open source. hacker. assassin.



Re: Can't Get Removed From List

2018-02-27 Thread Karsten Bräckelmann
On Mon, 2018-02-26 at 10:13 -0700, Kevin Viner wrote:
> Hi everybody, I have an opt-in mailing list through MailChimp, and follow all
> best practices for my monthly emails. Unfortunately, every time I send out a
> list, I'm getting my fingerprint marked by Razor as spammy. SpamAssassin
> advice is: 

The following text is neither SA "advice" nor an SA report.

You should start by asking whoever (or whatever) gave you that text in
response, to get details.


> "You're sending messages that people don't want to receive, for example
> "=?utf-8?Q?=E2=9D=A4=C2=A0Valentine=27s=20Day=20Mind=20Reading?=".  You need
> to audit your mailing lists."
> 
> The problem I'm having is that I'm not receiving any abuse reports through
> MailChimp, I follow all best practice sending guidelines, and am not sending
> out spammy emails. I'm a professional entertainer with a fairly large list.
> 
> Cloudmark has been helpful in resetting my fingerprint upon request, but
> this has become an ongoing monthly problem that they don't seem to be
> interested in resolving with me. Please advise, as nobody seems to be able
> to tell me what is happening. I have a monthly email database of 10,000+, so
> if there are 1 or 2 complaints happening (which MailChimp isn't even
> seeing), it seems like a 0.1% or less rate of complaints isn't anything I
> can really do something about. And every time I'm flagged, I start having
> issues sending out emails in my day to day work.

-- 
Karsten Bräckelmann  -- open source. hacker. assassin.


Re: FROM header with two email addresses

2017-10-24 Thread Karsten Bräckelmann
On Tue, 2017-10-24 at 13:22 +0200, Merijn van den Kroonenberg wrote:
> > Hello all, I was the original poster of this topic but was away for a
> > couple of days.
> > I find it amazing to see the number of suggestions and ideas that have
> > come up here.
> > 
> > However none of the constructions matched "my" From: lines of the form
> > 
> > From: "Firstname Lastname@" <sendern...@real-senders-domain.com>

> My comments in this mail are only about the
> "us...@companya.com" <us...@companyb.com>
> situation, not about actual double from addresses.

Indeed, in this thread multiple different forms of "email address used
as the From: sender's real name" have surfaced. This type is
occasionally used to try to look legit by putting a real, valid address
of the recipient's domain (a colleague) where the real name belongs,
since a real name is harder to get correct and easier for humans to
spot irregularities in.

The OP's form looks like a broken From header and an intermediate SMTP
server choking on it and rewriting it.


-- 
Karsten Bräckelmann  -- open source. hacker. assassin.


Re: Sender needs help with false positive

2017-08-07 Thread Karsten Bräckelmann
On Mon, 2017-08-07 at 19:15 -0400, Alex wrote:
> > version=3.4.0
> 
> Version 3.4.0 is like ten years old. I also don't recall BAYES_999
> being available in that version, so one thing or the other is not
> correct.

Minor nitpick: 3.4.0 was released in Feb 2014, slightly less than 10
years ago. ;)  But that's the code only anyway; with sa-update, the
rules' version and age are kept up to date independently.

Similarly, the BAYES_999 test indeed is not part of the original 3.4.0
release. It has been published via sa-update though, and even older
3.3.x installations with sa-update have that rule today.

The check_bayes() eval rule always supported the 99.9% variant; it's
just a float number less than 1.0...
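For reference, the sub-ranges are just eval bounds; the published
definitions are roughly of this shape (from memory, not verbatim --
check the shipped rules for the exact lines):

  body BAYES_99   eval:check_bayes('0.99',  '1.00')
  body BAYES_999  eval:check_bayes('0.999', '1.00')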


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Results of Individual Tests on spamd "CHECK"

2017-08-07 Thread Karsten Bräckelmann
On Mon, 2017-08-07 at 14:17 -0500, Jerry Malcolm wrote:
> I tried SYMBOLS.  You are correct that it lists the tests, but not the 
> results:
> 
> BAYES_95,HTML_IMAGE_ONLY_32,HTML_MESSAGE,JAM_DO_STH_HERE,LOTS_OF_MONEY,MIME_HTML_ONLY,
>  [...]
> 
> But I saw this line in a forum discussion... So I'm sure there is some 
> way to generate it.
> 
>  >>> tests=[AWL=-1.103, BAYES_00=-2.599, 
> HTML_MESSAGE=0.001,URIBL_BLACK=1.955, URIBL_GREY=0.25]
> 
> Any ideas?

That particular one appears to be part of the Amavisd-new generated
headers. You can get the same rules with individual scores in stock SA
using the _TESTSSCORES(,)_ Template Tag with the add_header config
option. See M::SA::Conf docs [1].

For ad-hoc testing without adding this to your general SA / spamd
configuration, feed the sample message to the plain spamassassin script
with additional --cf configuration:

  spamassassin --cf="add_header all TestsScores tests=_TESTSSCORES(,)_" < message

Also see 10_default_prefs.cf for more informational detail in the stock
Status header.
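If you'd rather have this permanently than ad-hoc, the same template
tag can go into your site config instead of the --cf option; a minimal
sketch (the TestsScores header name is arbitrary):

  add_header all TestsScores tests=_TESTSSCORES(,)_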


> On 8/7/2017 1:13 PM, Daniel J. Luke wrote:
> > On Aug 7, 2017, at 2:00 PM, Jerry Malcolm  wrote:
> > > I'm invoking spamd using:
> > >
> > > CHECK SPAMC/1.2\r\n
> > > 

Not your best option for ad-hoc tests... ;)

> > > Can someone tell me what I need to add to the spamd call (and the
> > > syntax) in order to get the results of the individual tests
> > > returned as part of the status?

You will need SA configuration. The spamd protocol itself does not allow
such fine-grained configuration.


[1] http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Is this really the SpamAssassin list? (was Re: unsubscribe)

2014-10-28 Thread Karsten Bräckelmann
On Tue, 2014-10-28 at 11:19 -0700, jdebert wrote:
 On Tue, 28 Oct 2014 04:27:14 +0100
 Karsten Bräckelmann guent...@rudersport.de wrote:
  On Mon, 2014-10-27 at 19:44 -0700, jdebert wrote:

   Redirecting them makes people lazy. Better than annoying but they
   don't learn anything except to repeat their mistakes.
  
  Your assumption, the list moderators (aka owner, me being one of them)
  would simply and silently obey and dutifully do the un-subscription
  for them, is flawed. ;)
 
 This assumption is unwarranted. I did not say that.

You said that the unsubscribe-to-list posting user would not learn and
get lazy, when those posts get redirected to the owner rather than
hitting the list.

Not learning: False. As I said, moderators would respond with an
explanation and instructions. In particular, learning about his mistake
and how to properly unsubscribe (now and in the future) does make him
learn. Since we'd not just unsub him, the user will even have to prove
that he learned, by following the procedure and unsubscribing himself.

Getting lazy: People are lazy. But since there's absolutely nothing we
would simply do for them, there's no potential in the process to get
lazy over. They will have to read and understand how to do it. And they
will have to follow every step of the unsub procedure themselves.

So if my assumption was really that unwarranted, please explain what
else you meant by those two sentences.


 Did you read the rest of the message?

Yes. And quite frankly, catching unsub messages and bouncing them with
a note, as you mentioned, is almost identical to the proposal of
redirecting them to the owner to handle. The latter involves
moderators, with the advantage that we can and will offer additional
help if need be.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: procmail

2014-10-28 Thread Karsten Bräckelmann
On Tue, 2014-10-28 at 22:10 -0400, David F. Skoll wrote:
  frankly in times of LMTP and Sieve there is hardly a need to use 
  procmail - it is used because i know it and it just works - so why 
  should somebody step in and maintain it while nobody is forced to use
  it
 
 I use Email::Filter, not procmail, but tell me: Can LMTP and Sieve do
 the following?

Dammit, this is just too teasing... Sorry. ;)

procmail can do all of those. (Yeah, not your question, but still...)


 1) Cc: mail containing a specific header to a certain address, but only
 between 08:00-09:00 or 17:00-21:00.

Sure. Limiting to specific days or hours can be achieved without
external process by recipe conditions based on our own SMTP server's
Received header, which we can trust to be correct.

 2) Archive mail in a folder called Received-Archive/YYYY-MM.

Trivial. See man procmailex.
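For illustration, a minimal sketch of such a recipe (untested; assumes
an existing Received-Archive directory and mbox-style folders):

  YYMM=`date +%Y-%m`
  :0c:
  Received-Archive/$YYMM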

 3) Take mail to a specific address, shorten it by replacing things
 like four with 4, this with dis, etc. and send as much of the
 result as possible as a 140-character SMS message?  Oh, and only do
 this if the support calendar says that I am on the support pager that
 week.

Yep. Completely internal, given there's an email to SMS gateway
(flashback 15 years ago), calling an external process for SMS delivery
otherwise.

 4) Take the voicemail notifications produced by our Asterisk
 software and replace the giant .WAV attachment with a much
 smaller .MP3 equivalent.

Check. Calling an external process, but I doubt procmail and ffmpeg /
avconv is worse than Perl and the modules required for that audio
conversion.

Granted, in this case I'd need some rather skillful sed-fu in the pipe,
or a little help of an external Perl script using MIME-tools... ;)


 These are all real-world requirements that my filter fulfills.  And it
 does most of them without forking external processes.  (Item 3 actually 
 consults
 a calendar program to see who's on support, but the rest are all handled
 in-process.)

That said, and all joking apart:

Do you guys even remember when this got completely off topic?


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Is this really the SpamAssassin list? (was Re: unsubscribe)

2014-10-28 Thread Karsten Bräckelmann
On Tue, 2014-10-28 at 19:56 -0700, jdebert wrote:
 On Wed, 29 Oct 2014 00:33:04 +0100
 Karsten Bräckelmann guent...@rudersport.de wrote:

 Redirecting them makes people lazy. Better than annoying but
 they don't learn anything except to repeat their mistakes.

Your assumption, the list moderators (aka owner, me being one of
them) would simply and silently obey and dutifully do the
un-subscription for them, is flawed. ;)
   
   This assumption is unwarranted. I did not say that.
  
  You said that the unsubscribe-to-list posting user would not learn and
  get lazy, when those posts get redirected to the owner rather than
  hitting the list.
 
 Not exactly what I said. 

In the part of my previous post you snipped, I asked you to explain
what you did mean, if not what I discussed in detail.

This response is neither helpful nor constructive.


  Not learning: False. As I said, moderators would respond with
  explanation and instructions. In particular learning about his mistake
  and how to properly (and in future) unsubscribe, does make him learn.
  Since we'd not just unsub him, the user will even have to proof that
  he learned, by following procedures unsubscribing himself.
 
 False as evidenced by how the same people repeat the same thing on
 the same list and on other lists. Got it.

Show me an example of one subscriber repeating this mistake on this
list.

Show me an example of one subscriber repeating this mistake on this
list, after the proposed and discussed redirect to owner procedure is
in effect, which is meant to help with the issue.

You cannot possibly show the latter, since it is not yet in effect. So
there is no such evidence as you just claimed. Moreover, there is
absolutely no basis for your "evidence" claim that moderators directly
approaching those subscribers would not make them learn.

You'll have a really hard time showing the first, too.

Got it. (Not a native English speaker, what's that supposed to mean in
the context of your quote? Equivalent of a foot-stomp?)


  Getting lazy: People are lazy. But since there's absolutely nothing we
  would simply do for them, there's no potential in the process to get
  lazy over. They will have to read and understand how to do it. And
  they will have to follow every step of the unsub procedure themselves.
 
 The long form of saying we're agreed. And one of the reasons to
 automate the process.

Fun research project for you in strong favor of automation: How many
such posts did this list get in the last month? Statistically irrelevant
spike. Last 6 months? Last year? Two years?

I am a moderator of this list. I do know that handling those bad unsub
requests manually would be barely noticeable compared to the general
moderation load. Which isn't high either.


   Did you read the rest of the message?
  
  Yes. And quite frankly, catching unsub messages and bouncing them
  with a note as you mentioned is almost identical to the proposed
  redirect them to owner to handle it. With the latter involving
  moderators, having the advantage, that we can and will offer
  additional help if need be.
 
 Having the listserver catch the messages and handle them is
 almost identical to redirecting them to the owner for manual
 handling? I could see that if list owners still managed lists
 manually. But there's this nifty new software that manages lists
 automatically, freeing the list owners from all that drudge work.

I am very sorry, but it appears you have absolutely no clue what nursing
mailing lists today means.

Yes, all subscription (and un-subscription) is handled automatically. No
owner intervention, not even notices. Automation.

What we mostly do face is posts by non-subscribers. Mostly spam (just
ignored), but also a non-negligible amount of valid posts by
non-subscribers, or list-replies by subscribers using a wrong address.
The latter by far outweighs the former.

Unsub posts to the list? About the same as non-subscriber posts. Very
limited. Almost negligible, if some rare samples didn't trigger an
on-list shitstorm.


With the proposed process in place, I would have spent less time
managing and resolving the last 12 months' bad unsub requests than it
took me arguing with you about something that really does not concern
you.


 Your assumption is that I am telling you to do all this manually. You
 seemed to be ambivalent about this, not preferring to do it manually but
 seeming to prefer to do it manually. 

No. I know from experience that doing this manually is the easiest,
least time consuming solution.

And with no word did I imply you are telling me to do all this manually.
Quite the contrary.


 My assumption was expecting it to occur to everyone that it might be
 done automatically. I really did not expect to have to write to
 ISO-9002 standards on a user list. 

Exactly, *might*. Not the best solution in this case.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";

Re: Is this really the SpamAssassin list? (was Re: unsubscribe)

2014-10-27 Thread Karsten Bräckelmann
On Mon, 2014-10-27 at 17:00 -0400, Kevin A. McGrail wrote:
 On 10/27/2014 4:48 PM, Kevin A. McGrail wrote:
  On 10/27/2014 4:45 PM, David F. Skoll wrote:

   How hard would it be to have the mailing list quarantine a message 
   whose subject consists solely of the word unsubscribe ? 

  Heh... Apparently more needed than I hoped.  I'll have to ask the
  foundation if they can implement something to achieve this. 
 I've emailed infra with the following request:

Might help, but not worth much effort if infra cannot set it up easily.
While we've seen a few recently, usual and overall frequency is *much*
lower.


 header __KAM_SA_BLOCK_UNSUB1  Subject =~ /unsubscribe/i

Ouch. Would you please /^anchor$/ that beast? Unless you actually intend
this sub-thread to be swept off the list, too. ;)
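For the record, an anchored variant in the spirit of that suggestion (a
sketch; the whitespace tolerance is an addition here):

  header __KAM_SA_BLOCK_UNSUB1  Subject =~ /^\s*unsubscribe\s*$/i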


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Is this really the SpamAssassin list? (was Re: unsubscribe)

2014-10-27 Thread Karsten Bräckelmann
On Mon, 2014-10-27 at 19:44 -0700, jdebert wrote:
 On Mon, 27 Oct 2014 17:00:11 -0400
 Kevin A. McGrail kmcgr...@pccc.com wrote:

  I've emailed infra with the following request:
  
  ...we have been getting consistent unsubscribe messages posted to
  the entire users list which begs the question if there is a way to
  redirect those to the mailing list owner instead of just posting
  them?
 
 Redirecting them makes people lazy. Better than annoying but they
 don't learn anything except to repeat their mistakes.

Your assumption, the list moderators (aka owner, me being one of them)
would simply and silently obey and dutifully do the un-subscription for
them, is flawed. ;)

Just as with regular moderation, we'd respond with a template explaining
things, offering instructions -- and additional information on a
case-by-case basis.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: How is it that my X-Spam-Status is no, but my header gets marked with

2014-10-27 Thread Karsten Bräckelmann
On Mon, 2014-10-27 at 20:19 -0700, jdebert wrote:
 On Mon, 27 Oct 2014 15:45:03 -0700 (PDT)
 John Hardin jhar...@impsec.org wrote:

  The apparent culprit is a procmail rule that explicitly passes a
  message through the mail system again. The message is being scanned
  twice. If she can either deliver to a local mailbox rather than
  forwarding to an email address, or modify the procmail rule that
  calls SA to ignore messages that have already passed through the
  server once, I think the problem would go away.
 
 It looks as if it's the global procmailrc that always puts all mail,
 even mail between local users through spamassassin. However, I don't
 see how going through spamassassin again will modify the header. It's

It is not the second run that modifies the header. It's the first one.
With the second run classifying the mail as not-spam.

 already modified before the user procmail rule sees it. Something
 appears to be causing the first run of sa to modify the header
 unconditionally. If global procmail actually does the first run.

A system-wide procmail recipe feeds mail to SA.

Then there's a user procmail recipe that forwards mail with a Subject
matching /SPAM/ to another dedicated spam dump address with the same
domain, which ends up being delivered to that domain's MX. The same SMTP
server. Now re-processing the original mail (possibly wrapped in an
RFC822 attachment by SA), feeding it to SA due to the system-wide
procmail recipe...

On that second run, the message previously classified spam does not
exceed the threshold. Thus the X-Spam-Status of no, overriding the
previous Status header which is being ignored by SA anyway.

Result: Subject header rewritten by SA, despite final (delivery time)
spam status of no. This thread's Subject.
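The guard John suggested boils down to a recipe along these lines in
the system-wide procmailrc (a sketch, assuming spamc is the glue and
that the first run's headers survive at the top level):

  # only feed messages to SpamAssassin that have not been tagged here before
  :0fw
  * ! ^X-Spam-Checker-Version: SpamAssassin
  | /usr/bin/spamc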


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: .link TLD spammer haven?

2014-10-25 Thread Karsten Bräckelmann
On Fri, 2014-10-24 at 19:05 -0700, John Hardin wrote:
 On Fri, 24 Oct 2014, John Hardin wrote:
 
  On Sat, 25 Oct 2014, Martin Gregorie wrote:
 
Less obviously, it doesn't seem to matter whether you write the rule
as /\.link\b/  or /\.link$/ - both give identical matches. Both match
the following regexes just as you'd expect:
  http://www.linkedin.com/home/user/data.link
  http://www.example.link
  
but, less obviously, both also match this:
  http://www.example.link/path/to/file.txt
 
  {boggle}
 
...but
  grep -P '\.link\b' matches it, but
  grep -P '\.link$'  does not.
  
I presume that this means that the uri rule tests against two strings:
one being just the domain name and the other being the whole URI and
declares a rule hit if either string matches.

Basically correct. SA uri rules are not only tested against the raw URI
as extracted from the message, but also against some normalized
variations. Without going into details, off the top of my head this
includes un-escaping, adding the protocol prefix (if missing) and path
stripping.

  $ echo -e "\n apache.org/path/" |
    ./spamassassin -D -L --cf="uri URI_DOMAIN /^http:\/\/[^\/]+$/"

  dbg: rules: ran uri rule URI_DOMAIN ======> got hit: "http://apache.org"

Note the regex matches a domain-only, anything-but-slash [^/]+
substring anchored at the end of the string. Also note the input
message's URI lacking a protocol, but the rule hit showing the (default)
protocol added by SA in one of the variations.


  I don't think so, but I'm not positive.
 
  If you have a testing environment set up, try adding this and see what you 
  get in the log:
 
 uri __ALL_URI  /.*/
 
 oops. This too:
 
   tflags __ALL_URI  multiple
 
 Sorry for forgetting that bit, it's rather important. :)

That seemingly straight-forward approach does not work in this case. The
tflags multiple option does not make uri rules match multiple times on a
single URI extracted from the message. It still generates a single hit
per extracted URI only, not including multiple hits on its normalized
variations.

The tflags multiple option on a uri rule enables it to match multiple
times on different URIs extracted from the message.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: How is it that my X-Spam-Status is no, but my header gets marked with

2014-10-25 Thread Karsten Bräckelmann
On Sat, 2014-10-25 at 20:06 -0700, Cathryn Mataga wrote:
 
 Okay, here's another header. Shows X-Spam-Status as no.
 
 In local.cf I changed to this, just to be sure.
 
 rewrite_header Subject [SPAM][JUNGLEVISION SPAM CHECK]

 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on 
 ecuador.junglevision.com
 X-Spam-Level: *
 X-Spam-Status: No, score=1.5 required=3.5 tests=BAYES_50,HTML_MESSAGE, 
 MIME_HTML_ONLY,MIME_QP_LONG_LINE autolearn=disabled version=3.3.2

 Subject: [SPAM][JUNGLEVISION SPAM CHECK] Confirmation of Order Number 
 684588 * Please Do Not Reply To This Email *

Somehow, you are passing messages to SA twice.

First one classifies it spam and rewrites the Subject. Second run
doesn't. Added headers, content wrapping, or most likely re-transmission
from trusted networks makes the second run fail.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: URIBL_RHS_DOB high hits

2014-10-11 Thread Karsten Bräckelmann
On Sat, 2014-10-11 at 23:40 +0200, Reindl Harald wrote:
 it hits again and i doubt that sourceforge is a new domain
 whatever the reason is - for me enough to disable it forever

Jumping to conclusions, aren't you?


 Oct 11 23:34:43 mail-gw spamd[28079]: spamd: result: . 0 - 
 BAYES_50,CUST_DNSWL_7,CUST_DNSWL_9,DKIM_ADSP_ALL,HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_PASS,SPF_PASS,T_RP_MATCHES_RCVD,URIBL_RHS_DOB,USER_IN_MORE_SPAM_TO
  
 scantime=0.9,size=8902,user=sa-milt,uid=189,required_score=4.5,rhost=localhost,raddr=127.0.0.1,rport=39381,mid=7655276d-92b5-4dbd-8041-6db5c4fb8...@tieman.se,bayes=0.499983,autolearn=disabled
 Oct 11 23:34:43 mail-gw postfix/qmgr[28308]: 3jFfYt4WVTz1l: 
 from=netatalk-admins-boun...@lists.sourceforge.net, size=8829, nrcpt=1 
 (queue active)

$ host sourceforge.net.dob.sibl.support-intelligence.net
Host sourceforge.net.dob.sibl.support-intelligence.net not found: 3(NXDOMAIN)

$ host tieman.se.dob.sibl.support-intelligence.net
tieman.se.dob.sibl.support-intelligence.net has address 127.0.0.2

$ whois tieman.se | grep 2014
created:  2014-01-11
modified: 2014-09-20


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: URIBL_RHS_DOB high hits

2014-10-11 Thread Karsten Bräckelmann
On Sun, 2014-10-12 at 00:29 +0200, Reindl Harald wrote:
 Am 12.10.2014 um 00:23 schrieb Reindl Harald:
  Am 12.10.2014 um 00:18 schrieb Karsten Bräckelmann:
   On Sat, 2014-10-11 at 23:40 +0200, Reindl Harald wrote:

it hits again and i doubt that sourceforge is a new domain
whatever the reason is - for me enough to disable it forever
  
   Jumping to conclusions, aren't you?
 
  yes - the conclusion is that it had way too much FP's recently

Arguably, tieman.se should be sufficiently old to not be listed.

However, what I am much more annoyed about is your rambling, claiming
DOB would list sourceforge.net -- and by that, particularly with this
thread's topic, giving the impression of DOB again listing the world.
Which it doesn't.

Obviously, you did not check facts or investigate the issue at all.


 frankly it hitted even my own message you replied to
 see at bottom

Yes, so will this one. DOB does NOT operate on sender or From header.
See for yourself:

  echo -e "\n tieman.se" | ./spamassassin

So yes, it hit on your mail. But no, it does not list your domain.


  Oct 11 23:34:43 mail-gw spamd[28079]: spamd: result: . 0 -

FWIW, you can investigate and check any detail you want, because the
mail has been accepted by your SMTP server.

With a configuration of "add_header all Report _REPORT_", the listed
domain is even included in the report, without any need for manual
post-processing.

  *  0.3 URIBL_RHS_DOB Contains an URI of a new domain (Day Old Bread)
  *  [URIs: tieman.se]


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: URIBL_RHS_DOB high hits

2014-10-11 Thread Karsten Bräckelmann
On Sun, 2014-10-12 at 01:28 +0200, Reindl Harald wrote:
 Am 12.10.2014 um 01:09 schrieb Karsten Bräckelmann:

  it hits again and i doubt that sourceforge is a new domain

  However, what I am much more annoyed about is your rambling, claiming
  DOB would list sourceforge.net -- and by that, particularly with this
  thread's topic, giving the impression of DOB again listing the world.
  Which it doesn't.
 
 it seems to hit randomly which is even more worse because listing the 
 world is more obvious - i claim that it is not trustable currently, not 
 more and not less, may anybody make his own decision, i told mine and 
 there is nothing worng with that

You have exactly one false positive listing. That is not even close to
hit randomly.

Please stop the repeated, false accusations on this list.


  Obviously, you did not check facts or investigate the issue at all.
 
 don't get me wrong, there ist not much to investigate if it hits legit 
 mailing-list messages

Correct, there is not much to investigate. The *only* thing would be to
verify *which* domain hit the DOB listing, and whether it actually is a
bad or warranted listing. Besides, that one is absolutely crucial to
check before claiming a false positive.

A single thing to verify. You did not.

Besides, it is just a coincidence that another domain in your log paste
actually was listed when I checked. Any other domain from the body could
have been the culprit. And still potentially can, since you only posted
logs -- no SA headers, body, or list of URIs.


  With a configuration of add_header all Report _REPORT_, the listed
  domain even is included in the report, without any need for manual
  post-processing.
 
 *  0.3 URIBL_RHS_DOB Contains an URI of a new domain (Day Old Bread)
 *  [URIs: tieman.se]
 
 which is just not true - the domain is way older

Yes, that seems to be a DOB false positive listing (and the only one
known right now, see above). Get over it.

And BTW, that was meant as a helpful hint for you and anyone else
reading this thread, about getting crucial details while investigating
(or reporting) issues. No need to bark at me, and repeat yet again
that's the one bad listing you encountered. The above is how to do it
and what you get.


 and the SBL hit because 
 support-intelligence.net makes things not better
 
 URIBL_SBL Contains an URL's NS IP listed in the SBL blocklist * 
 [URIs: tieman.se.dob.sibl.support-intelligence.net]

That is a SpamHaus listing. Support Intelligence is not responsible for
it, but the victim.

This is entirely unrelated to URIBL_RHS_DOB and this thread's topic.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: URIBL_RHS_DOB high hits

2014-10-11 Thread Karsten Bräckelmann
On Sun, 2014-10-12 at 02:58 +0200, Reindl Harald wrote:
 Am 12.10.2014 um 02:20 schrieb Karsten Bräckelmann:

  You have exactly one false positive listing. That is not even close to
  hit randomly.
 
 well, i can't verify the other hits because don't have access to other 
 users email - the follwoing is another one and that *is* the definition 
 of randomly - in doubt such a list must not answer when there is not 
 verified data instead hit a FP
 
 URIBL_RHS_DOB Contains an URI of a new domain (Day Old Bread)
 [URIs: goo.gl]

Another false positive DOB listing. Not good. Thanks for taking some
time to actually provide detail.

As for your personal definition of randomness, please see what others
have to say about it. Multiple bad listings still is not random.

  http://en.wikipedia.org/wiki/Randomness


  Please stop the repeated, false accusations on this list.
 
 point out that it is not trustable currently is not a accusation and 

You claimed DOB listed sourceforge.net, which it didn't. You repeatedly
claimed their listing to be random, which it isn't. That is what I
referred to as false accusations.


 frankly http://support-intelligence.com/dob/ itself states The list is 
 currently in BETA and should be used accordingly. We still have some 
 kinks in it and occasionally domains older than five days, or other 
 important domains end up in the list

Yes. So what?

You are free to disable DOB on your server. You are free and in fact
welcome to report any issue with stock SA included DNSBLs, on-list or in
bugzilla, with founded evidence.

You are not free to claim $list responses to be random without proof.


  Obviously, you did not check facts or investigate the issue at all.
 
  don't get me wrong, there ist not much to investigate if it hits legit
  mailing-list messages
 
  Correct, there is not much to investigate. The *only* thing would be to
  verify *which* domain hit the DOB listing, and whether it actually is a
  bad or warranted listing. Besides, that one is absolutely crucial to
  check before claiming a false positive.
 
  A single thing to verify. You did not
 
 if it hits a regular mailing list thread it is problematic and as said

No. It depends on the content. See this list for a prime example.

 if there are no data for whatever reason the answer should be NXDOMAIN 
 and not 127.0.0.1 in doubt because FP does more harm than FN

False accusation, again. You just claimed $list would return anything
other than NXDOMAIN in case of not-being-listed.

  $ host not-registered-domain.com.dob.sibl.support-intelligence.net
  Host not-registered-domain.com.dob.sibl.support-intelligence.net not found: 
3(NXDOMAIN)

We're talking false positive listings. Not random responses, neither
positive listing if in doubt.

Again, stop unfounded false accusations on this list.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Score Ignored

2014-10-08 Thread Karsten Bräckelmann
On Wed, 2014-10-08 at 15:48 -0500, Robert A. Ober wrote:
  On Mon, 22 Sep 2014 15:11:44 -0500 Robert A. Ober wrote:

   *Yes,  my test messages and SPAM hit the rules but ignore the score.*

 What is the easiest way to know what score is applied per rule? Neither 
 the server log nor the header breaks it down.

Wait. If there's no Report, if you do not have the list of rules hit and
their respective scores, how do you tell that your custom rule's score
is being ignored by SA?


Besides the Report as mentioned by Axb already, you can also modify the
default Status header to include per-rule scores:

  add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTSSCORES(,)_ autolearn=_AUTOLEARN_ version=_VERSION_


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: rejected Null-Senders

2014-10-07 Thread Karsten Bräckelmann
On Tue, 2014-10-07 at 17:46 +0200, Reindl Harald wrote:
 can somebody comment in what context null-senders and
 so bounces and probably autorepsonders are blocked
 by DKIM_ADSP_NXDOMAIN,USER_IN_BLACKLIST

SA does not block. *sigh*

In this context, the DKIM_ADSP_NXDOMAIN hit is irrelevant, given its low
score. The USER_IN_BLACKLIST hit is what's pushing the score beyond your
SMTP reject threshold.


 DKIM_ADSP_NXDOMAIN,USER_IN_BLACKLIST
 from=<> to=<u...@example.com>
 3jC2XD1j8Cz1y: milter-reject: END-OF-MESSAGE

See whitelist_from documentation for the from / sender type mail headers
SA uses for black- and whitelisting.

The above seems to show SMTP stage MAIL FROM, which results in only one
of the possible headers and depends on your SMTP server (and milter in
your case).
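USER_IN_BLACKLIST fires on blacklist_from entries, which take the usual
glob form; a hypothetical example:

  blacklist_from  *@some-blocked-domain.example

Which of the message's from/sender headers such a pattern is matched
against is exactly what the whitelist_from documentation describes.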


 a customer sends out his yearly members-invitation nad i see some 
 bounces / autrorepsonders pass through and some are blocked with the 
 above tags, at least one from his own outgoing mainserver
 
 what i don't completly understand is the DKIM_ADSP_NXDOMAIN since in 
 case of NXDOMAIN the message trigger the response could not have been 
 delivered and how the USER_IN_BLACKLIST comes with a empty sender
 
 not that i am against block some amount of backscatters, i just want to 
 understand the conditions

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: spamd does not start

2014-10-07 Thread Karsten Bräckelmann
On Tue, 2014-10-07 at 18:55 +0300, Jari Fredrisson wrote:
 I built SA 3.4 using cpan to my old Debian Squeeze-lts.
 
 root@hurricane:~# time service spamassassin start
 Starting SpamAssassin Mail Filter Daemon: child process [4868] exited or
 timed out without signaling production of a PID file: exit 255 at
 /usr/local/bin/spamd line 2960.
 
 real    0m1.230s

 I read that line in spamd and it talks about two bugs. And a long
 timeout needed. But this dies at once, hardly a timeout?

It states the child process exited or timed out. Indeed, obviously not
a timeout, so the child process simply exited.

Anything in syslog left by the child?
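One way to surface that error, assuming the CPAN-built spamd from
above, is to start it by hand with debugging and logging to stderr
rather than syslog:

  /usr/local/bin/spamd -D -s stderr

A missing Perl module, a config parse error or a permission problem
should then show up right in the debug output.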


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: recent channel update woes

2014-10-07 Thread Karsten Bräckelmann
On Tue, 2014-10-07 at 18:49 -0400, Eric Cunningham wrote:
 Is there a way to configure URIBL_RHS_DOB conditionally such that if 
 there are issues with dob.sibl.support-intelligence.net like we're 
 seeing, that associated scoring remains neutral rather than increasing 
 (or decreasing)?

No. As-is, a correct DNSxL listing is indistinguishable from a false
positive listing.


One possible strategy to detect FP listings would be an additional DNSxL
query of a test-point or known-to-be not listed value. This comes at the
cost of increased load both for the DNSxL as well as SA instance, and
will lag behind due to TTL and DNS caching. The lower the lag, the lower
the caching, the higher the additional load.

By doing such tests not on a per-message basis but per spamd child, or
even having the parent process monitor for possible world-listed
situations, the additional overhead and load could be massively reduced.

Simply monitoring real results (without test queries) likely would not
work. It is entirely possible that really large chunks of the mail
stream continuously result in positive DNSxL listings. Prime candidates
would be PBL hitting botnet spew, or exclusively DNSWL trusted messages
during otherwise low traffic conditions. Distinguishing lots of
consecutive correct listings from false positives would be really hard
and prone to errors.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: recent channel update woes

2014-10-07 Thread Karsten Bräckelmann
On Wed, 2014-10-08 at 01:18 +0200, Reindl Harald wrote:
 Am 08.10.2014 um 00:49 schrieb Eric Cunningham:

  Is there a way to configure URIBL_RHS_DOB conditionally such that if
  there are issues with dob.sibl.support-intelligence.net like we're
  seeing, that associated scoring remains neutral rather than increasing
  (or decreasing)?
 
 not really - if you get the response from the DNS - well, you are done
 
 the only exception are dnslists which stop to answer if you excedd the 
 free limit but in that case they answer with a different response what 
 is caught by the rules

Exceeding a free usage limit is totally different from the recent DOB
listing-the-world issue.

Also, exceeding the limit is handled differently in lots of ways. It
ranges from specific limit-exceeded result codes, up to listing the
world, either at the hostile end or in extreme situations to finally get
the admin's attention. It also includes simply returning no results
other than NXDOMAIN, which is hard to distinguish from proper operation
in certain low-listing conditions.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: recent channel update woes

2014-10-07 Thread Karsten Bräckelmann
On Tue, 2014-10-07 at 16:37 -0700, Dave Warren wrote:
 If you're paranoid, you can monitor the DNSBLs that you use via script 
 (externally from SpamAssassin) and generate something that reports to 
 you when there's a possible issue. If you're really paranoid, you can 
 have it write a .cf that would 0 out the scores, but I assure you that 
 you'll spend more time building, testing and maintaining such a system 
 than it's worth in the long run, in my experience it's better to just 
 page an admin.
 
 I monitor positive and negative responses, for IP based DNS BLs, I use 
 the following by default:
 
 127.0.0.1 should not be listed.
 127.0.0.2 should be listed.

Depending on how the DNSBL implements such static test-points, they
might not be affected by the issue causing the false listings.
Similarly, domains likely to appear on exonerate lists (compare
uridnsbl_skip_domain e.g.) might also not be affected.

For paranoid monitoring, low-profile domains that definitely do not and
will not match the listing criteria might be better suited for the task.
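A minimal sketch of such an external monitor, with the DNSBL zone and
probe domain as placeholders, and any positive answer for a
must-not-be-listed name treated as an alert:

  #!/bin/sh
  ZONE=dob.sibl.support-intelligence.net
  PROBE=example.org   # low-profile domain that must never match the listing criteria
  if host "$PROBE.$ZONE" >/dev/null 2>&1; then
      echo "possible world-listing on $ZONE" | mail -s "DNSBL alert" postmaster
  fi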


 $MYIP should not be listed.
 
 Obviously these need to be tweaked and configured per-list, not all 
 lists list 127.0.0.2, and some lists use status codes, so should not be 
 listed and should be listed are really match/do-not-match some 
 condition
 
 In the case of DNSWL, $MYIP should be listed, if I get de-listed, I want 
 to know about that too.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Administrivia (was: Re: recent channel update woes)

2014-10-06 Thread Karsten Bräckelmann
On Mon, 2014-10-06 at 13:36 -0400, Kevin A. McGrail wrote:
 On 10/6/2014 1:23 PM, Kevin A. McGrail wrote:
  On 10/6/2014 1:11 PM, Jason Goldberg wrote:

   How to i get removed from this stupid list.
  
   I love begin spammed by a list about spam which i did not signup for.
 
  Email users-h...@spamassassin.apache.org and the system will mail you 
  instructions.
 
  If you did not sign up for the list, that is very troublesome and we 
  can ask infrastructure to research but I believe we have a 
  confirmation email requirement to get on the list. 

First of all: Jason's posts are stuck in moderation. The sender address
he uses is not the one he subscribed with.

Sidney and I (both list moderators) have been contacting Jason off-list
with detailed instructions how to find the subscribed address and
offering further help.


 Obviously we take this very seriously as anti-spammers because the 
 definition I follow for spam is it's about consent not content.  If you 
 didn't consent to receive these emails, we have a major issue.

The list server requires clear and active confirmation of the
subscription request by mail, validating both the address as well as
consent.


 I've confirmed we have a confirmation email process in place that 
 requires the subscribee to confirm the subscription request.  And I 
 believe this has been in place for many years.  So if you did not 
 subscribe to the list or confirm the subscription, you may need to check 
 if your email address credentials have been compromised as that's the 
 second most likely scenario for the cause beyond an administrator adding 
 you directly.
 
 Karsten, any thoughts other than if a list administrator added them 
 directly?   Have infrastructure check the records for when and how the 
 subscriber was added?  Open a ticket with Google?

He has not been added by a list administrator.

Without the subscribed address, there is absolutely nothing we can do. I
grepped the subscription list and transaction logs for parts of Jason's
name and company. The address in question is entirely different.


Just to give some answers. This issue should further be handled
off-list.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: SpamAssassin false positive bayes with attachments

2014-10-06 Thread Karsten Bräckelmann
On Mon, 2014-10-06 at 09:03 -0400, jdime abuse wrote:
 I have been seeing some issues with bayes detection from base64
 strings within attachments causing false positives.
 
 Example:
 Oct  6 09:02:14.374 [15869] dbg: bayes: token 'H4f' = 0.71186828264
 Oct  6 09:02:14.374 [15869] dbg: bayes: token 'wx2' = 0.68644662127
 Oct  6 09:02:14.374 [15869] dbg: bayes: token 'z4f' = 0.68502147581
 Oct  6 09:02:14.378 [15869] dbg: bayes: token '0vf' = 0.66604823748
 
 Is there a solution to prevent triggering bayes from the base64 data
 in an attachment? It was my impression that attachments should not
 trigger bayes data, but it seems that it is parsing it as text rather
 than an attachment.

Bayes tokens are basically taken from rendered, textual body parts (and
mail headers). Attachments are not tokenized.

Unless the message's MIME-structure is severely broken, these tokens
appear somewhere other than a base64 encoded attachment. Can you provide
a sample uploaded to a pastebin?
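While preparing that sample, the token debug shown above can be
reproduced on a single message with something like (sample.eml being a
placeholder):

  spamassassin -D bayes < sample.eml 2>&1 | grep "bayes: token"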


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: running own updateserver

2014-10-05 Thread Karsten Bräckelmann
On Wed, 2014-10-01 at 13:19 +0200, A. Schulze wrote:
 Hello,
 
 I had the idea to run my own updateserver for two purposes:
   1. distribute own rules
   2. override existing rules
 
 But somehow I fail on #2.
 
 
 SA rules normally reside in /var/.../spamassassin/$SA-VERSION/channelname/*.cf
 Also the are files /var/.../spamassassin/$SA-VERSION/channelname.cf  
 including the real files in channelname/
 
 Now I had some rules overriding existing SA rules in  
 /etc/mail/spamassassin/local.cf
 These rules I moved to my own channelname and now the defaults from  
 updates_spamassassin_org
 are active again.
 
 My guess: rules are included in lexical order from  

Correct.

 /var/.../spamassassin/$SA-VERSION/channelname.cf
 and my new channel spamassassin_example_org is *not after*  
 updates_spamassassin_org
 
 I proved my guess by renaming the channelfiles to z_spamassassin_example_org
 ( adjusted the .cf + include also )
 
 Immediately the intended override was active again.
 
 Is my guess right?

Yes.

 If so, any (other then renaming the channel) chance to modify the order?

No. The directory name and accompanying cf file are generated by
sa-update based on the channel name. There is no way for the channel to
enforce order.

Besides picking a channel name that lexicographically comes after the
to-be-overridden target channel, you're limited to local, post-sa-update
rename or symlink hacks with additional maintenance cost.
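As an illustration of the lexical ordering, with hypothetical channel
names (dots become underscores in the generated directory and .cf
names):

  # updates.spamassassin.org    ->  updates_spamassassin_org
  # z.spamassassin.example.org  ->  z_spamassassin_example_org  (sorts after, so its rules win)
  sa-update --channel updates.spamassassin.org --channel z.spamassassin.example.org

plus --gpgkey or --nogpg as appropriate for the custom channel.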


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: bad local parts (thisisjusttestletter)

2014-10-04 Thread Karsten Bräckelmann
On Sat, 2014-10-04 at 22:15 +0200, Reindl Harald wrote:
 i recently found thisisjusttestletter@random-domain as sender as well 
 as thisisjusttestletter@random-of-our-domains as RCPT in my logs and 
 remember that crap for many years now

Surely, SA would never see that message, since that's not an actual,
valid address at your domain. And you're not using catch-all, are you?

(Yes, that question is somewhere between rhetorical and sarcastic.)

 well, postfix access maps after switch away from commercial
 appliances - are there other well nown local-parts to add
 to this list?

What would you need a blacklist of spammy address local parts for? Do
not accept messages to SMTP RCPT addresses that don't exist. Do not use
catch-all. Problem solved...

Other than that, this is an OT postfix question.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: bad local parts (thisisjusttestletter)

2014-10-04 Thread Karsten Bräckelmann
On Sun, 2014-10-05 at 01:53 +0200, Reindl Harald wrote:
 Am 05.10.2014 um 01:41 schrieb Karsten Bräckelmann:
  On Sat, 2014-10-04 at 22:15 +0200, Reindl Harald wrote:

   i recently found thisisjusttestletter@random-domain as sender as well
   as thisisjusttestletter@random-of-our-domains as RCPT in my logs and
   remember that crap for many years now
  
  Surely, SA would never see that message, since that's not an actual,
  valid address at your domain. And you're not using catch-all, do you?
 
  (Yes, that question is somewhere between rhetoric and sarcastic.)
 
 but thisisjusttestletter@random-domain is a valid address in his 
 domain until you prove the opposite with sender-verification and it's 
 drawbacks

Correct. And it is unsafe to assume any given address local part could
not possibly be valid and used as sender address in ham.

If at all, such tests should be assigned a low-ish score, not used in
SMTP access map blacklisting. However, I seriously doubt it's actually
worthwhile to maintain such rules.


   well, postfix access maps after switch away from commercial
   appliances - are there other well nown local-parts to add
   to this list?
  
  What would you need a blacklist of spammy address local parts for? Do
  not accept messages to SMTP RCPT addresses that don't exist. Do not use
  catch-all. Problem solved...
 
 don't get me wrong but you missed the 'i recently found 
 thisisjusttestletter@random-domain' as sender at the start of my post

As sender, continued by "as well as [...] as RCPT" using the exact same
local part.

So you just found one such instance in your logs. And yes, I have seen
that very address local part, too, occasionally. Although only in SMTP
logs and AFAIR never ever in SMTP accepted spam, let alone FNs, because
just like your sample, they always sported a similarly invalid RCPT
address.

Did you ever see this in MAIL FROM with a *valid* RCPT TO address?

And did it end up scored low-ish? Below 15? Otherwise, it's just not
worth it.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: bad local parts (thisisjusttestletter)

2014-10-04 Thread Karsten Bräckelmann
On Sun, 2014-10-05 at 02:43 +0200, Reindl Harald wrote:
 Am 05.10.2014 um 02:27 schrieb Karsten Bräckelmann:
  On Sun, 2014-10-05 at 01:53 +0200, Reindl Harald wrote:
  Am 05.10.2014 um 01:41 schrieb Karsten Bräckelmann:
  On Sat, 2014-10-04 at 22:15 +0200, Reindl Harald wrote:

  i recently found thisisjusttestletter@random-domain as sender as well
  as thisisjusttestletter@random-of-our-domains as RCPT in my logs and
  remember that crap for many years now
 
  Surely, SA would never see that message, since that's not an actual,
  valid address at your domain. And you're not using catch-all, do you?
 
  (Yes, that question is somewhere between rhetoric and sarcastic.)
 
  but thisisjusttestletter@random-domain is a valid address in his
  domain until you prove the opposite with sender-verification and it's
  drawbacks
 
  Correct. And it is unsafe to assume any given address local part could
  not possibly be valid and used as sender address in ham.
 
 most - any excludes that one honestly

I would agree, gladly. If only I would not have these pictures in my
head of an admin creating that as a deliverability testing address. Same
ball park as a Subject of test. I almost can hear his accent...


  If at all, such tests should be assigned a low-ish score, not used in
  SMTP access map blacklisting. However, I seriously doubt it's actually
  worthwhile to maintain such rules.
 
 agreed - i only asked if there are known other local parts
 of that sort because i noticed that one at least 5 years
 ago as annoying

Annoying? That was before using SA and with using catch-all, right?

So it was annoying back then. Doesn't explain why you're chasing it
today. How many of them can you find in your logs? Even including its
variants (e.g. atall appended), I assume the total number to be really
low. And, frankly, exclusively existent in SMTP logs rejecting the
message.

Unless there still is catch-all in effect, that should have been axed
some 10 years ago.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Custom rule not hitting suddenly?

2014-09-08 Thread Karsten Bräckelmann
On Mon, 2014-09-08 at 11:35 -0600, Amir Caspi wrote:
 One of my spammy URI template rules is, for some reason, not hitting
 any more.  Spample here:
 
 http://pastebin.com/jy6WZhWW
 
 In my local.cf sandbox I have the following:
 
 uri __AC_STOPRANDDOM_URI1 
 /(?:stop|halt|quit|leave|leavehere|out|exit|disallow|discontinue|end)\.[a-z0-9-]{10,}\.(?:us|me|com|club|org|net)\b/
 
 This is part of my AC_SPAMMY_URI_PATTERNS meta rule, which hits just
 fine on other emails (including others of this particular format).
 
 Debug output shows this subrule didn't hit anything (that is, the rule
 isn't mentioned at all in the debug output), but regexpal.com says it
 should have hit just fine.

Works for me.

Pulled the sample from pastebin and fed to spamassassin -D with your
custom rule added as additional configuration. That rule hits.


 Could the problem be with the \b delimiter at the end?

No. The word-boundary \b does not only match between a word \w and
non-word \W char, but also at the beginning or end of the string, if the
adjacent char is a word char.

 I've noticed that sometimes can cause issues in failing to hit, but
 usually only when a URI ends with a slash...

That, too, would be unrelated to the \b word-boundary.

What bothers me is that "sometimes" qualification. Either it matches or
it doesn't. If it matches only sometimes, something yet unnoticed has a
severe impact.


Did you grep the -D debug output for the hostname? Also try grepping for
URIHOSTS (SA 3.4, without -L local only mode), which lists all hostnames
found in the message.
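Concretely, that boils down to something like (sample.eml standing in
for the spample, and without -L as noted):

  spamassassin -D < sample.eml 2>&1 | grep URIHOSTS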


 and this same rule hits other matching URIs in other spams.  However,
 this isn't the first time I've noticed a failure to match... so any
 idea why it's not hitting?  Per the regex rules, it SHOULD be hitting
 fine unless it's the \b...
 
 Any ideas?

The URI is at the very end of a line with a CRLF delimiter following and
the next line beginning with a word character. If you inject a space
after the URI, does that make the rule match? (That should not be the
issue, just trying to rule out conversion problems.)

Also I noticed the headers are CRLF delimited, too. How did you get that
sample? Any chance it has been modified or re-formatted by a text editor
and does not equal the raw, original message?

Does the pastebin uploaded file still not trigger the rule for you?





Re: Custom rule not hitting suddenly?

2014-09-08 Thread Karsten Bräckelmann
On Mon, 2014-09-08 at 18:08 -0600, Amir Caspi wrote:
 On Sep 8, 2014, at 4:09 PM, Karsten Bräckelmann guent...@rudersport.de 
 wrote:
 
  Pulled the sample from pastebin and fed to spamassassin -D with your
  custom rule added as additional configuration. That rule hits.
 
 It does not hit on mine, and I think I've figured out why.  I'm using
 SA 3.3.2 with perl 5.8.8 on CentOS 5.10.  Yes, I know I should be
 using 3.4, but I haven't yet had a chance to try the RPM that a couple
 of people have built.  Nonetheless, with SA 3.3.2, it appears that the
 URI engine doesn't like the .club TLD.  See below.

Good one. Yes, it's the TLD.


 Sep  8 20:02:58.897 [9267] dbg: rules: ran uri rule AC_ALL_URI == got 
 hit: negative match
 
 So, for some reason, the URI engine is not picking out these .club
 URIs, it's getting negative match.  Is it because the engine in
 3.3.2 doesn't like that TLD?  To test this, I manually changed the TLD
 of the second spam URI (out.blah) to .us or .org, and then the engine
 picked it out just fine:

At the time of the 3.3.2 release, the .club TLD simply didn't exist. It
has been accepted by IANA just recently. Of course I was conveniently
using a trunk checkout for testing and kind of shrugged off that TLD in
question.

FWIW, this is not actually a 3.3.x issue. It's the same with 3.4.0. Yes,
that is a *recent* TLD addition... *sigh*


 Sep  8 20:03:43.151 [9197] dbg: rules: ran uri rule AC_ALL_URI == got 
 hit: http://out.dosearchcarsonsale.us;
 Sep  8 20:04:35.578 [9227] dbg: rules: ran uri rule AC_ALL_URI == got 
 hit: http://out.dosearchcarsonsale.org;
 
 So, it seems to me that the URI engine is barfing on the TLD, and
 that's the problem...

 Is there a patch I can apply that would fix this, until I can upgrade to 3.4?

SVN revision 1615088. The "text changed" link shows the diff and
has a link to the plain patch.

  http://svn.apache.org/viewvc?view=revision&revision=1615088

Dunno if the patch applies cleanly to 3.3.2, though.

You also can change M::SA::Util::RegistrarBoundaries manually. As per
the svn diff above, two blobs are involved:  (a) the VALID_TLDS hash
foreach() definition and  (b) the VALID_TLDS_RE.

So you could get those out of trunk and edit RegistrarBoundaries.pm
locally. It also should be possible to simply replace that Perl module
with the current trunk version.

And last but not least, generation of both these TLD blobs is documented
in the code right before their definition. You can always generate them
fresh.
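
Untested sketch of the replace-the-module route (the svn URL assumes the
usual ASF repository layout, and perldoc needs to be able to locate the
installed module):

  $ RB=$(perldoc -l Mail::SpamAssassin::Util::RegistrarBoundaries)
  $ cp "$RB" "$RB.bak"
  $ svn export --force \
      https://svn.apache.org/repos/asf/spamassassin/trunk/lib/Mail/SpamAssassin/Util/RegistrarBoundaries.pm "$RB"
  # restart spamd (or whatever glue loads SA) afterwards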





Valid TLDs (was: Re: Custom rule not hitting suddenly?)

2014-09-08 Thread Karsten Bräckelmann
Some discussion of the underlying issue.

On Tue, 2014-09-09 at 02:59 +0200, Karsten Bräckelmann wrote:
 At the time of the 3.3.2 release, the .club TLD simply didn't exist. It
 has been accepted by IANA just recently. Of course I was conveniently
 using a trunk checkout for testing and kind of shrugged off that TLD in
 question.
 
 FWIW, this is not actually a 3.3.x issue. It's the same with 3.4.0. Yes,
 that is a *recent* TLD addition... *sigh*

Unlike the util_rb_[23]tld options, the set of valid TLDs is actually
hard-coded. It would not be a problem to make that an option, too.
Which, on the plus side, would make it possible to propagate new TLDs
via sa-update. Not only 3.3.x would benefit from that, but also 3.4.0
instances. Plus, it would be generally faster anyway.

There is one down side: A new dependency on Regexp::List [1]. The RE
pre-compile one-time upstart penalty should be negligible.

The question is: Is it worth it?  WILL it be worth it?

This incidence is part of the initial round of IANA accepting generic
TLDs. There's hundreds in this wave, and some are abused early. This is
moonshine registration, nothing like new TLDs being accepted in the
coming years.

Or is it? Will new generic TLDs in the future be abused like that, too?
How frequently will that happen? Is it worth being able to react to it
quickly? How long will URIBLs take to list them? How long will it take
for the average MUA to even linki-fy them?

Opinions? Discussion in here, or should I move this to dev?

I guess I'd be happy to introduce to you... util_rb_tld.
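
As a strawman, such an option could look just like the existing
util_rb_2tld lines do (the TLD values here are purely illustrative):

  util_rb_tld  club coffee email link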


[1] Well, or a really, really f*cking ugly option that takes a
pre-optimized qr// blob containing the VALID_TLDS_RE.




Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

2014-09-08 Thread Karsten Bräckelmann
On Mon, 2014-09-08 at 22:15 -0400, Daniel Staal wrote:
 --As of September 9, 2014 3:45:33 AM +0200, Karsten Bräckelmann is alleged 
 to have said:
 
  This incidence is part of the initial round of IANA accepting generic
  TLDs. There's hundreds in this wave, and some are abused early. This is
  moonshine registration, nothing like new TLDs being accepted in the
  coming years.
 
  Or is it? Will new generic TLDs in the future be abused like that, too?
  How frequently will that happen? Is it worth being able to react to it
  quickly? How long will URIBLs take to list them? How long will it take
  for the average MUA to even linki-fy them?
 
  Opinions? Discussion in here, or should I move this to dev?
 
 --As for the rest, it is mine.
 
 New TLDs will always be abused...

And old ones. TK, re-naming the web. Yes, sometimes it is valid to add
a point or two for the mere occurrence of a TLD in a URI.

For how long? Whoever applied for new generic $tld put about 180 grand
up the shelve. How much is it worth them to prevent spammers from
tasting domains and actually turn their investment into serious
customers paying bucks?


 Anyway, personal opinion: Spamassassin is currently structured to have code 
 and rules as separate things.  Putting this in the code blurs that - it's a 
 rule.  Unless there is a major performance penalty, I would move it to be 
 with the rest of the rules.  It should make maintenance easier and clearer.

It is and would not be a rule as you stated, but configuration.

Apart from that nitpick, I understand you would be in favor of a Valid
TLD option, rather than hard-coded. Noted.





Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

2014-09-08 Thread Karsten Bräckelmann
On Mon, 2014-09-08 at 22:37 -0400, listsb-spamassas...@bitrate.net wrote:
 On Sep 8, 2014, at 21.45, Karsten Bräckelmann guent...@rudersport.de wrote:
 
  Some discussion of the underlying issue.
  
  On Tue, 2014-09-09 at 02:59 +0200, Karsten Bräckelmann wrote:
  At the time of the 3.3.2 release, the .club TLD simply didn't exist. It
  has been accepted by IANA just recently. Of course I was conveniently
  using a trunk checkout for testing and kind of shrugged off that TLD in
  question.
  
  FWIW, this is not actually a 3.3.x issue. It's the same with 3.4.0. Yes,
  that is a *recent* TLD addition... *sigh*
  
  Unlike the util_rb_[23]tld options, the set of valid TLDs is actually
  hard-coded. It would not be a problem to make that an option, too.
  Which, on the plus side, would make it possible to propagate new TLDs
  via sa-update. Not only 3.3.x would benefit from that, but also 3.4.0
  instances. Plus, it would be generally faster anyway.
  
  There is one down side: A new dependency on Regexp::List [1]. The RE
  pre-compile one-time upstart penalty should be negligible.
  
  The question is: Is it worth it?  WILL it be worth it?
 
 pardon my possible technical ignorance here - could this potentially be
 a network test, rather than a list propagated by sa-update?  e.g.
 query dns for existence of delegation?

This cannot be queried for. Because the Valid TLDs (code|option) is what
is used to identify URIs in the first place, even from plain text links
any normal MUA would linki-fy.

Apart from that, the list of generic TLDs is not going to change *that*
frequent, that a few days between IANA acceptance, SA incorporating it,
and first occurrence in mail as sa-update takes would make a difference.

And as I hinted at before, (new) generic TLD owners have a vital
interest in their TLD not be mostly abused. If it is, it's not worth the
investment.





Re: Valid TLDs (was: Re: Custom rule not hitting suddenly?)

2014-09-08 Thread Karsten Bräckelmann
On Mon, 2014-09-08 at 21:45 -0500, Dave Pooser wrote:
 On 9/8/14 8:45 PM, Karsten Bräckelmann guent...@rudersport.de wrote:
 
 There is one down side: A new dependency on Regexp::List [1]. The RE
 pre-compile one-time upstart penalty should be negligible.
 
 [1] Well, or a really, really f*cking ugly option that takes a
 pre-optimzed qr// blob containing the VALID_TLDS_RE.
 
 I may be biased as I've been dealing with a different CPAN dependency
 flustercluck recently (love maintainers who can't be bothered to update
 the version info so CPAN doesn't realize there's an update and I have to
 manually un/re install), but I'm a vote for the hideously ugly
 preoptimized blob over adding a new dependency.
 
 That said, I'd rather have the new dependency than keep the configuration
 embedded in the rules.
  ^
Code, not rules. Which basically is the issue here...

 So, in order of preference:
 1) Pre-optimized blob
 2) Regexp::List dependency
 3) Current method

Got ya. Both (1) and (2) would require code changes, so it's 3.4.1+ only
anyway.

Thanks.





Re: shouldn't spamc -L spam always create BAYES_99?

2014-09-06 Thread Karsten Bräckelmann
On Sun, 2014-09-07 at 09:09 +1200, Jason Haar wrote:
 We've got a problem with a tonne of spam getting BAYES_50 or even
 BAYES_00. We're re-training SA using spamc -L spam but it doesn't seem
 to do as much as we'd like. Sometimes it doesn't change the BAYES_
 score, and other times it might go from BAYES_50 to BAYES_80
 
 I think bayes is working (there's also a tonne of mail getting BAYES_99)
 but I'm guessing there's some learning logic I'm not aware of to
 explain why me telling SA this is spam doesn't seem to be entirely
 listened to?

The Bayesian classifier operates on tokens, not messages. So while
training a message as spam is like "this is spam" as you put it,
according to Bayes it's "these tokens appear in spam".

For each token (think of it as words), the number of ham and spam they
appeared in and have been learned from are counted. The higher that
ratio is, the higher the probability of a message to be the same
classification for any given token found in later mail.


 So my question is: shouldn't -L spam/-L ham always make SA re-train
 the bayes more explicitly? Or is that really not possible with a single
 email message? (ie it's a statistics thing). Just trying to understand
 the backend :-)

It's statistics. Learning (increasing the number of ham or spam a token
has been seen in) has less effect for tokens seen about equally frequent
in both ham and spam, than if there already is a bias. Similarly, tokens
with high counts need more training to change overall probability, than
tokens less common in mail. IOW, words like "and" will never be a strong
spammyness indicator.


For more details on that entire topic of Bayes and training, I suggest
the sa-learn man page / documentation. For a closer look at the tokens
used for classification see the hammy/spammytokens Template Tags in the
M::SA::Conf docs. Both available here:

  http://spamassassin.apache.org/doc/

For ad-hoc debugging after training see the spamassassin --cf option to
add_header the token details without a need to actually add them to
every mail.
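
For example, something along these lines (the header name is arbitrary,
the file name a placeholder):

  $ spamassassin --cf='add_header all MyTokens _HAMMYTOKENS(5)_ / _SPAMMYTOKENS(5)_' \
      < message.eml | grep -i x-spam-mytokens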





Re: Bayes autolearn questions

2014-09-06 Thread Karsten Bräckelmann
Please use plain-text rather than HTML. In particular with that really
bad indentation format of quoting.


On Sat, 2014-09-06 at 17:22 -0400, Alex wrote:
 On Thu, Sep 4, 2014 at 1:44 PM, Karsten Bräckelmann wrote:
  On Wed, 2014-09-03 at 23:50 -0400, Alex wrote:
 
 I looked in the quarantined message, and according to the _TOKEN_
 header I've added:

 X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.

 Isn't that sufficient for auto-learning this message as spam?
  
  That's clearly referring to the _TOKEN_ data in the custom header, is it
  not?
 
 Yes. Burning the candle at both ends. Really overworked.

Sorry to hear. Nonetheless, did you take the time to really understand
my explanations? It seems you sometimes didn't in the past, and I am not
happy to waste my time on other people's problems if they aren't
following thoroughly.


That has absolutely nothing to do with auto-learning. Where did you get
the impression it might?
  
   If the conditions for autolearning had been met, I understood that it
   would be those new tokens that would be learned.
 
  Learning is not limited to new tokens. All tokens are learned,
  regardless their current (h|sp)ammyness.
 
  Still, the number of (new) tokens is not a condition for auto-learning.
  That header shows some more or less nice information, but in this
  context absolutely irrelevant information.
 
 I understood new to mean the tokens that have not been seen before, and
 would be learned if the other conditions were met.

Well, yes. So what?

Did you understand that the number of previously not seen tokens has
absolutely nothing to do with auto-learning? Did you understand that all
tokens are learned, regardless whether they have been seen before?

This whole part is entirely unrelated to auto-learning and your original
question.


  Auto-learning in a nutshell: Take all tests hit. Drop some of them with
  certain tflags, like the BAYES_xx rules. For the remaining rules, look
  up their scores in the non-Bayes scoreset 0 or 1. Sum up those scores to
  a total, and compare with the auto-learn threshold values. For spam,
  also check there are at least 3 points each by header and body rules.
  Finally, if all that matches, learn.
 
 Is it important to understand how those three points are achieved or
 calculated?

In most cases, no, I guess. Though that is really just a distinction
usually easy to do based on the rule's type: header vs body-ish rule
definitions.

If the re-calculated total score in scoreset 0 or 1 exceeds the
auto-learn threshold but the message still is not learned -- then it is
important. Unless you trust the auto-learn discriminator to not cheat on
you.


   Okay, of course I understood the difference between points and tokens.
   Since the points were over the specified threshold, I thought those
   new tokens would have been added.
 
  As I have mentioned before in this thread: It is NOT the message's
  reported total score that must exceed the threshold. The auto-learning
  discriminator uses an internally calculated score using the respective
  non-Bayes scoreset.
 
 Very helpful, thanks. Is there a way to see more about how it makes that
 decision on a particular message?

  spamassassin -D learn

Unsurprisingly, the -D debug option shows information on that decision.
In this case limiting debug output to the 'learn' area comes in handy,
eliminating the noise.

The output includes the important details like auto-learn decision with
human readable explanation, score computed for autolearn as well as head
and body points.





Re: Large commented out body HTML causing SA to timeout/give up/allow spam

2014-09-05 Thread Karsten Bräckelmann
On Fri, 2014-09-05 at 11:55 -0400, Justin Edmands wrote:
 We are seeing a few emails that are about a 1MB and [...]

 dbg: timing: total 46640 ms

 BUT, because the live test likely took 46 seconds, I think SA is
 giving up or something similar. The actual email run through the live
 SA instance shows no score at all.

If SA timed out, this would be reflected in your logs. Your guessing
suggests you did not check logs.

How are you passing messages to SA? Using spamc/d? With spamc the size
limit of messages it will process is 500 kByte by default. Other methods
and glue are likely to have a size limit, too.

Odds are, that message simply has not been passed to SA.
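
If the spamc limit is indeed the culprit and you really want messages
that big scanned, the limit can be raised, e.g. (size in bytes, value
picked arbitrarily):

  $ spamc -s 2097152 < message.eml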





Re: Bayes autolearn questions

2014-09-04 Thread Karsten Bräckelmann
On Wed, 2014-09-03 at 23:50 -0400, Alex wrote:

   I looked in the quarantined message, and according to the _TOKEN_
   header I've added:
   
   X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
   
   Isn't that sufficient for auto-learning this message as spam?

That's clearly referring to the _TOKEN_ data in the custom header, is it
not?

  That has absolutely nothing to do with auto-learning. Where did you get
  the impression it might?
 
 If the conditions for autolearning had been met, I understood that it
 would be those new tokens that would be learned.

Learning is not limited to new tokens. All tokens are learned,
regardless their current (h|sp)ammyness.

Still, the number of (new) tokens is not a condition for auto-learning.
That header shows some more or less nice information, but in this
context absolutely irrelevant information.


Auto-learning in a nutshell: Take all tests hit. Drop some of them with
certain tflags, like the BAYES_xx rules. For the remaining rules, look
up their scores in the non-Bayes scoreset 0 or 1. Sum up those scores to
a total, and compare with the auto-learn threshold values. For spam,
also check there are at least 3 points each by header and body rules.
Finally, if all that matches, learn.
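
The thresholds that internally re-calculated total is compared against
are the AutoLearnThreshold plugin options (values shown are the usual
defaults):

  bayes_auto_learn                      1
  bayes_auto_learn_threshold_nonspam    0.1
  bayes_auto_learn_threshold_spam       12.0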


 Okay, of course I understood the difference between points and tokens.
 Since the points were over the specified threshold, I thought those
 new tokens would have been added.

As I have mentioned before in this thread: It is NOT the message's
reported total score that must exceed the threshold. The auto-learning
discriminator uses an internally calculated score using the respective
non-Bayes scoreset.





Re: A rule for Phil

2014-09-04 Thread Karsten Bräckelmann
On Thu, 2014-09-04 at 13:54 -0600, Philip Prindeville wrote:
 On Sep 3, 2014, at 7:36 PM, Karsten Bräckelmann guent...@rudersport.de 
 wrote:

  header __KAM_PHIL1  To =~ /phil\@example\.com/i
  header __KAM_PHIL2  Subject =~ /(?:CV|Curriculum)/i
  
  Bonus points for using non-matching grouping. But major deduction of
  points for that entirely un-anchored case insensitive 'cv' substring
  match.
 
 I’d anchor both matches,

Generally correct, of course. For anchoring the To header regex, I
suggest using the To:addr variant I used in my rules. That way the
address easily can be anchored at the beginning /^ and end $/ of the
whole string, which equals the address. Without the :addr option, proper
anchoring is a real mess.
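
With the :addr variant, that boils down to roughly:

  header __PHIL_TO  To:addr =~ /^phil\@example\.com$/i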

 or else amp...@example.community.org will fire.

Granted, the To header is cosmetic and does not necessarily hold the
actual recipient address. However, since example.com is the OPs domain
(so to speak), it is unlikely he'll receive mail with addresses like
that. ;)





Re: correct AWL on training

2014-09-04 Thread Karsten Bräckelmann
On Thu, 2014-09-04 at 09:11 -0600, Jesse Norell wrote:
 On Thu, 2014-09-04 at 13:04 +0200, Matus UHLAR - fantomas wrote:
  On 03.09.14 15:13, Jesse Norell wrote:

 Both today and in the past I've looked at some FP's that scored very
   high on AWL.  At least today I dug up the old messages that caused AWL
   to get out of line, and trained them as ham.  AWL's scores still show
   the high scores on those (in this case I manually corrected AWL).  It
   sure seems like manual training should at minimum remove the incorrect
   score from AWL, if not actually make an adjustment in the opposite
   direction.

I can see how one could wish for this.

However, keep in mind those are entirely unrelated sub-systems. The AWL
really only is a rather simple historic score-averager.

In this context it is also important to note, that sa-learn is Bayes
only. Any other type of reporting is spamc or spamassassin, including
AWL manipulation. The spamassassin executable notably is the only one
that actually can handle both.

The AWL manipulating options are rather limited, offering addition of a
high scoring positive or negative entry, or plain removal of an address.
In particular unlike Bayes, AWL doesn't work on a per-message basis.
Forgetting a single message's history entry is not supported.
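
In practice that means invocations like these (the address is a
placeholder):

  $ spamassassin --add-addr-to-whitelist=sender@example.com
  $ spamassassin --add-addr-to-blacklist=sender@example.com
  $ spamassassin --remove-addr-from-whitelist=sender@example.com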


  spamassassin has options for manipulating adress list:
  --add-to-whitelist --add-to-blacklist --remove-from-whitelist
  --add-addr-to-whitelist --add-addr-to-blacklist --remove-addr-from-whitelist
  
  and you can clean up AWL by using sa-awl.
 
   I can as an admin, but pop/imap users can't.  They can access the
 spam/ham training, it just doesn't correct the AWL data any.  In this

So you implemented a feedback / training mechanism for Bayes for your
POP or IMAP users. SA doesn't provide it.

 case I'm looking at, a few messages came in first that got AWL way off,
 and now training it as ham (which is hard enough to get users to do)
 doesn't help the situation.  (Some of our systems allow the user access
 to whitelist, but unfortunately this one doesn't - they can't fix it.)

Bayes training will have an effect of ~5.5 at max, which is the extreme
between BAYES_00 and BAYES_999. Real life effect of training is commonly
about half of that max. This is likely not to suffice for way-off AWL
scores.
Besides you're trying to correct AWL by Bayes training.


The question is: Why was the AWL score way off in the first place?

In your FP case, why have (more than one?) messages from that sender
address, originating from a given net-block been classified spam before?
Even worse, given AWL now was way off and pulled the score above
threshold, the previous messages recorded in AWL are not just spam, but
with a high score. Again, why?


 Ie. after training, AWL had score of ~47 from 7 messages.  Seems like
   those FP scores should be subtracted, and even another -5 per message
   trained wouldn't hurt.  Likewise, FN should adjust AWL upwards on manual
   training, no?
  
  I am not sure how should the manual training be done when talking about AWL.
  The only way I think is to remove the address from AWL.
 
   Just adjust the score would be another option.  AWL, you got it
 wrong, lets take the score the other direction.  (or at least undue the
 mistake/damage it just did)  You could have a config option for how much
 adjustment to make in the other direction (maybe 3 to 5ish?).




Re: correct AWL on training

2014-09-04 Thread Karsten Bräckelmann
On Fri, 2014-09-05 at 01:05 +0200, Karsten Bräckelmann wrote:
 The AWL manipulating options are rather limited, offering addition of a
 high scoring positive or negative entry, or plain removal of an address.
 In particular unlike Bayes, AWL doesn't work on a per-message basis.
 Forgetting a single message's history entry is not supported.

In related news: The AWL plugin was enabled by default in 3.1 and 3.2,
disabled by default again since 3.3.

TxRep is a proposed replacement (see bugzilla). It might be worth
evaluating whether it better addresses the features you'd benefit from
in this case, including forgetting or correcting per-message entries.
Since it still is under development, even feature requests or discussing
these issues for TxRep might be worth it.





Re: A rule for Phil

2014-09-03 Thread Karsten Bräckelmann
On Wed, 2014-09-03 at 12:30 +0200, Luciano Rinetti wrote:
 Thank You for the answer Karsten,
 you have right, Phil doesn't exists, (as example.com) but i hide the
 real address for obvious reasons, and it is a role email that i want
 will receive only mail with subject CV or Curriculum and all the
 general mail will be treated and scored as spam.
 My intention are not top secret, i will be glad even only if you
 address me to the SA conf docs or the rule-writing wiki.

Let me google that for you. The first result should be the SA wiki
WritingRules page as a starter.

  http://lmgtfy.com/?q=spamassassin+rule+writing


 Il 03/09/2014 05:21, Karsten Bräckelmann ha scritto:
  On Mon, 2014-09-01 at 07:36 +0200, Luciano Rinetti wrote:

   I need a rule that, when a message is sento to p...@example.com
   and the Subject contains CV or Curriculum, scores the message with -9
   and a rule that, when a message is sent to to p...@example.com
   and the Subject doesn't contains CV or Curriculum, scores the message 
   with 7

  The specified criteria are trivial, and can be easily translated into
  rules. Reading the SA conf docs and maybe some of the rule-writing wiki
  docs should enable the reader to do exactly that. (Hint: meta rules)

Oh well, here goes. Untested.

header   __PHIL_TO    To:addr =~ /phil\@example.com/i
header   __PHIL_SUBJ  Subject =~ /\b(cv|curriculum)\b/i

meta     PHIL_CURRICULUM  __PHIL_TO && __PHIL_SUBJ
describe PHIL_CURRICULUM  CV for Phil
score    PHIL_CURRICULUM  -2

meta     PHIL_NOT_CURRICULUM  __PHIL_TO && !__PHIL_SUBJ
describe PHIL_NOT_CURRICULUM  Not a CV for Phil
score    PHIL_NOT_CURRICULUM  1

Do note though, that this approach is NOT fool-proof. Messages
containing a CV still can end up classified spam for various reasons.





Re: A rule for Phil

2014-09-03 Thread Karsten Bräckelmann
On Wed, 2014-09-03 at 17:18 -0400, Kevin A. McGrail wrote:
 On 9/3/2014 5:14 PM, Karsten Bräckelmann wrote:
The specified criteria are trivial, and can be easily translated into
rules. [...]

  header   __PHIL_TO    To:addr =~ /phil\@example.com/i
  header   __PHIL_SUBJ  Subject =~ /\b(cv|curriculum)\b/i
 
  meta     PHIL_CURRICULUM  __PHIL_TO && __PHIL_SUBJ
  describe PHIL_CURRICULUM  CV for Phil
  score    PHIL_CURRICULUM  -2
 
  meta     PHIL_NOT_CURRICULUM  __PHIL_TO && !__PHIL_SUBJ
  describe PHIL_NOT_CURRICULUM  Not a CV for Phil
  score    PHIL_NOT_CURRICULUM  1

 It appears I did not email the list my response but should provide an 
 interesting exercise if only to see how similar our approach was:

Which isn't much of a surprise. It's practically the very translation of
the stated requirements into simple logic and regex header rules. ;)


 header __KAM_PHIL1  To =~ /phil\@example\.com/i
 header __KAM_PHIL2  Subject =~ /(?:CV|Curriculum)/i

Bonus points for using non-matching grouping. But major deduction of
points for that entirely un-anchored case insensitive 'cv' substring
match.

(As a matter of principle, since that's a seriously short substring
match. Granted, that char combination is pretty rare in dict/words.)





Re: Bayes autolearn questions

2014-09-02 Thread Karsten Bräckelmann
On Tue, 2014-09-02 at 21:11 -0400, Alex wrote:
 I have a spamassassin-3.4 system with the following bayes config:
 
 required_hits 5.0
 rbl_timeout 8
 use_bayes 1
 bayes_auto_learn 1
 bayes_auto_learn_on_error 1
 bayes_auto_learn_threshold_spam 9.0
 bayes_expiry_max_db_size 950
 bayes_auto_expire 0
 
 However, spam with scores greater than 9.0 aren't being autolearned:

http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html


 Sep  2 21:01:51 mail01 amavis[25938]: (25938-10)
 header_edits_for_quar: bmu011...@bmu-011.hichina.com -
 bestd...@example.com, Yes, score=16.519 tag=-200 tag2=5 kill=5
 tests=[BAYES_50=0.8, KAM_LAZY_DOMAIN_SECURITY=1, KAM_LINKBAIT=5,
 LOC_DOT_SUBJ=0.1, LOC_SHORT=3.1, RCVD_IN_BL_SPAMCOP_NET=1.347,
 RCVD_IN_BRBL_LASTEXT=1.449, RCVD_IN_PSBL=2.3,
 RCVD_IN_UCEPROTECT1=0.01, RCVD_IN_UCEPROTECT2=0.01, RDNS_NONE=0.793,
 RELAYCOUNTRY_CN=0.1, RELAYCOUNTRY_HIGH=0.5, SAGREY=0.01] autolearn=no
 autolearn_force=no
 
 I've re-read the autolearn section of the docs,

The one I linked to above?

 and don't see any reason why this 16-point email wouldn't have any new
 tokens to be learned?

Rules with certain tflags are ignored when determining whether a message
should be trained upon. Most notably here BAYES_xx.

Moreover, the auto-learning decision occurs using scores from either
scoreset 0 or 1, that is using scores of a non-Bayes scoreset. IOW the
message's score of 16 is irrelevant, since the auto-learn algorithm uses
different scores per rule.

Next safety net is requiring at least 3 points each from header and body
rules, unless autolearn_force is enabled. Which it is not in your
sample.

Either of those could have prevented auto-learning.


Also, according to your wording, you seem to think in terms of (number
of) new tokens to be learned. Which has nothing in common with
auto-learning.

(Even worse, new tokens would strongly apply to random gibberish
strings, hapaxes in Bayes context. Which are commonly ignored in Bayes
classification.)


 I looked in the quarantined message, and according to the _TOKEN_
 header I've added:
 
 X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.
 
 Isn't that sufficient for auto-learning this message as spam?

That has absolutely nothing to do with auto-learning. Where did you get
the impression it might?


 I just wanted to be sure this is just a case of not enough new points
 (tokens?) for the message to be learned, and that I I wasn't doing
 something wrong.

Points: aka score, used in the context of per-rule (per-test) and
overall score classifying a message based on the required_score setting.

Token: think of it as word used by the Bayesian classifier sub-system.
In practice, it is more complicated than simply space separated words.
Context (e.g. headers) and case might be taken into account, too.





Re: Bayes autolearn questions

2014-09-02 Thread Karsten Bräckelmann
On Tue, 2014-09-02 at 20:22 -0600, LuKreme wrote:
 On 02 Sep 2014, at 19:11 , Alex mysqlstud...@gmail.com wrote:
 
  However, spam with scores greater than 9.0 aren't being autolearned:
 
 I believe the score threshold is the base score WITHOUT bayes.
 
 Try running the email through with a -D flag and see what you get.
 
 (And that is only a partial answer, the threshold number ignores
 certain classes of tests beyond bayes,but I don't remember which ones.
 It's unfortunate that the learn_threshold_spam uses a number that
 appears to be related to the spam score, because it isn't.

It is. Using the accompanying, non-Bayes score-set. To avoid direct
Bayes self-feeding, and other rules indirect self-feeding due to Bayes-
enabled scores.

BTW, if one knows of that mysterious (bayes_auto_) learn_threshold_spam
you mentioned, one found the AutoLearnThreshold doc mentioning exactly
that: Bayes auto-learning is based on non-Bayes scores.





Re: A rule for Phil

2014-09-02 Thread Karsten Bräckelmann
On Mon, 2014-09-01 at 07:36 +0200, Luciano Rinetti wrote:
 I need a rule that, when a message is sento to p...@example.com
 and the Subject contains CV or Curriculum, scores the message with -9

Scoring the message with $number is impossible and not how SA works.
Triggering a rule with a negative score (e.g. -9) is possible.

 and a rule that, when a message is sent to to p...@example.com
 and the Subject doesn't contains CV or Curriculum, scores the message 
 with 7

Same. Won't score the message with 7, but can trigger a rule worth
some points.


The specified criteria are trivial, and can be easily translated into
rules. Reading the SA conf docs and maybe some of the rule-writing wiki
docs should enable the reader to do exactly that. (Hint: meta rules)

However, since this request is just too simple, and way too easy to
shoot one's own foot, I'll spend more time on this explanation than
simply dumping the requested flawed rules would take.

What are you actually after? What is your problem?

And why would Phil distinguish that strong between Subject tagged mail
and general mail to him? Sure, because it's not phil but a role account.
But you chose to disguise the purpose, so it's harder for us to help
you.

It's easier, if you don't try to hide your actual question.





Re: Bayes autolearn questions

2014-09-02 Thread Karsten Bräckelmann
On Tue, 2014-09-02 at 21:16 -0600, LuKreme wrote:
 On 02 Sep 2014, at 20:50 , Karsten Bräckelmann guent...@rudersport.de wrote:
  On Tue, 2014-09-02 at 20:22 -0600, LuKreme wrote:

  I believe the score threshold is the base score WITHOUT bayes.
  
  Try running the email through with a -D flag and see what you get.
  
  (And that is only a partial answer, the threshold number ignores
  certain classes of tests beyond bayes,but I don't remember which ones.
  It's unfortunate that the learn_threshold_spam uses a number that
  appears to be related to the spam score, because it isn't.
  
  It is. Using the accompanying, non-Bayes score-set. To avoid direct
  Bayes self-feeding, and other rules indirect self-feeding due to Bayes-
  enabled scores.
  
  BTW, if one knows of that mysterious (bayes_auto_) learn_threshold_spam
  you mentioned, one found the AutoLearnThreshold doc mentioning exactly
  that: Bayes auto-learning is based on non-Bayes scores.
 
 But that is not the case, You can have a score without bayes that
 exceeds the threshold and still have the message not auto learned.

True.

I chose to not repeat myself highlighting the details and mentioning the
constraint of header and body rules' points. See my other post half an
hour earlier to this thread. And the docs.





Re: Add spamassassin triggered rules in logs when email is blocked

2014-08-29 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 11:27 -0400, Karl Johnson wrote:
 I'm using amavisd-new-2.9.1 and SpamAssassin v3.3.1. I would like to
 know if it's possible to add Spamassassin triggered rules when an
 email is blocked because I discard the email when it's spam and I want
 to know why it's blocked (which rules).

Wrong place, that is an Amavis question. SA does not reject, discard or
otherwise block mail. Amavis does, based on the SA score.


 For now I only have the score (hits) in maillog:
 
 Aug 24 04:04:36 relais amavis[3475]: (03475-08) Blocked SPAM
 {DiscardedInternal}, MYNETS LOCAL [205.0.0.0]:54459 [205.0.0.0]
 bluew...@zzz.zzz.ca - z...@zzz.ca, Message-ID:
 e1xlsmo-0002nt...@zz.zz.ca, mail_id: 4RZ-Vm0_iZmi, Hits: 13.573,
 size: 4269, 10089 ms

That log line is generated by Amavis. SA has no control of its contents.


 I would like to add in logs for example:
 
 DATE_IN_FUTURE_06_12=0.001, DCC_CHECK=4,
 SPF_PASS=-0.001,TVD_SPACE_RATIO=0.001
 
 Is that possible?




Re: Advice on how to block via a mail domain in maillog

2014-08-29 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 12:43 -0600, Philip Prindeville wrote:
 On Aug 29, 2014, at 6:45 AM, Kevin A. McGrail kmcgr...@pccc.com wrote:
  On 8/29/2014 5:48 AM, emailitis.com wrote:

   I have a lot of Spam getting into our mail servers where the common
   thread is cloudapp

You guys realize cloudapp.net is Microsoft Azure, don't you?


   And the hyperlinks in the emails are http://expert.cloudapp.net/.
   
   Please could you advise on how I can block by the information on
   the maillog on that, or using a rule which checks the URL to include
   the above thread?

SA does not block.


  There is a new feature in trunk that I believe will help you easily
  called URILocalBL.pm

 That should do it.
 
 There’s a configuration example in the bug, and POD documentation in
 the plugin, but in this particular case you’d do something like:
 
 uri_block_cidr L_BLOCK_CLOUDAPP   191.237.208.246
 body L_BLOCK_CLOUDAPP eval:check_uri_local_bl()

That seems an overly complicated variant of a simple uri regex rule. And
it really depends on the IP to match a URI? And manually looking it up?

  uri URI_EXPERT_CLOUDAPP  m~^https?://expert\.cloudapp\.net$~


 describe L_BLOCK_CLOUDAPP Block URI’s pointing to expert.cloudapp.net
 score L_BLOCK_CLOUDAPP5.0

SA does not block. *sigh*





Re: remove_header not working?

2014-08-29 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 11:46 +0200, Axb wrote:
 Those reports are added by Exim's interface which does not seem to 
 respect the local.cf directives.

Exim accessing SA template tags?


 On 08/29/2014 11:29 AM, Fürtbauer Wolfgang wrote:
  unfortunatelly not, X-Spam-Reports are still there

If the option report_safe 0 is set, SA automatically adds a Report
header, though only to spam. Equivalent

  add_header spam  Report _REPORT_


The following is not only added to ham, but its contents are not the
_REPORT_ template tag but resemble the default report template, the
body text used for spam with report_safe 1.

There is no template tag to access the report template. Thus, this
header must be defined somewhere in the configuration, complete with all
that text, embedded \n newlines and _PREVIEW_ and _SUMMARY_ template
tags.

  X-Spam-Report: Spam detection software, running on the system
hausmeister.intern.luisesteiner.at,
has NOT identified this incoming email as spam.  The original
message has been attached to this so you can view it or label
similar future email.  If you have any questions, see
postmaster for details.
 
Content preview:  [...]

Content analysis details:   (-221.0 points, 5.0 required)
 
 pts rule name              description
---- ---------------------- --------------------------------------------------
-100 USER_IN_WHITELIST      From: address is in the user's white-list


  X-Spam-Report: Software zur Erkennung von Spam auf dem Rechner
aohsupport02.asamer.holding.ah

Are there really *two* X-Spam-Report headers?

Also, why is this one in German? SA doesn't mix languages during a
single run.

Why do the hostnames differ?

And, well, which hostmaster fat-fingered that ccTLD?





Re: Spam info headers

2014-08-29 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 00:30 -0400, Alex wrote:
 Regarding report_safe, the docs say it can only be applied to spam. Is
 that correct?

Yes, it only applies to spam. It defines whether classified spam will be
attached to a newly generated reporting message, or only modified by
adding some X-Spam headers.

Ham will never get wrapped in another message by SA...





Re: no subject tagging in case of X-Spam-Status: Yes

2014-08-29 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 12:02 +0200, Reindl Harald wrote:
 Am 29.08.2014 um 04:03 schrieb Karsten Bräckelmann:

  Now, moving forward: I've had a look at the message diffs. Quite
  interesting, and I honestly want to figure out what's happening.
 
 it looks really like spamass-milter is responsible
 
 in the second version below it whines it can't extract
 the score to decide if it's above reject and so it
 really looks like the milter heavily relies on headers

Yay for case in-sensitive parsing...

 found that out much later last night by plaing with headers in general
 
 spamass-milter[14891]: Could not extract score from Yes: Score=5.7, 
 Tag-Level=5.0, Block-Level=10
 
 add_header all Status _YESNO_, score=_SCORE_, tag-level=_REQD_, block-level=10
 add_header all Status _YESNO_, Score=_SCORE_, Tag-Level=_REQD_, Block-Level=10

If you use the SA default Status header, or at least the prefix
containing score and required, is header rewriting retained by the
milter without the Flag header?

  add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ ...

Given that log line, a likely explanation simply is that the milter
needs to determine the spam status, to decide which SA generated headers
to apply to the message. Your choice of custom Status header is not what
the milter expects, and thus needs to resort to the simple Flag header.

(Note the comma after yes/no, but no comma between score and required.)


  First of all, minus all those different datetime strings, IDs and
  ordering, the real differences are
  
-Subject: [SPAM] Test^M
-X-Spam-Flag: Yes^M
  
+Subject: Test^M
  
  So it appears that only the sample with add_header spam Flag has the
  Subject re-written.
 
 correct
 
  However, there's something else going on. When re-writing the Subject
  header, SA adds an X-Spam-Prev-Subject header with the original. Which
  is clearly missing.
 
 the version is killed in smtp_header_checks which is also
 the reason that i started to play around with headers
 
 nobody but me has a reason to know exact versions of running software

Previous-Subject, not Version.

I mentioned this specifically, because the absence of the Previous
Subject header with Subject rewrite clearly shows, SA generated headers
are not unconditionally added to the message, but single headers are
cherry picked.

IOW, header rewriting does work without the Flag header. It is the glue
that decides whether to inherit the rewritten header, and outright
ignores the Previous Subject header.


  Thus, something else has a severe impact on which headers are added or
  modified. In *both* cases, there is at least one SA generated header
  missing and/or SA modified header not preserved.




Re: formatting of report headers

2014-08-28 Thread Karsten Bräckelmann
On Thu, 2014-08-28 at 11:08 +0200, Reindl Harald wrote:
 is it somehow possible to get line-breaks in the
 report headers to have them better readable?

SA inserts line-breaks by default, to keep headers below 80 chars wide.


 report_safe 0
 clear_headers
 add_header spam Flag _YESNO_
 add_header all Status _YESNO_, score=_SCORE_/_REQD_, tests=_TESTS_, 
 report=_REPORT_

 on the shell it looks like this

What you get in the shell is precisely what SA returns -- to the shell
or any other calling process. Any reformatting or re-flow of multiline
headers has been done by other tools.


 X-Spam-Status: No, score=4.3/5.0,
 tests=ADVANCE_FEE_4_NEW,ADVANCE_FEE_4_NEW_MONEY,ADVANCE_FEE_5_NEW,ADVANCE_FEE_5_NEW_MONEY,ALL_TRUSTED,BAYES_99,BAYES_999,DEAR_SOMETHING,DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,LOTS_OF_MONEY,T_MONEY_PERCENT,URG_BIZ,
 report=
 * -2.0 ALL_TRUSTED Passed through trusted hosts only via SMTP
 *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
 *  [score: 1.]

That long _TESTS_ string without line-breaks is due to the very long
_REPORT_ in that header. If you add a dedicated Report header, the
Status header and its list of tests will be wrapped appropriately, too.
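
I.e. something like:

  add_header all Status _YESNO_, score=_SCORE_/_REQD_, tests=_TESTS_
  add_header all Report _REPORT_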

FWIW, SA even generates the Report header by default with your setting
of report_safe 0. Not in your case, because you chose to clear_headers
and manually define almost identical versions to the default headers.





Re: Certain types of spam seem to get through SA

2014-08-28 Thread Karsten Bräckelmann
On Thu, 2014-08-28 at 09:15 -0600, LuKreme wrote:
 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on mail.covisp.net
 X-Spam-Level: *
 X-Spam-Status: No, score=1.7 required=5.0 tests=URIBL_BLACK autolearn=no
   version=3.3.2

 X-Spam-Status: No, score=-0.0 required=5.0 tests=SPF_HELO_PASS,SPF_PASS
   autolearn=ham version=3.3.2

Bayes and auto-learning are enabled, yet there are no BAYES_XX rules hit
in either sample. Something seems broken.

(Not a first time poster, so I just assume the Bayes DB isn't fresh.)





Re: formatting of report headers

2014-08-28 Thread Karsten Bräckelmann
On Thu, 2014-08-28 at 21:43 +0200, Reindl Harald wrote:
 Am 28.08.2014 um 19:11 schrieb Karsten Bräckelmann:

  FWIW, SA even generates the Report header by default with your setting
  of report_safe 0. Not in your case, because you chose to clear_headers
  and manually define almost identical versions to the default headers.
 
 no, it don't

Yes, it does.

Read my comment again, carefully. And see the docs, option report_safe
in the section Basic Message Tagging Options.

  http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html





Re: formatting of report headers

2014-08-28 Thread Karsten Bräckelmann
On Thu, 2014-08-28 at 21:43 +0200, Reindl Harald wrote:
 Am 28.08.2014 um 19:11 schrieb Karsten Bräckelmann:

  FWIW, SA even generates the Report header by default with your setting
  of report_safe 0. Not in your case, because you chose to clear_headers
  and manually define almost identical versions to the default headers.

More detail, in addition to my other reply.

 # header configuration
 fold_headers 1
 report_safe 0

 "If this option is set to 0, [...]. In addition, a header named
  X-Spam-Report will be added to spam."  -- M::SA::Conf docs

 X-Spam-Status: No, score=0.3 required=5.0 tests=BAYES_50,CUST_DNSBL_2,
   
 CUST_DNSBL_5,CUST_DNSWL_7,DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,SPF_SOFTFAIL
   autolearn=disabled version=3.4.0

Not spam, no X-Spam-Report header.





Re: Reporting to SpamCop

2014-08-28 Thread Karsten Bräckelmann
On Thu, 2014-08-28 at 16:14 -0500, Chris wrote:
 I'm having an issue with getting SA 3.4.0 when run as spamassassin -D -r
 to report spam to SpamCop. The errors I'm seeing are:

Ignoring the Perl warnings for now.

 In my v310.pre file I have:
 
 loadplugin Mail::SpamAssassin::Plugin::SpamCop 
 /usr/local/share/perl/5.18.2/Mail/SpamAssassin/Plugin/SpamCop.pm

It should never be necessary to provide the (optional) filename argument
with stock SA plugins. Even worse, absolute paths will eventually be
harmful.

 I have set the SpamCop from and to addresses in the SpamCop.pm file:

The Perl modules are no user-serviceable parts. Do not edit them.

Moreover, the SpamCop plugin provides the spamcop_(from|to)_address
options to set these in your configuration. See

  http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_SpamCop.html
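
That is, after reverting the module, something like this in local.cf
(both addresses are placeholders for your registered ones):

  spamcop_from_address  yourname@example.com
  spamcop_to_address    submit.xxxxxxxxxxxx@spam.spamcop.net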

 setting = 'cpoll...@example.com',
 setting = 'submit.exam...@spam.spamcop.net',

Wait... What exactly did you edit?

The only instances of 'setting' in SpamCop.pm are the ones used to
register SA options. Did you replace the string spamcop_from_address
with your email address?

I have a gut feeling the Perl warnings will disappear, if you revert any
modifications to the SpamCop.pm Perl module and set the options in your
configuration instead...





Re: writing own rbl rules

2014-08-28 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 00:22 +0200, Reindl Harald wrote:
 the simple answer to my question would have been no, in no case SA does
 any RBL check if the client is from the same network range and there is
 no way to change that temporary even for development [...]

That would have been simpler indeed, but that also would have been
wrong.


 if there is no hop before and hence no received headers before
 there is still a known IP - the one and only and in that case
 the currently connection client - there is no reason not fire
 a DNSBL/DNSWL against that IP

SA is not the SMTP server, it has no knowledge of the connection's
remote IP. SA depends on the Received headers added by the internal
network's SMTP server (or its milter) to get that information.


  Besides: SA is not an SMTP. It does not add the Received header. And it
  absolutely has to inspect headers, whether you like that or not. That is
  how SA determines exactly that last, trustworthy, physical IP. And for
  that, trusted and internal networks need be correct, so by extension
  external networks also are correct.
 
 and the machine SA is running on receiving the message adds that
 header which is in case of direct testing the one and only and
 so trustable

Your configuration stated that machine is not trustable.

  In particular, your MX, your first internal relay, absolutely MUST be
  trusted by SA. That is the SMTP relay identifying the sending host,
  complete with IP and rDNS.
 
 again: the machine running SA *is the MX*

Correct (even though it is irrelevant whether it is or not). So don't
configure SA to not trust that machine, and include at the very least
that IP in your trusted_networks.

Your configuration stated that machine is not trustable.


  Received headers before that simply CANNOT be trusted. There is no way
  to guarantee the host they claim to have received the message from is
  legit
 
 in case running postfix with SA as milter *there are no* Received
 headers *before* because there is nobody before

There almost always is at least one Received header before, the sender's
outgoing SMTP server.





Re: no subject tagging in case of X-Spam-Status: Yes

2014-08-28 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 00:30 +0200, Reindl Harald wrote:
 besides the permissions problem after the nightly sa-update the reason
 was simply clear_headers without add_header spam Flag _YESNO which
 is entirely unexpected behavior

No, that is not the cause.

$ echo -e "Subject: Foo\n" | ./spamassassin | grep Subject
Subject: [SPAM] Foo
X-Spam-Prev-Subject: Foo

$ cat rules/99_DEVEL.cf
required_score -999    # regardless of score, classify spam
   # to enforce header rewriting
clear_headers
rewrite_header Subject [SPAM]


Besides, your own reply to my first post to this thread on Mon also
shows this claim to be false. The output of the command I asked you to
run clearly shows clear_headers in your config being in effect and a
rewritten Subject.





Re: writing own rbl rules

2014-08-28 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 01:06 +0200, Reindl Harald wrote:
 the question was just how can i enforce RBL tests inside the own LAN

 the question was just how can i enforce RBL tests inside the own LAN

 the question was just how can i enforce RBL tests inside the own LAN

RBL tests cannot be enforced. Internal and trusted networks settings
need to be configured correctly to match the RBL test's scope, in your
case last-external.

If there are trusted relays found in the Received headers, and the first
trusted one's connecting relay is external (not in the internal_networks
set), then an RBL test for last-external will be run.

This is entirely unrelated to own LAN or network range.
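
As a sketch of settings matching that last-external scope (the IP is a
documentation address, substitute your MX):

  # the MX / milter host itself is trusted and internal; whatever
  # connects to it from outside is then the last external relay
  trusted_networks   192.0.2.10
  internal_networks  192.0.2.10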


  Received headers before that simply CANNOT be trusted. There is no way
  to guarantee the host they claim to have received the message from is
  legit
 
  in case running postfix with SA as milter *there are no* Received
  headers *before* because there is nobody before
  
  There almost always is at least one Received header before, the sender's
  outgoing SMTP server
 
 *no no no and no again*
 
 there is no Received header before because a botnet zombie don't use
 a outgoing SMTP server

I said almost always, with direct-to-MX delivery being the obvious
exception. Possible with botnet spam, yes, but too easy to detect. Thus,
botnet zombies frequently forge Received headers.

(Besides, in your environment SA won't see much botnet spam anyway.
Spamhaus PBL as first level of defense in your Postfix configuration
will reject most of them. But that's not the point here.)





Re: no subject tagging in case of X-Spam-Status: Yes

2014-08-28 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 01:23 +0200, Reindl Harald wrote:
 Am 29.08.2014 um 01:20 schrieb Karsten Bräckelmann:
  On Fri, 2014-08-29 at 00:30 +0200, Reindl Harald wrote:
   besides the permissions problem after the nightly sa-update the reason
   was simply clear_headers without add_header spam Flag _YESNO which
   is entirely unexpected behavior
  
  No, that is not the cause.
  
  $ echo -e "Subject: Foo\n" | ./spamassassin | grep Subject
  Subject: [SPAM] Foo
  X-Spam-Prev-Subject: Foo
  
  $ cat rules/99_DEVEL.cf
  required_score -999    # regardless of score, classify spam
                         # to enforce header rewriting
  clear_headers
  rewrite_header Subject [SPAM]
  
  Besides, your own reply to my first post to this thread on Mon also
  shows this claim to be false. The output of the command I asked you to
  run clearly shows clear_headers in your config being in effect and a
  rewritten Subject
 
 i verfied that 20 times in my environment
 
 removing the line add_header spam Flag _YESNO_ and no tagging
 maybe the combination of spamass-milter and SA but it's fact

So far I attributed most of your arguing to being stubborn and
opinionated. Not any longer.

Now you're outright lying.





Re: writing own rbl rules

2014-08-28 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 01:59 +0200, Reindl Harald wrote:
 Am 29.08.2014 um 01:51 schrieb Karsten Bräckelmann:
  On Fri, 2014-08-29 at 01:06 +0200, Reindl Harald wrote:

   the question was just how can i enforce RBL tests inside the own LAN
  
  RBL tests cannot be enforced. Internal and trusted networks settings
  need to be configured correctly to match the RBL test's scope, in your
  case last-external.
  
  If there are trusted relays found in the Received headers, and the first
  trusted one's connecting relay is external (not in the internal_networks
  set), then an RBL test for last-external will be run.
  
  This is entirely unrelated to own LAN or network range
 
 that may all be true for blacklists and default RBL rules
 
 it is no longer true in case of 4 internal WHITELISTS which you
 want to use to LOWER scores to reduce false positives while
 otherwise bayes may hit - such traffic can also come from
 the internal network

There is absolutely no difference between black and whitelists. With the
only, obvious exception of the rule's score.

So, yes, it still is true in the case of (internal) whitelists.


Besides that, you are (still) confusing SA *_networks settings with the
local network topology. They are loosely related, but don't have to
match.

You can easily run RBL tests against IPs from within the local network
and treat them like any other sending SMTP client, by  (a) excluding
them from the appropriate *_networks settings, and  (b) defining the RBL
test accordingly. If you want to query for the last-external, it has to
be the last external relay according to the configuration.

BTW, unless the set of IPs to whitelist is permanently changing, it is
much easier to write a negative score rule based on the X-Spam-Relays-*
pseudo-headers. This also has the benefit of being highly flexible: it
does not depend on trust borders and allows you to maintain
internal_networks matching the LAN topology.
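As a sketch of that approach (rule name, IP address and score are made
up for illustration):

  header   LAN_HOST_OK  X-Spam-Relays-Trusted =~ / ip=10\.0\.0\.6 /
  describe LAN_HOST_OK  Submitted via a known LAN host
  tflags   LAN_HOST_OK  nice
  score    LAN_HOST_OK  -2.0

The pseudo-header consists of entries of the form "[ ip=... rdns=... ]",
so matching on the ip= token is enough to pin down a specific relay.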





Re: no subject tagging in case of X-Spam-Status: Yes

2014-08-28 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 02:15 +0200, Reindl Harald wrote:
 look at the attached zp-archive and both messages
 produced with the same content before you pretend
 others lying damned - to make it easier i even
 added a config-diff

But no message diff. ;)

 and now what?
 
 maybe you should accept that even new users are
 no idiots and know what they are talking about

Please accept my apologies. It appears something else is going on here,
and you in fact did not lie.

I'd like to add, though, that I do *not* assume new users to be idiots.
Plus, I generally spend quite some time on helping others fixing their
problems, including new users, as you certainly have noticed.


Now, moving forward: I've had a look at the message diffs. Quite
interesting, and I honestly want to figure out what's happening.

First of all, minus all those different datetime strings, IDs and
ordering, the real differences are

  -Subject: [SPAM] Test^M
  -X-Spam-Flag: Yes^M

  +Subject: Test^M

So it appears that only the sample with add_header spam Flag has the
Subject re-written.

However, there's something else going on. When re-writing the Subject
header, SA adds an X-Spam-Prev-Subject header with the original. Which
is clearly missing.

Thus, something else has a severe impact on which headers are added or
modified. In *both* cases, there is at least one SA generated header
missing and/or SA modified header not preserved.

Definitely involved: Postfix, spamass-milter, SA. And probably some
other tool rewriting the message / reflowing headers, as per some
previous posts (and the X-Spam-Report header majorly inconvenienced by
re-flowing headers).

Regarding SA and the features in question: There is no different
behavior between calling the plain spamassassin script and using
spamc/d. There is absolutely nothing in SA itself that could explain the
discrepancy in Subject rewriting, nor the missing X-Spam-Prev-Subject
header.

My best bet would be on the SA invoking glue, not accepting or
overwriting headers as received by SA. Which tool that actually is, I
don't know. But I'd be interested to hear about it, if you find out.


(The additional empty line between message headers and body in the case
without X-Spam-Flag header most likely is just copy-n-paste body. Or
possibly another artifact of some tool munging messages.)





Re: no subject tagging in case of X-Spam-Status: Yes

2014-08-28 Thread Karsten Bräckelmann
On Fri, 2014-08-29 at 02:15 +0200, Reindl Harald wrote:
 look at the attached zp-archive [...]

Since I already had a closer look at the contents including your local
cf, and I am here to offer help and mean no harm, some comments
regarding the SA config.


 # resolves a bug with milter always triggering a wrong informational header
 score UNPARSEABLE_RELAY 0

See the RH bug you filed and its upstream report. Do you still need
that? This would be the first instance of continued triggering of that
test I ever encountered.


 # disable most builtin DNSBL/DNSWL to not collide with webinterface settings
 score __RCVD_IN_SORBS 0
 score __RCVD_IN_ZEN 0
 score __RCVD_IN_DNSWL 0

Rules starting with double-underline are non-scoring sub-rules.
Assigning a zero score doesn't disable them like it does with regular
rules. In the case of RBL sub-rules like the above, it does not prevent
DNS queries. It is better to

  meta __FOO 0

overwrite the sub-rule, rather than set a score that doesn't exist.


 # unconditional sender whitelists
 whitelist_from *@apache.org
 whitelist_from *@bipa.co.at
 whitelist_from *@centos.org
 whitelist_from *@dovecot.org
  [...]

Unconditional whitelisting generally is a bad idea, since the
whitelisted addresses might show up forged in spam.

If possible, it is strongly suggested to use whitelist_from_auth, or at
least whitelist_from_rcvd (which requires *_networks be set correctly).
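For example (domains taken from the list above, the rcvd domain is an
assumption):

  whitelist_from_auth  *@apache.org
  whitelist_from_rcvd  *@centos.org  centos.org

The second argument to whitelist_from_rcvd is the rDNS domain of the
relay the mail must have come through, which is why correct *_networks
settings are a prerequisite for it.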





Re: Spam info headers

2014-08-27 Thread Karsten Bräckelmann
On Wed, 2014-08-27 at 17:07 -0400, Alex wrote:
 I've set up a local URI DNSBL and I believe there are some FPs that
 I'd like to identify. I've currently set up amavisd to set
 $sa_tag_level_deflt at a value low enough that it always produces the
 X-Spam-Status header on every email.
 
 It will show LOC_URIBL=1 in the status, but is it possible to have
 it somehow report/show the domain that caused the rule to fire, in the
 same way that it can be done with spamassassin directly on the
 command-line using -t?

The URIs [1] are automatically added to the uridnsbl rule's description
for _REPORT_ and _SUMMARY_ template tags. The latter is identical to the
additional summary at the end with the -t option, the first one is
suitable for headers.

  add_header spam  Report _REPORT_

That Report header is set by default with report_safe 0 (stock SA, not
Amavis).


[1] Actually, only a single URI is listed if multiple URIs hit. That's a
TODO item documented in a code comment.




Re: Spam info headers

2014-08-27 Thread Karsten Bräckelmann
On Wed, 2014-08-27 at 21:37 -0400, Alex wrote:
 On Wed, Aug 27, 2014 at 6:18 PM, Karsten Bräckelmann guent...@rudersport.de 
 wrote:

  The URIs [1] are automatically added to the uridnsbl rule's description
  for _REPORT_ and _SUMMARY_ template tags. The latter is identical to the
  additional summary at the end with the -t option, the first one is
  suitable for headers.
 
add_header spam  Report _REPORT_
 
  That Report header is set by default with report_safe 0 (stock SA, not
  Amavis).
 
 I now recall having added a few custom headers in the past, and it was
 indeed necessary to instruct amavis to display them. I did a little
 more digging around, and learned how I was doing it previously was
 replaced with the following, in amavisd.conf:
 
   $allowed_added_header_fields{lc('X-Spam-Report')} = 1;
 
 So I've modified my local.cf with the following:
 
 report_safe 0
 clear_report_template

That's actually a historic, unfortunate naming.

Despite its name, the report option (see 10_default_prefs.cf) sets the
template used with report_safe 1 or 2, which by default shows a brief
description, (attached spam) content preview and _SUMMARY_.

It does not have any impact on the X-Spam-Report header added with
report_safe 0 by default or the _REPORT_ template tag.

In the case of report_safe 0, the clear_report_template option actually
has no effective impact at all. That report will just not be added
anyway.

 add_header all Report _REPORT_
 
 Despite specifying all, it's only displayed in quarantined messages.
 I need it to be displayed on non-spam messages, and all messages
 would be most desirable.

That'd be an Amavis specific issue. Using add_header all, SA does add
that header to both ham and spam no matter what. In particular,
quarantining is outside the scope of SA, and if that makes a difference
whether a certain header appears or not, that's also outside the scope
of SA.


 There's also this in the SA conf docs:
 
report ...some text for a report...
  Set the report template which is attached to spam mail
  messages. See the 10_misc.cf configuration file in  
  /usr/share/spamassassin for an example.

 Is this still valid? 10_misc.cf apparently no longer exists, so I
 wasn't able to follow through there.

Wow. 10_misc.cf last appeared in 3.1.x, and is otherwise identical to
10_default_prefs.cf since 3.2. In particular with respect to that very
doc snippet -- nothing at all changed in that paragraph, except that
file name.

You want to update your docs bookmarks.


It's times like these I wonder whether I am the only one left grepping
his way through files and directories, searching for $option. Or
remembering the ancient magic of a <tab>, when looking for possibly
matching (numbered!) files...





Re: Prevent DNSBL URI matches, without affecting regex URI rules?

2014-08-26 Thread Karsten Bräckelmann
On Tue, 2014-08-26 at 11:22 -0400, Kris Deugau wrote:
 Is there a way to prevent a URI from being looked up in DNSBLs, without
 *also* preventing that URI from matching on uri regex rules?
 
 I would like to add quite a few popular URL shorteners to
 uridnsbl_skip_domain, but then I can't match those domains in uri regex
 rules for feeding x and URL shortener meta rules.

Works for me.

$ echo -e "\n example.com" | ./spamassassin -D --cf="uri HAS_URI /.+/"
dbg: rules: ran uri rule HAS_URI ======> got hit: "http://example.com"

$ ./spamassassin --version
SpamAssassin version 3.3.3-r1136734
  running on Perl version 5.14.2

$ grep example.com rules/25_uribl.cf
uridnsbl_skip_domain example.com example.net example.org


 Still using SA 3.3.2;  if the behaviour of uridnsbl_skip_domain has been
 narrowed down in 3.4 to only skipping the listed domains on DNSBL
 lookups (as per its name) that may prod me to get 3.4 running.

Oh, 3.3.2...

Also verified the 3.3.2 (and 3.3.0 for that matter) svn tag version, in
addition to my local 3.3 branch above. Same result, works for me.





Re: writing own rbl rules

2014-08-26 Thread Karsten Bräckelmann
On Wed, 2014-08-27 at 01:08 +0200, Reindl Harald wrote:
 below the stdout/sterr of following script filtered for dns
 so the lists are asked, but the question remains why that
 don't happen from a IP in the same network

Nope, no RBL queries. See below.

 in the meantime there are a lot of cust<count>-lastexternal
 generated from a web-interface including the 4 below and
 the local network range is listed on them, hence why i
 want them used unconidtionally and not only with foreign IP's

If it's internal, it's internal. There is a reason you are setting up
lastexternal DNSxL rules.

Do not invalidate SA *_networks configuration in an attempt to adjust it
to poorly generated, non-real-life samples. Generate a proper sample
instead, either by actually sending mail from external IPs, or if need
be by manually editing the MX Received header, forging an external
source (do pay attention to detail).

Besides, there is no point in whitelisting your own LAN IPs. Those
should simply hit ALL_TRUSTED, or just not be filtered in the first
place.


 /usr/bin/spamassassin -D   /var/lib/spamass-milter/spam-example.eml

 [sa-milt@mail-gw:~]$ cat debug.txt | grep -i dns

 Aug 27 00:59:29.249 [30833] dbg: metadata: X-Spam-Relays-Untrusted: [ 
 ip=10.0.0.19 rdns=mail-gw.thelounge.net
 helo=mail-gw.thelounge.net by=mail.thelounge.net ident= envfrom= intl=0 
 id=3hjPzJ6TWVz23 auth= msa=0 ] [
 ip=10.0.0.6 rdns=arrakis.thelounge.net helo=arrakis.thelounge.net 
 by=mail-gw.thelounge.net ident= envfrom= intl=0
 id=3hjPzJ2tkPz1w auth= msa=0 ]

There is no X-Spam-Relays-Trusted metadata in your grep for dns, which
means there is absolutely no trusted relay. Given those relays are in
the 10/8 class A network and you deliberately breaking trusted_networks
in a previous post, that seems about right...

 Aug 27 00:59:29.249 [30833] dbg: metadata: X-Spam-Relays-External: [ 
 ip=10.0.0.19 rdns=mail-gw.thelounge.net
 helo=mail-gw.thelounge.net by=mail.thelounge.net ident= envfrom= intl=0 
 id=3hjPzJ6TWVz23 auth= msa=0 ] [
 ip=10.0.0.6 rdns=arrakis.thelounge.net helo=arrakis.thelounge.net 
 by=mail-gw.thelounge.net ident= envfrom= intl=0
 id=3hjPzJ2tkPz1w auth= msa=0 ]

Same issue with X-Spam-Relays-Internal not showing up in the grep, thus
being completely empty. Unless you specified internal_networks manually,
it is set to trusted_networks. Thus equally invalid.


 Aug 27 00:59:29.254 [30833] dbg: dns: checking RBL bl.spameatingmonkey.net., 
 set cust12-lastexternal
 Aug 27 00:59:29.254 [30833] dbg: dns: checking RBL spam.dnsbl.sorbs.net., set 
 cust15-lastexternal
 Aug 27 00:59:29.254 [30833] dbg: dns: checking RBL psbl.surriel.com., set 
 cust14-lastexternal

All those third-party RBLs with your cust sets are extremely fishy.

Anyway, there are no "dbg: dns: IPs found:" and "dbg: dns: launching"
lines, so this clearly shows the RBLs are NOT queried.

 Aug 27 00:59:29.254 [30833] dbg: dns: checking RBL dnswl-low.thelounge.net., 
 set cust16-lastexternal

No activity with your custom RBL either. But well, how would you expect
SA to query *last* external, given you deliberately told SA there are no
internal relays...

All external. No internal, no last external aka hop before first
internal either.


First of all, do read and understand the (trusted|internal)_networks
options in the M::SA::Conf [1] docs, section Network Test Options.

Then remove the current bad *_networks options in your conf. If you
don't fully understand those docs, keep it at that, default. If you do
understand and see an actual need to manually set them, do so, but do so
*correctly*.
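For illustration only, a manual setting typically takes this shape
(example ranges, not a recommendation for your particular setup):

  trusted_networks  10.0.0.0/24   # every relay you administer, incl. the MX
  internal_networks 10.0.0.0/24   # usually the same set for a single site

If internal_networks is left unset, it defaults to trusted_networks.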

Hints on gathering relevant information from the debug output:

Don't just grep for generic dns, but check specifics by grepping for
X-Spam-Relays and (trusted|internal)_networks. Better yet, don't grep
but search the debug output interactively, and read nearby / related
info.

While debugging, actually reading, searching for terms or at least
glimpsing the entire debug output is good advice anyway.


[1] http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html




Re: writing own rbl rules

2014-08-26 Thread Karsten Bräckelmann
On Wed, 2014-08-27 at 03:01 +0200, Reindl Harald wrote:
  If it's internal, it's internal. There is a reason you are setting up
  lastexternal DNSxL rules.
 
 the intention is to handle the internal IP like it would be external

Again: Craft your samples to match real-life (production) environment.
Do not configure or try to fake an environment that will not match
production later. It won't work.

You want to configure SA. So configure SA. Correctly.

If you insist on not following that advice, please refrain from further
postings to this list.


  Aug 27 00:59:29.249 [30833] dbg: metadata: X-Spam-Relays-Untrusted: [ 
  ip=10.0.0.19 rdns=mail-gw.thelounge.net
  helo=mail-gw.thelounge.net by=mail.thelounge.net ident= envfrom= intl=0 
  id=3hjPzJ6TWVz23 auth= msa=0 ] [
  ip=10.0.0.6 rdns=arrakis.thelounge.net helo=arrakis.thelounge.net 
  by=mail-gw.thelounge.net ident= envfrom= intl=0
  id=3hjPzJ2tkPz1w auth= msa=0 ]
  
  There is no X-Spam-Relays-Trusted metadata in your grep for dns, which
  means there is absolutely no trusted relay. Given those relays are in
  the 10/8 class A network and you deliberately breaking trusted_networks
  in a previous post, that seems about right...
 
 the intention to berak it was to behave like it is external
 and just check the RBL behavior

Read my previous post again, carefully. If you define everything to be
external, there is no *last* external SA can trust.


  Anyway, there are no dbg: dns: IPs found: and dbg: dns: launching
  lines, so this clearly shows the RBLs are NOT queried.
 
 that's my problem :-)

So you know how to fix it. Configure *_networks in SA correctly, and
send a message from an external host.


  No activity with your custom RBL either. But well, how would you expect
  SA to query *last* external, given you deliberately told SA there are no
  internal relays...
 
 well, there will never be internal relays, just a inbound-only MX

That IS an internal relay. Your MX must be in your internal_networks,
and it is by the very definition of MX an SMTP relay.


  All external. No internal, no last external aka hop before first
  internal either.
 
 i want that RBL checks in general only for the *phyiscal* IP
 with no header inspections - 90% of inflow will be finally
 filtered out by postcsreen anyways

You need an internal, trusted relay to get that IP you desire. That
relay is what generates the Received header with precisely that IP.

Besides: SA is not an SMTP server. It does not add the Received header. And it
absolutely has to inspect headers, whether you like that or not. That is
how SA determines exactly that last, trustworthy, physical IP. And for
that, trusted and internal networks need be correct, so by extension
external networks also are correct.


  First of all, do read and understand the (trusted|internal)_networks
  options in the M::SA::Conf [1] docs, section Network Test Options.
  
  Then remove the current bad *_networks options in your conf. If you
  don't fully understand those docs, keep it at that, default. If you do
  understand and see an actual need to manually set them, do so, but do so
  *correctly*.
 
 the intention is no trust / untrust at all and handle any IP
 with it's phyiscal connection

Do read the docs I linked to.

You are totally misunderstanding trust. It is not about what you trust,
or don't. It is about which Received headers SA can trust to be correct.

In particular, your MX, your first internal relay, absolutely MUST be
trusted by SA. That is the SMTP relay identifying the sending host,
complete with IP and rDNS.

Received headers before that simply CANNOT be trusted. There is no way
to guarantee the host they claim to have received the message from is
legit.


  [1] http://spamassassin.apache.org/doc/Mail_SpamAssassin_Conf.html
 
 thanks!

In general, I stand to what I wrote in the previous post. And I strongly
suggest you follow that advice.

The approach you tried and defended with claws in this already lengthy
thread will not work and is bound to fail. Stop arguing, and start
setting up a serious test environment and correct SA options.





Re: no subject tagging in case of X-Spam-Status: Yes

2014-08-25 Thread Karsten Bräckelmann
On Mon, 2014-08-25 at 11:37 +0200, Reindl Harald wrote:
 header contains X-Spam-Status: Yes, score=7.5 required=5.0
 but the subject does not get [SPAM] tagging with the config
 below - not sure what i am missing

What does this command return?

  echo -e "Subject: Foo\n" | spamassassin --cf="required_score 1"





Re: drop of score after update tonight

2014-08-25 Thread Karsten Bräckelmann
On Mon, 2014-08-25 at 17:47 +0200, Reindl Harald wrote:

 yes and that is one which the currently existing
 Barracuda Spamfirewall scored with around 20 and
 grabbed from the backend there for testings

 the plain content i attached as ZIP (what made it to the listg)
 is used for testing by just copy the content to a formmailer or
 in a new plaintext message in TB point directly to the test MX

Given  (a) you disabled RBL checks in SA,  (b) that sample is a plain
body without any headers, and  (c) your method of sending the sample
even hits ALL_TRUSTED,  SA still does a pretty decent job in comparison.

The Barracuda appliance you're comparing results to did not have those
disadvantages.


Anyway, changing scores after a successful sa-update are to be expected.
The re-scoring algorithm only uses the default threshold of 5.0, it does
not know the concept of a second reject score.





Re: no subject tagging in case of X-Spam-Status: Yes

2014-08-25 Thread Karsten Bräckelmann
On Mon, 2014-08-25 at 18:55 +0200, Reindl Harald wrote:
 Am 25.08.2014 um 18:00 schrieb Karsten Bräckelmann:

  What does this command return?
  
    echo -e "Subject: Foo\n" | spamassassin --cf="required_score 1"
 
 as root as expected the modified subject
 as the milter user the unmodified

 [root@mail-gw:~]$ echo -e "Subject: Foo\n" | spamassassin 
 --cf="required_score 1"

 X-Spam-Status: Yes, score=3.7 required=1.0 tests=MISSING_DATE,MISSING_FROM,
 MISSING_HEADERS,MISSING_MID,NO_HEADERS_MESSAGE,NO_RECEIVED,NO_RELAYS
 Subject: [SPAM] Foo
 X-Spam-Prev-Subject: Foo

Exactly as expected. Subject tagging works.


 [root@mail-gw:~]$ su - sa-milt
 [sa-milt@mail-gw:~]$ echo -e "Subject: Foo\n" | spamassassin 
 --cf="required_score 1"

 X-Spam-Status: No, score=0.0 required=1.0 tests=none
 Subject: Foo

No tests at all. I doubt the milter generated all those missing headers
including From and Date, instead of a Received one only. So it seems the
restricted sa-milt user has no read permissions on the SA config.

As that user, have a close look at the -D debug output.

  spamassassin -D --lint





Re: no subject tagging in case of X-Spam-Status: Yes

2014-08-25 Thread Karsten Bräckelmann
On Mon, 2014-08-25 at 19:43 +0200, Reindl Harald wrote:
 Am 25.08.2014 um 19:13 schrieb Karsten Bräckelmann:

  No tests at all. I doubt the milter generated all those missing headers
  including From and Date, instead of a Received one only. So it seems the
  restricted sa-milt user has no read permissions on the SA config.
  
  As that user, have a close look at the -D debug output.
  
  spamassassin -D --lint
 
 bingo - only a snippet below
 thank you so much for setp in that thread


 the files inside exept one have correct permissions (0644)
 but /var/lib/spamassassin/3.004000/updates_spamassassin_org not

 i guess i will setup a cronjob to make sure the permissions
 below /var/lib/spamassassin/ are 755 and 644 for any item

A dedicated cron job doesn't make sense. You should add that to the
existing cron job that runs sa-update and conditionally restarts spamd.
Changing permissions has to be done before restarting spamd.
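A minimal sketch of such a combined job (paths and service name are
assumptions, adjust to your installation):

  #!/bin/sh
  # sa-update exits 0 only if an update was downloaded and installed
  sa-update
  if [ $? -eq 0 ]; then
      chmod -R u+rwX,go+rX /var/lib/spamassassin
      systemctl restart spamassassin
  fi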

Alternatively, ensure the respective users for spamd, sa-update and the
milter are identical, or at least share a common group.





Re: drop of score after update tonight

2014-08-25 Thread Karsten Bräckelmann
On Tue, 2014-08-26 at 00:08 +0200, Reindl Harald wrote:
 the bayes=1.00 below makes me wonder because around 1000 careful
 selected ham/spam messages for training - IMHO that should be more in
 such clear cases

Please do read the docs or at least the rule's description (hint, see
the BAYES_99 one) before venting such opinion.

The Bayesian Classifier returns a probability of the mail being ham or
spam, in a range between 0 and 1. Zero being ham, 1 spam, and a value of
0.5 being neutral, kind of undecided.

A bayes value of 1.00 is as high as it gets, and the rules'
descriptions also clearly state the spam probability being 99.9 to 100%.


 however, i admit that i am a beginner with SA!
 
 Aug 26 00:01:32 mail-gw spamd[6836]: spamd: result: Y 5 -
 ADVANCE_FEE_4_NEW,ADVANCE_FEE_4_NEW_MONEY,ADVANCE_FEE_5_NEW,ADVANCE_FEE_5_NEW_MONEY,ALL_TRUSTED,BAYES_99,BAYES_999,DEAR_SOMETHING,DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,LOTS_OF_MONEY,T_MONEY_PERCENT,URG_BIZ
 scantime=0.3,size=4760,user=sa-milt,uid=189,required_score=1.0,rhost=localhost,raddr=127.0.0.1,rport=29317,mid=*,bayes=1.00,autolearn=disabled




Re: Rule to check return-path for To address

2014-08-23 Thread Karsten Bräckelmann
On Sat, 2014-08-23 at 14:59 -0400, Jeff wrote:
 I recently started getting hammered by spam and nearly all of the spam
 emails have one thing in common. The return-path header contains the
 email address that the spam is being sent to.
 
 Below is a sample header:
 ...
 Return-Path: amazon-voucher-myname=mydomain@indiarti.com
 ...
 
 The green text above is the email address that the spam is being sent
 to (i.e., myn...@mydomain.com).

That's common practice with legitimate mail, too, in particular mailing
lists. Have a look at this mail's Return-Path header.


 Is there a way to write a custom SpamAssassin rule that will mark any
 message as spam if the return-path contains the 'To' address,
 regardless of what it may be, and the equal sign (i.e.,
 user=domain.tld)?

See the TO_EQ_FROM stock rule.

A similar rule for the Return-Path should actually be simpler, though.
The Return-Path header (or similar envelope from type headers) is
generated by the MTA, so the order of Return-Path and To headers should
be static -- unlike To and From, which are set by the sending MUA.





Re: Bayes training via inotify (incron)

2014-08-22 Thread Karsten Bräckelmann
On Fri, 2014-08-22 at 17:32 -0700, Ian Zimmerman wrote:
 Isn't inotify a bit of overkill for this?  If you have a dedicated
 maildir for training, you know that anything in maildir/new is, uh,
 new.  So you process it and move it to maildir/cur.  What am I missing?

The new/ directory is for delivery, messages moved will end up in cur/.

Training on messages in new/ means training solely on classification.
These messages have not been seen by a human, and he's most likely not
even aware there's new mail at all.

Messages moved (copied) into dedicated (ham|spam) learning folders will
be placed in cur/.

Thus, training on content in dedicated learning folders' new/ dirs won't
work, because human reviewed mail does not go there. And training on
new/ dirs in general is like overriding all of the precaution measures
of SA auto-learning, and blindly training anything and everything above or
below the required_score threshold.


Besides, moving messages from new/ to cur/ is the IMAP server's duty. No
third-party script should ever mess with that.
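In practice that means pointing sa-learn at the cur/ directories of the
dedicated training folders only, e.g. (paths are just an example):

  sa-learn --spam /var/mail/user/Maildir/.Learn.Spam/cur/
  sa-learn --ham  /var/mail/user/Maildir/.Learn.Ham/cur/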





Re: Learning both spam and ham, edge case

2014-08-22 Thread Karsten Bräckelmann
On Fri, 2014-08-22 at 17:44 -0700, Ian Zimmerman wrote:
 I know that if you misclassify a mail as spam with
 
  sa-learn --spam /path/to/ham
 
 you can later run
 
  sa-learn --ham /path/to/ham
 
 to correct the mistake, and SA will do the right thing (ie. forget the
 wrong classification).  And conversely, with ham - spam.

Correct. SA will recognize it has been learned before, and automatically
forget the previous training before re-training.


 My question is, what happens if you run
 
  sa-learn --spam /path/to/spam --ham /path/to/ham
 
 and the same message is in both mailboxes?  Is the behavior even
 well-defined (ie. not random)?  And if so, can it be relied on in new
 versions?

Interesting...

First of all, see the man-page.  --ham and --spam are options, they
don't take arguments.

   sa-learn [options] [file]...

So your example is flawed by the assumption that --ham or --spam would
affect its file/path arguments, or possibly any following file/paths.
Which they don't.

Experimenting with --ham and --spam options, and two (identical) file
arguments yields:

Learning as ham or spam is not based on command-line option order, but
sa-learn code: --ham file --spam file results in learning spam, then
ham.

If you want to know more about sa-learn innards, I recommend looking at
its source code, or at least investigating

  sa-learn -D [...] 2>&1 | egrep '(learn|archive-iterator)'


In short: It is not random, but well-defined (see the source code). In
particular, there is no order of options. It is not guaranteed to be the
same in future (major|minor) versions, since your invocation sample is
not even documented.





Re: Delays with Check_Bayes

2014-08-21 Thread Karsten Bräckelmann
On Thu, 2014-08-21 at 13:13 -0700, redtailjason wrote:

 Are you open to the possibility of upgrading to 3.4.0 and using the Redis 
 backend for Bayes? (Just offering an alternative.)
 
 We have been developing and upgrade plan to 3.4. Based on this, we are
 prioritize this upgrade and will be expediting it. Thanks. 

Thanks for including the part you're directly referring to, as I
requested. However, please do distinguish the quoted part from your
comments. The first paragraph actually was written by John, but your
post lacks any hint of the author, and even worse displays the quote and
your text visually identical.

See the difference between your latest two posts and any other post in
this thread?


I blame Nabble for even making this possible. In a reply, the quoted
text must be visually distinctive. More reason to avoid Nabble.

 View this message in context: 
 http://spamassassin.1065346.n5.nabble.com/Delays-with-Check-Bayes-tp111067p18.html
 Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
 
Sic. This is a mailing list. And Nabble a third-party list archive
service and poor forum-style web frontend to the mailing list.





Re: Delays with Check_Bayes

2014-08-20 Thread Karsten Bräckelmann
On Wed, 2014-08-20 at 07:35 -0700, redtailjason wrote:
 Here is the dump from one of the scanners:
 
 netset: cannot include 127.0.0.1/32 as it has already been included
 0.000  0  3  0  non-token data: bayes db version
 0.000  0613  0  non-token data: nspam
 0.000  0  0  0  non-token data: nham
 0.000  0  50382  0  non-token data: ntokens
 0.000  0 1362372138  0  non-token data: oldest atime
 0.000  0 1396547409  0  non-token data: newest atime

That's back in April -- and obviously not a production database.

You need to run sa-update as the user SA uses during scan. In your case
that's the user Amavis uses.





Re: Delays with Check_Bayes

2014-08-20 Thread Karsten Bräckelmann
On Wed, 2014-08-20 at 08:51 -0700, redtailjason wrote:
 The initial post was data extracted from mail.log on the scanner using cat
 /var/log/mail.log | grep check_bayes while logged as administrator. 

It doesn't matter what user greps the logs.

It was Amavis generating the logs. Thus, for debugging, all execution of
Amavis or SA commands must be done as the user Amavis runs as.





Re: Delays with Check_Bayes

2014-08-20 Thread Karsten Bräckelmann
On Wed, 2014-08-20 at 06:15 -0700, redtailjason wrote:
 Hello and good morning. We are running into some delays that we are trying to
 pin down a root cause for. 
 
 Below are some examples. Within the examples, you can see that the
 check_bayes: scan is consuming most of the timing. Does anyone have any
 suggests on what to look at? We use 3.3.2. We have eight scanners setup to
 handle the scanning with 5GB RAM and 4 CPUs each. Volume is 250K - 500K per
 day. 

That volume means throughput of about 350 messages per minute, 5.8 per
second. Sounds reasonable for 8 dedicated scanners.

Your samples are showing overall timings between about 90 seconds and
more than 2 minutes. Which means processing commonly takes less time,
and these are some extreme cases -- unless you really do have 50-100
busy processes per machine.

How many such long-running processes do you see, how frequent are they?

Also, you mentioned you are using the MySQL backend for Bayes. You did
not add any further detail, though.

Do you have dedicated MySQL servers for Bayes? Or does each scanner
machine run a local MySQL server? Do they share / sync databases
somehow?

Please elaborate on your environment, in particular everything
concerning Bayes.





Re: Delays with Check_Bayes

2014-08-20 Thread Karsten Bräckelmann
On Wed, 2014-08-20 at 13:38 -0700, redtailjason wrote:
 We are seeing about 4000-7000 delayed messages per day. We do utilize a
 dedicated MySQL Server for the Bayes and all 8 scanners share it. Please let
 me know if this does not fully clarify our setup for you. 

So we're talking about 1% of the messages.

Does this happen with all scanner machines, or is this isolated to a
single one? If not all scanners are affected, any differences in network
connection?

When did this start? Any relevant changes roughly about that time?

What's your DB server load? Any noticeable load spikes, like 5k times a
day? In particular, while a message is taking 2 minutes wall-clock time
for Bayes, does either the scanner or database server have an unusual
high load? Do you have MySQL logs which might show issues?

Can you reproduce the Bayes lags? That is, can you identify a sample
message, and re-process manually?


When replying, please include the relevant quoted parts you're directly
referring to. With some context it is easier to follow the thread.





Re: Advice sought on how to convince irresponsible Megapath ISP.

2014-08-17 Thread Karsten Bräckelmann
On Sun, 2014-08-17 at 07:37 -0700, Linda Walsh wrote:
 Karsten Bräckelmann wrote:

  Be liberal in what you accept, strict in what you send. In particular,
  later stages simply must not be less liberal than early stages.

  Your MX has accepted the message.
 
 My ISP's MX has accepted it, because it doesn't do domain checking.  My 
 machine's MX rejects it so fetchmail keeps trying to deliver it. 

There is only one MX, run by your ISP. You are running an SMTP relay,
not an MX.

 While I *could* figure out how to hack sendmail to not reject the message,

You don't have a choice. That sendmail is an *internal* SMTP relay after
the MX border. While you certainly are not looking at it this way, your
own services *together* with the SMTP run by your ISP form your internal
network.

The internal relay you run must not be stricter than the MX. In fact, it
simply cannot be stricter, without mail ending up in limbo. Exactly what
you have...


  There is no forwarding.
 
 It comes in their MX, and is forwarded to their users.

Again, that is not forwarding. (Hint: You are using fetchmail, not
being-forwarded-to-me-mail.)


   Any ideas on how to get a cheapo-doesn't want to support anything ISP to 
   start blocking all the garbage the pass on?
 
  Change ISP. You decided for them to run your MX.
 
 I didn't decide for them, I inherited them when they bought out the 
 competition to supply lower quality service for the same price.

We're about to split hairs, but it is your decision to try get your ISP
to behave as you want, instead of taking your business elsewhere. So,
yes, it is your decision to let them run your MX.

  It is your choice to aim for a cheapo service (your words).
 
 It wasn't when I signed up.   Cost $100 extra/month.  Now only $30
 extra/month that I don't host the domain with them.

But it is now, and all you're doing is complaining about it.

Expenses dropped to a fraction of what it used to be, yet you expect the
same service as before?

  If you're unhappy with the service, take your business elsewhere.
  Better service doesn't necessarily mean more expensive, but you
  might need to shell out a few bucks for the service you want.
 
 I already am... my ISP (cable company) doesn't have the services I want 
 for mail hosting.  I went to another company for that,

It is irrelevant whether your mail service provider happens to also be
your cable provider. You are paying for mail services. And if you want
better service, you might need to pay more -- which is what I said.

Besides, your wording is almost ironic. Your ISP didn't offer the email
service you want, so you went for another company. Now your current
(mail) service provider doesn't offer the service you want...





Re: Advice sought on how to convince irresponsible Megapath ISP.

2014-08-16 Thread Karsten Bräckelmann
On Fri, 2014-08-15 at 19:06 -0700, Linda A. Walsh wrote:
 My old email service was bought out by Megapath who is letting alot of 
 services slide.
 
 My main issue is that my incoming email scripts follow the SMTP RFC's and if
 the sender address isn't valid, then it's not a valid email that should be
 forwarded. 
 
 My script simply check for the domain existing or not - if it doesn't exist,
 then it rejects it.  This causes about 100-200 messages a month that get
 stuck in an IMAP queue waiting for download -- only to be downloaded and 
 rejected due to the sender domain not existing.

Linda, you are rather vague on details, and definitely confusing terms
and terminology.

You state your ISP would forward mail to you. While on the other hand, a
sub-set of the mail is not accepted by your scripts, thus stuck in an
IMAP account waiting for download. Both the use of IMAP and the mention
of downloading show that your ISP is not forwarding mail; you are
fetching mail.

Similarly, your scripts do not reject messages, but choose not to fetch
them.


Pragmatic solution: If you insist on your scripts to not fetch those
spam messages (which have been accepted by the MX, mind you), automate
the manual download and delete stage, which frankly only exists due to
your choice of not downloading them in the first place. Make your
scripts delete, instead of skipping over them.

Be liberal in what you accept, strict in what you send. In particular,
later stages simply must not be less liberal than early stages.

Your MX has accepted the message. At that point, there is absolutely no
way to un-accept it or reject it later. You can classify, which you use SA
for (I guess, given you posting here). You can filter or even delete
based on classification, or other criteria.


 The only response my ISP will give is to turn on their spam filtering. 
 I tried that. In about a 2 hour time frame, over 400 messages were
 blocked as spam.  Of those less than 10 were actually spam, the rest
 were from various lists.
 
 So having them censoring my incoming mail isn't gonna work, but neither will
 the reject the obvious invalid domain email.
 
 I can't believe that they insist on forwarding SPAM to their users even 
 though they know it is invalid and is spam. 

There is no censoring. There is no forwarding.

 Any ideas on how to get a cheapo-doesn't want to support anything ISP to 
 start blocking all the garbage the pass on?

Change ISP. You decided for them to run your MX.

It is your choice to aim for a cheapo service (your words). If you're
unhappy with the service, take your business elsewhere. Better service
doesn't necessarily mean more expensive, but you might need to shell out
a few bucks for the service you want.





RE: Hotfix/phishing spam

2014-08-16 Thread Karsten Bräckelmann
On Thu, 2014-08-14 at 19:37 -0500, John Traweek CCNA, Sec+ wrote:
 Usually an end user has to request the hotfix and fill out a form on
 the MS site and then MS will send out an email with the URI.

Pardon my ignorance, but... WHY!?

Why would anyone require filling out a web form, to send an automated
email with a link as response? Why not simply, you know, put the link in
the page the user gets in return after sending that completed form
anyway?

Using an email message as response to an HTTP GET or POST request to
transfer a http(s) URI is beyond clusterfuck.


(Yes, I do realize you merely described what MS does, and you're not
responsible for their lame process.)


 So to answer your question, yes, MS does send out emails with
 hotfixes, but only when an end user requests it, at least in my
 experience… 
 
 If the end user did not specifically fill out a form/request the hot
 fix, then I would be very suspicious…





Re: Second step with SA

2014-08-15 Thread Karsten Bräckelmann
On Fri, 2014-08-15 at 12:21 -0400, Daniel Staal wrote:
 --As of August 15, 2014 1:23:37 PM +0200, Antony Stone is alleged to have 
 said:

  http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf
  .html#language_options

 Both of these links are out of date.  The whitelist/blacklist it probably 
 doesn't matter to much, but the language option in the first has been 
 discontinued entirely.

Nope. The ok_languages option has not been discontinued. It has been
plugin-ized since 3.1, still lives to this date in the TextCat language
guesser plugin.


I do however agree, that those 3.0 links are way too old. I guess Antony
should clean up some bookmarks. ;)

Regarding white- and blacklist options, there have been some significant
changes since. Most notably, in addition to the whitelist_from_rcvd,
today there's the most convenient whitelist_auth and its piece-meal
whitelist_from_(spf|dk|dkim) counterparts.


 The correct links for the current version of Spamassassin are:
 http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html#language_options
 http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html#whitelist_and_blacklist_options

Latest stable version documentation, always:

  http://spamassassin.apache.org/doc/





Re: spamassassin at 100 percent CPU

2014-08-13 Thread Karsten Bräckelmann
On Wed, 2014-08-13 at 11:20 -0700, Noah wrote:
 This is a new machine with rules copied over from another machine.  How 
 about this?  I just start new.  Is there a good page out that explains 
 setting up spamassassin from scratch and getting the sa rules set up 
 well and cleaned up nicely?  I am happy to start from the beginning with 
 best practices.

If you cannot answer our rather specific questions, you're in for a much
steeper learning curve than you seem to expect...


What the best way of setting up SA on a new machine is? Just install the
distro provided SA packages.

Getting the SA rules set up well? Same. Cleaned up? Do not copy over
configuration and rules from $ome other system, unless you know what you
are copying. IOW, don't. That's clean by definition.

What I really don't get from your reply is this, though:

A new machine, with rules copied over. Yet, you seem to be unable to
answer our questions regarding custom rules and configuration you put
there. Which equals everything you copied over to begin with. If you
did, why can't you answer our question?

Or revert that copying over, which results in the cleaned up state
you asked for.


Regardless of continuing with the current system, or setting up the
whole system from scratch again -- there are important questions raised,
you just didn't answer. Which, frankly, are likely to have a *much* more
severe impact than removing bad, copied rules.

What mail is that system handling, if it is not an MX? How large are
those messages, and what's your size limit? How is SA integrated, what
software is passing mail to SA?

What is the actual process's name, and for how long does it run at CPU
max?


Without answering these (basically, get back to my previous post and
actually answer all my very specific questions), there is absolutely no
point in you posing more or other questions. It won't help.


Reference:

 On 8/11/14 4:31 PM, Karsten Bräckelmann wrote:
  On Mon, 2014-08-11 at 09:18 -0400, Joe Quinn wrote:
  Keep replies on list.
 
  Do you remember making any changes, or are you using spamassassin as it
  comes? What kind of email is going through your server? Very large
  emails can cause trouble with poorly written rules. If you can, perhaps
  systematically turn off things that are pushing email to that server
  could narrow it down to a particular type of email.
 
  On 8/9/2014 4:41 PM, Noah wrote:
  thanks for your response.  I am not handling much email its a new
  server and currently the MX points to another server.
 
  What mail is it handling?
 
  Not MX, so I assume it does not receive externally generated mail at
  all. Which pretty much leaves us with locally generated -- cron noise
  and other report types.
 
  How is SA integrated? What's your message size limit (see config of the
  service passing mail to SA)? Are you per chance scanning multi MB text
  reports?
 
  A sane size limit is about 500 kB. Besides, local generated mail isn't
  worth processing with SA, and in the case of cron mail often harmful
  (think virus scanner report).
 
 
  How do I check the SA configuration?  How do I check if I am using
  additional rules?
 
  By additional rules, we mean any rules or configuration that is not
  stock SA. Anything other than the debian package or running sa-update.
  Generally, anything *you* added.
 
 
  On 7/31/2014 3:19 PM, Noah wrote:
  what are some things to check with spamassassin commonly running at
  100 percent?
 
  For how long does it run at CPU max? What is the actual process name?
 
  It would be rather common for the plain 'spamassassin' script to consume
  a couple wall-clock seconds of CPU, since it has to read and compile the
  full rule-set at each invocation.
 
  Unlike the 'spamd' daemon, which has that considerable overhead only
  once during service start. In both cases may the actual scan time with
  high CPU load be lower than the start-up overhead.
 
 



Re: Rule for single URL in body with very few text

2014-08-12 Thread Karsten Bräckelmann
On Tue, 2014-08-12 at 11:42 -0400, Karl Johnson wrote:
 Thanks for the rule Karsten. I've already searched the archive to find
 this kind of rule and found few topic but I haven't been able to make
 it works yet. I will try this one and see how it goes.

Searching is much easier, if you know some unique pointers like the
sub-rule's name in question. Which is what I used to dig up the
rules. ;)

I didn't mean to RTFM you, just didn't feel like discussing yet again
what should be possible to deduce from the rules themselves, or from the
archived threads. Hence me pointing at the archives with info on how to
find what you need, just in case you do need or want more details.





Re: Running SA without the bayesian classifier

2014-08-11 Thread Karsten Bräckelmann
On Mon, 2014-08-11 at 16:38 +0200, Matteo Dessalvi wrote:
 I am planning to install SA on our SMTP MTAs, which deals only with
 outgoing traffic generated in the internal network.

Outgoing traffic. That means, most DNSBLs are either completely useless
or effectively disabled. You'll also need to zero out the ALL_TRUSTED
rule for the same reason.
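That one is a single line in local.cf:

  score ALL_TRUSTED 0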


 I am making the assumption that our clients are mostly sending 'clean'
 email (I know, I am trusting *a lot* my users but nevertheless).
 
 So the question is: how efficient will be SA without using the bayesian
 classifier? Are all the remaining rulesets (apart from BAYES_*)
 sufficient to shave off spam email?

Define spam.

Running SA on your outgoing SMTP will not catch botnet generated junk,
neither spam nor malware. This would require sniffing raw traffic. Or
completely firewalling off outgoing port 25 connections.
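For the firewalling route, a typical Linux sketch (assuming Postfix is
the only process allowed to speak SMTP to the outside) would be:

  iptables -A OUTPUT -p tcp --dport 25 -m owner --uid-owner postfix -j ACCEPT
  iptables -A OUTPUT -p tcp --dport 25 -j REJECT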

You explicitly mention your users (corporate or home?) sending mail.
Are you talking about them possibly running bulk sending services, or
hand crafted unsolicited mail to individual recipients?

Unless there's a 419 gang operating from your internal network, there
might not be much left for SA with stock rules to classify spam...


That said, it is entirely possible to run SA without the Bayesian
classifier. There's an option to disable it, and different score sets,
generated specifically for this case, are used.
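Concretely, in local.cf:

  use_bayes 0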





Re: Rule for single URL in body with very few text

2014-08-11 Thread Karsten Bräckelmann
On Mon, 2014-08-11 at 15:48 -0400, Karl Johnson wrote:
 Is there any rule to score an email with only 1 URL and very few text?
 It could trigger only text formatted email because they usually aren't
 in HTML.

Identify very short (raw)bodies.

  rawbody __RB_GT_200  /^.{201}/s
  meta    __RB_LE_200  !__RB_GT_200

Chain together with the stock __HAS_URI sub-test.

  meta    SHORT_BODY_WITH_URI  __RB_LE_200 && __HAS_URI
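A meta rule without a leading double underscore scores 1.0 by default.
To make hits easier to spot, an explicit description and score (values
illustrative) can be added:

  describe SHORT_BODY_WITH_URI  Very short body containing little more than a URI
  score    SHORT_BODY_WITH_URI  1.5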


I have discussed and explained the rule to identify short messages a few
times already. Please search your preferred archive [1] for the rule's
name, to find the complete threads.


[1] List of archives: http://wiki.apache.org/spamassassin/MailingLists




Re: Rule for single URL in body with very few text

2014-08-11 Thread Karsten Bräckelmann
On Mon, 2014-08-11 at 22:57 +0300, Jari Fredriksson wrote:

 *  1.8 DKIM_ADSP_DISCARD No valid author signature, domain signs all mail
 *  and suggests discarding the rest

 This is a corner case. I got it tagged, but probably just because I
 tested it later and URIBL has it now.

Minus the 1.8 score for DKIM_ADSP_DISCARD, it wouldn't have crossed the
5.0 threshold for you either.

Seeing all those x instead of (real|user|host) names and domains, it
seems safe to assume the unredacted message does not claim to be sent
from an x.com address... ;)


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: spamassassin at 100 percent CPU

2014-08-11 Thread Karsten Bräckelmann
On Mon, 2014-08-11 at 09:18 -0400, Joe Quinn wrote:
> Keep replies on list.
>
> Do you remember making any changes, or are you using spamassassin as it
> comes? What kind of email is going through your server? Very large
> emails can cause trouble with poorly written rules. If you can, perhaps
> systematically turn off things that are pushing email to that server
> could narrow it down to a particular type of email.
>
> On 8/9/2014 4:41 PM, Noah wrote:
> > thanks for your response.  I am not handling much email its a new
> > server and currently the MX points to another server.

What mail is it handling?

Not MX, so I assume it does not receive externally generated mail at
all. Which pretty much leaves us with locally generated -- cron noise
and other report types.

How is SA integrated? What's your message size limit (see the config of
the service passing mail to SA)? Are you by chance scanning multi-MB text
reports?

A sane size limit is about 500 kB. Besides, locally generated mail isn't
worth processing with SA, and in the case of cron mail it is often harmful
(think virus scanner reports).
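
If procmail happens to be the glue here, a size guard is a one-liner;
untested sketch, spamc assumed:

  :0fw
  * < 512000
  | spamc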


> > How do I check the SA configuration?  How do I check if I am using
> > additional rules?

By additional rules, we mean any rules or configuration that is not
stock SA. Anything other than the debian package or running sa-update.
Generally, anything *you* added.


> > > On 7/31/2014 3:19 PM, Noah wrote:
> > > > what are some things to check with spamassassin commonly running at
> > > > 100 percent?

For how long does it run at CPU max? What is the actual process name?

It would be rather common for the plain 'spamassassin' script to consume
a couple wall-clock seconds of CPU, since it has to read and compile the
full rule-set at each invocation.

Unlike the 'spamd' daemon, which incurs that considerable overhead only
once, during service start. In both cases, the actual scan time at high
CPU load may well be lower than the start-up overhead.
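
A quick way to see that per-invocation compile overhead by itself, without
scanning any actual message:

  time spamassassin --lint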


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Similar pattern of emails Comparing Prices

2014-08-07 Thread Karsten Bräckelmann
On Thu, 2014-08-07 at 17:14 +0100, emailitis.com wrote:
> I have had a fair number of VERY similar Spam emails that are all
> about comparing prices.  I have put a number in a pastebin below.

We need full, raw samples. Those are mostly just headers with the raw
body missing (multipart/alternative, thus most likely HTML and plain
text versions).

The blobs including a body-ish part appear to be copied from your MUA's
rendered display.


> They all seem to be originating from Fasthosts in UK which I cannot
> really blacklist in entirety.
>
> Can anyone suggest how to block it with a Spamassassin rule?

First impression thought was to match on that List-Unsubscribe header's
domain. On second thought, bad idea, since cloudapp.net is MS Azure, not
the spammer's domain.

Still, that might make for an easy rule. That unsub link includes some
campaign, recipient, etc identifying numbers. And one that most likely
identifies the sender, identical in all 7 samples.

  header AZURE_BAD_CUSTOMER  List-Unsubscribe =~ /email-delivery.cloudapp.net\/sender\/box.php?.*s=bfa2e2429e7a4f0b0993c32a75aebc0e/

Note: This is only assuming the s value identifies the campaign's sender
and misbehaving Azure customer.

The body most certainly contains links with very similar structure.
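
Untested sketch along the same lines, assuming the body links carry that
same s= value; verify against a raw sample first:

  uri AZURE_BAD_CUSTOMER_URI  /email-delivery\.cloudapp\.net\/sender\/.*s=bfa2e2429e7a4f0b0993c32a75aebc0e/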


> http://pastebin.com/B9YqTsvZ
>
> I had tried to create something from a meta rule, but that has not
> worked so far:
>
> body __CGK_CLOUDAPP_1 /cloudapp/i
> body __CGK_CLOUDAPP_2 /\bCompare\b/i
> meta CGK_CLOUDAPP (( __CGK_CLOUDAPP_1 + __CGK_CLOUDAPP_2) > 1)

No surprise. There is no cloudapp string in the body at all, according
to your two formatted samples.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: unsubscribe

2014-08-05 Thread Karsten Bräckelmann
Wrong address. To unsubscribe, send a mail to the appropriate
list-command address, not the mailing list itself.

See the headers of each and every post on this list:

  list-help: mailto:users-h...@spamassassin.apache.org
  list-unsubscribe: mailto:users-unsubscr...@spamassassin.apache.org


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: New at SpamAssassin - how to not get headers

2014-08-04 Thread Karsten Bräckelmann
On Mon, 2014-08-04 at 14:11 -0700, Robert Grimes wrote:
> Both spamc and hMailServer SA service are running in the same directory
> where the binaries for SA are. I am not sure the significance of the
> directory name. As I stated both use the same parameters which is only -l
> therefore SA uses default config file locations for both.

Earlier in this thread you mentioned using the -l option with spamd. Now
you mention using that option with both. So, by hMailServer SA
service, are you referring to spamd?

In either case, your assumption of using identical command line options
resulting in spamd and spamc using the same configuration is false.

* For spamc, the -l option sends log messages to stderr instead of
syslog. Given you're running Windows, I don't even know if that option
has any effect at all.

* For spamd, the -l option enables telling, that is, it allows learning
(Bayes) and reporting spam to external services via spamc.

The latter is a rather uncommon option, and even less likely to be used
deliberately in the environment of a new SA user.
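
Side by side, from memory, so double-check against the spamd(1) and
spamc(1) docs:

  spamd -l   # same as --allow-tell, permits learning/reporting via spamc
  spamc -l   # log errors to stderr instead of syslog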


For spamc/d options and a lot more details, see the documentation. In
particular the docs named after their respective programs and the Conf
one.

  http://spamassassin.apache.org/doc/


> I have had several hundred hams. Wouldn't that be enough?

Yes, as Martin mentioned, learning 200 spam and ham each is sufficient
for Bayes to start working.

But see my other reply to this thread in a few.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: New at SpamAssassin - how to not get headers

2014-08-04 Thread Karsten Bräckelmann
On Mon, 2014-08-04 at 13:02 -0700, Robert Grimes wrote:
> Robert Grimes wrote
>
> > I have changed the user that runs the spamd service to be the same as when
> > I ran from command line. I will see what, if any changes occur. I will
> > leave Bayes alone for the moment; just try one thing at a time to keep the
> > confusion down.

By that change of the user your spamd service runs as, you lost your
previous Bayes training (which seems to be linked to the service user).
Unless you deliberately nuked the Bayes DB to start fresh.


Ignoring DNSBL blocking and broken format, which has been covered
already.

> X-Spam-Status: No, score=0.0 required=5.0 tests=HTML_MESSAGE,SPF_HELO_PASS,
>   URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0

There is no BAYES_xx rule hit. If Bayes is enabled and has been trained
sufficiently, there will *always* be a BAYES_xx rule indicating the
Bayesian probability of being spam.

The absence of any such rule since you changed the spamd service user
means, that user has no access to the previously trained Bayes DB.

> I saved the message from outlook and ran spamc [...]

> X-Spam-Status: Yes, score=7.3 required=5.0 tests=MISSING_DATE,MISSING_FROM,
>   MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,
>   NO_RELAYS,NULL_IN_BODY,URIBL_BLOCKED,URI_HEX autolearn=no autolearn_force=no
>   version=3.4.0

No BAYES_xx rule either, same problem as above.

However, do note the autolearn=no part. Bayes is enabled (just not
sufficiently trained yet). In a follow-up to this thread, you pasted
headers of spam manually scanned with spamc, showing autolearn=ham.

A spam message has incorrectly been learned as ham. You want to correct
that by re-training (simply learn it as spam), and keep an eye on that
part in the future.
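
Re-training is a one-liner, run as the same user spamd scans as; the path
is a placeholder:

  sa-learn --spam /path/to/that/message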


> both should be running under the same administrator account.

It is important to use the same user for (a) scanning incoming mail,
(b) training, as well as (c) manually running messages through spamc
later.

Unless spamd changes user on a per-recipient basis (which it seems is
not the case in your setup), that's a single user. Changing that user, as
you just did, requires moving the $HOME data or changing ownership of the
Bayes DB.
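
Another way out is a shared Bayes location in local.cf, so the scanning
and training users hit the same DB; the path is a placeholder, adjust it
(and the permissions) to your platform:

  bayes_path      /path/shared/by/scan/and/train/user/bayes
  bayes_file_mode 0770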


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: moving from fetched mail to direct deliver mail

2014-08-04 Thread Karsten Bräckelmann
On Mon, 2014-08-04 at 18:16 -0400, Joe Acquisto-j4 wrote:
> On 8/4/2014 at 5:03 PM, RW rwmailli...@googlemail.com wrote:
>
> > > Do I gotta start fresh?  or will the config changes to SA for direct
> > > drop allow magic to happen?

There's magic. And there's probably no SA conf changes. ;)


> > I'm not sure whether you are referring to the Bayes database or a
> > collection of email, but either way I'd keep it - at least until I
> > had a few thousand new hams and spams to reset it.
>
> Well, either or both, I guess.   I guess my question really is, is
> Bayes OK as is, or will the changes that will exist in the headers
> make it useless.  I think I hear, it should be ok, for now. ?

Bayes is entirely fine with that. For now, and later.

Your change in environment only affects a very few headers added by the
relays, like the Received ones. Bayes tokens taken from headers do include
header specifics. With a change like this, you will only lose a *very*
few indicators for spam vs ham. There's hardly any potential for damage
at all regarding your Bayes training.

You'll probably not even notice.


> > If you are going to learn from older mail you should ideally keep the
> > old internal and trusted network settings. You can comment them out in
> > normal use, but they should be present for sa-learn.
>
> Umm.  ?.   So,  I  can keep the existing Bayes, but if I should have to
> re-learn,   I should revert to my old settings for learning.

Yes. The only settings you'd want to keep in case of re-training from a
corpus including that old mail are internal_networks and trusted_networks,
though.

If at all. SA does detect certain mail fetching and does the magic for
you. E.g. in a rather straightforward environment of using 'fetchmail'
with local SA afterward (postfix, and possibly procmail), the internal
and trusted networks do not need to be set.

So in that case, there's no config needed to be retained, because
there's no config you had to set due to your mail fetching environment
in the first place.

Case in point: Retain configuration you did need in the previous setup,
which becomes obsolete with your new environment.
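
So if you had any at all, that part of local.cf might end up looking like
this; the networks are placeholders:

  # only needed again when re-learning from the old, fetched corpus
  # trusted_networks  192.168.0.0/24
  # internal_networks 192.168.0.0/24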


> I guess I should also, once I change,  start a second corpus with the
> new settings and, at least until I amass a sufficient store of new
> mail, relearn from both, adjusting SA config as appropriate?

As I hopefully made clear above, there's no need for starting a new
corpus. There's probably no need for new settings either, and if any,
only very limited ones.

Your text sounds like major conf changes to me. Go through 'em, which
changes do you think you'll need? My guess is little to none.


> Make sense?  Am I way off base and/or making this too complicated?

Too complicated. ;)


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: stable branch vs trunk (was: Re: colors TLDs in spam)

2014-08-04 Thread Karsten Bräckelmann
On Sun, 2014-08-03 at 09:22 -0400, Kevin A. McGrail wrote:
> Hi Karsten, I did bring this up a few months ago discussing releases.

I'm currently catching up on list mail, and figured recent threads might
be more important than revising old-ish, finished threads, in particular
about releases already published. *sigh*

> Right now trunk is effectively 3.4.1 and there is no reason to
> maintain a branch. When 3.4.1 is released, I would make sure this was
> the case and recopy from trunk but do not stress as I will confirm
> this. We should aim for a sept 30 3.4.1 release.
>
> But until we have a need for the branch, to me it is a waste of time
> to sync both.

Fair enough.

> And the plugin system lets new, experimental code go into trunk
> without risking stability.

That holds true only for new plugins, like TxRep (trunk) or the Redis
BayesStore during 3.4 development. It does not prevent potential major
issues in cases like e.g. new URIDNSBL features, general DNS system
rewrite or tflags changes, which happened in trunk with the (then)
stable 3.3 branch being unaffected.

Not opposing in general. Just pointing out that this argument is only
valid, as long as substantial changes are in fact isolated in new
plugins.


> So right now, I do not really envision a need for a branch and I run
> trunk. My $0.02.

Hey, I didn't say trunk is unsafe either! Even while Mark happens to
rewrite large parts of DNS handling or DNSBL return value masks. ;)


As long as there is no real need for separating stable and development
branches, I'm fine with this. Given branching will happen prior to
disruptive commits.

I guess my concerns also can be outlined by anecdotal evidence: I
recently asked for RTC votes, to commit a patch not only to trunk, but
the 3.4 branch also. You told me we're not in RTC mode and to go ahead,
so I committed to the stable branch and closed the bug report. You did
not tell me committing to the branch would be needless...


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


