Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-09 Thread RW
On Sun, 8 Apr 2018 07:41:50 -0500
David Jones wrote:

> On 04/07/2018 10:42 AM, Sebastian Arcus wrote:

> > I've enclosed one of the messages received here:
> > 
> > https://pastebin.com/9Bmu3pj1  
> 
> I added this to the 60_whitelist_auth.cf to trust this sender:
> 
> def_whitelist_auth *@*.tpr.gov.uk
> 
> This will get pushed out in a couple of days by sa-update.
> 
> I know it's not directly addressing your question about the rule's
> high score 

FWIW with the defaults it would have scored only 1.04. Even with
BAYES_50 instead of BAYES_00 or without RCVD_IN_DNSWL_MED, it's still
comfortably under threshold.  
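
The what-if arithmetic above can be sketched in a few lines of Python. The 1.04 total is the figure quoted in this message. The per-rule values and the 5.0 threshold are assumed to be the stock defaults, so check them against the rules installed locally before relying on the numbers.

```python
# Sketch: re-score the message under the two what-if scenarios.
# ASSUMPTION: the per-rule scores and the threshold are the stock defaults.
DEFAULT_TOTAL = 1.04        # total quoted in this thread for default scoring
THRESHOLD = 5.0             # assumed default required_score
BAYES_00 = -1.9             # assumed default
BAYES_50 = 0.8              # assumed default
RCVD_IN_DNSWL_MED = -2.3    # assumed default


def swap(total, old, new):
    """Remove one rule's contribution from the total and add another's."""
    return round(total - old + new, 2)


bayes_50_instead = swap(DEFAULT_TOTAL, BAYES_00, BAYES_50)
no_dnswl = swap(DEFAULT_TOTAL, RCVD_IN_DNSWL_MED, 0.0)

print(bayes_50_instead, bayes_50_instead < THRESHOLD)  # 3.74 True
print(no_dnswl, no_dnswl < THRESHOLD)                  # 3.34 True
```

Under these assumed values either change alone leaves the message below the threshold, which matches the statement above.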


That said, perhaps someone could see how this compares with the existing
version:

  /^\s*

Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-09 Thread Sebastian Arcus


On 08/04/18 13:41, David Jones wrote:

On 04/07/2018 10:42 AM, Sebastian Arcus wrote:
I'm not entirely sure what is the cause of this - notification emails 
from The Pension Regulator in UK (a government body overseeing 
pensions) have the destination email in upper case as part of the 
Message-ID. I don't know if the user has input their email address in 
caps when creating the account with TPR, and the system at TPR just 
preserves caps - or maybe their email software does that on purpose 
somehow. In all events, all email notifications from them go straight 
to the Junk folder. Do the standards really require a message id to be 
in all lower case?


I've enclosed one of the messages received here:

https://pastebin.com/9Bmu3pj1


I added this to the 60_whitelist_auth.cf to trust this sender:

def_whitelist_auth *@*.tpr.gov.uk

This will get pushed out in a couple of days by sa-update.

I know it's not directly addressing your question about the rule's high 
score but this is how I address these types of issues.  If you create a 
"fast lane" for trusted senders then this allows for more aggressive 
tactics/scores for new and untrusted senders.


Thank you David. It sounds like a reasonable solution to me.


Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-08 Thread David Jones

On 04/07/2018 10:42 AM, Sebastian Arcus wrote:
I'm not entirely sure what is the cause of this - notification emails 
from The Pension Regulator in UK (a government body overseeing pensions) 
have the destination email in upper case as part of the Message-ID. I 
don't know if the user has input their email address in caps when 
creating the account with TPR, and the system at TPR just preserves caps 
- or maybe their email software does that on purpose somehow. In all 
events, all email notifications from them go straight to the Junk 
folder. Do the standards really require a message id to be in all lower 
case?


I've enclosed one of the messages received here:

https://pastebin.com/9Bmu3pj1


I added this to the 60_whitelist_auth.cf to trust this sender:

def_whitelist_auth *@*.tpr.gov.uk

This will get pushed out in a couple of days by sa-update.

I know it's not directly addressing your question about the rule's high 
score but this is how I address these types of issues.  If you create a 
"fast lane" for trusted senders then this allows for more aggressive 
tactics/scores for new and untrusted senders.
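
To see which sender addresses a pattern like the one above would cover, here is a rough Python sketch. It uses `fnmatch` as a stand-in for SpamAssassin's own glob handling, which is an approximation, and every address in it is made up for illustration. It also ignores the authentication side: a whitelist_auth entry only takes effect when the message passes SPF or DKIM checks for that sender.

```python
from fnmatch import fnmatchcase

# Pattern taken from the whitelist entry in this message.
PATTERN = "*@*.tpr.gov.uk"


def matches(address, pattern=PATTERN):
    """Approximate glob match; both sides lower-cased so case is ignored."""
    return fnmatchcase(address.lower(), pattern.lower())


# Hypothetical addresses, not taken from the real message.
examples = [
    "noreply@mail.tpr.gov.uk",         # subdomain of tpr.gov.uk
    "NOREPLY@NEWS.TPR.GOV.UK",         # same idea, upper case
    "noreply@tpr.gov.uk",              # bare domain, no subdomain
    "noreply@tpr.gov.uk.example.com",  # look-alike domain
]
for address in examples:
    print(address, matches(address))
```

In this approximation the pattern needs a dot before `tpr.gov.uk`, so the bare-domain address does not match. A second entry for `*@tpr.gov.uk` would be needed if mail is also sent from the bare domain.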


--
David Jones


Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-08 Thread Sebastian Arcus


On 07/04/18 21:20, Bill Cole wrote:

On 7 Apr 2018, at 11:42 (-0400), Sebastian Arcus wrote:


Do the standards really require a message id to be in all lower case?


Of course not, and that's also not an accurate description of 
MSGID_SPAM_CAPS.


A small minority of rules in SA are based on any external standard. They 
are empirical and pragmatic, not legalistic. There is a complex analysis 
of multiple mail streams  used to generate scores for the rules and to 
decide which rules are good enough to publish in updates, run on a daily 
basis because it takes most of a day to run. The fact that 
MSGID_SPAM_CAPS exists with that name (and not with a 'T_' or 
developer's tag prefix) implies that at some point in the past it was 
reliable enough as an indicator of spam to be part of the default set.


Thank you Bill. That is useful to know.


Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-08 Thread Sebastian Arcus


On 07/04/18 17:22, Antony Stone wrote:

On Saturday 07 April 2018 at 18:10:18, Sebastian Arcus wrote:


On 07/04/18 16:52, Reindl Harald wrote something.



Thank you for answering, but really, in effect you haven't answered
my question at all.



And the way I customise the scores is based on the type of emails
received at this particular site. It might seem "idiotic" to you, but
there are reasons for those scores. Not everyone receives the same mix
of email - so it isn't constructive to start calling other people's
scoring "idiotic" just because they are not the same as your own or the
defaults.


Please note that there are good reasons why you received only a private
response from this person, and that he is no longer permitted to post to the
list.

My personal recommendation is to consider carefully anything he says, judge
whether you find it useful, and not to reply.


Hi Antony. Thank you kindly for the information. I didn't notice that 
the message was private and not from the list - as the message CC'ed the 
list - so it looked like a regular reply. I will take your advice - 
thank you.




Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-08 Thread Sebastian Arcus


On 07/04/18 17:14, Reindl Harald wrote:



Am 07.04.2018 um 18:10 schrieb Sebastian Arcus:

And the way I customise the scores is based on the type of emails
received at this particular site. It might seem "idiotic" to you, but
there are reasons for those scores. Not everyone receives the same mix
of email - so it isn't constructive to start calling other people's
scoring "idiotic" just because they are not the same as your own or the
defaults

if a single misfired rule make a BAYES_00 message to a spam message it's
idiotic - it's that easy - with or without MSGID_SPAM_CAPS that can
happen at every moment in time and when you trust your bayes -0.2 is not
justified and if you don't trust your bayes train it


A default score of 3.1 for MSGID_SPAM_CAPS is pretty high - even 
compared with some of the DNS blacklists rules - and some of those are 
pretty powerful IMHO. Hence why I was trying to understand why this 
rule is assigned such a high score and what is the significance of it.


Secondly, I found in the past that a high negative score for BAYES_00 is 
counter-productive, because:


1. As soon as you receive a spam message with a new type of content, it 
essentially has a free ride until it gets put through the bayes training 
- as the high negative on BAYES_00 counteracts any other rule it hits - 
even pretty effective rules, such as Pyzor and blacklists.


2. Spammers have learned from the above, and I get a lot of spam which 
changes the wording all the time, so that bayes becomes essentially 
ineffective against it - but at the same time it stops other rules from 
working - because of the high negative scores on low BAYES.


3. Spammers have also learned from no. 1, and I see a lot of extremely 
short spam messages - just one short line of a few words. Bayes seems to 
be extremely ineffective on these very short messages, no matter how 
much you train it - because of the small amount of data to work on, and 
with a little bit of cunning and varying the words used - they all score 
as BAYES_00. Again, the high negative score gives these spammers a 
guaranteed free ride, as it overrides any other rules.


So at least from the type of spam that I see, BAYES_00 with a large 
negative score is really counter-productive and it makes SA far less 
efficient at picking spam.
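
The "free ride" effect can be put in numbers with a small Python sketch. It computes how many points the other rules must add before a message that hits BAYES_00 reaches the spam threshold. The first two configurations use figures mentioned in this thread. The third assumes stock defaults of 5.0 and -1.9, which should be verified locally.

```python
def headroom(required, bayes_00):
    """Points other rules must contribute before a BAYES_00 message is tagged."""
    return round(required - bayes_00, 2)


# (required score, BAYES_00 score) per configuration.
configs = {
    "this site": (4.0, -0.2),
    "other site quoted in thread": (5.5, -3.5),
    "assumed defaults": (5.0, -1.9),  # ASSUMPTION: stock values
}

for name, (required, bayes_00) in configs.items():
    print(name, headroom(required, bayes_00))
# this site 4.2
# other site quoted in thread 9.0
# assumed defaults 6.9
```

The larger the negative BAYES_00 score, the more evidence from blacklists, Pyzor and similar rules is needed to overcome it.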


BAYES_00 doesn't necessarily mean "I am sure this is not spam" - as a 
good quality whitelist rule would, for example. It merely means "I 
haven't really seen this type of spam before", or simply "this message 
is too short and I really can't say anything useful about it". For these 
reasons, I don't think low BAYES scores should be given large negative 
scores - and hence why I changed them on my systems - with really good 
results.


Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-07 Thread Bill Cole

On 7 Apr 2018, at 11:42 (-0400), Sebastian Arcus wrote:


Do the standards really require a message id to be in all lower case?


Of course not, and that's also not an accurate description of 
MSGID_SPAM_CAPS.


A small minority of rules in SA are based on any external standard. They 
are empirical and pragmatic, not legalistic. There is a complex analysis 
of multiple mail streams  used to generate scores for the rules and to 
decide which rules are good enough to publish in updates, run on a daily 
basis because it takes most of a day to run. The fact that 
MSGID_SPAM_CAPS exists with that name (and not with a 'T_' or 
developer's tag prefix) implies that at some point in the past it was 
reliable enough as an indicator of spam to be part of the default set.


--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Currently Seeking Steady Work: https://linkedin.com/in/billcole


Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-07 Thread Antony Stone
On Saturday 07 April 2018 at 18:10:18, Sebastian Arcus wrote:

> On 07/04/18 16:52, Reindl Harald wrote something.

> Thank you for answering, but really, in effect you haven't answered
> my question at all.

> And the way I customise the scores is based on the type of emails
> received at this particular site. It might seem "idiotic" to you, but
> there are reasons for those scores. Not everyone receives the same mix
> of email - so it isn't constructive to start calling other people's
> scoring "idiotic" just because they are not the same as your own or the
> defaults.

Please note that there are good reasons why you received only a private 
response from this person, and that he is no longer permitted to post to the 
list.

My personal recommendation is to consider carefully anything he says, judge 
whether you find it useful, and not to reply.


Regards,


Antony.

-- 
This sentence contains exacly three erors.

   Please reply to the list;
 please *don't* CC me.


Re: MSGID_SPAM_CAPS fp's hitting messages from The Pension Regulator in UK

2018-04-07 Thread Sebastian Arcus


On 07/04/18 16:52, Reindl Harald wrote:

Content analysis details:   (5.1 points, 4.0 required)

who did set the *non default* required score to 4.0?
why did the person not adjust -0.2 for BAYES_00 too?

the scoring of this system is idiotic!

required score here is 5.5 and BAYES_00 is scored to -3.5 while milter
reject starts with 8.0 so nothing would happen just because *one single*
rule hit wrongly
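
The comparison being made above can be sketched in Python. The figures (5.1 total, 4.0 and 5.5 required, BAYES_00 at -0.2 and -3.5, reject at 8.0) come from this thread. The sketch assumes that every other rule scores identically on both systems and that MSGID_SPAM_CAPS contributed its default of 3.1, neither of which the thread confirms.

```python
def rescore(total, old, new):
    """Replace one rule's contribution in an existing total."""
    return round(total - old + new, 2)


def verdict(score, required, reject=None):
    """Classify a score; 'reject' models an optional milter cut-off."""
    if reject is not None and score >= reject:
        return "reject"
    return "spam" if score >= required else "ham"


REPORTED_TOTAL = 5.1  # from the quoted content analysis

# Original poster's settings: required 4.0, BAYES_00 at -0.2.
print(verdict(REPORTED_TOTAL, required=4.0))            # spam

# Same hits with BAYES_00 at -3.5, required 5.5, reject at 8.0.
other = rescore(REPORTED_TOTAL, -0.2, -3.5)
print(other, verdict(other, required=5.5, reject=8.0))  # 1.8 ham

# Original settings without the MSGID_SPAM_CAPS hit (assumed to be 3.1).
without_caps = rescore(REPORTED_TOTAL, 3.1, 0.0)
print(without_caps, verdict(without_caps, required=4.0))  # 2.0 ham
```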


Thank you for answering, but really, in effect you haven't answered my 
question at all. I was merely trying to understand the MSGID_SPAM_CAPS 
rule - and what rationale it is based on. I know I can alter the score 
just for it - I was trying to understand what other implications this 
might have. I didn't even suggest that SA default config or scoring 
needs to change!


And the way I customise the scores is based on the type of emails 
received at this particular site. It might seem "idiotic" to you, but 
there are reasons for those scores. Not everyone receives the same mix 
of email - so it isn't constructive to start calling other people's 
scoring "idiotic" just because they are not the same as your own or the 
defaults.





Am 07.04.2018 um 17:42 schrieb Sebastian Arcus:

I'm not entirely sure what is the cause of this - notification emails
from The Pension Regulator in UK (a government body overseeing pensions)
have the destination email in upper case as part of the Message-ID. I
don't know if the user has input their email address in caps when
creating the account with TPR, and the system at TPR just preserves caps
- or maybe their email software does that on purpose somehow. In all
events, all email notifications from them go straight to the Junk
folder. Do the standards really require a message id to be in all lower
case?

I've enclosed one of the messages received here:

https://pastebin.com/9Bmu3pj