Re: Bayes autolearn: how does it resolve whether rules are body or header related?

2021-05-09 Thread Loren Wilton

> > so you don't have points from body rules.
> >
> > the URI_DEOBFU_INSTR you mentioned is a meta rule:
> >
> > meta URI_DEOBFU_INSTR __URI_DEOBFU_INSTR && !__MSGID_OK_HOST
> >
> > so maybe it's not considered.
>
> They are treated as header, or ignored if marked as net.


I think a bug report should be submitted for this.

Either their score should be split 50/50 between the header and body totals, 
or, when the metas are built, they should get a "body rule" flag that is used 
to determine where the score goes.
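Loren's two options could be sketched as follows. split_meta_score and the "body rule" flag are hypothetical; nothing like this exists in SpamAssassin today, and the example only reuses the URI_DEOBFU_INSTR score from Bert's report.

```perl
use strict;
use warnings;

# Hypothetical sketch of the two options proposed for scoring a meta rule
# during autolearn. None of these names exist in SpamAssassin.
#   'split': half of the meta rule's score goes to each total
#   'flag' : the whole score goes to body if a (hypothetical) "body rule"
#            flag was set when the meta was built, otherwise to header
# Returns the pair (header_points, body_points).
sub split_meta_score {
    my ($score, $mode, $has_body_flag) = @_;
    return ($score / 2, $score / 2) if $mode eq 'split';
    if ($mode eq 'flag') {
        return $has_body_flag ? (0, $score) : ($score, 0);
    }
    die "unknown mode '$mode'\n";
}

# URI_DEOBFU_INSTR scored 3.595 in Bert's report; assume the flag is set.
for my $mode (qw(split flag)) {
    my ($head, $body) = split_meta_score(3.595, $mode, 1);
    printf "%-5s header=%.4f body=%.4f\n", $mode, $head, $body;
}
```

The split option needs no extra metadata but credits body points to metas that may be purely header-based; the flag option is more precise but needs the type information to be carried through when metas are compiled.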


I tried, but for some reason Apache decided that I'm evil and blocked the 
submission attempt, so someone else will have to do it.


   Loren



Re: Bayes autolearn: how does it resolve whether rules are body or header related?

2021-05-09 Thread RW
On Sun, 9 May 2021 20:03:27 +0200
Matus UHLAR - fantomas wrote:


> so you don't have points from body rules.
> 
> the URI_DEOBFU_INSTR you mentioned is a meta rule:
> 
> meta URI_DEOBFU_INSTR __URI_DEOBFU_INSTR && !__MSGID_OK_HOST
> 
> so maybe it's not considered.

They are treated as header, or ignored if marked as net. 
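To make this concrete, here is a simplified model of how a hit rule is bucketed for the autolearn totals, using only what this thread states: header rules and non-net metas count as header, net metas are ignored, body and uri rules count as body, and anything else counts as neither. The type strings stand in for SpamAssassin's $TYPE_* constants and autolearn_bucket is a hypothetical name; this is not the real code.

```perl
use strict;
use warnings;

# Simplified model of the header/body bucketing described in this thread.
# Plain strings stand in for SpamAssassin's $TYPE_* constants.
sub autolearn_bucket {
    my ($type, $tflags) = @_;
    $tflags //= '';
    return 'header' if $type eq 'header';
    if ($type eq 'meta') {
        # metas count as header unless they carry the "net" tflag
        return $tflags =~ /\bnet\b/ ? 'ignored' : 'header';
    }
    return 'body' if $type eq 'body' || $type eq 'uri';
    return 'ignored';    # e.g. rawbody and full rules
}

for my $case (['meta', ''], ['meta', 'net'], ['uri', ''], ['full', 'net']) {
    printf "%-6s %-4s -> %s\n", $case->[0], $case->[1], autolearn_bucket(@$case);
}
```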


Re: Bayes autolearn: how does it resolve whether rules are body or header related?

2021-05-09 Thread Matus UHLAR - fantomas

On 09.05.21 04:17, Bert Van de Poel wrote:

Dear fellow SpamAssassin users,

I recently noticed that quite a lot of spam emails with high scores 
weren't marked for Bayes autolearning. While some senders and 
receivers were a common match, explaining why autolearn was not used, there 
was no clear explanation for other cases. I therefore put SpamAssassin 
in debug mode to check in more detail, and noticed that fairly often 
autolearn is not used because the minimum score for body tests isn't 
reached. After looking at some specific cases, however, it seems that 
several rules are not considered at all when calculating the header 
rule score and body rule score for Bayes autolearning. I had always 
presumed these scores are calculated based on whether the underlying 
rule performs a regex on a header or on the body, but now I'm not so 
sure any more. I hope you can help clear up whether this is intended 
behaviour (and what that behaviour is) or whether I should report this 
as a bug.


One example I noticed is URI_DEOBFU_INSTR=3.595. This is, if I 
understand it correctly, a URI test that's performed on the body. 
Should a test like this be counted towards the body score count? Then 
there's the question of meta rules such as MONEY_NOHTML. If you 
resolve the different meta levels within this rule, it's a combination 
of header and body, however it's only counted towards the header 
score. Finally, it seems as if custom rules I've added within local.cf 
aren't considered. Is that indeed the case (and if so, is that by 
design)? I'm also not completely sure if UNWANTED_BODY_LANGUAGE and 
tests like razor, pyzor and DCC are considered for body scores.


Within the same realm, I'm also wondering whether these expected 
numbers for body and header can be tweaked and if so, how. For example 
the case below isn't autolearned even though it has a huge score and a 
vast number of tests firing, but seemingly not enough body-related 
scores. Is that really the intended behaviour?


May  8 10:40:32 mail amavis[4076058]: (4076058-16) 
header_edits_for_quar:  -> 
, Yes, score=24.619 tag=- tag2=5 
kill=7.5 tests=[ADVANCE_FEE_3_NEW_MONEY=0.001, 
AXB_XMAILER_MIMEOLE_OL_024C2=0.001, BAYES_50=0.8, BERT_KULSPAM=1, 
FORGED_MUA_OUTLOOK=1.927, FREEMAIL_FORGED_REPLYTO=2.095, 
FREEMAIL_REPLYTO=1, FREEMAIL_REPLYTO_END_DIGIT=0.25, 
FROM_MISSPACED=0.001, FROM_MISSP_EH_MATCH=0.001, 
FROM_MISSP_FREEMAIL=0.001, FROM_MISSP_MSFT=0.001, 
FROM_MISSP_REPLYTO=2.497, FSL_BULK_SIG=0.001, FSL_CTYPE_WIN1251=0.001, 
FSL_NEW_HELO_USER=0.001, KHOP_HELO_FCRDNS=0.398, LOTS_OF_MONEY=0.001, 
MISSING_HEADERS=1.021, MISSING_MID=0.497, MONEY_FREEMAIL_REPTO=1.202, 
MONEY_FROM_MISSP=0.001, MONEY_NOHTML=2.497, NSL_RCVD_HELO_USER=0.001, 
PYZOR_CHECK=1.392, REPLYTO_WITHOUT_TO_CC=1.552, REPTO_419_FRAUD=2.996, 
SPF_HELO_NONE=0.001, TO_NO_BRKTS_FROM_MSSP=1.593, 
TO_NO_BRKTS_MSFT=1.888, XFER_LOTSA_MONEY=0.001] autolearn=no 
autolearn_force=no


Thank you in advance for your help. If you need any more examples or 
would like us to run some tests, feel free to let me know.


looks like most of those are meta rules:

header FREEMAIL_REPLYTO_END_DIGIT
header MISSING_HEADERS
body BAYES_50
header SPF_HELO_NONE
header FSL_CTYPE_WIN1251
header NSL_RCVD_HELO_USER
header REPTO_419_FRAUD

score FREEMAIL_REPLYTO_END_DIGIT 0.25
score MISSING_HEADERS 0.915 1.207 1.204 1.021
score SPF_HELO_NONE 0.001

so you don't have points from body rules.

the URI_DEOBFU_INSTR you mentioned is a meta rule:

meta URI_DEOBFU_INSTR __URI_DEOBFU_INSTR && !__MSGID_OK_HOST

so maybe it's not considered.
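The 3+3 requirement RW mentions in this thread explains why the message was not learned. Below is a simplified Perl model of the spam side of that decision, not SpamAssassin's code: would_autolearn_spam is a hypothetical name, 12.0 is the stock bayes_auto_learn_threshold_spam, and the message score 24.619 only stands in for the learning score, which SpamAssassin computes separately and which may differ. The 4.270 header figure is the sum of the plain header rules listed above.

```perl
use strict;
use warnings;

# Simplified model of the autolearn-as-spam decision; not SpamAssassin's code.
# The learning score must reach the spam threshold and, unless a rule with
# tflags autolearn_force hit, at least 3 points must come from header rules
# and 3 from body rules.
use constant {
    MIN_HEADER_POINTS => 3,
    MIN_BODY_POINTS   => 3,
};

sub would_autolearn_spam {
    my (%args) = @_;   # score, threshold, header_points, body_points, forced
    return 0 if $args{score} < $args{threshold};
    return 1 if $args{forced};
    return ($args{header_points} >= MIN_HEADER_POINTS
            && $args{body_points} >= MIN_BODY_POINTS) ? 1 : 0;
}

# Bert's message, with the rough figures from the reply above.
my %bert = (
    score         => 24.619,  # message score, standing in for the learning score
    threshold     => 12.0,    # stock bayes_auto_learn_threshold_spam
    header_points => 4.270,   # plain header rules only; header-counted metas add more
    body_points   => 0,       # "you don't have points from body rules"
);
printf "plain:           %s\n", would_autolearn_spam(%bert) ? 'spam' : 'no';
printf "autolearn_force: %s\n",
    would_autolearn_spam(%bert, forced => 1) ? 'spam' : 'no';
```

With these figures the first call prints no, because the body total is below 3, and the second prints spam, which is the effect the autolearn_force tflag is meant to have.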


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux IS user friendly, it's just selective who its friends are...


Re: How do I search and capture text for use in a rule?

2021-05-09 Thread Jared Hall

On 5/8/2021 11:56 AM, Loren Wilton wrote:
I think the OP was trying to find a way to match "To: " 
to "Hi user".


   Loren


Correct you are.  I've been eyeballing that myself for CHAOS.

If you have other examples (like "Hi there $USER_PART," "Hello 
$USER_PART:", "Dear Esteemed $USER_PART", etc.) let me know.
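Outside the rule language, the comparison described here (the local part of the To: address echoed back in the greeting) could be sketched in Perl as below. greeting_echoes_user is a hypothetical helper, not a SpamAssassin API, and the greeting words are just the ones from the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper (not a SpamAssassin API): does a line in the body
# open with a greeting that echoes the local part of the To: address,
# e.g. "Hi there jsmith," for mail sent to jsmith@example.com?
sub greeting_echoes_user {
    my ($to, $body) = @_;
    # Local part = the run of non-space, non-bracket chars before the first @.
    my ($user) = $to =~ /([^\s<>\@"]+)\@/ or return 0;
    my $u = quotemeta $user;
    # Greeting word, up to 40 more chars on the same line, then the user part.
    return $body =~ /^\s*(?:hi|hello|dear)\b[^\n]{0,40}?\b$u\b/im ? 1 : 0;
}

my $body = "Dear Esteemed jsmith,\nPlease find the attached invoice.\n";
print greeting_echoes_user('J Smith <jsmith@example.com>', $body) ? "match\n" : "no match\n";
```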



Thanks.

-- Jared Hall

Re: Bayes autolearn: how does it resolve whether rules are body or header related?

2021-05-09 Thread RW
On Sun, 9 May 2021 04:17:26 +0200
Bert Van de Poel wrote:


> Within the same realm, I'm also wondering whether these expected
> numbers for body and header can be tweaked and if so, how.

You can create a meta-rule for definite spam and set:
 
tflags  autolearn_force

a hit on any rule with this flag set causes the 3+3 check (at least 3 
points from header rules and 3 from body rules) to be ignored. It does 
nothing else.
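A sketch of such a meta rule, assuming you want to force learning when two strong subrules from Bert's hit list fire together. LOCAL_DEFINITE_419 is a hypothetical name and the subrule choice is only an example; pick rules you trust on your own mail. The 0.001 score keeps the rule running (a score of 0 disables a rule) without noticeably changing the total, and the overall autolearn threshold still applies.

```
# Hypothetical local.cf snippet: force autolearn when both subrules hit
meta     LOCAL_DEFINITE_419  (REPTO_419_FRAUD && MONEY_NOHTML)
describe LOCAL_DEFINITE_419  Reply-To 419 fraud pattern plus money talk (local)
score    LOCAL_DEFINITE_419  0.001
tflags   LOCAL_DEFINITE_419  autolearn_force
```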



One thing that does look wrong is that maybe_body_only() looks
for:

(($type == $TYPE_BODY_TESTS) || ($type == $TYPE_BODY_EVALS)
|| ($type == $TYPE_URI_TESTS) || ($type == $TYPE_URI_EVALS))

so it's missing any rawbody and full rules. 


Specifically, Pyzor, Razor2 and DCC are full eval rules, so they count 
toward neither the header nor the body total.
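The missing rawbody and full types can be illustrated with a small model of the body-type check before and after adding them. The type strings stand in for SpamAssassin's $TYPE_* constants and is_body_type is a hypothetical name; this is a sketch of the idea, not a patch.

```perl
use strict;
use warnings;

# Types maybe_body_only() accepts today, per the condition quoted above.
# Plain strings stand in for SpamAssassin's $TYPE_* constants.
my %BODY_TYPES_NOW = map { ($_ => 1) } qw(body_tests body_evals uri_tests uri_evals);

# The same set with rawbody and full rules added.
my %BODY_TYPES_PATCHED = (
    %BODY_TYPES_NOW,
    map { ($_ => 1) } qw(rawbody_tests rawbody_evals full_tests full_evals),
);

# Hypothetical helper: would a rule of this type count toward body points?
sub is_body_type {
    my ($type, $patched) = @_;
    my $set = $patched ? \%BODY_TYPES_PATCHED : \%BODY_TYPES_NOW;
    return exists $set->{$type} ? 1 : 0;
}

# PYZOR_CHECK, RAZOR2_CHECK and DCC_CHECK are full eval rules.
for my $patched (0, 1) {
    printf "%-7s full_evals counts as body: %s\n",
        ($patched ? 'patched' : 'current'),
        (is_body_type('full_evals', $patched) ? 'yes' : 'no');
}
```

With the patched set, hits on those full eval rules would start counting toward the body total, which is what would have helped in Bert's example, where PYZOR_CHECK scored 1.392.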