Re: APOSTROPHE_TOCC score

2018-03-06 Thread John Hardin

On Tue, 6 Mar 2018, David Jones wrote:

> On 03/06/2018 12:54 PM, John Hardin wrote:
>
>> On Tue, 6 Mar 2018, RW wrote:
>>
>>> On Tue, 6 Mar 2018 08:47:35 -0800 (PST)
>>> John Hardin wrote:
>>>
>>>> On Tue, 6 Mar 2018, David Jones wrote:
>>>>
>>>>> In this case these were really bad spam so the APOSTROPHE_TOCC is
>>>>> just riding on the back of other rules, BLs, and high Bayes
>>>>> scores.
>>>>
>>>> What I generally look at is the detailed rule performance in
>>>> masscheck. If it primarily hits on spams that score in total 1-3
>>>> points.
>>>
>>> Why not under 5?
>>
>> If it's close to 5 and there's a limit, that suggests the limit could be
>> increased a bit.
>>
>> It also needs to take into account the ham hits, which is why having a
>> ham-starved corpus is such a problem.
>
> Are you saying we have a ham-starved corpus?

We have at times in the past. When you're performing analyses like this
you need to bear in mind the size of the ham corpus.


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Failure to plan ahead on someone else's part does not constitute
  an emergency on my part. -- David W. Barts in a.s.r
---
 5 days until Daylight Saving Time begins in U.S. - Spring Forward


Re: APOSTROPHE_TOCC score

2018-03-06 Thread David Jones

On 03/06/2018 12:54 PM, John Hardin wrote:

> On Tue, 6 Mar 2018, RW wrote:
>
>> On Tue, 6 Mar 2018 08:47:35 -0800 (PST)
>> John Hardin wrote:
>>
>>> On Tue, 6 Mar 2018, David Jones wrote:
>>>
>>>> In this case these were really bad spam so the APOSTROPHE_TOCC is
>>>> just riding on the back of other rules, BLs, and high Bayes
>>>> scores.
>>>
>>> What I generally look at is the detailed rule performance in
>>> masscheck. If it primarily hits on spams that score in total 1-3
>>> points.
>>
>> Why not under 5?
>
> If it's close to 5 and there's a limit, that suggests the limit could be
> increased a bit.
>
> It also needs to take into account the ham hits, which is why having a
> ham-starved corpus is such a problem.

Are you saying we have a ham-starved corpus?

             OVERALL     SPAM      HAM
ena-week0     77,945   36,459   41,486
ena-week1     93,847   52,781   41,066
ena-week2     69,297   30,328   38,969
ena-week3     75,853   31,995   43,858
ena-week4     92,680   37,511   55,169
Total        409,622  189,074  220,548

http://ruleqa.spamassassin.org
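The weekly ena figures above can be re-totalled with a short Python sketch. The counts are copied from the table, and the variable and function names are illustrative only. Overall, ham comes to roughly 54% of these messages, although spam outnumbered ham in ena-week1.

```python
# Weekly (spam, ham) counts copied from the ena-week table above.
ENA_WEEKS = {
    "ena-week0": (36_459, 41_486),
    "ena-week1": (52_781, 41_066),
    "ena-week2": (30_328, 38_969),
    "ena-week3": (31_995, 43_858),
    "ena-week4": (37_511, 55_169),
}


def corpus_totals(weeks: dict[str, tuple[int, int]]) -> tuple[int, int, float]:
    """Return (spam_total, ham_total, ham_share) summed over all weeks."""
    spam_total = sum(spam for spam, _ in weeks.values())
    ham_total = sum(ham for _, ham in weeks.values())
    return spam_total, ham_total, ham_total / (spam_total + ham_total)


if __name__ == "__main__":
    spam, ham, share = corpus_totals(ENA_WEEKS)
    print(f"spam={spam:,} ham={ham:,} overall={spam + ham:,} ham share={share:.1%}")
```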

--
David Jones


Re: APOSTROPHE_TOCC score

2018-03-06 Thread John Hardin

On Tue, 6 Mar 2018, RW wrote:

> On Tue, 6 Mar 2018 08:47:35 -0800 (PST)
> John Hardin wrote:
>
>> On Tue, 6 Mar 2018, David Jones wrote:
>>
>>> In this case these were really bad spam so the APOSTROPHE_TOCC is
>>> just riding on the back of other rules, BLs, and high Bayes
>>> scores.
>>
>> What I generally look at is the detailed rule performance in
>> masscheck. If it primarily hits on spams that score in total 1-3
>> points.
>
> Why not under 5?


If it's close to 5 and there's a limit, that suggests the limit could be
increased a bit.

It also needs to take into account the ham hits, which is why having a
ham-starved corpus is such a problem.

Generally speaking there's a spike; if the spike is at less than 5 it
needs attention, and the lower the spike is, the more generous the score
limit may be, bearing in mind that poison pills should be rare.
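One way to picture the "spike" check is to bucket the total scores of the spam messages a rule hits, then report where the most populated bucket sits. This is a hypothetical Python illustration, not masscheck or ruleqa code. The function names, the 1-point bucket width and the 5.0 threshold (SpamAssassin's default required_score) are assumptions.

```python
import math
from collections import Counter


def score_spike(total_scores: list[float], bucket_width: float = 1.0) -> float:
    """Return the lower edge of the most populated score bucket.

    total_scores: hypothetical per-message total scores of the spam
    messages a rule hit. Ties go to the lower bucket.
    """
    if not total_scores:
        raise ValueError("no hits to analyse")
    counts = Counter(math.floor(s / bucket_width) for s in total_scores)
    # Highest count wins; on a tie, prefer the lower bucket (-b).
    spike = max(counts, key=lambda b: (counts[b], -b))
    return spike * bucket_width


def spike_needs_attention(total_scores: list[float], threshold: float = 5.0) -> bool:
    """True when the spike sits below the (assumed) 5.0 spam threshold."""
    return score_spike(total_scores) < threshold
```

With hits clustered at 1-3 points plus a few at 9 and 12, the spike lands in the 1-point bucket and the rule is flagged. A long tail of high-scoring hits does not move the spike.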




Re: APOSTROPHE_TOCC score

2018-03-06 Thread RW
On Tue, 6 Mar 2018 08:47:35 -0800 (PST)
John Hardin wrote:

> On Tue, 6 Mar 2018, David Jones wrote:

> > In this case these were really bad spam so the APOSTROPHE_TOCC is
> > just riding on the back of other rules, BLs, and high Bayes
> > scores.  
> 
> What I generally look at is the detailed rule performance in
> masscheck. If it primarily hits on spams that score in total 1-3
> points.

Why not under 5?


Re: APOSTROPHE_TOCC score

2018-03-06 Thread John Hardin

On Tue, 6 Mar 2018, David Jones wrote:

> On 03/05/2018 06:57 PM, John Hardin wrote:
>
>> On Mon, 5 Mar 2018, Alex wrote:
>>
>>> Hi,
>>>
>>> On Mon, Mar 5, 2018 at 5:59 PM, John Hardin  wrote:
>>>
>>>> On Mon, 5 Mar 2018, Alex wrote:
>>>>
>>>>> To: =?utf-8?Q?DermotO=27reilly?=
>>>>> *  2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>>>>
>>>>> 2.6 points for this is just unreasonable. This was a completely
>>>>> legitimate email.
>>>>
>>>> Is such an address even deliverable?
>>>
>>> Yes, it's beyond me why anyone would want to use an apostrophe, but
>>> it's valid.
>>
>> OK.
>>
>> That rule is 8 years stale. I've added a masscheck score limit of 1.000
>>
>> I'm open to discussion of converting it to a subrule and/or adding some
>> extra conditions to it.
>
> Here are some samples of what I found in my corpora which supplies the
> majority of the nightly masscheck corpora.
>
> https://pastebin.com/QchEu2BA
> https://pastebin.com/pbYnvzU4
> https://pastebin.com/EjnQSE7H
>
> In this case these were really bad spam so the APOSTROPHE_TOCC is just riding
> on the back of other rules, BLs, and high Bayes scores.


What I generally look at is the detailed rule performance in masscheck. If 
it primarily hits on spams that score in total 1-3 points I generally 
tend to set the score limit somewhat higher. Having a tail of 
higher-scoring hits doesn't affect that analysis.


This looks like one of those rules.

In this case I'd probably set the score limit on this rule low and add 
more generously-scored metas for the high-spam-low-ham rule overlaps from 
the masscheck results.
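The "high-spam-low-ham rule overlaps" idea can be sketched in Python as a co-occurrence count over labelled results. The input format (a spam flag plus the set of rule names that hit each message), the thresholds and the placeholder rule names are assumptions for illustration. Real masscheck logs would have to be parsed into this shape first. Each rule returned is a candidate to pair with the target rule in a meta rule.

```python
from collections import Counter
from typing import Iterable


def overlap_candidates(
    messages: Iterable[tuple[bool, set[str]]],
    target: str,
    min_spam: int = 2,
    max_ham: int = 0,
) -> list[str]:
    """Rules that co-hit with `target` often in spam and rarely in ham.

    messages: (is_spam, rule_names_that_hit) per message (assumed format).
    Returns rule names sorted by descending spam co-hits, then by name.
    """
    spam_hits: Counter = Counter()
    ham_hits: Counter = Counter()
    for is_spam, rules in messages:
        if target not in rules:
            continue  # only messages where the target rule fired matter
        for other in rules - {target}:
            (spam_hits if is_spam else ham_hits)[other] += 1
    picked = [
        rule for rule, n in spam_hits.items()
        if n >= min_spam and ham_hits[rule] <= max_ham
    ]
    return sorted(picked, key=lambda r: (-spam_hits[r], r))
```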



Re: APOSTROPHE_TOCC score

2018-03-06 Thread David Jones

On 03/05/2018 06:57 PM, John Hardin wrote:

> On Mon, 5 Mar 2018, Alex wrote:
>
>> Hi,
>>
>> On Mon, Mar 5, 2018 at 5:59 PM, John Hardin  wrote:
>>
>>> On Mon, 5 Mar 2018, Alex wrote:
>>>
>>>> To: =?utf-8?Q?DermotO=27reilly?=
>>>> *  2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>>>
>>>> 2.6 points for this is just unreasonable. This was a completely
>>>> legitimate email.
>>>
>>> Is such an address even deliverable?
>>
>> Yes, it's beyond me why anyone would want to use an apostrophe, but
>> it's valid.
>
> OK.
>
> That rule is 8 years stale. I've added a masscheck score limit of 1.000
>
> I'm open to discussion of converting it to a subrule and/or adding some
> extra conditions to it.




Here are some samples of what I found in my corpora which supplies the 
majority of the nightly masscheck corpora.


https://pastebin.com/QchEu2BA
https://pastebin.com/pbYnvzU4
https://pastebin.com/EjnQSE7H

In this case these were really bad spam so the APOSTROPHE_TOCC is just 
riding on the back of other rules, BLs, and high Bayes scores.




Re: APOSTROPHE_TOCC score

2018-03-05 Thread John Hardin

On Mon, 5 Mar 2018, Alex wrote:

> Hi,
>
> On Mon, Mar 5, 2018 at 5:59 PM, John Hardin  wrote:
>
>> On Mon, 5 Mar 2018, Alex wrote:
>>
>>> To: =?utf-8?Q?DermotO=27reilly?=
>>> *  2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>>
>>> 2.6 points for this is just unreasonable. This was a completely
>>> legitimate email.
>>
>> Is such an address even deliverable?
>
> Yes, it's beyond me why anyone would want to use an apostrophe, but
> it's valid.


OK.

That rule is 8 years stale. I've added a masscheck score limit of 1.000

I'm open to discussion of converting it to a subrule and/or adding some 
extra conditions to it.




Re: APOSTROPHE_TOCC score

2018-03-05 Thread Alex
Hi,

On Mon, Mar 5, 2018 at 5:59 PM, John Hardin  wrote:
> On Mon, 5 Mar 2018, Alex wrote:
>
>> To: =?utf-8?Q?DermotO=27reilly?= 
>> *  2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>
>> 2.6 points for this is just unreasonable. This was a completely
>> legitimate email.
>
> Is such an address even deliverable?

Yes, it's beyond me why anyone would want to use an apostrophe, but
it's valid. We discourage its use because it just makes sharing your
address more difficult, and there's also probably some weird system
that doesn't know how to handle it out there.

https://en.wikipedia.org/wiki/Email_address#Local-part
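The Wikipedia section linked above summarises RFC 5322. Under that RFC, the unquoted ("dot-atom") form of a local part allows the apostrophe alongside other printable symbols. Below is a minimal Python sketch of that check. It ignores the quoted-string form, comments and internationalised (RFC 6531) addresses, and the function name is illustrative.

```python
import re

# RFC 5322 "atext": letters, digits and these printable symbols.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
# dot-atom: one or more atext runs separated by single dots.
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_atom_local_part(local_part: str) -> bool:
    """True if local_part is a valid unquoted (dot-atom) local part."""
    return DOT_ATOM.fullmatch(local_part) is not None
```

A made-up local part such as `jane.o'neill` passes this check, while leading, trailing or doubled dots and embedded spaces do not.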


Re: APOSTROPHE_TOCC score

2018-03-05 Thread John Hardin

On Mon, 5 Mar 2018, Alex wrote:

> To: =?utf-8?Q?DermotO=27reilly?=
> *  2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>
> 2.6 points for this is just unreasonable. This was a completely
> legitimate email.


Is such an address even deliverable?




Re: APOSTROPHE_TOCC score

2018-03-05 Thread RW
On Mon, 5 Mar 2018 16:28:33 -0600
David Jones wrote:

> On 03/05/2018 04:20 PM, John Hardin wrote:
> > On Mon, 5 Mar 2018, Alex wrote:
> >   
> >> 2.6 points for this is just unreasonable. This was a completely
> >> legitimate email.  
> > 
> > What is the S/O in masscheck?
> >   
> 
> http://ruleqa.spamassassin.org/20180304-r1825801-n/APOSTROPHE_TOCC/detail
> 
> It's a high S/O in the masscheck but I don't think that alone is an 
> indicator of spam.  I need to check my ena corpora to see what is
> going on there.
> 
> This rule should probably be limited to a max of 1.0.

Or perhaps change the rule from:

  header APOSTROPHE_TOCC ToCc:addr =~ /'/

to:

  header APOSTROPHE_TOCC ToCc:addr =~ /[^do]'/
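To see what the narrower pattern would and would not exempt, here is a Python sketch that applies both regexes to a few made-up addresses. Python's `re` handles these two simple patterns the same way Perl does. Two caveats follow from the pattern itself. First, `[^do]` is case-sensitive, so an upper-case `O'` would still hit unless an /i modifier were added. Second, an apostrophe at the very start of the address has no preceding character for `[^do]` to match.

```python
import re

# The current rule body and the proposed narrower version.
CURRENT = re.compile(r"'")
PROPOSED = re.compile(r"[^do]'")


def rule_hits(addr: str) -> tuple[bool, bool]:
    """Return (current_hits, proposed_hits) for one address."""
    return bool(CURRENT.search(addr)), bool(PROPOSED.search(addr))


if __name__ == "__main__":
    # Made-up sample addresses, not taken from the thread.
    for addr in ("jane.o'neill@example.com",
                 "luca.d'angelo@example.com",
                 "jane.O'Neill@example.com",
                 "rock'n'roll@example.com"):
        print(addr, rule_hits(addr))
```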


Re: APOSTROPHE_TOCC score

2018-03-05 Thread David Jones

On 03/05/2018 04:20 PM, John Hardin wrote:

> On Mon, 5 Mar 2018, Alex wrote:
>
>> 2.6 points for this is just unreasonable. This was a completely
>> legitimate email.
>
> What is the S/O in masscheck?



http://ruleqa.spamassassin.org/20180304-r1825801-n/APOSTROPHE_TOCC/detail

It's a high S/O in the masscheck but I don't think that alone is an 
indicator of spam.  I need to check my ena corpora to see what is going 
on there.


This rule should probably be limited to a max of 1.0.
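The S/O figure discussed above expresses how much of a rule's hitting is on spam rather than ham. Here is a minimal Python sketch, assuming the ratio is computed from per-corpus hit rates (hits divided by corpus size) rather than from raw hit counts. The function name and sample numbers are illustrative. Working from rates keeps a small ham corpus from inflating the ratio on its own. Even so, the fewer ham messages there are, the less a low ham hit rate can be trusted.

```python
def so_ratio(spam_hits: int, spam_total: int, ham_hits: int, ham_total: int) -> float:
    """Spam/overall ratio from hit rates (assumed definition).

    1.0 means the rule only ever hit spam; 0.5 means it hit spam and ham
    at the same rate; 0.0 means it only hit ham.
    """
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        raise ValueError("rule never hit, S/O is undefined")
    return spam_rate / (spam_rate + ham_rate)
```

For example, 100 hits in 1,000 spam and 100 hits in 4,000 ham gives about 0.8. Dividing the raw counts would give 0.5.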



Re: APOSTROPHE_TOCC score

2018-03-05 Thread John Hardin

On Mon, 5 Mar 2018, Alex wrote:

> 2.6 points for this is just unreasonable. This was a completely
> legitimate email.


What is the S/O in masscheck?



Re: APOSTROPHE_TOCC score

2018-03-05 Thread Alex
Hi,


On Mon, Mar 5, 2018 at 4:48 PM, RW  wrote:
> On Mon, 5 Mar 2018 16:23:31 -0500
> Alex wrote:
>
>> Hi,
>>
>> I just received a false-positive because of the following address:
>>
>> To: "'i...@example.se'" 
>>
>> Apparently the apostrophe is enough to warrant 2.5 points alone? Is
>> this intended to catch addresses like tom.o'rei...@example.com or more
>> like my example above?
>
> Only the former, but I can't reproduce the bug from the above example.

I'm sorry, too many terminals open. The email producing this hit was
indeed with o'reilly in it:

To: =?utf-8?Q?DermotO=27reilly?= 
 *  2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe

2.6 points for this is just unreasonable. This was a completely
legitimate email.
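The `=?utf-8?Q?...?=` text in that To: header is an RFC 2047 encoded word. In the Q encoding, `=27` is the hex byte 0x27, which is an apostrophe. Python's standard `email.header` module can decode it, as the sketch below shows; the helper name is illustrative. The archived header has lost the address in angle brackets, so this only covers the display name.

```python
from email.header import decode_header, make_header


def decode_encoded_words(raw: str) -> str:
    """Decode RFC 2047 encoded words (e.g. =?utf-8?Q?...?=) to plain text."""
    return str(make_header(decode_header(raw)))


if __name__ == "__main__":
    print(decode_encoded_words("=?utf-8?Q?DermotO=27reilly?="))  # DermotO'reilly
```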


Re: APOSTROPHE_TOCC score

2018-03-05 Thread RW
On Mon, 5 Mar 2018 16:23:31 -0500
Alex wrote:

> Hi,
> 
> I just received a false-positive because of the following address:
> 
> To: "'i...@example.se'" 
> 
> Apparently the apostrophe is enough to warrant 2.5 points alone? Is
> this intended to catch addresses like tom.o'rei...@example.com or more
> like my example above?

Only the former, but I can't reproduce the bug from the above example.
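This answer rests on the difference between the display name and the address itself. As the reply implies, a rule on `ToCc:addr` is meant to look at the address, so apostrophes around a display name should not count. Python's `email.utils.getaddresses` separates the two parts. In the sketch below, the header value is a made-up stand-in for the archived one, whose address was truncated, and the helper name is illustrative.

```python
from email.utils import getaddresses


def apostrophe_in_address(header_value: str) -> list[bool]:
    """For each address in a To/Cc value, report whether the address
    part itself (not the display name) contains an apostrophe."""
    return ["'" in addr for _name, addr in getaddresses([header_value])]


if __name__ == "__main__":
    # Hypothetical header: the display name is wrapped in apostrophes,
    # the address is not.
    value = "\"'someone@example.se'\" <someone@example.se>"
    print(getaddresses([value]))
    print(apostrophe_in_address(value))
```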