Re: APOSTROPHE_TOCC score
On Tue, 6 Mar 2018, David Jones wrote:

> On 03/06/2018 12:54 PM, John Hardin wrote:
>> On Tue, 6 Mar 2018, RW wrote:
>>> On Tue, 6 Mar 2018 08:47:35 -0800 (PST) John Hardin wrote:
>>>> On Tue, 6 Mar 2018, David Jones wrote:
>>>>> In this case these were really bad spam so the APOSTROPHE_TOCC is
>>>>> just riding on the back of other rules, BLs, and high Bayes scores.
>>>>
>>>> What I generally look at is the detailed rule performance in
>>>> masscheck. If it primarily hits on spams that score in total 1-3
>>>> points.
>>>
>>> Why not under 5? If it's close to 5 and there's a limit that suggests
>>> the limit could be increased a bit. It also needs to take into account
>>> the ham hits, which is why having a ham-starved corpus is such a
>>> problem.
>
> Are you saying we have a ham-starved corpus?

We have at times in the past. When you're performing analyses like this
you need to bear in mind the size of the ham corpus.

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
 Failure to plan ahead on someone else's part does not constitute
 an emergency on my part. -- David W. Barts in a.s.r
---
 5 days until Daylight Saving Time begins in U.S. - Spring Forward
Re: APOSTROPHE_TOCC score
On 03/06/2018 12:54 PM, John Hardin wrote:
> On Tue, 6 Mar 2018, RW wrote:
>> On Tue, 6 Mar 2018 08:47:35 -0800 (PST) John Hardin wrote:
>>> On Tue, 6 Mar 2018, David Jones wrote:
>>>> In this case these were really bad spam so the APOSTROPHE_TOCC is
>>>> just riding on the back of other rules, BLs, and high Bayes scores.
>>>
>>> What I generally look at is the detailed rule performance in
>>> masscheck. If it primarily hits on spams that score in total 1-3
>>> points.
>>
>> Why not under 5? If it's close to 5 and there's a limit that suggests
>> the limit could be increased a bit. It also needs to take into account
>> the ham hits, which is why having a ham-starved corpus is such a
>> problem.

Are you saying we have a ham-starved corpus?

             OVERALL     SPAM      HAM
ena-week0     77,945   36,459   41,486
ena-week1     93,847   52,781   41,066
ena-week2     69,297   30,328   38,969
ena-week3     75,853   31,995   43,858
ena-week4     92,680   37,511   55,169
             409,622  189,074  220,548

http://ruleqa.spamassassin.org

--
David Jones
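A quick way to check the "ham-starved" question against the ena-week figures in David's message is to compute the ham share per week and overall. This is only a sketch: the counts are copied from the message, while the dictionary layout and the function name `ham_share` are illustrative choices, not anything from masscheck itself.

```python
# Spam and ham counts per corpus, copied from the table in the message.
weeks = {
    "ena-week0": (36459, 41486),
    "ena-week1": (52781, 41066),
    "ena-week2": (30328, 38969),
    "ena-week3": (31995, 43858),
    "ena-week4": (37511, 55169),
}


def ham_share(spam: int, ham: int) -> float:
    """Fraction of a corpus that is ham."""
    return ham / (spam + ham)


total_spam = sum(spam for spam, _ in weeks.values())
total_ham = sum(ham for _, ham in weeks.values())
overall_ham_share = ham_share(total_spam, total_ham)

for name, (spam, ham) in weeks.items():
    print(f"{name}: ham share {ham_share(spam, ham):.1%}")
print(f"overall: {total_spam} spam, {total_ham} ham, "
      f"ham share {overall_ham_share:.1%}")
```

On these numbers ham is slightly more than half of the combined corpus, and only ena-week1 has more spam than ham.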
Re: APOSTROPHE_TOCC score
On Tue, 6 Mar 2018, RW wrote:

> On Tue, 6 Mar 2018 08:47:35 -0800 (PST) John Hardin wrote:
>> On Tue, 6 Mar 2018, David Jones wrote:
>>> In this case these were really bad spam so the APOSTROPHE_TOCC is
>>> just riding on the back of other rules, BLs, and high Bayes scores.
>>
>> What I generally look at is the detailed rule performance in
>> masscheck. If it primarily hits on spams that score in total 1-3
>> points.
>
> Why not under 5? If it's close to 5 and there's a limit that suggests
> the limit could be increased a bit. It also needs to take into account
> the ham hits, which is why having a ham-starved corpus is such a
> problem.

Generally speaking there's a spike, if the spike is at less than 5 it
needs attention and the lower the spike is the more generous the score
limit may be, bearing in mind that poison pills should be rare.

--
 John Hardin KA7OHZ
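One possible reading of the "spike" heuristic in the message above, sketched in Python: bucket the total scores of the spam messages a rule hits into one-point bins, find the fullest bin, and flag the rule when that bin sits below 5. The score list is invented for illustration, and the names `spike_bucket` and `needs_attention` are hypothetical; the real analysis is done by eye on the masscheck detail pages.

```python
import math
from collections import Counter


def spike_bucket(scores):
    """Return the lower edge of the 1-point-wide bin holding the most scores.

    Ties are broken in favour of the lower bin.
    """
    bins = Counter(math.floor(score) for score in scores)
    return max(bins, key=lambda edge: (bins[edge], -edge))


# Hypothetical total scores of spam messages that hit the rule:
# a cluster around 1-3 points plus a tail of high-scoring spam.
spam_scores = [1.2, 1.8, 2.1, 2.4, 2.6, 2.9, 3.3, 7.5, 12.0, 18.4, 25.1]

peak = spike_bucket(spam_scores)
needs_attention = peak < 5  # the spike sits below the 5-point mark

print(f"spike at {peak}-{peak + 1} points, needs attention: {needs_attention}")
```

The high-scoring tail does not move the peak, which matches the remark elsewhere in the thread that a tail of higher-scoring hits does not affect the analysis.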
Re: APOSTROPHE_TOCC score
On Tue, 6 Mar 2018 08:47:35 -0800 (PST)
John Hardin wrote:

> On Tue, 6 Mar 2018, David Jones wrote:
>
> > In this case these were really bad spam so the APOSTROPHE_TOCC is
> > just riding on the back of other rules, BLs, and high Bayes
> > scores.
>
> What I generally look at is the detailed rule performance in
> masscheck. If it primarily hits on spams that score in total 1-3
> points.

Why not under 5?
Re: APOSTROPHE_TOCC score
On Tue, 6 Mar 2018, David Jones wrote:

> On 03/05/2018 06:57 PM, John Hardin wrote:
>> On Mon, 5 Mar 2018, Alex wrote:
>>> Hi,
>>>
>>> On Mon, Mar 5, 2018 at 5:59 PM, John Hardin wrote:
>>>> On Mon, 5 Mar 2018, Alex wrote:
>>>>> To: =?utf-8?Q?DermotO=27reilly?=
>>>>> * 2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>>>>
>>>>> 2.6 points for this is just unreasonable. This was a completely
>>>>> legitimate email.
>>>>
>>>> Is such an address even deliverable?
>>>
>>> Yes, it's beyond me why anyone would want to use an apostrophe, but
>>> it's valid.
>>
>> OK. That rule is 8 years stale. I've added a masscheck score limit of
>> 1.000
>>
>> I'm open to discussion of converting it to a subrule and/or adding
>> some extra conditions to it.
>
> Here are some samples of what I found in my corpora which supplies the
> majority of the nightly masscheck corpora.
>
> https://pastebin.com/QchEu2BA
> https://pastebin.com/pbYnvzU4
> https://pastebin.com/EjnQSE7H
>
> In this case these were really bad spam so the APOSTROPHE_TOCC is just
> riding on the back of other rules, BLs, and high Bayes scores.

What I generally look at is the detailed rule performance in masscheck.
If it primarily hits on spams that score in total 1-3 points I generally
tend to set the score limit somewhat higher. Having a tail of
higher-scoring hits doesn't affect that analysis.

This looks like one of those rules.

In this case I'd probably set the score limit on this rule low and add
more generously-scored metas for the high-spam-low-ham rule overlaps
from the masscheck results.

--
 John Hardin KA7OHZ
Re: APOSTROPHE_TOCC score
On 03/05/2018 06:57 PM, John Hardin wrote:
> On Mon, 5 Mar 2018, Alex wrote:
>> Hi,
>>
>> On Mon, Mar 5, 2018 at 5:59 PM, John Hardin wrote:
>>> On Mon, 5 Mar 2018, Alex wrote:
>>>> To: =?utf-8?Q?DermotO=27reilly?=
>>>> * 2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>>>
>>>> 2.6 points for this is just unreasonable. This was a completely
>>>> legitimate email.
>>>
>>> Is such an address even deliverable?
>>
>> Yes, it's beyond me why anyone would want to use an apostrophe, but
>> it's valid.
>
> OK. That rule is 8 years stale. I've added a masscheck score limit of
> 1.000
>
> I'm open to discussion of converting it to a subrule and/or adding
> some extra conditions to it.

Here are some samples of what I found in my corpora which supplies the
majority of the nightly masscheck corpora.

https://pastebin.com/QchEu2BA
https://pastebin.com/pbYnvzU4
https://pastebin.com/EjnQSE7H

In this case these were really bad spam so the APOSTROPHE_TOCC is just
riding on the back of other rules, BLs, and high Bayes scores.

--
David Jones
Re: APOSTROPHE_TOCC score
On Mon, 5 Mar 2018, Alex wrote:

> Hi,
>
> On Mon, Mar 5, 2018 at 5:59 PM, John Hardin wrote:
>> On Mon, 5 Mar 2018, Alex wrote:
>>> To: =?utf-8?Q?DermotO=27reilly?=
>>> * 2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>>
>>> 2.6 points for this is just unreasonable. This was a completely
>>> legitimate email.
>>
>> Is such an address even deliverable?
>
> Yes, it's beyond me why anyone would want to use an apostrophe, but
> it's valid.

OK. That rule is 8 years stale. I've added a masscheck score limit of
1.000

I'm open to discussion of converting it to a subrule and/or adding some
extra conditions to it.

--
 John Hardin KA7OHZ
Re: APOSTROPHE_TOCC score
Hi,

On Mon, Mar 5, 2018 at 5:59 PM, John Hardin wrote:
> On Mon, 5 Mar 2018, Alex wrote:
>
>> To: =?utf-8?Q?DermotO=27reilly?=
>> * 2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>>
>> 2.6 points for this is just unreasonable. This was a completely
>> legitimate email.
>
> Is such an address even deliverable?

Yes, it's beyond me why anyone would want to use an apostrophe, but
it's valid. We discourage its use because it just makes sharing your
address more difficult, and there's also probably some weird system
that doesn't know how to handle it out there.

https://en.wikipedia.org/wiki/Email_address#Local-part
Re: APOSTROPHE_TOCC score
On Mon, 5 Mar 2018, Alex wrote:

> To: =?utf-8?Q?DermotO=27reilly?=
> * 2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe
>
> 2.6 points for this is just unreasonable. This was a completely
> legitimate email.

Is such an address even deliverable?

--
 John Hardin KA7OHZ
Re: APOSTROPHE_TOCC score
On Mon, 5 Mar 2018 16:28:33 -0600
David Jones wrote:

> On 03/05/2018 04:20 PM, John Hardin wrote:
> > On Mon, 5 Mar 2018, Alex wrote:
> >
> >> 2.6 points for this is just unreasonable. This was a completely
> >> legitimate email.
> >
> > What is the S/O in masscheck?
> >
>
> http://ruleqa.spamassassin.org/20180304-r1825801-n/APOSTROPHE_TOCC/detail
>
> It's a high S/O in the masscheck but I don't think that alone is an
> indicator of spam. I need to check my ena corpora to see what is
> going on there.
>
> This rule should probably be limited to a max of 1.0.

Or perhaps change the rule from:

header APOSTROPHE_TOCC  ToCc:addr =~ /'/

to:

header APOSTROPHE_TOCC  ToCc:addr =~ /[^do]'/
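To see what the narrower pattern would and would not exempt, the two regexes from the message can be run over a few addresses. This is a sketch in Python rather than Perl (the two patterns behave the same in both engines), the sample addresses are made up, and it tests the patterns exactly as written, with no case-insensitive flag.

```python
import re

ORIGINAL = re.compile(r"'")       # current rule: any apostrophe
PROPOSED = re.compile(r"[^do]'")  # suggestion: apostrophe not after d or o

# Hypothetical addresses, chosen to exercise the edge cases.
samples = [
    "tom.o'reilly@example.com",     # lowercase o before the apostrophe
    "d'arcy@example.com",           # lowercase d before the apostrophe
    "dermot.O'reilly@example.com",  # uppercase O before the apostrophe
    "bob's.shop@example.com",       # some other letter before it
    "'info@example.com",            # apostrophe is the first character
]

results = {
    addr: (ORIGINAL.search(addr) is not None,
           PROPOSED.search(addr) is not None)
    for addr in samples
}

for addr, (old_hit, new_hit) in results.items():
    print(f"{addr:32} original={old_hit!s:5} proposed={new_hit}")
```

Two things stand out. As written, the proposed pattern is case-sensitive, so an address spelled with a capital O before the apostrophe would still hit. And because `[^do]` must consume a character, an apostrophe at the very start of the address no longer matches at all.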
Re: APOSTROPHE_TOCC score
On 03/05/2018 04:20 PM, John Hardin wrote:
> On Mon, 5 Mar 2018, Alex wrote:
>
>> 2.6 points for this is just unreasonable. This was a completely
>> legitimate email.
>
> What is the S/O in masscheck?

http://ruleqa.spamassassin.org/20180304-r1825801-n/APOSTROPHE_TOCC/detail

It's a high S/O in the masscheck but I don't think that alone is an
indicator of spam. I need to check my ena corpora to see what is going
on there.

This rule should probably be limited to a max of 1.0.

--
David Jones
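Why a high S/O alone says little can be shown with a small calculation. The sketch below assumes S/O is computed as spam% / (spam% + ham%), which is my understanding of the ruleqa convention and is not confirmed anywhere in this thread. The corpus sizes are the totals David posted elsewhere in the thread; the hit counts, the function name `so_ratio`, and the 0.5 fallback for a rule with no hits are purely hypothetical.

```python
def so_ratio(spam_hits: int, spam_total: int,
             ham_hits: int, ham_total: int) -> float:
    """S/O under the assumed definition: spam% / (spam% + ham%)."""
    spam_pct = 100.0 * spam_hits / spam_total
    ham_pct = 100.0 * ham_hits / ham_total
    if spam_pct + ham_pct == 0:
        return 0.5  # assumption: no hits at all carries no information
    return spam_pct / (spam_pct + ham_pct)


SPAM_TOTAL = 189074  # corpus totals quoted in the thread
HAM_TOTAL = 220548

# Hypothetical hit counts for a rule that fires rarely.
spam_hits, ham_hits = 200, 2

so = so_ratio(spam_hits, SPAM_TOTAL, ham_hits, HAM_TOTAL)
spam_hit_rate = spam_hits / SPAM_TOTAL

print(f"S/O = {so:.3f}, spam hit rate = {spam_hit_rate:.3%}")
```

With these invented counts the S/O is above 0.99 even though the rule fires on roughly one spam in a thousand, and two more ham hits would move it noticeably. That sensitivity is why the size of the ham corpus matters for this kind of analysis.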
Re: APOSTROPHE_TOCC score
On Mon, 5 Mar 2018, Alex wrote:

> 2.6 points for this is just unreasonable. This was a completely
> legitimate email.

What is the S/O in masscheck?

--
 John Hardin KA7OHZ
Re: APOSTROPHE_TOCC score
Hi,

On Mon, Mar 5, 2018 at 4:48 PM, RW wrote:
> On Mon, 5 Mar 2018 16:23:31 -0500
> Alex wrote:
>
>> Hi,
>>
>> I just received a false-positive because of the following address:
>>
>> To: "'i...@example.se'"
>>
>> Apparently the apostrophe is enough to warrant 2.5 points alone? Is
>> this intended to catch addresses like tom.o'rei...@example.com or more
>> like my example above?
>
> Only the former, but I can't reproduce the bug from the above example.

I'm sorry, too many terminals open. The email producing this hit was
indeed with o'reilly in it:

To: =?utf-8?Q?DermotO=27reilly?=
* 2.6 APOSTROPHE_TOCC To or CC address contains an apostrophe

2.6 points for this is just unreasonable. This was a completely
legitimate email.
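For readers unfamiliar with the encoded form in that To: header, `=27` is the Q-encoding of an apostrophe. A minimal Python sketch using the standard library's `email.header.decode_header` shows the decoded text. Only the display name survives in the archived message, so this says nothing about the address part that the rule actually tests; the helper name `decode_display_name` is illustrative.

```python
from email.header import decode_header


def decode_display_name(encoded: str) -> str:
    """Decode an RFC 2047 encoded-word string into plain text."""
    parts = []
    for value, charset in decode_header(encoded):
        if isinstance(value, bytes):
            # Encoded words come back as bytes plus their declared charset.
            parts.append(value.decode(charset or "ascii"))
        else:
            parts.append(value)
    return "".join(parts)


name = decode_display_name("=?utf-8?Q?DermotO=27reilly?=")
print(name)
```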
Re: APOSTROPHE_TOCC score
On Mon, 5 Mar 2018 16:23:31 -0500
Alex wrote:

> Hi,
>
> I just received a false-positive because of the following address:
>
> To: "'i...@example.se'"
>
> Apparently the apostrophe is enough to warrant 2.5 points alone? Is
> this intended to catch addresses like tom.o'rei...@example.com or more
> like my example above?

Only the former, but I can't reproduce the bug from the above example.