Re: Why aren't base rules rescoring?
On Sun, 16 Jun 2019, Henrik K wrote:

> It's not like the whole world uses 5 as a baseline, people might also have all kinds of local poison pill rules. 8-10 seems quite ok to use and I remember some wiki page even recommending that.

The recommendation is 5 points for marking as spam, 10 points for considering auto-quarantine or auto-discard.

--
John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Public Education: the bureaucratic process of replacing an empty mind with a closed one. -- Thorax
---
2 days until SWMBO's Birthday
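As a rough illustration of the two thresholds mentioned above, here is a minimal Python sketch. The function name `action_for` and the action labels are made up for this example and are not part of SpamAssassin. It also assumes a message counts as spam once its score reaches the threshold.

```python
def action_for(score: float, tag_at: float = 5.0, discard_at: float = 10.0) -> str:
    """Map a total message score to a delivery action.

    tag_at and discard_at default to the 5 / 10 point recommendation
    quoted in the thread; the action labels are hypothetical.
    """
    if score >= discard_at:
        return "quarantine-or-discard"
    if score >= tag_at:
        return "tag-as-spam"
    return "deliver"


if __name__ == "__main__":
    for s in (1.1, 5.0, 8.5, 12.0):
        print(s, action_for(s))
```

A site that prefers the 8-10 baseline Henrik mentions would only need to pass a different `tag_at`.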
Re: Why aren't base rules rescoring?
Agreed, I'll certainly think of how to improve this and similar problems that crop up from time to time.

As mentioned, I'm wanting to get the masses working "as they should" at the minute (I think this is mostly done apart from nice rule rescores) and then going for improvements to the actual process and reliability.

It's been this way for many years, change is needed - as is careful consideration.

Enjoy the weekend!!

Paul

On Sun, 16 Jun 2019 at 09:08, Henrik K wrote:

> I figured it does something like that, probably fine for most of those rules that don't hit much mail at all. Then we have stuff that hits 20%+ of ham like STYLE_GIBBERISH, probably the rescorer should take that more into account instead of just "crunching numbers". :-) It's not like the whole world uses 5 as a baseline, people might also have all kinds of local poison pill rules. 8-10 seems quite ok to use and I remember some wiki page even recommending that.
Re: Why aren't base rules rescoring?
I figured it does something like that, probably fine for most of those rules that don't hit much mail at all. Then we have stuff that hits 20%+ of ham like STYLE_GIBBERISH, probably the rescorer should take that more into account instead of just "crunching numbers". :-) It's not like the whole world uses 5 as a baseline, people might also have all kinds of local poison pill rules. 8-10 seems quite ok to use and I remember some wiki page even recommending that.

On Sun, Jun 16, 2019 at 08:42:57AM +0100, Paul Stead wrote:

> So let's look at the following rule which isn't promotable in QA:
> https://ruleqa.spamassassin.org/20190615-r1861371-n/URI_WP_HACKED_2/detail
>
> This has a publish tflag.
>
> Because of the publish tflag it is included in the active.list
>
> Because it's in the active.list it is considered for rescoring.
>
> When it is rescored, the iterative process scores against both ham and spam in several thousand iterations for the rules from the rev# of that day. During these iterations the score that came out triggered minimal FPs (ham mail > 5.0) and helped towards the spam score the best.
>
> The rescore seems to be doing the right thing in my opinion. It might show scores for rules that hit more ham than spam on the qa site, but during the check of the corpus the score generated triggered minimal emails hitting FPs.
>
> Paul
Re: Why aren't base rules rescoring?
So let's look at the following rule which isn't promotable in QA:
https://ruleqa.spamassassin.org/20190615-r1861371-n/URI_WP_HACKED_2/detail

This has a publish tflag.

Because of the publish tflag it is included in the active.list.

Because it's in the active.list it is considered for rescoring.

When it is rescored, the iterative process scores against both ham and spam in several thousand iterations for the rules from the rev# of that day. During these iterations the score that came out triggered minimal FPs (ham mail > 5.0) and helped towards the spam score the best.

The rescore seems to be doing the right thing in my opinion. It might show scores for rules that hit more ham than spam on the qa site, but during the check of the corpus the score generated triggered minimal emails hitting FPs.

Paul

On Sat, 15 Jun 2019 at 18:06, John Hardin wrote:

> Not all of those are in my sandbox. For example, AC_HTML_NONSENSE_TAGS is in KAM's.
>
> I spent some time today (which I did not have yesterday) to review and update the tuning on many of those rules to improve their S/O.
>
> I also tried adding scores to 73_sandbox_manual_scores.cf for them to suppress the net scores until those changes can be evaluated by the weekly masscheck, but ran into a problem - see SA bug 7721.
>
> The tuning should minimize the problem from the stale net scores, so I'm reluctant to alter their global scores, except for AD_PREFS, which is a very simple rule that seems to be falling afoul of a lot of "legitimate" marketing emails (i.e. actually subscribed to) in the masscheck ham corpora and thus can't really be tuned.
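To make the "minimal FPs (ham mail > 5.0)" criterion concrete, here is a toy Python sketch of how one candidate score set could be evaluated against a labelled corpus. It is not the actual rescorer code. The corpus, the `BIG_RULE` name and the `count_errors` helper are invented for illustration, and the sketch assumes a message is flagged once its total reaches the threshold.

```python
from typing import Dict, List, Tuple

# (is_spam, names of the rules that hit this message)
Message = Tuple[bool, List[str]]


def count_errors(corpus: List[Message],
                 scores: Dict[str, float],
                 threshold: float = 5.0) -> Tuple[int, int]:
    """Return (false_positives, false_negatives) for one candidate score set."""
    fp = fn = 0
    for is_spam, hits in corpus:
        total = sum(scores.get(rule, 0.0) for rule in hits)
        flagged = total >= threshold
        if flagged and not is_spam:
            fp += 1      # ham pushed to or over the threshold
        elif not flagged and is_spam:
            fn += 1      # spam that stayed under the threshold
    return fp, fn


# Invented four-message corpus; BIG_RULE is a hypothetical rule name.
corpus: List[Message] = [
    (False, ["STYLE_GIBBERISH"]),
    (False, ["STYLE_GIBBERISH", "BIG_RULE"]),
    (True, ["URI_WP_HACKED_2", "BIG_RULE"]),
    (True, ["URI_WP_HACKED_2"]),
]

with_net_score = {"STYLE_GIBBERISH": 1.111, "URI_WP_HACKED_2": 1.304, "BIG_RULE": 4.0}
with_low_score = dict(with_net_score, STYLE_GIBBERISH=0.001)

if __name__ == "__main__":
    print(count_errors(corpus, with_net_score))
    print(count_errors(corpus, with_low_score))
```

In this toy data a single point on a rule that hits ham is enough to tip one ham message over the threshold. That is the concern raised elsewhere in the thread, even when the corpus-wide FP count looks small.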
Re: Why aren't base rules rescoring?
On Fri, 14 Jun 2019, Henrik K wrote:

> PS. John, all these rules from your sandbox seem to have very broken scores, could you perhaps add informative scores to 73_sandbox_manual_scores.cf for these? At least that method should work 100% for now..
>
> FROM_IN_TO_AND_SUBJ    2.199
> OBFU_TEXT_ATTACH       1.699
> MIME_NO_TEXT           1.542
> AD_PREFS               1.399
> URI_WP_HACKED_2        1.304
> STYLE_GIBBERISH        1.111
> UC_GIBBERISH_OBFU      1.000
> LUCRATIVE              1.000
> HEXHASH_WORD           1.000
> FROM_WORDY             1.000
> AC_HTML_NONSENSE_TAGS  1.000
> LONG_HEX_URI           0.896
> FROM_PAYPAL_SPOOF      0.727

Not all of those are in my sandbox. For example, AC_HTML_NONSENSE_TAGS is in KAM's.

I spent some time today (which I did not have yesterday) to review and update the tuning on many of those rules to improve their S/O.

I also tried adding scores to 73_sandbox_manual_scores.cf for them to suppress the net scores until those changes can be evaluated by the weekly masscheck, but ran into a problem - see SA bug 7721.

The tuning should minimize the problem from the stale net scores, so I'm reluctant to alter their global scores, except for AD_PREFS, which is a very simple rule that seems to be falling afoul of a lot of "legitimate" marketing emails (i.e. actually subscribed to) in the masscheck ham corpora and thus can't really be tuned.

--
John Hardin KA7OHZ
Re: Why aren't base rules rescoring?
On Fri, Jun 14, 2019 at 04:46:00PM -0700, John Hardin wrote:

> One point does not a poison pill make.

I have to disagree in this context. Having a very limited number of masscheckers running on code that was created over a decade ago does not foolproof scoring make. Even a score of 1 could make a difference to classification for some people. Always better to be safe regarding FPs.

> Primarily remembering to undo a temporary fix.

Hardly worse than "remembering to fix it after the weekend". That's why I committed the 73.cf myself, as I had no idea if you are even online anymore.
Re: Why aren't base rules rescoring?
On Fri, 14 Jun 2019, Henrik K wrote:

> On Fri, Jun 14, 2019 at 08:11:11AM -0700, John Hardin wrote:
>
> > Yeah; I'd rather wait until the weekly has a look before making any manual score changes.
>
> I would say this is a failing attitude. There is clearly bad automatic scoring going on and massive amounts of people's mail are affected due to this (at least with the gibberish stuff).

One point does not a poison pill make.

I spent a little time poking at __STYLE_GIBBERISH_01 and couldn't improve it, so I've disabled the entire set until I can give it some more focused attention.

> Using 73_sandbox_manual_scores.cf should not affect rescorer in any way, right? Adjust scores there now and IF after the weekend something actually scores for the better, it's a simple matter of commenting them out. The file is already full of your old comments, so what's the problem?

Primarily remembering to undo a temporary fix.

--
John Hardin KA7OHZ
Re: Why aren't base rules rescoring?
On 6/14/2019 7:32 AM, Henrik K wrote:

> I don't have access to sa-vm boxes

That can be fixed. Just ask!

--
Kevin A. McGrail
Member, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171
Re: Why aren't base rules rescoring?
On Fri, Jun 14, 2019 at 08:11:11AM -0700, John Hardin wrote:

> On Fri, 14 Jun 2019, Paul Stead wrote:
>
> > On Fri, 14 Jun 2019 at 12:37, Paul Stead wrote:
> >
> > > existing setup work
> >
> > * work is a relative term, hopefully by Sunday's masscheck and rescore things will be a little more even for weeklies/nets
>
> Yeah; I'd rather wait until the weekly has a look before making any manual score changes.

I would say this is a failing attitude. There is clearly bad automatic scoring going on and massive amounts of people's mail are affected by it (at least with the gibberish stuff).

Using 73_sandbox_manual_scores.cf should not affect the rescorer in any way, right? Adjust scores there now and IF after the weekend something actually scores for the better, it's a simple matter of commenting them out. The file is already full of your old comments, so what's the problem?
Re: Why aren't base rules rescoring?
On Fri, 14 Jun 2019, Paul Stead wrote:

> On Fri, 14 Jun 2019 at 12:37, Paul Stead wrote:
>
> > existing setup work
>
> * work is a relative term, hopefully by Sunday's masscheck and rescore things will be a little more even for weeklies/nets

Yeah; I'd rather wait until the weekly has a look before making any manual score changes.

I'm much more concerned about why the S/O for __STYLE_GIBBERISH_1 went so far south so suddenly...

--
John Hardin KA7OHZ
Re: Why aren't base rules rescoring?
On Fri, 14 Jun 2019, Henrik K wrote:

> Something else I just noticed..
>
> How can a rule that's been performing like this for at least the past few weeks,
>
>   SPAM%     HAM%    S/O  RANK  SCORE  NAME             WHO/AGE
>  3.3833  25.8343  0.116  0.37   0.00  STYLE_GIBBERISH

Whoa. I was not aware its S/O had deteriorated that badly...

--
John Hardin KA7OHZ
Re: Why aren't base rules rescoring?
On Fri, 14 Jun 2019 at 12:37, Paul Stead wrote:

> existing setup work

* work is a relative term, hopefully by Sunday's masscheck and rescore things will be a little more even for weeklies/nets
Re: Why aren't base rules rescoring?
The setup as-is is quite fragile, errors cascade. I'd strongly advise against putting anything to svn that hasn't been triple checked.

That said, I've already fixed a large number of bugs within the code to make the existing setup work, and I too have a semi-working QA setup here, but most of the existing stuff isn't flexible enough to allow for mis-runs or non-exact planetary alignment.

Agreed, feels more like a working group approach would be needed tbh, the big fixes aren't a case of tweak this or that.

Paul
Re: Why aren't base rules rescoring?
I can look around and try to make my private ruleqa clone work again. I'm afraid to commit anything though, as I don't have access to sa-vm boxes, so I couldn't even fix anything if needed.

Perhaps open up a new bug to investigate, so we can better keep track of what everyone is doing.

PS. John, all these rules from your sandbox seem to have very broken scores, could you perhaps add informative scores to 73_sandbox_manual_scores.cf for these? At least that method should work 100% for now..

FROM_IN_TO_AND_SUBJ    2.199
OBFU_TEXT_ATTACH       1.699
MIME_NO_TEXT           1.542
AD_PREFS               1.399
URI_WP_HACKED_2        1.304
STYLE_GIBBERISH        1.111
UC_GIBBERISH_OBFU      1.000
LUCRATIVE              1.000
HEXHASH_WORD           1.000
FROM_WORDY             1.000
AC_HTML_NONSENSE_TAGS  1.000
LONG_HEX_URI           0.896
FROM_PAYPAL_SPOOF      0.727

On Fri, Jun 14, 2019 at 11:15:27AM +, Paul Stead wrote:

> This is because net scoring is done weekly, see my previous comment about missing rescores and 2 week delays - some of the logic is flawed. We've missed 2 weeks of net rescoring on top of this.
>
> Aware of the issues, working on it - if you would like to chip in I'd welcome the input, the code is quite fragmented.
>
> Paul
Re: Why aren't base rules rescoring?
This is because net scoring is done weekly, see my previous comment about missing rescores and 2 week delays - some of the logic is flawed. We've missed 2 weeks of net rescoring on top of this.

Aware of the issues, working on it - if you would like to chip in I'd welcome the input, the code is quite fragmented.

Paul

On 14/06/2019, 11:43, "Henrik K" wrote:

> Just cleaned up worst offenders in 50_scores.cf that have low S/O in ruleqa.
>
> Other pending examples in sandbox like
>
>  MSECS   SPAM%    HAM%    S/O  RANK  SCORE  NAME
>      0       0  0.0010  0.000  0.46   1.00  UC_GIBBERISH_OBFU
>      0  0.0008  0.0060  0.119  0.45   1.00  HEXHASH_WORD
>
> IMO rules should not even be allowed a score over 0.001 with such low hitrates, something is very wrong.

--
Paul Stead
Senior Engineer
Zen Internet
Direct: 01706 902018
Web: zen.co.uk

Winner of 'Services Company of the Year' at the UK IT Industry Awards

This message is private and confidential. If you have received this message in error, please notify us and remove it from your system.

Zen Internet Limited may monitor email traffic data to manage billing, to handle customer enquiries and for the prevention and detection of fraud. We may also monitor the content of emails sent to and/or from Zen Internet Limited for the purposes of security, staff training and to monitor quality of service.

Zen Internet Limited is registered in England and Wales, Sandbrook Park, Sandbrook Way, Rochdale, OL11 1RY Company No. 03101568 VAT Reg No. 686 0495 01
Re: Why aren't base rules rescoring?
Just cleaned up worst offenders in 50_scores.cf that have low S/O in ruleqa.

Other pending examples in sandbox like

 MSECS   SPAM%    HAM%    S/O  RANK  SCORE  NAME
     0       0  0.0010  0.000  0.46   1.00  UC_GIBBERISH_OBFU
     0  0.0008  0.0060  0.119  0.45   1.00  HEXHASH_WORD

IMO rules should not even be allowed a score over 0.001 with such low hitrates, something is very wrong.

On Fri, Jun 14, 2019 at 11:55:35AM +0300, Henrik K wrote:

> Something else I just noticed..
>
> How can a rule that's been performing like this for at least the past few weeks,
>
>  MSECS   SPAM%     HAM%    S/O  RANK  SCORE  NAME             WHO/AGE
>      0  3.3833  25.8343  0.116  0.37   0.00  STYLE_GIBBERISH
>
> ... result in score of 1.111 with network checks on??
>
> 72_scores.cf:score STYLE_GIBBERISH 0.001 1.111 0.001 1.111
>
> Doesn't seem like the scoring logic is working properly..
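The S/O column in these ruleqa rows appears to be the spam hit rate divided by the combined spam and ham hit rates. That reading is an assumption, but it reproduces the 0.116 shown for STYLE_GIBBERISH. A small Python sketch, with `s_o` as a made-up helper name:

```python
from typing import Optional


def s_o(spam_pct: float, ham_pct: float) -> Optional[float]:
    """Spam/overall ratio from the SPAM% and HAM% columns.

    Returns None when the rule hit nothing at all, since the ratio
    is undefined in that case.
    """
    total = spam_pct + ham_pct
    if total == 0:
        return None
    return spam_pct / total


if __name__ == "__main__":
    # Percentages copied from the rows quoted in the thread.
    print(round(s_o(3.3833, 25.8343), 3))  # STYLE_GIBBERISH
    print(round(s_o(0.0, 0.0010), 3))      # UC_GIBBERISH_OBFU
```

The displayed percentages are rounded to four decimals, so for very low hit rates such as HEXHASH_WORD a recomputed ratio can differ from the published one in the last digit.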
Re: Why aren't base rules rescoring?
Something else I just noticed..

How can a rule that's been performing like this for at least the past few weeks,

 MSECS   SPAM%     HAM%    S/O  RANK  SCORE  NAME             WHO/AGE
     0  3.3833  25.8343  0.116  0.37   0.00  STYLE_GIBBERISH

... result in score of 1.111 with network checks on??

72_scores.cf:score STYLE_GIBBERISH 0.001 1.111 0.001 1.111

Doesn't seem like the scoring logic is working properly..

On Thu, Jun 13, 2019 at 04:15:08PM +, Paul Stead wrote:

> Historically this was done periodically - it's not been done for a long time.
>
> I've been working on the QA system - it's definitely feasible to get all of the rules going past the QA eyes and a score assigned automatically.
>
> I'd like to iron out a few of the kinks and bugs within QA before pursuing this - it's currently overly complex and too many edge cases and exceptions to count - though I'm squishing the big ones in place as I see them. Currently a bad sandbox rule can break the daily releases which in turn could end up with an empty ruleset if things landed correctly.
>
> I think it should be thought about and if right to do implemented with a concise re-look at the QA scripts, their purpose and chronological ordering - some rules can take up to 2 weeks to be QAd and released, others take 24-48 hours, depending. I'd like this to be more predictable and reliable.
>
> Paul
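For readers unfamiliar with four-value score lines: SpamAssassin's `score` directive accepts either one value or four, and the four values correspond to score sets 0-3 (Bayes off/net off, Bayes off/net on, Bayes on/net off, Bayes on/net on). The Python sketch below picks the applicable value from such a line. The helper names are invented, and it deliberately ignores other forms the directive supports, such as relative scores in parentheses.

```python
from typing import List, Tuple


def parse_score_line(line: str) -> Tuple[str, List[float]]:
    """Parse 'score NAME v' or 'score NAME v0 v1 v2 v3' into (name, 4 values)."""
    parts = line.split()
    if len(parts) < 3 or parts[0] != "score":
        raise ValueError(f"not a score line: {line!r}")
    name = parts[1]
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        values = values * 4          # a single score applies to all four sets
    elif len(values) != 4:
        raise ValueError(f"expected 1 or 4 scores, got {len(values)}")
    return name, values


def score_for(values: List[float], bayes: bool, net: bool) -> float:
    """Select the score for the active score set (0-3)."""
    score_set = (2 if bayes else 0) + (1 if net else 0)
    return values[score_set]


if __name__ == "__main__":
    name, values = parse_score_line("score STYLE_GIBBERISH 0.001 1.111 0.001 1.111")
    print(name, score_for(values, bayes=True, net=True))
    print(name, score_for(values, bayes=True, net=False))
```

Read this way, the quoted 72_scores.cf line gives STYLE_GIBBERISH 1.111 only in the two network-enabled sets, which matches the "with network checks on" observation.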
Re: Why aren't base rules rescoring?
Historically this was done periodically - it's not been done for a long time.

I've been working on the QA system - it's definitely feasible to get all of the rules going past the QA eyes and a score assigned automatically.

I'd like to iron out a few of the kinks and bugs within QA before pursuing this - it's currently overly complex, with too many edge cases and exceptions to count - though I'm squishing the big ones in place as I see them. Currently a bad sandbox rule can break the daily releases, which in turn could end up with an empty ruleset if things landed correctly.

I think it should be thought about and, if it is the right thing to do, implemented with a concise re-look at the QA scripts, their purpose and chronological ordering - some rules can take up to 2 weeks to be QAd and released, others take 24-48 hours, depending. I'd like this to be more predictable and reliable.

Paul

On 13/06/2019, 16:58, "Henrik K" wrote:

> Continuing on list.
>
> I've been wondering about this, 50_scores.cf is never updated automatically. When is that supposed to be done?
>
> Should we move all rules inside sandbox so things actually start scoring automatically? Lol.

--
Paul Stead
Senior Engineer
Zen Internet
Direct: 01706 902018
Web: zen.co.uk