Re: bayes_auto_learn default value
On Tue, Feb 08, 2022 at 05:33:59AM -0500, Kevin A. McGrail wrote:
>
> Since we don't seem to have consensus on changing the default does anybody
> object to a pre-file that disables it? That would be more clearly documented
> in people will look at the pre-file for V4.

Good grief, bundled pre-files are not meant for config clauses. They are supposed to load modules.

Defaults are to be changed in the _codebase_ (Conf.pm) and not in a pre-file, which might or might not be loaded by someone. It makes no sense to have defaults in two separate places.

-1 for default change anyway, why bother.
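For context, a minimal sketch of the distinction being drawn above between module loading and configuration. The `loadplugin` and `bayes_auto_learn` directives are standard SpamAssassin configuration; which file each line belongs in, and the file name `local.cf` as the place for a site override, are assumptions for illustration and are not stated in the thread.

```
# Bundled .pre files: module loading only.
loadplugin Mail::SpamAssassin::Plugin::Bayes

# Site configuration (assumed here to be local.cf): an explicit
# override of the compiled-in default, set by the administrator.
bayes_auto_learn 0
```

An administrator who wants auto-learning off can set it this way regardless of what the compiled-in default is.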
Re: bayes_auto_learn default value
On Tue, 8 Feb 2022, Kevin A. McGrail wrote:

> Since we don't seem to have consensus on changing the default does anybody
> object to a pre-file that disables it? That would be more clearly documented
> in people will look at the pre-file for V4.

+1 for making it explicitly disabled in the v4.0 PRE file.

I was going to respond "I have no objections" to the initial request but I've been recovering from a hardware failure in my mail server... :(

> Regards,
> KAM
>
> On Tue, Feb 8, 2022, 04:43 Giovanni Bechis wrote:
>> On 2/7/22 20:03, Henrik K wrote:
>>> On Mon, Feb 07, 2022 at 06:32:18PM +0100, Giovanni Bechis wrote:
>>>> Hi,
>>>> as per Mail::SpamAssassin::Conf(3), bayes_auto_learn defaults to 1/true.
>>>> Is anybody against changing its default value to 0/false on trunk (aka
>>>> SpamAssassin 4.x)?
>>>
>>> What is the reasoning for this proposal?
>>
>> IMHO using autolearn without a correct learning process frequently poisons
>> bayes data; I think bayes_auto_learn should be enabled only if you know
>> what you are doing and not by default.
>> I understand that changing a default value now could be a problem for users.
>> Giovanni

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org                         pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The yardstick you should use when considering whether to support a given piece of legislation is "what if my worst enemy is chosen to administer this law?"
---
 74 more days working to pay your (average) annual US tax bill before you're finally working for yourself.
Re: bayes_auto_learn default value
On 2022-02-08 at 07:46:17 UTC-0500 (Tue, 8 Feb 2022 13:46:17 +0100) Axb is rumored to have said:

> On 2/8/22 11:33, Kevin A. McGrail wrote:
>> Auto learning is something that should never have existed. All it does is
>> reinforce misclassification and slowly spiral the database into having
>> wrong answers be more wrong.
>
> I don't agree - I've been running autolearn for years and my bayes results
> have always been solid.
> (and I'm speaking of a global bayes redis DB in a 200k user setup)

With substantially smaller systems (my own personal server and those I manage for my employer) I have the same benign experience. I don't think we should disable auto-learn by default *in any way* without actual research and hard data beyond anecdotal experience.

> Where I see potential is in optimizing auto expiration when using a
> file-based DB. Very often the DB is locked and tokens cannot be expired,
> which leads to what you call "reinforce misclassification". If tokens are
> expired regularly, skewing is very improbable.
> Thankfully, using Redis, it's way more controllable.

I think that's also not a problem for systems that are not persistently loaded with in-process mail.

All we see as SA maintainers are our own systems and cases that people are having problems with. I don't think we really know whether auto-learn works well generally or why/how it breaks when it does.

>> Since we don't seem to have consensus on changing the default does anybody
>> object to a pre-file that disables it? That would be more clearly
>> documented in people will look at the pre-file for V4.
>
> I'm -1 for disabling, one way or another.

Same. It would substantially change how people's existing stable systems operate.

I'm less averse to tweaking default auto-learning parameters. In ALL cases where I use auto-learn I have reduced both thresholds, so I learn as ham ONLY mail with negative scores (< -0.1, so effectively at least 2 ham-signs...) and learn as spam substantially more than just the absurdly spammy stuff. This sacrifices some overall effectiveness in theory but I think it also helps make Bayes less brittle. I have NOT done rigorous testing to prove that.

I believe that SA has reached the point of broad use where we should be making substantial change decisions based on hard data rather than anecdote and lore.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
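The threshold tuning described in the message above could look roughly like the following sketch. The two threshold directives are standard options of the AutoLearnThreshold plugin (stock values 0.1 for ham and 12.0 for spam). The ham value of -0.1 comes from the message; the spam value is NOT given in the thread, so the 8.0 here is purely a placeholder.

```
# Keep auto-learning on, but with narrower criteria than the stock settings.
bayes_auto_learn 1

# Learn as ham only when the score is below this value (stock: 0.1).
bayes_auto_learn_threshold_nonspam -0.1

# Learn as spam at or above this value (stock: 12.0).
# 8.0 is a hypothetical placeholder; the thread does not state a number.
bayes_auto_learn_threshold_spam 8.0
```

As the message itself notes, the effect of such settings has not been rigorously tested, so any values should be validated against local mail before being relied on.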
Re: bayes_auto_learn default value
On 2/8/22 11:33, Kevin A. McGrail wrote:
> Auto learning is something that should never have existed. All it does is
> reinforce misclassification and slowly spiral the database into having
> wrong answers be more wrong.

I don't agree - I've been running autolearn for years and my bayes results have always been solid.
(and I'm speaking of a global bayes redis DB in a 200k user setup)

Where I see potential is in optimizing auto expiration when using a file-based DB. Very often the DB is locked and tokens cannot be expired, which leads to what you call "reinforce misclassification". If tokens are expired regularly, skewing is very improbable.
Thankfully, using Redis, it's way more controllable.

> Since we don't seem to have consensus on changing the default does anybody
> object to a pre-file that disables it? That would be more clearly documented
> in people will look at the pre-file for V4.

I'm -1 for disabling, one way or another.

> Regards,
> KAM
>
> On Tue, Feb 8, 2022, 04:43 Giovanni Bechis wrote:
>> On 2/7/22 20:03, Henrik K wrote:
>>> On Mon, Feb 07, 2022 at 06:32:18PM +0100, Giovanni Bechis wrote:
>>>> Hi,
>>>> as per Mail::SpamAssassin::Conf(3), bayes_auto_learn defaults to 1/true.
>>>> Is anybody against changing its default value to 0/false on trunk (aka
>>>> SpamAssassin 4.x)?
>>>
>>> What is the reasoning for this proposal?
>>
>> IMHO using autolearn without a correct learning process frequently poisons
>> bayes data; I think bayes_auto_learn should be enabled only if you know
>> what you are doing and not by default.
>> I understand that changing a default value now could be a problem for users.
>> Giovanni
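On the expiry point raised in this message, a Redis-backed Bayes store with time-based token expiry might be configured along these lines. This is a sketch only: the directive names follow the Mail::SpamAssassin::BayesStore::Redis module, while the server address, database number, and TTL values are assumptions for illustration and do not come from the thread.

```
# Use Redis instead of a file-based Bayes database.
bayes_store_module  Mail::SpamAssassin::BayesStore::Redis

# Connection details are placeholders for a local Redis instance.
bayes_sql_dsn       server=127.0.0.1:6379;database=2

# Tokens and "seen" entries expire after these lifetimes (example values).
bayes_token_ttl     21d
bayes_seen_ttl      8d

bayes_auto_expire   1
```

With expiry handled by key lifetimes in Redis, stale tokens age out without depending on the lock that a file-based database needs for its expiry run.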
Re: bayes_auto_learn default value
Auto learning is something that should never have existed. All it does is reinforce misclassification and slowly spiral the database into having wrong answers be more wrong.

Since we don't seem to have consensus on changing the default does anybody object to a pre-file that disables it? That would be more clearly documented in people will look at the pre-file for V4.

Regards,
KAM

On Tue, Feb 8, 2022, 04:43 Giovanni Bechis wrote:

> On 2/7/22 20:03, Henrik K wrote:
>> On Mon, Feb 07, 2022 at 06:32:18PM +0100, Giovanni Bechis wrote:
>>> Hi,
>>> as per Mail::SpamAssassin::Conf(3), bayes_auto_learn defaults to 1/true.
>>> Is anybody against changing its default value to 0/false on trunk (aka
>>> SpamAssassin 4.x)?
>>
>> What is the reasoning for this proposal?
>
> IMHO using autolearn without a correct learning process frequently poisons
> bayes data; I think bayes_auto_learn should be enabled only if you know
> what you are doing and not by default.
> I understand that changing a default value now could be a problem for
> users.
> Giovanni
Re: bayes_auto_learn default value
On 2/7/22 20:03, Henrik K wrote:
>
> On Mon, Feb 07, 2022 at 06:32:18PM +0100, Giovanni Bechis wrote:
>> Hi,
>> as per Mail::SpamAssassin::Conf(3), bayes_auto_learn defaults to 1/true.
>> Is anybody against changing its default value to 0/false on trunk (aka
>> SpamAssassin 4.x)?
>
> What is the reasoning for this proposal?
>
IMHO using autolearn without a correct learning process frequently poisons bayes data; I think bayes_auto_learn should be enabled only if you know what you are doing and not by default.
I understand that changing a default value now could be a problem for users.

Giovanni