Re: bayes_auto_learn default value

2022-02-08 Thread Henrik K
On Tue, Feb 08, 2022 at 05:33:59AM -0500, Kevin A. McGrail wrote:
> 
> Since we don't seem to have consensus on changing the default does anybody
> object to a pre-file that disables it? That would be more clearly
> documented, and people will look at the pre-file for V4.

Good grief, bundled pre-files are not meant for config clauses.  They are
supposed to load modules.  Defaults are to be changed in the _codebase_
(Conf.pm) and not in a pre-file, which might or might not be loaded by
someone.  It makes no sense to have defaults in two separate places.
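
A minimal sketch of that distinction, assuming the usual layout where bundled .pre files carry only loadplugin lines and site overrides go in local.cf (the file names in the comments are illustrative assumptions, not the actual v4 layout):

```
# Bundled .pre file (hypothetical name, e.g. a v4-era .pre): module loading only
loadplugin Mail::SpamAssassin::Plugin::Bayes
loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold

# local.cf: a site that wants auto-learning off overrides the
# built-in default (set in Conf.pm) here, not in a .pre file
bayes_auto_learn 0
```

With this split the default lives in exactly one place (the code), and anything in local.cf is clearly a local decision.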

-1 for default change anyway, why bother.



Re: bayes_auto_learn default value

2022-02-08 Thread John Hardin

On Tue, 8 Feb 2022, Kevin A. McGrail wrote:


> Since we don't seem to have consensus on changing the default does anybody
> object to a pre-file that disables it? That would be more clearly
> documented, and people will look at the pre-file for V4.


+1 for making it explicitly disabled in the v4.0 PRE file.

I was going to respond "I have no objections" to the initial request but 
I've been recovering from a hardware failure in my mail server... :(




> Regards, KAM
>
> On Tue, Feb 8, 2022, 04:43 Giovanni Bechis wrote:
>
>> On 2/7/22 20:03, Henrik K wrote:
>>>
>>> On Mon, Feb 07, 2022 at 06:32:18PM +0100, Giovanni Bechis wrote:
>>>> Hi,
>>>> as per Mail::SpamAssassin::Conf(3), bayes_auto_learn defaults to 1/true.
>>>> Is anybody against changing its default value to 0/false on trunk (aka
>>>> SpamAssassin 4.x) ?
>>>
>>> What is the reasoning for this proposal?
>>>
>> IMHO using autolearn without a correct learning process frequently poisons
>> bayes data, I think bayes_auto_learn should be enabled only if you know
>> what you are doing and not by default.
>> I understand that changing a default value now could be a problem for
>> users.
>>  Giovanni





--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The yardstick you should use when considering whether to support a
  given piece of legislation is "what if my worst enemy is chosen to
  administer this law?"
---
 74 more days working to pay your (average) annual US tax bill
 before you're finally working for yourself.


Re: bayes_auto_learn default value

2022-02-08 Thread Bill Cole
On 2022-02-08 at 07:46:17 UTC-0500 (Tue, 8 Feb 2022 13:46:17 +0100)
Axb is rumored to have said:

> On 2/8/22 11:33, Kevin A. McGrail wrote:
>> Auto learning is something that should never have existed. All it does is
>> reinforce misclassification and slowly spirals the database into having
>> wrong answers be more wrong.
>
> I don't agree - I've been running autolearn for years and my bayes results
> have always been solid.
> (and I'm speaking of a global bayes redis DB in a 200k user setup)

With substantially smaller systems (my own personal server and those I manage 
for my employer) I have the same benign experience. I don't think we should 
disable auto-learn by default *in any way* without actual research and hard 
data beyond anecdotal experience.


> Where I see potential is in optimizing auto expiration when using a file 
> based DB. Very often DB is locked and tokens cannot be expired which leads to 
> what you call "reinforce misclassification". If tokens are expired regularly, 
> skewing is very improbable.
> Thankfully, using Redis, it's way more controllable.

I think that's also not a problem for systems that are not persistently loaded 
with in-process mail.

All we see as SA maintainers are our own systems and cases that people are 
having problems with. I don't think we really know whether auto-learn works 
well generally or why/how it breaks when it does.

>> Since we don't seem to have consensus on changing the default does anybody
>> object to a pre-file that disables it? That would be more clearly
>> documented, and people will look at the pre-file for V4.
>
> I'm -1 for disabling, one way or another.

Same. It would substantially change how people's existing stable systems
operate.

I'm less averse to tweaking default auto-learning parameters. In ALL cases 
where I use auto-learn I have reduced both thresholds, so I learn as ham ONLY 
mail with negative scores (< -0.1, so effectively at least 2 ham-signs...) and 
learn as spam substantially more than just the absurdly spammy stuff. This 
sacrifices some overall effectiveness in theory but I think it also helps make 
Bayes less brittle. I have NOT done rigorous testing to prove that.
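
A sketch of what such a setup might look like in local.cf, using the threshold options provided by the AutoLearnThreshold plugin. The -0.1 ham threshold is taken from the description above; the spam threshold of 8.0 is purely an assumed placeholder, since no figure is given in the message. For comparison, the stock defaults are 0.1 (ham) and 12.0 (spam).

```
# Keep auto-learning enabled (this is the current built-in default)
bayes_auto_learn 1

# Learn as ham only mail that scores below -0.1 (stock default: 0.1)
bayes_auto_learn_threshold_nonspam -0.1

# Learn as spam from a lower score than the stock 12.0
# (8.0 is an illustrative assumption, not a value from this thread)
bayes_auto_learn_threshold_spam 8.0
```

Whether thresholds like these actually make Bayes more robust is, as stated above, untested.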

I believe that SA has reached the point of broad use where we should be making 
substantial change decisions based on hard data rather than anecdote and lore.

-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: bayes_auto_learn default value

2022-02-08 Thread Axb

On 2/8/22 11:33, Kevin A. McGrail wrote:

> Auto learning is something that should never have existed. All it does is
> reinforce misclassification and slowly spirals the database into having
> wrong answers be more wrong.


I don't agree - I've been running autolearn for years and my bayes
results have always been solid.

(and I'm speaking of a global bayes redis DB in a 200k user setup)

Where I see potential is in optimizing auto expiration when using a file 
based DB. Very often DB is locked and tokens cannot be expired which 
leads to what you call "reinforce misclassification". If tokens are 
expired regularly, skewing is very improbable.

Thankfully, using Redis, it's way more controllable.
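
A sketch of the kind of Redis-backed configuration alluded to above, using options documented for Mail::SpamAssassin::BayesStore::Redis. The server address, database number and TTL values are placeholders chosen for illustration, not the settings of the 200k-user system described here.

```
# Redis-backed Bayes store; tokens age out through per-key TTLs
# rather than through an expiry run that needs a lock on a file-based DB
bayes_store_module  Mail::SpamAssassin::BayesStore::Redis
bayes_sql_dsn       server=127.0.0.1:6379;database=2
bayes_token_ttl     21d
bayes_seen_ttl      8d

# File-based DB alternative (assumption: expiry is moved out of the
# scanning path): disable opportunistic expiry here and run
# "sa-learn --force-expire" from cron instead
# bayes_auto_expire 0
```

Either approach is aimed at the same goal: making sure old tokens are expired regularly so that learned data does not skew over time.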


> Since we don't seem to have consensus on changing the default does anybody
> object to a pre-file that disables it? That would be more clearly
> documented, and people will look at the pre-file for V4.


I'm -1 for disabling, one way or another.



> Regards, KAM
>
> On Tue, Feb 8, 2022, 04:43 Giovanni Bechis wrote:
>
>> On 2/7/22 20:03, Henrik K wrote:
>>>
>>> On Mon, Feb 07, 2022 at 06:32:18PM +0100, Giovanni Bechis wrote:
>>>> Hi,
>>>> as per Mail::SpamAssassin::Conf(3), bayes_auto_learn defaults to 1/true.
>>>> Is anybody against changing its default value to 0/false on trunk (aka
>>>> SpamAssassin 4.x) ?
>>>
>>> What is the reasoning for this proposal?
>>>
>> IMHO using autolearn without a correct learning process frequently poisons
>> bayes data, I think bayes_auto_learn should be enabled only if you know
>> what you are doing and not by default.
>> I understand that changing a default value now could be a problem for
>> users.
>>  Giovanni







Re: bayes_auto_learn default value

2022-02-08 Thread Kevin A. McGrail
Auto learning is something that should never have existed. All it does is
reinforce misclassification and slowly spirals the database into having
wrong answers be more wrong.

Since we don't seem to have consensus on changing the default does anybody
object to a pre-file that disables it? That would be more clearly
documented, and people will look at the pre-file for V4.

Regards, KAM

On Tue, Feb 8, 2022, 04:43 Giovanni Bechis wrote:

> On 2/7/22 20:03, Henrik K wrote:
> >
> > On Mon, Feb 07, 2022 at 06:32:18PM +0100, Giovanni Bechis wrote:
> >> Hi,
> >> as per Mail::SpamAssassin::Conf(3), bayes_auto_learn defaults to 1/true.
> >> Is anybody against changing its default value to 0/false on trunk (aka
> >> SpamAssassin 4.x) ?
> >
> > What is the reasoning for this proposal?
> >
> IMHO using autolearn without a correct learning process frequently poisons
> bayes data, I think bayes_auto_learn should be enabled only if you know
> what you are doing and not by default.
> I understand that changing a default value now could be a problem for
> users.
>  Giovanni
>


Re: bayes_auto_learn default value

2022-02-08 Thread Giovanni Bechis
On 2/7/22 20:03, Henrik K wrote:
> 
> On Mon, Feb 07, 2022 at 06:32:18PM +0100, Giovanni Bechis wrote:
>> Hi,
>> as per Mail::SpamAssassin::Conf(3), bayes_auto_learn defaults to 1/true.
>> Is anybody against changing its default value to 0/false on trunk (aka 
>> SpamAssassin 4.x) ?
> 
> What is the reasoning for this proposal?
> 
IMHO using autolearn without a correct learning process frequently poisons 
bayes data, I think bayes_auto_learn should be enabled only if you know what 
you are doing and not by default.
I understand that changing a default value now could be a problem for users.
 Giovanni

