Re: correct AWL on training

Karsten Bräckelmann Thu, 04 Sep 2014 16:06:37 -0700

On Thu, 2014-09-04 at 09:11 -0600, Jesse Norell wrote:
> On Thu, 2014-09-04 at 13:04 +0200, Matus UHLAR - fantomas wrote:
> > On 03.09.14 15:13, Jesse Norell wrote:


> > >   Both today and in the past I've looked at some FP's that scored very
> > > high on AWL.  At least today I dug up the old messages that caused AWL
> > > to get out of line, and trained them as ham.  AWL's scores still show
> > > the high scores on those (in this case I manually corrected AWL).  It
> > > sure seems like manual training should at minimum remove the incorrect
> > > score from AWL, if not actually make an adjustment in the opposite
> > > direction.

I can see how one could wish for this.

However, keep in mind that these are entirely unrelated sub-systems. The
AWL really is just a rather simple historic score averager.
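To illustrate what "historic score averager" means, here is a minimal Python sketch. It assumes the behaviour described for the auto_whitelist_factor setting as I understand it (final score = score + (mean - score) * factor, with a default factor of 0.5) and keys entries by sender address plus originating net-block. The class and method names are hypothetical, and this is not the actual Mail::SpamAssassin::Plugin::AWL code.

```python
from dataclasses import dataclass


@dataclass
class AwlEntry:
    """History for one (sender address, net-block) pair: only running totals."""
    count: int = 0
    total: float = 0.0


class Awl:
    """Hypothetical historic score averager, not SpamAssassin's actual code."""

    def __init__(self, factor: float = 0.5) -> None:
        self.factor = factor  # assumed default of auto_whitelist_factor
        self.entries: dict[tuple[str, str], AwlEntry] = {}

    def adjust(self, sender: str, netblock: str, score: float) -> float:
        """Return the AWL-adjusted score and record this message in the history."""
        entry = self.entries.setdefault((sender, netblock), AwlEntry())
        final = score
        if entry.count > 0:
            mean = entry.total / entry.count
            # Pull the score part of the way toward the sender's historic mean.
            final = score + (mean - score) * self.factor
        # Assumption: the unadjusted score is what gets added to the totals.
        entry.count += 1
        entry.total += score
        return final


awl = Awl()
for s in (9.0, 8.0, 7.0):  # a few high-scoring messages from one sender
    awl.adjust("alice@example.com", "192.0.2", s)
print(awl.adjust("alice@example.com", "192.0.2", 1.0))  # prints 4.5
```

A clean message scoring 1.0 ends up at 4.5 purely because of the sender's history. Note that nothing in this sketch looks at Bayes at all.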

In this context it is also important to note that sa-learn is Bayes
only. Any other type of reporting, including AWL manipulation, is done
with spamc or spamassassin. Notably, the spamassassin executable is the
only one that can actually handle both.

The AWL manipulation options are rather limited: adding a high-scoring
positive or negative entry, or plainly removing an address. In
particular, unlike Bayes, the AWL doesn't work on a per-message basis,
so forgetting a single message's history entry is not supported.
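A small Python sketch of why there is nothing per-message to forget, assuming the store keeps only a message count and a total score per (address, net-block) pair. The function names are hypothetical; remove_address stands in for the coarse address removal mentioned above.

```python
# Hypothetical AWL-like store: (address, net-block) -> (message count, total score).
Key = tuple[str, str]
Store = dict[Key, tuple[int, float]]


def record(store: Store, key: Key, score: float) -> None:
    """Fold one message's score into the running totals for its key."""
    count, total = store.get(key, (0, 0.0))
    store[key] = (count + 1, total + score)


def remove_address(store: Store, address: str) -> None:
    """The only correction possible here: drop every entry for the address."""
    for key in [k for k in store if k[0] == address]:
        del store[key]


key = ("bob@example.com", "198.51.100")
a: Store = {}
b: Store = {}
for s in (10.0, 2.0):
    record(a, key, s)
for s in (6.0, 6.0):
    record(b, key, s)
# Two different histories leave identical state, so no single message can be forgotten.
print(a == b, a[key])  # prints: True (2, 12.0)
remove_address(a, "bob@example.com")
print(a)  # prints: {}
```

Once the scores are folded into totals, the store cannot tell which message contributed what, so the whole address entry is the smallest unit that can be corrected.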


> > spamassassin has options for manipulating the address list:
> > --add-to-whitelist --add-to-blacklist --remove-from-whitelist
> > --add-addr-to-whitelist --add-addr-to-blacklist --remove-addr-from-whitelist
> > 
> > and you can clean up AWL by using sa-awl.
> 
>   I can as an admin, but pop/imap users can't.  They can access the
> spam/ham training, it just doesn't correct the AWL data any.  In this

So you implemented a Bayes feedback / training mechanism for your POP
or IMAP users. SA doesn't provide one.

> case I'm looking at, a few messages came in first that got AWL way off,
> and now training it as ham (which is hard enough to get users to do)
> doesn't help the situation.  (Some of our systems allow the user access
> to whitelist, but unfortunately this one doesn't - they can't "fix it".)

Bayes training will have an effect of about 5.5 points at most, the
spread between BAYES_00 and BAYES_999. The real-life effect of training
is commonly about half that maximum, which is unlikely to suffice
against "way off" AWL scores. Besides, you're trying to correct the AWL
by means of Bayes training.
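To put rough numbers on this, here is a back-of-the-envelope Python sketch. The Bayes shifts (none, about half of 5.5, and the full 5.5) follow the figures above; the AWL mean of 12.0 and the pre-AWL score of 3.0 are made-up values, the 5.0 threshold is SpamAssassin's default required_score, and the pull toward the mean assumes the auto_whitelist_factor formula with its default of 0.5.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default spam threshold
AWL_FACTOR = 0.5      # assumed default auto_whitelist_factor


def final_score(base: float, awl_mean: float, factor: float = AWL_FACTOR) -> float:
    """Pull the message score part of the way toward the sender's historic mean."""
    return base + (awl_mean - base) * factor


awl_mean = 12.0  # hypothetical: polluted history for this sender
base = 3.0       # hypothetical pre-AWL score of the legitimate message
for shift in (0.0, 2.75, 5.5):  # no training, typical effect, maximum effect
    final = final_score(base - shift, awl_mean)
    verdict = "spam" if final >= REQUIRED_SCORE else "ham"
    print(f"Bayes shift {shift:4.2f}: final {final:.3f} -> {verdict}")
```

Under these assumptions only the maximum Bayes effect gets this message below threshold. Since the AWL pulls the score toward the historic mean, each point gained through Bayes moves the final score by only (1 - factor), half a point here, and the stored history itself is not touched by training at all.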


The question is: Why was the AWL score way off in the first place?

In your FP case, why have (more than one?) messages from that sender
address, originating from a given net-block, been classified as spam
before? Even worse, given that the AWL was "way off" and pulled the
score above the threshold, the previous messages recorded in the AWL
were not just spam, but spam with a high score. Again, why?


> > >   I.e., after training, AWL had a score of ~47 from 7 messages.  Seems like
> > > those FP scores should be subtracted, and even another -5 per message
> > > trained wouldn't hurt.  Likewise, FN should adjust AWL upwards on manual
> > > training, no?
> > 
> > I am not sure how the manual training should be done when talking about AWL.
> > The only way I think is to remove the address from AWL.
> 
>   Just adjusting the score would be another option.  "AWL, you got it
> wrong, let's take the score the other direction."  (or at least undo the
> mistake/damage it just did)  You could have a config option for how much
> adjustment to make in the other direction (maybe 3 to 5ish?).

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
