Re: AWL functionality messed up?

2009-05-28 Thread Jeff Mincy
   From: Linda Walsh sa-u...@tlinx.org
   Date: Wed, 27 May 2009 17:28:35 -0700
   
   Jeff Mincy wrote:
   From: Linda Walsh sa-u...@tlinx.org
   Date: Wed, 27 May 2009 12:48:43 -0700
   
   Bowie Bailey wrote:  
   At face value, this seems very counter productive.
   
You still aren't understanding the wiki or the AWL scoring or what AWL
is trying to do.
   
Ah, but it only seems I'm daft, today...:-)
   
   If I get spam from 1000 senders, they all end up in my
   AWL???
   
yes.   every email+ip address pair that sends you email winds up in
your AWL with an average score for that pair.  This is ok.
   
GRRR...not so ok in my mindset, but ... and ... errr..
   well that only makes it more confusing, in a way...since I was
   only 99% certain that I'd never gotten any HAM from hostname
   '518501.com' (thinking for a short period that AWL might classify
   things by hosts as reliable or not, instead of, or in addition to
   by email-addr), but I'm 99.97% certain I've never gotten any HAM
   from user 'paypal.notify' (at) hostname '5185
   
It is using the relay IP address, not the hostname...
You've most likely received some other spam from this email+ip pair
that was scored as ham.  Hard to tell without seeing the original
scores.
   
   AWL should only be added to by emails judged to be 'ham' via
   the feed back mechanisms --, spammers shouldn't get bonuses for
   being repeat senders...
   
You are getting too attached to the 'whitelist' part of the name.
Pretend AWL stands for average weighting list.
   =
Aw...come on.  Isn't the world difficult enough without
   changing white to black or white to weighing?  I mean, we humans
   have enough trouble agreeing on what our symbols, words mean in
   relation to concepts and all without ya goin' and redefining perfectly
   good acceptable symbols to mean something else completely and still
   claim it to be some semblance of English.   No wonder most of the
   non-techno-literate humans on this world regard us techies with
   a hint of suspicion regarding the difficulty of problems.  We go around
   redefining words to suit reality and catch the heat when the rest of
   the world doesn't understand our meaning:
   
I don't think AWL is the best possible name for the functionality,
simply because it is easy to misinterpret.

AWL isn't whitelisting spammers.   It is pushing the score to the
average for that sender.   The sender can have a high average or a low
average.   
   ---
An average?  So it keeps the scores of all the past emails of every
   email we ever got sent?  Must just store a weighted average -- otherwise
   the space (hmm...someone said something about 80MB+ auto-whitelist DB
   files?)
   
AWL tracks the total score and the number of messages.
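
A minimal sketch of that bookkeeping in Python, not SpamAssassin's actual
code: each sender key holds only a running total and a message count, so
the mean is recomputed on demand and no per-message history is kept.  The
names AwlEntry and AwlStore and the (address, IP prefix) key are
illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class AwlEntry:
    """Running totals for one sender key (hypothetical structure)."""
    total_score: float = 0.0
    count: int = 0

    def mean(self) -> float:
        # Average of all scores recorded so far; 0.0 for a sender never seen.
        return self.total_score / self.count if self.count else 0.0


class AwlStore:
    """In-memory stand-in for the AWL database (illustration only)."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, str], AwlEntry] = {}

    def record(self, address: str, ip_prefix: str, score: float) -> AwlEntry:
        # One entry per (address, IP prefix) pair; only two numbers are kept.
        entry = self.entries.setdefault((address.lower(), ip_prefix), AwlEntry())
        entry.total_score += score
        entry.count += 1
        return entry


store = AwlStore()
for s in (10.0, 14.0, 6.0):
    entry = store.record("someone@example.com", "192.0", s)
print(entry.count, entry.mean())
```

Storage therefore grows with the number of distinct sender keys, not with
the number of messages received.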

Why not call it the Historically Based Score Normalizer or
   HBSN module?  Db file could be historical-norms or something.
   
Call it BOB if that will help ...
   
If the previous email from a particular sender was FP or FN then AWL
will have an incorrect average and will wind up doing or trying to do
the wrong thing with subsequent email for that sender.
   
Maybe it shouldn't add in the 'average' unless it exceeds
   the 'auto-learning threshold'??  I.e. something like the
   'bayes_auto_learn_threshold_nonspam' for HAM and the
   'bayes_auto_learn_threshold_spam' for SPAM.  Assuming it doesn't
   already do such a thing, it would make a little sense...so as
   not to train it on 'bad data'...
   
Perhaps.   I don't have a particularly strong opinion.

When I run sa-learn --spam email over a message, can I
   assume (or is it the case) that telling SA, a message was 'spam'
   would assign a sufficiently large value to the 'HBSN' value for that
   sender to reduce any effect of having falsely (if it is likely to happen)
   incorrect value?
   
Nope.

Or might I at least assume that each sa-learn over a message
   will modify its AWL score appropriately?
   
no.  You shouldn't assume.  sa-learn doesn't modify the AWL entry.
You can use spamassassin --add-to-blacklist.

You can remove addresses using spamassassin --remove-from-whitelist
   
Yes...saw that after visiting the wiki.  Is there a
   --show-whitelist-with-current-scores-and-their-weight switch as well
   (as opposed to one that only showed the addr's in the white list, or only
   showed the non-weighted scores)?
   
If I understand what you are asking for here, you can add an X-Spam-AWL
header that gives you the current scores:
  add_header all AWL awl=_AWL_, mean=_AWLMEAN_, count=_AWLCOUNT_, prescore=_AWLPRESCORE_
The awl scores are stored in a database file.  You can do db type
things with the awl file.
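
If that header is enabled, the values can be pulled out of a delivered
message with a few lines of Python.  This is only a sketch: it assumes the
header is named X-Spam-AWL and is formatted exactly as in the add_header
line above.  The sample message, its numbers, and the parse_awl_header
helper are made up for illustration.

```python
import email
import re

SAMPLE = """\
From: someone@example.com
Subject: test
X-Spam-AWL: awl=-3.0, mean=6.0, count=5, prescore=12.0

body
"""


def parse_awl_header(raw_message: str) -> dict:
    """Return the key=value pairs from X-Spam-AWL as floats, or {} if absent."""
    msg = email.message_from_string(raw_message)
    value = msg.get("X-Spam-AWL")
    if value is None:
        return {}
    # Matches pairs such as "mean=6.0"; the number may carry a minus sign.
    pairs = re.findall(r"(\w+)=(-?\d+(?:\.\d+)?)", value)
    return {name: float(number) for name, number in pairs}


print(parse_awl_header(SAMPLE))
```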

Thanks...and um...
How difficult would it be to have the name of the module reflect
   what it's 

AWL functionality messed up?

2009-05-27 Thread Linda Walsh

Bowie Bailey wrote:

Linda Walsh wrote:


I got a really poorly scored piece of spam -- one thing that stood out
as weird was report claimed the sender was in my AWL.


Any sender who has sent mail to you previously will be in your AWL.  
This is probably the most misunderstood component of SA.  Read the wiki.


http://wiki.apache.org/spamassassin/AutoWhitelist




At face value, this seems very counter productive.

If I get spam from 1000 senders, they all end up in my
AWL???

WTF?

AWL should only be added to by emails judged to be 'ham' via
the feed back mechanisms --, spammers shouldn't get bonuses for
being repeat senders...

How do I delete spammer addresses from my 'auto-white-list'?

(That's just insane..whitelisting spammers?!?!)




Re: AWL functionality messed up?

2009-05-27 Thread Bowie Bailey

Linda Walsh wrote:

Bowie Bailey wrote:

Linda Walsh wrote:


I got a really poorly scored piece of spam -- one thing that stood out
as weird was report claimed the sender was in my AWL.


Any sender who has sent mail to you previously will be in your AWL.  
This is probably the most misunderstood component of SA.  Read the wiki.


http://wiki.apache.org/spamassassin/AutoWhitelist




At face value, this seems very counter productive.

If I get spam from 1000 senders, they all end up in my
AWL???

WTF?

AWL should only be added to by emails judged to be 'ham' via
the feed back mechanisms --, spammers shouldn't get bonuses for
being repeat senders...

How do I delete spammer addresses from my 'auto-white-list'?

(That's just insane..whitelisting spammers?!?!)


Did you read the wiki link that I gave you???

Despite its name, this is NOT a simple whitelist.  It is a score 
averaging system.  It will attempt to adjust a sender's score based on 
their past history.  So when a friend who normally sends low-scoring 
emails forwards you something that matches a bunch of spam rules, this 
will push the score back down towards his previous average.  Similarly, 
when a spammer sends something that doesn't match many rules, the score 
gets pushed back up towards his previous average.


Spammers don't get bonuses for being repeat senders, they get 
penalized.  Take another look:


http://wiki.apache.org/spamassassin/AutoWhitelist

and also:

http://wiki.apache.org/spamassassin/AwlWrongWay

--
Bowie


Re: AWL functionality messed up?

2009-05-27 Thread Jeff Mincy
   From: Linda Walsh sa-u...@tlinx.org
   Date: Wed, 27 May 2009 12:48:43 -0700
   
   Bowie Bailey wrote:
Linda Walsh wrote:
   
I got a really poorly scored piece of spam -- one thing that stood out
as weird was report claimed the sender was in my AWL.

Any sender who has sent mail to you previously will be in your AWL.  
This is probably the most misunderstood component of SA.  Read the wiki.

http://wiki.apache.org/spamassassin/AutoWhitelist
   
   
   At face value, this seems very counter productive.
   
You still aren't understanding the wiki or the AWL scoring or what AWL
is trying to do.

   If I get spam from 1000 senders, they all end up in my
   AWL???
   
yes.   every email+ip address pair that sends you email winds up in
your AWL with an average score for that pair.  This is ok.

   WTF?
   
   AWL should only be added to by emails judged to be 'ham' via
   the feed back mechanisms --, spammers shouldn't get bonuses for
   being repeat senders...
   
You are getting too attached to the 'whitelist' part of the name.
Pretend AWL stands for average weighting list.

   How do I delete spammer addresses from my 'auto-white-list'?
   
   (That's just insane..whitelisting spammers?!?!)

AWL isn't whitelisting spammers.   It is pushing the score to the
average for that sender.   The sender can have a high average or a low
average.   

If the previous email from a particular sender was FP or FN then AWL
will have an incorrect average and will wind up doing or trying to do
the wrong thing with subsequent email for that sender.

You can remove addresses using spamassassin --remove-from-whitelist

-jeff


Re: AWL functionality messed up?

2009-05-27 Thread LuKreme

On 27-May-2009, at 13:48, Linda Walsh wrote:

Bowie Bailey wrote:

Linda Walsh wrote:


I got a really poorly scored piece of spam -- one thing that stood out
as weird was report claimed the sender was in my AWL.

Any sender who has sent mail to you previously will be in your AWL.
This is probably the most misunderstood component of SA.  Read the wiki.

http://wiki.apache.org/spamassassin/AutoWhitelist



At face value, this seems very counter productive.


At face value, you still haven't read the docs, have you?


If I get spam from 1000 senders, they all end up in my
AWL???


Yep.


WTF?


Read the docs.


AWL should only be added to by emails judged to be 'ham' via


No, you are confused. This is common, lots of people are confused  
about this. This is why many people think the name needs to be changed  
to Averaged Weight List or something similar.



the feed back mechanisms --, spammers shouldn't get bonuses for
being repeat senders...


that's not how the AWL works.  In fact, spammers get MORE points for  
being repeat senders.



How do I delete spammer addresses from my 'auto-white-list'?


That's a very bad idea.

--
++?++ Out of Cheese Error. Redo From Start.



Re: AWL functionality messed up?

2009-05-27 Thread Linda Walsh

Jeff Mincy wrote:

   From: Linda Walsh sa-u...@tlinx.org
   Date: Wed, 27 May 2009 12:48:43 -0700
   
   Bowie Bailey wrote:  

   At face value, this seems very counter productive.
   
You still aren't understanding the wiki or the AWL scoring or what AWL
is trying to do.


Ah, but it only seems I'm daft, today...:-)


   If I get spam from 1000 senders, they all end up in my
   AWL???
   
yes.   every email+ip address pair that sends you email winds up in
your AWL with an average score for that pair.  This is ok.


GRRR...not so ok in my mindset, but ... and ... errr..
well that only makes it more confusing, in a way...since I was
only 99% certain that I'd never gotten any HAM from hostname
'518501.com' (thinking for a short period that AWL might classify
things by hosts as reliable or not, instead of, or in addition to
by email-addr), but I'm 99.97% certain I've never gotten any HAM
from user 'paypal.notify' (at) hostname '5185



   AWL should only be added to by emails judged to be 'ham' via
   the feed back mechanisms --, spammers shouldn't get bonuses for
   being repeat senders...
   
You are getting too attached to the 'whitelist' part of the name.
Pretend AWL stands for average weighting list.

=
Aw...come on.  Isn't the world difficult enough without
changing white to black or white to weighing?  I mean, we humans
have enough trouble agreeing on what our symbols, words mean in
relation to concepts and all without ya goin' and redefining perfectly
good acceptable symbols to mean something else completely and still
claim it to be some semblance of English.   No wonder most of the
non-techno-literate humans on this world regard us techies with
a hint of suspicion regarding the difficulty of problems.  We go around
redefining words to suit reality and catch the heat when the rest of
the world doesn't understand our meaning:

Pointy-Haired Boss: Well, how long did you say it would take?

Geek: Well, I said it was 3-4 weeks worth of work.

PHB: Then why has it been 6 weeks with no product? I told you
  anything over 4 weeks was unacceptable!

G: 6 weeks, but...to get under 4 weeks, I assumed you were talking
168-hour pure-programming time weeks -- not CALENDAR weeks!



AWL isn't whitelisting spammers.   It is pushing the score to the
average for that sender.   The sender can have a high average or a low
average.   

---
An average?  So it keeps the scores of all the past emails of every
email we ever got sent?  Must just store a weighted average -- otherwise
the space (hmm...someone said something about 80MB+ auto-whitelist DB
files?)

Why not call it the Historically Based Score Normalizer or
HBSN module?  Db file could be historical-norms or something.



If the previous email from a particular sender was FP or FN then AWL
will have an incorrect average and will wind up doing or trying to do
the wrong thing with subsequent email for that sender.


Maybe it shouldn't add in the 'average' unless it exceeds
the 'auto-learning threshold'??  I.e. something like the
'bayes_auto_learn_threshold_nonspam' for HAM and the
'bayes_auto_learn_threshold_spam' for SPAM.  Assuming it doesn't
already do such a thing, it would make a little sense...so as
not to train it on 'bad data'...

When I run sa-learn --spam email over a message, can I
assume (or is it the case) that telling SA, a message was 'spam'
would assign a sufficiently large value to the 'HBSN' value for that
sender to reduce any effect of having falsely (if it is likely to happen)
incorrect value?

Or might I at least assume that each sa-learn over a message
will modify its AWL score appropriately?



You can remove addresses using spamassassin --remove-from-whitelist


Yes...saw that after visiting the wiki.  Is there a
--show-whitelist-with-current-scores-and-their-weight switch as well
(as opposed to one that only showed the addr's in the white list, or only
showed the non-weighted scores)?


Thanks...and um...
How difficult would it be to have the name of the module reflect
what it's actually doing?  maybe roll out a name change with the next
.dot release of SA?  (3.3? 3.4?)  Might alleviate some amount of
confusion(?)...

Does the AWL also keep track of when it last saw an 'email' addr
so it can 'expire' the oldest entries so the db doesn't grow to eventually
consume all forms of matter and energy in the universe?  :-)

Thanks for the clarification and info!!

-linda


Re: AWL functionality messed up?

2009-05-27 Thread Spiro Harvey
Linda Walsh sa-u...@tlinx.org wrote:
 We go
 around redefining words to suit reality and catch the heat when the
 rest of the world doesn't understand our meaning:

Please repeat after me:

AWL is not an auto whitelist
AWL is not an auto whitelist
AWL is not an auto whitelist

It's one of those funny jokes, like GNU. Feel free to click your heels
together while repeating this affirmation, just whatever you do, DON'T
say it in front of a mirror. Seriously, there's a crater somewhere in
Mexico where a data warehouse used to sit the last time someone tried
that.

   An average?  So it keeps the scores of all the past emails of
 every email we ever got sent?  Must just store a weighted average --
 otherwise the space (hmm...someone said something about 80MB+
 auto-whitelist DB files?)

Time to upgrade those 80MB drives, huh?

   How difficult would it be to have the name of the module
 reflect what it's actually doing?  maybe roll out a name change with
 the next .dot release of SA?  (3.3? 3.4?)  Might alleviate some
 amount of confusion(?)...

Why? It's not broken. Just pretend it stands for Averaged Weight List,
and then you'll be able to sleep at night.

Oh, and there's no need to reply to all. You're on a mailing list, so
anybody who sent you a message from it is already on the list,
and will get your replies.

-- 
Top-posting is the computer equivalent of mailing a letter glued
to the *outside* of an envelope, with a stamp attached via paper clip.
-- Xcott Craver




Re: AWL functionality messed up?

2009-05-27 Thread Benny Pedersen

On Wed, May 27, 2009 21:48, Linda Walsh wrote:
 http://wiki.apache.org/spamassassin/AutoWhitelist
 At face value, this seems very counter productive.

read the docs one more time

 If I get spam from 1000 senders, they all end up in my
 AWL???

yes

 WTF?

not here please

 AWL should only be added to by emails judged to be 'ham' via
 the feed back mechanisms --, spammers shouldn't get bonuses for
 being repeat senders...

they don't either, AWL tracks the sender ip also

but i agree it's silly doing it with a fuss of /16

 How do I delete spammer addresses from my 'auto-white-list'?

perldoc Mail::SpamAssassin::Conf

 (That's just insane..whitelisting spammers?!?!)

it's NOT a whitelist

-- 
http://localhost/ 100% uptime and 100% mirrored :)



Re: AWL functionality messed up?

2009-05-27 Thread Matt Kettler
Linda Walsh wrote:
 Bowie Bailey wrote:
 Linda Walsh wrote:

 I got a really poorly scored piece of spam -- one thing that stood out
 as weird was report claimed the sender was in my AWL.

 Any sender who has sent mail to you previously will be in your AWL. 
 This is probably the most misunderstood component of SA.  Read the wiki.

 http://wiki.apache.org/spamassassin/AutoWhitelist


 
 At face value, this seems very counter productive.
It's obvious you're taking it at face value and you've not read the
URL above.

You're seeing whitelist in the name, and believing it. Sorry the name
is misleading, but the AWL is not a whitelist.

 If I get spam from 1000 senders, they all end up in my
 AWL???

 WTF?
You're leaping to wildly incorrect conclusions, mostly because you're
assuming the AWL is a whitelist. It's not.

*READ* the URL above. No, really READ IT. You don't understand the AWL yet.

 AWL should only be added to by emails judged to be 'ham' via
 the feed back mechanisms --, spammers shouldn't get bonuses for
 being repeat senders...
Who says they get bonuses just for being a repeat sender?? They get
bonuses or penalties, all depending.

The AWL isn't a whitelist Linda. It's an averager. It can whitelist or
blacklist messages. If they send a message that scores less than their
previous average, they get a positive AWL score (blacklisting). If they
send one that's higher they get a negative score (whitelisting).

HOWEVER, in the AWL, a simple look at the positive or negative sign on
the score doesn't really tell you much.

Take this example: Pre-AWL score +12, AWL -2, Final score +10. What
did the AWL think of this sender based on history? +6, spammer.

If the same sender instead sent: Pre-AWL score +4 the AWL would hit at
+1.0 resulting in Final score +5.0.

End result: same sender, different messages, different signs on the AWL,
but both are still tagged as spam. And in one example, a false negative
was avoided based on their history.
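
The arithmetic can be sketched in a few lines of Python.  This assumes the
adjustment is (historical mean - current score) multiplied by a factor,
with the factor assumed to be 0.5 (the default for auto_whitelist_factor);
awl_adjust is a made-up helper, not SpamAssassin code.  With a historical
mean of +6 the second message comes out at +1.0 as above, while the first
comes out at -3 rather than -2, so the figures in the example are best
read as approximate.

```python
def awl_adjust(pre_score: float, historical_mean: float, factor: float = 0.5) -> float:
    """Adjustment that pulls the score part of the way toward the sender's mean."""
    return (historical_mean - pre_score) * factor


for pre in (12.0, 4.0):
    delta = awl_adjust(pre, historical_mean=6.0)
    print(f"pre-AWL {pre:+.1f}  AWL {delta:+.1f}  final {pre + delta:+.1f}")
```

A message scoring above the sender's mean gets a negative adjustment and
one scoring below it gets a positive adjustment, which is why the sign
alone says little about the sender.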


 How do I delete spammer addresses from my 'auto-white-list'?
spamassassin --remove-addr-from-whitelist=...@example.com


 (That's just insane..whitelisting spammers?!?!)
No, it's insane to have the AWL named AWL, because it's not a white list.

It's really a history-based score averaging system with automatic
whitelisting and blacklisting effects. However, AHBSASWAWB is an awfully
long name.

I *REALLY* suggest you read up on how the AWL works, for real, before
jumping to conclusions about what it is, and what it does. It really
doesn't work the way you think.