Re: DNSWL and JMF White false positives, what to do exactly?

2009-10-02 Thread RW
On Thu, 1 Oct 2009 18:54:40 -0600
LuKreme krem...@kreme.com wrote:

 On Oct 1, 2009, at 18:36, Karsten Bräckelmann
 guent...@rudersport.de wrote:
 
  Same for RCVD_IN_DNSWL. If it positively matches, it is either
  correct or wrong. A false positive is a match that is wrong, no
  matter the score you assign the test.
 
 Like others have said, you can make the words mean whatever you want.
 However, if you want to be understood you need to speak the Lingua
 Franca. If you choose to use a term differently than everyone else
 you WILL be misunderstood and corrected.

Except that so far the lunatics haven't taken over the asylum and you
are in a 3 to 2 minority, so please don't claim to be speaking for
everyone. 

A false match on a test is a false-positive. It doesn't reverse for a
ham test, simply because you're more used to thinking about spam tests. 

Do you apply the same usage to anything else? For example, do you
reverse the meaning of "off" and "on" for air-conditioning to make it
consistent with heating, so that "on" always means "make hotter"?


Re: DNSWL and JMF White false positives, what to do exactly?

2009-10-02 Thread Charles Gregory

On Fri, 2 Oct 2009, RW wrote:

>> However, if you want to be understood you need to speak the Lingua
>> Franca. If you choose to use a term differently than everyone else
>> you WILL be misunderstood and corrected.


If everyone calls an apple an orange, then yeah, it's an orange.


> A false match on a test is a false-positive. It doesn't reverse for a
> ham test, simply because you're more used to thinking about spam tests.


The distinction is whether the 'false positive' refers to the overall 
scoring of the message (FP=ham flagged as spam) or an individual test 
(FP=test triggered incorrectly). I consider *both* usages correct in this 
group. And as I vaguely recall, the OP did use sufficient context for even 
a lame-brain like myself to realize he meant the latter.


The FP on the named rule had the potential to cause an FN.
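
To make that concrete, here is a minimal sketch of the arithmetic. The rule names exist in SpamAssassin, but the scores and the 5.0 threshold below are invented for illustration and are not the stock values; the only assumption is that a message is tagged when its summed score reaches required_score.

```python
# Hypothetical scores for illustration only; not SpamAssassin's stock values.
REQUIRED_SCORE = 5.0

scores = {
    "BAYES_99": 3.5,            # spam rule (positive score)
    "URIBL_BLACK": 2.0,         # spam rule (positive score)
    "RCVD_IN_DNSWL_MED": -4.0,  # ham rule (negative score)
}


def total_score(hits, scores):
    """Sum the scores of every rule that hit the message."""
    return sum(scores[name] for name in hits)


def is_tagged_spam(hits, scores, required=REQUIRED_SCORE):
    """Overall classification: tagged if the total reaches the threshold."""
    return total_score(hits, scores) >= required


# A spam message on which both spam rules hit correctly.
without_wl = ["BAYES_99", "URIBL_BLACK"]
# The same spam, but the whitelist rule hits too: an FP at the rule level.
with_wl = without_wl + ["RCVD_IN_DNSWL_MED"]

print(total_score(without_wl, scores), is_tagged_spam(without_wl, scores))
print(total_score(with_wl, scores), is_tagged_spam(with_wl, scores))
```

With these made-up numbers the spam scores 5.5 and is tagged; once the whitelist rule hits as well, the total drops to 1.5 and the message is missed. The rule result is an FP, the overall result is an FN.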


> Do you apply the same usage to anything else? For example, do you
> reverse the meaning of "off" and "on" for air-conditioning to make it
> consistent with heating, so that "on" always means "make hotter"?


Do you TURN UP or TURN DOWN your air-conditioning?
Depends on whether someone has a simple numerical control
or is adjusting a thermostat. Plus colloquial usage, of course. :)
But yeah, you hit pretty close with your analogy. Just chose
the wrong words. :)

- Charles



Re: DNSWL and JMF White false positives, what to do exactly?

2009-10-02 Thread Marc Perkel



Charles Gregory wrote:

> On Fri, 2 Oct 2009, RW wrote:
>
>>> However, if you want to be understood you need to speak the Lingua
>>> Franca. If you choose to use a term differently than everyone else
>>> you WILL be misunderstood and corrected.
>
> If everyone calls an apple an orange, then yeah, it's an orange.
>
>> A false match on a test is a false-positive. It doesn't reverse for a
>> ham test, simply because you're more used to thinking about spam tests.
>
> The distinction is whether the 'false positive' refers to the overall
> scoring of the message (FP=ham flagged as spam) or an individual test
> (FP=test triggered incorrectly). I consider *both* usages correct in
> this group. And as I vaguely recall, the OP did use sufficient context
> for even a lame-brain like myself to realize he meant the latter.
>
> The FP on the named rule had the potential to cause an FN.
>
>> Do you apply the same usage to anything else? For example, do you
>> reverse the meaning of "off" and "on" for air-conditioning to make it
>> consistent with heating, so that "on" always means "make hotter"?
>
> Do you TURN UP or TURN DOWN your air-conditioning?
> Depends on whether someone has a simple numerical control
> or is adjusting a thermostat. Plus colloquial usage, of course. :)
> But yeah, you hit pretty close with your analogy. Just chose
> the wrong words. :)
>
> - Charles



Q. Do I make a left at the next intersection?
A. Right!



Re: DNSWL and JMF White false positives, what to do exactly?

2009-10-02 Thread mouss
RW wrote:
 On Fri, 02 Oct 2009 00:14:52 +0200
 mouss mo...@ml.netoyen.net wrote:
 
 RW wrote:
 
 The term  false-positive can apply to any test. A test for ham
 that matches a spam is a false-positive, it's a matter of context.
 spam too can be (re)defined. and actually any term. but it is assumed
 here that we talk about spam detection. so false negative means miss
 and false positive means false alarm. this is the common terminology
 inherited from intrusion detection.
 
 The term comes from statistics, not intrusion detection. I don't
 know much about the latter, perhaps people in that field are a little
 sloppy in their usage, more  likely all the tests are expressed as
 tests for intrusion, so the same kind of issue doesn't arise.
 
 The source of your confusion is that you are mixing up the terminology
 of the overall classification and individual test results. Think of
 it this way: in a fingerprint comparison the meanings of TP, TN, FP and
 FN are obvious and intrinsic to the test; it would be absurd to switch
 them around depending on whether it's evidence for the defence or
 prosecution.

let's take it more easily: Please explain to me what was an FP in this
thread.


Re: DNSWL and JMF White false positives, what to do exactly?

2009-10-02 Thread mouss
Karsten Bräckelmann wrote:
 On Fri, 2009-10-02 at 00:08 +0200, mouss wrote:
 Karsten Bräckelmann wrote:
 False positive. Something, that matches (positive) the criterion for a
 certain test, but should not (false).
 
 I stand by what I said.
 

I'm not surprised:)

 you can certainly devise a system to detect alpha(foo) where alpha is a
 function mapping a Banach space to a Hilbert Space, and define what FP,
 FN, FX mean in the context you consider. you can also say let PI=69,
 ... . but conventions are here for a reason. they allow us to
 understand each other more easily. the fact that children of today can
 solve computation problems that great scientists of the old times
 couldn't handle is thanks to conventions (think of a/b * c/d =
 (a*c)/(b*d), which looks trivial today, but wasn't before).

 when talking about spam or intrusion detection, FN means missing and
 FP means false alarm. if we allow defining FN and FP differently, then
 we'll need to rewrite a lot of books, reports, articles, ...
 
 IFF you are talking about the black box that spam detection is, that is
 true.
 
 If you are talking about a rule like FORGED_MUA_OUTLOOK, it appears to
 be that simple. However, it is not. You are looking at a single test,
 which -- if positive -- either is correct or wrong.
 

I understand the rationale, but I find this too abstract for common
discussions.

 Same for RCVD_IN_DNSWL. If it positively matches, it is either
 correct or wrong. A false positive is a match that is wrong, no matter
 the score you assign the test.
 

except that it depends what the test really means. dnswl doesn't mean
the listed hosts never send spam. I am happy that it lists debian list
servers, Orange, ... etc.

 
 This concept is NOT specific to spam detection, or even computer
 science. As a matter of fact, when I first really grasped the concept, a
 medical scientist explained it to me.
 

now that you say it, this is true. I too believe that medical science has
precedence in this area.

 Yes, a FP for a rule that identifies *ham* actually evaluated positive
 on a spam. It only appears to be spam centric on this list, cause it is
 mainly dedicated to identifying spam, not ham.
 
 You might want to ask wikipedia as well. And don't focus on the spam
 filtering *example*, which again exclusively talks about a rule
 identifying spam. Not ham.
 

my point was that in a spam oriented forum, the meaning of some words is
what most of us (yes, this is hard to define) think they mean. the
principle of least astonishment.


anyway, I'm sorry for bringing the discussion to this sand. so I will
stop here (of course, offlist is ok for any discussion, including
garbage without collection:)





Re: DNSWL and JMF White false positives, what to do exactly?

2009-10-02 Thread Karsten Bräckelmann
On Sat, 2009-10-03 at 00:25 +0200, mouss wrote:
 Karsten Bräckelmann wrote:

False positive. Something, that matches (positive) the criterion for a
certain test, but should not (false).
  
  I stand by what I said.
 
 I'm not surprised:)

;)


  IFF you are talking about the black box that spam detection is, that is
  true.
  
  If you are talking about a rule like FORGED_MUA_OUTLOOK, it appears to
  be that simple. However, it is not. You are looking at a single test,
  which -- if positive -- either is correct or wrong.
 
 I understand the rationale, but I find this too abstract for common
 discussions.

*shrug*  You're not obliged to participate in a thread, if it is
confusing to you. That's the wonders of open discussion and diverse
input. You might stumble upon something you didn't know before... ;)


  Same for RCVD_IN_DNSWL. If it positively matches, it is either
  correct or wrong. A false positive is a match that is wrong, no matter
  the score you assign the test.
 
 except that it depends what the test really means. dnswl doesn't mean
 the listed hosts never send spam. I am happy that it lists debian list
 servers, Orange, ... etc.

Exactly, in the context of a single rule (as opposed to detecting
spam), it depends on what the rule really means. Or in short, its
score's sign...


  This concept is NOT specific to spam detection, or even computer
  science. As a matter of fact, when I first really grasped the concept, a
  medical scientist explained it to me.
 
 now that you say it, this is true. I too believe that medical science has
 precedence in this area.
 
  Yes, a FP for a rule that identifies *ham* actually evaluated positive
  on a spam. It only appears to be spam centric on this list, cause it is
  mainly dedicated to identifying spam, not ham.
  
  You might want to ask wikipedia as well. And don't focus on the spam
  filtering *example*, which again exclusively talks about a rule
  identifying spam. Not ham.
 
 my point was that in a spam oriented forum, the meaning of some words is
 what most of us (yes, this is hard to define) think they mean. the
 principle of least astonishment.

Of course, these terms mostly come up WRT the overall score of a message,
which applies to detecting spam.

However, on this very list, people also commonly talk about single rules
FP'ing, *without* pushing the ham above the required_score threshold.


The only aspect new and obviously confusing to some regulars on this
list is the negative sign of the rule's score. Inverting the "is spam"
test logic also inverts the meaning of F[PN], whether one likes this or
not.

It's all about context.


And FWIW, it is wrong to base your definitions on what the majority
thinks is correct. The majority and what's believed to be common
knowledge too often is wrong. You can observe this in real life, too...
I prefer to educate the masses instead.


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: DNSWL and JMF White false positives, what to do exactly?

2009-10-02 Thread RW
On Sat, 03 Oct 2009 00:12:37 +0200
mouss mo...@ml.netoyen.net wrote:

 RW wrote:
  On Fri, 02 Oct 2009 00:14:52 +0200
  mouss mo...@ml.netoyen.net wrote:
  

  The source of your confusion is that you are mixing up the
  terminology of the overall classification and individual test
  results. Think of it this way: in a fingerprint comparison the
  meanings of TP, TN, FP and FN are obvious and intrinsic to the
  test; it would be absurd to switch them around depending on whether
  it's evidence for the defence or prosecution.
 
 let's take it more easily: Please explain to me what was an FP in this
 thread.

A test intended for identifying ham was being hit on spam.

A hit on a rule is a positive result. When a rule hits something it's
intended to identify, it's a true positive. When a rule hits something
it's not intended to identify, it's a false positive, and so on.

The same terminology can be used for SpamAssassin's overall spam
classification, but that's a different matter. If you talk about a rule
hit being an FN, because it might contribute to a classification FN then
you are using the terminology like a cargo-cultist.
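
The two levels of terminology can be kept apart mechanically. The following is only a sketch, under the simplifying assumption that every rule is a binary test for exactly one class (spam or ham); the function names are made up for this example and are not part of SpamAssassin.

```python
def rule_outcome(rule_targets, message_is, rule_hit):
    """Label one rule's result on one message.

    rule_targets: "spam" or "ham", what the rule is meant to identify.
    message_is:   "spam" or "ham", the true class of the message.
    rule_hit:     True if the rule matched the message.
    """
    should_hit = (rule_targets == message_is)
    if rule_hit:
        return "TP" if should_hit else "FP"
    return "FN" if should_hit else "TN"


def classification_outcome(message_is, tagged_spam):
    """Label the overall verdict, where 'positive' means 'tagged as spam'."""
    return rule_outcome("spam", message_is, tagged_spam)


# A ham rule (e.g. a whitelist test) hitting a spam message.
print(rule_outcome("ham", "spam", True))
# The same spam ending up below the threshold, i.e. not tagged.
print(classification_outcome("spam", False))
```

Here rule_outcome depends only on what the rule is meant to identify, while classification_outcome is the same test with "tagged as spam" as the positive result. The first call yields FP and the second FN, which is the pairing discussed in this thread.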



Re: DNSWL and JMF White false positives, what to do exactly?

2009-10-01 Thread mouss
Karsten Bräckelmann wrote:
 On Wed, 2009-09-30 at 23:35 +0200, mouss wrote:
 Warren Togami wrote:
 I scanned my spam folders and found a few false positives that hit on
 either DNSWL 
 FP with DNSWL?

 FP = False Positive = legitimate mail tagged as spam
 DNSWL = Whitelist
 
 False positive. Something, that matches (positive) the criterion for a
 certain test, but should not (false).
 
 if your system adds points because of dnswl, you have a serious problem. ..

 or do you mean FN (false negative)?
 
 Granted, the wording ("FPs that hit ham rules") could use some polish,
 but I believe Warren was talking about spam that falsely hits ham rules.
 
 


you can certainly devise a system to detect alpha(foo) where alpha is a
function mapping a Banach space to a Hilbert Space, and define what FP,
FN, FX mean in the context you consider. you can also say let PI=69,
... . but conventions are here for a reason. they allow us to
understand each other more easily. the fact that children of today can
solve computation problems that great scientists of the old times
couldn't handle is thanks to conventions (think of a/b * c/d =
(a*c)/(b*d), which looks trivial today, but wasn't before).

when talking about spam or intrusion detection, FN means missing and
FP means false alarm. if we allow defining FN and FP differently, then
we'll need to rewrite a lot of books, reports, articles, ...




Re: DNSWL and JMF White false positives, what to do exactly?

2009-10-01 Thread mouss
RW wrote:
 On Wed, 30 Sep 2009 23:35:31 +0200
 mouss mo...@ml.netoyen.net wrote:
 
 Warren Togami wrote:
 I scanned my spam folders and found a few false positives that hit
 on either DNSWL 
 FP with DNSWL?

 FP = False Positive = legitimate mail tagged as spam
 DNSWL = Whitelist
 
 The term  false-positive can apply to any test. A test for ham
 that matches a spam is a false-positive, it's a matter of context.

spam too can be (re)defined. and actually any term. but it is assumed
here that we talk about spam detection. so false negative means miss
and false positive means false alarm. this is the common terminology
inherited from intrusion detection.

I used to have a clock that was anti-clockwise. but it was for fun. I
always understood what clockwise meant.


Re: DNSWL and JMF White false positives, what to do exactly?

2009-10-01 Thread Karsten Bräckelmann
On Fri, 2009-10-02 at 00:08 +0200, mouss wrote:
 Karsten Bräckelmann wrote:
  False positive. Something, that matches (positive) the criterion for a
  certain test, but should not (false).

I stand by what I said.

 you can certainly devise a system to detect alpha(foo) where alpha is a
 function mapping a Banach space to a Hilbert Space, and define what FP,
 FN, FX mean in the context you consider. you can also say let PI=69,
 ... . but conventions are here for a reason. they allow us to
 understand each other more easily. the fact that children of today can
 solve computation problems that great scientists of the old times
 couldn't handle is thanks to conventions (think of a/b * c/d =
 (a*c)/(b*d), which looks trivial today, but wasn't before).
 
 when talking about spam or intrusion detection, FN means missing and
 FP means false alarm. if we allow defining FN and FP differently, then
 we'll need to rewrite a lot of books, reports, articles, ...

IFF you are talking about the black box that spam detection is, that is
true.

If you are talking about a rule like FORGED_MUA_OUTLOOK, it appears to
be that simple. However, it is not. You are looking at a single test,
which -- if positive -- either is correct or wrong.

Same for RCVD_IN_DNSWL. If it positively matches, it is either
correct or wrong. A false positive is a match that is wrong, no matter
the score you assign the test.


This concept is NOT specific to spam detection, or even computer
science. As a matter of fact, when I first really grasped the concept, a
medical scientist explained it to me.

Yes, a FP for a rule that identifies *ham* actually evaluated positive
on a spam. It only appears to be spam centric on this list, cause it is
mainly dedicated to identifying spam, not ham.

You might want to ask wikipedia as well. And don't focus on the spam
filtering *example*, which again exclusively talks about a rule
identifying spam. Not ham.


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: DNSWL and JMF White false positives, what to do exactly?

2009-10-01 Thread LuKreme
On Oct 1, 2009, at 18:36, Karsten Bräckelmann guent...@rudersport.de  
wrote:



> Same for RCVD_IN_DNSWL. If it positively matches, it is either
> correct or wrong. A false positive is a match that is wrong, no
> matter the score you assign the test.


Like others have said, you can make the words mean whatever you want.
However, if you want to be understood you need to speak the Lingua
Franca. If you choose to use a term differently than everyone else you
WILL be misunderstood and corrected.


Saying everyone else is wrong isn't going to help.
 

Re: DNSWL and JMF White false positives, what to do exactly?

2009-10-01 Thread RW
On Fri, 02 Oct 2009 00:14:52 +0200
mouss mo...@ml.netoyen.net wrote:

 RW wrote:

  The term  false-positive can apply to any test. A test for ham
  that matches a spam is a false-positive, it's a matter of context.
 
 spam too can be (re)defined. and actually any term. but it is assumed
 here that we talk about spam detection. so false negative means miss
 and false positive means false alarm. this is the common terminology
 inherited from intrusion detection.

The term comes from statistics, not intrusion detection. I don't
know much about the latter, perhaps people in that field are a little
sloppy in their usage, more  likely all the tests are expressed as
tests for intrusion, so the same kind of issue doesn't arise.

The source of your confusion is that you are mixing up the terminology
of the overall classification and individual test results. Think of
it this way: in a fingerprint comparison the meanings of TP, TN, FP and
FN are obvious and intrinsic to the test; it would be absurd to switch
them around depending on whether it's evidence for the defence or
prosecution.


Re: DNSWL and JMF White false positives, what to do exactly?

2009-09-30 Thread mouss
Warren Togami wrote:
 I scanned my spam folders and found a few false positives that hit on
 either DNSWL 

FP with DNSWL?

FP = False Positive = legitimate mail tagged as spam
DNSWL = Whitelist

if your system adds points because of dnswl, you have a serious problem. ..

or do you mean FN (false negative)?

 or JMF (HOSTKARMA?  See how confusing it is not knowing
 what to call it?)
 
 Is there an easy automated way we can forward FP's to DNSWL and JMF so
 their maintainers can decide what to do about the offending senders?

offending? then you probably mean FN.

yes, you can report offending IPs, if that makes sense. for example, if
the offending IP is that of an ISP relay, then don't report it: ISPs do
relay spam. if on the other hand you see FNs from paypal or bank of
blahblah, then do submit.

 I'd
 attach it to mail but it might get caught in the spam filter...
 

post the s(p)ample on a web site instead. you can use pastebin for example.


Re: DNSWL and JMF White false positives, what to do exactly?

2009-09-30 Thread Henrik K
On Wed, Sep 30, 2009 at 11:35:31PM +0200, mouss wrote:
 
 yes, you can report offending IPs, if that makes sense. for example, if
 the offending IP is that of an ISP relay, then don't report it: ISPs do
 relay spam.

Ehm.. surely you should report spam sending ISP relays if they are
miscategorized as low or higher.
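
For reference, the listing level mentioned above can be read off a DNSWL lookup. This is a sketch based on my understanding of the DNSWL documentation, which should be checked before relying on it: the query name is the reversed IPv4 octets under list.dnswl.org, and an answer of the form 127.0.X.Y carries a category code X and a trust level Y (0 = none, 1 = low, 2 = med, 3 = hi). The function names are hypothetical and the code performs no DNS query itself.

```python
import ipaddress

DNSWL_ZONE = "list.dnswl.org"
# Assumed mapping of the last answer octet to a trust level.
TRUST_LEVELS = {0: "none", 1: "low", 2: "med", 3: "hi"}


def dnswl_query_name(ip, zone=DNSWL_ZONE):
    """Build the DNS name to look up for an IPv4 address (octets reversed)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def decode_dnswl_answer(answer):
    """Split a 127.0.X.Y answer into (category_code, trust_level_name)."""
    octets = [int(part) for part in answer.split(".")]
    if len(octets) != 4 or octets[:2] != [127, 0]:
        raise ValueError(f"unexpected DNSWL answer: {answer}")
    category, trust = octets[2], octets[3]
    return category, TRUST_LEVELS.get(trust, "unknown")


print(dnswl_query_name("192.0.2.1"))
print(decode_dnswl_answer("127.0.5.2"))
```

A relay that sends spam but decodes to "low" or higher would be a candidate for a report under the reading above.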



Re: DNSWL and JMF White false positives, what to do exactly?

2009-09-30 Thread RW
On Wed, 30 Sep 2009 23:35:31 +0200
mouss mo...@ml.netoyen.net wrote:

 Warren Togami wrote:
  I scanned my spam folders and found a few false positives that hit
  on either DNSWL 
 
 FP with DNSWL?
 
 FP = False Positive = legitimate mail tagged as spam
 DNSWL = Whitelist

The term  false-positive can apply to any test. A test for ham
that matches a spam is a false-positive, it's a matter of context.


Re: DNSWL and JMF White false positives, what to do exactly?

2009-09-30 Thread Karsten Bräckelmann
On Wed, 2009-09-30 at 23:35 +0200, mouss wrote:
 Warren Togami wrote:
  I scanned my spam folders and found a few false positives that hit on
  either DNSWL 
 
 FP with DNSWL?
 
 FP = False Positive = legitimate mail tagged as spam
 DNSWL = Whitelist

False positive. Something, that matches (positive) the criterion for a
certain test, but should not (false).

 if your system adds points because of dnswl, you have a serious problem. ..
 
 or do you mean FN (false negative)?

Granted, the wording ("FPs that hit ham rules") could use some polish,
but I believe Warren was talking about spam that falsely hits ham rules.


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



DNSWL and JMF White false positives, what to do exactly?

2009-09-29 Thread Warren Togami
I scanned my spam folders and found a few false positives that hit on 
either DNSWL or JMF (HOSTKARMA?  See how confusing it is not knowing 
what to call it?)


Is there an easy automated way we can forward FP's to DNSWL and JMF so 
their maintainers can decide what to do about the offending senders? 
I'd attach it to mail but it might get caught in the spam filter...


Warren Togami
wtog...@redhat.com
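
No reporting interface is described in this thread, so the following sketch only covers the first half of the job: picking out the messages in a spam folder that hit a whitelist rule, so they can be reviewed and reported by hand. It assumes SpamAssassin's usual X-Spam-Status header with a tests= list. RCVD_IN_DNSWL is the prefix of the stock DNSWL rules; RCVD_IN_JMF_W is a hypothetical name for a locally defined JMF rule and should be replaced by whatever the local rule is called.

```python
import email
import re

# Rule-name prefixes to look for. "RCVD_IN_JMF_W" is a hypothetical local
# rule name, used here only as a placeholder.
WHITELIST_PREFIXES = ("RCVD_IN_DNSWL", "RCVD_IN_JMF_W")


def whitelist_hits(raw_message):
    """Return the whitelist rules listed in a message's X-Spam-Status header."""
    msg = email.message_from_string(raw_message)
    status = str(msg.get("X-Spam-Status", ""))
    # The tests= list may be folded across lines after a comma; rejoin it.
    status = re.sub(r",\s+", ",", status)
    match = re.search(r"tests=(\S*)", status)
    if not match:
        return []
    tests = [name for name in match.group(1).split(",") if name]
    return [name for name in tests if name.startswith(WHITELIST_PREFIXES)]


# A made-up message header of the kind found in a spam folder.
sample = (
    "Subject: cheap pills\n"
    "X-Spam-Status: Yes, score=7.2 required=5.0 tests=BAYES_99,HTML_MESSAGE,\n"
    "\tRCVD_IN_DNSWL_LOW autolearn=no version=3.2.5\n"
    "\n"
    "body\n"
)
print(whitelist_hits(sample))
```

Running this over each message in the folder gives the list of spam that hit a whitelist rule, together with the rule that fired; what to do with that list is up to the list maintainers' own reporting procedures.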