Thanks for the reply.

My concern is that valid Japanese mail may always be getting 3-4 points from
FPs, which puts it behind the eight ball. If Bayes isn't low enough and there
are other FPs, it won't take much to misclassify Japanese mail as spam.

I only use the least aggressive SARE rulesets, lower many of the scores, and
also comment out any rules that look dubious for foreign languages
(but it's hard to tell with the OBFU ones). I gateway mail for several
non-English offices and haven't had any feedback from them about mail being
incorrectly tagged as spam, but this doesn't mean anything - I know some of it
is being tagged as spam, but it doesn't seem to be work-related mail, so who
cares.


Besides the SUB_RAND_LETTRS ones, are there any other specific rules in the
SARE rulesets which are likely to FP on foreign languages?


Thanks.

From: "Loren Wilton" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Subject: Re: SARE false positives
Date: Mon, 21 Jun 2004 23:33:32 -0700

> Anyway, below header shows valid Japanese mail with ISO-2022-JP encoded
> text that triggered several SARE header rules from 70_sare_genl_subj0.cf:-
>
> SARE_SUB_CASH_CHAR
> SARE_SUB_RAND_LETTRS2
> SARE_SUB_RAND_LETTRS5


> SARE_SUB_RAND_LETTRS2

You need to update.  I moved LETTRS2 to the -1 file from -0 a couple of
days ago because it was getting too many ham hits.


> X-Spam-Status: No, hits=-1.7 required=6.8 tests=AWL,BAYES_00,J_BACKHAIR_31,

We may want to move some of the other rules you mention to -1 from -0 also.

But keep in mind the difference between the -0 and -1 files: -0 is supposed
to be rules that don't (to our knowledge, subject to revision) hit non-spam.
The -1 rules are rules that we KNOW will occasionally hit non-spam, but also
hit way more spam than they do ham.


Which is why many of the rules have relatively low scores. It is quite
reasonable to have rules that will hit the occasional non-spam phrase. As
long as not too many rules hit, it is not a problem, as witnessed by the
mail you cited: it got -1.7 points (probably largely from the BAYES_00 hit),
and that is way short of the 6.8 points required to be spam. Even without
Bayes it doesn't look like it would have triggered.


The moral is, you should expect to see rule hits on ham. You just shouldn't
expect to see enough rule hits to trip it over the edge into being spam,
unless it is spam.
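
The threshold arithmetic above can be sketched roughly as follows. This is
only an illustration: the per-rule scores are made-up values (the real ones
live in the rule files and any local overrides), the helper names are
hypothetical, and only the 6.8 threshold comes from the quoted
X-Spam-Status header.

```python
REQUIRED = 6.8  # "required" value from the quoted X-Spam-Status header

# ASSUMPTION: hypothetical per-rule scores, chosen only for illustration.
# They are not the scores shipped in any SARE or SpamAssassin rule file.
HYPOTHETICAL_SCORES = {
    "BAYES_00": -4.9,
    "SARE_SUB_CASH_CHAR": 1.0,
    "SARE_SUB_RAND_LETTRS2": 1.0,
    "SARE_SUB_RAND_LETTRS5": 1.0,
}


def total_score(rule_names, scores=HYPOTHETICAL_SCORES):
    """Sum the scores of the rules that hit, rounded to one decimal place."""
    return round(sum(scores[name] for name in rule_names), 1)


def is_spam(rule_names, required=REQUIRED, scores=HYPOTHETICAL_SCORES):
    """Treat a message as spam once its total reaches the threshold."""
    return total_score(rule_names, scores) >= required


# The three SARE subject rules named in the quoted mail.
sare_hits = [
    "SARE_SUB_CASH_CHAR",
    "SARE_SUB_RAND_LETTRS2",
    "SARE_SUB_RAND_LETTRS5",
]
with_bayes = sare_hits + ["BAYES_00"]

print(total_score(with_bayes), is_spam(with_bayes))
print(total_score(sare_hits), is_spam(sare_hits))
```

With these assumed scores, three low-scoring rule hits stay well under the
threshold whether or not the Bayes rule contributes; it would take several
more hits, or much higher scores, to push the message over.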


        Loren

