In message <[EMAIL PROTECTED]>, Robert Menschel writes:
>Hello Loren, Mario,
>
>Wednesday, August 25, 2004, 12:39:23 PM, Loren wrote:
>
>LW> The specific rule you asked for would be written as
>LW>   header SUB_UNDERSCORES Subject =~ /__/
>LW>   score SUB_UNDERSCORES 0.1
>LW> But don't use it, or at least not with any significant score.
>
>Well, actually, a quick scan of my corpus, 24k ham and 46k spam, shows 40
>spam hits and no ham hits. IMO that could warrant a SARE score as high as
>0.777 (my email client often gives different results than mass-check
>does, so don't take this as gospel). Expect to see this in my next SARE
>mass-check request, so we can see if it works on other corpora.
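For readers following along, here is a rough Python stand-in for what the quoted rule matches, plus the raw hit-rate arithmetic behind Robert's numbers. It is a sketch only: SpamAssassin evaluates the Perl regex itself, the names `sub_underscores_hits` and `hit_rate` are hypothetical, and this is not how mass-check or SARE actually assign scores. The corpus sizes are the approximate "24k ham and 46k spam" quoted above.

```python
import re

# Hypothetical Python equivalent of the rule
#   header SUB_UNDERSCORES Subject =~ /__/
# It only mirrors the match logic; it is not SpamAssassin code.
SUB_UNDERSCORES = re.compile(r"__")


def sub_underscores_hits(subject: str) -> bool:
    """True if the Subject contains two or more consecutive underscores."""
    return SUB_UNDERSCORES.search(subject) is not None


def hit_rate(hits: int, total: int) -> float:
    """Fraction of a corpus that a rule fires on (0.0 for an empty corpus)."""
    return hits / total if total else 0.0


if __name__ == "__main__":
    for subj in ["Re: meeting notes", "snake_case question", "FREE__OFFER__NOW"]:
        print(f"{subj!r}: {sub_underscores_hits(subj)}")
    # Approximate corpus sizes from the message above.
    print(f"spam hit rate: {hit_rate(40, 46_000):.4%}")
    print(f"ham hit rate:  {hit_rate(0, 24_000):.4%}")
```

A single underscore, as in `snake_case`, does not fire; only a run of two or more does.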
I would advise against it. At least one big free email provider (yahoo.se; I'm not sure about the rest of Yahoo) produces this kind of subject when you send quoted-printable encoded headers to and from it, because of a buggy QP encoder. Essentially, if there's a space before a word containing a QP-encoded letter, it erroneously adds one extra `_'. This eventually leads to subjects like this one:

  Subject: Re: Som man bäddar, _____________________får man ligga...

(Swedish for "As you make your bed, so you must lie in it".)

//Christer
-- 
| Tellusgatan 54  | Phone: Home 031 - 42 52 03   CTH: 031 - 772 5431 |
| 415 19 Göteborg | Email: [EMAIL PROTECTED]  Nalle: +46 (0)707 535757 |
|                 | WWW: http://www.cd.chalmers.se/~mort/             |
"An NT server can be run by an idiot, and usually is." -- Tom Holub, a.h.b-o-i
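Some background on why a QP bug can leak underscores into a Subject: in the RFC 2047 "Q" encoding used for headers, `_` stands for a space and a literal underscore must be written as `=5F`. The sketch below uses Python's standard `email.header.decode_header` to show both cases and then applies the same `/__/` pattern to the decoded text, assuming (as with SpamAssassin header tests unless `:raw` is used) that the rule sees the decoded Subject. The encoded strings are made-up examples, not captured yahoo.se output, and `decode_subject` is a hypothetical helper.

```python
import re
from email.header import decode_header

# Same pattern as the rule's /__/ (hypothetical Python stand-in).
SUB_UNDERSCORES = re.compile(r"__")


def decode_subject(raw: str) -> str:
    """Decode an RFC 2047 Subject into text by concatenating its parts (simple sketch)."""
    parts = []
    for chunk, charset in decode_header(raw):
        # With encoded-words present, decode_header returns bytes chunks;
        # unencoded chunks come back with charset None (plain ASCII here).
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    samples = [
        "=?iso-8859-1?q?f=E5r_man_ligga?=",  # '_' decodes to a space
        "=?iso-8859-1?q?=5F=5Ff=E5r?=",      # '=5F' decodes to a literal '_'
    ]
    for raw in samples:
        text = decode_subject(raw)
        print(repr(text), "SUB_UNDERSCORES hit:", SUB_UNDERSCORES.search(text) is not None)
```

A correctly encoded space never triggers the rule; only underscores that survive decoding as literal characters do, which is exactly what the buggy encoder produces on perfectly legitimate mail.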