Probably should also replace the obvious numeric and special characters like
zer0, thr33, f|ve, $even, etc. while you are at it.

         Loren

I have to wonder if it is worth the processor time, though.  It might be faster
to simply build a thesaurus of creative misspellings and analyze the sentence
that results from the substitutions.  I expect that is essentially what the
Bayes stuff does.
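
For what it's worth, here is a rough sketch of the substitution idea in
Python.  It is only an illustration: the LEET_MAP table covers just the four
examples above, the MISSPELLINGS entry is made up, and none of this is how
SpamAssassin or its Bayes code actually works.

```python
import re

# Hypothetical character table, built only from the examples in this message
# (zer0, thr33, f|ve, $even). A real table would need many more entries.
LEET_MAP = str.maketrans({"0": "o", "3": "e", "|": "i", "$": "s"})

# Hypothetical "thesaurus" of creative misspellings -> intended word.
MISSPELLINGS = {"mortgige": "mortgage"}


def normalize_token(token):
    """Undo character substitutions in one whitespace-delimited token."""
    # Leave tokens without any letters alone, so that real numbers such as
    # "2004" are not turned into "2oo4".
    if not any(ch.isalpha() for ch in token):
        return token
    word = token.lower().translate(LEET_MAP)
    # Fall back to the translated word if the thesaurus has no entry for it.
    return MISSPELLINGS.get(word, word)


def deobfuscate(text):
    """Apply normalize_token to every token, keeping the original spacing."""
    return re.sub(r"\S+", lambda m: normalize_token(m.group(0)), text)


print(deobfuscate("zer0 thr33 f|ve $even"))
```

The cleaned-up sentence would then be handed to the normal rules.  Note that
the thesaurus lookup here only matches whole tokens, so a word with trailing
punctuation would slip through unless the tokenizing is made smarter.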


-----Original Message-----
From: Michal Szymanski <[EMAIL PROTECTED]>
Sent: Feb 6, 2004 6:44 AM
To: Robert Menschel <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Re: dealing with subjects forged with accented letters

On Thu, Feb 05, 2004 at 11:23:02PM -0800, Robert Menschel wrote:
> 
> I use the following (we get foreign email, but since we only understand
> English, we expect all subject headings to be in English):
> 
> header    RM_sl_ForeignChar      Subject =~ /\w[����]\w/
> ...

Hi Robert,

Unfortunately, a solution that simple will not work for me. We get emails in
Polish and occasionally also in Spanish or German (not to mention
English, of course, but those are no problem :), so we cannot just
spam-them-all. What we need is to filter the Subject lines (changing
all "����" to "aeou") and *then* apply SA rules to them.
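
A minimal sketch of that folding step, in Python and outside of SA itself,
might look like the following. It assumes the Subject has already been
decoded to Unicode text; the strip_accents name and the EXTRA table are
made up for illustration, and how to hook such a step into SA is not shown.

```python
import unicodedata

# Hypothetical extra table: these letters have no Unicode decomposition into
# base letter + accent, so they need an explicit mapping.
EXTRA = str.maketrans({"ł": "l", "Ł": "L", "ß": "ss"})


def strip_accents(subject):
    """Fold accented letters to their plain ASCII base letters."""
    # NFD splits e.g. "ó" into "o" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", subject.translate(EXTRA))
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Zażółć gęślą jaźń"))
```

The folded string, not the original Subject, would then be matched against
the rules, so legitimate Polish, Spanish or German subjects are not
penalized merely for containing accented letters.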

Michal.

-- 
  Michal Szymanski ([EMAIL PROTECTED])
  Warsaw University Observatory, Warszawa, POLAND
