On Sat, 2013-02-02 at 20:23 +0200, Eliezer Croitoru wrote:
> On 2/2/2013 7:39 PM, Martin Gregorie wrote:
> > In that case something like this would work:
> >
> > describe EC_BANNED_ADDRESS Mail from a spamming address
> > header   EC_BANNED_ADDRESS From =~ sender@spamming_address
> > score    EC_BANNED_ADDRESS 10.0
> 
> >
> > There's no point in writing rules against the message body when the mail
> > is all from an address that you know.
> >
> >
> > Martin
> 
> Thanks Martin.
> I do have..
> The mail is fine.
> I just need to know about a pattern match in the content since it's a form.
> This address spam is pretty specific.
> This is why I wanted to use specific check for this kind of mail.
> The start and end has specific percentage of Hebrew language.
> Most of the mail should be in Hebrew and if there is more than 50
> percent of the body in English it's 100% spam.
> Less than that I can score it with basic rules.
> 
> I was thinking of meta rule like here:
> http://spamassassin.1065346.n5.nabble.com/spamassassin-conditional-rules-td42578.html
> 
Use a meta-rule to combine non-scoring rules:

 describe EC_SPAMTRAP Mail from a spamming address
 header   __EC_BANNED_ADDRESS From =~ /sender\@spamming_address/i
 body     __EC_MOSTLY_HEBREW  <rule to decide if the body is mostly Hebrew>
 meta     EC_SPAMTRAP  (__EC_BANNED_ADDRESS && __EC_MOSTLY_HEBREW)
 score    EC_SPAMTRAP  10

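The meta rule fires only if both of its subrules fire. The subrules
carry no score of their own because their names start with a double
underscore, so only the meta rule affects the result. This sort of
structure is the way to encode logical relationships, but there's no way
to control whether a subrule is run.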

However, I can't suggest how to code the 'mostly Hebrew' test. AFAIK
there's no easy way to recognise languages or national character sets
now that universal character encodings like UTF-8 and UTF-16 are
common. Formerly you could do it by seeing which Microsoft codepage was
used for the body, though that was only useful if the sender was using a
Windows PC. You'd probably have to write a plugin to handle language
recognition, especially as your weighting scheme requires every word in
the body to be counted as Hebrew or non-Hebrew.
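
To make the counting idea concrete, here is a rough sketch of the logic
such a plugin would need. It is written in Python purely for
illustration and is not SpamAssassin code. The function names and the
0.5 threshold are my own assumptions, and it assumes the body has
already been decoded to Unicode text. A word counts as Hebrew if it is
made of characters from the Unicode Hebrew block (U+0590 to U+05FF) and
as non-Hebrew if it is made of ASCII Latin letters. Digits and
punctuation are ignored.

```python
import re

# Assumption: "Hebrew word" = run of characters from the Unicode Hebrew block.
HEBREW_WORD = re.compile(r"[\u0590-\u05FF]+")
# Assumption: "non-Hebrew word" = run of ASCII letters, allowing inner apostrophes.
LATIN_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def hebrew_fraction(body: str) -> float:
    """Return the share of counted words that are Hebrew (0.0 if none counted)."""
    hebrew = len(HEBREW_WORD.findall(body))
    latin = len(LATIN_WORD.findall(body))
    total = hebrew + latin
    return hebrew / total if total else 0.0


def is_mostly_hebrew(body: str, threshold: float = 0.5) -> bool:
    """True when more than `threshold` of the counted words are Hebrew."""
    return hebrew_fraction(body) > threshold
```

A real plugin would have to do the same counting in Perl and expose the
result to the rule set, so treat this only as a statement of the test,
not as something you can drop into a .cf file.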


Martin

