On 2/3/2013 7:23 AM, John Hardin wrote:
On Sat, 2 Feb 2013, Eliezer Croitoru wrote:

I wrote something in Ruby which actually works fine as a starter.

#code start
# Placeholder for the message body, tagged as Windows-1255 (Hebrew code page).
spam_content = "the long part from the mail".force_encoding("Windows-1255")

# Number of Hebrew characters in the known spam template.
template_hebrew_chars = 270

# True if the single-byte character falls in one of the byte ranges
# treated as Hebrew here (192-203, 205-219, 223-251).
def hebrew_char(char)
  byte = char.unpack("H*")[0].hex
  [192..203, 205..219, 223..251].any? { |range| range.member?(byte) }
end

counter = 0
spam_content.each_char { |char| counter += 1 if hebrew_char(char) }

if counter == template_hebrew_chars
  puts "this is a spam"
else
  puts "might not be a spam"
end
##code end

Now *that* might be possible in plain SA rules without a plugin: count
the number of characters in the message body, and the number of
characters that fall in a given range (e.g. those that are Hebrew
glyphs), and calculate the percentage. I *think* you can do math in meta
rules...

However, a plugin would be _much_ more efficient than something like:

   body   __HBRW_CHARS    /[\xc0-\xcb\xcd-\xdb\xdf-\xfb]/
   tflags __HBRW_CHARS    multiple
   body   __TOTAL_CHARS   /\S/
   tflags __TOTAL_CHARS   multiple
   meta   __HBRW_PCT      ((__HBRW_CHARS * 100) / __TOTAL_CHARS)
   meta   HBRW_SPAM       (__HBRW_PCT < 50) && __HBRW_ENCODING

I don't know whether the division in __HBRW_PCT or the less-than
comparison in HBRW_SPAM would work, that's totally off the top of my
head and untested. I also leave the __HBRW_ENCODING rule as an exercise
for the student. :)
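
For comparison, the percentage idea can be sketched outside SA in Ruby. This is only an illustration under assumptions: hebrew_percentage, HEBREW_BYTE_RANGES and WHITESPACE_BYTES are hypothetical names that do not appear in the thread, the byte ranges are copied from the __HBRW_CHARS pattern, and the whitespace list only approximates what /\S/ would skip. It also guards against an empty body, where the division would otherwise fail.

```ruby
# Hypothetical helper, not from the thread: percentage of non-whitespace
# bytes that fall in the ranges used by the __HBRW_CHARS pattern.
HEBREW_BYTE_RANGES = [0xC0..0xCB, 0xCD..0xDB, 0xDF..0xFB].freeze

# Assumed approximation of the bytes that /\S/ does NOT match.
WHITESPACE_BYTES = [0x20, 0x09, 0x0A, 0x0B, 0x0C, 0x0D].freeze

def hebrew_percentage(body)
  # Work on raw bytes so the result does not depend on the encoding tag.
  bytes = body.bytes.reject { |b| WHITESPACE_BYTES.include?(b) }
  return 0 if bytes.empty? # avoid dividing by zero on an empty body

  hebrew = bytes.count { |b| HEBREW_BYTE_RANGES.any? { |r| r.cover?(b) } }
  (hebrew * 100) / bytes.size # integer percentage, as in __HBRW_PCT
end
```

A caller would pass the decoded message body and compare the result against whatever threshold fits the spam in question; the value 50 in HBRW_SPAM above is just as untested here.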

Thanks

I had the __HBRW_ENCODING ready from before.
I think I will use a meta that checks the mail, then the encoding, and then the percentage.

Thanks Again,

--
Eliezer Croitoru
http://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer <at> ngtech.co.il
