On 2/3/2013 7:23 AM, John Hardin wrote:
On Sat, 2 Feb 2013, Eliezer Croitoru wrote:
I wrote something in Ruby which actually works fine as a starting point.
#code start
# Placeholder: the body of the known spam template goes here.
spam_content = "the long part from the mail".force_encoding("Windows-1255")
# Number of Hebrew characters counted in that template.
template_hebrew_chars = 270

# Windows-1255 is single-byte, so each character is one byte; these are
# the byte ranges treated as Hebrew.
def hebrew_char(char)
  case char.getbyte(0)
  when 192..203, 205..219, 223..251 then true
  else false
  end
end

counter = spam_content.each_char.count { |char| hebrew_char(char) }
if counter == template_hebrew_chars
  puts "this is a spam"
else
  puts "might not be a spam"
end
##code end
Now *that* might be possible in plain SA rules without a plugin: count
the number of characters in the message body and the number of
characters that fall in a given range (e.g. those that are Hebrew
glyphs), then calculate the percentage. I *think* you can do math in
meta rules...
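The idea above boils down to a ratio rather than an exact count. A minimal sketch in Ruby, assuming the body is handled as raw Windows-1255 bytes (one byte per character), that only ASCII whitespace is left out of the total, and reusing the byte ranges from the script above; `HEBREW_RANGES`, `ASCII_WHITESPACE` and `hebrew_percentage` are hypothetical names:

```ruby
# Byte ranges the script above treats as Hebrew (Windows-1255).
HEBREW_RANGES = [192..203, 205..219, 223..251].freeze
# ASCII whitespace only (assumption): tab, LF, VT, FF, CR, space.
ASCII_WHITESPACE = [9, 10, 11, 12, 13, 32].freeze

# Percentage of non-whitespace bytes that fall in a Hebrew range.
# Windows-1255 is a single-byte charset, so bytes and characters coincide.
def hebrew_percentage(body)
  visible = body.bytes.reject { |b| ASCII_WHITESPACE.include?(b) }
  return 0.0 if visible.empty? # avoid dividing by zero on an empty body

  hebrew = visible.count { |b| HEBREW_RANGES.any? { |r| r.cover?(b) } }
  hebrew * 100.0 / visible.size
end

# Hypothetical sample: four bytes in the Hebrew letter range, a space, "abc".
sample = "\xF9\xEC\xE5\xED abc".b
puts hebrew_percentage(sample).round(2) # => 57.14
```

Working on bytes sidesteps Ruby's encoding checks entirely, which is safe here only because Windows-1255 is a single-byte charset.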
However, a plugin would be _much_ more efficient than something like:
body __HBRW_CHARS /[\xc0-\xcb\xcd-\xdb\xdf-\xfb]/
tflags __HBRW_CHARS multiple
body __TOTAL_CHARS /\S/
tflags __TOTAL_CHARS multiple
meta __HBRW_PCT ((__HBRW_CHARS * 100) / __TOTAL_CHARS)
meta HBRW_SPAM (__HBRW_PCT < 50) && __HBRW_ENCODING
I don't know whether the division in __HBRW_PCT or the less-than
comparison in HBRW_SPAM would work; that's totally off the top of my
head and untested. I also leave the __HBRW_ENCODING rule as an exercise
for the student. :)
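One way to sanity-check the arithmetic in those rules before touching SpamAssassin is a small offline emulation. This is a sketch under stated assumptions, not SpamAssassin: it assumes the body rules see raw Windows-1255 bytes, uses `String#scan` as a stand-in for the hit counts that `tflags multiple` reports, and adds a zero-total guard the rules above do not have. `hits`, `hbrw_pct`, `hbrw_spam?` and the `encoding_hit` flag (standing in for the unwritten __HBRW_ENCODING) are hypothetical names, and nothing here shows how SpamAssassin itself evaluates meta expressions.

```ruby
# Hypothetical offline emulation of the proposed rules (not SpamAssassin).
# Patterns are matched against raw bytes, as with an undecoded
# Windows-1255 body.
HBRW_CHARS  = /[\xc0-\xcb\xcd-\xdb\xdf-\xfb]/n # same class as __HBRW_CHARS
TOTAL_CHARS = /\S/n                            # same pattern as __TOTAL_CHARS

# Stand-in for a `tflags multiple` hit count: number of matches in the body.
def hits(body, rule)
  body.b.scan(rule).size
end

# Stand-in for __HBRW_PCT; returns nil instead of dividing by zero.
def hbrw_pct(body)
  total = hits(body, TOTAL_CHARS)
  return nil if total.zero?

  hits(body, HBRW_CHARS) * 100.0 / total
end

# Stand-in for HBRW_SPAM; encoding_hit replaces __HBRW_ENCODING.
def hbrw_spam?(body, encoding_hit)
  pct = hbrw_pct(body)
  !pct.nil? && pct < 50 && encoding_hit
end

body = "Sale \xE0\xE1".b # hypothetical: 4 ASCII letters, 2 Hebrew bytes
puts hbrw_pct(body).round(2) # => 33.33
puts hbrw_spam?(body, true)  # => true
```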
Thanks.
I already had __HBRW_ENCODING ready from before.
I think I will use a meta rule that checks the mail first, then the
encoding, and then the percentage.
Thanks again,
--
Eliezer Croitoru
http://www1.ngtech.co.il
IT consulting for Nonprofit organizations
eliezer <at> ngtech.co.il