RE: Re[2]: [sniffer] Charset

2004-08-20 Thread Michiel Prins
Pete, even your message had a charset header:

Content-Type: text/plain; charset=us-ascii

I think you'd generate more FPs with a rule like that than the FNs
you might have now. Aren't there SpamAssassin config files that detect this
spam?


Kind regards,

ing. Michiel Prins
SOS Small Office Solutions / REJECT
Wannepad 27
1066 HW Amsterdam
tel. 020-4082627
fax. 020-4082628
[EMAIL PROTECTED]


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Pete McNeil
Sent: Friday, August 20, 2004 4:58
To: Jorge Asch
Subject: Re[2]: [sniffer] Charset

On Thursday, August 19, 2004, 10:45:37 PM, Jorge wrote:

JA Could a filter be created that will tag as spam any messages 
JA containing NON-ascii characters? I mean allow only CHRS 1 through 255.

JA I believe this will filter out all these foreign character sets, and 
JA let regular old plain messages through...

JA Of course such a rule will only apply for most of us on the western 
JA hemisphere...

In theory this could be done, but it would be a tricky gadget - probably
best done as something programmatic... There are a lot of opportunities for
false positives.

I will think about this...

Then again - why not simply block on anything that says charset= ? If it's
plain old ascii, then there's no need for charset. (Lots of FPs with this,
but then I would never use a filter like that...) It might be very close to
what you are looking for.

The other way to do it would be to build patterns that match all of the
known character sets -- or at least the majority. That would be a chunk of
work but doable - especially with a few well placed wildcards and a good
comprehensive list.

_M



This E-Mail came from the Message Sniffer mailing list. For information and
(un)subscription instructions go to
http://www.sortmonster.com/MessageSniffer/Help/Help.html





Re: Re[2]: [sniffer] Charset

2004-08-20 Thread Scott Fisher
We don't want any violent Mad Scientists!

 [EMAIL PROTECTED]  8/20 11:59a 
On Friday, August 20, 2004, 11:20:44 AM, Vivek wrote:


VK On Aug 20, 2004, at 10:36 AM, Jorge Asch wrote:

 Well, since 100% of my users speak english/spanish I can safely bet
 that NONE of my mail should have strange character sets. So I can 
 assume if they do, they must be spam.

VK Be careful about that.  I've gotten pure English email from folks in
VK various parts of the world whose default character set was other than
VK one I'd expect.  Charset != Language.

Along these lines, I saw spam today that was in english but used one
of the character sets that were recently blocked by request (Only
locally - no such thing will happen in the core system so nobody has
to worry).

I violently agree - blocking on character sets can be dangerous, so if
you request these rules to be added be sure you watch for unexpected
false positives afterward. ;-)

_M
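
To illustrate the "Charset != Language" point above, here is a small Python sketch. It is only an illustration, not anything Message Sniffer does; the sample sentence and the list of charsets are assumptions chosen for the example. It shows that plain English text produces the same bytes under several common charsets, so the declared charset alone says nothing about the language of the message.

```python
# Sample text and charset list are assumptions picked for illustration.
TEXT = "Please see the attached invoice."
CHARSETS = ["us-ascii", "iso-8859-1", "koi8-r", "windows-1251", "utf-8"]


def encodings_agree(text, charsets):
    """Return True if `text` encodes to identical bytes in every charset."""
    encoded = {text.encode(cs) for cs in charsets}
    return len(encoded) == 1


if __name__ == "__main__":
    # Pure ASCII English: every charset in the list yields the same bytes.
    print(encodings_agree(TEXT, CHARSETS))
    # Text with an accented letter: the byte sequences differ.
    print(encodings_agree("se\u00f1or", ["iso-8859-1", "utf-8"]))
```

A mail client in Russia can therefore label an all-English message as koi8-r without changing a single byte of it, which is exactly the false positive a charset rule would hit.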






Re[2]: [sniffer] Charset

2004-08-19 Thread Pete McNeil
On Thursday, August 19, 2004, 10:11:45 AM, Jorge wrote:

JA Michiel Prins wrote:

Can't you use the content filter of your mail server to detect if the
charset is used? 

JA I've tried, but it's not 100% effective

I recall the earlier conversations about this. We have not had a lot
of call for generally blocking foreign character sets so that project
has not received much attention.

Another issue with this is that many of our customers are not in the
US and so defining foreign is often problematic.

We can more easily establish local black rules for you.

When you have an example of a character set you would like to block,
please send us a note to support@ with your license ID in the subject
line and the words "Local black rule please".

Explain in your note that you want us to block the character set(s) in
the message.

Attach the message to your note.

We will verify your license ID and then create local black rules for
the character sets we find in the message.

Over a short time this should have the effect you are looking for.

Hope this helps,
_M

PS: We do filter foreign spam that is submitted to us at spam@ using
the same rules that we follow for other messages. That is, we don't
treat them as foreign - only as spam in general. Russian spam in
particular has rapidly become heavily obfuscated - though there are
usually patterns that can be found to block the messages.





Re: Re[2]: [sniffer] Charset

2004-08-19 Thread Scott Fisher
I'll chime in on the subject too.
 I've finally managed to get the spam in Chinese under control on my system, but for a 
while I really wished Message Sniffer had language-based filters.
I.e. Result 40 Chinese 
Result 41 Cyrillic
Result 42 Spanish
Result 43 German

We could then turn on or off the languages we didn't want.
From my foray with dealing with Chinese, it's certainly much easier said than done. 
Chinese was doable, but I've had no luck stopping my Spanish spam.
Then again, you might be better at it than I.

 [EMAIL PROTECTED]  8/19  9:52a 
On Thursday, August 19, 2004, 10:11:45 AM, Jorge wrote:

JA Michiel Prins wrote:

Can't you use the content filter of your mail server to detect if the
charset is used? 

JA I've tried, but it's not 100% effective

I recall the earlier conversations about this. We have not had a lot
of call for generally blocking foreign character sets so that project
has not received much attention.

Another issue with this is that many of our customers are not in the
US and so defining foreign is often problematic.

We can more easily establish local black rules for you.

When you have an example of a character set you would like to block,
please send us a note to support@ with your license ID in the subject
line and the words "Local black rule please".

Explain in your note that you want us to block the character set(s) in
the message.

Attach the message to your note.

We will verify your license ID and then create local black rules for
the character sets we find in the message.

Over a short time this should have the effect you are looking for.

Hope this helps,
_M

PS: We do filter foreign spam that is submitted to us at spam@ using
the same rules that we follow for other messages. That is, we don't
treat them as foreign - only as spam in general. Russian spam in
particular has rapidly become heavily obfuscated - though there are
usually patterns that can be found to block the messages.





Re[2]: [sniffer] Charset

2004-08-19 Thread Pete McNeil
On Thursday, August 19, 2004, 3:54:20 PM, Jorge wrote:


We could then turn on or off the languages we didn't want.
From my foray with dealing with Chinese, it's certainly much
easier said than done. Chinese was doable, but I've had no luck
stopping my Spanish spam.
Then again, you might be better at it than I.

JA The problem with Spanish is that we use the same Western character set as
JA you do... so it's harder to detect...

Well,... If you really wanted to do it then it could be done.

Create a set of rules that look for any of the most common Spanish
words - especially any that use high-bit characters. With enough of
these it should be broad enough to catch most... The trick is to
include words that are not also common in normal conversation on the
local system.

That would be an awfully aggressive filter though - and a bunch of
work. Of course we can contract to code any ruleset that's possible. I
suspect there aren't many systems out there that can afford to be so
aggressive - but that's just my guess.

_M
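
A rough Python sketch of the word-list idea described above. This is not how Message Sniffer rules are written; the word list, the function names and the threshold of three hits are all hypothetical, and a real list would need tuning against local mail to keep false positives down.

```python
import re

# Hypothetical marker words; several contain high-bit (accented) characters.
SPANISH_MARKERS = {
    "gratis", "oferta", "descuento", "envío",
    "también", "información", "años", "más",
}

# Runs of letters only (Unicode-aware), excluding digits and underscores.
WORD_RE = re.compile(r"[^\W\d_]+")


def spanish_marker_hits(text):
    """Count how many distinct marker words appear in `text`."""
    words = {w.lower() for w in WORD_RE.findall(text)}
    return len(words & SPANISH_MARKERS)


def looks_spanish(text, min_hits=3):
    """Flag text with at least `min_hits` marker words (assumed threshold)."""
    return spanish_marker_hits(text) >= min_hits


if __name__ == "__main__":
    print(looks_spanish("Oferta gratis: más información aquí"))
    print(looks_spanish("Please find the report attached"))
```

Requiring several distinct hits rather than one is a design choice that trades some missed spam for fewer false positives on English mail that happens to contain a word like "gratis".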






Re[2]: [sniffer] Charset

2004-08-19 Thread Pete McNeil
On Thursday, August 19, 2004, 10:45:37 PM, Jorge wrote:

JA Could a filter be created that will tag as spam any messages
JA containing NON-ascii characters? I mean allow only CHRS 1 through 255.

JA I believe this will filter out all these foreign character sets, and let
JA regular old plain messages through...

JA Of course such a rule will only apply for most of us on the western
JA hemisphere...

In theory this could be done, but it would be a tricky gadget -
probably best done as something programmatic... There are a lot of
opportunities for false positives.
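
One way the "programmatic" approach might look, as a minimal Python sketch. The function names and the 0.2 threshold are assumptions made for illustration, not part of Message Sniffer. It measures the share of high-bit bytes (0x80 and above) in an already-decoded message body; base64 or quoted-printable bodies would have to be decoded first, or the check sees only ASCII.

```python
def high_bit_ratio(body: bytes) -> float:
    """Return the fraction of bytes in `body` with the high bit set."""
    if not body:
        return 0.0
    high = sum(1 for b in body if b >= 0x80)
    return high / len(body)


def looks_non_ascii(body: bytes, threshold: float = 0.2) -> bool:
    """Flag a body whose high-bit share exceeds `threshold` (assumed value)."""
    return high_bit_ratio(body) > threshold


if __name__ == "__main__":
    print(looks_non_ascii(b"Plain old ASCII message"))
    print(looks_non_ascii("привет".encode("koi8_r")))
    # Spanish in a Western charset has only a few high-bit bytes.
    print(looks_non_ascii("Buenos días, señor".encode("latin-1")))
```

Using a ratio instead of rejecting any high-bit byte lets through Western-language mail with the occasional accented letter, which is where a strict ASCII-only rule would produce false positives.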

I will think about this...

Then again - why not simply block on anything that says charset= ? If
it's plain old ascii, then there's no need for charset. (Lots of FPs
with this, but then I would never use a filter like that...) It might
be very close to what you are looking for.

The other way to do it would be to build patterns that match all of
the known character sets -- or at least the majority. That would be a
chunk of work but doable - especially with a few well placed
wildcards and a good comprehensive list.

_M
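
For the charset-matching idea above, a hedged Python sketch using the standard library email package. It inverts the approach: instead of listing every charset to block, it lists the ones to allow and reports anything else. The allow-list and the function name are assumptions for illustration, and the check only looks at Content-Type parameters, not at charsets named in encoded header words such as =?koi8-r?B?...?=.

```python
from email import message_from_string

# Assumed allow-list for a site whose users write English and Spanish.
ALLOWED_CHARSETS = {
    "us-ascii", "iso-8859-1", "iso-8859-15", "windows-1252", "utf-8",
}


def unexpected_charsets(raw_message):
    """Return the set of declared charsets that are not on the allow-list."""
    msg = message_from_string(raw_message)
    found = set()
    for part in msg.walk():  # covers every part of a multipart message
        charset = part.get_content_charset()  # lowercased, or None if absent
        if charset is not None and charset not in ALLOWED_CHARSETS:
            found.add(charset)
    return found


if __name__ == "__main__":
    sample = "Content-Type: text/plain; charset=KOI8-R\n\nhello\n"
    print(unexpected_charsets(sample))
```

A message that declares charset=us-ascii, as ordinary list mail does, yields an empty set here, so it avoids the false positives that blocking on any charset= would cause.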


