RE: [SAtalk] Phrases I have modified....

Chris Santerre Thu, 09 Oct 2003 08:05:33 -0700

I have some notes on these as well. I think it would be great to put them on
the wiki! Or maybe I'll just make a separate cf file for "remove me" phrases.
I'll try to get that started today. I am soooo far behind in work it isn't
funny. However I did get to go to a great sushi bar in Manhattan yesterday!
;)


The only problem is making sure we don't tag legit unsubscribe phrases. I have
some rules on things like unsub.gif already. I haven't had a chance to update
the emporium in a while.

Jen W. would you like to help me on these?

I've actually had a spam say "no more of this shit" as a phrase! 

--Chris Santerre

> -----Original Message-----
> From: VonEssen, John [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 08, 2003 2:13 PM
> To: [EMAIL PROTECTED]
> Subject: [SAtalk] Phrases I have modified....
> 
> 
> Just food for thought for the next release...
> 
> I have been seeing more and more spam using different wording for its
> "remove me" phrases.
> 
> Some use the word "cease":
> 
> Cease offer(s)
> Cease update(s)
> Cease email
> Cease mailing(s)
> 
> 
> John
> 
> 
> -----Original Message-----
> From: Scott A Crosby [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, October 08, 2003 12:37 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: [SAtalk] Re: holy cow, FN city
> 
> On Wed, 8 Oct 2003 08:34:46 -0700 (PDT), [EMAIL PROTECTED] writes:
> 
> > Wow... 10 false negatives this morning. =/ 
> > 
> > Is 2.60's bayes really a lot better than 2.55's?
> > Here's an example of a FN that came through this morning:
> 
> 
> > Notice the gobbledygook text at the end -
> 
> Sure. The goal of that is to add in new tokens that are unique and
> have never been seen before. Those can bias an email toward neutral.
> 
> > <DIV>gmifewdxnavfo xlmdhwdeqb tftwgocpmkxh mfhfnpdaatb</DIV>
> > <DIV>phjtdedsnnxdz ciwqencxdspt dztzeabyeumkc jmldxrchpoyvt 
> > lgnzxrcjncoyv</DIV>
> > <DIV>wstcrjdwjshjsc esumvrbqll</DIV>
> > <DIV>hccwdohenxnn nptaihbczsbeir tjicwvdyewxii dcekolccikrej qmgblgcgowf
> > fhncedbistifx
> 
> I can see several ways of dealing with them.
> 
> First, the character probabilities of the preceding lines are very
> unlike English: too many consonants. So this particular case can
> be detected if any portion of an email contains text that is
> statistically very different from ordinary English. The spamware
> reaction to this is to bias the character probabilities to resemble
> English. So repeat the test, this time using bigram (character pair)
> probabilities. Then text that has a 'q' not followed by a 'u' would
> look alien.
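
A minimal sketch of the character-pair idea, with one simplification stated up front: instead of real bigram probabilities it only checks what fraction of a text's letter pairs never occur in a reference sample. The reference sentence, `KNOWN_BIGRAMS`, `unseen_bigram_fraction` and `q_without_u` are all made up for illustration; a usable test would need a large English corpus and proper probabilities.

```python
import re

# Tiny stand-in for a real English corpus (an assumption, illustration only).
REFERENCE_TEXT = (
    "the quick brown fox jumps over the lazy dog while "
    "the quiet queen requests a unique quilt"
)

def letter_bigrams(text):
    """Yield adjacent letter pairs inside each word, lowercased."""
    for word in re.findall(r"[a-z]+", text.lower()):
        for first, second in zip(word, word[1:]):
            yield first + second

KNOWN_BIGRAMS = set(letter_bigrams(REFERENCE_TEXT))

def unseen_bigram_fraction(text, known=KNOWN_BIGRAMS):
    """Fraction of letter pairs in text that never occur in the reference."""
    pairs = list(letter_bigrams(text))
    if not pairs:
        return 0.0
    unseen = sum(1 for pair in pairs if pair not in known)
    return unseen / len(pairs)

def q_without_u(text):
    """Count occurrences of 'q' that are not followed by 'u'."""
    return len(re.findall(r"q(?!u)", text.lower()))

if __name__ == "__main__":
    for sample in ("the lazy dog", "xlmdhwdeqb"):
        print(sample, unseen_bigram_fraction(sample), q_without_u(sample))
```

With a reference this small, plenty of ordinary English would also score badly, so the numbers only mean something once the reference set is realistic.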
> 
> These statistical tests mean that spamware must use real English
> words, or text that at least resembles real English words. To detect
> the second case, have SA look up each new token in a dictionary, and
> note if it isn't found. Again, if one portion of a message has too
> many non-English words, that is a spam sign.
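
The dictionary check could be sketched like this, assuming the message is split into portions on blank lines and that a word list is available as a Python set. The function names and the 0.5 cutoff are hypothetical, and nothing here reflects how SA itself tokenizes mail.

```python
import re

def non_dictionary_fraction(text, dictionary):
    """Fraction of alphabetic tokens in text that are not in the dictionary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in dictionary)
    return missing / len(tokens)

def suspicious_blocks(message, dictionary, threshold=0.5):
    """Return blank-line-separated portions with too many unknown words.

    The 0.5 threshold is an arbitrary placeholder, not a tuned value.
    """
    blocks = [block.strip() for block in re.split(r"\n\s*\n", message)]
    return [
        block
        for block in blocks
        if block and non_dictionary_fraction(block, dictionary) > threshold
    ]

if __name__ == "__main__":
    words = {"the", "lazy", "dog"}  # stand-in for a real word list
    print(suspicious_blocks("the lazy dog\n\nxlmdh qwrt the", words))
```

Checking each portion separately matters here: a few lines of junk at the end of an otherwise normal message would be diluted if the whole body were scored at once.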
> 
> These could be useful tests in general to detect email in a foreign
> language, not just to avoid Bayes poisoning.
> 
> A second and perhaps stronger sign: this group of text contains a
> large number of tokens that have never been seen before. This can be
> detected with an adaptive threshold: as more ham is learned, the
> threshold for 'too many new tokens' can decrease.
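
One way the adaptive threshold might look, as a sketch only. The class `NewTokenDetector`, its linear decay and the `start`, `floor` and `decay` parameters are all assumptions chosen for illustration; the post does not say how the threshold should shrink, and this is not tied to SA's Bayes database.

```python
import re

def tokenize(text):
    """Very rough tokenizer: lowercase runs of letters, digits, apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

class NewTokenDetector:
    """Flags text whose share of never-seen tokens exceeds a threshold
    that drops as more ham is learned (hypothetical linear decay)."""

    def __init__(self, start=1.0, floor=0.25, decay=0.25):
        self.seen = set()
        self.ham_learned = 0
        self.start = start
        self.floor = floor
        self.decay = decay

    def learn_ham(self, text):
        """Record the tokens of one ham message."""
        self.seen.update(tokenize(text))
        self.ham_learned += 1

    def threshold(self):
        """Allowed fraction of new tokens; never goes below the floor."""
        return max(self.floor, self.start - self.decay * self.ham_learned)

    def new_token_fraction(self, text):
        tokens = tokenize(text)
        if not tokens:
            return 0.0
        return sum(1 for t in tokens if t not in self.seen) / len(tokens)

    def looks_poisoned(self, text):
        return self.new_token_fraction(text) > self.threshold()

if __name__ == "__main__":
    detector = NewTokenDetector()
    for ham in ("the lazy dog", "the dog", "lazy dog"):
        detector.learn_ham(ham)
    print(detector.looks_poisoned("gmifewdxnavfo xlmdhwdeqb the dog"))
```

The decay here is per message learned, which is far too aggressive for real mail volumes; a real version would want the threshold tied to how often new tokens actually show up in fresh ham.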
> 
> Scott
> 
> 
> -------------------------------------------------------
> This SF.net email is sponsored by: SF.net Giveback Program.
> SourceForge.net hosts over 70,000 Open Source Projects.
> See the people who have HELPED US provide better services:
> Click here: http://sourceforge.net/supporters.php
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
> 
> 


