I have some notes on these as well. I think it would be great to put on the wiki! Or maybe I'll just make a separate cf file on "remove me" phrases. I'll try to get that started today. I am soooo far behind in work it isn't funny. However I did get to go to a great sushi bar in Manhattan yesterday! ;)
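A minimal sketch of what a check for the "cease" style of "remove me" phrases (the variants John lists in the quoted message below) could look like. It is written in Python only to show the idea; a real rule would live in an SA cf file. The pattern, the name `CEASE_PHRASE`, and the helper `has_cease_phrase` are assumptions for illustration, not existing rules.

```python
import re

# Hypothetical pattern covering the "cease" variants reported on the list:
# Cease offer(s), Cease update(s), Cease email, Cease mailing(s).
CEASE_PHRASE = re.compile(
    r"\bcease\s+(?:offers?|updates?|e-?mail|mailings?)\b",
    re.IGNORECASE,
)


def has_cease_phrase(text):
    """Return True if the text contains one of the "cease ..." phrases."""
    return CEASE_PHRASE.search(text) is not None


if __name__ == "__main__":
    for line in ("To Cease offers click here", "Reply to unsubscribe"):
        print(line, "->", has_cease_phrase(line))
```

A bare pattern like this cannot tell a legitimate unsubscribe line from a spam one.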
The only problem is not tagging legit unsubscribe phrases. I have some rules on things like unsub.gif already. I haven't had a chance to update the emporium in a while. Jen W., would you like to help me on these? I've actually had a spam say "no more of this shit" as a phrase!

--Chris Santerre

> -----Original Message-----
> From: VonEssen, John [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 08, 2003 2:13 PM
> To: [EMAIL PROTECTED]
> Subject: [SAtalk] Phrases I have modified....
>
>
> Just food for thought for the next release...
>
> I have been seeing more and more spam using different phrases for
> "remove me" phrases.
>
> Some use the word "cease":
>
> Cease offer(s)
> Cease update(s)
> Cease email
> Cease mailing(s)
>
>
> John
>
>
> -----Original Message-----
> From: Scott A Crosby [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 08, 2003 12:37 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: [SAtalk] Re: holy cow, FN city
>
> On Wed, 8 Oct 2003 08:34:46 -0700 (PDT), [EMAIL PROTECTED] writes:
>
> > Wow... 10 false negatives this morning. =/
> >
> > Is 2.60's bayes really a lot better than 2.55's?
> > Here's an example of a FN that came through this morning:
> >
> > Notice the gobbledygook text at the end -
>
> Sure. The goal of that is to add in new tokens that are unique and
> have never been seen before. Those can bias an email toward neutral.
>
> > <DIV>gmifewdxnavfo xlmdhwdeqb tftwgocpmkxh mfhfnpdaatb</DIV>
> > <DIV>phjtdedsnnxdz ciwqencxdspt dztzeabyeumkc jmldxrchpoyvt
> > lgnzxrcjncoyv</DIV>
> > <DIV>wstcrjdwjshjsc esumvrbqll</DIV>
> > <DIV>hccwdohenxnn nptaihbczsbeir tjicwvdyewxii dcekolccikrej
> > qmgblgcgowf
> > fhncedbistifx
>
> I can see several ways of dealing with them. The first approaches
>
> First, the character probabilities of the preceding lines are very
> unlike english --- too many consonants.
> So, this particular case can
> be detected if any portion of an email has written text that is
> statistically very different from ordinary english. The spamware
> reaction to this is to bias the character probabilities to resemble
> english. So repeat this again, except use bigram (character pair)
> probabilities. So, text that has a 'q' not followed by a 'u' would
> look alien.
>
> These statistical tests mean that spamware must use real english
> words, or text that at least resembles real english words. To detect
> the second case, have SA look up each new token in a dictionary, and
> note if it isn't found. Again, if one portion of a message has too
> many non-english words, that is a spam sign.
>
> These could be useful tests in general to detect email in a foreign
> language, not just avoid bayes poisoning.
>
> A second and perhaps stronger sign: this group of text contains a
> large number of tokens that have never been seen before. This can be
> detected by an adaptive threshold, as more ham is learned, the
> threshold for 'too many new tokens' can decrease.
>
> Scott
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: SF.net Giveback Program.
> SourceForge.net hosts over 70,000 Open Source Projects.
> See the people who have HELPED US provide better services:
> Click here: http://sourceforge.net/supporters.php
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
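The two tests Scott describes in the quoted message (character-pair statistics, and counting tokens never seen before) can be sketched roughly as follows. This is only an illustration under stated assumptions, not how SA or its Bayes code works: the function names, the tiny `REFERENCE_TEXT` standing in for learned ham, the add-one smoothing, and the threshold schedule with its `start`, `floor`, and `half_life` parameters are all made up here.

```python
import math
import re
from collections import Counter

# Assumption: a tiny stand-in for a large body of learned ham.
REFERENCE_TEXT = (
    "the quick brown fox jumps over the lazy dog "
    "please remove me from this mailing list and send no more offers "
    "spam filters learn which words appear in ordinary mail"
)


def words(text):
    """Lowercase alphabetic tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def train_bigrams(text):
    """Count adjacent letter pairs inside each word of the reference text."""
    counts = Counter()
    for word in words(text):
        counts.update(zip(word, word[1:]))
    return counts


def bigram_score(text, counts):
    """Average log-probability per letter pair, with add-one smoothing.

    More negative means less like the reference text.
    """
    denom = sum(counts.values()) + 26 * 26  # 26*26 possible letter pairs
    logps = [
        math.log((counts[pair] + 1) / denom)
        for word in words(text)
        for pair in zip(word, word[1:])
    ]
    return sum(logps) / len(logps) if logps else 0.0


def unseen_fraction(text, known_tokens):
    """Fraction of tokens in the text that were never seen before."""
    tokens = words(text)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in known_tokens) / len(tokens)


def new_token_threshold(ham_learned, start=0.9, floor=0.3, half_life=1000):
    """Hypothetical schedule: tolerate fewer new tokens as more ham is learned."""
    return floor + (start - floor) * 0.5 ** (ham_learned / half_life)


if __name__ == "__main__":
    counts = train_bigrams(REFERENCE_TEXT)
    known = set(words(REFERENCE_TEXT))
    gibberish = "gmifewdxnavfo xlmdhwdeqb tftwgocpmkxh mfhfnpdaatb"
    normal = "please remove me from this mailing list"
    for sample in (gibberish, normal):
        print(round(bigram_score(sample, counts), 3),
              unseen_fraction(sample, known), sample)
```

The gibberish line is the first `<DIV>` from the quoted spam sample. With a reference text this small the scores only show the direction of the effect; the dictionary lookup Scott mentions would replace `known` with a real word list.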