Re: further spam detection algorithms

2004-07-26 Thread Loren Wilton
> A quite interesting HTML code to cheat the filters > > http://www=2espyware-killer-software=2ecom/cgi-bin/rd=2ecgi= > ?IvC7R3lvJb">http://www=2espyware-killer-software=2ecom/cgi-bin/rd=2ecgi?=Iv C7R3lvJb > > space substituted by "3d" > dot (.) substituted by "=2e" > ? substituted by "=?" This is
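The substitutions quoted above look like quoted-printable escapes: `=2e` is the hex code for a dot and `=3d` is the hex code for an equals sign. A minimal sketch of undoing such `=XX` escapes before a URL is matched against any rule, assuming a simplified decoder that ignores soft line breaks; `decode_qp_escapes` and the sample host are hypothetical names used only for illustration:

```python
import re

# Matches one quoted-printable style escape: "=" followed by two hex digits.
_QP_ESCAPE = re.compile(r"=([0-9A-Fa-f]{2})")


def decode_qp_escapes(text: str) -> str:
    """Replace each =XX hex escape with the character it encodes."""
    return _QP_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# Hypothetical obfuscated URL in the style quoted in this message.
print(decode_qp_escapes("http://www=2eexample=2ecom/cgi-bin/rd=2ecgi"))
```

For complete messages, Python's standard `quopri` module is probably the better starting point, since it also handles soft line breaks; the regex above only shows the idea.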

Re: further spam detection algorithms

2004-07-26 Thread Rakesh
Justin Mason said: In the technology, when a mail comes in it is first cleared of the HTML tags so that words like viagra are brought back to their original clear text form. Then on this cleared message the entropy type compression that you have suggested is carried out and the ratio of similarity is matche

Re: further spam detection algorithms

2004-07-26 Thread Lucas Albers
Justin Mason said: >> In the technology, when a mail comes in it is first cleared of the HTML >> tags so words like viagra is brought to its original >> clear text form. Then on this cleared message the entropy type >> compression that you have suggested is carried out and the ratio of >> similar

Re: further spam detection algorithms

2004-07-26 Thread Justin Mason
Rakesh writes: > Lucas, > > The concept seems to be interesting, but BrightMail one of the biggest > Spam Control company, uses a combination of these two tests (Entropy and > HTML test). Which they call it as BrightSig2 technology and is > unfort

Re: further spam detection algorithms

2004-07-25 Thread Scott A Crosby
On Sun, 25 Jul 2004 05:31:35 -0700, "Loren Wilton" <[EMAIL PROTECTED]> writes: > Just reading their summary, I think it is nice research, but not > really useful. The correlation method seems like a good idea. > Except as they point out it makes granite flow look swift. So not > actually useful

Re: further spam detection algorithms

2004-07-25 Thread Loren Wilton
Just reading their summary, I think it is nice research, but not really useful. The correlation method seems like a good idea. Except as they point out it makes granite flow look swift. So not actually useful at this point in time, but it should be kept in mind. On their second point with html, t

Re: further spam detection algorithms

2004-07-25 Thread LuKreme
On 25 Jul 2004, at 03:58, John Andersen wrote: On Saturday 24 July 2004 10:45 pm, Lucas Albers wrote: The article states The HTML Test: Most people do not send messages in HTML and there are many good reasons for this -- What planet does that writer live on? The estimates I've seen are that at lea

Re: further spam detection algorithms

2004-07-25 Thread John Andersen
On Saturday 24 July 2004 10:45 pm, Lucas Albers wrote: The article states > The HTML Test: Most people do not send messages in HTML and there are many > good reasons for this -- What planet does that writer live on? The estimates I've seen are that at least 70% of mail users send html, especially

Re: further spam detection algorithms

2004-07-25 Thread Lucas Albers
Justin Mason said: > Well, we already do the second, and I think it's in 2.6x too (HTML_90_100 > et al). > > We took a look at the first one a while back, but it was very slow. > I wonder if these guys have any more info on their success rate > with it? >>compression/entropy test: For each new m

Re: further spam detection algorithms

2004-07-25 Thread Lucas Albers
Found another spam detection algorithm: http://www.inf.fu-berlin.de/inst/ag-db/software/ties/text-class-exp.html description below: Before normalization: 1. Number of errors on the last 10x500 mails 2. False negatives (spam misclassified as nonspam) on the last 10x500 mails 3. False positives (n

Re: further spam detection algorithms

2004-07-25 Thread Rakesh
Lucas, The concept seems to be interesting, but BrightMail, one of the biggest spam control companies, uses a combination of these two tests (entropy and HTML test), which they call BrightSig2 technology and which is unfortunately patented. In the technology, when a mail comes in it is first clear
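A rough sketch of the two-step idea described in this message: strip the HTML tags first, then compare the cleaned text with known spam by compression. This is not BrightMail's method, whose details are not given in this thread. It swaps in normalized compression distance (NCD) over zlib as one plausible reading of "entropy type compression", and the regex tag stripper, function names and sample texts are all assumptions made for illustration:

```python
import re
import zlib

# Crude tag stripper (assumption): a real filter would use a proper HTML parser.
_TAG = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    """Remove HTML tags so obfuscated words collapse back to plain text."""
    return _TAG.sub("", text)


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: lower means more similar."""
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)


# Hypothetical sample messages.
known_spam = "Buy cheap viagra now, limited offer, click here. " * 20
incoming = "Buy che<b></b>ap via<i></i>gra now, limited offer, click here. " * 20
ordinary = ("The quarterly meeting has moved to Thursday; please bring "
            "the budget figures and the revised release schedule.")

print(ncd(known_spam, strip_html(incoming)))  # distance to tag-obfuscated spam
print(ncd(known_spam, ordinary))              # distance to unrelated mail
```

Comparing every incoming message against a large spam corpus this way means one compression per pair, which fits the remarks elsewhere in the thread that this kind of test was very slow.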

Re: further spam detection algorithms

2004-07-25 Thread Justin Mason
Lucas Albers writes: > This webpage: > http://lynx.auton.cs.cmu.edu/~agoode/spam/spam > > Mentions two other spamassassin algorithms for spam detecting in addition > to the current ones. Well, we already do the second, and I think it's in 2.6x too (

further spam detection algorithms

2004-07-25 Thread Lucas Albers
This webpage: http://lynx.auton.cs.cmu.edu/~agoode/spam/spam Mentions two other spamassassin algorithms for spam detecting in addition to the current ones. Ideas on whether they are worthwhile? " With Professor Atkeson's spam problem in mind, we devised the following tests: The compression/entrop