Re: A New Approach: Find the Ham

2007-02-12 Thread michael moncur
I agree that this isn't going to be the best approach. Detecting ham is simply more difficult: 1. New types of ham emerge more often than new types of spam. Spammers generally stick to tried-and-true subjects while ham is all over the place. 2. Ham is more personalized than spam. Everyone gets

Re: A New Approach: Find the Ham

2007-02-12 Thread Dan
Duncan, Michael, Thank you for the careful thought and detailed input. Please read my Prototype Config email of yesterday afternoon. This is not what it appears to be: it is NOT a weighted ham-finding rules approach but rather a non-weighted, ham-tuned, spam-finding rules approach. It's unconventional

HTML mail (was Re: A New Approach: Find the Ham)

2007-02-12 Thread Kelson
Tom Allison wrote: Personally, I think HTML email should be outright discarded from the start. If you look at this argument presented by the OP then it reinforces the idea that most ASCII is ham and most HTML is spam. Therefore, reject delivery of all HTML-based email. Or to be more

Re: HTML mail (was Re: A New Approach: Find the Ham)

2007-02-12 Thread Gene Heskett
On Monday 12 February 2007 13:27, Kelson wrote: Tom Allison wrote: Personally, I think HTML email should be outright discarded from the start. If you look at this argument presented by the OP then it reinforces the idea that most ASCII is ham and most HTML is spam. Therefore, reject delivery

RE: HTML mail (was Re: A New Approach: Find the Ham)

2007-02-12 Thread Coffey, Neal
Gene Heskett wrote: On Monday 12 February 2007 13:27, Kelson wrote: Now, if you can come up with another markup language for formatting email... [...] * And you can get all the major email clients to use it for formatted composition instead of HTML (so end users can still make their

Re: HTML mail (was Re: A New Approach: Find the Ham)

2007-02-12 Thread Kelson
Gene Heskett wrote: With all due respect, that's 100% BS. MIME was invented to handle the non-ASCII stuff, and does it very well except for M$, who couldn't follow a std rule with a loaded .44 magnum stuck in Bill's ear. 100% BS? So end-users don't like formatting in their messages? Email

Re: HTML mail (was Re: A New Approach: Find the Ham)

2007-02-12 Thread Kenneth Porter
--On Monday, February 12, 2007 12:50 PM -0800 Kelson [EMAIL PROTECTED] wrote: In other words, what can adequately replace text/html in the non-plaintext multipart/alternative section such that HTML becomes irrelevant for legitimate uses? Microsoft Word? PDF? RTF? Any of those would be

Re: HTML mail (was Re: A New Approach: Find the Ham)

2007-02-12 Thread John Rudd
Kelson wrote: Tom Allison wrote: Personally, I think HTML email should be outright discarded from the start. If you look at this argument presented by the OP then it reinforces the idea that most ASCII is ham and most HTML is spam. Therefore, reject delivery of all HTML-based email. Or to

Re: A New Approach: Find the Ham

2007-02-12 Thread Duncan Findlay
On Sun, Feb 11, 2007 at 11:10:53PM -0500, Duncan Findlay wrote: I've read most of the e-mails on this topic and I think the underlying problem is that this method relies on knowing exactly which profiles (i.e. combinations of rules) valid ham can hit. After re-reading your message with your

Re: A New Approach: Find the Ham

2007-02-12 Thread Duncan Findlay
On Mon, Feb 12, 2007 at 11:00:06PM -0500, Duncan Findlay wrote: On Sun, Feb 11, 2007 at 11:10:53PM -0500, Duncan Findlay wrote: I've read most of the e-mails on this topic and I think the underlying problem is that this method relies on knowing exactly which profiles (i.e. combinations of

Re: A New Approach: Find the Ham

2007-02-11 Thread John Rudd
Giampaolo Tomassoni wrote: From: Miles Fidelman [mailto:[EMAIL PROTECTED] Dan wrote: I've developed a new approach to scoring that I want to 1) share with everyone and 2) make into a working system that's as accurate as what I've already built, but easier to use. First, the theory: NEW

Re: A New Approach: Find the Ham

2007-02-11 Thread John Andersen
On Saturday 10 February 2007, Dan wrote: On Feb 10, 2007, at 14:38, Mathieu Bouchard wrote: How do you ever find FPs if you have so many TP to sort through that it's not even worth sorting through FP+TP to find the FP? IMHO, that'd be why we assume that mails are ham rather than assume

Re: A New Approach: Find the Ham

2007-02-11 Thread Justin Mason
Long-time SpamAssassin users with a good memory might recall that back in SpamAssassin 2.4x, we included quite a few ham-targeting rules, such as "was this sent using User-Agent: Mozilla?", "is this formatted like a reply to a previous message?", "does it include headers from a mailing list?" and "is it

Re: A New Approach: Find the Ham

2007-02-11 Thread tom
On Feb 10, 2007, at 3:19 PM, Giampaolo Tomassoni wrote: From: Tom Allison [mailto:[EMAIL PROTECTED] Personally, I think HTML email should be outright discarded from the start. If you look at this argument presented by the OP then it reinforces the idea that most ASCII is ham and most HTML is

RE: A New Approach: Find the Ham

2007-02-11 Thread Giampaolo Tomassoni
From: tom [mailto:[EMAIL PROTECTED] On Feb 10, 2007, at 3:19 PM, Giampaolo Tomassoni wrote: From: Tom Allison [mailto:[EMAIL PROTECTED] Personally, I think HTML email should be outright discarded from the start. If you look at this argument presented by the OP then it reinforces

Re: A New Approach: Find the Ham

2007-02-11 Thread Theo Van Dinter
On Sat, Feb 10, 2007 at 08:22:41PM +, Nigel Frankcom wrote: What do Theo, Matt & Co. have to say? They've been doing this a lot longer than us. Unless I'm missing something, this approach is the standard "block everything except for what we explicitly want to receive". Which is great, if you

RE: A New Approach: Find the Ham

2007-02-11 Thread Philip Seccombe
Subject: Re: A New Approach: Find the Ham On Sat, 10 Feb 2007 15:14:56 -0500, Miles Fidelman [EMAIL PROTECTED] wrote: Dan wrote: I've developed a new approach to scoring that I want to 1) share with everyone and 2) make into a working system that's as accurate as what I've already built

Re: A New Approach: Find the Ham

2007-02-11 Thread .rp
On 10 Feb 2007 at 11:43, Dan wrote: I've developed a new approach to scoring that I want to 1) share with everyone and 2) make into a working system that's as accurate as what I've already built, but easier to use. First, the theory: [...] NEW SITUATION Ham is now the tiniest minority of

Re: A New Approach: Find the Ham

2007-02-11 Thread Duncan Findlay
Hey Dan, I've read most of the e-mails on this topic and I think the underlying problem is that this method relies on knowing exactly which profiles (i.e. combinations of rules) valid ham can hit. I see a number of problems: - How do we actually generate the profiles that are to be considered

RE: A New Approach: Find the Ham

2007-02-10 Thread Giampaolo Tomassoni
From: Dan [mailto:[EMAIL PROTECTED] I've developed a new approach to scoring that I want to 1) share with everyone and 2) make into a working system that's as accurate as what I've already built, but easier to use. First, the theory: SITUATION In the beginning, all email was ham.

Re: A New Approach: Find the Ham

2007-02-10 Thread Nigel Frankcom
On Sat, 10 Feb 2007 20:52:17 +0100, Giampaolo Tomassoni [EMAIL PROTECTED] wrote: From: Dan [mailto:[EMAIL PROTECTED] I've developed a new approach to scoring that I want to 1) share with everyone and 2) make into a working system that's as accurate as what I've already built, but easier

Re: A New Approach: Find the Ham

2007-02-10 Thread Tom Allison
CHALLENGE All filtering software is written to score for results that equal spam - catch the bad SOLUTION Make filtering software score for results that equal ham - uncatch the good. Your thoughts? How can this method spend less time and energy? Aren't you going to build a mirrored

Re: A New Approach: Find the Ham

2007-02-10 Thread Miles Fidelman
Dan wrote: I've developed a new approach to scoring that I want to 1) share with everyone and 2) make into a working system that's as accurate as what I've already built, but easier to use. First, the theory: NEW ASSUMPTION All messages are spam unless x,y,z score says they're ham. NEW

RE: A New Approach: Find the Ham

2007-02-10 Thread Giampaolo Tomassoni
From: Tom Allison [mailto:[EMAIL PROTECTED] CHALLENGE All filtering software is written to score for results that equal spam - catch the bad SOLUTION Make filtering software score for results that equal ham - uncatch the good. Your thoughts? How can this method spend

Re: A New Approach: Find the Ham

2007-02-10 Thread urgrue
One consideration is that spam getting through is never more than an annoyance. Ham getting caught can be a big problem. So any kind of deny-by-default system has to deal with how to respond to people sending you mail that gets trapped and provide a way for the sender to get approval. How

Re: A New Approach: Find the Ham

2007-02-10 Thread urgrue
This would be easier to filter. It would also be more adaptive to a statistical approach than a regex approach. Personally, I think HTML email should be outright discarded from the start. If you look at this argument presented by the OP then it reinforces the idea that most ASCII is ham

Re: A New Approach: Find the Ham

2007-02-10 Thread Nigel Frankcom
On Sat, 10 Feb 2007 15:14:56 -0500, Miles Fidelman [EMAIL PROTECTED] wrote: Dan wrote: I've developed a new approach to scoring that I want to 1) share with everyone and 2) make into a working system that's as accurate as what I've already built, but easier to use. First, the theory: NEW

RE: A New Approach: Find the Ham

2007-02-10 Thread Giampaolo Tomassoni
From: Miles Fidelman [mailto:[EMAIL PROTECTED] Dan wrote: I've developed a new approach to scoring that I want to 1) share with everyone and 2) make into a working system that's as accurate as what I've already built, but easier to use. First, the theory: NEW ASSUMPTION All

Re: A New Approach: Find the Ham

2007-02-10 Thread Dan
Clarifications: 1) I'm not talking about generating new rules. Rules stay the same. I'm describing a new scoring process only. 2) This would not be a replacement for SA, but an improvement. Just a new way to process results already generated by SA. Ideally, this would be a replacement

Re: A New Approach: Find the Ham

2007-02-10 Thread Mark Samples
Is that the same as whitelisting? Maybe I do not understand, but a very rigorous approach would be a whitelist methodology in which, once a new account is created, the user sends email to everyone they want to communicate with, and it 'autowhitelists' those addresses, so you can only receive from those

Re: A New Approach: Find the Ham

2007-02-10 Thread Dan
On Feb 10, 2007, at 12:14, Miles Fidelman wrote: Dan wrote: I've developed a new approach to scoring that I want to 1) share with everyone and 2) make into a working system that's as accurate as what I've already built, but easier to use. First, the theory: NEW ASSUMPTION All messages are

Re: A New Approach: Find the Ham

2007-02-10 Thread Raul Dias
NEW SITUATION Ham is now the tiniest minority of all email. NEW ASSUMPTION All messages are spam unless x,y,z score says they're ham. NEW APPROACH Block everything, then create rules to not catch what you do want. i.e., build tests that target the spam (keeping all the tests you've

Re: A New Approach: Find the Ham

2007-02-10 Thread Mathieu Bouchard
On Sat, 10 Feb 2007, Dan wrote: With Find the Ham, whitelisting is almost obsolete. When you find an FP, How do you ever find FPs if you have so many TP to sort through that it's not even worth sorting through FP+TP to find the FP? IMHO, that'd be why we assume that mails are ham rather

Re: A New Approach: Find the Ham

2007-02-10 Thread Dan
On Feb 10, 2007, at 14:38, Mathieu Bouchard wrote: How do you ever find FPs if you have so many TP to sort through that it's not even worth sorting through FP+TP to find the FP? IMHO, that'd be why we assume that mails are ham rather than assume that they are spam. I haven't found FP

Re: A New Approach: Find the Ham

2007-02-10 Thread Burak Ueda
Good point, but it will cause trouble UNLESS we find a way to recognize ham 100%. And it must be exactly 100% (99% won't be enough). As other users said, with the current system, if we can filter 70-80% of the spam, the remaining 20-30% will only be an annoyance, but ham will be delivered. But with the