Re: Hashing spam
On 04:19 19/12/03, Keith Moore said:
>> It just strikes me as highly unlikely that a WG would ever change course because of what would look like random comments from outsiders -- it's not consistent with the dynamics of a WG, or with human nature.
>
> and that just might be one of our biggest problems, in a nutshell.

True. From experience, the only real way I have found to make a WG change its mind is to give it a rewarding R&D spirit and to work together on experimentation, with a reverse-pyramid spirit. I mean: you say the WG is to _solve_ a user-documented problem with a real solution as a demo, from a starting point (user-documented, so the target is not disputed within the WG). Every outsider is welcome to explore new ways from that starting point. These ways and their blocking points are kept documented in a success/failures/milestones tree, so people may either try other avenues or fix failures, and also compare global results between solutions. Often moving a failure within the tree makes it a solution.

What I find frustrating with the RFC system is that nothing is final, proven, validated at a given time so that you can build on top of it. Only projects are final. I would prefer a lore- and recipe-oriented system where people could also say "this is the way I do it (or updated it): feel free to help and copy." It would probably have the same initial technical results, but it would be more rewarding and less blocking. WGs would be dynamic clubs in which to coordinate development, experimentation and compatibility.

jfc
Hashing spam
I am working on an approach to block spam using a database of hash (MD5) strings of spam email:

1) A verified spam is reported to the database server on the web.
2) The mail client checks incoming mail: it generates a hash string, sends it to the server to verify whether it is present and, if yes, blocks the email.
3) A hot list is downloaded so that mail can be blocked directly on the machine.

I don't know if it's a good or bad idea.

--giuseppe
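The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `SpamHashDB`, the helper `spam_hash`, and the in-memory set standing in for the web database server are all hypothetical names, not part of the proposal or of any existing system.

```python
import hashlib


def spam_hash(message: bytes) -> str:
    """Return the MD5 hex digest of a raw message (hypothetical helper)."""
    return hashlib.md5(message).hexdigest()


class SpamHashDB:
    """In-memory stand-in for the shared database server (an assumption)."""

    def __init__(self) -> None:
        self._hashes: set[str] = set()

    def report(self, message: bytes) -> None:
        # Step 1: a verified spam is reported to the database.
        self._hashes.add(spam_hash(message))

    def is_known_spam(self, message: bytes) -> bool:
        # Step 2: the client hashes incoming mail and asks whether it is listed.
        return spam_hash(message) in self._hashes

    def hot_list(self) -> frozenset[str]:
        # Step 3: a snapshot the client can download and check locally.
        return frozenset(self._hashes)


db = SpamHashDB()
db.report(b"Buy cheap loans now!")

print(db.is_known_spam(b"Buy cheap loans now!"))            # True
print(db.is_known_spam(b"Agenda for the next WG meeting"))  # False
```

Note that only byte-for-byte identical messages produce the same digest in this sketch.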
Re: Hashing spam
On 18 Dec 2003, at 13:10, escom wrote:
> I work on an approach to block spam with a database of hash (md5) string of spam email: 1) Reporting a verified spam to the database server on the web 2) the mail client check incoming mail, generate a hash string send to and verify the presence on the server, if yes block email. 3) download a hot list to block directly on the machine i don't know if it's a good or bad idea.

http://www.rhyolite.com/anti-spam/dcc/
Re: Hashing spam
> From: escom [EMAIL PROTECTED]

> I work on an approach to block spam with a database of hash (md5) string of spam email: 1) Reporting a verified spam to the database server on the web 2) the mail client check incoming mail, generate a hash string send to and verify the presence on the server, if yes block email. 3) download a hot list to block directly on the machine i don't know if it's a good or bad idea.

The several existing implementations of something like that idea suggest that some people think it is a reasonable idea. I think it is useful but has limitations. See http://www.google.com/search?q=vipul+razor and http://cloudmark.com for one (set of?) implementation(s). See http://www.dcc-servers.net/ for another. The DCC is often used with SpamAssassin.

Refusing mail that has been seen anywhere else (i.e. with non-zero DCC target counts) seems like a perfect fit for a mailing list, but I'm probably biased and so won't suggest that.

I vote no if someone is taking a vote about trashing the Subject headers. Access to this mailing list should be more, not less difficult. Anyone who cannot figure out how to sort mail from this list based on its existing headers and rewrite Subject headers or anything else to taste is not really interested in the nominal purpose of the list.

If it were practical, it would be good to require subscribers or at least contributors to prove their interest by showing they can fetch, compile, install, configure, and operate a simple TCP application such as an SMTP server. Anyone who lacks sufficient interest to do (or already have done) something like that is unlikely to have anything interesting to say to this list, except to other go-ers. Such a test might reduce the number of people who are interested only in non-technical issues such as the administrative work of the IETF, the nasty evil power grabbing U.N., ICANN, or legacy internet engineers widening the digital divide, or whatever else concerns the people who are prompting the continued statements of the painfully obvious. (For some reason perhaps related to procmail, I'm not seeing the questions that prompt the obvious answers. It would be swell if those offering the answers would desist.)

Anyone who cannot find a usable POP3 or SMTP server with which to subscribe to this list would certainly be better served and better serve the rest of us by using the web pages of its archive. See http://www.ietf.org/mail-archive/ietf/Current/maillist.html

Vernon Schryver [EMAIL PROTECTED]
Re: Hashing spam
escom wrote:
> I work on an approach to block spam with a database of hash (md5) string of spam email: 1) Reporting a verified spam to the database server on the web 2) the mail client check incoming mail, generate a hash string send to and verify the presence on the server, if yes block email. 3) download a hot list to block directly on the machine

It's been done, and the spammers have already evolved to get around it: they randomize the messages so that the hashes don't match.

--
John Stracke, Principal Engineer, Centive
[EMAIL PROTECTED]
http://www.centive.com
My opinions are my own.
No, no, that's *not* a boat, that's Queen Victoria.
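The evasion described above is easy to reproduce. In the sketch below, two copies of a message differ only in a random trailing token, which is enough to change an exact MD5 digest. The second function, `normalized_hash`, is a purely hypothetical normalization (lowercase, drop any token containing a digit) added to show the general idea of a less literal hash; it is an assumption for illustration and not a description of how Razor, the DCC, or any other real filter computes its checksums.

```python
import hashlib
import re


def exact_hash(text: str) -> str:
    """MD5 over the message text exactly as received."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def normalized_hash(text: str) -> str:
    """MD5 over a crudely normalized message (hypothetical scheme):
    lowercase, drop tokens containing digits, collapse whitespace."""
    words = [w for w in text.lower().split() if not re.search(r"\d", w)]
    return hashlib.md5(" ".join(words).encode("utf-8")).hexdigest()


# Two copies of the same spam with different random tokens appended.
copy_a = "Low rate loans, apply today! ref xk42q9"
copy_b = "Low rate loans, apply today! ref zz81p0"

print(exact_hash(copy_a) == exact_hash(copy_b))            # False
print(normalized_hash(copy_a) == normalized_hash(copy_b))  # True
```

A normalization this simple would itself be easy to defeat, so it should be read only as a demonstration of why exact hashes alone are fragile.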
Re: Hashing spam
> From: John Stracke [EMAIL PROTECTED]

>> I work on an approach to block spam with a database of hash (md5) string of spam email: ...
>
> It's been done, and the spammers have already evolved to get around it: they randomize the messages so that the hashes don't match.

Unless you mean naive and simplistic hashes, that is an overstatement. As long as you want to accept mail from strangers, no spam filter can perfectly predict whether copies of the next message from a stranger are being sent to 30,000,000 of your intimate friends, but the various hashing filters do some good work. An estimate of the effectiveness of a large scale filter can be obtained from what it sees as the spam ratio. If it claims that 60% of all mail is spam but the real ratio is 70%, then it must be about 85% effective (60/70 ≈ 0.86).

Concerning false positives for this mailing list--it would be wise to define what mail is legitimate. In many places, you must accept at least 99.9% of all even remotely legitimate mail. However, this context is different. Here a boolean good/spam is simplistic and wrong. Instead we have a spectrum:

1. on-topic messages from subscribers
2. on-topic messages from non-subscribers
3. noise from subscribers
4. noise from non-subscribers
5. pure spam such as advertisements for loan sharks

In this list, only #1 is clearly good. It is good to avoid rejecting #2, but there is surely no harm in sometimes delaying #2. If the senders of any rejected or false positive #2 received an informative non-delivery report so that they could retransmit, what would be the harm? SpamAssassin is reported to be better than 60% accurate. #2 is surely rare compared to #1. Thus, as long as SpamAssassin white-lists all subscribers, there would be no harm in the occasional rejection of #2.

Vernon Schryver [EMAIL PROTECTED]
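The effectiveness estimate in the message above is a simple ratio, worked through here as a sketch. It assumes, as the estimate implicitly does, that nearly everything the filter flags really is spam, so that the share it reports is the real spam share multiplied by the fraction it catches. The function name `estimated_effectiveness` is hypothetical.

```python
def estimated_effectiveness(claimed_ratio: float, real_ratio: float) -> float:
    """Fraction of real spam caught, given the spam share the filter reports
    and the true spam share of all mail (assumes negligible false positives)."""
    if not 0 < real_ratio <= 1:
        raise ValueError("real_ratio must be in (0, 1]")
    if not 0 <= claimed_ratio <= real_ratio:
        raise ValueError("claimed_ratio must be between 0 and real_ratio")
    return claimed_ratio / real_ratio


# The figures from the message: the filter sees 60% spam, the real share is 70%.
print(round(estimated_effectiveness(0.60, 0.70), 3))  # 0.857
```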
Re: Hashing spam
The problem with this analysis is that it assigns greater value to contributions from subscribers than to contributions from non-subscribers. But often the failure to accept clues from outsiders causes working groups to do harm - and filtering messages in the #2 category increases this tendency. The occasional rejection of #2 messages can be very harmful.

On Dec 18, 2003, at 3:01 PM, Vernon Schryver wrote:
> 1. on-topic messages from subscribers
> 2. on-topic messages from non-subscribers
> 3. noise from subscribers
> 4. noise from non-subscribers
> 5. pure spam such as advertisements for loan sharks
>
> In this list, only #1 is clearly good. It is good to avoid rejecting #2, but there is surely no harm in sometimes delaying #2. If the senders of any rejected or false positive #2 received an informative non-delivery report so that they could retransmit, what would be the harm? SpamAssassin is reported to be better than 60% accurate. #2 is surely rare compared to #1. Thus, as long as SpamAssassin white-lists all subscribers, there would be no harm in the occasional rejection of #2.
Re: Hashing spam
On Thu, Dec 18, 2003 at 03:39:58PM -0500, Keith Moore wrote:
> The problem with this analysis is that it assigns greater value to contributions from subscribers than to contributions from non-subscribers. But often the failure to accept clues from outsiders causes working groups to do harm

I don't believe this is true, for any normal definition of "often". "Occasionally" might be believable.

> - and filtering messages in the #2 category increases this tendency.

One could just as easily argue that such filtering would decrease the tendency, because people would modify their behavior to subscribe to groups they cared about. Also, one could just as easily argue that working groups are just as likely to be harmed by distracting comments from outsiders...

> The occasional rejection of #2 messages can be very harmful.

Seems more likely to me that the amount of harm would be lost in the normal noise of IETF processes.

Regards
Kent

> On Dec 18, 2003, at 3:01 PM, Vernon Schryver wrote:
>> 1. on-topic messages from subscribers
>> 2. on-topic messages from non-subscribers
>> 3. noise from subscribers
>> 4. noise from non-subscribers
>> 5. pure spam such as advertisements for loan sharks
>>
>> In this list, only #1 is clearly good. It is good to avoid rejecting #2, but there is surely no harm in sometimes delaying #2. If the senders of any rejected or false positive #2 received an informative non-delivery report so that they could retransmit, what would be the harm? SpamAssassin is reported to be better than 60% accurate. #2 is surely rare compared to #1. Thus, as long as SpamAssassin white-lists all subscribers, there would be no harm in the occasional rejection of #2.

--
Kent Crispin
[EMAIL PROTECTED],[EMAIL PROTECTED]
p: +1 310 823 9358 f: +1 310 823 8649
SIP: [EMAIL PROTECTED]
Be good, and you will be lonesome. -- Mark Twain
Re: Hashing spam
>> But often the failure to accept clues from outsiders causes working groups to do harm
>
> I don't believe this is true, for any normal definition of "often". "Occasionally" might be believable.

If I look at why working groups do harm, the failure to accept clues from outsiders does seem to crop up often. Of course, this is my assessment (others might read the situation differently) and I can only make this statement about the groups I've actually looked at, which is a small and nonrandom sample.

> One could just as easily argue that such filtering would decrease the tendency, because people would modify their behavior to subscribe to groups they cared about.

You're incorrectly assuming that people with clues have the time to subscribe to and follow every single group that might do something harmful.

> Also, one could just as easily argue that working groups are just as likely to be harmed by distracting comments from outsiders...

You could argue that. I haven't found it to be the case.

>> The occasional rejection of #2 messages can be very harmful.
>
> Seems more likely to me that the amount of harm would be lost in the normal noise of ietf processes.

Some noise is more harmful than others. Some WGs have more potential to do harm than others, and those are the very WGs that need outside input.
Re: Hashing spam
> It just strikes me as highly unlikely that a WG would ever change course because of what would look like random comments from outsiders -- it's not consistent with the dynamics of a WG, or with human nature.

And that just might be one of our biggest problems, in a nutshell.