Re: Hashing spam

2003-12-19 Thread jfcm
On 04:19 19/12/03, Keith Moore said:

 It just strikes me as highly unlikely that a WG would ever change course
 because of what would look like random comments from outsiders -- it's
 not consistent with the dynamics of a WG, or with human nature.
 and that just might be one of our biggest problems, in a nutshell.

True. From experience, the only real way I have found to make a WG
change its mind is to give it a rewarding R&D spirit and to work together
on experimentation, with a reverse pyramidal spirit. I mean: you say the
WG is to _solve_ a user-documented problem with a real solution as a
demo, from a starting point (user documented, so the target is not disputed
within the WG). Every outsider is welcome to explore new ways from
that starting point. These ways and blocking points are kept documented
in a success/failures/milestones tree, so people may either try other
avenues or fix failures, and also compare global results between solutions.
Often moving a failure within the tree makes it a solution.
What I find frustrating with the RFC system is that nothing is final, proven,
or validated at a given time so that you can build on top of it. Only
projects are final.
I would prefer a lore- and recipe-oriented system where people could also say
"this is the way I do it (or updated it): feel free to help and copy". It
would probably have the same initial technical results, but it would be more
rewarding and less blocking. WGs would be dynamic clubs in which to
confer about development, experimentation and compatibility.

jfc




Hashing spam

2003-12-18 Thread escom
I am working on an approach to block spam using a database of hash (MD5)
strings of spam email:
1) a verified spam is reported to the database server on the web
2) the mail client checks incoming mail, generates a hash string, sends it
to the server to verify its presence there, and if yes blocks the email
3) download a hot list to block directly on the machine
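
The three steps above could be sketched roughly as follows. This is only an illustration, not the poster's implementation: the `SpamHashDB` class and its method names are hypothetical, the "server" is just an in-memory set, and MD5 is used only because the post names it.

```python
import hashlib


def message_hash(body: str) -> str:
    """Return the MD5 hex digest of a message body (MD5 as named in the post)."""
    return hashlib.md5(body.encode("utf-8")).hexdigest()


class SpamHashDB:
    """Hypothetical stand-in for the central database server (in-memory only)."""

    def __init__(self):
        self._hashes = set()

    def report(self, body: str) -> None:
        # Step 1: a verified spam is reported to the database.
        self._hashes.add(message_hash(body))

    def is_spam(self, body: str) -> bool:
        # Step 2: the client hashes incoming mail and asks whether it is known.
        return message_hash(body) in self._hashes

    def hot_list(self) -> frozenset:
        # Step 3: a snapshot the client can download and check locally.
        return frozenset(self._hashes)


server = SpamHashDB()
server.report("Buy cheap loans now!")
local_hot_list = server.hot_list()
```

A client holding `local_hot_list` can test `message_hash(body) in local_hot_list` without contacting the server, at the cost of the list going stale between downloads.
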

i don't know if it's a good or bad idea.

--giuseppe




Re: Hashing spam

2003-12-18 Thread Joe Abley
On 18 Dec 2003, at 13:10, escom wrote:

 I work on an approach to block spam with a database of hash (md5) string of
 spam email:
 1) Reporting a verified spam to the database server on the web
 2) the mail client check incoming mail, generate a hash string send to and
 verify the presence on the server, is yes block email.
 3) download a hot list to block directly on the machine

 i don't know if it's a good or bad idea.

http://www.rhyolite.com/anti-spam/dcc/




Re: Hashing spam

2003-12-18 Thread Vernon Schryver
 From: escom [EMAIL PROTECTED]

 I work on an approach to block spam with a database of hash (md5) string of
 spam email:
 1) Reporting a verified spam to the database server on the web
 2) the mail client check incoming mail, generate a hash string send to and
 verify the presence on the server, is yes block email.
 3) download a hot list to block directly on the machine

 i don't know if it's a good or bad idea.

The several existing implementations of something like that idea suggest
that some people think it is a reasonable idea.  I think it is useful
but has limitations.
See http://www.google.com/search?q=vipul+razor and http://cloudmark.com
for one (set of?) implementation(s).
See http://www.dcc-servers.net/ for another.  The DCC is often used
with SpamAssassin.  Refusing mail that has been seen anywhere else
(i.e. with non-zero DCC target counts) seems like a perfect fit for a
mailing list, but I'm probably biased and so won't suggest that.
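
The "seen anywhere else" rule can be pictured with a toy counter. This sketch does not use or model the real DCC protocol or client; `BulkCounter` and `accept_for_list` are hypothetical names, the counts live in memory, and an exact SHA-256 digest stands in for whatever checksums a real system computes.

```python
import hashlib
from collections import Counter


class BulkCounter:
    """Toy in-memory counter of how many times each message checksum was seen."""

    def __init__(self):
        self._counts = Counter()

    def seen_before(self, body: str) -> int:
        """Return the count recorded before this sighting, then record it."""
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        previous = self._counts[digest]
        self._counts[digest] += 1
        return previous


def accept_for_list(counter: BulkCounter, body: str) -> bool:
    # Policy from the text: refuse anything already seen elsewhere (count > 0).
    return counter.seen_before(body) == 0


counter = BulkCounter()
first = accept_for_list(counter, "Original post to the list")
second = accept_for_list(counter, "Original post to the list")
```

Under these assumptions the first sighting is accepted and the repeat is refused.
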


I vote no if someone is taking a vote about trashing the Subject
headers.  Access to this mailing list should be more, not less difficult.
Anyone who cannot figure out how to sort mail from this list based on
its existing headers and rewrite Subject headers or anything else to
taste is not really interested in the nominal purpose of the list.

If it were practical, it would be good to require subscribers or at
least contributors to prove their interest by showing they can fetch,
compile, install, configure, and operate a simple TCP application such
as an SMTP server.  Anyone who lacks sufficient interest to do (or
already have done) something like that is unlikely to have anything
interesting to say to this list, except to other go-ers.  Such a test
might reduce the number of people who are interested only in non-technical
issues such as the administrative work of the IETF, the nasty evil
power grabbing U.N., ICANN, or legacy internet engineers widening the
digital divide, or whatever else concerns the people who are prompting
the continued statements of the painfully obvious.  (For some reason
perhaps related to procmail, I'm not seeing the questions that prompt
the obvious answers.  It would be swell if those offering the answers
would desist.)

Anyone who cannot find a usable POP3 or SMTP server with which to
subscribe to this list would certainly be better served and better
serve the rest of us by using the web pages of its archive.
See http://www.ietf.org/mail-archive/ietf/Current/maillist.html


Vernon Schryver[EMAIL PROTECTED]



Re: Hashing spam

2003-12-18 Thread John Stracke
escom wrote:

 I work on an approach to block spam with a database of hash (md5) string of
 spam email:
 1) Reporting a verified spam to the database server on the web
 2) the mail client check incoming mail, generate a hash string send to and
 verify the presence on the server, is yes block email.
 3) download a hot list to block directly on the machine

It's been done, and the spammers have already evolved to get around it: 
they randomize the messages so that the hashes don't match.
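
A small sketch of why randomization defeats an exact hash: appending a random token to otherwise identical text yields unrelated MD5 digests. The `randomize` helper is hypothetical and only imitates one simple form of randomization.

```python
import hashlib
import random
import string


def md5_hex(text: str) -> str:
    """MD5 hex digest of the text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def randomize(body: str, rng: random.Random) -> str:
    """Append a random token, as a spammer might, so each copy differs."""
    token = "".join(rng.choice(string.ascii_lowercase) for _ in range(12))
    return f"{body}\n{token}"


rng = random.Random(0)  # seeded only to make the sketch repeatable
base = "Buy cheap loans now!"
copy_a = randomize(base, rng)
copy_b = randomize(base, rng)

# The two copies carry the same pitch but no longer share a digest.
hashes_differ = md5_hex(copy_a) != md5_hex(copy_b)
```
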

--
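/===================================================\
|John Stracke      |[EMAIL PROTECTED]               |
|Principal Engineer|http://www.centive.com          |
|Centive           |My opinions are my own.         |
|===================================================|
|No, no, that's *not* a boat, that's Queen Victoria.|
\===================================================/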




Re: Hashing spam

2003-12-18 Thread Vernon Schryver
 From: John Stracke [EMAIL PROTECTED]

 I work on an approach to block spam with a database of hash (md5) string of
 spam email:

 ...
 It's been done, and the spammers have already evolved to get around it: 
 they randomize the messages so that the hashes don't match.

Unless you mean naive and simplistic hashes, that is an overstatement.
As long as you want to accept mail from strangers, no spam filter can
perfectly predict whether copies of the next message from a stranger
are being sent to 30,000,000 of your intimate friends, but the various
hashing filters do some good work.
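
As a rough illustration of a less naive comparison, the sketch below uses word shingling with Jaccard similarity, a generic near-duplicate technique swapped in here for illustration. It is not claimed to be what DCC, Razor, or any other filter mentioned in this thread actually computes; the sample messages and the choice of 3-word shingles are assumptions.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


base = "act now to get a low rate loan approved in minutes with no credit check"
copy_a = base + " xqzkvjw"   # same pitch, different random token
copy_b = base + " plmtrhd"
unrelated = "the working group will meet on thursday to discuss the draft"

similar = jaccard(shingles(copy_a), shingles(copy_b))
different = jaccard(shingles(copy_a), shingles(unrelated))
```

In this toy case the two randomized copies score far higher against each other than against the unrelated text, which is the property an exact digest lacks.
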

An estimate of the effectiveness of a large scale filter can be obtained
from what it sees as the spam ratio.  If it claims that 60% of all
mail is spam but the real ratio is 70%, then it must be 85% effective.
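
The arithmetic behind that estimate, as a short sketch: effectiveness = (spam ratio the filter reports) / (real spam ratio). This assumes the filter has no false positives and that both ratios are measured over the same mail stream; the function name is hypothetical.

```python
def estimated_effectiveness(claimed_spam_ratio: float, real_spam_ratio: float) -> float:
    """Fraction of actual spam the filter catches, assuming no false positives."""
    if not 0 < real_spam_ratio <= 1:
        raise ValueError("real_spam_ratio must be in (0, 1]")
    return claimed_spam_ratio / real_spam_ratio


# The example from the text: the filter sees 60% spam, the real ratio is 70%.
effectiveness = estimated_effectiveness(0.60, 0.70)
```

Here `effectiveness` is 0.6 / 0.7, about 0.857, which is the roughly 85% quoted above.
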

 

Concerning false positives for this mailing list--it would be wise to
define what mail is legitimate.  In many places, you must accept at
least 99.9% of all even remotely legitimate mail.  However, this context
is different.  Here a boolean good/spam is simplistic and wrong.
Instead we have a spectrum:
  1. on-topic messages from subscribers
  2. on-topic messages from non-subscribers
  3. noise from subscribers
  4. noise from non-subscribers
  5. pure spam such as advertisements for loan sharks

In this list, only #1 is clearly good. It is good to avoid rejecting
#2, but there is surely no harm in sometimes delaying #2.  If the
senders of any rejected or false positive #2 received an informative
non-delivery report so that they could retransmit, what would be the harm?

SpamAssassin is reported to be better than 60% accurate.  #2 is surely
rare compared to #1.  Thus, as long as SpamAssassin white-lists all
subscribers, there would be no harm in the occasional rejection of #2.
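
A minimal sketch of that policy: subscribers are white-listed, everyone else is filtered, and a rejection carries an explanatory notice so the sender can retransmit. All names here are hypothetical, the spam score is assumed to come from some external filter, and the threshold value is an arbitrary placeholder rather than a recommended setting.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    accept: bool
    notice: str = ""  # text of a non-delivery report, if any


def list_policy(sender: str, spam_score: float, subscribers: set,
                threshold: float = 5.0) -> Verdict:
    """White-list subscribers; filter everyone else and explain any rejection."""
    if sender.lower() in subscribers:
        # Categories #1 and #3: subscribers bypass the filter entirely.
        return Verdict(accept=True)
    if spam_score >= threshold:
        # Possibly a false positive on #2, so tell the sender what happened.
        return Verdict(
            accept=False,
            notice="Your message was rejected by the list's spam filter; "
                   "please subscribe or resend.",
        )
    return Verdict(accept=True)


subscribers = {"alice@example.org"}
from_subscriber = list_policy("Alice@Example.org", 9.0, subscribers)
from_stranger = list_policy("bob@example.net", 9.0, subscribers)
```
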


Vernon Schryver[EMAIL PROTECTED]



Re: Hashing spam

2003-12-18 Thread Keith Moore
The problem with this analysis is that it assigns greater value to 
contributions from subscribers than to contributions from 
non-subscribers.  But often the failure to accept clues from 
outsiders causes working groups to do harm - and filtering messages 
in the #2 category increases this tendency.  The occasional rejection 
of #2 messages can be very harmful.

On Dec 18, 2003, at 3:01 PM, Vernon Schryver wrote:

   1. on-topic messages from subscribers
   2. on-topic messages from non-subscribers
   3. noise from subscribers
   4. noise from non-subscribers
   5. pure spam such as advertisements for loan sharks

 In this list, only #1 is clearly good. It is good to avoid rejecting
 #2, but there is surely no harm in sometimes delaying #2.  If the
 senders of any rejected or false positive #2 received an informative
 non-delivery report so that they could retransmit, what would be the harm?

 SpamAssassin is reported to be better than 60% accurate.  #2 is surely
 rare compared to #1.  Thus, as long as SpamAssassin white-lists all
 subscribers, there would be no harm in the occasional rejection of #2.




Re: Hashing spam

2003-12-18 Thread kent
On Thu, Dec 18, 2003 at 03:39:58PM -0500, Keith Moore wrote:
 The problem with this analysis is that it assigns greater value to 
 contributions from subscribers than to contributions from 
 non-subscribers.  But often the failure to accept clues from 
 outsiders causes working groups to do harm

I don't believe this is true, for any normal definition of often.  
Occasionally might be believable.

  - and filtering messages 
 in the #2 category increases this tendency.

One could just as easily argue that such filtering would decrease the
tendency, because people would modify their behavior to subscribe to
groups they cared about.  Also, one could just as easily argue that
working groups are just as likely to be harmed by distracting comments
from outsiders... 

 The occasional rejection 
 of #2 messages can be very harmful.

Seems more likely to me that the amount of harm would be lost in the
normal noise of ietf processes.

Regards
Kent

 On Dec 18, 2003, at 3:01 PM, Vernon Schryver wrote:
 
   1. on-topic messages from subscribers
   2. on-topic messages from non-subscribers
   3. noise from subscribers
   4. noise from non-subscribers
   5. pure spam such as advertisements for loan sharks
 
 In this list, only #1 is clearly good. It is good to avoid rejecting
 #2, but there is surely no harm in sometimes delaying #2.  If the
 senders of any rejected or false positive #2 received an informative
 non-delivery report so that they could retransmit, what would be the harm?
 
 SpamAssassin is reported to be better than 60% accurate.  #2 is surely
 rare compared to #1.  Thus, as long as SpamAssassin white-lists all
 subscribers, there would be no harm in the occasional rejection of #2.

-- 
Kent Crispin                            Be good, and you will be
[EMAIL PROTECTED],[EMAIL PROTECTED]     lonesome.
p: +1 310 823 9358  f: +1 310 823 8649     -- Mark Twain
SIP: [EMAIL PROTECTED]




Re: Hashing spam

2003-12-18 Thread Keith Moore
 But often the failure to accept clues from
 outsiders causes working groups to do harm

 I don't believe this is true, for any normal definition of often.
 Occasionally might be believable.

if I look at why working groups do harm, the failure to accept clues 
from outsiders does seem to crop up often.  Of course, this is my 
assessment (others might read the situation differently) and I can only 
make this statement about the groups I've actually looked at, which is 
a small and nonrandom sample.

 One could just as easily argue that such filtering would decrease the
 tendency, because people would modify their behavior to subscribe to
 groups they cared about.

You're incorrectly assuming that people with clues have the time to 
subscribe to and follow every single group that might do something 
harmful.

 Also, one could just as easily argue that
 working groups are just as likely to be harmed by distracting comments
 from outsiders...

You could argue that.  I haven't found it to be the case.

 The occasional rejection
 of #2 messages can be very harmful.

 Seems more likely to me that the amount of harm would be lost in the
 normal noise of ietf processes.

Some noise is more harmful than others.  Some WGs have more potential 
to do harm than others, and those are the very WGs that need outside 
input.




Re: Hashing spam

2003-12-18 Thread Keith Moore
 It just strikes me as highly unlikely that a WG would ever change course
 because of what would look like random comments from outsiders -- it's
 not consistent with the dynamics of a WG, or with human nature.

and that just might be one of our biggest problems, in a nutshell.