Re: Approximate hashes

2004-09-06 Thread Len Sassaman
On Wed, 1 Sep 2004, Marcel Popescu wrote:

 Hence my question: is there some approximate hash function (which I could
 use instead of SHA-1) which can verify that a text hashes very close to a
 value? So that if I change, say, tabs into spaces, I won't get exactly the
 same value, but I would get a good enough?

Hi Marcel,

You may wish to look at Cmeclax's nilsimsa. It has been used to detect
slightly-modified message floods in anonymous remailer systems, and was
also used in Spamassassin at some point.

http://lexx.shinn.net/cmeclax/nilsimsa.html



-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]


Re: Approximate hashes

2004-09-01 Thread Marcel Popescu
From: Marcel Popescu [EMAIL PROTECTED]

 Hence my question: is there some approximate hash function (which I
could
 use instead of SHA-1) which can verify that a text hashes very close to
a
 value? So that if I change, say, tabs into spaces, I won't get exactly the
 same value, but I would get a good enough?

I just had an idea. Would this work?

- let S be the input string, whose hash I want to verify
- make S uppercase
- remove everything but A-Z, 0-9, and common punctuation (!;:',.?)
- calculate the SHA1 hash of the result

This should keep any insignificant changes out of the final result. Does
anyone know of a mail transformation which could upset it? Can anyone see a
way to attack this by letting a significantly different message collide on
the same hash? (I'm ignoring the recent discoveries - they're not that
practical, I'm only trying to fight spam, not the government.)

Thanks,
Marcel

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]


Re: Approximate hashes

2004-09-01 Thread C. Scott Ananian
On Wed, 1 Sep 2004, Marcel Popescu wrote:

 My problem is that I don't know what happens with the email in transit
 some mail server might dislike ASCII characters with the high bit set, or
 Hence my question: is there some approximate hash function (which I could

PGP has this issue with 'clear-signed' output.  An excerpt from the pgp
man page:
If CLEARSIG is enabled, then when signing and ASCII-armoring a text
file, PGP uses a different format that includes the plaintext in
human-readable form.  Lines beginning with - are quoted with - .
To cope with some of the stupider mailers in the world, lines
beginning with From  are also quoted, and trailing whitespace on
lines is stripped.  PGP will remove the quoting if you use it to
decrypt the message, but the trailing whitespace is not recovered.
This is still useful enough to be enabled by default.
You might find more details about typical mailer-munging from the PGP docs
or source.
 --scott

GRALLSPICE Moscow MKSEARCH affinity group KUDESK Nazi operative Clinton
   Register to vote!  http://www.yourvotematters.org/VerifiedVoting
 ( http://cscott.net/ )

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]


Re: Approximate hashes

2004-09-01 Thread Marcel Popescu
From: Hal Finney [EMAIL PROTECTED]

 As you are probably aware, existing hashcash implementations do not base
 the stamp on the message content.  Instead they only lock the stamp to
 the receiver's email address.  Then the receiver keeps a list of the
 hashcash stamps he has seen recently, to prevent reuse.  I'm not sure
 what you hope to gain by locking the stamp to the message content.

Me dumb, sorry. I was actually thinking of some other thing I wanted to do
about spam (which involved the whole content), and haven't re-read the
hashcash site in about a week.

You are perfectly right, of course. Windows spam victims, here I come! g

Thanks,
Marcel

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]


Re: Approximate hashes

2004-09-01 Thread David Honig
At 06:02 PM 9/1/04 +0300, Marcel Popescu wrote:
From: Marcel Popescu [EMAIL PROTECTED]

 Hence my question: is there some approximate hash function (which I
could
 use instead of SHA-1) which can verify that a text hashes very close to
a
 value? So that if I change, say, tabs into spaces, I won't get exactly the
 same value, but I would get a good enough?

This is completely what secure hashes are supposed to prevent.  *Any* change
in the input will flip half the hash-bits on average.  Just like a block
cipher.  There is no input nearness preservation.  That's part of the point
of the algorithm.


I just had an idea. Would this work?

- let S be the input string, whose hash I want to verify
- make S uppercase
- remove everything but A-Z, 0-9, and common punctuation (!;:',.?)
- calculate the SHA1 hash of the result

This should keep any insignificant changes out of the final result. 

You can encode your message in some format which is not subject
to mangling, and use a hash of that encoding.  You can then
decode the encoding back into unicode or whatever funky but
net-fragile character set you like.  This is somewhat like
ascii-armoring of PGP-encrypted messages.





=
36 Laurelwood Dr
Irvine CA 92620-1299

VOX: (714) 544-9727 (home) mnemonic: P1G JIG WRAP

ICBM: -117.7621, 33.7275
HTTP: http://68.5.216.23:81 (back up, but not 99.999% reliable)
PGP PUBLIC KEY: by arrangement

Send plain ASCII text not HTML lest ye be misquoted

--

Don't 'sir' me, young man, you have no idea who you're dealing with
Tommy Lee Jones, MIB



No, you're not 'tripping', that is an emu ---Hank R. Hill

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]


RE: Approximate hashes

2004-09-01 Thread Keith Ray
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Marcel Popescu
 Sent: Wednesday, September 01, 2004 9:56 AM
 To: [EMAIL PROTECTED]
 Subject: Approximate hashes
 
 I am trying to build a Windows anti-spam thingy; it's 
 supposed to sit in
 between the mail client and the outer world, and indicate through mail
 headers whether the incoming mail has a valid hashcash
 http://www.hashcash.org/ coin (and, of course, to automatically add
 hashcash to outgoing emails).
 
 My problem is that I don't know what happens with the email in transit
 (this, I believe, is an observation in the hashcash FAQ). I 
 am worried that
 some mail server might dislike ASCII characters with the high 
 bit set, or
 that a client uses some encoding which for some reason 
 doesn't make it to
 the destination unchanged.
 
 Hence my question: is there some approximate hash function 
 (which I could
 use instead of SHA-1) which can verify that a text hashes 
 very close to a
 value? So that if I change, say, tabs into spaces, I won't 
 get exactly the
 same value, but I would get a good enough?
 
 I don't know if this is possible. But if it is, I though this 
 would be a
 good place to find out about it.

nilsimsa

Computes nilsimsa codes of messages and compares the codes and finds
clusters of similar messages so as to trash spam.

What's a nilsimsa code?

A nilsimsa code is something like a hash, but unlike hashes, a small change
in the message results in a small change in the nilsimsa code.

http://lexx.shinn.net/cmeclax/nilsimsa.html

 --
Keith Ray [EMAIL PROTECTED] -- OpenPGP Key: 0x79269A12

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]


RE: Approximate hashes

2004-09-01 Thread Jerrold Leichter
| nilsimsa
| Computes nilsimsa codes of messages and compares the codes and finds
| clusters of similar messages so as to trash spam.
|
| What's a nilsimsa code?
|
| A nilsimsa code is something like a hash, but unlike hashes, a small change
| in the message results in a small change in the nilsimsa code.
|
| http://lexx.shinn.net/cmeclax/nilsimsa.html
I had a look at the code (which isn't easy to follow).  This appears to be a
new application of Bloom filters.
-- Jerry

-
The Cryptography Mailing List
Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]