Re: Approximate hashes
On Wed, 1 Sep 2004, Marcel Popescu wrote: Hence my question: is there some approximate hash function (which I could use instead of SHA-1) which can verify that a text hashes very close to a value? So that if I change, say, tabs into spaces, I won't get exactly the same value, but I would get a good enough? Hi Marcel, You may wish to look at Cmeclax's nilsimsa. It has been used to detect slightly-modified message floods in anonymous remailer systems, and was also used in Spamassassin at some point. http://lexx.shinn.net/cmeclax/nilsimsa.html - The Cryptography Mailing List Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]
Re: Approximate hashes
From: Marcel Popescu [EMAIL PROTECTED] Hence my question: is there some approximate hash function (which I could use instead of SHA-1) which can verify that a text hashes very close to a value? So that if I change, say, tabs into spaces, I won't get exactly the same value, but I would get a good enough? I just had an idea. Would this work? - let S be the input string, whose hash I want to verify - make S uppercase - remove everything but A-Z, 0-9, and common punctuation (!;:',.?) - calculate the SHA1 hash of the result This should keep any insignificant changes out of the final result. Does anyone know of a mail transformation which could upset it? Can anyone see a way to attack this by letting a significantly different message collide on the same hash? (I'm ignoring the recent discoveries - they're not that practical, I'm only trying to fight spam, not the government.) Thanks, Marcel - The Cryptography Mailing List Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]
Re: Approximate hashes
On Wed, 1 Sep 2004, Marcel Popescu wrote: My problem is that I don't know what happens with the email in transit some mail server might dislike ASCII characters with the high bit set, or Hence my question: is there some approximate hash function (which I could PGP has this issue with 'clear-signed' output. An excerpt from the pgp man page: If CLEARSIG is enabled, then when signing and ASCII-armoring a text file, PGP uses a different format that includes the plaintext in human-readable form. Lines beginning with - are quoted with - . To cope with some of the stupider mailers in the world, lines beginning with From are also quoted, and trailing whitespace on lines is stripped. PGP will remove the quoting if you use it to decrypt the message, but the trailing whitespace is not recovered. This is still useful enough to be enabled by default. You might find more details about typical mailer-munging from the PGP docs or source. --scott GRALLSPICE Moscow MKSEARCH affinity group KUDESK Nazi operative Clinton Register to vote! http://www.yourvotematters.org/VerifiedVoting ( http://cscott.net/ ) - The Cryptography Mailing List Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]
Re: Approximate hashes
From: Hal Finney [EMAIL PROTECTED] As you are probably aware, existing hashcash implementations do not base the stamp on the message content. Instead they only lock the stamp to the receiver's email address. Then the receiver keeps a list of the hashcash stamps he has seen recently, to prevent reuse. I'm not sure what you hope to gain by locking the stamp to the message content. Me dumb, sorry. I was actually thinking of some other thing I wanted to do about spam (which involved the whole content), and haven't re-read the hashcash site in about a week. You are perfectly right, of course. Windows spam victims, here I come! g Thanks, Marcel - The Cryptography Mailing List Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]
Re: Approximate hashes
At 06:02 PM 9/1/04 +0300, Marcel Popescu wrote: From: Marcel Popescu [EMAIL PROTECTED] Hence my question: is there some approximate hash function (which I could use instead of SHA-1) which can verify that a text hashes very close to a value? So that if I change, say, tabs into spaces, I won't get exactly the same value, but I would get a good enough? This is completely what secure hashes are supposed to prevent. *Any* change in the input will flip half the hash-bits on average. Just like a block cipher. There is no input nearness preservation. That's part of the point of the algorithm. I just had an idea. Would this work? - let S be the input string, whose hash I want to verify - make S uppercase - remove everything but A-Z, 0-9, and common punctuation (!;:',.?) - calculate the SHA1 hash of the result This should keep any insignificant changes out of the final result. You can encode your message in some format which is not subject to mangling, and use a hash of that encoding. You can then decode the encoding back into unicode or whatever funky but net-fragile character set you like. This is somewhat like ascii-armoring of PGP-encrypted messages. = 36 Laurelwood Dr Irvine CA 92620-1299 VOX: (714) 544-9727 (home) mnemonic: P1G JIG WRAP ICBM: -117.7621, 33.7275 HTTP: http://68.5.216.23:81 (back up, but not 99.999% reliable) PGP PUBLIC KEY: by arrangement Send plain ASCII text not HTML lest ye be misquoted -- Don't 'sir' me, young man, you have no idea who you're dealing with Tommy Lee Jones, MIB No, you're not 'tripping', that is an emu ---Hank R. Hill - The Cryptography Mailing List Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]
RE: Approximate hashes
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marcel Popescu Sent: Wednesday, September 01, 2004 9:56 AM To: [EMAIL PROTECTED] Subject: Approximate hashes I am trying to build a Windows anti-spam thingy; it's supposed to sit in between the mail client and the outer world, and indicate through mail headers whether the incoming mail has a valid hashcash http://www.hashcash.org/ coin (and, of course, to automatically add hashcash to outgoing emails). My problem is that I don't know what happens with the email in transit (this, I believe, is an observation in the hashcash FAQ). I am worried that some mail server might dislike ASCII characters with the high bit set, or that a client uses some encoding which for some reason doesn't make it to the destination unchanged. Hence my question: is there some approximate hash function (which I could use instead of SHA-1) which can verify that a text hashes very close to a value? So that if I change, say, tabs into spaces, I won't get exactly the same value, but I would get a good enough? I don't know if this is possible. But if it is, I though this would be a good place to find out about it. nilsimsa Computes nilsimsa codes of messages and compares the codes and finds clusters of similar messages so as to trash spam. What's a nilsimsa code? A nilsimsa code is something like a hash, but unlike hashes, a small change in the message results in a small change in the nilsimsa code. http://lexx.shinn.net/cmeclax/nilsimsa.html -- Keith Ray [EMAIL PROTECTED] -- OpenPGP Key: 0x79269A12 - The Cryptography Mailing List Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]
RE: Approximate hashes
| nilsimsa | Computes nilsimsa codes of messages and compares the codes and finds | clusters of similar messages so as to trash spam. | | What's a nilsimsa code? | | A nilsimsa code is something like a hash, but unlike hashes, a small change | in the message results in a small change in the nilsimsa code. | | http://lexx.shinn.net/cmeclax/nilsimsa.html I had a look at the code (which isn't easy to follow). This appears to be a new application of Bloom filters. -- Jerry - The Cryptography Mailing List Unsubscribe by sending unsubscribe cryptography to [EMAIL PROTECTED]