On Apr 28, 2008, at 12:31 PM, Jack Lloyd wrote: > On Mon, Apr 28, 2008 at 08:33:08AM -0600, zooko wrote: >>> AFAIK some DHTs use order preserving hash functions, but I do not >>> know >>> which ones. >> >> I've never heard of this and would be interested to learn more. >> >> It's important to distinguish between things which go under the >> rubric of "hash functions". All DHTs that I am familiar with use >> *cryptographic* hash functions, which are different beasts from the >> kind of hash function that you use in data structures. >> >> There is no such thing as an order-preserving cryptographic hash >> function, by definition (preserving order would be a violation of the >> intended security properties of a cryptographic hash function). > > It feel like you could get some of the properties of a cryptographic > hash function along with order preservation if you were willing to > tolerate a hash that was larger than the cryptographic security level.
First "problem" here is that you have lost the "crypto" properties of the hash and security has dropped to the level of whatever OPHF() provides. The reality is that this is not a big deal. Most DHT implementors use cryptographic hash functions not because there is any real need for the "cryptographic" properties of the hash but instead because the crypto community is the only one producing fast, big/ collision-resistant hashes; almost all of the other hash functions available in the standard toolkit are too small (e.g. Jenkins, Hseigh, et al top out at either 32 or 64 bits.) Very, _very_ few DHTs really need protection from malicious collision attacks and every person building a DHT who does not reach for MD4 first (or Tiger if you know all of your nodes will be on 64bit platforms) should be poked with a stick until they justify the choice :) To get back to the original issue, I would suggest reading "On Routing in Distributed Hash Tables" by Klemm at al (http://lsirpeople.epfl.ch/klemm/On%20Routing%20in%20Distributed%20Hash%20Tables.pdf ) for some thoughts on how to de-couple the peer routing from the identifier space, and turn your attention to the identifer space. The only real option here is a linear hash unless you know all of the possible keys in advance (in which case you can aim for a order- preserving minimal perfect hash) so hunt up info on linear hash algorithms, distributed linear hash functions, and dynamic hash functions and go from there. I would also advise a bit of caution here, as it is quite likely that you will end up creating a system where data reads are a _lot_ easier to manage than data writes (this sort of hash function almost always assumes a single processor or some sort of oracle that can maintain locks over bucket splits -- arbitrary insertion from any particular node that most collision-resistant hash functions offer may turn out to be impossible in such a system.) A cursory google search turned up a mailing list message from someone using an order-preserving hash function in a variant of the PAST DHT, so the other smart move would be to track this person down and get some pointers :) jim _______________________________________________ p2p-hackers mailing list [email protected] http://lists.zooko.com/mailman/listinfo/p2p-hackers
