On Apr 28, 2008, at 12:31 PM, Jack Lloyd wrote:

> On Mon, Apr 28, 2008 at 08:33:08AM -0600, zooko wrote:
>>> AFAIK some DHTs use order preserving hash functions, but I do not  
>>> know
>>> which ones.
>>
>> I've never heard of this and would be interested to learn more.
>>
>> It's important to distinguish between things which go under the
>> rubric of "hash functions".  All DHTs that I am familiar with use
>> *cryptographic* hash functions, which are different beasts from the
>> kind of hash function that you use in data structures.
>>
>> There is no such thing as an order-preserving cryptographic hash
>> function, by definition (preserving order would be a violation of the
>> intended security properties of a cryptographic hash function).
>
> It feel like you could get some of the properties of a cryptographic
> hash function along with order preservation if you were willing to
> tolerate a hash that was larger than the cryptographic security level.

First "problem" here is that you have lost the "crypto" properties of  
the hash and security has dropped to the level of whatever OPHF()  
provides.  The reality is that this is not a big deal.  Most DHT  
implementors use cryptographic hash functions not because there is any  
real need for the "cryptographic" properties of the hash but instead  
because the crypto community is the only one producing fast, big/ 
collision-resistant hashes; almost all of the other hash functions  
available in the standard toolkit are too small (e.g. Jenkins, Hseigh,  
et al top out at either 32 or 64 bits.)   Very, _very_ few DHTs really  
need protection from malicious collision attacks and every person  
building a DHT who does not reach for MD4 first (or Tiger if you know  
all of your nodes will be on 64bit platforms) should be poked with a  
stick until they justify the choice :)

To get back to the original issue, I would suggest reading "On Routing  
in Distributed Hash Tables" by Klemm at al 
(http://lsirpeople.epfl.ch/klemm/On%20Routing%20in%20Distributed%20Hash%20Tables.pdf
 
) for some thoughts on how to de-couple the peer routing from the  
identifier space, and turn your attention to the identifer space.  The  
only real option here is a linear hash unless you know all of the  
possible keys in advance (in which case you can aim for a order- 
preserving minimal perfect hash) so hunt up info on linear hash  
algorithms, distributed linear hash functions, and dynamic hash  
functions and go from there.  I would also advise a bit of caution  
here, as it is quite likely that you will end up creating a system  
where data reads are a _lot_ easier to manage than data writes (this  
sort of hash function almost always assumes a single processor or some  
sort of oracle that can maintain locks over bucket splits -- arbitrary  
insertion from any particular node that most collision-resistant hash  
functions offer may turn out to be impossible in such a system.)

A cursory google search turned up a mailing list message from someone  
using an order-preserving hash function in a variant of the PAST DHT,  
so the other smart move would be to track this person down and get  
some pointers :)

jim

_______________________________________________
p2p-hackers mailing list
[email protected]
http://lists.zooko.com/mailman/listinfo/p2p-hackers

Reply via email to