First, just review the birthday problem. Collision free set of k
uniformly selected objects from a total set of N occurs with
probability:
(1)(1-1/N)(1-2/N)...(1-(k-1)/N)
If k << N, you can approximate the above by taking the ln, and noting
that ln(1-x) ~ -x for small x:
(1)(1-1/N)(1-2/N)...(1-(k-1)/N) ~ exp(-k(k-1)/(2N))
If you generate keys are rate R we can ask when you expect to have a collision:
k = Rt, and solve for exp(k(k-1)/2N) ~ 1/2. Just to get the order,
you get the square-root result: t = (1/R) \sqrt{N}, or the maximum
rate: R = (1/t)\sqrt{N}
In your second case, if we assume one timestamp per time P (or P units
of time per timestamp), a maximum time T, and smaller set of names: n.
We have a maximum number of timestamps as T/P. To compare, total
names are N=nT/P. A collision happens if you choose the same name in
the set n, at the same timestamp. If we assume key generation rate R
per time, then you have RP keys per timestamp. So, use the above
formula with k = RP and maximum number of names n = NP/T:
approximately: (RP)^2 T/PN = 1, so T = N/(PR^2). The maximum key rate
this can support: R = \sqrt{N/PT}. You can recover the original
result if have one big timestamp: P = T, then R = (1/T) \sqrt{N}. To
support a bigger R, we can control the parameter P. Clearly P <= T,
so the timestamp approach is always better when compared this way
(because \sqrt{N/PT} >= \sqrt{N/T^2} = (1/T)\sqrt{N}. So, choose P as
small as is practical: so, the unit of time for the least precise
timer you want to invoke. I guess you want to make sure T is
ridiculously large to avoid Y2K issues.
So, seems like a good idea. I would add one additional issue however:
timestamps increase monotonically. You might want to use some
one-to-one mapping for the bits of the timestamp that destroys the
monotonicity property before using them as parts of keys in a
hashtable.
I guess this is well known, or I made a mistake in the analysis.
Best,
On Fri, Apr 24, 2009 at 2:54 AM, Russ Weeks <[email protected]> wrote:
> Hi, All.
> I'm playing around with building some data structures backed by a
> conventional DHT like Chord or Kademlia. It seems like many of the elements
> in these structures don't need a semantically useful key. Take a linked
> list, for instance. I need to be able to do a "get" on the head or tail of
> the list, but the keys of the interior elements don't need to be exposed to
> the caller. This suggests that I can use random data for these keys. My
> question is, what's the best way to generate these keys.
> The obvious approach is just to fill the key (say 128 bits) with random
> data. I think I understand the birthday problem and how to estimate the
> probability that a collision will occur.
> An alternative approach that was proposed to me was to use 64 bits of random
> data and a 64-bit timestamp. Here, we hope that the random number generator
> is not seeded with the system time. The theory is, the odds of generating 2
> identical 64-bit numbers in the same nanosecond is less than the odds of
> generating 2 128-bit numbers over the lifetime of the system. If this is
> true, it's not clear to me.
> Can anybody suggest a way to quantify the likelyhood of collision in the
> second approach?
> Thanks,
> Russ
> _______________________________________________
> p2p-hackers mailing list
> [email protected]
> http://lists.zooko.com/mailman/listinfo/p2p-hackers
>
>
--
P. Oscar Boykin http://boykin.acis.ufl.edu
Assistant Professor, Department of Electrical and Computer Engineering
University of Florida
_______________________________________________
p2p-hackers mailing list
[email protected]
http://lists.zooko.com/mailman/listinfo/p2p-hackers