My apologies. I didn't mean to steer anyone away from Beehive or spread misinformation about it; my memory simply failed me. I have not read the Honeycomb paper, but it sounds interesting and I will read it when I find time this weekend.

If the OP was trying to use replication to decrease lookup times, I still posit that parallelizing lookups could be an easy alternative. Just thinking out loud: given a bandwidth budget, I wonder where the sweet spot is between using bandwidth to replicate (and preserve) data and using bandwidth to improve lookup times.
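For the curious, the parallel-lookup idea is roughly this. A minimal sketch, not any particular system's implementation: `lookup_via` is a hypothetical per-neighbor RPC, and `alpha` is the query parallelism (the name Kademlia uses for the same knob):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def parallel_lookup(key, neighbors, lookup_via, alpha=3):
    """Issue the same lookup through `alpha` neighbors at once and return
    the first answer that comes back. This spends roughly alpha times the
    bandwidth of a single lookup, but the latency is the minimum of the
    alpha paths rather than the latency of one (possibly slow) path."""
    with ThreadPoolExecutor(max_workers=alpha) as pool:
        futures = [pool.submit(lookup_via, n, key) for n in neighbors[:alpha]]
        for fut in as_completed(futures):
            result = fut.result()
            if result is not None:  # skip neighbors that miss
                return result
    return None  # no path found the key
```

That is the bandwidth-for-latency trade I was musing about: each extra parallel query costs a full lookup's worth of traffic but only helps when it happens to beat the others.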

Anyway...
- Mike


On Jul 25, 2006, at 10:10 PM, Emin Gun Sirer wrote:

Hi Mike,

One of the nicest features of Beehive is that it enables the system
designer to specify the performance target for the system, and it then
performs just the bare minimum replication necessary to achieve that
goal.

If you set a lookup target of (log N) hops, Beehive will simply make no
additional replicas to improve performance. It will still make k'
replicas to compensate for failures of the home node. You choose k'
just as you choose "r-successors" in DHash.

If you set a performance target of, say, 0.5 hops for lookups, Beehive
will make just enough replicas such that on average, more than 50% of
lookups are satisfied at the first overlay node. Some queries will take
another hop, some will require two, and some might require up to log N,
for an average lookup path length of 0.5 hops.
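The arithmetic behind that average is just a weighted sum over hop counts. A tiny sketch (the hit fractions below are made up for illustration, not something Beehive computes this way):

```python
def average_hops(hit_fractions):
    """Average lookup path length, where hit_fractions[i] is the fraction
    of lookups satisfied after exactly i overlay hops (must sum to 1)."""
    assert abs(sum(hit_fractions) - 1.0) < 1e-9
    return sum(i * p for i, p in enumerate(hit_fractions))

# E.g. 60% resolved at the first node (0 hops), 30% after one hop,
# 10% after two hops -> an average of 0.5 hops, as in the example above.
print(average_hops([0.6, 0.3, 0.1]))
```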

If you were running, say, a P2P videoclips-on-demand service, you could
guarantee that half the clips that a user selects are already on their
home machine. The number of additional replicas to go from (log N) to,
say, 0.5 hop lookups is often quite modest. Our NSDI'04 paper has a
graph showing the bandwidth vs. lookup-time tradeoff.
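To see why the replica count can be so modest, here is a back-of-the-envelope illustration. This is not Beehive's closed-form optimization from the NSDI'04 paper; it just assumes Zipf-distributed query popularity (parameter `alpha`, a made-up but typical value) and asks what fraction of lookups you capture by replicating only the m most popular of n objects at every node:

```python
def zipf_weights(n, alpha=0.9):
    """Zipf popularity: the object ranked k gets queried in proportion
    to 1/k^alpha. Returns normalized per-object query probabilities."""
    w = [1.0 / (k ** alpha) for k in range(1, n + 1)]
    total = sum(w)
    return [x / total for x in w]

def hit_rate_if_top_m_replicated(n, m, alpha=0.9):
    """Fraction of lookups answered locally if only the m most popular
    of n objects are replicated at every node."""
    return sum(zipf_weights(n, alpha)[:m])
```

In this toy model with n = 10,000 objects and alpha = 0.9, replicating only the top 5% of objects already answers more than half of all lookups locally; the heavy head of the popularity distribution is what makes sub-1-hop averages cheap.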

It would be incorrect to claim that Beehive is aggressive in its
replication. It gives you a pretty convenient knob with which you can
adjust the performance goals of the system. It does just enough
replication to achieve the desired goal. By "just enough," we mean
"absolutely optimal" if your app is amenable to the original Beehive
approach (NSDI'04), or "near-optimal" if you are using the more general
Honeycomb technique (NSDI'06).

We built a few systems based on Beehive/Honeycomb:
        CoDoNS (SIGCOMM'04), a safety net and replacement for DNS,
        Corona (NSDI'06), a next-gen RSS replacement,
        and CobWeb, an Akamai-like web cache that fields >10 million
        hits/day.
So it's a practical algorithm, and not very hard to layer on top of
Pastry, Chord, Kademlia, Koorde, Viceroy, SkipNet, and many other
structured overlays. Feel free to contact us (beehive AT systems DOT cs
DOT cornell DOT edu) if you have any questions about the system.

Best,
Gun,
Venugopalan "Rama" Ramasubramanian,
Ryan Peterson,
Yee Jiun Song.

On Tue, 2006-07-25 at 20:35 -0700, Michael Parker wrote:
Although it's been a while since I've read the paper, IIRC, Beehive's
replication policy is very aggressive. If your goal is to replicate so
that the data remains on the network under the harshest levels of
churn imaginable, then Beehive would be a good idea. But if you're not
expecting that sort of workload, and your primary goal for replication
is to make retrieving the data quicker, it might be better to use a
more "modest" but proven replication strategy, such as DHash, and then
look into speeding up lookups using parallelism. (See Kademlia,
EpiChord, or Accordion.)


- Mike



On Jul 25, 2006, at 6:44 AM, Bob Harris wrote:

Seems like replication is a constant topic around here. Beehive is the
state of the art in replication. They can turn an O(log N) overlay like
Chord or Pastry into an O(1) overlay purely via replication.

Bob.


_______________________________________________
p2p-hackers mailing list
[email protected]
http://lists.zooko.com/mailman/listinfo/p2p-hackers
