That was very well put. DHTs might seem fancy, but they were not invented
just to give us geeks something to tinker with. They offer real
benefits, and you need to read the papers to understand
why they are so swell. I think the implementation complexity of a DHT
(if you can call it that) can be hidden if the higher-level user
interface is simple and intuitive. Laymen do not really need to know
how the underlying DHT works. They just need to know how to use the
most basic operations to do their stuff. (You don't need to know how a
TV really works. All you need is to know how to handle the remote. :))
Regards,
Ratul Mukhopadhyay
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Alen Peacock
Sent: Wednesday, June 13, 2007 9:57 AM
To: theory and practice of decentralized computer networks;
[EMAIL PROTECTED]
Subject: Re: [p2p-hackers] announcing Allmydata-Tahoe v0.3
On 6/11/07, John Bäckstrand <[EMAIL PROTECTED]> wrote:
> This is very interesting. I have been looking for software that does
> this (the friends-backup use-case, that is) for a long time, but I
> never found anything that did what I wanted it to.
> It is still far from a perfect match though:
Sounds like you are looking for something more like CrashPlan
(http://www.crashplan.com), perhaps?
> ... I guess it's possible to set up a
> private cloud though. I was imagining a very simple system where I
> specify for each file stored how much availability I want it to have,
> minimum, and then just store it on that number of nodes, no fancy FEC
> nor DHT at all. A good question of course is what happens when nodes
> go offline, but not a huge problem if you are actually using this
> together with a set of close friends.
Theoretically, the simple replication you are talking about is just
FEC with an outrageous expansion factor and/or lowered reliability
(http://oceanstore.cs.berkeley.edu/publications/papers/pdf/erasure_iptps.pdf).
I believe Tahoe uses an expansion factor of 4x (correct me if I'm
wrong, zooko). Suppose you have 8 friends who are willing to back up
your files. For the same amount of space and bandwidth, you could
either use FEC and store bits of your file+parity on all 8 nodes, or
you could choose 4 nodes and store a complete copy on each. In the
latter case, even when half of your 8 friends are online, you may not
be able to retrieve your file (if they are the wrong 4 friends). In
the former, you'll be able to recover your file even if only 2 (any
two!) of those friends are online. If you are worried about
reliability and performance, the FEC route chosen by Tahoe seems
clearly better.
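To put rough numbers on the 8-friend comparison above, here is a minimal Python sketch. It assumes every friend's node is online independently with the same probability p, which is a simplification (correlated outages would break it), and it models the FEC case as an idealized 2-of-8 code, matching the 4x expansion guessed at above rather than any confirmed Tahoe setting. The function names are made up for illustration.

```python
from math import comb


def k_of_n_availability(k: int, n: int, p: float) -> float:
    """Probability that at least k of n nodes are online,
    assuming each node is online independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def replication_availability(copies: int, p: float) -> float:
    """Full copies on `copies` nodes: one online holder is enough."""
    return k_of_n_availability(1, copies, p)


if __name__ == "__main__":
    # Both schemes cost 4x the file size in total storage:
    #   replication: 4 full copies on 4 of the 8 friends
    #   FEC:         8 shares of half the file size, any 2 recover the file
    for p in (0.5, 0.7, 0.9):
        rep = replication_availability(4, p)
        fec = k_of_n_availability(2, 8, p)
        print(f"p={p}: replication={rep:.6f}  fec(2-of-8)={fec:.6f}")
```

With p = 0.5 this gives 0.9375 for the four full copies and about 0.965 for the 2-of-8 code, and the gap in failure probability widens as p grows.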
> ... I only care about having a few (2-10) mostly-trusted nodes,
> and not a whole lot about a DHT with the entire world, which seems
> to be the point here: I feel both reliability and, foremost, performance
> will be much better in a smaller set of nodes with better
> connectivity.
When you say "mostly-trusted nodes," what does that mean? Do the
nodes have to belong to individuals who you personally know? What if
you could find reliable nodes that are controlled by strangers, and
make them part of the set of nodes that you perform backup to? Could
that really be any worse? I mean, my best friend's internet
connection might be flaky, my mom's computer might be susceptible to
viruses, my computer at work might be squirreled away behind a
firewall, my brother might be prone to turn his computer off in the
evenings, etc. Is it really any better to trust those computers than
it would be to find computers controlled by strangers who have
*demonstrably* reliable operation, and then harness enough of these so
that you are virtually guaranteed to be able to recover your data?
The only way to determine reliability is to measure it directly. In
flŭd backup (http://www.flud.org), each node uses a localized trust
metric to determine reliability, and learns to prefer demonstrably
reliable nodes over time
(http://www.flud.org/wiki/index.php/LocalizedTrust). Additionally,
flŭd treats storage resources as a type of currency, creating an
economic incentive for fairness and symmetry
(http://www.flud.org/wiki/index.php/Architecture#Storage_Layer). I
believe that Tahoe uses some of these same techniques, but since I am
not intimately familiar, I'll let the Tahoe peeps address that.
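As a rough illustration of "measure it directly and learn to prefer reliable nodes," here is a small Python sketch. It is not flŭd's actual trust metric: the class name, the exponentially weighted moving average, the neutral starting score of 0.5, and the idea of recording the outcome of each storage or retrieval attempt are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LocalTrust:
    """Hypothetical per-node reliability tracker (not flŭd's real algorithm)."""

    alpha: float = 0.1  # weight given to the newest observation
    scores: dict[str, float] = field(default_factory=dict)

    def record(self, peer_id: str, success: bool) -> None:
        """Fold one observed outcome into the peer's running score."""
        old = self.scores.get(peer_id, 0.5)  # unknown peers start neutral
        observed = 1.0 if success else 0.0
        self.scores[peer_id] = (1 - self.alpha) * old + self.alpha * observed

    def preferred(self, count: int) -> list[str]:
        """Return up to `count` peers, most reliable first."""
        ranked = sorted(self.scores, key=self.scores.get, reverse=True)
        return ranked[:count]


if __name__ == "__main__":
    trust = LocalTrust()
    for day in range(30):
        trust.record("reliable-stranger", True)
        trust.record("brother", day % 2 == 0)  # machine is off every other day
    print(trust.preferred(1))
```

The point of keeping the scores local is that each node only trusts what it has observed itself, so a stranger's node earns preference the same way a friend's does.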
There's one more minus to using computers from people that you know:
they often exhibit poor geographic diversity. It's a tired example, I
know, but if you happened to live on the Gulf Coast in 2005, and were
backing up mostly to other computers in the New Orleans region, then
chances are that even an aggressive FEC scheme might not have helped
you...
Alen