On 6/11/07, John Bäckstrand <[EMAIL PROTECTED]> wrote:
This is very interesting. I have been looking for software that does this (the friends-backup use-case, that is) for a long time, but I never found anything that did what I wanted it to. It is still far from a perfect match though:
Sounds like you are looking for something more like CrashPlan (http://www.crashplan.com), perhaps?
... I guess it's possible to set up a private cloud, though. I was imagining a very simple system where I specify, for each file stored, the minimum availability I want it to have, and then just store it on that number of nodes, with no fancy FEC or DHT at all. A good question, of course, is what happens when nodes go offline, but that's not a huge problem if you are actually using this together with a set of close friends.
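Here is a minimal sketch of that per-file rule. It translates an availability target into a number of full copies. It assumes, hypothetically, that each friend's machine is online independently with some measured probability p, which real friends' machines often are not. The name replicas_needed is made up for illustration.

```python
def replicas_needed(target: float, node_availability: float) -> int:
    """Smallest number of full copies n such that
    P(at least one copy is reachable) >= target.

    Assumes (hypothetically) that every node is online independently with
    probability node_availability; correlated outages break this model.
    """
    if not 0.0 < node_availability < 1.0:
        raise ValueError("node_availability must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    n = 1
    # P(all n copies offline) = (1 - p) ** n, so keep adding copies
    # until 1 - (1 - p) ** n reaches the target.
    while 1.0 - (1.0 - node_availability) ** n < target:
        n += 1
    return n


if __name__ == "__main__":
    # Friends whose machines are up about half the time:
    print(replicas_needed(0.95, 0.5))   # 5
    print(replicas_needed(0.999, 0.5))  # 10
```

With p = 0.5, that is five full copies for a 95% target and ten for 99.9%.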
Theoretically, the simple replication you are talking about is just FEC with an outrageous expansion factor and/or lowered reliability (http://oceanstore.cs.berkeley.edu/publications/papers/pdf/erasure_iptps.pdf). I believe Tahoe uses an expansion factor of 4x (correct me if I'm wrong, zooko). Suppose you have 8 friends who are willing to back up your files. For the same amount of space and bandwidth, you could either use FEC and store fragments of your file plus parity on all 8 nodes (each share half the file's size, so 4x in total), or you could choose 4 nodes and store a complete copy on each. In the latter case, even when half of those friends are online, you may not be able to retrieve your file (if they are the wrong 4 friends). In the former, you'll be able to recover your file even if only 2 (any two!) of those friends are online. If you are worried about reliability and performance, the FEC route chosen by Tahoe seems clearly better.
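The arithmetic behind this comparison can be checked with a short sketch. It assumes, for illustration only, that each of the 8 friends is online independently with probability p. The parameters (4 full copies vs. 8 shares of which any 2 suffice, both 4x expansion) come from the example above, not from Tahoe's actual configuration. The helper names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations
from math import comb


def p_replication(p: float, copies: int) -> float:
    """P(at least one of `copies` full replicas is online)."""
    return 1 - (1 - p) ** copies


def p_erasure(p: float, n: int, k: int) -> float:
    """P(at least k of n shares are online): the binomial tail, i.e. the
    chance the file is recoverable when any k shares suffice."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def unlucky_online_sets(nodes: int, holders: set[int], online: int) -> int:
    """Count the ways exactly `online` of `nodes` friends can be up while
    none of them holds a full replica."""
    return sum(
        1 for up in combinations(range(nodes), online) if holders.isdisjoint(up)
    )


if __name__ == "__main__":
    # Same 4x expansion either way: 4 full copies, or 8 half-size shares
    # of which any 2 reconstruct the file.
    p = 0.5
    print(p_replication(p, copies=4))  # 0.9375
    print(p_erasure(p, n=8, k=2))      # 0.96484375
    # With exactly 4 of 8 friends online, replication on friends 0-3 fails
    # in 1 of the comb(8, 4) == 70 possible online sets; FEC never fails,
    # since any 4 online shares exceed k = 2.
    print(unlucky_online_sets(8, {0, 1, 2, 3}, 4), comb(8, 4))  # 1 70
```

The gap widens in FEC's favor as per-node availability rises, because the erasure-coded file fails only when 7 or more of the 8 shares are unreachable at once.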
... I only care about having a few (2-10) mostly-trusted nodes, and not much about a DHT spanning the entire world, which seems to be the point here: I feel both reliability and, above all, performance will be much better with a smaller set of well-connected nodes.
When you say "mostly-trusted nodes," what does that mean? Do the nodes have to belong to individuals who you personally know? What if you could find reliable nodes that are controlled by strangers, and add them to the set of nodes you back up to? Could that really be any worse? I mean, my best friend's internet connection might be flaky, my mom's computer might be susceptible to viruses, my computer at work might be squirreled away behind a firewall, my brother might tend to turn his computer off in the evenings, etc. Is it really any better to trust those computers than it would be to find computers controlled by strangers who have *demonstrably* reliable operation, and then harness enough of these so that you are virtually guaranteed to be able to recover your data?

The only way to determine reliability is to measure it directly. In flŭd backup (http://www.flud.org), each node uses a localized trust metric to determine reliability, and learns to prefer demonstrably reliable nodes over time (http://www.flud.org/wiki/index.php/LocalizedTrust). Additionally, flŭd treats storage resources as a type of currency, creating an economic incentive for fairness and symmetry (http://www.flud.org/wiki/index.php/Architecture#Storage_Layer). I believe that Tahoe uses some of these same techniques, but since I am not intimately familiar with them, I'll let the Tahoe peeps address that.

There's one more drawback to using computers from people you know: they often exhibit poor geographic diversity. It's a tired example, I know, but if you happened to live on the Gulf Coast in 2005, and were backing up mostly to other computers in the New Orleans region, chances are that even an aggressive FEC scheme would not have helped you...

Alen
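The following hypothetical sketch illustrates measuring reliability directly and spreading copies across regions. Each peer's record counts successful and failed storage checks. Reliability is estimated as a Laplace-smoothed success rate, and peers are picked greedily by that score with a cap on how many come from any one region. This is not flŭd's LocalizedTrust algorithm (the wiki link above describes that). PeerRecord, choose_peers, the region labels, the probe counts, and the scoring rule are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PeerRecord:
    """Hypothetical per-peer history of storage checks."""
    region: str
    successes: int = 0
    failures: int = 0

    def record(self, ok: bool) -> None:
        """Log the outcome of one check (e.g. a retrieval probe)."""
        if ok:
            self.successes += 1
        else:
            self.failures += 1

    def reliability(self) -> float:
        # Laplace-smoothed success rate: an unobserved peer starts at 0.5.
        return (self.successes + 1) / (self.successes + self.failures + 2)


def choose_peers(peers: dict[str, PeerRecord], k: int, max_per_region: int) -> list[str]:
    """Greedily pick up to k peers by estimated reliability, taking at most
    max_per_region peers from any single region."""
    ranked = sorted(peers, key=lambda name: peers[name].reliability(), reverse=True)
    chosen: list[str] = []
    per_region: dict[str, int] = {}
    for name in ranked:
        region = peers[name].region
        if per_region.get(region, 0) >= max_per_region:
            continue  # skip: this region already holds enough copies
        chosen.append(name)
        per_region[region] = per_region.get(region, 0) + 1
        if len(chosen) == k:
            break
    return chosen


if __name__ == "__main__":
    # Made-up probe histories, purely for illustration.
    peers = {
        "mom": PeerRecord("New Orleans", successes=18, failures=2),
        "best-friend": PeerRecord("New Orleans", successes=19, failures=1),
        "brother": PeerRecord("New Orleans", successes=17, failures=3),
        "stranger-a": PeerRecord("Chicago", successes=16, failures=4),
        "stranger-b": PeerRecord("Seattle", successes=10, failures=10),
    }
    # Without a regional cap, all three copies would land in New Orleans.
    print(choose_peers(peers, k=3, max_per_region=1))
    # ['best-friend', 'stranger-a', 'stranger-b']
```

The cap trades a little measured reliability for independence from correlated regional failures, which the independent-node assumption behind simple availability math ignores.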
_______________________________________________ p2p-hackers mailing list [email protected] http://lists.zooko.com/mailman/listinfo/p2p-hackers
