N.B. there are four different arguments about issue #302 in this letter; don't mix them up.
On Monday, 2009-10-19, at 13:34, Brian Warner wrote:

> .. and we'd suffer from the "lumpy distribution" problems that are
> discussed in ticket #302, where servers get unequal load depending
> upon where they sit in the ring, and where servers who become full
> can dump inordinate amounts of traffic on the poor node just
> clockwise from them.

... [reordering quotes from your message]

> I still believe that permuted-ring gives us better overall
> behavior. I'm willing to be proved wrong, though :).

1. I wrote a simulation which convinced me that this is wrong -- that
both share placement algorithms have an indistinguishable (and highly
varying) pattern of servers filling up. (A sketch of the two placement
strategies, and a toy version of this kind of simulation, follows
argument 4 below.) However, the results that I posted to tahoe-dev
were confusing and hard to follow, and you seem to have ignored them.
I see that I didn't link them in from #302 either. I should go find
that letter in the tahoe-dev archives and link to it from #302. Here
it is:
http://allmydata.org/pipermail/tahoe-dev/2008-July/000676.html

> And, an attacker who took out several neighboring-in-id-space
> servers would kill or seriously damage several files (if they took
> out N-k+1 consecutive servers in a non-trivially utilized grid,
> they'd be guaranteed to kill some files).

[snip more interesting consequences of placement strategy on an
attacker who attacks many servers]

2. Neat! I hadn't thought of this malicious case before. Perhaps you
could add a link from #302 to your letter about the malicious case.

3. We were already familiar with designs like Chord (and the Chord
File System) and Kademlia when you (Brian) came up with the "permute
per fileid" trick. It should be seen as an (arguable) improvement on
those designs. However, Chord and Kademlia have been deployed with
success, sometimes on a massive scale -- e.g. Cassandra DB [1] and
Vuze [2] -- where load-balancing is also an issue. This suggests
either that this phenomenon isn't a problem in many situations in
practice (which would be consistent with my simulation -- argument 1),
or that the designers of Cassandra DB and Vuze ought to think about
adopting the permute-per-fileid trick (or both).

In fact, Cassandra's unique appeal among "post-relational"
(a.k.a. "nosql") databases is that it supports range queries, and the
way it does so relies upon the "natural" Chord ordering. If you're
familiar with Chord, you can think of Cassandra as being a lot like
Chord plus the added feature of *not* running the keys through a
secure hash to load-balance them. This is an "improvement" on Chord
in the opposite direction of our improvement! :-) It makes the
load-balancing properties much *worse* than standard Chord
load-balancing. If anyone actually uses Cassandra in this mode, that
demonstrates that the sort of "balancing tools" discussed in
e.g. [3] are usable to some people.

4. This thread started because Shawn Willden needed to do some mucking
about with his shares, and the permute-per-fileid feature makes it
harder for him to muck with his shares. This is a real live example of
my argument (in e.g.
http://allmydata.org/pipermail/tahoe-dev/2008-July/000672.html ) that
the simpler placement strategy can help people administer their
Tahoe-LAFS deployments. I need to link to this thread from #302 and
claim that this is an example.

Of all these four arguments, I think argument 4 is the most important.
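For concreteness, here is a minimal sketch in Python of the two
placement strategies we keep arguing about, plus a toy version of the
kind of fill-up comparison I described in argument 1. The function
names and the bytes-as-identifiers model are mine, for illustration
only; Tahoe-LAFS's real code differs in many details.

    # A minimal sketch (for illustration -- not Tahoe-LAFS's actual
    # code) of the two placement strategies, plus a toy version of the
    # kind of simulation described in argument 1.

    import hashlib
    import os

    def permuted_ring(server_ids, storage_index):
        # The permute-per-fileid trick: order the servers by
        # hash(storage_index + server_id), so every file sees its own,
        # effectively random, ring of servers.
        return sorted(server_ids,
                      key=lambda sid: hashlib.sha1(storage_index + sid).digest())

    def linear_ring(server_ids, storage_index):
        # The simpler strategy from #302: one fixed ring of servers
        # (ordered by server id), starting at the point where the
        # storage index falls and wrapping around.
        ring = sorted(server_ids)
        start = next((i for i, sid in enumerate(ring) if sid >= storage_index), 0)
        return ring[start:] + ring[:start]

    def simulate(place, num_servers=10, capacity=100, shares_per_file=4):
        # Upload files until every server is full; each file puts one
        # share on each of the first shares_per_file servers (in that
        # file's ordering) that still have room.  Returns the upload
        # count at which each server became full.
        servers = [os.urandom(20) for _ in range(num_servers)]
        remaining = dict.fromkeys(servers, capacity)
        fill_times = []
        uploads = 0
        while len(fill_times) < num_servers:
            uploads += 1
            si = os.urandom(20)
            placed = 0
            for sid in place(servers, si):
                if remaining[sid] > 0:
                    remaining[sid] -= 1
                    placed += 1
                    if remaining[sid] == 0:
                        fill_times.append(uploads)
                    if placed == shares_per_file:
                        break
        return fill_times

    print("permuted:", simulate(permuted_ring))
    print("linear:  ", simulate(linear_ring))

This toy version is not the simulation I ran in 2008 (that one is in
the tahoe-dev letter linked under argument 1), but it shows the shape
of the comparison: how servers fill up under each ordering, and why a
file's shares land on consecutive servers of the one fixed ring in the
linear case (which is what makes the "dump traffic on your clockwise
neighbor" behavior and the N-k+1-consecutive-servers attack possible).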
I think the next steps here are to document the arguments better on
ticket #302 and also to create a new, separate ticket which is all
about making per-file permutation optional (leaving #302 as the
repository of arguments about which is better in what situations,
whether the default should be changed, etc.).

Regards,

Zooko

tickets mentioned in this email:

http://allmydata.org/trac/tahoe/ticket/302 # stop permuting peerlist,
use SI as offset into ring instead?

[1] http://wiki.apache.org/cassandra/PoweredBy # says Cassandra is
used for inbox search at Facebook, which is up to 40 TB of data
across 120 machines in two separate data centers

[2] http://vanish.cs.washington.edu/pubs/usenixsec09-geambasu.pdf #
says that Vuze has a million nodes at a time, spanning the globe

[3] http://allmydata.org/trac/tahoe/ticket/302#comment:7
