>
> Even with 40 machines, you shouldn't run into any issues with scalability
> from a performance perspective. The commercial allmydata site had far more
> than that.
>
> One thing you'll want to think carefully about is your choice of encoding
> parameters M and K. What you want to make sure won't happen is for M-K
> servers to be down simultaneously, because if you have enough files, the
> loss of any M-K servers will make some of those files unavailable. If you
> have deep directory trees, it becomes very likely that any given file is
> unavailable, because losing any directory node between root and leaf makes
> the leaf unavailable.
>
This may sound like a silly question, but I was reading through the
configuration.txt file again to look up the M and K values, and I can't tell
which one is the M value:

  shares.needed = (int, optional) aka "k", default 3
  shares.total = (int, optional) aka "N", N >= k, default 10
  shares.happy = (int, optional) 1 <= happy <= N, default 7

I get the impression that M is the total number of shares (or machines?).

>
> With only three servers, if the data is important I'd go with K=1, M=3.
> That means that each server will store all of the data for all of the
> files. K=2 would allow you to lose only one server, a second problem would
> result in the loss of all data.
>
> With 40 servers, I'd set M close to 40. Maybe 35. Then I'd set K to about
> 25. That means you'd have to lose 10 servers before any healthy files were
> lost. Losing 16 servers would lose all of your data, but it's vanishingly
> unlikely that any independent sort of failure mode would take out that many
> before you could run a repair, and any failure that affects that many would
> probably affect all of them anyway.
>
> If you're paranoid you could go with K=20 or even K=15, but I don't think
> your reliability would be any higher in practice.
>
> Note that Tahoe has not been tested much with larger values of K and M.
> Nearly all usage has been with the default values of K=3, M=10. I wouldn't
> expect that using larger values would uncover data-loss bugs, but it might
> uncover some performance issues. Probably not, but you'd need to test to
> be sure.
>
> One thing you wouldn't need to worry about much, IMO, is performance
> degrading as more files are added. Performance is dependent on bandwidth,
> number of servers and encoding parameters, it's not really sensitive to data
> volumes or file counts.
>

Thanks for the notes above; they have certainly clarified a lot for me.
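
For reference, the share arithmetic discussed above can be sketched in Python. Going by the defaults (K=3, M=10 in the quoted advice, shares.needed = 3 and shares.total = 10 in configuration.txt), M appears to correspond to shares.total (N) and K to shares.needed. This is only a rough model, not anything taken from Tahoe itself: the function names are made up for illustration, and it assumes one share per server, independent server failures with the same uptime probability, and directory nodes encoded with the same K and M as files. Under those assumptions a file stays readable as long as at least K of its M shares are reachable, so it survives the loss of M-K shares and becomes unavailable once more than M-K are gone.

```python
from math import comb  # requires Python 3.8+


def tolerated_losses(k, m):
    """Shares that can disappear while the file is still recoverable."""
    return m - k


def expansion_factor(k, m):
    """Storage used across the grid, relative to the original file size."""
    return m / k


def file_availability(k, m, p_up):
    """Probability that at least k of the m share-holding servers are up.

    Assumes independent failures and one share per server (a simplification).
    """
    return sum(
        comb(m, i) * p_up**i * (1 - p_up) ** (m - i)
        for i in range(k, m + 1)
    )


def path_availability(k, m, p_up, dir_nodes):
    """Probability that a leaf file is reachable through dir_nodes directories.

    Every directory node on the path and the leaf itself must be readable,
    and each is assumed (hypothetically) to use the same k and m.
    """
    return file_availability(k, m, p_up) ** (dir_nodes + 1)


if __name__ == "__main__":
    p_up = 0.9  # assumed per-server uptime, purely illustrative
    for k, m in [(3, 10), (1, 3), (2, 3), (25, 35)]:
        print(
            f"K={k:2d} M={m:2d} "
            f"tolerates {tolerated_losses(k, m):2d} lost shares, "
            f"expansion {expansion_factor(k, m):.2f}x, "
            f"file availability {file_availability(k, m, p_up):.6f}, "
            f"5 directories deep {path_availability(k, m, p_up, 5):.6f}"
        )
```

The last column illustrates the point about deep directory trees: the per-object availability is multiplied once per directory node on the path, so a small chance of losing any one node compounds with depth.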
_______________________________________________
tahoe-dev mailing list
[email protected]
http://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
