"Zooko O'Whielacronx" <zo...@zooko.com> writes: > On the bright side, writing this letter has shown me a solution! Set M > = the number of servers on your grid (while keeping K/M the same as > it was before). So if you have 100 servers on your grid, set K=30, > H=70, M=100 instead of K=3, H=7, M=10! Then there is no small set of > servers which can fail and cause any file or directory to fail.
There's a semi-related reliability issue, which is that a grid of N servers which are each available most of the time should allow a user to store, check, and repair without a lot of churn. So rather than setting M to N, I'd want to (without precise justification) set it to 0.8N or something, and use the not-yet-implemented facility to sprinkle more shares during repair without invalidating the ones that are there.

For those that didn't follow my ticket, the example is:

  assume 3/7/10
  10 shares of seqN are placed onto 15 servers
  a check finds 9 shares, because only 13/15 servers are up
  currently, this results in writing 10 shares of seqN+1 onto those 13
  at the next check, a different 13 are up, and the cycle repeats

Instead, if the check synthesized the missing share and placed it, there would be two copies of one share and still 10 reachable shares, and as servers fade in and out the verify process can still succeed. So for a/b availability and M shares, we now have roughly M*b/a copies of the M shares placed, and I think that's ok.
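(As a rough illustration of the churn difference, here is a simulation sketch of my own, not from the ticket: it runs both the current rewrite-everything repair and the synthesize-the-missing-share variant over a series of checks, using the 3/7/10-on-15-servers example with 13/15 availability. The number of checks, the random seed, and the initial placement are arbitrary assumptions.)

  # Rough simulation sketch (mine, not from the ticket) of the two repair
  # behaviours described above, using the 3/7/10-on-15-servers example with
  # 13/15 availability. Check count and initial placement are assumptions.
  import random

  M, N_SERVERS, AVAIL, CHECKS = 10, 15, 13 / 15, 20
  random.seed(1)

  def up_servers():
      """Servers reachable at this check, each up independently with prob AVAIL."""
      return {s for s in range(N_SERVERS) if random.random() < AVAIL}

  # Current behaviour: whenever a check finds fewer than M shares, write a
  # whole new generation (seqN+1) of M shares onto the reachable servers.
  holders = set(range(M))                  # servers holding the current generation
  writes_current = 0
  for _ in range(CHECKS):
      up = up_servers()
      if len(holders & up) < M:
          holders = set(random.sample(sorted(up), min(M, len(up))))
          writes_current += len(holders)

  # Proposed behaviour: synthesize only the unreachable shares and place one
  # extra copy of each, leaving the existing copies valid.
  placements = {i: {i} for i in range(M)}  # share number -> servers with a copy
  writes_proposed = 0
  for _ in range(CHECKS):
      up = up_servers()
      for servers in placements.values():
          if not (servers & up):           # no reachable copy of this share
              spare = up - servers
              if spare:
                  servers.add(random.choice(sorted(spare)))
                  writes_proposed += 1

  print(f"current:  {writes_current} share writes over {CHECKS} checks")
  print(f"proposed: {writes_proposed} share writes; "
        f"{sum(len(s) for s in placements.values())} total copies placed")

Nothing here is meant to be precise; it only shows that the churn per check drops from rewriting all M shares to placing the handful of shares that happened to be unreachable.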