It would be my pleasure, but I won't have time to do it until the weekend.

It might be faster, and all-around better, to create a unit test that exercises the scenario in my original message. Then my buildbot (which has way more free time than I do) can try it for me.

Incidentally, I understand how I created that scenario. The machine that had all the shares is always on and runs deep-check --repair from cron. My other machines aren't reliably on the grid, so after repeated repair operations the always-on machine tends to collect a lot of shares. Eventually it held shares.needed shares, and then a repair happened while it was the only machine on the grid. Because repair didn't care about shares.happy, this machine got all shares.total shares. Then, because an upload cares about shares.happy but wouldn't rebalance, it had to fail.
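
To make the sequence concrete, here is a rough toy model of it in Python. It is only a sketch under my own assumptions: the function names, the server names, and the 2/2/3 encoding parameters are made up for illustration, and none of it is Tahoe-LAFS's actual repair or upload code. Happiness is computed as the size of a maximum matching between servers and the shares they hold.

```python
# Toy model only: these names and parameters are hypothetical, not Tahoe-LAFS APIs.
NEEDED, HAPPY, TOTAL = 2, 2, 3  # stand-ins for shares.needed / shares.happy / shares.total


def happiness(placement):
    """Size of a maximum matching between servers and the shares they hold."""
    match = {}  # share number -> server currently matched to it

    def assign(server, seen):
        for share in placement[server]:
            if share in seen:
                continue
            seen.add(share)
            # Take a free share, or evict the current holder if it can move elsewhere.
            if share not in match or assign(match[share], seen):
                match[share] = server
                return True
        return False

    return sum(assign(server, set()) for server in placement)


def repair(placement, online):
    """Repair that ignores HAPPY: regenerate missing shares onto whoever is online."""
    held = set().union(*(placement[s] for s in online))
    if len(held) < NEEDED:
        raise RuntimeError("not enough reachable shares to repair")
    for i, share in enumerate(sorted(set(range(TOTAL)) - held)):
        placement[online[i % len(online)]].add(share)


def upload_without_rebalance(placement, online):
    """Upload that enforces HAPPY but never moves a share that already exists."""
    held = set().union(*(placement[s] for s in online))
    for share in sorted(set(range(TOTAL)) - held):
        target = min(online, key=lambda s: len(placement[s]))
        placement[target].add(share)
    return happiness({s: placement[s] for s in online}) >= HAPPY


# The always-on machine already holds NEEDED shares; the laptop holds the rest.
placement = {"always_on": {0, 1}, "laptop": {2}, "desktop": set()}

# A repair runs while the always-on machine is the only one on the grid.
repair(placement, ["always_on"])

# Later the desktop is back, but every share already sits on always_on.
upload_ok = upload_without_rebalance(placement, ["always_on", "desktop"])
print(placement["always_on"], upload_ok)
```

In this model the repair leaves always_on holding every share, and the later upload reports failure even though a second server is available, because moving one existing share to the desktop is exactly what it refuses to do.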

A grid whose nodes don't have similar uptime is surprisingly fragile. Failure of that single always-on machine makes the file totally unretrievable, which is definitely not the desired behavior.



On 09/16/13 09:57, Zooko O'Whielacronx wrote:
Dear Kyle:

Could you try Mark Berger's #1382 patch on your home grid and tell us
if it fixes the problem?

https://tahoe-lafs.org/trac/tahoe-lafs/ticket/1382# immutable peer
selection refactoring and enhancements

https://github.com/tahoe-lafs/tahoe-lafs/pull/60

Regards,

Zooko
_______________________________________________
tahoe-dev mailing list
tahoe-dev@tahoe-lafs.org
https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev


--
Kyle Markley

