Jody Harris wrote:
> As a user/administrator of an ad-hoc Tahoe-LAFS, there are several
> assumptions that are not valid for our use case:
>
> - All nodes do not start out with the same size storage space
The simulations looked at that case for simplicity, but Tahoe-LAFS doesn't
assume it. Especially once #778 is fixed (scheduled for the next release,
1.6), a non-uniform distribution of storage space should be tolerated fairly
well, *provided* that the range of capacities is not too large. However,
"4 GB to 2 TB" is probably asking too much: the servers with 4 GB will
inevitably receive many requests to store shares that they don't have room
for, and to download shares that they haven't stored. I've submitted ticket
#872 about that.

> - Almost all nodes will be used to store data other than Tahoe shares

This has a similar effect to starting with non-uniform storage sizes. Note
that each node has a reserved_space setting (see
<http://allmydata.org/trac/tahoe/browser/docs/configuration.txt#L282>), and
will stop accepting shares when less than that much space is left. As of
release 1.6 this setting will also work for nodes running Windows (ticket
#637). However, this reservation isn't currently applied to the download
cache if the node is configured as a gateway (ticket #871).

> - Bandwidth balancing will be more important than storage space balancing

Storage balancing and bandwidth balancing are closely related: if a server
holds some fraction f of the shares, then it can be expected to receive
roughly a fraction f of the share requests, on average, provided that the
distribution of frequencies with which shares are accessed is not too
heavily skewed. Note also that if a server has more storage than its peers,
it is probably a more recent machine and so might be able to handle higher
bandwidth. In the longer term it might be possible to measure bandwidth and
use that as an input to share rebalancing (i.e. if a server is
bandwidth-constrained, move its most frequently accessed shares to servers
with more available bandwidth), but that's complicated, and depends on
ticket #543.
Again, the current algorithm will only do a reasonable job of bandwidth
balancing under the assumption that the range of capacities is not too
large.

> - Nodes will be expected to join, leave and fail at random

There are currently some significant limitations here, which will require
fixing tickets #521 and/or #287.

----
http://allmydata.org/trac/tahoe/ticket/287 'download/upload: tolerate lost or missing servers'
http://allmydata.org/trac/tahoe/ticket/521 'disconnect unresponsive servers (using foolscap's disconnectTimeout)'
http://allmydata.org/trac/tahoe/ticket/543 'rebalancing manager'
http://allmydata.org/trac/tahoe/ticket/637 'support "keep this much disk space free" on Windows as well as other platforms'
http://allmydata.org/trac/tahoe/ticket/778 '"shares of happiness" is the wrong measure; "servers of happiness" is better'
http://allmydata.org/trac/tahoe/ticket/871 'handle out-of-disk-space condition'
http://allmydata.org/trac/tahoe/ticket/872 'Adjust the probability of selecting a node according to its storage capacity'

--
David-Sarah Hopwood ⚥ http://davidsarah.livejournal.com
_______________________________________________
tahoe-dev mailing list
[email protected]
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev
