Aaron Cordova wrote:

> Beyond repair to replenish the number of desired shares when machines
> go missing, does Tahoe do any creation/replication of shares to
> balance the load when new machines are added to the storage grid?
Not really. If the Repairer is invoked, and there are new servers around
which weren't part of the original upload set, then the current repair
process will put shares on them. We're looking at improving the
repairer/uploader to be more aware of existing shares (i.e. once you see
the first pre-existing share, ask around to find the others before you
commit to the placement sharemap), in #362. It's an open question
whether this new uploader ought to move existing shares or leave them
alone. Moving them costs bandwidth, but it gets you rebalancing; it also
needs more code to delete the old shares (or, more likely, cancel their
leases and wait for expiration).

If we treat shares not being in their ideal locations (i.e. not at the
top of the permuted serverlist) as "damage" (albeit a rather
low-priority form), and invoke the repairer on them, then moving them to
their ideal locations would provide the sort of adapt-to-new-servers
load-balancing you're talking about. #699 covers some of this.

Of course, this happens all the time for mutable files, since every
update witnesses a change to the serverlist, and the choice I made there
was to try to leave the shares alone. #232 is about changing that to
auto-rebalancing, which I think would be a good idea. The mutable
storage-server protocol already has a command to delete the share (since
mutable writers can delete the contents anyway), so we can update the
client code to implement a rebalancing move with copy-then-delete, and
not have to change the servers at all.

The immutable protocol has no such provision, because of course you
don't want just anybody to be able to delete an immutable share. So it's
less obvious how a client-side move-share operation should be
implemented that preserves the safety of the shares. It'd be nice to
have a specialized form of repair for this, though, since it can be done
with a lot less work than the usual kind (where you have to regenerate
shares): here you're just moving existing shares from one place to
another. If we improve the storage-server protocol, you might even be
able to talk the two servers into moving the share directly, instead of
pulling it down from one and pushing it back up to the other: this would
consume the same amount of their bandwidth, but much less of yours. You
can imagine a repairer looking at the sharemap and figuring out the
minimum number of moves necessary to get all shares pushed up to the top
of the permuted list.

> I imagine this could be implemented using an asynchronous, low-
> priority background operation, even one that runs independently of the
> software as it is, that simply directs the creation or moving of
> shares.

Yeah, there's a ticket (#543) about creating a "rebalancing manager",
which could be an out-of-band tool that talks to a bunch of storage
servers at once, using some new interface. It would ask each one for a
catalog of the shares it holds, use the same permutation as the clients
do to figure out the best homes for those shares, and then direct the
servers to push them around into the right places (and, with authority
over the storage servers, it could tell them to delete the old shares
right away, instead of waiting a month for them to expire). An external
tool like this would be appropriate for an allmydata.com-style grid, in
which you're trying to manage overall system load: the tool could move
things slowly enough to avoid interfering with user operations. We never
got around to building it, though.
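To make that concrete, the heart of such a manager might look something
like this (a sketch only: the real client permutation uses Tahoe's
tagged hashes rather than bare SHA-256, and plan_moves() and its
one-share-per-ideal-server assignment are my invention, not an existing
API):

    import hashlib

    def permuted_order(storage_index, server_ids):
        # Approximate the client's peer-selection: sort servers by the
        # hash of (server id + storage index). Illustration only; Tahoe
        # uses its own tagged hashes for this.
        return sorted(server_ids,
                      key=lambda sid: hashlib.sha256(sid + storage_index).digest())

    def plan_moves(storage_index, sharemap, server_ids):
        # sharemap maps share number -> server id currently holding that
        # share. Pretend share i's ideal home is the i'th server in the
        # permuted order, and emit (shnum, src, dst) for every share
        # that's out of place. A real planner would accept any placement
        # within the first N permuted servers, to minimize moves.
        order = permuted_order(storage_index, server_ids)
        moves = []
        for shnum, src in sorted(sharemap.items()):
            dst = order[shnum % len(order)]
            if dst != src:
                moves.append((shnum, src, dst))
        return moves

Feed that a catalog gathered from the servers and you get back the list
of copy/delete operations for the manager to schedule.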
I'm not sure what the answer is for a non-centrally-managed grid, since
anyone you give this move-share power to can also just delete the
shares, and you don't really want to give them that. Maybe something
where servers vaguely trust each other, and server A is willing to
delete a share after it sends it to server B and gets an ACK back. But
that goes against some of the other security patterns we've established.
So the copy-then-wait-to-expire approach is the best I've been able to
think of so far.
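In code, the two flavors of client-side move would look something like
this (every method name here is a hypothetical stand-in; the real
storage-server protocol is a Foolscap interface, and the immutable side
has no calls like these today):

    def move_mutable_share(src, dst, storage_index, shnum, write_enabler):
        # Mutable shares: the protocol already lets a writer delete, so
        # a move can be a straightforward copy-then-delete.
        data = src.read_mutable_share(storage_index, shnum)
        dst.write_mutable_share(storage_index, shnum, data, write_enabler)
        src.delete_mutable_share(storage_index, shnum, write_enabler)

    def move_immutable_share(src, dst, storage_index, shnum, cancel_secret):
        # Immutable shares: there is deliberately no delete command, so
        # the best a client can do is copy, verify, stop renewing the
        # old lease, and let the old copy expire on its own.
        data = src.read_immutable_share(storage_index, shnum)
        dst.write_immutable_share(storage_index, shnum, data)
        if dst.read_immutable_share(storage_index, shnum) != data:
            raise RuntimeError("copy failed; keep renewing the old lease")
        src.cancel_lease(storage_index, cancel_secret)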
thanks,
 -Brian