On Sun, 31 May 2009 20:16:35 -0600 Shawn Willden <[email protected]> wrote:

> Has there been any discussion or thought about how hard it would be
> to bootstrap Tahoe storage nodes via sneakernet?

Not crazy at all. Not trivial, but it's a good idea.

For a first cut, run a single storage node (which can double as a client), with its NODEDIR/storage/ directory symlinked to your 2TB drive. Upload everything. Now you'll have all the shares jammed together on that one disk, each SI directory containing ten files numbered 0 through 9.

A naive solution would then be to simply copy all the "0" shares to the first server, all the "1" shares to the second server, etc, splitting up the pieces however you like. This will work well enough for reads (better if there aren't a lot of servers in your grid), because Tahoe's read-side code falls back to asking all servers when necessary, so having shares in the "wrong" place won't really hurt. If you re-upload a file, the current (simplistic) upload code will double up shares: it will try to place them in the "right" place, and will put share #0 there even if it later discovers share #0 living elsewhere.

There might be value in creating a tool that takes an SI and a list of nodeids and tells you which nodeid each share wants to live on (basically just dump the serverlist-permutation step). You could build this into something which decides which shares to copy to each target drive.

The mutable upload code will behave a bit better, since it will try to re-use a share in place rather than duplicating it somewhere. But here's where the write-enabler will bite you: each mutable share gets a secret that's based upon the writecap and the storage server id, so when you move mutable shares to a new server, the write-enabler won't match, and subsequent writers won't be allowed to modify the share. You'd need to finish the half-written "share migration" code to make this work: the client should be able to convince the server to replace its old write-enabler secret with a new one. #489 has some details. Without this, you couldn't modify any of the directories that you'd created centrally. You might not need to do that, if you thought of the pre-sneakernet step as preparing the cache for immutable files, and then created all the tahoe directories later (on the real grid).

Another useful tool might be one which takes a dircap, computes the storage index and correct write-enablers, then modifies a bunch of shares in place to update the write-enablers. I imagine something which has a table of dircaps on one side and a disk full of NODEID/storage/shares/prefix/SI/SHNUM on the other, and for each share it figures out which dircap matches, computes the right WE, then writes it into place. You'd run this tool after running some other tool which splits the shares into their correct NODEID/ directories.

Oh, duh, a simpler approach: if your friends are willing to run with nodes that you've created for them, then run ten nodes locally, each of them pointing into a separate directory on your 2TB drive, each node using the real Tub certificate (and therefore the real nodeid) that it will use after it's been sneakernetted to your friend's place. Then all of the write-enablers will be exactly right, and all of the shares will be in the right place, ready to be copied over to a new home post-sneakernetting.
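
To make a couple of those steps concrete, here's roughly what the split-by-sharenum script might look like. Untested, and it assumes the usual storage/shares/prefix/SI/SHNUM layout plus one target directory per friend (the SOURCE and TARGETS paths are placeholders):

import os, shutil

SOURCE = "/mnt/2tb/storage/shares"   # the big combined share pile
# one destination per friend's future server (placeholder paths)
TARGETS = ["/mnt/out/server%d" % i for i in range(10)]

for prefix in os.listdir(SOURCE):
    if prefix == "incoming":   # skip the incoming/ scratch area
        continue
    prefixdir = os.path.join(SOURCE, prefix)
    for si in os.listdir(prefixdir):
        sidir = os.path.join(prefixdir, si)
        for shnum in os.listdir(sidir):
            # share #k goes to the k'th target drive
            targetdir = os.path.join(TARGETS[int(shnum)],
                                     "storage", "shares", prefix, si)
            if not os.path.isdir(targetdir):
                os.makedirs(targetdir)
            shutil.copy2(os.path.join(sidir, shnum),
                         os.path.join(targetdir, shnum))

And the SI-to-nodeid tool would basically just replay the server-selection permutation. I'm writing the hash from memory here (SHA-1 over the nodeid concatenated with the storage index), so check the inputs and their order against the client's real permutation code before trusting the output:

import hashlib

def permuted_order(storage_index, nodeids):
    # sort servers around the permuted ring; the exact hash inputs
    # and their order are an assumption, verify against the real code
    return sorted(nodeids,
                  key=lambda nodeid: hashlib.sha1(nodeid + storage_index).digest())

def share_placement(storage_index, nodeids, total_shares=10):
    # share #k "wants" the k'th server in the permuted list, wrapping
    # around if there are more shares than servers
    order = permuted_order(storage_index, nodeids)
    return dict((k, order[k % len(order)]) for k in range(total_shares))

Feed that the SIs from the big disk plus the list of your friends' nodeids and you get a real copy plan instead of the naive round-robin split above.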

Another possibility: I'm currently working on #467 (static specification of serverlist), and one likely direction for it is to create new types of "servers" beyond the existing native foolscap-based one (like an S3 backend, things which probably wouldn't be announced by an Introducer). This could make it fairly clean to define a "local disk sneakernet" kind of server, with a serverid (picked to match your friend's pre-existing server) that isn't actually backed by a Tub certificate. This server class could handle writes by stashing the share somewhere on a local disk, ready for later sneakernetting. That would get all the share-locations and write-enablers right without any hassle. We could even define this class to read normally, over the network, and fall back to the disk stash if the far end doesn't actually have the share yet. You could plug in this sort of server, upload a bunch of stuff for a while, then carry the stash over to their machine and just copy the shares into place. Once you'd finished your big upload, you could switch off the write cache and let everything pass through to the real server.

Incidentally, one allmydata.com project that we never got around to implementing was a send-us-your-ciphertext service, in which some tool runs on your machine and encrypts (but does not necessarily encode) your files and sticks them on a DVD. Then you mail us the DVD and we encode+upload everything from a machine in colo. Sort of like an offline Helper. That project, even if we'd actually written it, wouldn't help here, because there's no single good point to run it, but it certainly falls along the same lines.

cheers,
 -Brian
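
P.S.: if #467 does grow pluggable server types, the sneakernet write-cache might look vaguely like the class below. The method names (put_share/get_share) are invented for illustration; the real interface would be whatever #467 settles on, and the real reads and writes would go over foolscap:

import os

class SneakernetServer:
    """Hypothetical write-cache 'server': writes land in a local stash
    directory (to be sneakernetted later), reads try the real server
    first and fall back to the stash."""

    def __init__(self, serverid, stashdir, real_server=None):
        self.serverid = serverid        # the friend's real nodeid
        self.stashdir = stashdir        # local dir, carried over later
        self.real_server = real_server  # optional live connection

    def _stashpath(self, storage_index, shnum):
        # storage_index is assumed to be the base32 string here; mirror
        # the shares/prefix/SI/SHNUM layout so the stash can be copied
        # straight into the friend's NODEDIR/storage/shares/
        return os.path.join(self.stashdir, storage_index[:2],
                            storage_index, str(shnum))

    def put_share(self, storage_index, shnum, data):
        # writes always go into the local stash
        path = self._stashpath(storage_index, shnum)
        if not os.path.isdir(os.path.dirname(path)):
            os.makedirs(os.path.dirname(path))
        f = open(path, "wb")
        f.write(data)
        f.close()

    def get_share(self, storage_index, shnum):
        # reads prefer the real (remote) server, fall back to the stash
        if self.real_server is not None:
            data = self.real_server.get_share(storage_index, shnum)
            if data is not None:
                return data
        path = self._stashpath(storage_index, shnum)
        if os.path.exists(path):
            return open(path, "rb").read()
        return None

Because the serverid you hand it is the friend's real nodeid, the share placement and write-enablers come out right, and after the sneakernet trip you just copy the stash into their NODEDIR/storage/shares/ and drop this class from your config.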
