Shawn Willden wrote:
> On Saturday 03 October 2009 01:26:16 am Brian Warner wrote:
>> Incidentally, we removed 1 and 2 forever ago, to squash the
>> partial-information-guessing-attack.
>
> Makes sense. The diagrams in the docs should be updated.

Yeah, I'll see if I can get to that today.

> Since this is for immutable files, there is currently no writecap or
> traversalcap, just a readcap and perhaps a verifycap. This scheme
> would require either adding a share-update cap or providing a master
> cap (from which share-update and read caps could be computed).

So, one suggestion that follows would be to store the immutable
"share-update" cap in the same dirnode column that contains writecaps
for mutable files. Hm. Part of me says ok, part of me says that's bad
parallelism. Why should a mutable-directory writecap-holder then get
access to the re-encoding caps of the enclosed immutable files?

Again, it gets back to the policy decision that distinguishes
re-encoding-cap holders from read-cap holders: who would you give
one-but-not-the-other to, and why? When would you be willing to be
vulnerable to [whatever it is that a re-encoding cap allows] in
exchange for allowing someone else to help you with [whatever it is
that a re-encoding cap allows]? That sort of thing.

(Incidentally, I'm not fond of the term "master cap", because it
doesn't actually convey what authorities the cap provides.. it just
says that it provides more authority than any other cap. "Re-encoding
cap" feels more meaningful to me. I suppose it's possible to have a
re-encoding cap which doesn't also provide the ability to read the
file, in which case the master cap that lives above both the
re-encoding and read caps could be called the read-and-re-encode cap,
or something.)
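To make that hierarchy concrete, here's the shape I have in mind. This
is a sketch only: the tag strings and derivation function are made up,
and it is not how immutable caps are actually derived today.

    import hashlib, os

    def derive(tag, parent_cap):
        # One-way derivation: anyone holding parent_cap can compute
        # the child cap, but not the reverse.
        return hashlib.sha256(tag + b":" + parent_cap).digest()

    # hypothetical top-level cap, granting both re-encode and read
    # authority
    read_and_reencode_cap = os.urandom(32)

    # each weaker cap is a one-way hash of a stronger one
    reencode_cap = derive(b"re-encode", read_and_reencode_cap)
    read_cap = derive(b"read", read_and_reencode_cap)
    verify_cap = derive(b"verify", read_cap)

The point being that the derivation only runs downhill: a read-cap
holder can hand out verify caps, but can never climb back up to the
re-encoding cap.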
>> Deriving the filecap without performing FEC doesn't feel like a huge
>> win to me.. it's just a performance difference in testing for
>> convergence, right?
>
> No, it's more than that. It allows you to produce and store caps for
> files that haven't been uploaded to the grid yet. You can make a
> "this is where the file will be if it ever gets added" cap.

I still don't follow. You could hash+encrypt+FEC, produce shares, hash
the shares, produce the normal CHK readcap, and then throw away the
shares (without ever touching the network): this gives you caps for
files that haven't been uploaded to the grid yet. Removing the share
hashes just reduces the amount of work you have to do to get the
readcap (no FEC).

> Also, it would be possible to do it without the actual file contents,
> just the right hashes, which can make a huge performance difference
> in testing for convergence if the actual file doesn't have to be
> delivered to the Tahoe node doing the testing.

Hm, we're assuming a model in which the full file is available to some
process A, there is a Tahoe webapi-serving node running in process B,
and A and B communicate, right? So part of the goal is to reduce the
amount of data that goes between A and B? Or to make it possible for A
to do more stuff without needing to send a lot of data to node B?

In that case, I'm not sure I see as much of an improvement as you do.
A has to provide B with a significant amount of uncommon data about
the file to compute the FEC-less readcap: A must encrypt the file with
the right key, segment it correctly (and the segment size must be a
multiple of 'k'), build the merkle tree, and then deliver both the
flat hashes and the whole merkle tree. This makes it sound like
there's a considerable amount of Tahoe-derived code running locally on
A (so it can produce this information in the exact same way that B
eventually would).
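Just to spell out how much A-side machinery that is, the pipeline
looks roughly like this. A sketch, not tahoe's actual upload code: the
real thing tags its hashes, pads the tree differently, and the helper
names here are mine.

    import hashlib
    from Crypto.Cipher import AES  # pycryptodome

    def sha256d(data):
        return hashlib.sha256(hashlib.sha256(data).digest()).digest()

    def merkle_root(leaves):
        # collapse leaf hashes pairwise; duplicate the last node on
        # odd levels (tahoe pads differently, this is just the shape)
        nodes = list(leaves)
        while len(nodes) > 1:
            if len(nodes) % 2:
                nodes.append(nodes[-1])
            nodes = [sha256d(nodes[i] + nodes[i + 1])
                     for i in range(0, len(nodes), 2)]
        return nodes[0]

    def a_side_digests(plaintext, convergence_secret, k=3,
                       segsize=131072):
        # derive the convergent encryption key from the content
        key = sha256d(convergence_secret + plaintext)[:16]
        # encrypt with AES-128-CTR, counter starting at zero
        cipher = AES.new(key, AES.MODE_CTR, nonce=b"")
        ciphertext = cipher.encrypt(plaintext)
        # segment the ciphertext; segment size must be a multiple of k
        segsize -= segsize % k
        segments = [ciphertext[i:i + segsize]
                    for i in range(0, len(ciphertext), segsize)] or [b""]
        # flat hash of each segment, plus the merkle tree over them
        # (the whole tree gets delivered to B; only the root is shown)
        flat_hashes = [sha256d(seg) for seg in segments]
        return flat_hashes, merkle_root(flat_hashes)

All of which B already knows how to do, which is my point.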
In fact it starts to sound more and more like a Helper-ish
relationship: some Tahoe code on A, some other Tahoe code over on B.

If you've got help from your local filesystem to compute and store
those uncommon hashes, then this might help. Or if you've got some
other system on that side (like, say, tahoe's backupdb) to remember
things for you, then it might work. But if you have those, why not
just store the whole filecap there? (Hey, wouldn't it be cool if local
filesystems would let you store a bit of metadata about the file which
would be automatically deleted if the file's contents were changed?)
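In the meantime, a backupdb-flavored approximation of that wish looks
something like this (a toy sketch, with a plain dict standing in for
the real backupdb's sqlite file):

    import os

    def remember_filecap(db, path, filecap):
        # record the filecap along with the file's current size+mtime
        st = os.stat(path)
        db[path] = {"size": st.st_size, "mtime": st.st_mtime,
                    "filecap": filecap}

    def cached_filecap(db, path):
        # return the remembered filecap, or None if the file appears
        # to have changed since we recorded it
        st = os.stat(path)
        entry = db.get(path)
        if entry and (entry["size"], entry["mtime"]) == (st.st_size,
                                                         st.st_mtime):
            return entry["filecap"]
        return None

(size+mtime is only a heuristic, of course; the filesystem feature I'm
wishing for would make the invalidation exact.)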
Hm, it sounds like some of the use case might be addressed by making
it easier to run additional code in the tahoe node (i.e. a tahoe
plugin), which might then let you move "B" over to where "A" is, and
then generally tell the tahoe node to upload/examine files directly
from disk instead of over an HTTP control+data channel.

still intrigued,
 -Brian