Zooko Wilcox-O'Hearn wrote:
>> So we need a new uploader protocol that lets us upload to an as-yet-
>> unnamed slot, and then provide the slot's storage-index at the very
>> end of the process.
>
> Remember that I really, really want this anyway, because this is
> necessary to have "one-pass" == "on-line" upload.
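To make the shape of that protocol concrete, here's a minimal in-memory sketch of "upload to an unnamed slot, bind the storage-index at the end". It assumes nothing about the real Tahoe-LAFS storage API: every class and method name here is hypothetical.

```python
# Hypothetical sketch: allocate an as-yet-unnamed slot, stream share
# data into it one-pass, then bind the storage-index at the very end.
# Not the real Tahoe-LAFS RIStorageServer interface.
class StorageServer:
    def __init__(self):
        self._slots = {}   # temporary handle -> list of uploaded chunks
        self._shares = {}  # storage-index -> complete share bytes
        self._next_handle = 0

    def allocate_slot(self) -> int:
        """Open an unnamed slot and return a temporary handle."""
        self._next_handle += 1
        self._slots[self._next_handle] = []
        return self._next_handle

    def write(self, handle: int, data: bytes) -> None:
        """Stream share data into the slot (one-pass upload)."""
        self._slots[handle].append(data)

    def close(self, handle: int, storage_index: bytes) -> None:
        """Bind the storage-index, which is only known at the end."""
        self._shares[storage_index] = b"".join(self._slots.pop(handle))
```

The point of the shape is that `storage_index` never appears until `close()`, so the client can compute it while streaming.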
Yeah, I'm ok with that. It requires a change to the protocol, which is
significant work, but the benefits are worth it.

> Argh! You are right! Another few bits needed in the readcap! Boo
> hoo. :-(

Just to be clear, both the set-the-SI-afterwards uploader protocol and
the need for a separate SSI are for *immutable* files. We don't need the
extra SSI bits for *mutable* files, because we can derive everything we
need from "R" in the readcap.

> Re: separate SSI (server-selection-index) value, what makes you
> nervous about it?

I guess it's just that we don't really know how long to make it, and
that it's tied closely into the server-selection policy, which means the
cap-format parse-tree gets fuzzy. We'll start by defining a cap format
which means "use Tahoe2, given an SSI from a field defined as
follows...". But then, to let the second person use a different policy,
we'll have to define a second cap format (with some sort of version
identifier so you can tell the difference), which means "use Tahoe17,
with this other SSI thingy from over here", or to derive the SSI from an
existing field, or to not use an SSI at all.

I'm all for crazy server-selection policies. It's just that I want
filecaps to include enough information to know which policy you're
expected to use. (I kind of want a gridid in there too, which is another
aspect of that policy decision.) Otherwise the filecaps are
necessary-but-not-sufficient, and must be interpreted relative to
something (perhaps a grid and a single grid-wide policy).

[hm, but maybe this is ok: if you pass a filecap to someone using the
same grid+policy as you, then you can pass a short one, but when you
want to pass it to someone who doesn't know which grid+policy it's
coming from, you have to construct a larger cap that contains that
extra information.
Sort of like fully-qualified domain names versus "www" in the right
context being interpreted as "www.allmydata.com".]

Anyways, I guess I'm anticipating confusion or contention in the filecap
format definition process, when we've got several different
server-selection algorithms competing for the shortest caps. But don't
worry, I'll get over that anxiety soon enough :).

before this line is for immutable files
=======================================
after this line is for mutable files

> Re: server-side validation, what do you think of my proposal in [1]?

I think I like it, but I'm still trying to get my head around it. I
think the terminology is confusing, and I need to draw up a picture. How
about I express your idea using David-Sarah's mutable picture, but in
which we define:

  A = H(K1enc, Kverify)
  B = H(R)

We can compute both from the writecap, with no server interaction. Then
make the following definitions:

  readcap = R + A[:20]
  server-selection-index = B
  verifycap = B + A
  storageindex = B + A
  partial-verifycap = B + A[:20]  (derivable offline from the readcap)

When the share is uploaded, the client uses the SSI (B, or a truncated
version of it) to decide which servers to contact. Then it uses the SI
(B+A) to index the share. The server, on its own, can check the share
contents up to A, and can compare A against the SI that it was provided.
It has to accept the value of B on faith. (We can put B inside the
signature, but that merely proves that somebody with the given privkey
agreed to use that value: it doesn't prove that B is actually the right
value for that privkey.)

The readcap-holding client, when it wants to fetch the share, takes R
and computes B, truncates it to get the SSI, then contacts those
servers. Then it asks for all shares that have a storage index which
starts with (B+A[:20]). (Note: this requires changes to the server-side
storage layout, so it can answer queries like this correctly.
If the client uses an incomplete SI, which might not be unique, the
protocol must allow the server to return multiple hits, which adds an
extra roundtrip unless we do something clever.) (Also note that the
writecap-holding client can submit the full B+A storage index and avoid
all this ambiguity, so the protocol should probably allow queries with
variable-length SI values.)

The job of the roadblock-attacker (who learns B+A by running a storage
server and seeing the client's first share upload) is to create some
alternate share, with a different pubkey Kverify' (so they can create a
valid signature that will pass the hash/sig check), such that:

  A' = H(K1enc, Kverify')
  A'[:20] == A[:20]

Then they submit this share to the other storage servers, with SI=B+A',
who accept it as valid. When a readcap holder tries to download, they'll
submit their B+A[:20] as usual, and will be given the attacker's share.
They'll notice the problem during their verification step: the
downloader will decrypt the correct K1 value (assuming the attacker left
K1enc unmodified, otherwise they'll get some other K1' value), but since
Kverify' is different, the R'=H(K1+Kverify') step will produce an
R' != R.

In practice, this verification is done once per "prefix": the part of
the share that includes the sequence number, merkle-tree root hash, the
other UEB data, Kverify, and the signature. If no funny business is
taking place, then all shares of a mutable file (for the same version)
will have identical prefixes. An attacker who only worked hard enough to
come up with one 20-bit collision will cause the downloader to see two
prefixes: one for the good shares, and one for the bad ones.

So: someone with the writecap can verify everything, someone with the
readcap has a 20-bit advantage over the DoS'er, and someone with the
verifycap has a 128-bit advantage over the DoS'er.
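The derivations and the readcap-holder's check above can be sketched concretely. This is a toy model only: plain SHA-256 stands in for the real tagged hashes, A[:20] is sliced as *bytes* here purely for convenience (the message counts bits), and none of the names match the real Tahoe-LAFS layout.

```python
# Toy model of the proposal above; all names and truncation lengths
# are illustrative, not the real Tahoe-LAFS formats.
from hashlib import sha256

def H(*parts: bytes) -> bytes:
    h = sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def derive(K1enc: bytes, Kverify: bytes, R: bytes) -> dict:
    A = H(K1enc, Kverify)  # server-checkable from the share contents
    B = H(R)               # the server must take this on faith
    return {
        "readcap": R + A[:20],
        "ssi": B,                      # server-selection-index
        "storage-index": B + A,        # same value as the verifycap
        "partial-verifycap": B + A[:20],
    }

def readcap_check(R: bytes, K1: bytes, Kverify_from_share: bytes) -> bool:
    # The downloader's verification step: a roadblock share with a
    # substituted Kverify' yields R' != R, even though the attacker's
    # A'[:20] collided with the bits embedded in the readcap/SI query.
    return H(K1, Kverify_from_share) == R
```

A roadblock share that passed the server's A check (by brute-forcing the truncated collision) still fails `readcap_check`, which is exactly the 20-bit-advantage argument.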
The storage-server can verify everything except the "B" value, but the
fact that readcap holders know 20 bits of A makes it less valuable for
an attacker to take advantage of this gap.

Disk errors that cause B to flip will go unnoticed until the downloader
is unable to locate their shares (since they'll be asking with the wrong
SI). Disk errors that cause two shares to be swapped and also swap the
"A" part of the SI (but leave the "B" part alone) will likewise not be
detected until the downloader shows up. I think this is good enough:
those sorts of errors should be pretty infrequent. The "B" bitflips
could conceivably be tolerated by downloaders if they had a way to ask
for all shares with the given A[:20] value and then try to decode each
one, but that's certainly not worth the effort.

I threw in the "partial-verifycap" there just to see what its properties
are.. they aren't actually very useful. It can be derived offline from
the readcap. Someone who holds it has enough information to find the
shares, but they'll only get 20 bits of validation on them, which is
trivial. So we'd still be stuck with online attenuation, possibly
pre-computed and cached in the dirnodes or something.

I don't know what the storage-server share layout would need to look
like to support this sort of variable-length query, but I'm sure it's
doable. Likewise, a protocol that allows the DYHB queries to return
either "yes, here's the data" (in one RTT), "no", or "we have several:
A, B, C, which would you like?" (using two RTTs) is non-trivial but
still feasible.

cheers,
 -Brian

_______________________________________________
tahoe-dev mailing list
[email protected]
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev
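[For illustration, the variable-length-query layout wondered about above could be prototyped along these lines. This is an in-memory stand-in, not the actual server-side share layout; the class and method names are hypothetical.]

```python
# In-memory stand-in for a storage server that answers lookups keyed
# by a variable-length storage-index prefix; names are hypothetical.
class ShareStore:
    def __init__(self):
        self._shares = {}  # full storage-index (B+A) -> share bytes

    def put(self, si: bytes, share: bytes) -> None:
        self._shares[si] = share

    def query(self, si_prefix: bytes) -> list:
        # A full B+A query returns at most one hit; a truncated
        # B+A[:20] query may return several, which is what forces the
        # "we have several: A, B, C, which would you like?" second RTT.
        return [si for si in self._shares if si.startswith(si_prefix)]
```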
