Michael Walsh wrote:
> - Adding an HTTP header with this data, but that requires something
> like a server module or output script. It also doesn't ugly up the
> URL (but then again, we have URL shortener services for manual
> typing).
Ah, but see, that loses the security. If the URL doesn't contain the root hash, then you're depending upon somebody else for your authentication, and then it's not end-to-end anymore. URL shortener services are great for the location properties, but lousy for the identification properties. Not only are you relying upon your DNS, and your network, and every other network between you and the server, and the server that's providing you with the data... now you're also dependent upon the operator of the shortener service, and their database, and anyone who's managed to break into their database, etc.

I guess there are three new things to add:

 * a secure identifier in the URL
 * an upstream request, to say what additional integrity information
   you want
 * a downstream response, to provide that additional integrity
   information

The stuff I proposed used extra HTTP requests for those last two. But, if you use the HTTP request headers to ask for the extra integrity metadata, you could use the HTTP response headers to convey it. The only place where this would make sense would be to fetch the merkle tree, or the signature. (If the file is small and you only check the flat hash, then there's nothing else to fetch; if the file is encrypted, then you can define its data layout to be whatever you like, and just include the integrity information in it directly.)

Oh, and that would make the partial-range request a lot simpler: the client does a GET with a "Range: bytes=123-456" header, and the server looks at the associated merkle tree, figures out which chain the client will need to validate those bytes, and returns a header that includes all of those hash tree nodes (and the segment size and filesize). And it returns enough file data to cover those segments (i.e. the Content-Range: response would be for a larger range than the Range: request, basically rounded up to a segment boundary). The client would hash the segments it receives, build and verify the merkle chain, then compare the root of the chain against the root hash in the URL and make sure they match. (Rough sketches of both halves are below.)

The response headers might get a bit large: log2(filesize/segsize)*hashsize. For example, a 1GiB file with 128KiB segments and SHA-256 works out to log2(8192)*32 = 13*32 = 416 bytes of hash nodes. And you have to fetch at least a full segsize. But that's predictable and fairly well bounded, so maybe it isn't that big of a deal.

Doing this with HTTP headers instead of a separate GET would avoid a roundtrip, since the server (which now takes an active role in the process) can decide these things (like which hash tree nodes are needed) on behalf of the client. Instead of the client pulling one file for the data, then pulling part of another file to get the segment size (so it can figure out which segments it wants), then pulling a different part of that file to get the hash nodes... the client just does a GET with a Range:, and the server figures out the rest.

As you said, it requires a server module. But compared to a client-side plugin, that's positively easy :). It'd probably be a good idea to think about a scheme that would take advantage of a server which had additional capabilities like this. Maybe the client could try the header thing first, and if the response didn't indicate that the server is able to provide that data, go and fetch the .hashtree file.

> My thoughts purely turn to verifying files and all webpage resources
> integrity in a transparent and backward compatible way. Who has not
> encountered unstable connections where images get corrupted and css
> files don't fully load? Solving that problem would make me very happy!

Yeah.
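To make the header idea concrete, here's a rough sketch of the client half in Python. Everything protocol-shaped in it is made up for illustration: the X-Integrity-Index and X-Integrity-Chain header names, the comma-separated-hex node encoding, and the one-segment-per-request restriction. It just shows the shape of the verification: hash the segment, fold in one sibling node per tree level, compare against the root hash you got from the URL.

import hashlib
import urllib.request

def fetch_and_verify(url, roothash_hex, start, end):
    # Ask for a byte range; the (hypothetical) server module rounds it
    # out to a whole segment and sends the merkle chain in the headers.
    req = urllib.request.Request(url,
            headers={"Range": "bytes=%d-%d" % (start, end)})
    resp = urllib.request.urlopen(req)
    segment = resp.read()
    index = int(resp.headers["X-Integrity-Index"])
    chain = [bytes.fromhex(n)
             for n in resp.headers["X-Integrity-Chain"].split(",") if n]

    # Hash the leaf, then fold in one sibling per level. The low bit of
    # the index at each level says whether we're the left or right child.
    h = hashlib.sha256(segment).digest()
    for sibling in chain:
        if index & 1:
            h = hashlib.sha256(sibling + h).digest()
        else:
            h = hashlib.sha256(h + sibling).digest()
        index >>= 1

    if h != bytes.fromhex(roothash_hex):
        raise ValueError("root hash mismatch: refusing the data")
    return segment

Note that the only thing the client has to trust is the roothash_hex it pulled out of the URL; every byte and every hash node the server sends gets checked against it.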
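And the matching server half, with the same caveats (this is the "server module" part, reduced to a plain function rather than an actual plugin, and again the header names and tree layout are just assumptions to make the sketch hang together). It builds the tree over fixed-size segments, then picks out the chain of sibling nodes for the requested segment:

import hashlib

SEGSIZE = 128 * 1024   # arbitrary; a real scheme would store this per-file

def build_tree(data):
    # Level 0 is the segment (leaf) hashes; each later level pairs up
    # neighbours, duplicating the last node when a level is odd-sized.
    level = [hashlib.sha256(data[i:i+SEGSIZE]).digest()
             for i in range(0, len(data), SEGSIZE)] or [hashlib.sha256(b"").digest()]
    tree = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [hashlib.sha256(level[i] + level[i+1]).digest()
                 for i in range(0, len(level), 2)]
        tree.append(level)
    return tree   # tree[-1][0] is the root hash

def range_response(data, tree, start, end):
    # Round the request out to one enclosing segment (a real module
    # would handle multi-segment ranges) and collect the sibling chain.
    assert 0 <= start <= end < len(data)
    index = start // SEGSIZE
    assert end // SEGSIZE == index, "sketch only handles one segment"
    chain, i = [], index
    for level in tree[:-1]:
        sibling = i ^ 1
        chain.append(level[sibling] if sibling < len(level) else level[i])
        i //= 2
    first = index * SEGSIZE
    last = min(first + SEGSIZE, len(data)) - 1
    headers = {"Content-Range": "bytes %d-%d/%d" % (first, last, len(data)),
               "X-Integrity-Index": str(index),
               "X-Integrity-Chain": ",".join(n.hex() for n in chain)}
    return data[first:last+1], headers

You'd feed build_tree() the file once at publish time and stash the result, so the per-request work is just the chain lookup and the headers. Sketch, not spec.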
We've been having an interesting thread on tahoe-dev recently about the backwards-compatibility question: what sorts of failure modes are best to aim for when you're faced with an old server, or whatever.

You can't make this stuff totally transparent and backwards compatible (in the sense that existing webpages and users would start benefiting from it without doing any work). I think that's part of the brokenness of the SSL+CA model, where they wanted the only change to be that URLs start with "https" instead of "http". But you can certainly create a framework that lets people get better control over what they're loading.

Now, if only distributed programs could be written in languages with those sorts of properties...

cheers,
 -Brian
