An anonymous hacker offered to help out with any GSoC projects about integrating Tahoe-LAFS and Tor. Here is my reply:
Thanks! Right now we don't have a Tahoe-LAFS+Tor project on our GSoCIdeas page [1]. It would be nice to add one, but it needs enough detail that a student can get started on the right track from it, and enough "meat" to keep a student busy all summer. I think the right way to do that is to make the project be #467/#573 (allow the user to specify which servers are used for uploads).

Once #467/#573 is done, we can configure Tahoe-LAFS to upload a few shares (K shares, just enough to reconstruct the file) to Tor hidden servers and the rest of the shares to non-hidden servers. Content would then be downloaded exclusively from the non-hidden servers, except when those are unavailable, in which case the Tahoe-LAFS downloader will automatically fall back to downloading from the Tor hidden servers.

I think, based mostly on what Harold Gonzales told me, that this is the best way to structure it because:

(a) It minimizes the load on Tor when the content is not under attack. This matters because Tor doesn't handle bulk data loads very well, and bulk data loads destroy the latency of interactive loads (like ssh sessions or interactive web sessions).

(b) It optimizes the performance experienced by users when the content they are viewing is not under attack. This is important so that we are not asking users to endure a performance penalty on all of their normal web browsing just so that they can use an attack-resistant service. The goal is for this to perform well enough, and be convenient enough, to serve as a normal way to host static files.

(c) But if the non-hidden servers *were* to disappear, start serving up corrupted data, become really, really slow, or something similar, the Tahoe-LAFS storage client would automatically use the shares served by the hidden servers. The result should be that the content is very attack-resistant.
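To make the proposed policy concrete, here is a minimal Python sketch, assuming Tahoe-LAFS's default 3-of-10 encoding (K=3, N=10). The `Server` class and the `plan_upload` and `choose_shares` functions are hypothetical names for illustration only; this is not how Tahoe-LAFS's actual uploader or downloader is implemented. Note that downloading exclusively from the non-hidden servers requires N-K >= K, which holds for 3-of-10 (7 >= 3).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical storage-server record for this sketch."""
    name: str
    hidden: bool  # True if reachable only as a Tor hidden service


def plan_upload(servers, k, n):
    """Map share numbers 0..n-1 to servers: k shares on hidden servers,
    the remaining n-k on non-hidden servers (round-robin within each group).
    Hypothetical policy, not the real Tahoe-LAFS uploader."""
    hidden = [s for s in servers if s.hidden]
    public = [s for s in servers if not s.hidden]
    if not hidden or not public:
        raise ValueError("need at least one hidden and one non-hidden server")
    placement = {}
    for shnum in range(n):
        if shnum < k:
            placement[shnum] = hidden[shnum % len(hidden)]
        else:
            placement[shnum] = public[(shnum - k) % len(public)]
    return placement


def choose_shares(placement, reachable, k):
    """Pick k distinct share numbers to fetch, preferring non-hidden servers.

    Shares on hidden servers are chosen only when the reachable non-hidden
    servers hold fewer than k shares. Returns None if fewer than k shares
    are reachable at all."""
    candidates = [sh for sh in sorted(placement) if placement[sh].name in reachable]
    # Stable sort on the hidden flag: False (non-hidden) sorts before True.
    candidates.sort(key=lambda sh: placement[sh].hidden)
    return candidates[:k] if len(candidates) >= k else None


servers = [Server("onion1", True), Server("onion2", True), Server("onion3", True),
           Server("pub1", False), Server("pub2", False), Server("pub3", False)]
placement = plan_upload(servers, k=3, n=10)

everyone = {s.name for s in servers}
print(choose_shares(placement, everyone, 3))                        # [3, 4, 5]
print(choose_shares(placement, {"onion1", "onion2", "onion3"}, 3))  # [0, 1, 2]
```

The stable sort in `choose_shares` is what implements the fallback in this sketch: shares on hidden servers are only reached once fewer than K shares remain available from the non-hidden servers, so Tor carries no bulk traffic while the content is not under attack.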
Another big advantage of doing it this way is that #467/#573 is also wanted by other users with completely different use cases. #467/#573 is what distributed database folks call "rack awareness": they want to ensure that shares get spread across multiple racks, not just across multiple servers that might happen to sit in the same rack, because a single unfortunate event (usually power-related, it seems) could disconnect or even damage multiple servers in the same rack.

The generalization of this, of course, is "location awareness", or even more generally "correlated-failure awareness". You don't want all your shares stored on servers that all sit above the San Andreas Fault, you don't want all your shares stored on servers operated by the same sysadmin team (even if the servers are isolated from one another physically and geographically), etc.

#467/#573, if done right, should simultaneously satisfy the Tahoe-LAFS-over-Tor use case, the "rack awareness" use case, and others. See the tickets for a list of requests we've received for different use cases. One of them is the "Shawn Willden's mom" use case: files that are family photos must have at least K shares stored on Shawn's mom's home computer so that she can view them instantaneously. :-)

Regards,

Zooko

[1] GSoCIdeas: http://tahoe-lafs.org/trac/tahoe-lafs/wiki/GSoCIdeas
#467 "allow the user to specify which servers are used for uploads": http://allmydata.org/trac/tahoe-lafs/ticket/467
#573 "Allow client to control which storage servers receive shares": http://allmydata.org/trac/tahoe-lafs/ticket/573
