Rahul Golwalkar <[email protected]> writes:

> As a part of our curriculum we are supposed to contribute towards an
> open source project. So, can anyone suggest a Network Security project
> available in Tahoe-LAFS which requires attention.
> I have read most of the material related to Tahoe-LAFS available on the
> site.

I'm not Zooko, but as a semiregular ranter I'll offer my $0.02:

* availability in the face of flaky servers and networks

  tahoe-lafs does very well at placing multiple shares of data and
  being able to reconstitute the original data. However, the code is
  not currently robust against servers that appear to be present but
  are flaky, and not entirely robust in the face of servers that come
  and go. Integrity and confidentiality can be achieved through crypto,
  but availability is much harder.

  This project involves taking a system-level look at the issue of
  availability under two assumptions: flaky servers and malicious
  servers. Then, it involves code and perhaps protocol changes to
  mitigate any problems that are apparent. There are several specific
  issues already:

** non-responding servers

   It's known that having a server that connects to the introducer but
   doesn't respond to any queries slows 'tahoe check' to a crawl. The
   entire fix is not clear, but surely an important step is to have
   each client scoreboard the behavior of servers and, e.g., stop
   waiting for them after they have been shown to be nonresponsive a
   few times.

** servers that won't accept shares

   Currently, one sees the number of servers connected, but in the
   pubgrid many of them are not taking shares. This should be apparent
   in monitoring, as the lack of awareness contributes to poor
   system-level availability.

** mutable file repair

   Currently, mutable file repair seems to place shares of an
   incremented seqN. My opinion is that this is the wrong choice and
   that instead the missing shares of the current sequence number
   should be regenerated and placed.

  It would be interesting to build a simulator that has (different)
  Poisson distributions for on and off, and run this against clients
  that place a hierarchy and periodically run
  'tahoe deep-check --verify --repair --add-lease' or equivalent.

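A minimal sketch of the simulator idea above, in Python. Everything here is an assumption made for illustration, not tahoe-lafs code: each server alternates between up and down periods whose lengths are exponentially distributed (the holding times of a Poisson process) with separate means, each server holds exactly one share, and a file counts as retrievable when at least k_needed servers are up. It does not model repair or lease renewal; the function names are hypothetical.

```python
import random


def simulate_server(rng, mean_up, mean_down, horizon):
    """Return the (start, end) intervals during which one server is up.

    Assumption: the server starts up and alternates between up and down,
    with exponentially distributed period lengths.
    """
    t = 0.0
    up = True
    intervals = []
    while t < horizon:
        mean = mean_up if up else mean_down
        end = min(t + rng.expovariate(1.0 / mean), horizon)
        if up:
            intervals.append((t, end))
        t = end
        up = not up
    return intervals


def is_up(intervals, t):
    """True if time t falls inside one of the server's up intervals."""
    return any(start <= t < end for start, end in intervals)


def file_availability(n_servers, k_needed, mean_up, mean_down,
                      horizon, checks, seed=0):
    """Fraction of evenly spaced checks at which >= k_needed servers are up.

    Assumption: one share per server, so "servers up" equals
    "shares reachable".
    """
    rng = random.Random(seed)
    servers = [simulate_server(rng, mean_up, mean_down, horizon)
               for _ in range(n_servers)]
    ok = 0
    for i in range(checks):
        t = horizon * (i + 0.5) / checks
        reachable = sum(is_up(s, t) for s in servers)
        if reachable >= k_needed:
            ok += 1
    return ok / checks


if __name__ == "__main__":
    # Hypothetical parameters: 10 servers, 3 shares needed, servers up for
    # 20 time units and down for 10 on average.
    print(file_availability(10, 3, mean_up=20.0, mean_down=10.0,
                            horizon=10000.0, checks=1000))
```

Varying mean_up and mean_down per server, and adding a periodic repair step between checks, would be the natural next steps toward the experiment described above.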
* read-only vs writable introducer caps

  Currently there is a volunteergrid, but the introducer cap is guarded
  because there is a potential leeching problem, in terms of storage
  used vs provided to the group. Having a read-only introducer cap
  would help; this would let people connect to the grid and fetch
  shares but not upload them.

* quotas

  In a shared grid of multiple people, a natural desire is to make sure
  everyone is being evenhanded in terms of resource consumption vs
  provision, at least as soon as things become full. Typical
  filesystems have quotas, or someone runs du and yells at people, but
  in tahoe one can't do that (and that's a feature). A possible way to
  do this is to have leases on shares be associated with some 'storage
  use capability', and perhaps this should be via digital cash. Someone
  who provides 1 TB of share storage for a month would perhaps get
  500 GB-months of share storage credits. The trick is to do this
  without breaking any of the security properties tahoe-lafs already
  has.

* NAT problems

  The pubgrid currently has servers that are unreachable via their
  advertised addresses. Storage servers with real addresses are
  nonetheless connected to them, surely because the NAT/FW-impaired
  servers connect out. Client nodes, however, cannot use these servers.
  So, files placed by nodes offering storage cannot in general be
  retrieved by nodes not offering storage, or if they can, they won't
  be healthy.

  This problem is really a subcase of 'flaky servers'. The challenge is
  to find some way to keep unreachable servers from being part of the
  storage grid while not opening up any opportunities for an adversary
  to manipulate the grid into a non-working state. It's possible that a
  client-side fix not to advertise RFC1918 addresses would take the
  edge off this problem. Another approach is a distributed directory of
  performance data conveyed back to the introducer. Each node could
  sign a statement about each storage node saying what it can connect
  to and whether it is taking shares. But publishing lots of data could
  have privacy implications.
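
To make the "don't advertise RFC1918 addresses" idea concrete, here is a rough sketch using Python's standard ipaddress module. It assumes the advertised locations are plain "host:port" strings with IPv4 addresses or hostnames; that format and the function names are assumptions for illustration, not the actual tahoe-lafs code path. Hostnames are passed through untouched, since nothing can be said about them without resolving them.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(host):
    """True if host is an IP literal inside an RFC 1918 range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. a hostname): cannot classify it here.
        return False
    return any(addr in net for net in RFC1918_NETWORKS)


def advertisable_hints(hints):
    """Drop "host:port" hints whose host is an RFC 1918 address.

    Hypothetical helper: the "host:port" hint format is an assumption.
    """
    kept = []
    for hint in hints:
        host, sep, _port = hint.rpartition(":")
        if not sep:
            host = hint  # no port given; treat the whole string as the host
        if not is_rfc1918(host):
            kept.append(hint)
    return kept


if __name__ == "__main__":
    print(advertisable_hints(
        ["192.168.1.5:3456", "203.0.113.7:3456", "example.org:3456"]))
```

A node left with no advertisable hints at all would then know it is probably not reachable from outside, which is also the point at which it could decline to offer storage.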
_______________________________________________
tahoe-dev mailing list
[email protected]
http://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
