Dear Rufus Pollock and other OKF folks:

I'm one of the maintainers of Tahoe-LAFS. I'm joining this conversation in order to learn more about use cases that Tahoe-LAFS might serve now or in the future and in order to contribute some of my expertise to OKF in your search for a storage solution. In addition to contributing to Tahoe-LAFS, I also happen to have recent experience using Cassandra at SimpleGeo.com, so of all people in the whole world who have experience with both Tahoe-LAFS and Cassandra, I'm definitely one of them. ;-)

I've reviewed the discussion that Rufus started on the tahoe-dev mailing list nine months ago [1]. Back then I thought that what Rufus was asking for sounded reasonable enough, and much of it seemed definitely doable, but for some of it I wasn't really sure of the details: what specifically was required, whether it was a reasonable thing to want, and whether it was even possible to implement it all. I'm still not entirely sure today, and I'm interested in seeing how other tools such as MongoDB provide for OKF's needs. If they can, then that example can show me how Tahoe-LAFS can be used likewise. If they can't, then that gives me increased confidence that the original desiderata for the OKF grid were too strong.

In this note I'll talk about first encryption and then space accounting.

Let's tackle the issue of encryption first, because I think it is kind of a red herring and I hope to get it out of the way and concentrate on the really hard issues. Tahoe-LAFS's encryption can be understood as:

1. Create a unique symmetric encryption key for every file, and encrypt that file with it.
2. Embed that encryption key into the file handle for that file.

Now I understand that for OKF's purposes all files are supposed to be public. This is a perfectly good policy and it is a use case that Tahoe-LAFS is intended to support. If you want a set of files to be public, you simply make them accessible, as is done with Tahoe-LAFS's public demo directory, here:

http://pubgrid.tahoe-lafs.org/uri/URI%3ADIR2%3Actmtx2awdo4xt77x5xxaz6nyxm%3An5t546ddvd6xlv4v6se6sjympbdbvo7orwizuzl42urm73sxazqa/

I argue that the difference between Tahoe-LAFS and any other distributed storage system is one of degree, not of kind. Tahoe-LAFS makes it easy to make your files private (while hopefully also making it similarly easy to make your files public). Other distributed filesystems make it easy to make your files public, and don't make it easy to make them private.

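As a rough illustration of the two encryption steps listed above, here is a toy Python sketch. It is not Tahoe-LAFS's actual code or handle format: the names `store` and `fetch` and the `toy-handle:` prefix are made up for this example, and the SHA-256-based keystream is only a stand-in for a real cipher and must not be used for real data.

```python
import hashlib
import secrets


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 over key || counter). Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def store(plaintext: bytes, storage: dict) -> str:
    """Encrypt a file under a fresh key and return a handle embedding that key."""
    key = secrets.token_bytes(16)                    # step 1: unique key per file
    ciphertext = _xor(plaintext, key)
    index = hashlib.sha256(ciphertext).hexdigest()   # where the ciphertext lives
    storage[index] = ciphertext                      # storage sees ciphertext only
    return f"toy-handle:{key.hex()}:{index}"         # step 2: key goes in the handle


def fetch(handle: str, storage: dict) -> bytes:
    """Recover the plaintext using only the handle and the stored ciphertext."""
    _prefix, key_hex, index = handle.split(":")
    return _xor(storage[index], bytes.fromhex(key_hex))


if __name__ == "__main__":
    storage = {}  # hypothetical stand-in for a storage server
    handle = store(b"open data for everyone", storage)
    print(handle)
    print(fetch(handle, storage))
```

In this sketch, anyone who is given the handle can read the file, so publishing the handle is what makes the file public, while the `storage` dict only ever holds ciphertext.
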
However, no distributed filesystem can make it *impossible* for you to encrypt your files before storing them. So, while I admit that it could be a problem that Tahoe-LAFS makes it *easy* to do so (for example, people might do so accidentally), any other distributed system could face a similar problem if users were to do so deliberately. In other words, I think of it as more a potential usability issue than a security issue. Usability issues are important and I don't mean to belittle them, but in practice I'm not sure that it would be a big problem. I would want to wait for empirical evidence in the form of usage reports from the field to learn what sorts of usability issues crop up in practice.

Next, let's talk about the "space accounting" issue. This one I definitely understand as being a reasonable thing to want and a thing that could feasibly be implemented. Let's distinguish between two goals:

Goal 1: I want to allow users to read (download) files without thereby allowing them to write (upload) them.

Goal 2: I want to allow server operators to contribute space on their storage server without thereby allowing them to consume space on other storage servers.

Goal 1 is already possible using an HTTP proxy in front of the Tahoe-LAFS gateway. This is already done in practice, as recently discussed on the tahoe-dev list [2].

Goal 2 is much trickier. To allow goal 2, as has been mentioned on this thread, Tahoe-LAFS developers have a plan to add strong distributed space accounting in the future, a plan on which we haven't made much progress in the last nine months.

What interests me for the OKF grid is: what are the alternatives? From my experience using Cassandra I'm pretty sure that it is even less capable than Tahoe-LAFS is at goal 2, and that it can be served up behind an HTTP proxy (goal 1) just as well as Tahoe-LAFS can. I would assume (without knowing much) that the same goes for MongoDB and CouchDB and every other system on the planet.
:-)

So in sum, Tahoe-LAFS already allows goal 1 and is actually used that way in practice, and Tahoe-LAFS might in the future (especially if someone else pitches in and helps) achieve goal 2, which no other current system to my knowledge offers either.

Oh, we should really think about another goal which wasn't explicitly mentioned before but which is probably actually very important:

Goal 3: I want to allow server operators to contribute space on their storage server without thereby allowing them to overwrite or delete files on other storage servers.

Tahoe-LAFS already offers goal 3, and I'm pretty sure that it is the only system that offers goal 3 and the only one that is likely to in the near future. (I would love to be proven wrong.)

Okay, so now that I've sat down and written this letter, it sounds to me like maybe Tahoe-LAFS is a reasonable tool for OKF to move forward with after all. Or at least, it isn't that much more unreasonable than any alternative that I know of. ;-)

I'm sorry that I didn't figure this out and write this letter nine months ago when you first asked, but honestly, I was uncertain. In the time that has passed since then I've learned a lot and gotten familiar with Cassandra. It wasn't until I actually wrote this letter that I thought things through in these terms.

Regards,

Zooko

[1] http://tahoe-lafs.org/pipermail/tahoe-dev/2009-June/001985.html
[2] http://tahoe-lafs.org/pipermail/tahoe-dev/2010-October/005336.html
