Thanks to Kevan Carstensen's help we added Medium-Sized Distributed Mutable Files to the GSoCIdeas list:
http://tahoe-lafs.org/trac/tahoe-lafs/wiki/GSoCIdeas

I'm very interested in MDMF nowadays because my current employer, http://simplegeo.com, uses the Cassandra distributed key-value store [1]. I am paying attention to how we use Cassandra and thinking to myself "What would it take for Tahoe-LAFS to support this sort of use case?". I think MDMF is a step on the road to that.

Appended below is Kevan's write-up of the MDMF GSoC Idea.

Regards,

Zooko

[1] http://cassandra.apache.org

Medium-Sized Distributed Mutable Files (MDMF)

Mutable files in Tahoe-LAFS have some significant limitations and performance issues, as discussed in docs/performance.txt. Users who aren't aware of these limitations are surprised when they find out that mutable files can't scale to large sizes without using unacceptable levels of memory, and that reading one byte of the file costs as much as reading the entire file.

A fix for this issue would essentially be fixing #393. That is,

* Developing mutable files that are segmented on upload, as with immutable files. Part of this would involve making sure that the way we currently ensure the integrity of the parts of mutable files stored on servers is adequate for your new design, and altering it if it isn't.
* Implementing efficient reading and writing of arbitrary spans of those mutable files.

This would make Tahoe-LAFS less surprising to users, and allow mutable files to be used in more ways than they currently are. If successful enough, this might allow Tahoe-LAFS to support range queries or "graph database"-style access, in the style of the "NoSQL" projects.

To learn more about this issue, you should first read docs/performance.txt, so you're familiar with the performance problems with mutable files as currently implemented. You should also look at the file encoding specification, to understand how immutable files are segmented (since you'll be doing something similar with this project).
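
To make the benefit of segmentation concrete, here is a minimal sketch, assuming fixed-size segments as with immutable files, of which segments a reader would have to fetch to serve a given byte span. The function name segments_for_span and the 128 KiB segment size are assumptions chosen for illustration; they are not taken from the Tahoe-LAFS code or from any MDMF design.

```python
def segments_for_span(offset, length, segment_size=128 * 1024):
    """Return (first, last) indices of the segments covering
    bytes [offset, offset + length) of a segmented file.

    Hypothetical helper for illustration only; segment_size is an
    assumed value, not a parameter fixed by the write-up above.
    """
    if offset < 0 or length <= 0 or segment_size <= 0:
        raise ValueError("offset must be >= 0; length and segment_size must be > 0")
    first = offset // segment_size
    # The last byte of the span is at offset + length - 1.
    last = (offset + length - 1) // segment_size
    return first, last


def segments_fetched(offset, length, segment_size=128 * 1024):
    """Number of segments a reader touches for the given span."""
    first, last = segments_for_span(offset, length, segment_size)
    return last - first + 1
```

Under these assumptions, reading one byte touches exactly one segment no matter how large the file is, whereas an unsegmented mutable file has to be fetched in full.
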
The mutable file specification may be informative as well. The mutable file upload and download code is in mutable, and, for comparison, the immutable file upload and download code is in immutable.

_______________________________________________
tahoe-dev mailing list
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev
