On 03/24/2010 01:29 PM, Hamilton, Jessica wrote:
>
> I work at Massey University (New Zealand), and am planning to use our
> lab fleet to build a distributed storage grid. I imagine I could
> probably get about 100TB out of the system.
>
> We will have about 1,500 PCs, with somewhere between 400GB and 800GB
> of available disk space per machine to utilise for the storage grid.
>
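
As a rough sanity check on the numbers quoted above, here is a minimal
sketch of the capacity arithmetic. It assumes Tahoe's default 3-of-10
encoding (so each file takes up 10/3 of its size on the grid), assumes
every machine contributes all of its free space, and ignores per-share
overhead. The function name and parameters are made up for illustration.

```python
def usable_capacity_tb(machines, gb_per_machine, k=3, n=10):
    """Estimate usable grid capacity in TB under k-of-n erasure coding.

    k-of-n coding stores n shares, each 1/k of the file size, so only
    k/n of the raw disk space holds unique data. (Assumption: overhead
    and uneven share placement are ignored.)
    """
    raw_tb = machines * gb_per_machine / 1000  # GB -> TB, decimal units
    return raw_tb * k / n


low = usable_capacity_tb(1500, 400)   # every PC offers 400GB
high = usable_capacity_tb(1500, 800)  # every PC offers 800GB
print(f"usable capacity: {low:.0f} TB to {high:.0f} TB")
```

Under those assumptions the grid comes out between 180 and 360 TB of
usable space, so 100TB looks like a conservative target.
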
That sounds like a fun setup to play with.

>
> I have read about Chord/DHash which uses DHTs; but it looks like
> Tahoe-LAFS isn’t a DHT, and am concerned about nodes leaving/joining
> often. Given that you only need 3 nodes out of 10 to reproduce
> content, I imagine hardware failures wouldn’t be much of an issue.
>

Tahoe isn't particularly good with nodes leaving and joining at a high
rate. It assumes that you have a relatively stable pool of storage
nodes which are all connected together (so the grid is fully
connected). When you upload a file, the uploading node chooses a set of
nodes out of the available connected set and stores to them.

A new node joining the grid needs to get in contact with an introducer
machine, which is a single point of failure, but once introduced the
nodes can function without it (hm, I guess they get told by the
introducer when a new node joins). There are some tickets filed (#295,
and #444 for scaling to very large grids) concerning adding a
decentralized introduction protocol, and it seems likely to me that
implementing it would involve some form of DHT. Whether or not a DHT
has a further role in Tahoe's functioning, such as in node selection,
is a point of debate (I think the current state is "doesn't look like it
would work well").

> Also, is there any existing work that can provide an NFS/SMB interface
> to Tahoe-LAFS?
>

Tahoe's mutable files can only be completely rewritten, not
incrementally updated in a block-like way; they're also pretty
expensive in computation and memory to deal with. So at the moment
there's a pretty deep mismatch with SMB/NFS, which would probably have
to be dealt with somehow in the NFS/SMB server (a local cache of
modified files, for example). A read-only implementation should be much
easier, and just come down to how to map Tahoe's metadata to your
target protocol.
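
To make the "local cache of modified files" idea a bit more concrete,
here is a minimal sketch of a write-back cache that absorbs block-style
writes locally and only pushes whole files to the backing store on
flush. Everything in it is hypothetical: InMemoryStore is a stand-in
for "a store that only accepts whole-file rewrites", not the Tahoe API,
and a real NFS/SMB gateway would also need locking, eviction and error
handling.

```python
class InMemoryStore:
    """Hypothetical whole-file store (NOT the Tahoe API): get/put only."""

    def __init__(self):
        self.files = {}
        self.uploads = 0  # counts whole-file rewrites, the expensive step

    def get(self, name):
        return self.files.get(name, b"")

    def put(self, name, data):
        self.files[name] = bytes(data)
        self.uploads += 1


class WriteBackCache:
    """Stage partial writes locally; rewrite the whole file on flush."""

    def __init__(self, store):
        self.store = store
        self.dirty = {}  # name -> bytearray holding the modified copy

    def write(self, name, offset, data):
        # Fetch the full file on first modification, then edit in place.
        if name not in self.dirty:
            self.dirty[name] = bytearray(self.store.get(name))
        buf = self.dirty[name]
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # zero-fill any gap
        buf[offset:offset + len(data)] = data

    def read(self, name, offset, size):
        # Prefer the locally modified copy over the stored one.
        buf = self.dirty.get(name)
        if buf is None:
            buf = self.store.get(name)
        return bytes(buf[offset:offset + size])

    def flush(self, name):
        # One whole-file upload, however many small writes preceded it.
        buf = self.dirty.pop(name, None)
        if buf is not None:
            self.store.put(name, buf)


store = InMemoryStore()
cache = WriteBackCache(store)
cache.write("notes.txt", 0, b"hello")
cache.write("notes.txt", 5, b" world")
cache.flush("notes.txt")
print(store.files["notes.txt"], store.uploads)
```

The point of the design is that many small writes from the NFS/SMB
client collapse into a single complete rewrite, which is the only kind
of update the mutable files support at the moment.
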

As is the case with a lot of Tahoe, there are a number of proposals for
more efficient mutable files (such as Tickets #217 and #393), but I
don't know how close any of them are to implementation.

> I’m sure much of this is in Trac, there’s just a **lot** of
> reading/research/testing to do… :P
>

There's a lot of stuff in there. Don't overlook the tickets; much of
the interesting discussion is in there.

J

_______________________________________________
tahoe-dev mailing list
[email protected]
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev
