Ravi Pinjala <[email protected]> writes:

> As far as description languages for data allocation go, Ceph has
> already solved this problem - check out the "CRUSH" algorithm.
> Basically, it's a description language for data placement that
> controls replication and data placement, and I think it also lets
> clients figure out which servers a piece of data is on without
> querying them first. IIRC, the code for it is in a separate library
> from the rest of Ceph, so it might be feasible to just put a thin
> python wrapper around it and use it.
That's very interesting. I have been thinking that, as one sets up multiple nodes, controlling data placement is important for getting the intended redundancy against physical loss. But once you start thinking about server-controlled rebalancing and migration, it becomes necessary to express the placement rules programmatically so that others can evaluate them, not just have them run on the client.

I would hope that we could come up with one schema that would satisfy the needs of 95% of the grids. One obvious concern, arguably the primary one, is correlation of physical loss/reliability (which is what Ceph seems to be thinking about). Another is policy; one might have data of a type that is not permissible to store in some places (e.g., ITAR, or the Data Protection Directive, http://en.wikipedia.org/wiki/Data_Protection_Directive). So far these two are orthogonal, and perhaps there are more.
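
To make the idea of a placement rule that anyone can evaluate a bit more concrete, here is a minimal sketch in Python. It is not CRUSH and not anything that exists in Tahoe-LAFS; the names `Server`, `PlacementRule`, `check_placement`, and `choose_servers`, and the `site` and `jurisdiction` attributes, are all hypothetical and chosen only for illustration. The rule is plain data, so a client, a server doing rebalancing, or an auditor could each evaluate it and reach the same answer. It covers the two orthogonal concerns above: a per-site cap for correlated physical loss, and a jurisdiction allow-list for policy.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    site: str          # physical failure domain (hypothetical attribute)
    jurisdiction: str  # legal region for policy checks (hypothetical attribute)


@dataclass(frozen=True)
class PlacementRule:
    shares: int                       # how many shares must be placed
    max_per_site: int                 # cap on shares per failure domain
    allowed_jurisdictions: frozenset  # empty set means "no policy restriction"


def permitted(rule, server):
    """Policy check: may this server hold data governed by this rule?"""
    return (not rule.allowed_jurisdictions
            or server.jurisdiction in rule.allowed_jurisdictions)


def check_placement(rule, servers):
    """Return a list of violations for a proposed placement (empty if none)."""
    problems = []
    if len(servers) != rule.shares:
        problems.append(f"expected {rule.shares} shares, got {len(servers)}")
    per_site = Counter(s.site for s in servers)
    for site, count in sorted(per_site.items()):
        if count > rule.max_per_site:
            problems.append(f"site {site} holds {count} shares")
    for s in servers:
        if not permitted(rule, s):
            problems.append(f"server {s.name} is in a disallowed jurisdiction")
    return problems


def choose_servers(rule, candidates):
    """Greedy selection in candidate order; same inputs give the same result."""
    chosen = []
    per_site = Counter()
    for s in candidates:
        if len(chosen) == rule.shares:
            break
        if not permitted(rule, s) or per_site[s.site] >= rule.max_per_site:
            continue
        chosen.append(s)
        per_site[s.site] += 1
    if len(chosen) < rule.shares:
        raise ValueError("not enough eligible servers to satisfy the rule")
    return chosen
```

Because `check_placement` depends only on the rule and the server descriptions, a node that did not make the original placement decision can still verify it, which is the property server-controlled rebalancing would need. A real schema would presumably want hierarchical failure domains (disk, host, rack, site) rather than the single flat `site` field assumed here.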
_______________________________________________ tahoe-dev mailing list [email protected] http://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
