Ravi Pinjala <[email protected]> writes:

> As far as description languages for data allocation go, Ceph has
> already solved this problem - check out the "CRUSH" algorithm.
> Basically, it's a description language for data placement that
> controls replication and data placement, and I think it also lets
> clients figure out which servers a piece of data is on without
> querying them first. IIRC, the code for it is in a separate library
> from the rest of Ceph, so it might be feasible to just put a thin
> python wrapper around it and use it.

That's very interesting - I have been thinking that as one sets up
multiple nodes, controlling data placement is important to get the
intended redundancy against physical loss.  But when you then start
thinking about server-controlled rebalancing and migration, it becomes
necessary to be able to express the placement rules programmatically for
evaluation by others, not just have them run on the client.

I would hope that we could come up with one schema that would satisfy
the needs of 95% of the grids.  One obvious concern, arguably the
primary one, is physical loss/reliability correlation (what Ceph seems
to be thinking about).  Another is policy; one might have data of a type
that is not permissible to store in some places (e.g., ITAR,
http://en.wikipedia.org/wiki/Data_Protection_Directive).  So far these
two are orthogonal, and perhaps there are more.
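
To make the two concerns concrete, here is a minimal sketch in Python of
placement rules expressed as plain data, so that a client or a server
doing rebalancing could evaluate the same rule.  This is not CRUSH and
not an existing Tahoe-LAFS API; the names Server, Rule, and violations,
and the "site" and "jurisdiction" fields, are all hypothetical and chosen
only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    site: str          # physical failure domain (hypothetical field)
    jurisdiction: str  # legal/policy domain (hypothetical field)


@dataclass(frozen=True)
class Rule:
    min_sites: int                    # shares must span at least this many sites
    allowed_jurisdictions: frozenset  # places this data may be stored


def violations(rule, placement):
    """Return human-readable problems with a proposed list of servers."""
    problems = []
    # Policy check: every chosen server must be in a permitted jurisdiction.
    for server in placement:
        if server.jurisdiction not in rule.allowed_jurisdictions:
            problems.append(
                f"{server.name}: jurisdiction {server.jurisdiction} not permitted"
            )
    # Reliability check: shares must not all sit in one failure domain.
    sites = {server.site for server in placement}
    if len(sites) < rule.min_sites:
        problems.append(
            f"only {len(sites)} distinct site(s), need {rule.min_sites}"
        )
    return problems


if __name__ == "__main__":
    a = Server("a", "rack1", "US")
    b = Server("b", "rack1", "US")
    c = Server("c", "rack2", "EU")
    rule = Rule(min_sites=2, allowed_jurisdictions=frozenset({"US"}))
    print(violations(rule, [a, b]))  # same site: fails the reliability check
    print(violations(rule, [a, c]))  # two sites, but c fails the policy check
```

The two checks are independent of each other, which matches the
observation that the concerns are orthogonal: a placement can pass one
and fail the other.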


_______________________________________________
tahoe-dev mailing list
[email protected]
http://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev
