On Fri, 12 Jun 2009 18:59:08 +0100 Rufus Pollock <[email protected]> wrote:
> First off, I wanted to say a big thank-you for developing Tahoe --
> it's a great piece of software serving a really important function.

Thanks!

> 1. Can you have a "Grid Administrator" (with root-style permissions)?

As Kevin pointed out, the answer is generally "no", but you'll want to
distinguish "permission to read/write files" from "permission to
consume storage space". Tahoe explicitly denies the first sort of
permission: to see the contents of a file you must either create it or
receive the filecap from someone who already knew it.

The second sort of permission is still being developed, but it will
derive from the whims of the admin of each storage server. Each admin
will be free to delegate their grant-space-or-not decisions to some
other party, and we expect that a common mode will be for all
participants to grant that control to a centralized "account manager",
who will then grant access to specific users as desired.

> In our setup we want people to be able to "donate" nodes to the grid.
> At the same time there needs to be some way to monitor/control what
> people upload (the aim is to store open data of general interest not
> someone's personal backups or their CD collection) and we also want to
> ensure not just anyone can come and delete objects.

We should also define "deletion" more carefully than we have been so
far. There are two things that you might want to happen when you push
the "delete foo.jpg" button in some Tahoe directory. The first is that
you remove the link which associates the name "foo.jpg" with some
particular filecap. The second is that you might like the space consumed
by foo.jpg to be released and made available for other purposes.

In Tahoe, the first is implemented by modifying the mutable file which
makes up the parent directory. Tahoe directories are just tables of
name+filecap, serialized into bytes, and stored in a mutable file.
Therefore anyone with a writecap to the mutable file will be able to
unlink its children.

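To make the "directory is just a table in a mutable file" idea concrete,
here is a minimal sketch in Python. It is not Tahoe's real code or its
real on-disk format: the MutableFile class, the JSON serialization, and
the plain string comparison standing in for write authority are all
assumptions made for illustration (real mutable files enforce write
authority cryptographically).

```python
import json


class MutableFile:
    """Hypothetical stand-in for a mutable file guarded by a writecap."""

    def __init__(self, writecap: str, contents: bytes):
        self._writecap = writecap
        self.contents = contents

    def overwrite(self, writecap: str, new_contents: bytes) -> None:
        # Assumption: a simple equality check models "holds the writecap".
        if writecap != self._writecap:
            raise PermissionError("writecap required to modify this file")
        self.contents = new_contents


def pack(children: dict) -> bytes:
    """Serialize the name -> filecap table into bytes (JSON is an assumption)."""
    return json.dumps(children, sort_keys=True).encode("utf-8")


def unpack(blob: bytes) -> dict:
    """Recover the name -> filecap table from the stored bytes."""
    return json.loads(blob.decode("utf-8"))


def link(directory: MutableFile, writecap: str, name: str, cap: str) -> None:
    """Add a name -> cap entry by rewriting the directory's mutable file."""
    children = unpack(directory.contents)
    children[name] = cap
    directory.overwrite(writecap, pack(children))


def unlink(directory: MutableFile, writecap: str, name: str) -> None:
    """Remove the link only; the child's shares are not touched here."""
    children = unpack(directory.contents)
    del children[name]
    directory.overwrite(writecap, pack(children))
```

Note that unlink() only edits the table. Nothing in it releases the
space used by the child, which is why reclaiming space has to be a
separate mechanism.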
The reclaim-the-space part is trickier, and we implement it with
garbage-collection. The storage servers don't know which directories
hold a reference to any given file (because they aren't allowed to read
the directories), so the rule is that clients are responsible for
updating leases, and servers are supposed to keep a file's shares alive
until all of the leases on that share have expired. This is less
immediate than an explicit delete operation would be, but it avoids race
conditions and removes the danger that the server might delete a file
which is still being referenced by some other parent directory (think
reference-counting).

So there isn't really an explicit "delete" permission, but
write-permission on all of the directories that currently contain a
reference to foo.jpg is pretty similar.

> What this suggests to us is we want a "Grid Administrator" role with
> root style permissions:
>
> a) A "Grid Administrator" can see all objects (files/directories)
> created on the grid.
>
> b) "Grid Administrator" has full access to all objects (in particular,
> can delete them if necessary)
>
> c) We don't (always) make the writecap world-available.
>
> As I understand it, to ensure a) we need every node owner to create
> objects within a designated root directory (otherwise their
> directories and files will be hidden from everyone else on the system
> -- as one would want for privacy ...).

Yup. If clients voluntarily give their filecaps/dircaps to somebody,
then that somebody can do anything they want.

> To get (b)+(c) requires that when objects are created on the grid
> (which may happen on a local node) that information is automatically
> passed to the "Grid Administrator"? AFAICT the only way to achieve
> this is to have all users only create objects on the grid via some
> central node/api/upload point. Is this correct?

Yes.
Basically you're looking at hiding the Tahoe grid behind a proxy, and
that proxy limits the operations allowed to users: they can't just
upload an unlinked file (data->filecap), nor can they just create a new
unlinked directory, but they can (upload+link) a file into an existing
directory, and they can (mkdir+link) a new subdirectory of an existing
directory.

You could conceivably give out the readcap and let users download data
on their own, without the proxy, but the Tahoe storage server protocol
doesn't currently distinguish between read-authority and
write-authority, so those users would also be able to upload unlinked
files and create unlinked directories, which you want to be able to
prevent.

As Kevin pointed out, once Accounting is in place (some day..), you'll
be able to explicitly control space-consumption as an issue orthogonal
to which files you can read or write. You could implement some other
schemes on top of this: only give consume-space permission to the proxy,
only publish the root directory's readcap. Then clients could read to
their hearts' content, but no server would give them space to upload new
(unlinked) files. The proxy would accept file data from clients, upload
it (using its consume-space powers), and link it into the root directory
(using its writecap).

> 2. How do you control who can join a grid?
>
> Is there any way to configure my node only to talk to these other
> nodes? Given that new nodes join a grid via an introducer I wondered
> if there were some way to use the introducer for this function. (E.g.
> I have to be a given a token which I pass to the introducer in order
> to be "allowed in")

The answer depends upon how you'd answer Kevin's question about "why do
you want to do this". It's also strongly influenced by the current
storage server protocol, which (as described above) doesn't split out
upload-shares permission from download-shares permission.

One reason to control who can join a grid is so that a storage-server
operator can control who gets to consume their disk space. The
Accounting project is our plan for this: it doesn't matter who can
connect, as long as they can't consume space without some sort of
authorization that you control. (Our plan for Accounting involves
authorized clients holding private DSA keys which correspond to a DSA
public key that's been added to the server's tahoe.cfg).

Another reason might be to control who can read certain files. We prefer
using the readcap for this: possession of the file/directory's readcap
is both necessary and sufficient to retrieve the contents. Two-factor
authorization at the file level (knowledge of the readcap PLUS
membership in the grid) is harder to delegate and reason about.

Yet another reason is to control which servers are used when you (or
some other "member" of the grid) upload a file. For example, you might
want your shares to be placed on 100%-genuine high-quality
allmydata.com(TM!) servers, not those shabby fly-by-night allmydata.org
servers, because you've got limited bandwidth and want to entrust your
precious few shares to the most reliable servers :-). So, regardless of
who you meet through this Introducer, you only want to use servers that
are branded "100% allmydata.com". For this goal, we're working on ticket
#466 (signed introducer announcements), and you'll express your
preference by putting a DSA pubkey in your tahoe.cfg. Your client will
only use servers whose introducer-brokered announcements were signed by
the matching privkey.

We considered using "secret introducers" to achieve this last goal, but
we decided to use signed-announcements instead. The main reason is that
we want to switch to gossip-based decentralized introduction at some
point, which just wouldn't work with a secret-introducer scheme.
Another is that a secret-introducer scheme unnecessarily conflates
client access with server access: since clients need to know the secret
introducer FURL, they could inject not-100%-allmydata.com server
announcements. In the #466 scheme, the introducer (and the client-side
code which talks to it) has just one job: distribute a list of likely
servers. The introducer is *not* responsible for making value judgements
about those servers.. that job is left to the client, who is the one who
really cares about it anyways.

> 3. Is it ever possible to revoke capabilities.
>
> For example, if I give you the writecap to directory X is there any
> way to rescind that later on (i.e. can I change the writecap for that
> directory without deleting it)?

Nope, not yet. In the current release, sharing filecaps and dircaps is
an irrevocable act. You'd need to introduce some out-of-band mechanism
to control access carefully enough to provide strong revocation
properties.

Note, however, that dircaps reference mutable state, and there's nothing
to stop you from emptying out the directory and switching to using a
different one instead. The analogy we use is to change your phone number
and tell everybody your new one except that annoying guy that you don't
want to talk to anymore. They continue to have access to the old empty
directory, but nobody else is using it anymore, so who cares? (I suppose
the analogy works better if you've left an answering machine on the old
number, but never check the messages.. he can talk to himself all he
wants, but nobody else is listening).

If you're revoking access because you want to keep them from modifying
some state that you care about, then the move-and-don't-forward trick
works just fine, although it becomes a coordination issue if you've told
lots of people (i.e. you have to inform N-1 parties to revoke the
remaining 1, and you need some way to explain to them which directory it
is you want to replace).

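As a rough illustration of the move-and-don't-forward trick, here is a
small Python sketch. Everything in it is hypothetical: the grid is
modelled as an in-memory dict from dircap to a table of children,
new_dircap() stands in for creating a fresh directory, and the holder
names are placeholders. It only shows the bookkeeping, including the
"inform N-1 parties" step.

```python
import secrets


def new_dircap() -> str:
    """Hypothetical: stands in for creating a new directory on the grid."""
    return "dircap-" + secrets.token_hex(8)


def move_and_dont_forward(grid: dict, old_cap: str, holders: list, revoked: set):
    """Copy a directory's children to a fresh dircap and empty the old one.

    grid    -- dict mapping dircap -> {name: childcap} (an in-memory model)
    holders -- everyone who was given old_cap
    revoked -- the holders who should not learn the new dircap

    Returns the new dircap and the list of parties who must be told about it.
    """
    new_cap = new_dircap()
    grid[new_cap] = dict(grid[old_cap])  # same children, new location
    grid[old_cap] = {}                   # old cap still works, but is empty
    to_notify = [h for h in holders if h not in revoked]
    return new_cap, to_notify
```

The revoked party keeps whatever authority old_cap conveys, which is the
point of the phone-number analogy: the old number still rings, but
nobody answers. This also does nothing about child caps they may already
have copied out of the old directory.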
But if you're revoking access because you want to prevent them from
reading some file that they used to have access to, then it's a race
between their decision to read and your decision to revoke. They might
have done a "cp -r" the moment you first gave them the dircap, and done
it again every ten seconds, and really you've got no way to know whether
they've read the files or not. So from a security point of view, the
most conservative position to take is that read access is effectively
irrevocable: if they've ever had read access, you must assume that
they've used it already, so there's no point to taking it back now.
(Note that if the storage servers counted share-reads, you might be able
to know that nobody had read the file, and that might make revokers feel
a bit better, but there would be a lot of false-negatives. You could
also ask clients nicely to notify you whenever they read the file, and
then feel better if they hadn't done so before you revoked their access,
but you'd never really know for sure).

Revocation is a complicated topic. As Kevin said, it basically requires
an intermediary, which might either be a single proxy/gatekeeper or
something distributed (like an intermediate tahoe directory that you can
later empty). Any extra layer will hurt availability/reliability in
hard-to-model ways (what if the proxy is down? what is the probability
two directories are recoverable versus just a single directory?). So
we've not yet implemented any sort of revocation.

The way I explain it to folks is that Tahoe offers a very clear and
understandable access-control model. It might not do everything that you
want, but it's pretty easy to see what it does and does not offer you,
and you can use that to make good decisions about how you use the
features that it does have.
A lot of people want some sort of revocation (at least they think they
do), but when you try to nail down exactly what sorts of properties
they'd be happy with, so far we always wind up with a scheme that is a
lot harder to explain: it might do what you want, but it might not, and
it's hard to tell what it does and does not offer you. So we've chosen
to err on the conservative/lazy side and defer native Tahoe revocation
for a time when we understand it better.

cheers,
 -Brian
_______________________________________________
tahoe-dev mailing list
[email protected]
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev
