Note that the default server configuration is conservative and does not perform any GC at all: anything which threatens data safety, like GC, must be explicitly enabled. When you enable it, you provide the expiration timeout, which determines how quickly your server's users must renew their leases.
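For concreteness, here is a sketch of how GC might be enabled in the server's tahoe.cfg (option names as described in docs/garbage-collection.rst; check the docs for your version before relying on these):

```ini
[storage]
# GC is off by default; leases never expire until you turn this on.
expire.enabled = true
# "age" mode expires leases older than a fixed duration.
expire.mode = age
# Clients must renew at least this often or their shares become
# eligible for deletion.
expire.override_lease_duration = 60d
```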
Basically, since we have a distributed filesystem (which makes reference counting expensive at best) with least-authority access semantics (which makes it impossible for the servers to count references themselves), the easiest and most reliable approach to deletion is to use leases, timers, and garbage collection.

> 3. If a file is listed in a directory then this will lead
> automatically to renewal of the relevant leases

Nope, not without client involvement. As other folks have pointed out, servers can't read directories (since they don't hold a readcap). So clients (who *do* hold a readcap, or hold one to a parent directory, which gives them transitive access) must be responsible for performing renewals. (If you do a periodic "tahoe deep-check --add-lease ROOT:", then your statement becomes true, because your client is doing the work of finding everything reachable from that directory and renewing the leases, and merely adding/removing a file in any reachable directory is enough to start/stop renewing its lease.)

Explicitly renewing every single reachable file/directory from the client side is kind of a drag, as it means you (as the client) have a lot of work to do. Part of our designs for Accounting aim to reduce this work, but of course there's a funky tradeoff between effort (bandwidth), storage overhead, and safety.

The secrets you need to renew (or cancel) a lease are generated in a way that makes them least-authority: there is a short string you could give to someone that would enable them to renew the lease on a single file for you (and not do anything else). Without any format changes (merely new code), we could build something that would let you pass a list of "lease-renewal caps" to a helper service of some sort, and have it renew your leases for you. This was the model we envisioned for the allmydata.com use case, where end users were not expected to have to do work to keep their files alive. The "tahoe manifest" command is part of this.
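To illustrate the least-authority property, here is a simplified sketch of deriving per-file and per-server renewal secrets by one-way hashing (this is not Tahoe's exact tagged-hash construction; the tags, names, and inputs here are illustrative only):

```python
import hashlib

def derive(tag: bytes, *inputs: bytes) -> bytes:
    """Hypothetical one-way tagged-hash derivation.  Knowing an
    output never reveals the inputs, so handing out a derived
    secret grants only the narrower authority."""
    h = hashlib.sha256()
    for part in (tag,) + inputs:
        # length-prefix each part so the encoding is unambiguous
        h.update(len(part).to_bytes(4, "big") + part)
    return h.digest()

# One master secret per client.
master = b"client master lease-renewal secret"
storage_index = b"si:example-file"

# The per-file renewal secret is the "short string" you could hand
# to a helper: it lets them renew the lease on this one file only.
file_renew = derive(b"file-renewal", master, storage_index)

# Each storage server sees only a further-derived per-server secret,
# so one server can't renew (or cancel) leases held at another.
server_id = b"server-1"
bucket_renew = derive(b"bucket-renewal", file_renew, server_id)
```

The key design point is that derivation only flows downward: the master secret plus a storage index yields the file secret, but no combination of file or bucket secrets recovers the master.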
There is also a "master lease renewal secret" which is used to compute the per-share renewal secrets: someone who holds that master secret and a file's verifycap can renew the lease on its shares.

One idea we've had is to mark all your leases with an account-id of some sort (perhaps just a unique number), and refresh a timer on the account-id instead of on every individual share. Basically you'd be saying "yes, I'm still alive, please keep my files alive too", and sending one message per month instead of one for every file and directory you care about. A lot less bandwidth that way. But that would only reclaim space in large chunks (or never reclaim space at all, if the user sticks around).

If you're confident that you can enumerate all the files and directories that you care about, you can periodically compare this manifest against a previous version, and then send out explicit lease-cancel messages for the objects that are no longer on the list. (The "cancel lease XYZ" message is the closest we've got to actual server-side deletion.) But note that if you get it wrong (perhaps due to race conditions between two machines modifying a shared directory), you could accidentally lose files. Adding one lease per starting point (i.e. per root directory) per walker instance feels like it might avoid the race worries.

I suspect that the best approach will be to combine this manifest-delta cancellation thing with the account-id-based timer. If clients are charged/accountable for how much space they're using, then they'll be motivated to make a reasonable-to-them tradeoff between the work done to build/compare these manifests, lease timers, and safety.

Another idea (ticket #308) is to change the encryption format of dirnodes to introduce another level into the "writecap -> readcap -> verifycap" hierarchy.
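The manifest-delta cancellation step boils down to a set difference between two manifests. A minimal sketch (the cap strings and the send_cancel call are hypothetical placeholders, not real Tahoe APIs):

```python
def cancellable(old_manifest, new_manifest):
    """Caps whose leases can be cancelled: present in the previous
    manifest but absent from the current one.  Getting this wrong
    loses files, so a real implementation would also guard against
    concurrent writers modifying shared directories mid-walk."""
    return set(old_manifest) - set(new_manifest)

# Two successive manifests (illustrative cap strings):
old = {"URI:CHK:aaa", "URI:CHK:bbb", "URI:DIR2:root"}
new = {"URI:CHK:aaa", "URI:DIR2:root", "URI:CHK:ccc"}

for cap in cancellable(old, new):
    # send_cancel(cap)  # hypothetical per-server lease-cancel RPC
    pass
```

Note the asymmetry: newly appearing caps ("URI:CHK:ccc" above) need no special handling, since their leases were added at upload time; only the disappeared ones require action.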
The new order would be "writecap -> readcap -> traversalcap -> verifycap", and a traversalcap would give somebody the ability to get the traversalcaps of all children (as well as the verifycap of the dirnode itself). Then, if you gave a hypothetical Lease Renewal Service the traversalcap of your root directory (as well as your master lease-renewing secret), they could renew all of your leases, but couldn't actually read your directories or files. (This requires "semi-private DSA keys"; see http://allmydata.org/trac/pycryptopp/ticket/13 for details.) You might be willing to reveal the shape of your directory structure to this service, in exchange for letting it take responsibility for your lease-renewal duties. If the service lives network-wise close to the storage servers, it may be considerably faster too.

cheers,
 -Brian

_______________________________________________
tahoe-dev mailing list
[email protected]
http://allmydata.org/cgi-bin/mailman/listinfo/tahoe-dev
