Note that:

6501037 want user/group quotas on ZFS 

is already committed to be fixed in build 113 (i.e., within the next month).

- Eric

On Thu, Mar 12, 2009 at 12:04:04PM +0900, Jorgen Lundman wrote:
> 
> In the style of a discussion over a beverage, I recently pondered a 
> design for implementing user quotas on ZFS after having far too little 
> sleep.
> 
> It is probably nothing new, but I would be curious what you experts 
> think of the feasibility of implementing such a system, and whether it 
> would even realistically work.
> 
> I'm not suggesting that someone should do the work, or even that I will, 
> but rather in the interest of chatting about it.
> 
> Feel free to ridicule me as required! :)
> 
> Thoughts:
> 
> Here at work we would like to have user quotas based on uid (and 
> presumably gid) to be able to fully replace the NetApps we run. Current 
> ZFS is not good enough for our situation. We simply cannot mount 
> 500,000 file-systems on all the NFS clients. Nor do all the servers we 
> run support mirror-mounts. Nor does the automounter see newly created 
> directories without a full remount.
> 
> Current UFS-style-user-quotas are very exact. To the byte even. We do 
> not need this precision. If a user has 50MB of quota, and they are able 
> to reach 51MB usage, then that is acceptable to us. Especially since 
> they have to go under 50MB to be able to write new data, anyway.
> 
> Instead of having complicated code in the kernel layer, slowing down the 
> file-system with locking and semaphores (and perhaps avoiding learning 
> in-depth ZFS code?), I was wondering whether a more simplistic setup 
> could be designed that would still be acceptable. I will use the word 
> 'acceptable' a lot. Sorry.
> 
> My thoughts are that the ZFS file-system will simply write a 
> 'transaction log' on a pipe. By transaction log I mean uid, gid and 
> 'byte count changed'. And by pipe I don't necessarily mean pipe(2), but 
> it could be a fifo, pipe or socket. But currently I'm thinking 
> '/dev/quota' style.
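A minimal sketch of what one such transaction-log record might look like on the wire (the field layout, sizes, and names here are my own assumptions for illustration, not an existing ZFS or Solaris interface):

```python
import struct

# Hypothetical fixed-size record for the '/dev/quota' stream:
# uid (u32), gid (u32), signed byte-count delta (s64) -- 16 bytes.
QUOTA_REC = struct.Struct("=IIq")

def pack_record(uid, gid, delta):
    """Encode one (uid, gid, delta) transaction-log entry."""
    return QUOTA_REC.pack(uid, gid, delta)

def unpack_record(buf):
    """Decode a 16-byte record back into (uid, gid, delta)."""
    return QUOTA_REC.unpack(buf)
```

A signed delta lets the same record type cover both writes and frees (unlink, truncate).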
> 
> User-land will then have a daemon; whether it is one daemon per 
> file-system or just one daemon overall does not matter. This process 
> will open '/dev/quota' and continuously drain the transaction log 
> entries, taking the uid,gid entries and updating the byte-count in its 
> database. How we store this database is up to us, but since it is in 
> user-land it has more flexibility, and speed is not as critical as it 
> would be in the kernel.
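The daemon's core accounting loop could be as simple as the following sketch, with the device read stubbed out as an iterable of decoded records (all names are hypothetical):

```python
from collections import defaultdict

class QuotaAccountant:
    """Accumulates per-uid byte counts from drained log entries."""

    def __init__(self):
        self.usage = defaultdict(int)   # uid -> current byte count

    def apply(self, records):
        # records: iterable of (uid, gid, delta) drained from '/dev/quota'
        for uid, _gid, delta in records:
            self.usage[uid] += delta
```

A real daemon would block on the device, batch updates, and periodically checkpoint the counts to its on-disk database.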
> 
> The daemon process can also grow in number of threads as demand increases.
> 
> Once a user's quota reaches the limit (note here that /the/ call to 
> write() that goes over the limit will succeed, and probably a couple 
> more after it; this is acceptable), the process will "blacklist" the 
> uid in the kernel. Future calls to creat/open(O_CREAT)/write/(insert 
> list of calls) will be denied. Naturally, calls to unlink/read etc. 
> should still succeed. If the usage goes back under the limit, the uid 
> black-listing will be removed.
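The user-land side of that blacklist bookkeeping might look like this (a sketch with example data; the kernel-side denial of creat/write, presumably returning something like EDQUOT, is assumed and not shown):

```python
limits = {1000: 50 * 1024 * 1024}   # uid -> quota in bytes (example data)
blacklist = set()                    # uids currently denied create/write

def update_blacklist(uid, usage_bytes):
    """Add or remove a uid from the blacklist as usage crosses its limit."""
    limit = limits.get(uid)
    if limit is None:
        return                       # no quota configured for this uid
    if usage_bytes > limit:
        blacklist.add(uid)           # future creat/open(O_CREAT)/write denied
    else:
        blacklist.discard(uid)       # back under the limit: lift the ban
```

Because the daemon lags the file-system slightly, a uid can overshoot its limit by a few writes before landing on the blacklist, which matches the stated "acceptable" imprecision.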
> 
> If the user-land process crashes or dies, for whatever reason, the 
> buffer of the pipe will grow in the kernel. If the daemon is restarted 
> sufficiently quickly, all is well; it merely needs to catch up. If the 
> pipe does ever get full and entries have to be discarded, a full scan 
> of the file-system will be required. Since even with UFS quotas we 
> occasionally need to run 'quotacheck', it would seem this, too, is 
> acceptable (if undesirable).
> 
> If you have no daemon process running at all, you have no quotas at all. 
> But the same can be said about quite a few daemons. The administrators 
> need to adjust their usage.
> 
> I can see a complication with doing a rescan. How could this be done 
> efficiently? I don't know if there is a neat way to make this happen 
> internally to ZFS, but from a user-land-only point of view, perhaps a 
> snapshot could be created (synchronised with the /dev/quota pipe 
> reading?) and a scan started on the snapshot, while still processing 
> the kernel log. Once the scan is complete, merge the two sets.
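That final merge step could be sketched as: take the per-uid totals from walking the snapshot, then fold in the net deltas logged on the pipe since the snapshot was taken (function and parameter names are my own):

```python
def merge_counts(scan_totals, log_deltas):
    """Combine a snapshot scan with deltas logged after the snapshot.

    scan_totals: uid -> bytes found by walking the snapshot
    log_deltas:  uid -> net byte change seen on the pipe since the snapshot
    """
    merged = dict(scan_totals)
    for uid, delta in log_deltas.items():
        merged[uid] = merged.get(uid, 0) + delta
    return merged
```

The correctness of this hinges on the synchronisation point the paragraph mentions: every log entry must be attributable to either before or after the snapshot, never both.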
> 
> Advantages are that only small hooks are required in ZFS. The byte 
> updates, and the blacklist with checks for being blacklisted.
> 
> Disadvantages are the loss of precision, and possibly slower 
> rescans? Sanity?
> 
> But I do not really know the internals of ZFS, so I might be completely 
> wrong, and everyone is laughing already.
> 
> Discuss?
> 
> Lund
> 
> -- 
> Jorgen Lundman       | <lund...@lundman.net>
> Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
> Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
> Japan                | +81 (0)3 -3375-1767          (home)
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
Eric Schrock, Fishworks                        http://blogs.sun.com/eschrock
