Re: [zfs-discuss] User quota design discussion..

2009-03-16 Thread Robert Milkowski
Hello Jorgen,

If you look at the list archives you will see that it made a huge
difference for some people, including me. Now I'm easily able to
saturate a GbE link while zfs send|recv'ing.


-- 
Best regards,
 Robert Milkowski
   http://milek.blogspot.com


Saturday, March 14, 2009, 1:06:40 PM, you wrote:

JL Sorry, did not mean it as a complaint; it just has been slow for us. But if
JL it has been made faster, that would be excellent. ZFS send is very powerful.

JL Lund


JL Robert Milkowski wrote:
 Hello Jorgen,
 
 Friday, March 13, 2009, 1:14:12 AM, you wrote:
 
 JL That is a good point, I had not even planned to support quotas for ZFS
 JL send, but consider a rescan to be the answer.  We don't ZFS send very 
 JL often as it is far too slow.
 
 Since build 105 it should be *MUCH* faster.
 
 






Re: [zfs-discuss] User quota design discussion..

2009-03-14 Thread Robert Milkowski
Hello Jorgen,

Friday, March 13, 2009, 1:14:12 AM, you wrote:

JL That is a good point, I had not even planned to support quotas for ZFS
JL send, but consider a rescan to be the answer.  We don't ZFS send very 
JL often as it is far too slow.

Since build 105 it should be *MUCH* faster.


-- 
Best regards,
 Robert Milkowski
   http://milek.blogspot.com



Re: [zfs-discuss] User quota design discussion..

2009-03-14 Thread Jorgen Lundman
Sorry, did not mean it as a complaint; it just has been slow for us. But if 
it has been made faster, that would be excellent. ZFS send is very powerful.


Lund


Robert Milkowski wrote:

Hello Jorgen,

Friday, March 13, 2009, 1:14:12 AM, you wrote:

JL That is a good point, I had not even planned to support quotas for ZFS
JL send, but consider a rescan to be the answer.  We don't ZFS send very 
JL often as it is far too slow.


 Since build 105 it should be *MUCH* faster.




--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Bob Friesenhahn

On Thu, 12 Mar 2009, Jorgen Lundman wrote:

User-land will then have a daemon, whether or not it is one daemon per 
file-system or really just one daemon does not matter. This process will open 
'/dev/quota' and empty the transaction log entries constantly. Take the 
uid,gid entries and update the byte-count in its database. How we store this 
database is up to us, but since it is in user-land it should have more 
flexibility, and is not as critical to be fast as it would have to be in 
kernel.


In order for this to work, ZFS data blocks need to somehow be 
associated with a POSIX user ID.  To start with, the ZFS POSIX layer 
is implemented on top of a non-POSIX layer which does not need to know 
about POSIX user IDs.  ZFS also supports snapshots and clones.


The support for snapshots, clones, and potentially non-POSIX data 
storage results in ZFS data blocks which are owned by multiple users 
at the same time, or multiple users over a period of time spanned by 
multiple snapshots.  If ZFS clones are modified, then files may have 
their ownership changed, while the unmodified data continues to be 
shared with other users.  If a cloned file has its ownership changed, 
then it would be quite tedious to figure out which blocks are now 
wholly owned by the new user, and which blocks are shared with other 
users.  By the time the analysis is complete, it will be wrong.


Before ZFS can apply per-user quota management, it is necessary to 
figure out how individual blocks can be charged to a user.  This seems 
to be a very complex issue, and common usage patterns won't work with your 
proposal.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Eric Schrock
Note that:

6501037 want user/group quotas on ZFS 

Is already committed to be fixed in build 113 (i.e. in the next month).

- Eric

On Thu, Mar 12, 2009 at 12:04:04PM +0900, Jorgen Lundman wrote:
 
 In the style of a discussion over a beverage, and talking about 
 user-quotas on ZFS, I recently pondered a design for implementing user 
 quotas on ZFS after having far too little sleep.
 
 It is probably nothing new, but I would be curious what you experts 
 think of the feasibility of implementing such a system and/or whether or 
 not it would even realistically work.
 
 I'm not suggesting that someone should do the work, or even that I will, 
 but rather in the interest of chatting about it.
 
 Feel free to ridicule me as required! :)
 
 Thoughts:
 
 Here at work we would like to have user quotas based on uid (and 
 presumably gid) to be able to fully replace the NetApps we run. Current 
 ZFS is not good enough for our situation. We simply cannot mount 
 500,000 file-systems on all the NFS clients. Nor do all servers we run 
 support mirror-mounts. Nor does the automounter see newly created directories 
 without a full remount.
 
 Current UFS-style-user-quotas are very exact. To the byte even. We do 
 not need this precision. If a user has 50MB of quota, and they are able 
 to reach 51MB usage, then that is acceptable to us. Especially since 
 they have to go under 50MB to be able to write new data, anyway.
 
 Instead of having complicated code in the kernel layer, slowing down the 
 file-system with locking and semaphores (and perhaps avoiding learning 
 in-depth ZFS code?), I was wondering if a simpler setup could be 
 designed that would still be acceptable. I will use the word 
 'acceptable' a lot. Sorry.
 
 My thoughts are that the ZFS file-system will simply write a 
 'transaction log' on a pipe. By transaction log I mean uid, gid and 
 'byte count changed'. And by pipe I don't necessarily mean pipe(2), but 
 it could be a fifo, pipe or socket. But currently I'm thinking 
 '/dev/quota' style.
 
 User-land will then have a daemon, whether or not it is one daemon per 
 file-system or really just one daemon does not matter. This process will 
 open '/dev/quota' and empty the transaction log entries constantly. Take 
 the uid,gid entries and update the byte-count in its database. How we 
 store this database is up to us, but since it is in user-land it should 
 have more flexibility, and is not as critical to be fast as it would 
 have to be in kernel.
 
 The daemon process can also grow in number of threads as demand increases.
 
 Once a user's quota reaches the limit (note here that /the/ call to 
 write() that goes over the limit will succeed, and probably a couple 
 more after. This is acceptable) the process will blacklist the uid in 
 kernel. Future calls to creat/open(CREAT)/write/(insert list of calls) 
 will be denied. Naturally calls to unlink/read etc should still succeed. 
 If the uid goes under the limit, the uid black-listing will be removed.
 
 If the user-land process crashes or dies, for whatever reason, the 
 buffer of the pipe will grow in the kernel. If the daemon is restarted 
 sufficiently quickly, all is well, it merely needs to catch up. If the 
 pipe does ever get full and items have to be discarded, a full-scan will 
 be required of the file-system. Since even with UFS quotas we need to 
 occasionally run 'quotacheck', it would seem this too, is acceptable (if 
 undesirable).
 
 If you have no daemon process running at all, you have no quotas at all. 
 But the same can be said about quite a few daemons. The administrators 
 need to adjust their usage.
 
 I can see a complication with doing a rescan. How could this be done 
 efficiently? I don't know if there is a neat way to make this happen 
 internally to ZFS, but from a user-land only point of view, perhaps a 
 snapshot could be created (synchronised with the /dev/quota pipe 
 reading?) and start a scan on the snapshot, while still processing 
 kernel log. Once the scan is complete, merge the two sets.
 
 Advantages are that only small hooks are required in ZFS: the byte 
 updates, and the blacklist with checks for being blacklisted.
 
 Disadvantages are a loss of precision, and possibly slower 
 rescans? Sanity?
 
 But I do not really know the internals of ZFS, so I might be completely 
 wrong, and everyone is laughing already.
 
 Discuss?
 
 Lund
 
 -- 
 Jorgen Lundman       | lund...@lundman.net
 Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
 Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
 Japan                | +81 (0)3 -3375-1767          (home)

--
Eric Schrock, Fishworkshttp://blogs.sun.com/eschrock

Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Blake
That is pretty freaking cool.

On Thu, Mar 12, 2009 at 11:38 AM, Eric Schrock eric.schr...@sun.com wrote:
 Note that:

 6501037 want user/group quotas on ZFS

 Is already committed to be fixed in build 113 (i.e. in the next month).

 - Eric

 On Thu, Mar 12, 2009 at 12:04:04PM +0900, Jorgen Lundman wrote:

 In the style of a discussion over a beverage, and talking about
 user-quotas on ZFS, I recently pondered a design for implementing user
 quotas on ZFS after having far too little sleep.

 It is probably nothing new, but I would be curious what you experts
 think of the feasibility of implementing such a system and/or whether or
 not it would even realistically work.

 I'm not suggesting that someone should do the work, or even that I will,
 but rather in the interest of chatting about it.

 Feel free to ridicule me as required! :)

 Thoughts:

 Here at work we would like to have user quotas based on uid (and
 presumably gid) to be able to fully replace the NetApps we run. Current
 ZFS is not good enough for our situation. We simply cannot mount
 500,000 file-systems on all the NFS clients. Nor do all servers we run
 support mirror-mounts. Nor does the automounter see newly created directories
 without a full remount.

 Current UFS-style-user-quotas are very exact. To the byte even. We do
 not need this precision. If a user has 50MB of quota, and they are able
 to reach 51MB usage, then that is acceptable to us. Especially since
 they have to go under 50MB to be able to write new data, anyway.

 Instead of having complicated code in the kernel layer, slowing down the
 file-system with locking and semaphores (and perhaps avoiding learning
 in-depth ZFS code?), I was wondering if a simpler setup could be
 designed that would still be acceptable. I will use the word
 'acceptable' a lot. Sorry.

 My thoughts are that the ZFS file-system will simply write a
 'transaction log' on a pipe. By transaction log I mean uid, gid and
 'byte count changed'. And by pipe I don't necessarily mean pipe(2), but
 it could be a fifo, pipe or socket. But currently I'm thinking
 '/dev/quota' style.

 User-land will then have a daemon, whether or not it is one daemon per
 file-system or really just one daemon does not matter. This process will
 open '/dev/quota' and empty the transaction log entries constantly. Take
 the uid,gid entries and update the byte-count in its database. How we
 store this database is up to us, but since it is in user-land it should
 have more flexibility, and is not as critical to be fast as it would
 have to be in kernel.

 The daemon process can also grow in number of threads as demand increases.

 Once a user's quota reaches the limit (note here that /the/ call to
 write() that goes over the limit will succeed, and probably a couple
 more after. This is acceptable) the process will blacklist the uid in
 kernel. Future calls to creat/open(CREAT)/write/(insert list of calls)
 will be denied. Naturally calls to unlink/read etc should still succeed.
 If the uid goes under the limit, the uid black-listing will be removed.

 If the user-land process crashes or dies, for whatever reason, the
 buffer of the pipe will grow in the kernel. If the daemon is restarted
 sufficiently quickly, all is well, it merely needs to catch up. If the
 pipe does ever get full and items have to be discarded, a full-scan will
 be required of the file-system. Since even with UFS quotas we need to
 occasionally run 'quotacheck', it would seem this too, is acceptable (if
 undesirable).

 If you have no daemon process running at all, you have no quotas at all.
 But the same can be said about quite a few daemons. The administrators
 need to adjust their usage.

 I can see a complication with doing a rescan. How could this be done
 efficiently? I don't know if there is a neat way to make this happen
 internally to ZFS, but from a user-land only point of view, perhaps a
 snapshot could be created (synchronised with the /dev/quota pipe
 reading?) and start a scan on the snapshot, while still processing
 kernel log. Once the scan is complete, merge the two sets.

 Advantages are that only small hooks are required in ZFS: the byte
 updates, and the blacklist with checks for being blacklisted.

 Disadvantages are a loss of precision, and possibly slower
 rescans? Sanity?

 But I do not really know the internals of ZFS, so I might be completely
 wrong, and everyone is laughing already.

 Discuss?

 Lund

 --
 Jorgen Lundman       | lund...@lundman.net
 Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
 Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
 Japan                | +81 (0)3 -3375-1767          (home)

 --
 Eric Schrock, Fishworks                        http://blogs.sun.com/eschrock

Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Matthew Ahrens

Jorgen Lundman wrote:


In the style of a discussion over a beverage, and talking about 
user-quotas on ZFS, I recently pondered a design for implementing user 
quotas on ZFS after having far too little sleep.


It is probably nothing new, but I would be curious what you experts 
think of the feasibility of implementing such a system and/or whether or 
not it would even realistically work.


I'm not suggesting that someone should do the work, or even that I will, 
but rather in the interest of chatting about it.


As it turns out, I'm working on zfs user quotas presently, and expect to 
integrate in about a month.  My implementation is in-kernel, integrated with 
the rest of ZFS, and does not have the drawbacks you mention below.



Feel free to ridicule me as required! :)

Thoughts:

Here at work we would like to have user quotas based on uid (and 
presumably gid) to be able to fully replace the NetApps we run. Current 
ZFS is not good enough for our situation. We simply cannot mount 
500,000 file-systems on all the NFS clients. Nor do all servers we run 
support mirror-mounts. Nor does the automounter see newly created directories 
without a full remount.


Current UFS-style-user-quotas are very exact. To the byte even. We do 
not need this precision. If a user has 50MB of quota, and they are able 
to reach 51MB usage, then that is acceptable to us. Especially since 
they have to go under 50MB to be able to write new data, anyway.


Good, that's the behavior that user quotas will have -- delayed enforcement.

Instead of having complicated code in the kernel layer, slowing down the 
file-system with locking and semaphores (and perhaps avoiding learning 
in-depth ZFS code?), I was wondering if a simpler setup could be 
designed that would still be acceptable. I will use the word 
'acceptable' a lot. Sorry.


My thoughts are that the ZFS file-system will simply write a 
'transaction log' on a pipe. By transaction log I mean uid, gid and 
'byte count changed'. And by pipe I don't necessarily mean pipe(2), but 
it could be a fifo, pipe or socket. But currently I'm thinking 
'/dev/quota' style.


User-land will then have a daemon, whether or not it is one daemon per 
file-system or really just one daemon does not matter. This process will 
open '/dev/quota' and empty the transaction log entries constantly. Take 
the uid,gid entries and update the byte-count in its database. How we 
store this database is up to us, but since it is in user-land it should 
have more flexibility, and is not as critical to be fast as it would 
have to be in kernel.


The daemon process can also grow in number of threads as demand increases.

Once a user's quota reaches the limit (note here that /the/ call to 
write() that goes over the limit will succeed, and probably a couple 
more after. This is acceptable) the process will blacklist the uid in 
kernel. Future calls to creat/open(CREAT)/write/(insert list of calls) 
will be denied. Naturally calls to unlink/read etc should still succeed. 
If the uid goes under the limit, the uid black-listing will be removed.


If the user-land process crashes or dies, for whatever reason, the 
buffer of the pipe will grow in the kernel. If the daemon is restarted 
sufficiently quickly, all is well, it merely needs to catch up. If the 
pipe does ever get full and items have to be discarded, a full-scan will 
be required of the file-system. Since even with UFS quotas we need to 
occasionally run 'quotacheck', it would seem this too, is acceptable (if 
undesirable).


My implementation does not have this drawback.  Note that you would need to 
use the recovery mechanism in the case of a system crash / power loss as 
well.  Adding potentially hours to the crash recovery time is not acceptable.


If you have no daemon process running at all, you have no quotas at all. 
But the same can be said about quite a few daemons. The administrators 
need to adjust their usage.


I can see a complication with doing a rescan. How could this be done 
efficiently? I don't know if there is a neat way to make this happen 
internally to ZFS, but from a user-land only point of view, perhaps a 
snapshot could be created (synchronised with the /dev/quota pipe 
reading?) and start a scan on the snapshot, while still processing 
kernel log. Once the scan is complete, merge the two sets.


Advantages are that only small hooks are required in ZFS: the byte 
updates, and the blacklist with checks for being blacklisted.


Disadvantages are a loss of precision, and possibly slower 
rescans? Sanity?


Not to mention that this information needs to get stored somewhere, and dealt 
with when you zfs send the fs to another system.


But I do not really know the internals of ZFS, so I might be completely 
wrong, and everyone is laughing already.


Discuss?


--matt

Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Tomas Ögren
On 12 March, 2009 - Matthew Ahrens sent me these 5,0K bytes:

 Jorgen Lundman wrote:

 In the style of a discussion over a beverage, and talking about  
 user-quotas on ZFS, I recently pondered a design for implementing user  
 quotas on ZFS after having far too little sleep.

 It is probably nothing new, but I would be curious what you experts  
 think of the feasibility of implementing such a system and/or whether 
 or not it would even realistically work.

 I'm not suggesting that someone should do the work, or even that I 
 will, but rather in the interest of chatting about it.

 As it turns out, I'm working on zfs user quotas presently, and expect to  
 integrate in about a month.  My implementation is in-kernel, integrated 
 with the rest of ZFS, and does not have the drawbacks you mention below.

Is there any chance of this getting into S10?

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Matthew Ahrens

Bob Friesenhahn wrote:

On Thu, 12 Mar 2009, Jorgen Lundman wrote:

User-land will then have a daemon, whether or not it is one daemon per 
file-system or really just one daemon does not matter. This process 
will open '/dev/quota' and empty the transaction log entries 
constantly. Take the uid,gid entries and update the byte-count in its 
database. How we store this database is up to us, but since it is in 
user-land it should have more flexibility, and is not as critical to 
be fast as it would have to be in kernel.


In order for this to work, ZFS data blocks need to somehow be associated 
with a POSIX user ID.  To start with, the ZFS POSIX layer is implemented 
on top of a non-POSIX layer which does not need to know about POSIX user 
IDs.  ZFS also supports snapshots and clones.


Yes, the DMU needs to communicate with the ZPL to determine the uid and gid to 
charge each file to.  This is done using a callback.
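
For illustration only, here is a rough sketch of that callback idea, using
invented names (none of these are real DMU/ZPL interfaces): the ZPL registers
a function that maps an object to the uid/gid it should be charged to, and
the DMU invokes it whenever referenced space changes.

#include <stdint.h>
#include <stddef.h>

typedef struct zpl_owner {
    uint64_t uid;
    uint64_t gid;
} zpl_owner_t;

/* Supplied by the ZPL: resolve the owner to charge for a given object. */
typedef int (*owner_cb_t)(void *fs_private, uint64_t object,
    zpl_owner_t *owner);

static owner_cb_t owner_cb;     /* registered once per filesystem type */
static void *owner_cb_arg;

void
register_owner_cb(owner_cb_t cb, void *arg)     /* hypothetical name */
{
    owner_cb = cb;
    owner_cb_arg = arg;
}

/*
 * Hypothetical DMU-side hook: called when an object's referenced space
 * changes by 'delta' bytes; charges the delta to the owning uid.
 */
void
charge_referenced_space(uint64_t object, int64_t delta, int64_t *used_by_uid)
{
    zpl_owner_t o;

    if (owner_cb == NULL || owner_cb(owner_cb_arg, object, &o) != 0)
        return;                 /* non-POSIX dataset: nothing to charge */

    used_by_uid[o.uid] += delta;    /* simplified per-uid accumulator */
}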


The support for snapshots, clones, and potentially non-POSIX data 
storage results in ZFS data blocks which are owned by multiple users at 
the same time, or multiple users over a period of time spanned by 
multiple snapshots.  If ZFS clones are modified, then files may have 
their ownership changed, while the unmodified data continues to be 
shared with other users.  If a cloned file has its ownership changed, 
then it would be quite tedious to figure out which blocks are now 
wholly owned by the new user, and which blocks are shared with other 
users.  By the time the analysis is complete, it will be wrong.


Before ZFS can apply per-user quota management, it is necessary to 
figure out how individual blocks can be charged to a user.  This seems 
to be a very complex issue, and common usage patterns won't work with your proposal.


Indeed.  We have decided to charge for referenced space.  This is the same 
concept used by the referenced, refquota, and refreservation 
properties, and reported by stat(2) in st_blocks, and du(1) on files today.


This makes the issue much simpler.  We don't need to worry about blocks being 
shared between clones or snapshots, because we charge for every time a block 
is referenced.  When a clone is created, it starts with the same user 
accounting information as its origin snapshot.
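
To make "referenced space" concrete: st_blocks from stat(2) (512-byte units
on Solaris) is the same quantity du(1) reports, and it is what a per-user
charge would sum over the files a uid owns. A small, plain-POSIX
illustration, nothing ZFS-specific:

#include <stdio.h>
#include <sys/stat.h>

int
main(int argc, char **argv)
{
    struct stat st;
    long long bytes;

    if (argc != 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    if (stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }

    /* Referenced space: blocks actually allocated, not the logical size. */
    bytes = (long long)st.st_blocks * 512;
    printf("uid %ld would be charged %lld bytes for %s\n",
        (long)st.st_uid, bytes, argv[1]);
    return 0;
}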


--matt


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Jorgen Lundman



Bob Friesenhahn wrote:
In order for this to work, ZFS data blocks need to somehow be associated 
with a POSIX user ID.  To start with, the ZFS POSIX layer is implemented 
on top of a non-POSIX Layer which does not need to know about POSIX user 
IDs.  ZFS also supports snapshots and clones.


This I did not know, but now that you point it out, this would be the 
right way to design it. So the advantage of requiring less ZFS 
integration is no longer the case.


Lund

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Jorgen Lundman



Eric Schrock wrote:

Note that:

6501037 want user/group quotas on ZFS 


Is already committed to be fixed in build 113 (i.e. in the next month).

- Eric


Wow, that would be fantastic. We have the Sun vendors camped out at the 
data center trying to apply fresh patches. I believe 6798540 fixed the 
largest issue but it would be desirable to be able to use just ZFS.


Is this a project needing donations? I see your address is at Sun.com, 
and we already have 9 x4500s, but maybe you need some pocky, asse, 
collon or pocari sweat...



Lundy


[1]
BugID: 6798540
 3-way deadlock happens in ufs filesystem on zvol when writing ufs log

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Jorgen Lundman




As it turns out, I'm working on zfs user quotas presently, and expect to 
integrate in about a month.  My implementation is in-kernel, integrated 
with the rest of ZFS, and does not have the drawbacks you mention below.


I merely suggested my design as it may have been something I _could_ 
have implemented, as it required little ZFS knowledge. (Adding hooks is 
usually easier). But naturally that has already been shown not to be 
the case.


A proper implementation is always going to be much more desirable :)





Good, that's the behavior that user quotas will have -- delayed 
enforcement.


There probably are situations where precision is required, or perhaps 
historical reasons, but for us a delayed enforcement may even be better.


Perhaps it is better for the delivery of an email message that 
goes over the quota to be allowed to complete writing the entire 
message, than it is to abort a write() call somewhere in the middle and 
return failures all the way back to generating a bounce message. Maybe... 
can't say I have thought about it.




My implementation does not have this drawback.  Note that you would need 
to use the recovery mechanism in the case of a system crash / power loss 
as well.  Adding potentially hours to the crash recovery time is not 
acceptable.


 Great! Will there be any particular limits on how many uids, or on the size 
 of uids, in your implementation? UFS generally does not, but I did note that 
 if uids go over 1000 it flips out and changes the quotas file to 
 128GB in size.
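
(For illustration only, assuming the classic uid-indexed quotas file layout,
where the record for uid N lives at byte offset N times the record size; the
32-byte record size below is an assumption, not checked against the UFS
headers:)

#include <stdio.h>
#include <stdint.h>

#define DQBLK_SIZE 32ULL    /* assumed per-uid record size, bytes */

int
main(void)
{
    uint64_t uid = 4000000000ULL;   /* an example very large uid */
    uint64_t offset = uid * DQBLK_SIZE;

    /* The quotas file must logically extend at least this far. */
    printf("record for uid %llu sits at offset %llu bytes (~%llu GB)\n",
        (unsigned long long)uid, (unsigned long long)offset,
        (unsigned long long)(offset / 1000000000ULL));
    return 0;
}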



Not to mention that this information needs to get stored somewhere, and 
dealt with when you zfs send the fs to another system.


That is a good point, I had not even planned to support quotas for ZFS 
send, but consider a rescan to be the answer.  We don't ZFS send very 
often as it is far too slow.


Lund

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)


Re: [zfs-discuss] User quota design discussion..

2009-03-12 Thread Matthew Ahrens

Jorgen Lundman wrote:
Great! Will there be any particular limits on how many uids, or on the size 
of uids, in your implementation? UFS generally does not, but I did note that 
if uids go over 1000 it flips out and changes the quotas file to 
128GB in size.


All UIDs, as well as SIDs (from the SMB server), are permitted.  Any number 
of users and quotas are permitted, and handled efficiently.  Note, UID on 
Solaris is a 31-bit number.


--matt


[zfs-discuss] User quota design discussion..

2009-03-11 Thread Jorgen Lundman


In the style of a discussion over a beverage, and talking about 
user-quotas on ZFS, I recently pondered a design for implementing user 
quotas on ZFS after having far too little sleep.


It is probably nothing new, but I would be curious what you experts 
think of the feasibility of implementing such a system and/or whether or 
not it would even realistically work.


I'm not suggesting that someone should do the work, or even that I will, 
but rather in the interest of chatting about it.


Feel free to ridicule me as required! :)

Thoughts:

Here at work we would like to have user quotas based on uid (and 
presumably gid) to be able to fully replace the NetApps we run. Current 
 ZFS is not good enough for our situation. We simply cannot mount 
500,000 file-systems on all the NFS clients. Nor do all servers we run 
 support mirror-mounts. Nor does the automounter see newly created directories 
without a full remount.


Current UFS-style-user-quotas are very exact. To the byte even. We do 
not need this precision. If a user has 50MB of quota, and they are able 
to reach 51MB usage, then that is acceptable to us. Especially since 
they have to go under 50MB to be able to write new data, anyway.


Instead of having complicated code in the kernel layer, slowing down the 
file-system with locking and semaphores (and perhaps avoiding learning 
 in-depth ZFS code?), I was wondering if a simpler setup could be 
 designed that would still be acceptable. I will use the word 
'acceptable' a lot. Sorry.


My thoughts are that the ZFS file-system will simply write a 
'transaction log' on a pipe. By transaction log I mean uid, gid and 
'byte count changed'. And by pipe I don't necessarily mean pipe(2), but 
it could be a fifo, pipe or socket. But currently I'm thinking 
'/dev/quota' style.
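
To make that concrete, a hypothetical record format for such a '/dev/quota'
log might look like the sketch below (nothing like this exists in ZFS; the
names are invented):

#include <stdint.h>

typedef struct quota_rec {
    uint32_t qr_uid;      /* owner being charged */
    uint32_t qr_gid;      /* group being charged */
    int64_t  qr_delta;    /* bytes added (+) or freed (-) */
} quota_rec_t;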


User-land will then have a daemon, whether or not it is one daemon per 
file-system or really just one daemon does not matter. This process will 
open '/dev/quota' and empty the transaction log entries constantly. Take 
the uid,gid entries and update the byte-count in its database. How we 
store this database is up to us, but since it is in user-land it should 
have more flexibility, and is not as critical to be fast as it would 
have to be in kernel.


The daemon process can also grow in number of threads as demand increases.
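
A minimal user-land sketch, assuming the hypothetical quota_rec format above
and a '/dev/quota' device that hands back whole records from read(2):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_UID 65536        /* toy table; a real daemon would use a map */

typedef struct quota_rec {
    uint32_t qr_uid;
    uint32_t qr_gid;
    int64_t  qr_delta;
} quota_rec_t;

int
main(void)
{
    static int64_t used[MAX_UID];   /* per-uid byte counts */
    quota_rec_t rec;
    int fd = open("/dev/quota", O_RDONLY);   /* hypothetical device */

    if (fd < 0) {
        perror("open /dev/quota");
        return 1;
    }

    /* Drain the kernel log and keep the per-uid byte counts current. */
    while (read(fd, &rec, sizeof (rec)) == (ssize_t)sizeof (rec)) {
        if (rec.qr_uid < MAX_UID)
            used[rec.qr_uid] += rec.qr_delta;
        /* here: compare against the limit and (un)blacklist the uid */
    }

    close(fd);
    return 0;
}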

Once a user's quota reaches the limit (note here that /the/ call to 
write() that goes over the limit will succeed, and probably a couple 
more after. This is acceptable) the process will blacklist the uid in 
kernel. Future calls to creat/open(CREAT)/write/(insert list of calls) 
will be denied. Naturally calls to unlink/read etc should still succeed. 
If the uid goes under the limit, the uid black-listing will be removed.
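
And a sketch of the proposed in-kernel check (again hypothetical; ZFS has no
such hook) that creat/open(O_CREAT)/write would consult before allocating new
space, so enforcement is delayed by however long the daemon takes to notice
the overrun:

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define BLACKLIST_MAX 1024

static uint32_t blacklisted[BLACKLIST_MAX];   /* uids currently over quota */
static int nblacklisted;

static bool
uid_is_blacklisted(uint32_t uid)
{
    for (int i = 0; i < nblacklisted; i++)
        if (blacklisted[i] == uid)
            return true;
    return false;
}

/* Hook on the write path: refuse new data for blacklisted uids only. */
int
quota_check_write(uint32_t uid)
{
    if (uid_is_blacklisted(uid))
        return EDQUOT;    /* over quota: deny creat/write */
    return 0;             /* unlink/read are never checked */
}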


If the user-land process crashes or dies, for whatever reason, the 
buffer of the pipe will grow in the kernel. If the daemon is restarted 
sufficiently quickly, all is well, it merely needs to catch up. If the 
pipe does ever get full and items have to be discarded, a full-scan will 
be required of the file-system. Since even with UFS quotas we need to 
occasionally run 'quotacheck', it would seem this too, is acceptable (if 
undesirable).


If you have no daemon process running at all, you have no quotas at all. 
But the same can be said about quite a few daemons. The administrators 
need to adjust their usage.


I can see a complication with doing a rescan. How could this be done 
efficiently? I don't know if there is a neat way to make this happen 
internally to ZFS, but from a user-land only point of view, perhaps a 
snapshot could be created (synchronised with the /dev/quota pipe 
reading?) and start a scan on the snapshot, while still processing 
kernel log. Once the scan is complete, merge the two sets.
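
The merge itself would be simple, roughly: per-uid totals from the snapshot
scan, plus whatever deltas arrived on the log after the snapshot was taken.

#include <stdint.h>

#define MAX_UID 65536

/* usage = totals from the snapshot scan + deltas logged since the snapshot */
void
merge_rescan(const int64_t scan[MAX_UID], const int64_t since[MAX_UID],
    int64_t usage[MAX_UID])
{
    for (uint32_t uid = 0; uid < MAX_UID; uid++)
        usage[uid] = scan[uid] + since[uid];
}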


 Advantages are that only small hooks are required in ZFS: the byte 
updates, and the blacklist with checks for being blacklisted.


 Disadvantages are a loss of precision, and possibly slower 
rescans? Sanity?


But I do not really know the internals of ZFS, so I might be completely 
wrong, and everyone is laughing already.


Discuss?

Lund

--
Jorgen Lundman       | lund...@lundman.net
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)