Re: dtrace ioctls
On Wed, Oct 19, 2011 at 10:22:08PM +, David Holland wrote:
> On Wed, Oct 19, 2011 at 10:01:33PM +0100, David Laight wrote:
> > Hmmm... the sun code is passing the structure by value
>
> Is it? The non-sun code appears to be calling an ioctl that's defined to take a pointer to a pointer to a structure. Or maybe I'm totally misreading ioccom.h?

Maybe I was asleep

David

-- 
David Laight: da...@l8s.co.uk
Re: fs-independent quotas
On Wed, Oct 19, 2011 at 06:09:27PM +, David Holland wrote:
> support to other filesystems (tmpfs, perhaps v7fs) or even add other filesystems that have or may have their own native quota handling (zfs, Hammer, you name it).

zfs - does it really have quota? All the demos I've seen talk about sub-filesystem limits; you create per-user sub-filesystems if you want to emulate per-user quota. (Correct me if I'm wrong.)

How would this fit in, if at all?

-is
Re: fs-independent quotas
On Wed, Oct 19, 2011 at 10:20:23PM +, David Holland wrote: On Wed, Oct 19, 2011 at 09:22:02PM +0200, Manuel Bouyer wrote: So, a few months back we got a new improved quota format for FFS. Unfortunately, one of the side effects of this was to sprinkle specific knowledge of the new format through all the userlevel quota tools and quota support logic. To be fair, this was alongside the existing specific knowledge of the old quota format; nonetheless, it's messy and unscalable. of course there's been changes to the tools, as there's a new format. The tools ought to be format-independent. I can't parse this, can you explain ? The tools needs to be aware of the format to do something usefull with the data, isn't it ? We may want to add more quota formats (e.g. the different and incompatible new quota format FreeBSD added last year) or add quota support to other filesystems (tempfs, perhaps v7fs) or even add other filesystems that have or may have their own native quota handling (zfs, Hammer, you name it). Also, my planned lfs-renovation is currently hung up on the VFS-level quota interface, because I don't want to rip out the existing maybe-partial support for quotas but can't plug new code into the existing framework. You'll have to explain this. lfs is some variant of ffs, I see no reasons why it coudln't use the new format. It could use whatever format it wants. To the extent it currently supports quotas, I think it's limited to the old-style quotas, that is, quota1. But there's no way to plug it in without taking the fs-dependent code currently in all the tools and access pathway and making a third or perhaps a third and fourth copy of all the logic. that's plain wrong. If it's quota1 you can use the quota1 code in sys/ufs/ufs (just as it would have done before quota2). 
Likewise, if I were to go add quota support to v7fs, or try to hook up whatever quota support zfs has, or commit Hammer and try to get whatever quota support *it* has working, or add ext2 quota support, or write a new fs with quota support, or whatever, I'd have to make still more copies of the logic to cope with all the different formats and layouts. Of course if you have new on-disk format you need to do some conversion, whatever filesystem independant format you use. But I think you could still reuse sys/ufs/ufs/quota2_subr.c to do the convertion from plist to some binary representation. This is not a good idea, not scalable, and not sensible, especially when a filesystem-independent (read format-independent if you like) interface is both perfectly possible and simpler. I strongly believe the plist representation is format-independent. It has exactly the same informations as what you propose. in fact the new format is fs-independant. Yes, in the sense that one could add the format to other file systems; but no, in the sense that other file systems already have their own quota formats and we need to be able to interoperate. You have to do some convertion, of the same level as with what you propose. But this is just what the current propib format is ! a set of tables with key/values pair ! That's great, that'll make the changes I need to make that much easier. But it doesn't seem particularly familiar relative to the code I've been working on. Or maybe you don't need to change it at all. the quota *type* - the quota value is: the configured hard limit the configured soft limit the configured grace period the current usage the current grace expiry time (if any) This is exactly the format described in quotactl(2). No, what's described in quotactl(2) is something about commands and arguments... and while there is a substructure that looks something like this, the fact remains that it's a *sub*structure Yes, but you still need a way to pass commands. 
You didn't talk about this. and the schema is not tabular. I don't understand what you mean here. There's a set of values associated with an id; I can't see the difference from what you're proposing. The quota *class* is the thing the quota is imposed on; this is currently either user or group. There is no likely prospect of additional quota classes appearing. I don't think we should limit ourselves to these classes. I could see per-host or per-hostgroup quotas for networked filesystems for example. I'm not limiting it to anything, but I'll believe in more quota classes when I see them. Per-host quotas (even if they make sense, which I question) aren't going to work very well with a 32-bit id, for example. Right, that's where a plist is a win. Whereas, as I pointed out before, there are filesystems in the field with more than two quota types. The current format has no limitations in this area. class idtypehard
proposed additions to sys/conf/std
I propose adding pseudo-device drvctl and/or options BUFQ_PRIOCSCAN to src/sys/conf/std.

The reasons I even bring this up:

- Many kernels are missing drvctl and thus do not support disk wedges (this is arguably due to a flaw in the design of disk wedges, but that's another bikeshed).
- BUFQ_PRIOCSCAN is superior to BUFQ_DISKSORT, and in fact BUFQ_DISKSORT is actually inferior to BUFQ_FCFS in terms of interactive disk I/O responsiveness. There are many kernels that default to BUFQ_DISKSORT due to not explicitly adding BUFQ_PRIOCSCAN.

The ominous "# it's commonly used is NOT a good reason to enable options here." line has me a bit apprehensive. However, pseudo-device cpuctl is there already. There are some options that are there for historical reasons, so this is sort of a slippery slope.

Do we need a new config file for standard-but-optional-options?

Jonathan Kollasch
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 06:43:47AM +0200, Emmanuel Dreyfus wrote:
> > It seems to me that quotas are fundamentally a special-purpose key/value store; that is, you look up quota information for a particular thing (the key) and get back the quota settings and current usage information (the value).
>
> If you are going to add a generic key/value store mechanism for all filesystems, you can consider fs-independent extended attributes as well.

I am not adding a generic key/value store mechanism. I am representing the quota data as a specific key/value store. A generic key/value store mechanism for all filesystems would be a very large, messy, and semantically nebulous project...

-- 
David A. Holland
dholl...@netbsd.org
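For readers following along, here is a minimal sketch of what "quota data as a specific key/value store" could look like as plain C structs. The value fields follow the list given elsewhere in this thread (hard limit, soft limit, grace period, current usage, grace expiry); every identifier below (quota_key, quota_val, quota_get, the QK_* constants) is hypothetical and not taken from any NetBSD source, and the linear lookup merely stands in for whatever a filesystem would really do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical, for illustration only. */
enum qk_class { QK_USER, QK_GROUP };	/* what the quota is imposed on */
enum qk_type { QK_BLOCKS, QK_FILES };	/* assumed set of quota types */

/* The key: which quota are we asking about? */
struct quota_key {
	enum qk_class kclass;
	uint32_t id;		/* uid or gid */
	enum qk_type type;
};

/* The value: configured limits plus current state. */
struct quota_val {
	uint64_t hardlimit;
	uint64_t softlimit;
	int64_t grace;		/* configured grace period, in seconds */
	uint64_t usage;
	int64_t expiry;		/* current grace expiry time, 0 if none */
};

struct quota_rec {
	struct quota_key key;
	struct quota_val val;
};

/* Single get: return the value stored for key k, or NULL if absent. */
const struct quota_val *
quota_get(const struct quota_rec *tab, size_t n, const struct quota_key *k)
{
	assert(k != NULL);
	for (size_t i = 0; i < n; i++) {
		if (tab[i].key.kclass == k->kclass &&
		    tab[i].key.id == k->id &&
		    tab[i].key.type == k->type)
			return &tab[i].val;
	}
	return NULL;
}
```

The point of the sketch is only that the key and the value are small fixed-size records, independent of how any particular filesystem lays them out on disk.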
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 11:57:04AM +0200, Ignatios Souvatzis wrote:
> > support to other filesystems (tmpfs, perhaps v7fs) or even add other filesystems that have or may have their own native quota handling (zfs, Hammer, you name it).
>
> zfs - does it really have quota?

I don't know... but if not, there are plenty of other fses.

> All the demos I've seen talk about sub-filesystem limits; you create per-user sub-filesystems if you want to emulate per-user quota. (Correct me if I'm wrong.)
>
> How would this fit in, if at all?

That's a good question. My first instinct is that like the other stuff zfs does that it does in its own semantically-incompatible way, it would require its own tools. But I guess the quota system could be made to report the limits if the sub-filesystems are specifically assigned to users somehow. Or something like that...

-- 
David A. Holland
dholl...@netbsd.org
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 05:23:14PM +0200, Manuel Bouyer wrote:
> > That's way more complicated than necessary. Think of it as like VOP_READDIR - you get passed a position, you send back some number of items, and update the position.
>
> Depending on how the data are stored on disk, the notion of position (which also implies some ordering) can be difficult to handle, especially if the data we're reading can change between two calls, causing the position to become invalid.

...yes, but this is just one of those things you have to cope with when doing filesystems. It's no different from readdir in that regard.

> It's certainly less trouble to send back to userland the whole set of data - especially if what userland wants is the whole set of data (I can't see what a partial read of quota would be useful for).

No, no it really isn't. Suppose there are, say, 50,000 users, so to send back the whole works you have to accumulate 100,000 quota entries in a gigantic blob... a machine with 50,000 users will have enough RAM for this but that doesn't mean that allocating a contiguous chunk of kernel memory that large is easy or desirable. Far better to read it out a couple hundred at a time.

There are two design truisms for database stuff that apply here: first, you always end up wanting cursors, and second, you always end up wanting bulk get (and not just single get) from those cursors. So it's usually a good idea to anticipate this and design it all in up front.

> > The reason to wrap the position in a cursor abstraction is to allow flexibility about how the position is represented.
>
> But then the cursor would still be stored in userland?

That's the idea, like reading a file with pread(). I think the kernel should know, or at least be able to know, how many cursors are currently open; but I don't think there's any need to keep the cursor state itself in the kernel.

-- 
David A. Holland
dholl...@netbsd.org
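To make the readdir-style "pass a position, get back some items, update the position" idea concrete, here is a rough sketch of a cursor with bulk get. It is not the API proposed in this thread: qentry, qstore, qcursor, qcursor_open and qcursor_getn are all made-up names, the store is just an in-memory array, and the cursor is a bare index that the caller holds (the userland-held state discussed above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one quota entry as handed to the caller. */
struct qentry {
	uint32_t id;
	uint64_t hardlimit;
	uint64_t softlimit;
	uint64_t usage;
};

/* Stand-in for a filesystem's quota store (illustration only). */
struct qstore {
	const struct qentry *entries;
	size_t count;
};

/* Hypothetical cursor: meant to be opaque to callers; here just an index. */
struct qcursor {
	size_t pos;
};

void
qcursor_open(struct qcursor *c)
{
	c->pos = 0;
}

/*
 * Bulk get: copy up to max entries starting at the cursor into buf,
 * advance the cursor, and return how many were copied.
 * A return value of 0 means the end has been reached.
 */
size_t
qcursor_getn(const struct qstore *s, struct qcursor *c,
    struct qentry *buf, size_t max)
{
	size_t n = 0;

	assert(c->pos <= s->count);
	while (n < max && c->pos < s->count)
		buf[n++] = s->entries[c->pos++];
	return n;
}
```

A caller would loop on qcursor_getn() with a buffer of a couple hundred entries until it returns 0, so neither side ever has to hold the whole table in one contiguous allocation.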
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 03:47:26PM +, David Holland wrote:
> On Thu, Oct 20, 2011 at 05:23:14PM +0200, Manuel Bouyer wrote:
> > > That's way more complicated than necessary. Think of it as like VOP_READDIR - you get passed a position, you send back some number of items, and update the position.
> >
> > Depending on how the data are stored on disk, the notion of position (which also implies some ordering) can be difficult to handle, especially if the data we're reading can change between two calls, causing the position to become invalid.
>
> ...yes, but this is just one of those things you have to cope with when doing filesystems. It's no different from readdir in that regard.
>
> > It's certainly less trouble to send back to userland the whole set of data - especially if what userland wants is the whole set of data (I can't see what a partial read of quota would be useful for).
>
> No, no it really isn't. Suppose there are, say, 50,000 users, so to send back the whole works you have to accumulate 100,000 quota entries in a gigantic blob... a machine with 50,000 users will have enough RAM for this but that doesn't mean that allocating a contiguous chunk of kernel memory that large is easy or desirable. Far better to read it out a couple hundred at a time.

We're talking about a few MB of RAM here, aren't we? The kernel can certainly allocate this without trouble (other subsystems do).

> There are two design truisms for database stuff that apply here: first, you always end up wanting cursors, and second, you always end up wanting bulk get (and not just single get) from those cursors. So it's usually a good idea to anticipate this and design it all in up front.

Maybe ... I know that in the end I want the whole set of data and not just a part of it. But if you believe it's needed this can easily be added to the existing quotactl(2) (it would just be a new command).

> > > The reason to wrap the position in a cursor abstraction is to allow flexibility about how the position is represented.
> >
> > But then the cursor would still be stored in userland?
>
> That's the idea, like reading a file with pread(). I think the kernel should know, or at least be able to know, how many cursors are currently open; but I don't think there's any need to keep the cursor state itself in the kernel.

So you want a quotaopen/quotaclose, with a file descriptor (or something similar)?

-- 
Manuel Bouyer bou...@antioche.eu.org
     NetBSD: 26 years of experience will always make the difference
--
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 06:00:28PM +0200, Manuel Bouyer wrote:
> > > It's certainly less trouble to send back to userland the whole set of data - especially if what userland wants is the whole set of data (I can't see what a partial read of quota would be useful for).
> >
> > No, no it really isn't. Suppose there are, say, 50,000 users, so to send back the whole works you have to accumulate 100,000 quota entries in a gigantic blob... a machine with 50,000 users will have enough RAM for this but that doesn't mean that allocating a contiguous chunk of kernel memory that large is easy or desirable. Far better to read it out a couple hundred at a time.
>
> We're talking about a few MB of RAM here, aren't we? The kernel can certainly allocate this without trouble (other subsystems do).

The proplib'd and XMLified complete dump for 50,000 users will probably make a blob of between 10 and 20 MB. (Note: this is an estimate; I haven't checked the size by trying it. It might be larger. I'd be surprised if it were much smaller.)

I don't see why it's desirable to manifest such large objects when it's easily avoidable.

> > There are two design truisms for database stuff that apply here: first, you always end up wanting cursors, and second, you always end up wanting bulk get (and not just single get) from those cursors. So it's usually a good idea to anticipate this and design it all in up front.
>
> Maybe ... I know that in the end I want the whole set of data and not just a part of it.

Yes, probably. The cursor API I've floated so far is not general enough to support much else. Although it could be made more general.

> But if you believe it's needed this can easily be added to the existing quotactl(2) (it would just be a new command).

Yes, perhaps it could... but why? What's to be gained by using a baroque proplib encoding of what can otherwise be handled as an array of simple structs?

I remember asking this question when you first proposed the proplib interface last spring, and never really got a clear answer.

> > > > The reason to wrap the position in a cursor abstraction is to allow flexibility about how the position is represented.
> > >
> > > But then the cursor would still be stored in userland?
> >
> > That's the idea, like reading a file with pread(). I think the kernel should know, or at least be able to know, how many cursors are currently open; but I don't think there's any need to keep the cursor state itself in the kernel.
>
> So you want a quotaopen/quotaclose, with a file descriptor (or something similar)?

The proposed API already has explicit open and close for cursors; what I'm saying is that this should be exposed to the kernel. (Open already has to be, to initialize the cursor position; close should be, so the filesystem can if necessary know if there are cursors open at any given time. Otherwise you can get into trouble; see for example nfsd and readdir.)

-- 
David A. Holland
dholl...@netbsd.org
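As an illustration of the "open and close are exposed to the kernel, but the cursor state stays with the caller" arrangement described above, here is a small sketch. It is an assumption-laden toy, not the proposed API: fsquota, ucursor, fsq_cursor_open, fsq_cursor_close and fsq_busy are hypothetical names, and the only thing the filesystem side tracks is a count of open cursors.

```c
#include <assert.h>
#include <stddef.h>

/* Filesystem-side state: only a count of open cursors (hypothetical). */
struct fsquota {
	unsigned open_cursors;
};

/* Caller-side state: the position itself never lives in the kernel. */
struct ucursor {
	size_t pos;
};

/* Open: initialize the position and tell the filesystem about it. */
void
fsq_cursor_open(struct fsquota *fs, struct ucursor *c)
{
	c->pos = 0;
	fs->open_cursors++;
}

/* Close: let the filesystem know one fewer cursor is outstanding. */
void
fsq_cursor_close(struct fsquota *fs, struct ucursor *c)
{
	assert(fs->open_cursors > 0);
	fs->open_cursors--;
	c->pos = 0;
}

/* Nonzero if any cursor is open, e.g. before reorganizing the store. */
int
fsq_busy(const struct fsquota *fs)
{
	return fs->open_cursors > 0;
}
```

With this shape the filesystem can answer "is anyone iterating right now?" without holding per-cursor state, which is the property the message above asks for.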
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 04:39:21PM +, David Holland wrote:
> > We're talking about a few MB of RAM here, aren't we? The kernel can certainly allocate this without trouble (other subsystems do).
>
> The proplib'd and XMLified complete dump for 50,000 users will probably make a blob of between 10 and 20 MB. (Note: this is an estimate; I haven't checked the size by trying it. It might be larger. I'd be surprised if it were much smaller.)

I tested with a few tens of users; my estimate is about 35MB for 50k users.

> I don't see why it's desirable to manifest such large objects when it's easily avoidable.

We don't agree on "easily".

> > > There are two design truisms for database stuff that apply here: first, you always end up wanting cursors, and second, you always end up wanting bulk get (and not just single get) from those cursors. So it's usually a good idea to anticipate this and design it all in up front.
> >
> > Maybe ... I know that in the end I want the whole set of data and not just a part of it.
>
> Yes, probably. The cursor API I've floated so far is not general enough to support much else. Although it could be made more general.
>
> > But if you believe it's needed this can easily be added to the existing quotactl(2) (it would just be a new command).
>
> Yes, perhaps it could... but why? What's to be gained by using a baroque proplib encoding of what can otherwise be handled as an array of simple structs?

It's an easily machine-parsable text format. That's probably the reason why it's used in other parts of the kernel too.

> I remember asking this question when you first proposed the proplib interface last spring, and never really got a clear answer.

I see it as being the common format used for non-performance-critical kernel/userland communication. It has been adopted by other kernel subsystems; there's prior art there.

> > > > > The reason to wrap the position in a cursor abstraction is to allow flexibility about how the position is represented.
> > > >
> > > > But then the cursor would still be stored in userland?
> > >
> > > That's the idea, like reading a file with pread(). I think the kernel should know, or at least be able to know, how many cursors are currently open; but I don't think there's any need to keep the cursor state itself in the kernel.
> >
> > So you want a quotaopen/quotaclose, with a file descriptor (or something similar)?
>
> The proposed API already has explicit open and close for cursors; what I'm saying is that this should be exposed to the kernel. (Open already has to be, to initialize the cursor position; close should be, so the filesystem can if necessary know if there are cursors open at any given time. Otherwise you can get into trouble; see for example nfsd and readdir.)

So you're close to having something like a file descriptor.

-- 
Manuel Bouyer bou...@antioche.eu.org
     NetBSD: 26 years of experience will always make the difference
--
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 05:35:16PM +, David Holland wrote: I can't parse this, can you explain ? The tools needs to be aware of the format to do something usefull with the data, isn't it ? The tools can and should work with a filesystem-independent abstract schema. This should be independent of any filesystem's on-disk quota format, just as the dirent.h structures are independent of any filesystem's on-disk directory layout. the current proplib-based schema is independant of the on-disk format (as it's just another representation of the same set of data that you proposed). that's plain wrong. If it's quota1 you can use the quota1 code in sys/ufs/ufs (just as it would have done before quota2). No, it is not wrong. It cannot use the quota1 code in ufs; the whole premise of the proposed lfs renovation is to unhook lfs from ufs. The ufs code is a big blob, not a library of components; you can't just use parts of it, or at least not easily. I can copy the ufs quota1 structures and some of the ufs quota code, yes; but then I have struct lfs_dqblk, and I need to interface it to the rest of the system, and as things currently stand that forces me to clone all the ffs-quota1-specific quota code all over everywhere. So, if I understand you properly, your lfs code won't use the quota1 on-disk format but some new format based on a lfs_dqblk structure. Then it's a brand new disk format, the right thing to do is to use the convertion functions from common/lib/libquota/ (as the ufs/quota1 and ufs/quota2 code already do) and convert from here to your on-disk format. You can't claim a data representation isn't filesystem-independant because it doesn't correspond to you on-disk representation. As it's filesystem-independant it has (by definition) to be converted to every on-disk representation. The lfs/ufs split would have been committed ages ago if the quota system hadn't gotten in the way. 
This is why, last spring, when yo were designing quota2, I was asking you to fix things above the FS to be FS-independent. But you didn't; instead it got worse. I tried at the time to explain the situation and the premises, and why the quota system should be FS-independent at and above the VFS level, but I got ignored and then sucked away by real life. Well, I don't remember the details of that time but what I retained is that you didn't like xml. Now you're saying I move lfs out of ufs and I can't use quota1 for lfs. Yes, of course as quota1 is tightly coupled to ufs, and my project was not to make quota1 filesystem-independant - it was to add a new on-disk quota for ffs with some better properties. You can't blame me for not making quota1 (or even quota2) reusable outside of ufs when my goal was to get a new on-disk format for ffs. That's just not the same work. Now, I don't think the current quota1 code is that much tied to ufs. If you want to use the same dqblk for your on-disk format (but then it's on-disk format, you can't claim it's fs-independant), code can certainly be reorganised to make it reusable outside of ufs. But that's orthogonal to filesystem-independant format representation. Now I'm trying to fix it. Likewise, if I were to go add quota support to v7fs, or try to hook up whatever quota support zfs has, or commit Hammer and try to get whatever quota support *it* has working, or add ext2 quota support, or write a new fs with quota support, or whatever, I'd have to make still more copies of the logic to cope with all the different formats and layouts. Of course if you have new on-disk format you need to do some conversion, whatever filesystem independant format you use. But I think you could still reuse sys/ufs/ufs/quota2_subr.c to do the convertion from plist to some binary representation. I could cut and paste it, maybe. That's not particularly desirable. Now that I understand where you want to go, it's not the right thing to do. 
Use the code in common/lib/libquota and write conversion routines for your filesystem. You can call it a 'cut-n-paste' from quota2_subr.c, but as quota2_subr.c is about converting the filesystem-independent data to the quota2 on-disk format, and you use a different on-disk format, you can't blame it for not fitting your needs. This is not a good idea, not scalable, and not sensible, especially when a filesystem-independent (read format-independent if you like) interface is both perfectly possible and simpler. I strongly believe the plist representation is format-independent. It has exactly the same information as what you propose. Right now, I'm not sure if it is or not. I'm only sure that it's highly complicated It's not more complicated than the table representation you proposed (besides being xml-based, but that's all we have now). (unnecessarily so) and underdocumented. Meanwhile, documentation can always be improved. The plist format is described in
Re: fs-independent quotas
On Thu, Oct 20, 2011 at 06:54:54PM +0200, Manuel Bouyer wrote:
> On Thu, Oct 20, 2011 at 04:39:21PM +, David Holland wrote:
> > > We're talking about a few MB of RAM here, aren't we? The kernel can certainly allocate this without trouble (other subsystems do).
> >
> > The proplib'd and XMLified complete dump for 50,000 users will probably make a blob of between 10 and 20 MB. (Note: this is an estimate; I haven't checked the size by trying it. It might be larger. I'd be surprised if it were much smaller.)
>
> I tested with a few tens of users; my estimate is about 35MB for 50k users.
>
> > I don't see why it's desirable to manifest such large objects when it's easily avoidable.
>
> We don't agree on "easily".

FYI: I just went around, and around, and around on this with the configuration framework for a proprietary kernel subsystem.

If you just take the position that _any_ write to _any_ part of the data invalidates all cursors, it is not so bad. The user application has to be coded to deal with that, but it keeps the complexity out of the kernel.

Thor
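One common way to get the "any write invalidates all cursors" behaviour is a generation counter; whether that is what the subsystem mentioned above did is not stated, so treat the following purely as a sketch. All names (gen_table, gen_cursor, gen_cursor_open, gen_table_write, gen_cursor_next) are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: gen is bumped on every write, however small. */
struct gen_table {
	uint64_t gen;
	size_t count;		/* number of records */
};

/* Cursor remembers the generation it was opened against. */
struct gen_cursor {
	uint64_t gen;
	size_t pos;
};

enum gen_status { GEN_OK, GEN_EOF, GEN_STALE };

void
gen_cursor_open(const struct gen_table *t, struct gen_cursor *c)
{
	c->gen = t->gen;
	c->pos = 0;
}

/* Any write to any part of the data invalidates all open cursors. */
void
gen_table_write(struct gen_table *t)
{
	t->gen++;
}

/* Yield the next record index, or report end of data or a stale cursor. */
enum gen_status
gen_cursor_next(const struct gen_table *t, struct gen_cursor *c, size_t *idx)
{
	assert(idx != NULL);
	if (c->gen != t->gen)
		return GEN_STALE;
	if (c->pos >= t->count)
		return GEN_EOF;
	*idx = c->pos++;
	return GEN_OK;
}
```

On GEN_STALE the application reopens the cursor and starts over, which is the extra work pushed to userland in exchange for keeping the table code simple.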
Re: fs-independent quotas
Ignatios Souvatzis i...@netbsd.org writes:
> On Wed, Oct 19, 2011 at 06:09:27PM +, David Holland wrote:
> > support to other filesystems (tmpfs, perhaps v7fs) or even add other filesystems that have or may have their own native quota handling (zfs, Hammer, you name it).
>
> zfs - does it really have quota?

Yes, it does, as of zfs filesystem V4.

http://hub.opensolaris.org/bin/view/Community+Group+zfs/faq#HCanIsetquotasonZFSfilesystems3F

relurk/
Extended attributes Linux interface
Hello,

There were previously discussions, started by Emmanuel, concerning extended attributes, including the various available APIs and which to support. At the time I read them I was catching up with a lot of mail and had written down a small note about a potential security implication that crossed my mind if we used the Linux interface. Perhaps someone can (dis)confirm:

Strings are used instead of IDs to distinguish the class of an extended attribute, e.g. "system". My question is then: must those be limited to ASCII, or can they contain arbitrary bytes, or UTF-8?

If Unicode strings are possible, I think it would be possible for a string to look like "system" to an auditing administrator but actually be something else, unless all tools clearly showed those non-ASCII bytes in an escaped format. Of course, if the kernel wanted to match "system", it wouldn't match then, but the fact that it may _appear_ to be correct to an admin may introduce a security issue if extended permissions were ever implemented on top of that system.

Perhaps this problem could also exist with the key names, in case they're part of permission descriptions?

Thanks,
-- 
Matt
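To illustrate the "show non-ASCII bytes in an escaped format" mitigation mentioned above, here is a minimal sketch of a display helper. The function name xattr_name_escape is hypothetical and is not part of any existing xattr tool or library; it assumes attribute names are NUL-terminated byte strings and escapes every byte outside printable ASCII (and the backslash itself) as \xHH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy name into out for display, writing any byte outside printable
 * ASCII (and backslash) as \xHH. Returns the number of characters
 * written, not counting the terminating NUL, or -1 if out is too small.
 */
int
xattr_name_escape(const char *name, char *out, size_t outsz)
{
	size_t len, o = 0;

	assert(name != NULL && out != NULL);
	len = strlen(name);
	for (size_t i = 0; i < len; i++) {
		unsigned char b = (unsigned char)name[i];

		if (b >= 0x20 && b < 0x7f && b != '\\') {
			if (o + 1 >= outsz)	/* need room for byte + NUL */
				return -1;
			out[o++] = (char)b;
		} else {
			if (o + 4 >= outsz)	/* need room for \xHH + NUL */
				return -1;
			snprintf(out + o, 5, "\\x%02x", (unsigned)b);
			o += 4;
		}
	}
	if (o >= outsz)
		return -1;
	out[o] = '\0';
	return (int)o;
}
```

With such a helper, a name that begins with the UTF-8 bytes 0xd1 0x95 (a Cyrillic letter resembling "s") followed by "ystem" would no longer be displayed as something visually identical to "system".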