Re: dtrace ioctls

2011-10-20 Thread David Laight
On Wed, Oct 19, 2011 at 10:22:08PM +0000, David Holland wrote:
 On Wed, Oct 19, 2011 at 10:01:33PM +0100, David Laight wrote:
   
   Hmmm... the sun code is passing the structure by value 
 
 Is it? The non-sun code appears to be calling an ioctl that's defined
 to take a pointer to a pointer to a structure. Or maybe I'm totally
 misreading ioccom.h?

Maybe I was asleep

David

-- 
David Laight: da...@l8s.co.uk


Re: fs-independent quotas

2011-10-20 Thread Ignatios Souvatzis
On Wed, Oct 19, 2011 at 06:09:27PM +0000, David Holland wrote:
 support to other filesystems (tmpfs, perhaps v7fs) or even add other
 filesystems that have or may have their own native quota handling
 (zfs, Hammer, you name it). 

zfs - does it really have quota? 

All the demos I've seen talk about sub-filesystem limits; you create
per-user sub-filesystems if you want to emulate per-user quota.

(Correct me if I'm wrong.)

How would this fit in, if at all?

-is


Re: fs-independent quotas

2011-10-20 Thread Manuel Bouyer
On Wed, Oct 19, 2011 at 10:20:23PM +0000, David Holland wrote:
 On Wed, Oct 19, 2011 at 09:22:02PM +0200, Manuel Bouyer wrote:
So, a few months back we got a new improved quota format for FFS.
Unfortunately, one of the side effects of this was to sprinkle
specific knowledge of the new format through all the userlevel quota
tools and quota support logic. To be fair, this was alongside the
existing specific knowledge of the old quota format; nonetheless, it's
messy and unscalable.
   
   of course there's been changes to the tools, as there's a new format.
 
 The tools ought to be format-independent.

I can't parse this; can you explain? The tools need to be aware of the
format to do something useful with the data, don't they?

 
We may want to add more quota formats (e.g. the different and
incompatible new quota format FreeBSD added last year) or add quota
support to other filesystems (tmpfs, perhaps v7fs) or even add other
filesystems that have or may have their own native quota handling
(zfs, Hammer, you name it). Also, my planned lfs-renovation is
currently hung up on the VFS-level quota interface, because I don't
want to rip out the existing maybe-partial support for quotas but
can't plug new code into the existing framework.
   
   You'll have to explain this. lfs is some variant of ffs; I see no reason
   why it couldn't use the new format.
 
 It could use whatever format it wants. To the extent it currently
 supports quotas, I think it's limited to the old-style quotas, that
 is, quota1. But there's no way to plug it in without taking the
 fs-dependent code currently in all the tools and access pathway and
 making a third or perhaps a third and fourth copy of all the logic.

That's plain wrong. If it's quota1 you can use the quota1 code in
sys/ufs/ufs (just as it would have done before quota2).

 
 Likewise, if I were to go add quota support to v7fs, or try to hook up
 whatever quota support zfs has, or commit Hammer and try to get
 whatever quota support *it* has working, or add ext2 quota support, or
 write a new fs with quota support, or whatever, I'd have to make still
 more copies of the logic to cope with all the different formats and
 layouts.

Of course, if you have a new on-disk format you need to do some conversion,
whatever filesystem-independent format you use.
But I think you could still reuse sys/ufs/ufs/quota2_subr.c to do the
conversion from plist to some binary representation.

 
 This is not a good idea, not scalable, and not sensible, especially
 when a filesystem-independent (read format-independent if you like)
 interface is both perfectly possible and simpler.

I strongly believe the plist representation is format-independent.
It carries exactly the same information as what you propose.

 
   In fact the new format is fs-independent.
 
 Yes, in the sense that one could add the format to other file systems;
 but no, in the sense that other file systems already have their own
 quota formats and we need to be able to interoperate.

You have to do some conversion, at the same level as with what you
propose.

 
   But this is just what the current proplib format is! A set of tables
   with key/value pairs!
 
 That's great, that'll make the changes I need to make that much
 easier. But it doesn't seem particularly familiar relative to the code
 I've been working on.

Or maybe you don't need to change it at all.

 
the quota *type*

   - the quota value is:
the configured hard limit
the configured soft limit
the configured grace period
the current usage
the current grace expiry time (if any)
   
   This is exactly the format described in quotactl(2).
 
 No, what's described in quotactl(2) is something about commands and
 arguments... and while there is a substructure that looks something
 like this, the fact remains that it's a *sub*structure

Yes, but you still need a way to pass commands. You didn't talk about this.

 and the schema
 is not tabular.

I don't understand what you mean here. There's a set of values associated
with an id; I can't see the difference from what you're proposing.

 
The quota *class* is the thing the quota is imposed on; this is
currently either user or group. There is no likely prospect of
additional quota classes appearing.
   
   I don't think we should limit ourselves to these classes. I could see
   per-host or per-hostgroup quotas for networked filesystems, for example.
 
 I'm not limiting it to anything, but I'll believe in more quota
 classes when I see them. Per-host quotas (even if they make sense,
 which I question) aren't going to work very well with a 32-bit id, for
 example.

right, that's where a plist is a win.

 
 Whereas, as I pointed out before, there are filesystems in the field
 with more than two quota types.

The current format has no limitations in this area.

 
   class  id  type  hard

proposed additions to sys/conf/std

2011-10-20 Thread Jonathan A. Kollasch

I propose adding

pseudo-device drvctl

and/or

options BUFQ_PRIOCSCAN

to src/sys/conf/std.

The reasons I even bring this up:
 - Many kernels are missing drvctl and thus do not support disk wedges
   (this is arguably due to a flaw in the design of disk wedges, but
   that's another bikeshed).
 - BUFQ_PRIOCSCAN is superior to BUFQ_DISKSORT, and in fact
   BUFQ_DISKSORT is actually inferior to BUFQ_FCFS in terms of
   interactive disk I/O responsiveness.  There are many kernels that
   default to BUFQ_DISKSORT due to not explicitly adding BUFQ_PRIOCSCAN.

The ominous
# "it's commonly used" is NOT a good reason to enable options here.
line has me a bit apprehensive.  However, pseudo-device cpuctl is there
already.  There are some options that are there for historical reasons,
so this is sort of a slippery slope.  Do we need a new config file for
standard-but-optional-options?

Jonathan Kollasch


Re: fs-independent quotas

2011-10-20 Thread David Holland
On Thu, Oct 20, 2011 at 06:43:47AM +0200, Emmanuel Dreyfus wrote:
   It seems to me that quotas are fundamentally a special-purpose
   key/value store; that is, you look up quota information for a
   particular thing (the key) and get back the quota settings and current
   usage information (the value).
  
  If you are going to add a generic key/value store mechanism for all
  filesystems, you can consider fs-independent extended attributes as
  well.

I am not adding a generic key/value store mechanism. I am representing
the quota data as a specific key/value store.

A generic key/value store mechanism for all filesystems would be a
very large, messy, and semantically nebulous project...
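
A minimal sketch of what such a specific key/value store could look like
in C, using the fields named in this thread: the key is (class, id, type)
and the value is hard limit, soft limit, grace period, current usage and
grace expiry. Every identifier below is hypothetical, the two quota types
(blocks and files) are an assumption, and the linear lookup only stands in
for whatever a real filesystem would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none of this is an existing interface. */
enum quota_class { QUOTA_CLASS_USER, QUOTA_CLASS_GROUP };
enum quota_type  { QUOTA_TYPE_BLOCKS, QUOTA_TYPE_FILES }; /* assumed types */

/* The key: what the quota is imposed on and which resource it limits. */
struct quotakey {
	enum quota_class qk_class;
	uint32_t         qk_id;      /* uid or gid; 32-bit as discussed */
	enum quota_type  qk_type;
};

/* The value: configured limits plus current state. */
struct quotaval {
	uint64_t qv_hardlimit;   /* configured hard limit */
	uint64_t qv_softlimit;   /* configured soft limit */
	int64_t  qv_grace;       /* configured grace period, in seconds */
	uint64_t qv_usage;       /* current usage */
	int64_t  qv_expire;      /* current grace expiry time, 0 if none */
};

/*
 * Look up one key in parallel arrays of keys and values.
 * Returns a pointer to the matching value, or NULL if there is none.
 */
const struct quotaval *
quota_lookup(const struct quotakey *keys, const struct quotaval *vals,
    size_t n, const struct quotakey *want)
{
	assert(want != NULL);
	for (size_t i = 0; i < n; i++) {
		if (keys[i].qk_class == want->qk_class &&
		    keys[i].qk_id == want->qk_id &&
		    keys[i].qk_type == want->qk_type)
			return &vals[i];
	}
	return NULL;
}
```

The point of the sketch is only that the schema is fixed and small; it is
not a generic store, since neither the key nor the value is open-ended.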

-- 
David A. Holland
dholl...@netbsd.org


Re: fs-independent quotas

2011-10-20 Thread David Holland
On Thu, Oct 20, 2011 at 11:57:04AM +0200, Ignatios Souvatzis wrote:
   support to other filesystems (tmpfs, perhaps v7fs) or even add other
   filesystems that have or may have their own native quota handling
   (zfs, Hammer, you name it). 
  
  zfs - does it really have quota? 

I don't know... but if not, there are plenty of other fses.

  All the demos I've seen talk about sub-filesystem limits; you create
  per-user sub-filesystems if you want to emulate per-user quota.
  
  (Correct me if I'm wrong.)
  
  How would this fit in, if at all?

That's a good question. My first instinct is that like the other stuff
zfs does that it does in its own semantically-incompatible way, it
would require its own tools. But I guess the quota system could be
made to report the limits if the sub-filesystems are specifically
assigned to users somehow. Or something like that...

-- 
David A. Holland
dholl...@netbsd.org


Re: fs-independent quotas

2011-10-20 Thread David Holland
On Thu, Oct 20, 2011 at 05:23:14PM +0200, Manuel Bouyer wrote:
   That's way more complicated than necessary. Think of it as like
   VOP_READDIR - you get passed a position, you send back some number of
   items, and update the position.
  
  Depending on how the data are stored on disk, the notion of position
  (which also implies some ordering) can be difficult to handle,
  especially if the data we're reading can change between two calls,
  causing the position to become invalid.

...yes, but this is just one of those things you have to cope with
when doing filesystems. It's no different from readdir in that regard.

  It's certainly less trouble to send back to userland the whole set of
  data - especially if what userland wants is the whole set of data
  (I can't see what a partial read of quota would be useful for).

No, no it really isn't. Suppose there are, say, 50,000 users, so to
send back the whole works you have to accumulate 100,000 quota entries
in a gigantic blob... a machine with 50,000 users will have enough RAM
for this but that doesn't mean that allocating a contiguous chunk of
kernel memory that large is easy or desirable. Far better to read it
out a couple hundred at a time.

There are two design truisms for database stuff that apply here:
first, you always end up wanting cursors, and second, you always end
up wanting bulk get (and not just single get) from those cursors. So
it's usually a good idea to anticipate this and design it all in up
front.

   The reason to wrap the position in a cursor abstraction is to allow
   flexibility about how the position is represented.
  
  But then the cursor would still be stored in userland ?

That's the idea, like reading a file with pread().

I think the kernel should know, or at least be able to know, how many
cursors are currently open; but I don't think there's any need to keep
the cursor state itself in the kernel.
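
A minimal sketch of the readdir-like pattern described above: the caller
owns the cursor, passes it in with a buffer, gets back some number of
entries, and the position is updated. The names (struct quotaent,
struct quotacursor, quotacursor_get) are hypothetical, and the position is
simply an array index here; a real filesystem could hide any
representation behind the cursor type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entry: one id and its current usage. */
struct quotaent {
	uint32_t qe_id;
	uint64_t qe_usage;
};

/* Stand-in for the filesystem's quota data. */
struct quotastore {
	const struct quotaent *qs_ents;
	size_t                 qs_nents;
};

/* Held by the caller (userland); the contents are meant to be opaque. */
struct quotacursor {
	size_t qc_pos;
};

/* Start a new iteration at the beginning of the store. */
void
quotacursor_open(struct quotacursor *qc)
{
	qc->qc_pos = 0;
}

/*
 * Bulk get: copy up to max entries into buf, advance the cursor,
 * and return how many were copied. Zero means end of data.
 */
size_t
quotacursor_get(const struct quotastore *qs, struct quotacursor *qc,
    struct quotaent *buf, size_t max)
{
	size_t n = 0;

	assert(buf != NULL || max == 0);
	while (n < max && qc->qc_pos < qs->qs_nents)
		buf[n++] = qs->qs_ents[qc->qc_pos++];
	return n;
}
```

A caller would loop on quotacursor_get() with a buffer of a couple hundred
entries until it returns zero, so no single large allocation is needed on
either side.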

-- 
David A. Holland
dholl...@netbsd.org


Re: fs-independent quotas

2011-10-20 Thread Manuel Bouyer
On Thu, Oct 20, 2011 at 03:47:26PM +0000, David Holland wrote:
 On Thu, Oct 20, 2011 at 05:23:14PM +0200, Manuel Bouyer wrote:
That's way more complicated than necessary. Think of it as like
VOP_READDIR - you get passed a position, you send back some number of
items, and update the position.
   
   Depending on how the data are stored on disk, the notion of position
   (which also implies some ordering) can be difficult to handle,
   especially if the data we're reading can change between two calls,
   causing the position to become invalid.
 
 ...yes, but this is just one of those things you have to cope with
 when doing filesystems. It's no different from readdir in that regard.
 
   It's certainly less trouble to send back to userland the whole set of
   data - especially if what userland wants is the whole set of data
   (I can't see what a partial read of quota would be useful for).
 
 No, no it really isn't. Suppose there are, say, 50,000 users, so to
 send back the whole works you have to accumulate 100,000 quota entries
 in a gigantic blob... a machine with 50,000 users will have enough RAM
 for this but that doesn't mean that allocating a contiguous chunk of
 kernel memory that large is easy or desirable. Far better to read it
 out a couple hundred at a time.

We're talking about a few MB of RAM here, aren't we? The kernel can certainly
allocate this without trouble (other subsystems do).


 
 There are two design truisms for database stuff that apply here:
 first, you always end up wanting cursors, and second, you always end
 up wanting bulk get (and not just single get) from those cursors. So
 it's usually a good idea to anticipate this and design it all in up
 front.

Maybe ... I know that in the end I want the whole set of data and not
just a part of it. But if you believe it's needed this can
easily be added to the existing quotactl(2) (it would just be a new command).

 
The reason to wrap the position in a cursor abstraction is to allow
flexibility about how the position is represented.
   
   But then the cursor would still be stored in userland ?
 
 That's the idea, like reading a file with pread().
 
 I think the kernel should know, or at least be able to know, how many
 cursors are currently open; but I don't think there's any need to keep
 the cursor state itself in the kernel.

So you want a quotaopen/quotaclose, with a file descriptor (or something
similar) ?

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 years of experience will always make the difference
--


Re: fs-independent quotas

2011-10-20 Thread David Holland
On Thu, Oct 20, 2011 at 06:00:28PM +0200, Manuel Bouyer wrote:
 It's certainly less trouble to send back to userland the whole set of
 data - especially if what userland wants is the whole set of data
 (I can't see what a partial read of quota would be useful for).
   
   No, no it really isn't. Suppose there are, say, 50,000 users, so to
   send back the whole works you have to accumulate 100,000 quota entries
   in a gigantic blob... a machine with 50,000 users will have enough RAM
   for this but that doesn't mean that allocating a contiguous chunk of
   kernel memory that large is easy or desirable. Far better to read it
   out a couple hundred at a time.
  
  We're talking about a few MB of RAM here, aren't we? The kernel can certainly
  allocate this without trouble (other subsystems do).

The proplib'd and XMLified complete dump for 50,000 users will
probably make a blob of between 10 and 20 MB. (Note: this is an
estimate; I haven't checked the size by trying it. It might be larger.
I'd be surprised if it were much smaller.)

I don't see why it's desirable to manifest such large objects when
it's easily avoidable.
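
As a back-of-envelope check on the estimate above, taking 1 MB as 10^6
bytes: 10 to 20 MB spread over 100,000 entries is 100 to 200 bytes per
entry. The sketch below does that division and, purely as an assumption
for comparison, defines a fixed binary record of five 64-bit fields, which
comes to 40 bytes per entry, or 4 MB for 100,000 entries before counting
keys. The record layout and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-size binary record: five 64-bit fields. */
struct quotarec_sketch {
	uint64_t hard;     /* hard limit */
	uint64_t soft;     /* soft limit */
	uint64_t usage;    /* current usage */
	int64_t  grace;    /* grace period */
	int64_t  expire;   /* grace expiry time */
};

/* Average encoded size per entry, rounded down. */
uint64_t
bytes_per_entry(uint64_t total_bytes, uint64_t nentries)
{
	assert(nentries > 0);
	return total_bytes / nentries;
}
```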

   There are two design truisms for database stuff that apply here:
   first, you always end up wanting cursors, and second, you always end
   up wanting bulk get (and not just single get) from those cursors. So
   it's usually a good idea to anticipate this and design it all in up
   front.
  
  Maybe ... I know that in the end I want the whole set of data and not
  just a part of it.

Yes, probably. The cursor API I've floated so far is not general
enough to support much else. Although it could be made more general.

  But if you believe it's needed this can easily be added to the
  existing quotactl(2) (it would just be a new command).

Yes, perhaps it could... but why? What's to be gained by using a
baroque proplib encoding of what can otherwise be handled as an array
of simple structs?

I remember asking this question when you first proposed the proplib
interface last spring, and never really got a clear answer.

  The reason to wrap the position in a cursor abstraction is to allow
  flexibility about how the position is represented.
 
 But then the cursor would still be stored in userland ?
   
   That's the idea, like reading a file with pread().
   
   I think the kernel should know, or at least be able to know, how many
   cursors are currently open; but I don't think there's any need to keep
   the cursor state itself in the kernel.
  
  So you want a quotaopen/quotaclose, with a file descriptor (or something
  similar) ?

The proposed API already has explicit open and close for cursors; what
I'm saying is that this should be exposed to the kernel. (Open already
has to be, to initialize the cursor position; close should be, so the
filesystem can if necessary know if there are cursors open at any
given time. Otherwise you can get into trouble; see for example nfsd
and readdir.)

-- 
David A. Holland
dholl...@netbsd.org


Re: fs-independent quotas

2011-10-20 Thread Manuel Bouyer
On Thu, Oct 20, 2011 at 04:39:21PM +0000, David Holland wrote:
   We're talking about a few MB of RAM here, aren't we? The kernel can certainly
   allocate this without trouble (other subsystems do).
 
 The proplib'd and XMLified complete dump for 50,000 users will
 probably make a blob of between 10 and 20 MB. (Note: this is an
 estimate; I haven't checked the size by trying it. It might be larger.
 I'd be surprised if it were much smaller.)

I tested with a few tens of users; my estimate is about 35 MB for 50k users.

 
 I don't see why it's desirable to manifest such large objects when
 it's easily avoidable.

We don't agree on "easily".

 
There are two design truisms for database stuff that apply here:
first, you always end up wanting cursors, and second, you always end
up wanting bulk get (and not just single get) from those cursors. So
it's usually a good idea to anticipate this and design it all in up
front.
   
   Maybe ... I know that in the end I want the whole set of data and not
   just a part of it.
 
 Yes, probably. The cursor API I've floated so far is not general
 enough to support much else. Although it could be made more general.
 
   But if you believe it's needed this can easily be added to the
   existing quotactl(2) (it would just be a new command).
 
 Yes, perhaps it could... but why? What's to be gained by using a
 baroque proplib encoding of what can otherwise be handled as an array
 of simple structs?

It's easily machine-parsable text. That's probably the reason why it's
used in other parts of the kernel too.

 
 I remember asking this question when you first proposed the proplib
 interface last spring, and never really got a clear answer.

I see it as being the common format used for non-performance-critical
kernel/userland communication. It has been adopted by other kernel
subsystems, there's prior art there.

 
   The reason to wrap the position in a cursor abstraction is to allow
   flexibility about how the position is represented.
  
  But then the cursor would still be stored in userland ?

That's the idea, like reading a file with pread().

I think the kernel should know, or at least be able to know, how many
cursors are currently open; but I don't think there's any need to keep
the cursor state itself in the kernel.
   
   So you want a quotaopen/quotaclose, with a file descriptor (or something
   similar) ?
 
 The proposed API already has explicit open and close for cursors; what
 I'm saying is that this should be exposed to the kernel. (Open already
 has to be, to initialize the cursor position; close should be, so the
 filesystem can if necessary know if there are cursors open at any
 given time. Otherwise you can get into trouble; see for example nfsd
 and readdir.)

So you're close to having something like a file descriptor.

-- 
Manuel Bouyer bou...@antioche.eu.org
 NetBSD: 26 years of experience will always make the difference
--


Re: fs-independent quotas

2011-10-20 Thread Manuel Bouyer
On Thu, Oct 20, 2011 at 05:35:16PM +0000, David Holland wrote:
   I can't parse this; can you explain? The tools need to be aware of the
   format to do something useful with the data, don't they?
 
 The tools can and should work with a filesystem-independent abstract
 schema. This should be independent of any filesystem's on-disk quota
 format, just as the dirent.h structures are independent of any
 filesystem's on-disk directory layout.

The current proplib-based schema is independent of the on-disk format
(as it's just another representation of the same set of data that you
proposed).

 
   That's plain wrong. If it's quota1 you can use the quota1 code in
   sys/ufs/ufs (just as it would have done before quota2).
 
 No, it is not wrong. It cannot use the quota1 code in ufs; the whole
 premise of the proposed lfs renovation is to unhook lfs from ufs. The
 ufs code is a big blob, not a library of components; you can't just
 use parts of it, or at least not easily.
 
 I can copy the ufs quota1 structures and some of the ufs quota code,
 yes; but then I have struct lfs_dqblk, and I need to interface it to
 the rest of the system, and as things currently stand that forces me
 to clone all the ffs-quota1-specific quota code all over everywhere.

So, if I understand you properly, your lfs code won't use the
quota1 on-disk format but some new format based on an lfs_dqblk structure.
Then it's a brand new disk format; the right thing to do is to use the
conversion functions from common/lib/libquota/ (as the ufs/quota1 and
ufs/quota2 code already do) and convert from there to your on-disk
format.

You can't claim a data representation isn't filesystem-independent because
it doesn't correspond to your on-disk representation. As it's
filesystem-independent it has (by definition) to be converted to every
on-disk representation.

 
 The lfs/ufs split would have been committed ages ago if the quota
 system hadn't gotten in the way. This is why, last spring, when you
 were designing quota2, I was asking you to fix things above the FS to
 be FS-independent. But you didn't; instead it got worse. I tried at
 the time to explain the situation and the premises, and why the quota
 system should be FS-independent at and above the VFS level, but I got
 ignored and then sucked away by real life.

Well, I don't remember the details of that time, but what I retained
is that you didn't like XML.
Now you're saying "I move lfs out of ufs and I can't use quota1 for lfs".
Yes, of course, as quota1 is tightly coupled to ufs, and my project was
not to make quota1 filesystem-independent - it was to add a new
on-disk quota for ffs with some better properties. You can't blame
me for not making quota1 (or even quota2) reusable outside of ufs when
my goal was to get a new on-disk format for ffs. That's just not the
same work.

Now, I don't think the current quota1 code is that much tied to ufs.
If you want to use the same dqblk for your on-disk format (but then
it's an on-disk format, you can't claim it's fs-independent), code can
certainly be reorganised to make it reusable outside of ufs. But that's
orthogonal to a filesystem-independent format representation.

 
 Now I'm trying to fix it.
 
Likewise, if I were to go add quota support to v7fs, or try to hook up
whatever quota support zfs has, or commit Hammer and try to get
whatever quota support *it* has working, or add ext2 quota support, or
write a new fs with quota support, or whatever, I'd have to make still
more copies of the logic to cope with all the different formats and
layouts.
   
   Of course, if you have a new on-disk format you need to do some conversion,
   whatever filesystem-independent format you use.
   But I think you could still reuse sys/ufs/ufs/quota2_subr.c to do the
   conversion from plist to some binary representation.
 
 I could cut and paste it, maybe. That's not particularly desirable.

Now that I understand where you want to go, it's not the right thing
to do. Use the code in common/lib/libquota and write conversion routines
for your filesystem. You can call it a 'cut-n-paste' from quota2_subr.c,
but as quota2_subr.c is about converting the filesystem-independent
data to the quota2 on-disk format, and you use a different on-disk
format, you can't blame it for not fitting your needs.

 
This is not a good idea, not scalable, and not sensible, especially
when a filesystem-independent (read format-independent if you like)
interface is both perfectly possible and simpler.
   
   I strongly believe the plist representation is format-independent.
   It carries exactly the same information as what you propose.
 
 Right now, I'm not sure if it is or not. I'm only sure that it's
 highly complicated

It's not more complicated than the table representation you proposed
(besides being XML-based, but that's all we have now).

 (unnecessarily so) and underdocumented. Meanwhile,

Documentation can always be improved. The plist format is described
in 

Re: fs-independent quotas

2011-10-20 Thread Thor Lancelot Simon
On Thu, Oct 20, 2011 at 06:54:54PM +0200, Manuel Bouyer wrote:
 On Thu, Oct 20, 2011 at 04:39:21PM +0000, David Holland wrote:
We're talking about a few MB of RAM here, aren't we? The kernel can certainly
allocate this without trouble (other subsystems do).
  
  The proplib'd and XMLified complete dump for 50,000 users will
  probably make a blob of between 10 and 20 MB. (Note: this is an
  estimate; I haven't checked the size by trying it. It might be larger.
  I'd be surprised if it were much smaller.)
 
 I tested with a few tens of users; my estimate is about 35 MB for 50k users.
 
  
  I don't see why it's desirable to manifest such large objects when
  it's easily avoidable.
 
 We don't agree on "easily".

FYI:  I just went around, and around, and around on this with the
configuration framework for a proprietary kernel subsystem.  If you just
take the position that _any_ write to _any_ part of the data invalidates
all cursors it is not so bad.  The user application has to be coded to
deal with that, but it keeps the complexity out of the kernel.
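
A minimal sketch of that policy, assuming a single generation counter on
the data: every write bumps the counter, each cursor remembers the
generation it was opened at, and a read through a stale cursor fails so
the application can reopen and start over. The names and the -1 return
convention are hypothetical; this is not taken from any existing
subsystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical store: a few usage values plus a generation counter. */
struct qstore {
	uint64_t qs_gen;        /* bumped on every write */
	uint64_t qs_usage[8];
	size_t   qs_n;          /* number of valid entries */
};

/* Held by the caller; records the generation seen at open time. */
struct qcursor {
	size_t   qc_pos;
	uint64_t qc_gen;
};

void
qcursor_open(const struct qstore *qs, struct qcursor *qc)
{
	qc->qc_pos = 0;
	qc->qc_gen = qs->qs_gen;
}

/* Any write to any part of the data invalidates all open cursors. */
void
qstore_write(struct qstore *qs, size_t idx, uint64_t value)
{
	assert(idx < qs->qs_n);
	qs->qs_usage[idx] = value;
	qs->qs_gen++;
}

/*
 * Returns 1 and stores the next value in *out, 0 at end of data,
 * or -1 if the cursor is stale and the caller must reopen it.
 */
int
qcursor_next(const struct qstore *qs, struct qcursor *qc, uint64_t *out)
{
	if (qc->qc_gen != qs->qs_gen)
		return -1;
	if (qc->qc_pos >= qs->qs_n)
		return 0;
	*out = qs->qs_usage[qc->qc_pos++];
	return 1;
}
```

The kernel side stays trivial (one counter and one comparison); the cost
is that a reader racing with frequent writers may have to restart several
times.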

Thor


Re: fs-independent quotas

2011-10-20 Thread Daniel Hagerty
Ignatios Souvatzis i...@netbsd.org writes:

 On Wed, Oct 19, 2011 at 06:09:27PM +0000, David Holland wrote:
  support to other filesystems (tmpfs, perhaps v7fs) or even add other
  filesystems that have or may have their own native quota handling
  (zfs, Hammer, you name it). 
 
 zfs - does it really have quota? 

Yes, it does, as of zfs filesystem V4.

http://hub.opensolaris.org/bin/view/Community+Group+zfs/faq#HCanIsetquotasonZFSfilesystems3F

relurk/


Extended attributes Linux interface

2011-10-20 Thread Matthew Mondor
Hello,

There were previously discussions, started by Emmanuel, concerning the
extended attributes, including on the various available APIs and which
to support etc.

At the time I read them I was catching up with a lot of mail and had
written down a small note about a potential security implication that
crossed my mind if we used the Linux interface.  Perhaps someone can
(dis)confirm:

Strings are used instead of IDs to distinguish the class of an extended
attribute, e.g. "system".  My question is then: must those be
limited to ASCII or can they support arbitrary bytes, or UTF-8?

If Unicode strings are possible, I think that it'd be possible for a
string to look like "system" to an auditing administrator but actually
be something else, unless all tools clearly showed those non-ASCII
bytes in an escaped format.

Of course, if the kernel wanted to match "system", it wouldn't match
then, but the fact that it may _appear_ to be correct to an admin may
introduce a security issue if extended permissions were ever
implemented on top of that system.  Perhaps this problem could
also exist with the key names in case they're part of permission
descriptions?
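
To illustrate the escaped format mentioned above, here is a minimal
sketch of how a tool could display such names: printable ASCII is copied
through, and every other byte (and the backslash itself) is shown as
\xHH. The function name escape_name is hypothetical and the example says
nothing about what any existing tool or kernel interface does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy name into out, rendering every byte that is not printable
 * ASCII (and the backslash itself) as \xHH. Returns 0 on success,
 * -1 if out is too small to hold the escaped, NUL-terminated result.
 */
int
escape_name(const char *name, char *out, size_t outlen)
{
	size_t o = 0;

	for (const unsigned char *p = (const unsigned char *)name;
	    *p != '\0'; p++) {
		int printable = (*p >= 0x20 && *p <= 0x7e && *p != '\\');
		size_t need = printable ? 1 : 4;

		/* Always leave room for the terminating NUL. */
		if (o + need + 1 > outlen)
			return -1;
		if (printable) {
			out[o++] = (char)*p;
		} else {
			snprintf(out + o, 5, "\\x%02x", (unsigned int)*p);
			o += 4;
		}
	}
	if (o + 1 > outlen)
		return -1;
	out[o] = '\0';
	return 0;
}
```

With this, a name that begins with the Cyrillic letter U+0455 (UTF-8
bytes d1 95) followed by "ystem" is displayed as \xd1\x95ystem, so it
can no longer be mistaken for the ASCII string "system".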

Thanks,
-- 
Matt