Re: Adding subroot information to /proc/mounts, or obtaining that through other means

2007-06-22 Thread Ram Pai
On Thu, 2007-06-21 at 10:31 -0700, H. Peter Anvin wrote:
 Ram Pai wrote:
  
  Peter, I am not working on it currently. But i am interested in getting
  it done. I have the seed set of patches which had Al Viro's ideas
  incorporated. Infact those patches were sent on lkml 2 months back.
  Shall we start with those patches?
  
 
 Okay, so what I see in your patches are:
 
   path-from-root: mount point of the mount from /
   path-from-root-of-its-sb: path from its own root dentry.
   propagation-flag: SHARED, SLAVE, UNBINDABLE, PRIVATE
   peer-mount-id: the mount-id of its peer mount (if this mount is shared)
   master-mount-id: the mount-id of its master mount (if this mount is
 slave)
 
 Other than cosmetic, I don't see anything terribly wrong with this,
 although getting a flag when the directory is overmounted would be nice.
 
 I guess I suggest a single comma-separated field with flags and optional
 :argument:
 
   private
   shared:peer
   slave:master
   unbindable
   overmounted
 
 So we could end up with something like:
 
 rootfs / rootfs rw 0 0 0:1 / 1 private,overmounted
 
 ... where 1 is the mnt_id (sequence number).
 
 [Please see my other comments in this thread... basically I believe we
 should just add fields to /proc/mounts.]

I had two patches. The first patch added a new interface
called /proc/mounts_new  and had the following format.

FSID  mntpt  root-dentry  fstype fs-options

where FSID is a filesystem unique id
mntpt is the path to the mountpoint
root-dentry is the path to the dentry with respect to the root dentry of
the same filesystem.
fstype  is the filesystem type
fs-options  the mount options used.


the second patch made a /proc/propagation interface which had almost the
same fields, but also added fields to show the propagation type of the
mount as well as pointers to its peers and master depending on the type
of the mount. 

I think the consensus seems to have a new interface /proc/make-a-name
which extends the interface provided by /proc/mounts but provides the
propagation state of the mounts too as well as disambiguate bind mounts.
Which makes sense.

Why not have something like this?

mnt-id FSID backing-dev mntpt root-dentry fstype
comma-separated-fs-options

and one of the fields in the comma-separated-fs-options indicates the
propagation type of the mount.


BTW: what is the need for overmounted flag?  Do you mean two vfsmounts
mounted on the same dentry on the ***same vfsmount*** ?


RP










 
   -hpa

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Adding subroot information to /proc/mounts, or obtaining that through other means

2007-06-22 Thread H. Peter Anvin
Ram Pai wrote:
 
 the second patch made a /proc/propagation interface which had almost the
 same fields, but also added fields to show the propagation type of the
 mount as well as pointers to its peers and master depending on the type
 of the mount. 
 
 I think the consensus seems to have a new interface /proc/make-a-name
 which extends the interface provided by /proc/mounts but provides the
 propagation state of the mounts too as well as disambiguate bind mounts.
 Which makes sense.
 

Why?  It seems a lot cleaner to have all the information in the same
place.  It is highly unfriendly to userspace to have to gather
information in a lot of places, plus it adds race conditions.

It would be another matter if the format that we have now couldn't be
extended, but we need those fields (well, except the two zeros, but who
cares) *anyway*, so we might as well stick to the existing file, and
reduce the total amount of code and clutter.

 
 BTW: what is the need for overmounted flag?  Do you mean two vfsmounts
 mounted on the same dentry on the ***same vfsmount*** ?
 

Maybe I'm not following the uses of your flags well enough to figure out
 if that information can already been deduced.

-hpa
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Adding subroot information to /proc/mounts, or obtaining that through other means

2007-06-22 Thread Ram Pai
On Fri, 2007-06-22 at 00:06 -0700, H. Peter Anvin wrote:
 Ram Pai wrote:
  
  the second patch made a /proc/propagation interface which had almost the
  same fields, but also added fields to show the propagation type of the
  mount as well as pointers to its peers and master depending on the type
  of the mount. 
  
  I think the consensus seems to have a new interface /proc/make-a-name
  which extends the interface provided by /proc/mounts but provides the
  propagation state of the mounts too as well as disambiguate bind mounts.
  Which makes sense.
  
 
 Why?  It seems a lot cleaner to have all the information in the same
 place.  It is highly unfriendly to userspace to have to gather
 information in a lot of places, plus it adds race conditions.
 
 It would be another matter if the format that we have now couldn't be
 extended, but we need those fields (well, except the two zeros, but who
 cares) *anyway*, so we might as well stick to the existing file, and
 reduce the total amount of code and clutter.

Ok. so you think /proc/mounts can be extended easily without breaking
any userspace commands?

well lets see..
1. to disambiguate bind mounts, we have to add a field that displays the
 path to the mount's root dentry from the filesystem's root
 dentry. Agree?

2. For filesystems that do not have a backing store, it becomes hard
to disambiguate bind mounts in (1). So we need to add a
filesystem-id field.

3. if we need to add the propagation status of the mount we need a
 propagation flag added in the output.

4. To be able to construct the propagation tree, we need a way to refer
to the other mounts, since some mounts are peers and some other
mounts are master. Which means we need a mount-id field.
Agree?

If you agree to the above 4 new fields, it becomes challenging to
extend /proc/mounts to incorporate these new fields without
breaking any existing applications. 


  
  BTW: what is the need for overmounted flag?  Do you mean two vfsmounts
  mounted on the same dentry on the ***same vfsmount*** ?
  
 
 Maybe I'm not following the uses of your flags well enough to figure out
  if that information can already been deduced.

With the addition of the above 4 mentioned fields, I think one should be
easily able to decipher which mnt-id is mounted on which mnt-id. no?
maybe not. Well we will have to extend the mountpoint field to indicate
the mnt-id in which the mountpoint resides.  

RP

 
   -hpa

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread John Johansen
On Thu, Jun 21, 2007 at 09:06:40PM -0400, James Morris wrote:
 On Thu, 21 Jun 2007, Chris Mason wrote:
 
   The incomplete mediation flows from the design, since the pathname-based
   mediation doesn't generalize to cover all objects unlike label- or
   attribute-based mediation.  And the use the natural abstraction for
   each object type approach likewise doesn't yield any general model or
   anything that you can analyze systematically for data flow.
  
  This feels quite a lot like a repeat of the discussion at the kernel
  summit.  There are valid uses for path based security, and if they don't
  fit your needs, please don't use them.  But, path based semantics alone
  are not a valid reason to shut out AA.
 
 The validity or otherwise of pathname access control is not being 
 discussed here.
 
 The point is that the pathname model does not generalize, and that 
 AppArmor's inability to provide adequate coverage of the system is a 
 design issue arising from this.
 
As we have previously stated we are not using pathnames for IPC.  The
use of pathnames for file access mediation is not a design issue that in
anyway prevents us from extending AppArmor to mediate IPC or networking.

The current focus is making the revision necessary for AppArmor's file
mediation at which point we can focus on finishing of the network
and IPC support.

 Recall that the question asked by Lars was whether there were any 
 outstanding technical issues relating to AppArmor.
 
 AppArmor does not and can not provide the level of confinement claimed by 
 the documentation, and its policy does not reflect its actual confinement 
 properties.  That's kind of a technical issue, right?
 
AppArmor currently controls file and capabilities, which was explicitly
stated in the documentation submitted with the patches.  And it has
been posted before that network and IPC mediation are a wip.




pgpa39172kAUO.pgp
Description: PGP signature


Re: Adding subroot information to /proc/mounts, or obtaining that through other means

2007-06-22 Thread H. Peter Anvin
Ram Pai wrote:
 
 Ok. so you think /proc/mounts can be extended easily without breaking
 any userspace commands?
 
 well lets see..
 1. to disambiguate bind mounts, we have to add a field that displays the
path to the mount's root dentry from the filesystem's root
dentry. Agree?
 
 2. For filesystems that do not have a backing store, it becomes hard
   to disambiguate bind mounts in (1). So we need to add a
   filesystem-id field.
 
 3. if we need to add the propagation status of the mount we need a
propagation flag added in the output.
 
 4. To be able to construct the propagation tree, we need a way to refer
   to the other mounts, since some mounts are peers and some other
   mounts are master. Which means we need a mount-id field.
   Agree?
 
 If you agree to the above 4 new fields, it becomes challenging to
 extend /proc/mounts to incorporate these new fields without
 breaking any existing applications. 
 

No, I don't think so.  I suspect, in fact, that as long as we add the
new fields to the right (obviously) we should be fine.  There aren't all
that many users of /proc/mounts, and even fewer that don't use the
library functions (getmntent et al.)

-hpa
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread John Johansen
On Thu, Jun 21, 2007 at 04:59:54PM -0400, Stephen Smalley wrote:
 On Thu, 2007-06-21 at 21:54 +0200, Lars Marowsky-Bree wrote:
  On 2007-06-21T15:42:28, James Morris [EMAIL PROTECTED] wrote:
  
 
  And now, yes, I know AA doesn't mediate IPC or networking (yet), but
  that's a missing feature, not broken by design.
 
 The incomplete mediation flows from the design, since the pathname-based
 mediation doesn't generalize to cover all objects unlike label- or
 attribute-based mediation.  And the use the natural abstraction for
 each object type approach likewise doesn't yield any general model or
 anything that you can analyze systematically for data flow.
 
No the incomplete mediation does not flow from the design.  We have
deliberately focused on doing the necessary modifications for pathname
based mediation.  The IPC and network mediation are a wip.

We have never claimed to be using pathname-based mediation IPC or networking.
The natural abstraction approach does generize well enough and will
be analyzable.

 The emphasis on never modifying applications for security in AA likewise
 has an adverse impact here, as you will ultimately have to deal with
 application mediation of access to their own objects and operations not
 directly visible to the kernel (as we have already done in SELinux for
 D-BUS and others and are doing for X).  Otherwise, your protection of
 desktop applications is easily subverted.
 
yes of course, we realize that dbus and X must be trusted applications,
this to will happen.  But it will happen piece meal, something about
releasing early and often comes to mind.

   You might define this as a non-technical issue, but the fact that 
   AppArmor 
   simply does not and can not work is a fairly significant consideration, I 
   would imagine.
  
  If I restrict my Mozilla to not access my on-disk mail folder, it can't
  get there. (Barring bugs in programs which Mozilla is allowed to run
  unconfined, sure.)
 
 Um, no.  It might not be able to directly open files via that path, but
 showing that it can never read or write your mail is a rather different
 matter.
 
Actually it can be analyzed and shown.  It is ugly to do and we
currently don't have a tool capable of doing it, but it is possible.


pgpC22P7rGCbZ.pgp
Description: PGP signature


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Andreas Gruenbacher
On Saturday 16 June 2007 02:20, Pavel Machek wrote:
 Ok, so mv gets slower for big trees... and open() gets faster for deep
 trees. Previously, open in current directory was one atomic read of
 directory entry, now it has to read directory, and its parent, and its
 parent parent, and its...

 (Or am I wrong and getting full path does not need to bring anything
 in, not even in cache-cold case?)

You are wrong, indeed. The dentries in the dcache are connected to the dcache 
through their parent dentry pointers, which means that the parent dentries 
are always in memory, too. No I/O is involved for walking up dentry trees.

(Caveat: nfsd does allow disconnected dentries. It does not make sense to try 
confining an in-kernel daemon though, an no user process can ever access a 
dentry before it gets connected (lookup does that), so this difference is 
irrelevant here.)
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Lars Marowsky-Bree
On 2007-06-21T23:45:36, Joshua Brindle [EMAIL PROTECTED] wrote:

 remember, the policies define a white-list
 
 Except for unconfined processes.

The argument that AA doesn't mediate what it is not configured to
mediate is correct, yes, but I don't think that's a valid _design_ issue
with AA.

 Or through IPC or the network, that is the point, filesystem only 
 coverage doesn't cut it; there is no way to say the browser can't access 
 the users mail in AA, and there never will be.

We have a variety of filtering mechanisms which are specific to a
domain. iptables filters networking only; file permissions filter file
access only. This argument is not really strong.

tangent
If you're now arguing the spirit of Unix, I can turn your argument
around too: the Unix spirit is to have smallish dedicated tools. If AA
is dedicated to mediating file access, isn't that nice!

AA _could_ be extended to mediate network access and IPC (and this is
WIP). If we had tcpfs and ipcfs - you know, everything is a filesystem,
the Linux spirit! ;-) - AA could mediate them as well.
/tangent

However, we're discussing the way it mediates file accesses here,
for which it appears useful and capable of functionality which SELinux's
approach cannot provide.


Regards,
Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Stephen Smalley
On Thu, 2007-06-21 at 23:17 +0200, Lars Marowsky-Bree wrote:
 On 2007-06-21T16:59:54, Stephen Smalley [EMAIL PROTECTED] wrote:
 
  Or can access the data under a different path to which their profile
  does give them access, whether in its final destination or in some
  temporary file processed along the way.
 
 Well, yes. That is intentional.
 
 Your point is?

It may very well be unintentional access, especially when taking into
account wildcards in profiles and user-writable directories.

  The emphasis on never modifying applications for security in AA likewise
  has an adverse impact here, as you will ultimately have to deal with
  application mediation of access to their own objects and operations not
  directly visible to the kernel (as we have already done in SELinux for
  D-BUS and others and are doing for X).  Otherwise, your protection of
  desktop applications is easily subverted.
 
 That is an interesting argument, but not what we're discussing here.
 We're arguing filesystem access mediation.

IOW, anything that AA cannot protect against is out of scope.  An easy
escape from any criticism.

  Um, no.  It might not be able to directly open files via that path, but
  showing that it can never read or write your mail is a rather different
  matter.
 
 Yes. Your use case is different than mine.

My use case is being able to protect data reliably.  Yours?

-- 
Stephen Smalley
National Security Agency

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Stephen Smalley
On Fri, 2007-06-22 at 21:34 +1000, Neil Brown wrote:
 On Friday June 22, [EMAIL PROTECTED] wrote:
   
   Yes. Your use case is different than mine.
  
  My use case is being able to protect data reliably.  Yours?
 
 Saying protect data is nearly meaningless without a threat model.
 I bet you don't try to protect data from a direct nuclear hit, or a
 court order.
 
 
 AA has a fairly clear threat model.  It involves a flaw in a
 program being used by an external agent to cause it to use
 privileges it would not normally exercise to subvert privacy or
 integrity.
 I think this model matches a lot of real threats that real sysadmins
 have real concerns about.  I believe that the design of AA addresses
 this model quite well.
  
 
 What is your threat model?  Maybe it is just different.

The threat model you describe above is a subset of what SELinux
addresses.  And our argument is that AA does not meet that model well,
because it relies upon ambiguous and unstable identifiers for subjects
and objects (a violation of the fundamental requirements for access
control) and because it provides very incomplete mediation.

From http://www.nsa.gov/selinux/info/faq.cfm:
The Security-enhanced Linux's new features are designed to enforce the
separation of information based on confidentiality and integrity
requirements. They are designed for preventing processes from reading
data and programs, tampering with data and programs, bypassing
application security mechanisms, executing untrustworthy programs, or
interfering with other processes in violation of the system security
policy. They also help to confine the potential damage that can be
caused by malicious or flawed programs. They should also be useful for
enabling a single system to be used by users with differing security
authorizations to access multiple kinds of information with differing
security requirements without compromising those security requirements.

-- 
Stephen Smalley
National Security Agency

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Stephen Smalley
On Fri, 2007-06-22 at 01:06 -0700, John Johansen wrote:
 On Thu, Jun 21, 2007 at 04:59:54PM -0400, Stephen Smalley wrote:
  On Thu, 2007-06-21 at 21:54 +0200, Lars Marowsky-Bree wrote:
   On 2007-06-21T15:42:28, James Morris [EMAIL PROTECTED] wrote:
   
  
   And now, yes, I know AA doesn't mediate IPC or networking (yet), but
   that's a missing feature, not broken by design.
  
  The incomplete mediation flows from the design, since the pathname-based
  mediation doesn't generalize to cover all objects unlike label- or
  attribute-based mediation.  And the use the natural abstraction for
  each object type approach likewise doesn't yield any general model or
  anything that you can analyze systematically for data flow.
  
 No the incomplete mediation does not flow from the design.  We have
 deliberately focused on doing the necessary modifications for pathname
 based mediation.  The IPC and network mediation are a wip.

The fact that you have to go back to the drawing board for them is that
you didn't get the abstraction right in the first place.

 We have never claimed to be using pathname-based mediation IPC or networking.
 The natural abstraction approach does generize well enough and will
 be analyzable.

I think we must have different understandings of the words generalize
and analyzable.  Look, if I want to be able to state properties about
data flow in the system for confidentiality or integrity goals (my
secret data can never leak to unauthorized entities, my critical data
can never be corrupted/tainted by unauthorized entities - directly or
indirectly), then I need to be able to have a common reference point for
my policy.  When my policy is based on different abstractions
(pathnames, IP addresses, window ids, whatever) for different objects,
then I can no longer identify how data can flow throughout the system in
a system-wide way. 

  Um, no.  It might not be able to directly open files via that path, but
  showing that it can never read or write your mail is a rather different
  matter.
  
 Actually it can be analyzed and shown.  It is ugly to do and we
 currently don't have a tool capable of doing it, but it is possible.

No, it isn't possible when using ambiguous and unstable identifiers for
the subjects and objects, nor when mediation is incomplete.

-- 
Stephen Smalley
National Security Agency

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Neil Brown
On Friday June 22, [EMAIL PROTECTED] wrote:
  
  Yes. Your use case is different than mine.
 
 My use case is being able to protect data reliably.  Yours?

Saying protect data is nearly meaningless without a threat model.
I bet you don't try to protect data from a direct nuclear hit, or a
court order.


AA has a fairly clear threat model.  It involves a flaw in a
program being used by an external agent to cause it to use
privileges it would not normally exercise to subvert privacy or
integrity.
I think this model matches a lot of real threats that real sysadmins
have real concerns about.  I believe that the design of AA addresses
this model quite well. 

What is your threat model?  Maybe it is just different.

NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG?]Set XIP mount option on ext2 bypass check.

2007-06-22 Thread Satyam Sharma

Hi,

On 6/22/07, Arnd Bergmann [EMAIL PROTECTED] wrote:

On Thursday 21 June 2007, Carsten Otte wrote:

 This is an updated version of my bugfix patch. Yan Zheng pointed out,
 that ext2_remount lacks checking if -o xip should be enabled or not.
 This patch checks for presence of direct_access on the backing block
 device and if the blocksize meets the requirements.
 Andrew, please consider adding this patch to -mm.

 Signed-off-by: Carsten Otte [EMAIL PROTECTED]

It looks to me like a local denial of service attack in case of
user-mountable ext2 file systems in /etc/fstab.

Shouldn't that make it go into 2.6.22?


I agree. I would go on to suggest that all trivially-triggered oopsen /
panics from userspace (even if they require privileges, such as the
cat /dev/snapshot == oops issue posted last week) in fact ought to
be CVE's, and the corresponding fixes for such issues be considered
as candidates for -stable, if applicable to the current stable kernel.

Satyam
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Chris Mason
On Thu, Jun 21, 2007 at 09:06:40PM -0400, James Morris wrote:
 On Thu, 21 Jun 2007, Chris Mason wrote:
 
   The incomplete mediation flows from the design, since the pathname-based
   mediation doesn't generalize to cover all objects unlike label- or
   attribute-based mediation.  And the use the natural abstraction for
   each object type approach likewise doesn't yield any general model or
   anything that you can analyze systematically for data flow.
  
  This feels quite a lot like a repeat of the discussion at the kernel
  summit.  There are valid uses for path based security, and if they don't
  fit your needs, please don't use them.  But, path based semantics alone
  are not a valid reason to shut out AA.
 
 The validity or otherwise of pathname access control is not being 
 discussed here.
 
 The point is that the pathname model does not generalize, and that 
 AppArmor's inability to provide adequate coverage of the system is a 
 design issue arising from this.

I'm sorry, but I don't see where in the paragraphs above you aren't
making a general argument against the pathname model.

-chris

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Lars Marowsky-Bree
On 2007-06-22T07:53:47, Stephen Smalley [EMAIL PROTECTED] wrote:

  No the incomplete mediation does not flow from the design.  We have
  deliberately focused on doing the necessary modifications for pathname
  based mediation.  The IPC and network mediation are a wip.
 The fact that you have to go back to the drawing board for them is that
 you didn't get the abstraction right in the first place.

That's an interesting claim, however I don't think it holds. AA was
designed to mediate file access in a form which is intuitive to admins.

It's to be expected that it doesn't directly apply to mediating other
forms of access.

 I think we must have different understandings of the words generalize
 and analyzable.  Look, if I want to be able to state properties about
 data flow in the system for confidentiality or integrity goals (my
 secret data can never leak to unauthorized entities, my critical data
 can never be corrupted/tainted by unauthorized entities - directly or
 indirectly),

I seem to think that this is not what AA is trying to do, so evaluating
it in that context doesn't seem useful. It's like saying a screw driver
isn't a hammer, so it is useless because you have a nail.


Regards,
Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Lars Marowsky-Bree
On 2007-06-22T08:41:51, Stephen Smalley [EMAIL PROTECTED] wrote:

 The issue arises even for a collection of collaborating confined
 processes with different profiles, and the collaboration may be
 intentional or unintentional (in the latter case, one of the confined
 processes may be taking advantage of known behavior of another process
 and shared access by both to some resource in order to induce a
 particular behavior in that process).

Point taken; the point remains is that you need at least several
(intentionally or not) cooperating processes. The chances of this are
significantly lower than a single process exploit.

 And remember that confinement isn't just about limiting untrusted
 processes but also about protecting trusted processes; limiting the
 inputs and outputs of a trusted process can be just as important to
 preventing exploitation.

True. It'd appear that if you want that, you'd specify the AA profile so
that it doesn't include directories/files writable by untrusted
processes.

 Sorry, do you mean the server as in the server system or as in the
 server daemon?  For the former, I'd agree - we would want SELinux
 policy applied on the server as well as the client to ensure that the
 data is being protected consistently throughout and that the server is
 not misrepresenting the security guarantees expected by the clients.
 Providing an illusion of confinement on each client without any
 corresponding protection on the server system would be very prone to
 bypass.  For the latter, the kernel can only truly confine application
 code, not in-kernel threads, although we can subject the in-kernel nfsd
 to permission checking as a robustness check.  We've always noted that
 SELinux does depend on the correctness of the kernel.

Oh, you're saying that this threat is out-of-scope? ;-)

 Every time we've noted an issue with AA, the answer has been that it is
 out of scope.  Yet the public documentation for AA misrepresents the
 situation and its comparisons with SELinux conveniently ignore its
 limitations.

I'm sorry. Again, I'm not responsible for marketing comparisons made by
anyone else, nor do I think they should apply to this discussion where
we're discussing the merits of what AA actually _does_; not what
someone's marketing claims it does - otherwise I'll go dig out marketing
claims about SELinux too ;-)

And, coming at it from that direction, I feel it does something useful.

Note that here we've already strayed from the focus of the discussion;
we're no longer arguing the implementation is ugly/broken, but you're
claiming doesn't do what I need - which I'm not disagreeing with. It
doesn't do what you want. Which is why you have SELinux, and it's going
to stay. Fine.

If we assume that the users who run it do have a need / use case for it
which they can't solve differently, we should really get back to the
discussion of how those needs can be met or provided by Linux in a
feasible way.

  Your use case mandates complete system-wide mediation, because you want
  full data flow analysis. Mine doesn't.
 Then yours isn't mandatory access control, nor is it confinement.  

I apologize for not using the word confinement in the way you expect
it to be used. I certainly don't want to imply it does do things it
doesn't. Keep in mind I'm not a native speaker, so nuances do get lost
sometimes; nor do I have long years of experience in the security
field. Thanks for clearing this up.

So agreed, it is not confinement nor MAC.

Would it be more appropriate if I used the word restricts or
constrains?


Regards,
Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Stephen Smalley
On Fri, 2007-06-22 at 13:37 +0200, Lars Marowsky-Bree wrote:
 On 2007-06-22T07:19:39, Stephen Smalley [EMAIL PROTECTED] wrote:
 
Or can access the data under a different path to which their profile
does give them access, whether in its final destination or in some
temporary file processed along the way.
   Well, yes. That is intentional.
   
   Your point is?
  
  It may very well be unintentional access, especially when taking into
  account wildcards in profiles and user-writable directories.
 
 Again, you're saying that AA is not confining unconfined processes.
 That's a given. If unconfined processes assist confined processes in
 breeching their confinement, yes, that is not mediated.

The issue arises even for a collection of collaborating confined
processes with different profiles, and the collaboration may be
intentional or unintentional (in the latter case, one of the confined
processes may be taking advantage of known behavior of another process
and shared access by both to some resource in order to induce a
particular behavior in that process).

And remember that confinement isn't just about limiting untrusted
processes but also about protecting trusted processes; limiting the
inputs and outputs of a trusted process can be just as important to
preventing exploitation.

 You're basically saying that anything but system-wide mandatory access
 control is pointless.

Mandatory access control as historically understood has always meant
system-wide.  As well as always being based on the security properties
of the data (so that global and persistent protection of the data is
possible).  You can't actually use the terms 'mandatory access control'
or 'confinement' for AppArmor unless you redefine them.

 If you want to go down that route, what is your reply to me saying that
 SELinux cannot mediate NFS mounts - if the server is not confined using
 SELinux as well? The argument is really, really moot and pointless. Yes,
 unconfined actions can affect confined processes. 

Sorry, do you mean the server as in the server system or as in the
server daemon?  For the former, I'd agree - we would want SELinux
policy applied on the server as well as the client to ensure that the
data is being protected consistently throughout and that the server is
not misrepresenting the security guarantees expected by the clients.
Providing an illusion of confinement on each client without any
corresponding protection on the server system would be very prone to
bypass.  For the latter, the kernel can only truly confine application
code, not in-kernel threads, although we can subject the in-kernel nfsd
to permission checking as a robustness check.  We've always noted that
SELinux does depend on the correctness of the kernel.

   That is an interesting argument, but not what we're discussing here.
   We're arguing filesystem access mediation.
  IOW, anything that AA cannot protect against is out of scope.  An easy
  escape from any criticism.
 
 I'm quite sure that this reply is not AA specific as you try to make it
 appear.

Every time we've noted an issue with AA, the answer has been that it is
out of scope.  Yet the public documentation for AA misrepresents the
situation and its comparisons with SELinux conveniently ignore its
limitations.

   Yes. Your use case is different than mine.
  My use case is being able to protect data reliably.  Yours?
 
 I want to restrict certain possibly untrusted applications and
 network-facing services from accessing certain file patterns, because as
 a user and admin, that's the mindset I'm used to. I might be interested
 in mediating other channels too, but the files are what I really care
 about. I'm inclined to trust the other processes.
 
 Your use case mandates complete system-wide mediation, because you want
 full data flow analysis. Mine doesn't.

Then yours isn't mandatory access control, nor is it confinement.  

-- 
Stephen Smalley
National Security Agency

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Stephen Smalley
On Fri, 2007-06-22 at 14:42 +0200, Lars Marowsky-Bree wrote:
 On 2007-06-22T07:53:47, Stephen Smalley [EMAIL PROTECTED] wrote:
 
   No the incomplete mediation does not flow from the design.  We have
   deliberately focused on doing the necessary modifications for pathname
   based mediation.  The IPC and network mediation are a wip.
  The fact that you have to go back to the drawing board for them is that
  you didn't get the abstraction right in the first place.
 
 That's an interesting claim, however I don't think it holds. AA was
 designed to mediate file access in a form which is intuitive to admins.
 
 It's to be expected that it doesn't directly apply to mediating other
 forms of access.
 
  I think we must have different understandings of the words generalize
  and analyzable.  Look, if I want to be able to state properties about
  data flow in the system for confidentiality or integrity goals (my
  secret data can never leak to unauthorized entities, my critical data
  can never be corrupted/tainted by unauthorized entities - directly or
  indirectly),
 
 I seem to think that this is not what AA is trying to do, so evaluating
 it in that context doesn't seem useful. It's like saying a screw driver
 isn't a hammer, so it is useless because you have a nail.

Again, in that case, please remove all uses of the terms mandatory
access control, confinement and integrity protection from AA
documentation and code.

-- 
Stephen Smalley
National Security Agency

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread James Morris
On Fri, 22 Jun 2007, Chris Mason wrote:

  The validity or otherwise of pathname access control is not being 
  discussed here.
  
  The point is that the pathname model does not generalize, and that 
  AppArmor's inability to provide adequate coverage of the system is a 
  design issue arising from this.
 
 I'm sorry, but I don't see where in the paragraphs above you aren't
 making a general argument against the pathname model.

There are two distinct concepts here.

A. Pathname labeling - applying access control to pathnames to objects, 
rather than labeling the objects themselves.

Think of this as, say, securing your house by putting a gate in the street 
in front of the house, regardless of how many other possible paths there 
are to the house via other streets, adjoining properties etc.

Pathname labeling and mediation is simply mediating a well-known path to 
the object.  In this analogy, object labeling would instead ensure that 
all of the accessible doors, windows and other entrances of the house were 
locked, so that someone trying to break in from the rear alley would 
not get in simply by bypassing the front gate and opening any door.

What you do with AppArmor, instead of addressing the problem, is just 
redefine the environment along the lines of set your house into a rock 
wall so there is only one path to it.


B. Pathname access control as a general abstraction for OS security.

Which is what was being discussed above, in response to a question from 
Lars about technical issues, and that this _model_ doesn't generalize to 
the rest of the OS, regardless of whether you think the mechanism of 
pathname labeling itself is appropriate or not.

In any case, clarifying such a distinction should not obscure the central 
issue, which is: AppArmor's design is broken.

General users, many kernel developers, and even security researchers who 
have not yet looked under the covers [1], are probably unaware that the 
confinement claims being made about AppArmor's confinement capabilities 
are simply not possible with either its model or implementation.

To quote from:

http://www.novell.com/linux/security/apparmor/

  AppArmor gives you network application security via mandatory access 
   control for programs, protecting against the exploitation of software 
   flaws and compromised systems. AppArmor includes everything you need to 
   provide effective containment for programs (including those that run as 
   root) to thwart attempted exploits and even zero-day attacks.

This is not accurate in any sense of the term containment of mandatory 
access control that I've previously encountered.

The fact that it doesn't work as expected does not arise simply from 
missing features or being different.  It arises from the design of the 
system, which uses a pathname abstraction, where, even if we agree to 
ignore issue (1) above, still does not work, because only filesystem 
interactions are mediated.

AppArmor policy does not reflect the behavior of the system.

The simple policy that users can so effortlessly manipulate is simple 
because it is wrong, and deliberately so.

The design of the AppArmor is based on _appearing simple_, but at the 
expense of completeness and thus correctness.


[1] http://lkml.org/lkml/2007/4/19/357

  My gosh, you're right.  What the heck?  With all due respect to the
   developers of AppArmor, I can't help thinking that that's pretty lame.
   I think this raises substantial questions about the value of AppArmor.
   What is the point of having a jail if it leaves gaping holes that
   malicious code could use to escape?

   And why isn't this documented clearly, with the implications fully
   explained?  - David Wagner, http://www.cs.berkeley.edu/~daw/
 

Indeed.



- James
-- 
James Morris
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Stephen Smalley
On Fri, 2007-06-22 at 14:54 +0200, Lars Marowsky-Bree wrote:
 On 2007-06-22T08:41:51, Stephen Smalley [EMAIL PROTECTED] wrote:
  Sorry, do you mean the server as in the server system or as in the
  server daemon?  For the former, I'd agree - we would want SELinux
  policy applied on the server as well as the client to ensure that the
  data is being protected consistently throughout and that the server is
  not misrepresenting the security guarantees expected by the clients.
  Providing an illusion of confinement on each client without any
  corresponding protection on the server system would be very prone to
  bypass.  For the latter, the kernel can only truly confine application
  code, not in-kernel threads, although we can subject the in-kernel nfsd
  to permission checking as a robustness check.  We've always noted that
  SELinux does depend on the correctness of the kernel.
 
 Oh, you're saying that this threat is out-of-scope? ;-)

Kernel flaws aren't something we can address (reliably and in general)
via a kernel mechanism.  Versus a threat that can be addressed by kernel
mechanism, like providing complete mediation and unambiguous access
control.  There is a difference.  And we aren't ignoring the kernel
correctness problem - there is separate work in that space, but it
requires separate mechanism outside of SELinux itself.

 Note that here we've already strayed from the focus of the discussion;
 we're no longer arguing the implementation is ugly/broken, but you're
 claiming doesn't do what I need - which I'm not disagreeing with. It
 doesn't do what you want. Which is why you have SELinux, and it's going
 to stay. Fine.
 
 If we assume that the users who run it do have a need / use case for it
 which they can't solve differently, we should really get back to the
 discussion of how those needs can be met or provided by Linux in a
 feasible way.

Part of the discussion has been whether those users' needs could be
solved better via SELinux, whether via improved userspace tools (ala
seedit possibly with an enhanced restorecond) or via extended kernel
mechanism (ala named type transitions and some kind of inheritance
model).  It does however seem like there is a fundamental conflict there
on renaming behavior; I do not believe that file protections should
automatically change in the kernel upon rename and should require
explicit userspace action.  The question though is whether that renaming
behavior in AA is truly a user requirement or a manufactured (i.e.
made-up) one, and whether it is truly a runtime requirement or just a
policy creation time one (the latter being adequately addressed by
userspace relabeling).

If that behavior is truly needed (a point I wouldn't concede), then the
discussion does reduce down to the right implementation method.  There
it appears that the primary objection is to regenerating full pathname
strings and matching them versus some kind of incremental matching
either during lookup itself or via a reverse match as suggested.  That
discussion is principally one in which the vfs folks should be engaged,
not me.

   Your use case mandates complete system-wide mediation, because you want
   full data flow analysis. Mine doesn't.
  Then yours isn't mandatory access control, nor is it confinement.  
 
 I apologize for not using the word confinement in the way you expect
 it to be used. I certainly don't want to imply it does do things it
 doesn't. Keep in mind I'm not a native speaker, so nuances do get lost
 sometimes; nor do I have long years of experience in the security
 field. Thanks for clearing this up.
 
 So agreed, it is not confinement nor MAC.
 
 Would it be more appropriate if I used the word restricts or
 constrains?

Yes.  Or simply controls file accesses and capability usage.

-- 
Stephen Smalley
National Security Agency

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Stephen Smalley
On Fri, 2007-06-22 at 09:22 -0400, Stephen Smalley wrote:
 On Fri, 2007-06-22 at 14:54 +0200, Lars Marowsky-Bree wrote:
  On 2007-06-22T08:41:51, Stephen Smalley [EMAIL PROTECTED] wrote:
   Sorry, do you mean the server as in the server system or as in the
   server daemon?  For the former, I'd agree - we would want SELinux
   policy applied on the server as well as the client to ensure that the
   data is being protected consistently throughout and that the server is
   not misrepresenting the security guarantees expected by the clients.
   Providing an illusion of confinement on each client without any
   corresponding protection on the server system would be very prone to
   bypass.  For the latter, the kernel can only truly confine application
   code, not in-kernel threads, although we can subject the in-kernel nfsd
   to permission checking as a robustness check.  We've always noted that
   SELinux does depend on the correctness of the kernel.
  
  Oh, you're saying that this threat is out-of-scope? ;-)
 
 Kernel flaws aren't something we can address (reliably and in general)
 via a kernel mechanism.  Versus a threat that can be addressed by kernel
 mechanism, like providing complete mediation and unambiguous access
 control.  There is a difference.  And we aren't ignoring the kernel
 correctness problem - there is separate work in that space, but it
 requires separate mechanism outside of SELinux itself.
 
  Note that here we've already strayed from the focus of the discussion;
  we're no longer arguing the implementation is ugly/broken, but you're
  claiming doesn't do what I need - which I'm not disagreeing with. It
  doesn't do what you want. Which is why you have SELinux, and it's going
  to stay. Fine.
  
  If we assume that the users who run it do have a need / use case for it
  which they can't solve differently, we should really get back to the
  discussion of how those needs can be met or provided by Linux in a
  feasible way.
 
 Part of the discussion has been whether those users' needs could be
 solved better via SELinux, whether via improved userspace tools (ala
 seedit possibly with an enhanced restorecond) or via extended kernel
 mechanism (ala named type transitions and some kind of inheritance
 model).  It does however seem like there is a fundamental conflict there
 on renaming behavior; I do not believe that file protections should
 automatically change in the kernel upon rename and should require
 explicit userspace action.  The question though is whether that renaming
 behavior in AA is truly a user requirement or a manufactured (i.e.
 made-up) one, and whether it is truly a runtime requirement or just a
 policy creation time one (the latter being adequately addressed by
 userspace relabeling).

I suppose there is also a question of whether that kind of model
wouldn't be better implemented as an ACL model with implicit
inheritance, e.g. if you specify an ACL on a directory, then all files
accessed via that directory are controlled in accordance with that ACL
unless they have their own more specific ACL, and if you move one of
those files to a different directory, then they automatically pick up
the protection of the new parent.  Doesn't require the kernel to be
matching pathname strings, just walking up the tree to determine the
closest ancestor with an explicit ACL on it.

 
 If that behavior is truly needed (a point I wouldn't concede), then the
 discussion does reduce down to the right implementation method.  There
 it appears that the primary objection is to regenerating full pathname
 strings and matching them versus some kind of incremental matching
 either during lookup itself or via a reverse match as suggested.  That
 discussion is principally one in which the vfs folks should be engaged,
 not me.
 
Your use case mandates complete system-wide mediation, because you want
full data flow analysis. Mine doesn't.
   Then yours isn't mandatory access control, nor is it confinement.  
  
  I apologize for not using the word confinement in the way you expect
  it to be used. I certainly don't want to imply it does do things it
  doesn't. Keep in mind I'm not a native speaker, so nuances do get lost
  sometimes; nor do I have long years of experience in the security
  field. Thanks for clearing this up.
  
  So agreed, it is not confinement nor MAC.
  
  Would it be more appropriate if I used the word restricts or
  constrains?
 
 Yes.  Or simply controls file accesses and capability usage.
 
-- 
Stephen Smalley
National Security Agency

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread James Morris
On Fri, 22 Jun 2007, Chris Mason wrote:

 But, this is a completely different discussion than if AA is
 solving problems in the wild for its intended audience, or if the code
 is somehow flawed and breaking other parts of the kernel.

Is its intended audience aware of its limitiations?  Lars has just 
acknowledged that it does not implement mandatory access control, for one.

Until people understand these issues, they certainly need to be addressed 
in the context of upstream merge.

 We've been over the AA is different discussion in threads about a
 billion times, and at the last kernel summit.

I don't believe that people at the summit were adequately informed on the 
issue, and from several accounts I've heard, Stephen Smalley was 
effectively cut off before he could even get to his second slide.

 I think Lars and others have done a pretty good job of describing the 
 problems they are trying to solve, can we please move on to discussing 
 technical issues around that?

Keep in mind that this current thread arose from Greg KH asking about 
whether AppArmor could effectively be implemented via SELinux and 
userspace labeling.

Some of us took the time to perform analysis and then provide feedback on 
this, in good faith.

The underlying issues only came up again in response to an inflammatory 
post by Lars.  If you want to avoid discussions of AppArmor's design, then 
I suggest taking it up with those who initiate them.



- James
-- 
James Morris
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Casey Schaufler

--- Stephen Smalley [EMAIL PROTECTED] wrote:

 
 Mandatory access control as historically understood has always meant
 system-wide.

Chapter and verse: TCSEC 3.1.1.4 Mandatory Access Control
  The TCB shall enforce a mandatory access control policy over
   all subjects and storage objects under its control.

Chapter and verse: TCSEC 3.2.1.4 Mandatory Access Control
  The TCB shall enforce a mandatory access control policy over
   all resources that are directly or indirectly accessible by
   subjects external to the TCB.

The first reference is in the definition of a B1 system, while the
second is for a B2 system. It was made clear to those of us
doing TCSEC evaluations that there is a very real distintion between
these two requirements. In the history that I've been through,
starting in 1987, the B1 requirement has been read to allow for
things that are not storage objects to be uncontrolled while the
B2 requirement does not. If everything is a storage object, which
is the approach everybody took, yes, you're looking at system wide.
If, on the other hand, you were to take a different model approach
and say that networking does not use storage objects you would not
have to account for them under the B1 rules, while you would still
have to for B2. Historically, no one succeeded with B2, and the
mindset of B1 prevailed. So, historically, the understanding was
that it was easier to declare everything a storage object and code
up some MAC for it than it was to provide a security model that
explained networking as it really works. I suggest that if AA wants
to declare that as far as they are concerned sockets, ports, and
packets are none of them storage objects but are instead pregnant
weasels that is their peragotive as system designers and that
a B1 evaluation team would have accepted that, provided sufficient
evidence and explaination of the relevence of pregnant weasels
was provided. It would not have worked at B2, but historically
everyone understood that B2 was out of reach.

Stephen is correct in that historically everyone implemented system
that put everything under MAC, but is not in that it was well
understood that if you could define something as not being a
storage object you didn't have to submit it to MAC. Then as now
it was easier to get people to code MAC than to get them to write
papers about the security properties of pregnant weasels.



Casey Schaufler
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Chris Mason
On Fri, Jun 22, 2007 at 10:23:03AM -0400, James Morris wrote:
 On Fri, 22 Jun 2007, Chris Mason wrote:
 
  But, this is a completely different discussion than if AA is
  solving problems in the wild for its intended audience, or if the code
  is somehow flawed and breaking other parts of the kernel.
 
 Is its intended audience aware of its limitiations?  Lars has just 
 acknowledged that it does not implement mandatory access control, for one.
 
 Until people understand these issues, they certainly need to be addressed 
 in the context of upstream merge.

It is definitely useful to clearly understand the intended AA use cases
during the merge.

 
  We've been over the AA is different discussion in threads about a
  billion times, and at the last kernel summit.
 
 I don't believe that people at the summit were adequately informed on the 
 issue, and from several accounts I've heard, Stephen Smalley was 
 effectively cut off before he could even get to his second slide.

I'm sure people there will have a different versions of events.  The
one part that was discussed was if pathname based security was
useful, and a number of the people in the room (outside of 
novell) said it was.  Now, it could be that nobody wanted to argue
anymore, since most opinions had come out on one list or another by
then.  

But as someone who doesn't use either SElinux or AA, I really hope
we can get past the part of the debate where:

while(1)
AA) we think we're making users happy with pathname security
SELINUX) pathname security sucks

So, yes Greg got it started and Lars is a well known trouble maker, and
I completely understand if you want to say no thank you to an selinux
based AA ;)  The models are different and it shouldn't be a requirement
that they try to use the same underlying mechanisms.

-chris

-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread david

On Fri, 22 Jun 2007, James Morris wrote:


On Fri, 22 Jun 2007, Chris Mason wrote:


But, this is a completely different discussion than if AA is
solving problems in the wild for its intended audience, or if the code
is somehow flawed and breaking other parts of the kernel.


Is its intended audience aware of its limitiations?  Lars has just
acknowledged that it does not implement mandatory access control, for one.

Until people understand these issues, they certainly need to be addressed
in the context of upstream merge.


there are always going to be people who misunderstand things. by this 
logic nothing can ever be merged that can be misused.


at least some of the intended audience (including me) do understand the 
limits and still consider the result useful.


David Lang
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread david

On Fri, 22 Jun 2007, Stephen Smalley wrote:


On Fri, 2007-06-22 at 01:06 -0700, John Johansen wrote:

On Thu, Jun 21, 2007 at 04:59:54PM -0400, Stephen Smalley wrote:

On Thu, 2007-06-21 at 21:54 +0200, Lars Marowsky-Bree wrote:

On 2007-06-21T15:42:28, James Morris [EMAIL PROTECTED] wrote:




And now, yes, I know AA doesn't mediate IPC or networking (yet), but
that's a missing feature, not broken by design.


The incomplete mediation flows from the design, since the pathname-based
mediation doesn't generalize to cover all objects unlike label- or
attribute-based mediation.  And the use the natural abstraction for
each object type approach likewise doesn't yield any general model or
anything that you can analyze systematically for data flow.


No the incomplete mediation does not flow from the design.  We have
deliberately focused on doing the necessary modifications for pathname
based mediation.  The IPC and network mediation are a wip.


The fact that you have to go back to the drawing board for them is that
you didn't get the abstraction right in the first place.


or it just means that the tool to regulat the network is different from 
the tool to regulate the filesystem.


oh, by the way. that's how the rest of the *nix world works. firewall 
rules apply to networking, filesystem permissions and ACLs apply to the 
filesystem.


this is like climing that the latest improvement to ipsec shouldn't go in 
becouse it down't allow you to handle things the same way that you handle 
temp files or a serial port.



We have never claimed to be using pathname-based mediation IPC or networking.
The natural abstraction approach does generize well enough and will
be analyzable.


I think we must have different understandings of the words generalize
and analyzable.  Look, if I want to be able to state properties about
data flow in the system for confidentiality or integrity goals (my
secret data can never leak to unauthorized entities, my critical data
can never be corrupted/tainted by unauthorized entities - directly or
indirectly), then I need to be able to have a common reference point for
my policy.  When my policy is based on different abstractions
(pathnames, IP addresses, window ids, whatever) for different objects,
then I can no longer identify how data can flow throughout the system in
a system-wide way.


if you are doing a system-wide analysis then you are correct.

the AA approach is to start with the exposed items and limit the damage 
that can be done to you.


sysadmins already think in terms of paths and what can access that path 
(directory permissions), AA extends this in a very natural way and doesn't 
require any special tools or extra steps for normal administration. As a 
result sysadmins are far more likely to use this then they are to touch 
anything that requires that they do a full system analysis before they 
start.


another advantage is that since the policies are independant of each other 
it becomes very easy for software to include sample policies with the 
source.



Um, no.  It might not be able to directly open files via that path, but
showing that it can never read or write your mail is a rather different
matter.


Actually it can be analyzed and shown.  It is ugly to do and we
currently don't have a tool capable of doing it, but it is possible.


No, it isn't possible when using ambiguous and unstable identifiers for
the subjects and objects, nor when mediation is incomplete.


it is possible to say that without assistance from an outside process the 
process cannot access the files containing your mail.


if there is some other method of accessing the content no filesystem-level 
thing can know about it (for example, if another process is a pop server 
that requires no password). but I don't beleive that SELinux policies as 
distributed by any vendor would prevent this (yes, it would be possible 
for a tailored policy to prevent it, but if the policy is so complex that 
only distro staff should touch it that doesn't matter in real life)


David Lang
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/26] ext3: remove extra IS_RDONLY() check

2007-06-22 Thread Dave Hansen

ext3_change_inode_journal_flag() is only called from one
location: ext3_ioctl(EXT3_IOC_SETFLAGS).  That ioctl
case already has a IS_RDONLY() call in it so this one
is superfluous.


Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/ext3/inode.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN fs/ext3/inode.c~ext3-extra-ro-check fs/ext3/inode.c
--- lxc/fs/ext3/inode.c~ext3-extra-ro-check 2007-06-21 23:23:11.0 
-0700
+++ lxc-dave/fs/ext3/inode.c2007-06-21 23:23:12.0 -0700
@@ -3182,7 +3182,7 @@ int ext3_change_inode_journal_flag(struc
 */
 
journal = EXT3_JOURNAL(inode);
-   if (is_journal_aborted(journal) || IS_RDONLY(inode))
+   if (is_journal_aborted(journal))
return -EROFS;
 
journal_lock_updates(journal);
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/26] ext4: remove extra IS_RDONLY() check

2007-06-22 Thread Dave Hansen

ext4_change_inode_journal_flag() is only called from one
location: ext4_ioctl(EXT3_IOC_SETFLAGS).  That ioctl
case already has a IS_RDONLY() call in it so this one
is superfluous.


Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/ext4/inode.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN fs/ext4/inode.c~ext4-extra-ro-check fs/ext4/inode.c
--- lxc/fs/ext4/inode.c~ext4-extra-ro-check 2007-06-21 23:23:12.0 
-0700
+++ lxc-dave/fs/ext4/inode.c2007-06-21 23:23:12.0 -0700
@@ -3196,7 +3196,7 @@ int ext4_change_inode_journal_flag(struc
 */
 
journal = EXT4_JOURNAL(inode);
-   if (is_journal_aborted(journal) || IS_RDONLY(inode))
+   if (is_journal_aborted(journal))
return -EROFS;
 
jbd2_journal_lock_updates(journal);
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 00/26] Mount writer count and read-only bind mounts

2007-06-22 Thread Dave Hansen
The largest change from last time is in patch 25/26.  It has
been reworked to use per-cpu lock instead of a per-node one.
See the patch for more details.

I would also appreciate any close review of patch 08/27.  It
touches a lot of code, and I'm a bit worried that a buglet or
two could have snuck in there.

Note that there are a few stragglers left using IS_RDONLY()
in reiser4 and in some other low-level filesystem code.  The
reiser4 asserts should be just fine left the way they are,
plus reiser4 needs quite a bit of work before merging anyway.
The ntfs usage appears to be internal, and not related to
user activity.

This patch survives running ltp as well as a home-grown set
of filesystem operations here:

http://sr71.net/~dave/linux/robind-test.sh

---

Why do we need r/o bind mounts?

This feature allows a read-only view into a read-write filesystem.
In the process of doing that, it also provides infrastructure for
keeping track of the number of writers to any given mount.

This has a number of uses.  It allows chroots to have parts of
filesystems writable.  It will be useful for containers in the future
because users may have root inside a container, but should not
be allowed to write to somefilesystems.  This also replaces 
patches that vserver has had out of the tree for several years.

It allows security enhancement by making sure that parts of
your filesystem read-only (such as when you don't trust your
FTP server), when you don't want to have entire new filesystems
mounted, or when you want atime selectively updated.
I've been using the following script to test that the feature is
working as desired.  It takes a directory and makes a regular
bind and a r/o bind mount of it.  It then performs some normal
filesystem operations on the three directories, including ones
that are expected to fail, like creating a file on the r/o
mount.


Signed-off-by: Dave Hansen [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/26] r/o bind mounts: elevate write count for some ioctls

2007-06-22 Thread Dave Hansen

Some ioctl()s can cause writes to the filesystem.  Take
these, and make them use mnt_want/drop_write() instead.

We need to pass the filp one layer deeper in XFS, but
somebody _just_ pulled it out in February because nobody
was using it, so I don't feel guilty for adding it back.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/ext2/ioctl.c  |   46 +-
 lxc-dave/fs/ext3/ioctl.c  |  100 +---
 lxc-dave/fs/ext4/ioctl.c  |  105 +-
 lxc-dave/fs/fat/file.c|   10 +--
 lxc-dave/fs/hfsplus/ioctl.c   |   39 +++-
 lxc-dave/fs/jfs/ioctl.c   |   33 ++
 lxc-dave/fs/ocfs2/ioctl.c |   11 +--
 lxc-dave/fs/reiserfs/ioctl.c  |   53 +++--
 lxc-dave/fs/xfs/linux-2.6/xfs_ioctl.c |   15 +++-
 lxc-dave/fs/xfs/linux-2.6/xfs_iops.c  |7 --
 lxc-dave/fs/xfs/linux-2.6/xfs_lrw.c   |9 ++
 11 files changed, 272 insertions(+), 156 deletions(-)

diff -puN fs/ext2/ioctl.c~05-24-ioctl-mnt-takers fs/ext2/ioctl.c
--- lxc/fs/ext2/ioctl.c~05-24-ioctl-mnt-takers  2007-06-22 10:05:25.0 
-0700
+++ lxc-dave/fs/ext2/ioctl.c2007-06-22 10:58:25.0 -0700
@@ -12,6 +12,7 @@
 #include linux/time.h
 #include linux/sched.h
 #include linux/compat.h
+#include linux/mount.h
 #include linux/smp_lock.h
 #include asm/current.h
 #include asm/uaccess.h
@@ -22,6 +23,7 @@ int ext2_ioctl (struct inode * inode, st
 {
struct ext2_inode_info *ei = EXT2_I(inode);
unsigned int flags;
+   int ret;
 
ext2_debug (cmd = %u, arg = %lu\n, cmd, arg);
 
@@ -32,14 +34,19 @@ int ext2_ioctl (struct inode * inode, st
case EXT2_IOC_SETFLAGS: {
unsigned int oldflags;
 
-   if (IS_RDONLY(inode))
-   return -EROFS;
-
-   if ((current-fsuid != inode-i_uid)  !capable(CAP_FOWNER))
-   return -EACCES;
+   ret = mnt_want_write(filp-f_vfsmnt);
+   if (ret)
+   return ret;
+
+   if ((current-fsuid != inode-i_uid)  !capable(CAP_FOWNER)) {
+   ret = -EACCES;
+   goto setflags_out;
+   }
 
-   if (get_user(flags, (int __user *) arg))
-   return -EFAULT;
+   if (get_user(flags, (int __user *) arg)) {
+   ret = -EFAULT;
+   goto setflags_out;
+   }
 
if (!S_ISDIR(inode-i_mode))
flags = ~EXT2_DIRSYNC_FL;
@@ -56,7 +63,8 @@ int ext2_ioctl (struct inode * inode, st
if ((flags ^ oldflags)  (EXT2_APPEND_FL | EXT2_IMMUTABLE_FL)) {
if (!capable(CAP_LINUX_IMMUTABLE)) {
mutex_unlock(inode-i_mutex);
-   return -EPERM;
+   ret = -EPERM;
+   goto setflags_out;
}
}
 
@@ -68,20 +76,26 @@ int ext2_ioctl (struct inode * inode, st
ext2_set_inode_flags(inode);
inode-i_ctime = CURRENT_TIME_SEC;
mark_inode_dirty(inode);
-   return 0;
+setflags_out:
+   mnt_drop_write(filp-f_vfsmnt);
+   return ret;
}
case EXT2_IOC_GETVERSION:
return put_user(inode-i_generation, (int __user *) arg);
case EXT2_IOC_SETVERSION:
if ((current-fsuid != inode-i_uid)  !capable(CAP_FOWNER))
return -EPERM;
-   if (IS_RDONLY(inode))
-   return -EROFS;
-   if (get_user(inode-i_generation, (int __user *) arg))
-   return -EFAULT; 
-   inode-i_ctime = CURRENT_TIME_SEC;
-   mark_inode_dirty(inode);
-   return 0;
+   ret = mnt_want_write(filp-f_vfsmnt);
+   if (ret)
+   return ret;
+   if (get_user(inode-i_generation, (int __user *) arg)) {
+   ret = -EFAULT;
+   } else {
+   inode-i_ctime = CURRENT_TIME_SEC;
+   mark_inode_dirty(inode);
+   }
+   mnt_drop_write(filp-f_vfsmnt);
+   return ret;
default:
return -ENOTTY;
}
diff -puN fs/ext3/ioctl.c~05-24-ioctl-mnt-takers fs/ext3/ioctl.c
--- lxc/fs/ext3/ioctl.c~05-24-ioctl-mnt-takers  2007-06-22 10:05:25.0 
-0700
+++ lxc-dave/fs/ext3/ioctl.c2007-06-22 10:59:02.0 -0700
@@ -12,6 +12,7 @@
 #include linux/capability.h
 #include linux/ext3_fs.h
 #include linux/ext3_jbd.h
+#include linux/mount.h
 #include linux/time.h
 #include linux/compat.h
 #include linux/smp_lock.h
@@ -37,14 +38,19 @@ int ext3_ioctl (struct inode * inode, st
unsigned int oldflags;
  

[PATCH 09/26] make access() use mnt check

2007-06-22 Thread Dave Hansen

It is OK to let access() go without using a mnt_want/drop_write()
pair because it doesn't actually do writes to the filesystem,
and it is inherently racy anyway.  This is a rare case when it is
OK to use __mnt_is_readonly() directly.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/open.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN fs/open.c~make-access-use-helper fs/open.c
--- lxc/fs/open.c~make-access-use-helper2007-06-21 23:23:17.0 
-0700
+++ lxc-dave/fs/open.c  2007-06-21 23:23:17.0 -0700
@@ -395,7 +395,7 @@ asmlinkage long sys_faccessat(int dfd, c
   special_file(nd.dentry-d_inode-i_mode))
goto out_path_release;
 
-   if(IS_RDONLY(nd.dentry-d_inode))
+   if (__mnt_is_readonly(nd.mnt))
res = -EROFS;
 
 out_path_release:
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/26] filesystem helpers for custom 'struct file's

2007-06-22 Thread Dave Hansen

Christoph H. says this stands on its own and can go in before the
rest of the r/o bind mount set.  

---

Some filesystems forego the vfs and may_open() and create their
own 'struct file's.

This patch creates a couple of helper functions which can be
used by these filesystems, and will provide a unified place
which the r/o bind mount code may patch.

Also, rename two existing, static-scope init_file() to less
generic names.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/configfs/dir.c|5 +++--
 lxc-dave/fs/file_table.c  |   34 ++
 lxc-dave/fs/hugetlbfs/inode.c |   22 +-
 lxc-dave/fs/sysfs/dir.c   |4 ++--
 lxc-dave/include/linux/file.h |9 +
 lxc-dave/ipc/shm.c|   13 +
 lxc-dave/mm/shmem.c   |7 ++-
 lxc-dave/mm/tiny-shmem.c  |   19 +++
 lxc-dave/net/socket.c |   19 ++-
 9 files changed, 81 insertions(+), 51 deletions(-)

diff -puN fs/configfs/dir.c~01-24-filesystem-helpers-for-custom-struct-file-s 
fs/configfs/dir.c
--- lxc/fs/configfs/dir.c~01-24-filesystem-helpers-for-custom-struct-file-s 
2007-06-21 23:23:13.0 -0700
+++ lxc-dave/fs/configfs/dir.c  2007-06-21 23:23:13.0 -0700
@@ -142,7 +142,7 @@ static int init_dir(struct inode * inode
return 0;
 }
 
-static int init_file(struct inode * inode)
+static int configfs_init_file(struct inode * inode)
 {
inode-i_size = PAGE_SIZE;
inode-i_fop = configfs_file_operations;
@@ -283,7 +283,8 @@ static int configfs_attach_attr(struct c
 
dentry-d_fsdata = configfs_get(sd);
sd-s_dentry = dentry;
-   error = configfs_create(dentry, (attr-ca_mode  S_IALLUGO) | S_IFREG, 
init_file);
+   error = configfs_create(dentry, (attr-ca_mode  S_IALLUGO) | S_IFREG,
+   configfs_init_file);
if (error) {
configfs_put(sd);
return error;
diff -puN fs/file_table.c~01-24-filesystem-helpers-for-custom-struct-file-s 
fs/file_table.c
--- lxc/fs/file_table.c~01-24-filesystem-helpers-for-custom-struct-file-s   
2007-06-21 23:23:13.0 -0700
+++ lxc-dave/fs/file_table.c2007-06-21 23:23:13.0 -0700
@@ -139,6 +139,40 @@ fail:
 
 EXPORT_SYMBOL(get_empty_filp);
 
+struct file *alloc_file(struct vfsmount *mnt, struct dentry *dentry,
+   mode_t mode, const struct file_operations *fop)
+{
+   struct file *file;
+   struct path;
+
+   file = get_empty_filp();
+   if (!file)
+   return NULL;
+
+   init_file(file, mnt, dentry, mode, fop);
+   return file;
+}
+EXPORT_SYMBOL(alloc_file);
+
+/*
+ * Note: This is a crappy interface.  It is here to make
+ * merging with the existing users of get_empty_filp()
+ * who have complex failure logic easier.  All users
+ * of this should be moving to alloc_file().
+ */
+int init_file(struct file *file, struct vfsmount *mnt, struct dentry *dentry,
+  mode_t mode, const struct file_operations *fop)
+{
+   int error = 0;
+   file-f_path.dentry = dentry;
+   file-f_path.mnt = mntget(mnt);
+   file-f_mapping = dentry-d_inode-i_mapping;
+   file-f_mode = mode;
+   file-f_op = fop;
+   return error;
+}
+EXPORT_SYMBOL(init_file);
+
 void fastcall fput(struct file *file)
 {
if (atomic_dec_and_test(file-f_count))
diff -puN 
fs/hugetlbfs/inode.c~01-24-filesystem-helpers-for-custom-struct-file-s 
fs/hugetlbfs/inode.c
--- lxc/fs/hugetlbfs/inode.c~01-24-filesystem-helpers-for-custom-struct-file-s  
2007-06-21 23:23:13.0 -0700
+++ lxc-dave/fs/hugetlbfs/inode.c   2007-06-21 23:23:13.0 -0700
@@ -759,16 +759,11 @@ struct file *hugetlb_zero_setup(size_t s
if (!dentry)
goto out_shm_unlock;
 
-   error = -ENFILE;
-   file = get_empty_filp();
-   if (!file)
-   goto out_dentry;
-
error = -ENOSPC;
inode = hugetlbfs_get_inode(root-d_sb, current-fsuid,
current-fsgid, S_IFREG | S_IRWXUGO, 0);
if (!inode)
-   goto out_file;
+   goto out_dentry;
 
error = -ENOMEM;
if (hugetlb_reserve_pages(inode, 0, size  HPAGE_SHIFT))
@@ -777,17 +772,18 @@ struct file *hugetlb_zero_setup(size_t s
d_instantiate(dentry, inode);
inode-i_size = size;
inode-i_nlink = 0;
-   file-f_path.mnt = mntget(hugetlbfs_vfsmount);
-   file-f_path.dentry = dentry;
-   file-f_mapping = inode-i_mapping;
-   file-f_op = hugetlbfs_file_operations;
-   file-f_mode = FMODE_WRITE | FMODE_READ;
+
+   error = -ENFILE;
+   file = alloc_file(hugetlbfs_vfsmount, dentry,
+   FMODE_WRITE | FMODE_READ,
+   hugetlbfs_file_operations);
+   if (!file)
+   goto out_inode;
+
return file;
 
 out_inode:
iput(inode);
-out_file:
-   

[PATCH 08/26] elevate writer count for chown and friends

2007-06-22 Thread Dave Hansen


chown/chmod,etc... don't call permission in the same way
that the normal open for write calls do.  They still
write to the filesystem, so bump the write count during
these operations.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/open.c |   39 ++-
 1 file changed, 30 insertions(+), 9 deletions(-)

diff -puN fs/open.c~06-24-elevate-writer-count-for-chown-and-friends fs/open.c
--- lxc/fs/open.c~06-24-elevate-writer-count-for-chown-and-friends  
2007-06-21 23:23:16.0 -0700
+++ lxc-dave/fs/open.c  2007-06-21 23:23:16.0 -0700
@@ -508,12 +508,12 @@ asmlinkage long sys_fchmod(unsigned int 
 
audit_inode(NULL, inode);
 
-   err = -EROFS;
-   if (IS_RDONLY(inode))
+   err = mnt_want_write(file-f_vfsmnt);
+   if (err)
goto out_putf;
err = -EPERM;
if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-   goto out_putf;
+   goto out_drop_write;
mutex_lock(inode-i_mutex);
if (mode == (mode_t) -1)
mode = inode-i_mode;
@@ -522,6 +522,8 @@ asmlinkage long sys_fchmod(unsigned int 
err = notify_change(dentry, newattrs);
mutex_unlock(inode-i_mutex);
 
+out_drop_write:
+   mnt_drop_write(file-f_vfsmnt);
 out_putf:
fput(file);
 out:
@@ -541,13 +543,13 @@ asmlinkage long sys_fchmodat(int dfd, co
goto out;
inode = nd.dentry-d_inode;
 
-   error = -EROFS;
-   if (IS_RDONLY(inode))
+   error = mnt_want_write(nd.mnt);
+   if (error)
goto dput_and_out;
 
error = -EPERM;
if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-   goto dput_and_out;
+   goto out_drop_write;
 
mutex_lock(inode-i_mutex);
if (mode == (mode_t) -1)
@@ -557,6 +559,8 @@ asmlinkage long sys_fchmodat(int dfd, co
error = notify_change(nd.dentry, newattrs);
mutex_unlock(inode-i_mutex);
 
+out_drop_write:
+   mnt_drop_write(nd.mnt);
 dput_and_out:
path_release(nd);
 out:
@@ -579,9 +583,6 @@ static int chown_common(struct dentry * 
printk(KERN_ERR chown_common: NULL inode\n);
goto out;
}
-   error = -EROFS;
-   if (IS_RDONLY(inode))
-   goto out;
error = -EPERM;
if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
goto out;
@@ -611,7 +612,12 @@ asmlinkage long sys_chown(const char __u
error = user_path_walk(filename, nd);
if (error)
goto out;
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   goto out_release;
error = chown_common(nd.dentry, user, group);
+   mnt_drop_write(nd.mnt);
+out_release:
path_release(nd);
 out:
return error;
@@ -631,7 +637,12 @@ asmlinkage long sys_fchownat(int dfd, co
error = __user_walk_fd(dfd, filename, follow, nd);
if (error)
goto out;
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   goto out_release;
error = chown_common(nd.dentry, user, group);
+   mnt_drop_write(nd.mnt);
+out_release:
path_release(nd);
 out:
return error;
@@ -645,7 +656,12 @@ asmlinkage long sys_lchown(const char __
error = user_path_walk_link(filename, nd);
if (error)
goto out;
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   goto out_release;
error = chown_common(nd.dentry, user, group);
+   mnt_drop_write(nd.mnt);
+out_release:
path_release(nd);
 out:
return error;
@@ -662,9 +678,14 @@ asmlinkage long sys_fchown(unsigned int 
if (!file)
goto out;
 
+   error = mnt_want_write(file-f_vfsmnt);
+   if (error)
+   goto out_fput;
dentry = file-f_path.dentry;
audit_inode(NULL, dentry-d_inode);
error = chown_common(dentry, user, group);
+   mnt_drop_write(file-f_vfsmnt);
+out_fput:
fput(file);
 out:
return error;
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/26] elevate write count during entire ncp_ioctl()

2007-06-22 Thread Dave Hansen


Some ioctls need write access, but others don't.  Make a helper
function to decide when write access is needed, and take it.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/ncpfs/ioctl.c |   55 +-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff -puN fs/ncpfs/ioctl.c~08-24-elevate-write-count-during-entire-ncp-ioctl 
fs/ncpfs/ioctl.c
--- lxc/fs/ncpfs/ioctl.c~08-24-elevate-write-count-during-entire-ncp-ioctl  
2007-06-21 23:23:18.0 -0700
+++ lxc-dave/fs/ncpfs/ioctl.c   2007-06-21 23:23:18.0 -0700
@@ -14,6 +14,7 @@
 #include linux/ioctl.h
 #include linux/time.h
 #include linux/mm.h
+#include linux/mount.h
 #include linux/highuid.h
 #include linux/smp_lock.h
 #include linux/vmalloc.h
@@ -260,7 +261,7 @@ ncp_get_charsets(struct ncp_server* serv
 }
 #endif /* CONFIG_NCPFS_NLS */
 
-int ncp_ioctl(struct inode *inode, struct file *filp,
+static int __ncp_ioctl(struct inode *inode, struct file *filp,
  unsigned int cmd, unsigned long arg)
 {
struct ncp_server *server = NCP_SERVER(inode);
@@ -821,6 +822,58 @@ outrel:
return -EINVAL;
 }
 
+static int ncp_ioctl_need_write(unsigned int cmd)
+{
+   switch (cmd) {
+   case NCP_IOC_GET_FS_INFO:
+   case NCP_IOC_GET_FS_INFO_V2:
+   case NCP_IOC_NCPREQUEST:
+   case NCP_IOC_SETDENTRYTTL:
+   case NCP_IOC_SIGN_INIT:
+   case NCP_IOC_LOCKUNLOCK:
+   case NCP_IOC_SET_SIGN_WANTED:
+   return 1;
+   case NCP_IOC_GETOBJECTNAME:
+   case NCP_IOC_SETOBJECTNAME:
+   case NCP_IOC_GETPRIVATEDATA:
+   case NCP_IOC_SETPRIVATEDATA:
+   case NCP_IOC_SETCHARSETS:
+   case NCP_IOC_GETCHARSETS:
+   case NCP_IOC_CONN_LOGGED_IN:
+   case NCP_IOC_GETDENTRYTTL:
+   case NCP_IOC_GETMOUNTUID2:
+   case NCP_IOC_SIGN_WANTED:
+   case NCP_IOC_GETROOT:
+   case NCP_IOC_SETROOT:
+   return 0;
+   default:
+   /* unkown IOCTL command, assume write */
+   WARN_ON(1);
+   }
+   return 1;
+}
+
+int ncp_ioctl(struct inode *inode, struct file *filp,
+ unsigned int cmd, unsigned long arg)
+{
+   int ret;
+
+   if (ncp_ioctl_need_write(cmd)) {
+   /*
+* inside the ioctl(), any failures which
+* are because of file_permission() are
+* -EACCESS, so it seems consistent to keep
+*  that here.
+*/
+   if (mnt_want_write(filp-f_vfsmnt))
+   return -EACCES;
+   }
+   ret = __ncp_ioctl(inode, filp, cmd, arg);
+   if (ncp_ioctl_need_write(cmd))
+   mnt_drop_write(filp-f_vfsmnt);
+   return ret;
+}
+
 #ifdef CONFIG_COMPAT
 long ncp_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 12/26] elevate write count for link and symlink calls

2007-06-22 Thread Dave Hansen


Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/namei.c |   10 ++
 1 file changed, 10 insertions(+)

diff -puN fs/namei.c~09-24-elevate-write-count-for-link-and-symlink-calls 
fs/namei.c
--- lxc/fs/namei.c~09-24-elevate-write-count-for-link-and-symlink-calls 
2007-06-21 23:23:19.0 -0700
+++ lxc-dave/fs/namei.c 2007-06-21 23:23:19.0 -0700
@@ -2244,7 +2244,12 @@ asmlinkage long sys_symlinkat(const char
if (IS_ERR(dentry))
goto out_unlock;
 
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   goto out_dput;
error = vfs_symlink(nd.dentry-d_inode, dentry, from, S_IALLUGO);
+   mnt_drop_write(nd.mnt);
+out_dput:
dput(dentry);
 out_unlock:
mutex_unlock(nd.dentry-d_inode-i_mutex);
@@ -2339,7 +2344,12 @@ asmlinkage long sys_linkat(int olddfd, c
error = PTR_ERR(new_dentry);
if (IS_ERR(new_dentry))
goto out_unlock;
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   goto out_dput;
error = vfs_link(old_nd.dentry, nd.dentry-d_inode, new_dentry);
+   mnt_drop_write(nd.mnt);
+out_dput:
dput(new_dentry);
 out_unlock:
mutex_unlock(nd.dentry-d_inode-i_mutex);
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/26] elevate write count open()'d files

2007-06-22 Thread Dave Hansen

This is the first really tricky patch in the series.  It
elevates the writer count on a mount each time a
non-special file is opened for write.

This is not completely apparent in the patch because the
two if() conditions in may_open() above the
mnt_want_write() call are, combined, equivalent to
special_file().

There is also an elevated count around the vfs_create()
call in open_namei().  The count needs to be kept elevated
all the way into the may_open() call.  Otherwise, when the
write is dropped, a ro-rw transisition could occur.  This
would lead to having rw access on the newly created file,
while the vfsmount is ro.  That is bad.

Some filesystems forego the use of normal vfs calls to create
struct files.  Make sure that these users elevate the mnt writer
count because they will get __fput(), and we need to make
sure they're balanced.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/file_table.c |9 -
 lxc-dave/fs/namei.c  |   20 
 lxc-dave/ipc/mqueue.c|3 +++
 3 files changed, 27 insertions(+), 5 deletions(-)

diff -puN fs/file_table.c~14-24-tricky-elevate-write-count-files-are-open-ed 
fs/file_table.c
--- lxc/fs/file_table.c~14-24-tricky-elevate-write-count-files-are-open-ed  
2007-06-22 10:00:02.0 -0700
+++ lxc-dave/fs/file_table.c2007-06-22 10:00:02.0 -0700
@@ -169,6 +169,10 @@ int init_file(struct file *file, struct 
file-f_mapping = dentry-d_inode-i_mapping;
file-f_mode = mode;
file-f_op = fop;
+   if (mode  FMODE_WRITE) {
+   error = mnt_want_write(mnt);
+   WARN_ON(error);
+   }
return error;
 }
 EXPORT_SYMBOL(init_file);
@@ -206,8 +210,11 @@ void fastcall __fput(struct file *file)
if (unlikely(S_ISCHR(inode-i_mode)  inode-i_cdev != NULL))
cdev_put(inode-i_cdev);
fops_put(file-f_op);
-   if (file-f_mode  FMODE_WRITE)
+   if (file-f_mode  FMODE_WRITE) {
put_write_access(inode);
+   if (!special_file(inode-i_mode))
+   mnt_drop_write(mnt);
+   }
put_pid(file-f_owner.pid);
file_kill(file);
file-f_path.dentry = NULL;
diff -puN fs/namei.c~14-24-tricky-elevate-write-count-files-are-open-ed 
fs/namei.c
--- lxc/fs/namei.c~14-24-tricky-elevate-write-count-files-are-open-ed   
2007-06-22 10:00:02.0 -0700
+++ lxc-dave/fs/namei.c 2007-06-22 10:00:02.0 -0700
@@ -1544,8 +1544,15 @@ int may_open(struct nameidata *nd, int a
return -EACCES;
 
flag = ~O_TRUNC;
-   } else if (IS_RDONLY(inode)  (flag  FMODE_WRITE))
-   return -EROFS;
+   } else if (flag  FMODE_WRITE) {
+   /*
+* effectively: !special_file()
+* balanced by __fput()
+*/
+   error = mnt_want_write(nd-mnt);
+   if (error)
+   return error;
+   }
/*
 * An append-only file must be opened in append mode for writing.
 */
@@ -1684,14 +1691,17 @@ do_last:
}
 
if (IS_ERR(nd-intent.open.file)) {
-   mutex_unlock(dir-d_inode-i_mutex);
error = PTR_ERR(nd-intent.open.file);
-   goto exit_dput;
+   goto exit_mutex_unlock;
}
 
/* Negative dentry, just create the file */
if (!path.dentry-d_inode) {
+   error = mnt_want_write(nd-mnt);
+   if (error)
+   goto exit_mutex_unlock;
error = open_namei_create(nd, path, flag, mode);
+   mnt_drop_write(nd-mnt);
if (error)
goto exit;
return 0;
@@ -1729,6 +1739,8 @@ ok:
goto exit;
return 0;
 
+exit_mutex_unlock:
+   mutex_unlock(dir-d_inode-i_mutex);
 exit_dput:
dput_path(path, nd);
 exit:
diff -puN ipc/mqueue.c~14-24-tricky-elevate-write-count-files-are-open-ed 
ipc/mqueue.c
--- lxc/ipc/mqueue.c~14-24-tricky-elevate-write-count-files-are-open-ed 
2007-06-22 10:00:02.0 -0700
+++ lxc-dave/ipc/mqueue.c   2007-06-22 10:00:02.0 -0700
@@ -687,6 +687,9 @@ asmlinkage long sys_mq_open(const char _
goto out;
filp = do_open(dentry, oflag);
} else {
+   error = mnt_want_write(mqueue_mnt);
+   if (error)
+   goto out;
filp = do_create(mqueue_mnt-mnt_root, dentry,
oflag, mode, u_attr);
}
diff -puN fs/ext2/ioctl.c~14-24-tricky-elevate-write-count-files-are-open-ed 
fs/ext2/ioctl.c
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/26] document nlink function

2007-06-22 Thread Dave Hansen

These should have been documented from the beginning.  Fix it.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/include/linux/fs.h |   27 +++
 1 file changed, 27 insertions(+)

diff -puN include/linux/fs.h~document-nlink-funcs include/linux/fs.h
--- lxc/include/linux/fs.h~document-nlink-funcs 2007-06-21 23:23:10.0 
-0700
+++ lxc-dave/include/linux/fs.h 2007-06-21 23:23:10.0 -0700
@@ -1198,6 +1198,14 @@ static inline void mark_inode_dirty_sync
__mark_inode_dirty(inode, I_DIRTY_SYNC);
 }
 
+/**
+ * inc_nlink - directly increment an inode's link count
+ * @inode: inode
+ *
+ * This is a low-level filesystem helper to replace any
+ * direct filesystem manipulation of i_nlink.  Currently,
+ * it is only here for parity with dec_nlink().
+ */
 static inline void inc_nlink(struct inode *inode)
 {
inode-i_nlink++;
@@ -1209,11 +1217,30 @@ static inline void inode_inc_link_count(
mark_inode_dirty(inode);
 }
 
+/**
+ * drop_nlink - directly drop an inode's link count
+ * @inode: inode
+ *
+ * This is a low-level filesystem helper to replace any
+ * direct filesystem manipulation of i_nlink.  In cases
+ * where we are attempting to track writes to the
+ * filesystem, a decrement to zero means an imminent
+ * write when the file is truncated and actually unlinked
+ * on the filesystem.
+ */
 static inline void drop_nlink(struct inode *inode)
 {
inode-i_nlink--;
 }
 
+/**
+ * clear_nlink - directly zero an inode's link count
+ * @inode: inode
+ *
+ * This is a low-level filesystem helper to replace any
+ * direct filesystem manipulation of i_nlink.  See
+ * drop_nlink() for why we care about i_nlink hitting zero.
+ */
 static inline void clear_nlink(struct inode *inode)
 {
inode-i_nlink = 0;
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 15/26] mount_is_safe(): add comment

2007-06-22 Thread Dave Hansen


This area of code is currently #ifdef'd out, so add a comment
for the time when it is actually used.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/namespace.c |4 
 1 file changed, 4 insertions(+)

diff -puN fs/namespace.c~11-24-mount-is-safe-add-comment fs/namespace.c
--- lxc/fs/namespace.c~11-24-mount-is-safe-add-comment  2007-06-21 
23:23:20.0 -0700
+++ lxc-dave/fs/namespace.c 2007-06-21 23:23:20.0 -0700
@@ -691,6 +691,10 @@ static int mount_is_safe(struct nameidat
if (current-uid != nd-dentry-d_inode-i_uid)
return -EPERM;
}
+   /*
+* We will eventually check for the mnt-writer_count here,
+* but since the code is not used now, skip it - Dave Hansen
+*/
if (vfs_permission(nd, MAY_WRITE))
return -EPERM;
return 0;
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 20/26] elevate write count for do_utimes()

2007-06-22 Thread Dave Hansen


Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/utimes.c |   27 ---
 1 file changed, 16 insertions(+), 11 deletions(-)

diff -puN fs/utimes.c~16-24-elevate-write-count-for-do-utimes fs/utimes.c
--- lxc/fs/utimes.c~16-24-elevate-write-count-for-do-utimes 2007-06-21 
23:23:24.0 -0700
+++ lxc-dave/fs/utimes.c2007-06-21 23:23:24.0 -0700
@@ -1,6 +1,7 @@
 #include linux/compiler.h
 #include linux/fs.h
 #include linux/linkage.h
+#include linux/mount.h
 #include linux/namei.h
 #include linux/sched.h
 #include linux/utime.h
@@ -32,8 +33,8 @@ asmlinkage long sys_utime(char __user * 
goto out;
inode = nd.dentry-d_inode;
 
-   error = -EROFS;
-   if (IS_RDONLY(inode))
+   error = mnt_want_write(nd.mnt);
+   if (error)
goto dput_and_out;
 
/* Don't worry, the checks are done in inode_change_ok() */
@@ -41,7 +42,7 @@ asmlinkage long sys_utime(char __user * 
if (times) {
error = -EPERM;
if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
-   goto dput_and_out;
+   goto mnt_drop_write_and_out;
 
error = get_user(newattrs.ia_atime.tv_sec, times-actime);
newattrs.ia_atime.tv_nsec = 0;
@@ -49,21 +50,23 @@ asmlinkage long sys_utime(char __user * 
error = get_user(newattrs.ia_mtime.tv_sec, 
times-modtime);
newattrs.ia_mtime.tv_nsec = 0;
if (error)
-   goto dput_and_out;
+   goto mnt_drop_write_and_out;
 
newattrs.ia_valid |= ATTR_ATIME_SET | ATTR_MTIME_SET;
} else {
 error = -EACCES;
 if (IS_IMMUTABLE(inode))
-goto dput_and_out;
+   goto mnt_drop_write_and_out;
 
if (current-fsuid != inode-i_uid 
(error = vfs_permission(nd, MAY_WRITE)) != 0)
-   goto dput_and_out;
+   goto mnt_drop_write_and_out;
}
mutex_lock(inode-i_mutex);
error = notify_change(nd.dentry, newattrs);
mutex_unlock(inode-i_mutex);
+mnt_drop_write_and_out:
+   mnt_drop_write(nd.mnt);
 dput_and_out:
path_release(nd);
 out:
@@ -89,8 +92,8 @@ long do_utimes(int dfd, char __user *fil
goto out;
inode = nd.dentry-d_inode;
 
-   error = -EROFS;
-   if (IS_RDONLY(inode))
+   error = mnt_want_write(nd.mnt);
+   if (error)
goto dput_and_out;
 
/* Don't worry, the checks are done in inode_change_ok() */
@@ -98,7 +101,7 @@ long do_utimes(int dfd, char __user *fil
if (times) {
error = -EPERM;
 if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
-goto dput_and_out;
+   goto mnt_drop_write_and_out;
 
newattrs.ia_atime.tv_sec = times[0].tv_sec;
newattrs.ia_atime.tv_nsec = times[0].tv_usec * 1000;
@@ -108,15 +111,17 @@ long do_utimes(int dfd, char __user *fil
} else {
error = -EACCES;
 if (IS_IMMUTABLE(inode))
-goto dput_and_out;
+   goto mnt_drop_write_and_out;
 
if (current-fsuid != inode-i_uid 
(error = vfs_permission(nd, MAY_WRITE)) != 0)
-   goto dput_and_out;
+   goto mnt_drop_write_and_out;
}
mutex_lock(inode-i_mutex);
error = notify_change(nd.dentry, newattrs);
mutex_unlock(inode-i_mutex);
+mnt_drop_write_and_out:
+   mnt_drop_write(nd.mnt);
 dput_and_out:
path_release(nd);
 out:
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 19/26] elevate writer count for do_sys_truncate()

2007-06-22 Thread Dave Hansen



Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/open.c |   16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff -puN fs/open.c~15-24-elevate-writer-count-for-do-sys-truncate fs/open.c
--- lxc/fs/open.c~15-24-elevate-writer-count-for-do-sys-truncate
2007-06-21 23:23:23.0 -0700
+++ lxc-dave/fs/open.c  2007-06-21 23:23:23.0 -0700
@@ -241,28 +241,28 @@ static long do_sys_truncate(const char _
if (!S_ISREG(inode-i_mode))
goto dput_and_out;
 
-   error = vfs_permission(nd, MAY_WRITE);
+   error = mnt_want_write(nd.mnt);
if (error)
goto dput_and_out;
 
-   error = -EROFS;
-   if (IS_RDONLY(inode))
-   goto dput_and_out;
+   error = vfs_permission(nd, MAY_WRITE);
+   if (error)
+   goto mnt_drop_write_and_out;
 
error = -EPERM;
if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-   goto dput_and_out;
+   goto mnt_drop_write_and_out;
 
/*
 * Make sure that there are no leases.
 */
error = break_lease(inode, FMODE_WRITE);
if (error)
-   goto dput_and_out;
+   goto mnt_drop_write_and_out;
 
error = get_write_access(inode);
if (error)
-   goto dput_and_out;
+   goto mnt_drop_write_and_out;
 
error = locks_verify_truncate(inode, NULL, length);
if (!error) {
@@ -271,6 +271,8 @@ static long do_sys_truncate(const char _
}
put_write_access(inode);
 
+mnt_drop_write_and_out:
+   mnt_drop_write(nd.mnt);
 dput_and_out:
path_release(nd);
 out:
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 13/26] elevate mount count for extended attributes

2007-06-22 Thread Dave Hansen

This basically audits the callers of xattr_permission(), which
calls permission() and can perform writes to the filesystem.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/nfsd/nfs4proc.c |7 ++-
 lxc-dave/fs/xattr.c |   16 ++--
 2 files changed, 20 insertions(+), 3 deletions(-)

diff -puN fs/nfsd/nfs4proc.c~10-24-elevate-mount-count-for-extended-attributes 
fs/nfsd/nfs4proc.c
--- lxc/fs/nfsd/nfs4proc.c~10-24-elevate-mount-count-for-extended-attributes
2007-06-21 23:23:19.0 -0700
+++ lxc-dave/fs/nfsd/nfs4proc.c 2007-06-21 23:23:19.0 -0700
@@ -626,14 +626,19 @@ nfsd4_setattr(struct svc_rqst *rqstp, st
return status;
}
}
+   status = mnt_want_write(cstate-current_fh.fh_export-ex_mnt);
+   if (status)
+   return status;
status = nfs_ok;
if (setattr-sa_acl != NULL)
status = nfsd4_set_nfs4_acl(rqstp, cstate-current_fh,
setattr-sa_acl);
if (status)
-   return status;
+   goto out;
status = nfsd_setattr(rqstp, cstate-current_fh, setattr-sa_iattr,
0, (time_t)0);
+out:
+   mnt_drop_write(cstate-current_fh.fh_export-ex_mnt);
return status;
 }
 
diff -puN fs/xattr.c~10-24-elevate-mount-count-for-extended-attributes 
fs/xattr.c
--- lxc/fs/xattr.c~10-24-elevate-mount-count-for-extended-attributes
2007-06-21 23:23:19.0 -0700
+++ lxc-dave/fs/xattr.c 2007-06-21 23:23:19.0 -0700
@@ -12,6 +12,7 @@
 #include linux/smp_lock.h
 #include linux/file.h
 #include linux/xattr.h
+#include linux/mount.h
 #include linux/namei.h
 #include linux/security.h
 #include linux/syscalls.h
@@ -33,8 +34,6 @@ xattr_permission(struct inode *inode, co
 * filesystem  or on an immutable / append-only inode.
 */
if (mask  MAY_WRITE) {
-   if (IS_RDONLY(inode))
-   return -EROFS;
if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
return -EPERM;
}
@@ -237,7 +236,11 @@ sys_setxattr(char __user *path, char __u
error = user_path_walk(path, nd);
if (error)
return error;
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   return error;
error = setxattr(nd.dentry, name, value, size, flags);
+   mnt_drop_write(nd.mnt);
path_release(nd);
return error;
 }
@@ -252,7 +255,11 @@ sys_lsetxattr(char __user *path, char __
error = user_path_walk_link(path, nd);
if (error)
return error;
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   return error;
error = setxattr(nd.dentry, name, value, size, flags);
+   mnt_drop_write(nd.mnt);
path_release(nd);
return error;
 }
@@ -268,9 +275,14 @@ sys_fsetxattr(int fd, char __user *name,
f = fget(fd);
if (!f)
return error;
+   error = mnt_want_write(f-f_vfsmnt);
+   if (error)
+   goto out_fput;
dentry = f-f_path.dentry;
audit_inode(NULL, dentry-d_inode);
error = setxattr(dentry, name, value, size, flags);
+   mnt_drop_write(f-f_vfsmnt);
+out_fput:
fput(f);
return error;
 }
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 21/26] elevate write count for do_sys_utime() and touch_atime()

2007-06-22 Thread Dave Hansen



Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/inode.c |   20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff -puN fs/inode.c~17-24-elevate-write-count-for-do-sys-utime-and-touch-atime 
fs/inode.c
--- lxc/fs/inode.c~17-24-elevate-write-count-for-do-sys-utime-and-touch-atime   
2007-06-21 23:23:24.0 -0700
+++ lxc-dave/fs/inode.c 2007-06-21 23:23:24.0 -0700
@@ -1161,22 +1161,23 @@ void touch_atime(struct vfsmount *mnt, s
struct inode *inode = dentry-d_inode;
struct timespec now;
 
-   if (inode-i_flags  S_NOATIME)
+   if (mnt  mnt_want_write(mnt))
return;
+   if (inode-i_flags  S_NOATIME)
+   goto out;
if (IS_NOATIME(inode))
-   return;
+   goto out;
if ((inode-i_sb-s_flags  MS_NODIRATIME)  S_ISDIR(inode-i_mode))
-   return;
+   goto out;
 
/*
 * We may have a NULL vfsmount when coming from NFSD
 */
if (mnt) {
if (mnt-mnt_flags  MNT_NOATIME)
-   return;
+   goto out;
if ((mnt-mnt_flags  MNT_NODIRATIME)  S_ISDIR(inode-i_mode))
-   return;
-
+   goto out;
if (mnt-mnt_flags  MNT_RELATIME) {
/*
 * With relative atime, only update atime if the
@@ -1187,16 +1188,19 @@ void touch_atime(struct vfsmount *mnt, s
inode-i_atime)  0 
timespec_compare(inode-i_ctime,
inode-i_atime)  0)
-   return;
+   goto out;
}
}
 
now = current_fs_time(inode-i_sb);
if (timespec_equal(inode-i_atime, now))
-   return;
+   goto out;
 
inode-i_atime = now;
mark_inode_dirty_sync(inode);
+out:
+   if (mnt)
+   mnt_drop_write(mnt);
 }
 EXPORT_SYMBOL(touch_atime);
 
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 23/26] elevate mnt writers for vfs_unlink() callers

2007-06-22 Thread Dave Hansen



Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/namei.c   |4 
 lxc-dave/ipc/mqueue.c |5 -
 2 files changed, 8 insertions(+), 1 deletion(-)

diff -puN fs/namei.c~19-24-elevate-mnt-writers-for-vfs-unlink-callers fs/namei.c
--- lxc/fs/namei.c~19-24-elevate-mnt-writers-for-vfs-unlink-callers 
2007-06-21 23:23:25.0 -0700
+++ lxc-dave/fs/namei.c 2007-06-21 23:23:25.0 -0700
@@ -2175,7 +2175,11 @@ static long do_unlinkat(int dfd, const c
inode = dentry-d_inode;
if (inode)
atomic_inc(inode-i_count);
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   goto exit2;
error = vfs_unlink(nd.dentry-d_inode, dentry);
+   mnt_drop_write(nd.mnt);
exit2:
dput(dentry);
}
diff -puN ipc/mqueue.c~19-24-elevate-mnt-writers-for-vfs-unlink-callers 
ipc/mqueue.c
--- lxc/ipc/mqueue.c~19-24-elevate-mnt-writers-for-vfs-unlink-callers   
2007-06-21 23:23:25.0 -0700
+++ lxc-dave/ipc/mqueue.c   2007-06-21 23:23:25.0 -0700
@@ -750,8 +750,11 @@ asmlinkage long sys_mq_unlink(const char
inode = dentry-d_inode;
if (inode)
atomic_inc(inode-i_count);
-
+   err = mnt_want_write(mqueue_mnt);
+   if (err)
+   goto out_err;
err = vfs_unlink(dentry-d_parent-d_inode, dentry);
+   mnt_drop_write(mqueue_mnt);
 out_err:
dput(dentry);
 
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 25/26] r/o bind mounts: scalable writer count

2007-06-22 Thread Dave Hansen

This uses a statically-allocated percpu variable.  All
operations are local to a cpu as long that cpu operates on
the same mount, and there are no writer count imbalances.
Writer count imbalances happen when a write is taken on one
cpu, and released on another, like when an open/close pair
is performed on two different cpus because the task moved.

I've written a little benchmark to sit in a loop for a couple
of seconds in several cpus in parallel doing open/write/close
loops.

http://sr71.net/~dave/linux/openbench.c

The code in here is a a worst-possible case for this patch.
It does opens on a _pair_ of files in two different mounts
in parallel.  This should cause my code to lose its operate
on the same mount optimization completely.  This worst-case
scenario causes a 3% degredation in the benchmark.

I could probably get rid of even this 3%, but it would be
more complex than what I have here, and I think this is
getting into acceptable territory.  In practice, I expect
writing more than 3 bytes to a file, as well as disk I/O
to mask any effects that this has.

---

A little history:

These patches used to use a single atomic_t in each vfsmount
to track the number of writers to the mount at any given time.
Open a file for write anywhere on the mount, and you increment
the atomic_t.

This didn't scale on NUMA or SMP.  It caused something like a
4x slowdown in open/write/close operations.  It bounced
cachelines around like mad.  I tried out a new system with a
per-node spinlock and atomic in each vfsmount to replace the
old single atomic.  It worked _much_ better on NUMA, but still
caused a ~6% regression on the same open/write/close operation
set on a normal 4-way SMP machine, because it was still
contended inside of a node.

We could generalize this lock, but this is an awfully specialized
situation, and I'd be worried that people would try to use it
when it isn't absolutely necessary.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/namei.c|6 +
 lxc-dave/fs/namespace.c|  140 -
 lxc-dave/include/linux/mount.h |9 ++
 3 files changed, 150 insertions(+), 5 deletions(-)

diff -puN fs/namei.c~numa_mnt_want_write fs/namei.c
--- lxc/fs/namei.c~numa_mnt_want_write  2007-06-22 10:12:46.0 -0700
+++ lxc-dave/fs/namei.c 2007-06-22 10:12:46.0 -0700
@@ -231,10 +231,12 @@ int permission(struct inode *inode, int 
int retval, submask;
 
if (mask  MAY_WRITE) {
-
/*
-* Nobody gets write access to a read-only fs.
+* If this WARN_ON() is hit, it likely means that
+* there was a missed mnt_want_write() on the path
+* leading here.
 */
+   WARN_ON(__mnt_is_readonly(nd-mnt));
if (IS_RDONLY(inode) 
(S_ISREG(mode) || S_ISDIR(mode) || S_ISLNK(mode)))
return -EROFS;
diff -puN fs/namespace.c~numa_mnt_want_write fs/namespace.c
--- lxc/fs/namespace.c~numa_mnt_want_write  2007-06-22 10:12:46.0 
-0700
+++ lxc-dave/fs/namespace.c 2007-06-22 10:21:39.0 -0700
@@ -17,6 +17,7 @@
 #include linux/quotaops.h
 #include linux/acct.h
 #include linux/capability.h
+#include linux/cpumask.h
 #include linux/module.h
 #include linux/sysfs.h
 #include linux/seq_file.h
@@ -51,6 +52,8 @@ static inline unsigned long hash(struct 
return tmp  hash_mask;
 }
 
+#define MNT_WRITER_UNDERFLOW_LIMIT -(116)
+
 struct vfsmount *alloc_vfsmnt(const char *name)
 {
struct vfsmount *mnt = kmem_cache_zalloc(mnt_cache, GFP_KERNEL);
@@ -64,6 +67,7 @@ struct vfsmount *alloc_vfsmnt(const char
INIT_LIST_HEAD(mnt-mnt_share);
INIT_LIST_HEAD(mnt-mnt_slave_list);
INIT_LIST_HEAD(mnt-mnt_slave);
+   atomic_set(mnt-__mnt_writers, 0);
if (name) {
int size = strlen(name) + 1;
char *newname = kmalloc(size, GFP_KERNEL);
@@ -76,19 +80,149 @@ struct vfsmount *alloc_vfsmnt(const char
return mnt;
 }
 
-int mnt_want_write(struct vfsmount *mnt)
+
+struct mnt_writer {
+   /*
+* If holding multiple instances of this lock, they
+* must be ordered by cpu number.
+*/
+   spinlock_t lock;
+   unsigned long count;
+   struct vfsmount *mnt;
+} cacheline_aligned_in_smp;
+DEFINE_PER_CPU(struct mnt_writer, mnt_writers);
+
+static int __init init_mnt_writers(void)
 {
-   if (__mnt_is_readonly(mnt))
-   return -EROFS;
+   int cpu;
+   for_each_possible_cpu(cpu) {
+   struct mnt_writer *writer = per_cpu(mnt_writers, cpu);
+   spin_lock_init(writer-lock);
+   writer-count = 0;
+   }
return 0;
 }
+fs_initcall(init_mnt_writers);
+
+static inline void __clear_mnt_count(struct mnt_writer *cpu_writer)
+{
+   if (!cpu_writer-mnt)
+   

[PATCH 26/26] honor r/w changes at do_remount() time

2007-06-22 Thread Dave Hansen


Originally from: Herbert Poetzl [EMAIL PROTECTED]

This is the core of the read-only bind mount patch set.

Note that this does _not_ add a ro option directly to
the bind mount operation.  If you require such a mount,
you must first do the bind, then follow it up with a
'mount -o remount,ro' operation.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/namespace.c|   40 ++--
 lxc-dave/include/linux/mount.h |7 ++-
 2 files changed, 40 insertions(+), 7 deletions(-)

diff -puN fs/namespace.c~23-24-honor-r-w-changes-at-do-remount-time 
fs/namespace.c
--- lxc/fs/namespace.c~23-24-honor-r-w-changes-at-do-remount-time   
2007-06-22 10:14:19.0 -0700
+++ lxc-dave/fs/namespace.c 2007-06-22 10:14:19.0 -0700
@@ -201,7 +201,7 @@ void mnt_drop_write(struct vfsmount *mnt
 }
 EXPORT_SYMBOL_GPL(mnt_drop_write);
 
-int mnt_make_readonly(struct vfsmount *mnt)
+static int mnt_make_readonly(struct vfsmount *mnt)
 {
int ret = 0;
 
@@ -214,15 +214,21 @@ int mnt_make_readonly(struct vfsmount *m
goto out;
}
/*
-* actually set mount's r/o flag here to make
-* __mnt_is_readonly() true, which keeps anyone
-* from doing a successful mnt_want_write().
+* nobody can do a successful mnt_want_write() with all
+* of the counts in MNT_DENIED_WRITE and the locks held.
 */
+   if (!ret)
+   mnt-mnt_flags |= MNT_READONLY;
 out:
mnt_unlock_cpus();
return ret;
 }
 
+static void __mnt_unmake_readonly(struct vfsmount *mnt)
+{
+   mnt-mnt_flags = ~MNT_READONLY;
+}
+
 int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
 {
mnt-mnt_sb = sb;
@@ -524,7 +530,7 @@ static int show_vfsmnt(struct seq_file *
seq_path(m, mnt, mnt-mnt_root,  \t\n\\);
seq_putc(m, ' ');
mangle(m, mnt-mnt_sb-s_type-name);
-   seq_puts(m, mnt-mnt_sb-s_flags  MS_RDONLY ?  ro :  rw);
+   seq_puts(m, __mnt_is_readonly(mnt) ?  ro :  rw);
for (fs_infop = fs_info; fs_infop-flag; fs_infop++) {
if (mnt-mnt_sb-s_flags  fs_infop-flag)
seq_puts(m, fs_infop-str);
@@ -1093,6 +1099,23 @@ out:
return err;
 }
 
+static int change_mount_flags(struct vfsmount *mnt, int ms_flags)
+{
+   int error = 0;
+   int readonly_request = 0;
+
+   if (ms_flags  MS_RDONLY)
+   readonly_request = 1;
+   if (readonly_request == __mnt_is_readonly(mnt))
+   return 0;
+
+   if (readonly_request)
+   error = mnt_make_readonly(mnt);
+   else
+   __mnt_unmake_readonly(mnt);
+   return error;
+}
+
 /*
  * change filesystem flags. dir should be a physical root of filesystem.
  * If you've mounted a non-root directory somewhere and want to do remount
@@ -1114,7 +1137,10 @@ static int do_remount(struct nameidata *
return -EINVAL;
 
down_write(sb-s_umount);
-   err = do_remount_sb(sb, flags, data, 0);
+   if (flags  MS_BIND)
+   err = change_mount_flags(nd-mnt, flags);
+   else
+   err = do_remount_sb(sb, flags, data, 0);
if (!err)
nd-mnt-mnt_flags = mnt_flags;
up_write(sb-s_umount);
@@ -1558,6 +1584,8 @@ long do_mount(char *dev_name, char *dir_
mnt_flags |= MNT_NODIRATIME;
if (flags  MS_RELATIME)
mnt_flags |= MNT_RELATIME;
+   if (flags  MS_RDONLY)
+   mnt_flags |= MNT_READONLY;
 
flags = ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
   MS_NOATIME | MS_NODIRATIME | MS_RELATIME);
diff -puN include/linux/mount.h~23-24-honor-r-w-changes-at-do-remount-time 
include/linux/mount.h
--- lxc/include/linux/mount.h~23-24-honor-r-w-changes-at-do-remount-time
2007-06-22 10:14:19.0 -0700
+++ lxc-dave/include/linux/mount.h  2007-06-22 10:14:19.0 -0700
@@ -29,6 +29,7 @@ struct mnt_namespace;
 #define MNT_NOATIME0x08
 #define MNT_NODIRATIME 0x10
 #define MNT_RELATIME   0x20
+#define MNT_READONLY   0x40 /* does the user want this to be r/o? */
 
 #define MNT_SHRINKABLE 0x100
 
@@ -87,7 +88,11 @@ static inline struct vfsmount *mntget(st
  */
 static inline int __mnt_is_readonly(struct vfsmount *mnt)
 {
-   return (mnt-mnt_sb-s_flags  MS_RDONLY);
+   if (mnt-mnt_flags  MNT_READONLY)
+   return 1;
+   if (mnt-mnt_sb-s_flags  MS_RDONLY)
+   return 1;
+   return 0;
 }
 
 extern int mnt_want_write(struct vfsmount *mnt);
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 24/26] do_rmdir(): elevate write count

2007-06-22 Thread Dave Hansen


Elevate the write count during the vfs_rmdir() call.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/namei.c |5 +
 1 file changed, 5 insertions(+)

diff -puN fs/namei.c~20-24-do-rmdir-elevate-write-count fs/namei.c
--- lxc/fs/namei.c~20-24-do-rmdir-elevate-write-count   2007-06-21 
23:23:26.0 -0700
+++ lxc-dave/fs/namei.c 2007-06-21 23:23:26.0 -0700
@@ -2095,7 +2095,12 @@ static long do_rmdir(int dfd, const char
error = PTR_ERR(dentry);
if (IS_ERR(dentry))
goto exit2;
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   goto exit3;
error = vfs_rmdir(nd.dentry-d_inode, dentry);
+   mnt_drop_write(nd.mnt);
+exit3:
dput(dentry);
 exit2:
mutex_unlock(nd.dentry-d_inode-i_mutex);
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 16/26] unix_find_other() elevate write count for touch_atime()

2007-06-22 Thread Dave Hansen


Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/net/unix/af_unix.c |   16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff -puN 
net/unix/af_unix.c~12-24-unix-find-other-elevate-write-count-for-touch-atime 
net/unix/af_unix.c
--- 
lxc/net/unix/af_unix.c~12-24-unix-find-other-elevate-write-count-for-touch-atime
2007-06-21 23:23:21.0 -0700
+++ lxc-dave/net/unix/af_unix.c 2007-06-21 23:23:21.0 -0700
@@ -703,21 +703,27 @@ static struct sock *unix_find_other(stru
err = path_lookup(sunname-sun_path, LOOKUP_FOLLOW, nd);
if (err)
goto fail;
+
+   err = mnt_want_write(nd.mnt);
+   if (err)
+   goto put_path_fail;
+
err = vfs_permission(nd, MAY_WRITE);
if (err)
-   goto put_fail;
+   goto mnt_drop_write_fail;
 
err = -ECONNREFUSED;
if (!S_ISSOCK(nd.dentry-d_inode-i_mode))
-   goto put_fail;
+   goto mnt_drop_write_fail;
u=unix_find_socket_byinode(nd.dentry-d_inode);
if (!u)
-   goto put_fail;
+   goto mnt_drop_write_fail;
 
if (u-sk_type == type)
touch_atime(nd.mnt, nd.dentry);
 
path_release(nd);
+   mnt_drop_write(nd.mnt);
 
err=-EPROTOTYPE;
if (u-sk_type != type) {
@@ -737,7 +743,9 @@ static struct sock *unix_find_other(stru
}
return u;
 
-put_fail:
+mnt_drop_write_fail:
+   mnt_drop_write(nd.mnt);
+put_path_fail:
path_release(nd);
 fail:
*error=err;
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 22/26] sys_mknodat(): elevate write count for vfs_mknod/create()

2007-06-22 Thread Dave Hansen


This takes care of all of the direct callers of vfs_mknod().
Since a few of these cases also handle normal file creation
as well, this also covers some calls to vfs_create().

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/namei.c |   12 
 lxc-dave/fs/nfsd/vfs.c  |8 ++--
 lxc-dave/net/unix/af_unix.c |4 
 3 files changed, 22 insertions(+), 2 deletions(-)

diff -puN fs/namei.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create 
fs/namei.c
--- lxc/fs/namei.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create   
2007-06-21 23:23:25.0 -0700
+++ lxc-dave/fs/namei.c 2007-06-21 23:23:25.0 -0700
@@ -1897,14 +1897,26 @@ asmlinkage long sys_mknodat(int dfd, con
if (!IS_ERR(dentry)) {
switch (mode  S_IFMT) {
case 0: case S_IFREG:
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   break;
error = vfs_create(nd.dentry-d_inode,dentry,mode,nd);
+   mnt_drop_write(nd.mnt);
break;
case S_IFCHR: case S_IFBLK:
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   break;
error = vfs_mknod(nd.dentry-d_inode,dentry,mode,
new_decode_dev(dev));
+   mnt_drop_write(nd.mnt);
break;
case S_IFIFO: case S_IFSOCK:
+   error = mnt_want_write(nd.mnt);
+   if (error)
+   break;
error = vfs_mknod(nd.dentry-d_inode,dentry,mode,0);
+   mnt_drop_write(nd.mnt);
break;
case S_IFDIR:
error = -EPERM;
diff -puN 
fs/nfsd/vfs.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create 
fs/nfsd/vfs.c
--- 
lxc/fs/nfsd/vfs.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create
2007-06-21 23:23:25.0 -0700
+++ lxc-dave/fs/nfsd/vfs.c  2007-06-21 23:23:25.0 -0700
@@ -1192,7 +1192,11 @@ nfsd_create(struct svc_rqst *rqstp, stru
case S_IFBLK:
case S_IFIFO:
case S_IFSOCK:
+   host_err = mnt_want_write(fhp-fh_export-ex_mnt);
+   if (host_err)
+   break;
host_err = vfs_mknod(dirp, dchild, iap-ia_mode, rdev);
+   mnt_drop_write(fhp-fh_export-ex_mnt);
break;
default:
printk(nfsd: bad file type %o in nfsd_create\n, type);
@@ -1811,7 +1815,7 @@ nfsd_permission(struct svc_export *exp, 
inode-i_mode,
IS_IMMUTABLE(inode)? immut : ,
IS_APPEND(inode)?append : ,
-   __mnt_is_readonly(exp-mnt)? ro : );
+   __mnt_is_readonly(exp-ex_mnt)?  ro : );
dprintk(  owner %d/%d user %d/%d\n,
inode-i_uid, inode-i_gid, current-fsuid, current-fsgid);
 #endif
@@ -1822,7 +1826,7 @@ nfsd_permission(struct svc_export *exp, 
 */
if (!(acc  MAY_LOCAL_ACCESS))
if (acc  (MAY_WRITE | MAY_SATTR | MAY_TRUNC)) {
-   if (EX_RDONLY(exp) || __mnt_is_readonly(exp-mnt))
+   if (EX_RDONLY(exp) || __mnt_is_readonly(exp-ex_mnt))
return nfserr_rofs;
if (/* (acc  MAY_WRITE)  */ IS_IMMUTABLE(inode))
return nfserr_perm;
diff -puN 
net/unix/af_unix.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create 
net/unix/af_unix.c
--- 
lxc/net/unix/af_unix.c~18-24-sys-mknodat-elevate-write-count-for-vfs-mknod-create
   2007-06-21 23:23:25.0 -0700
+++ lxc-dave/net/unix/af_unix.c 2007-06-21 23:23:25.0 -0700
@@ -816,7 +816,11 @@ static int unix_bind(struct socket *sock
 */
mode = S_IFSOCK |
   (SOCK_INODE(sock)-i_mode  ~current-fs-umask);
+   err = mnt_want_write(nd.mnt);
+   if (err)
+   goto out_mknod_dput;
err = vfs_mknod(nd.dentry-d_inode, dentry, mode, 0);
+   mnt_drop_write(nd.mnt);
if (err)
goto out_mknod_dput;
mutex_unlock(nd.dentry-d_inode-i_mutex);
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 17/26] elevate write count over calls to vfs_rename()

2007-06-22 Thread Dave Hansen

This also creates a little helper in the NFS code to
make an if() a little bit less ugly.

Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/namei.c|4 
 lxc-dave/fs/nfsd/vfs.c |   23 +++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff -puN fs/namei.c~13-24-elevate-write-count-over-calls-to-vfs-rename 
fs/namei.c
--- lxc/fs/namei.c~13-24-elevate-write-count-over-calls-to-vfs-rename   
2007-06-21 23:23:22.0 -0700
+++ lxc-dave/fs/namei.c 2007-06-21 23:23:22.0 -0700
@@ -2575,8 +2575,12 @@ static int do_rename(int olddfd, const c
if (new_dentry == trap)
goto exit5;
 
+   error = mnt_want_write(oldnd.mnt);
+   if (error)
+   goto exit5;
error = vfs_rename(old_dir-d_inode, old_dentry,
   new_dir-d_inode, new_dentry);
+   mnt_drop_write(oldnd.mnt);
 exit5:
dput(new_dentry);
 exit4:
diff -puN fs/nfsd/vfs.c~13-24-elevate-write-count-over-calls-to-vfs-rename 
fs/nfsd/vfs.c
--- lxc/fs/nfsd/vfs.c~13-24-elevate-write-count-over-calls-to-vfs-rename
2007-06-21 23:23:22.0 -0700
+++ lxc-dave/fs/nfsd/vfs.c  2007-06-21 23:23:22.0 -0700
@@ -1554,6 +1554,14 @@ out_nfserr:
goto out_unlock;
 }
 
+static inline int svc_msnfs(struct svc_fh *ffhp)
+{
+#ifdef MSNFS
+   return (ffhp-fh_export-ex_flags  NFSEXP_MSNFS);
+#else
+   return 0;
+#endif
+}
 /*
  * Rename a file
  * N.B. After this call _both_ ffhp and tfhp need an fh_put
@@ -1615,13 +1623,20 @@ nfsd_rename(struct svc_rqst *rqstp, stru
if (ndentry == trap)
goto out_dput_new;
 
-#ifdef MSNFS
-   if ((ffhp-fh_export-ex_flags  NFSEXP_MSNFS) 
+   if (svc_msnfs(ffhp) 
((atomic_read(odentry-d_count)  1)
 || (atomic_read(ndentry-d_count)  1))) {
host_err = -EPERM;
-   } else
-#endif
+   goto out_dput_new;
+   }
+
+   host_err = -EXDEV;
+   if (ffhp-fh_export-ex_mnt != tfhp-fh_export-ex_mnt)
+   goto out_dput_new;
+   host_err = mnt_want_write(ffhp-fh_export-ex_mnt);
+   if (host_err)
+   goto out_dput_new;
+
host_err = vfs_rename(fdir, odentry, tdir, ndentry);
if (!host_err  EX_ISSYNC(tfhp-fh_export)) {
host_err = nfsd_sync_dir(tdentry);
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 14/26] elevate write count for file_update_time()

2007-06-22 Thread Dave Hansen


Signed-off-by: Dave Hansen [EMAIL PROTECTED]
---

 lxc-dave/fs/inode.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff -puN fs/inode.c~elevate-write-count-for-file_update_time fs/inode.c
--- lxc/fs/inode.c~elevate-write-count-for-file_update_time 2007-06-21 
23:23:20.0 -0700
+++ lxc-dave/fs/inode.c 2007-06-21 23:23:20.0 -0700
@@ -1217,10 +1217,13 @@ void file_update_time(struct file *file)
struct inode *inode = file-f_path.dentry-d_inode;
struct timespec now;
int sync_it = 0;
+   int err = 0;
 
if (IS_NOCMTIME(inode))
return;
-   if (IS_RDONLY(inode))
+   if (file-f_vfsmnt)
+   err = mnt_want_write(file-f_vfsmnt);
+   if (err)
return;
 
now = current_fs_time(inode-i_sb);
@@ -1236,6 +1239,8 @@ void file_update_time(struct file *file)
 
if (sync_it)
mark_inode_dirty_sync(inode);
+   if (file-f_vfsmnt)
+   mnt_drop_write(file-f_vfsmnt);
 }
 
 EXPORT_SYMBOL(file_update_time);
_
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 39/45] AppArmor: Profile loading and manipulation, pathname matching

2007-06-22 Thread Chris Wright
* Chris Mason ([EMAIL PROTECTED]) wrote:
 I'm sure people there will have a different versions of events.  The
 one part that was discussed was if pathname based security was
 useful, and a number of the people in the room (outside of 
 novell) said it was.  Now, it could be that nobody wanted to argue
 anymore, since most opinions had come out on one list or another by
 then.  

Indeed.  The trouble is that's too high level compared with the actual
implementation details.  AA is stalled because it has failed to get
VFS support for it's model.  I don't see a nice way out unless it
changes it's notion of policy language (globbing is the tough one)
or gets traction to pass dentry/vfsmount all the way down.  Paths are
completely relevant for security, esp. when considering the parent dir
and the leaf (as in forward lookup case).  Retroactively creating the
full path is at the minimum ugly, and in the worst case can be insecure
(yes AA has taken many measures to mitigate that insecurity).

 But as someone who doesn't use either SElinux or AA, I really hope
 we can get past the part of the debate where:
 
 while(1)
 AA) we think we're making users happy with pathname security
 SELINUX) pathname security sucks

Yes.  Please.  Both parties are miserably failing the sanity test.
Doing the same thing over and over and expecting different results.

AA folks: deal with the VFS issues that your patchset have in a palatable
way (which does not include passing NULL when it's inconvenient to
do otherwise).  You've already missed an opportunity with Christoph's
suggestions for changes in NFS.  I know you've considered many alternative
approaches and consistently hit dead ends.  But please note, if you
have coded yourself into a corner because of your policy language,
that's your issue to solve, not ours.

SELinux folks: do something useful rather than quibbling over the TCSEC
definition of MAC and AA's poor taste in marketing literature.  Here's
some suggestions:

1) Make SELinux usable (it's *still* the number one complaint).  While
this is a bit of a cheap shot, it really is one of the core reasons AA
advocates exist.
2) Work on a variant of Kyle's suggestion to squash the relevancy of AA.
3) Write an effective exploit against AA that demonstrates the fundamental
weakness of the model (better make sure it's not also an issue for
targetted policy).

thanks,
-chris
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html