Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Andreas Dilger
Any chance you can remove linux-fsdevel from the CC list?  I don't think this
has anything to do with filesystems.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


DIO panic on 2.6.21.5

2007-06-27 Thread Badari Pulavarty

Hi Zach,

One of our perf. team ran into this while doing some runs.
I didn't see anything obvious - it looks like we converted
async IO to synchronous one. I didn't spend much time digging
around.

Is this a known issue ? Any ideas ?

Thanks,
Badari

[ cut here ]
kernel BUG at fs/direct-io.c:1113!
invalid opcode:  [1] SMP
CPU 11
Modules linked in: oprofile raw capability commoncap qla2xxx ipv6 button 
battery ac loop dm_mod tg3 ext3 jbd edd fan thermal processor scsi_transport_fc 
sg aic94xx libsas firmware_class scsi_transport_sas serverworks sd_mod scsePid: 
9709, comm: db2sysc Not tainted 2.6.21.5-smp #1
RIP: 0010:[]  [] 
__blockdev_direct_IO+0x96c/0xa4a

RSP: 0018:810c3d3efc08  EFLAGS: 00010202
RAX: 0246 RBX: 810041768b24 RCX: 80406918
RDX:  RSI: 0246 RDI: 810041768b24
RBP: 810041768800 R08: 0086 R09: 81003f0a2370
R10: 810c3f3ee9d8 R11: 802e9716 R12: 0001
R13: fdef R14: 810c5beb42b0 R15: 
FS:  2b3d14ffe940() GS:810c444c95c0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 2b3e84f57000 CR3: 4511c000 CR4: 06e0
Process db2sysc (pid: 9709, threadinfo 810c3d3ee000, task 810041fee760)
Stack:  810c42f0fbc0 4c1f4000 811836306d48 0011803e7f95
0009  0008 
810041fee760 00090001 0001 810024ac9948
Call Trace:
[] blkdev_direct_IO+0x45/0x4a
[] blkdev_get_blocks+0x0/0x96
[] generic_file_direct_IO+0xbd/0xfb
[] generic_file_direct_write+0x60/0xf5
[] __generic_file_aio_write_nolock+0x2e7/0x410
[] aio_read_evt+0x9a/0x108
[] generic_file_aio_write_nolock+0x34/0x80
[] generic_file_aio_write_nolock+0x0/0x80
[] aio_rw_vect_retry+0x72/0x14a
[] aio_run_iocb+0x9a/0x134
[] io_submit_one+0x311/0x35e
[] sys_io_submit+0x9b/0x101
[] default_wake_function+0x0/0xe
[] system_call+0x7e/0x83


Code: 0f 0b eb fe 4d 85 e4 75 1d 48 8b 74 24 08 44 89 ea 48 89 ef
RIP  [] __blockdev_direct_IO+0x96c/0xa4a
RSP 



-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fsblock

2007-06-27 Thread Nick Piggin
On Thu, Jun 28, 2007 at 08:35:48AM +1000, David Chinner wrote:
> On Wed, Jun 27, 2007 at 07:50:56AM -0400, Chris Mason wrote:
> > Lets look at a typical example of how IO actually gets done today,
> > starting with sys_write():
> > 
> > sys_write(file, buffer, 1MB)
> > for each page:
> > prepare_write()
> > allocate contiguous chunks of disk
> > attach buffers
> > copy_from_user()
> > commit_write()
> > dirty buffers
> > 
> > pdflush:
> > writepages()
> > find pages with contiguous chunks of disk
> > build and submit large bios
> > 
> > So, we replace prepare_write and commit_write with an extent based api,
> > but we keep the dirty each buffer part.  writepages has to turn that
> > back into extents (bio sized), and the result is completely full of dark
> > dark corner cases.

That's true but I don't think an extent data structure means we can
become too far divorced from the pagecache or the native block size
-- what will end up happening is that often we'll need "stuff" to map
between all those as well, even if it is only at IO-time.

But the point is taken, and I do believe that at least for APIs, extent
based seems like the best way to go. And that should allow fsblock to
be replaced or augmented in future without _too_ much pain.

 
> Yup - I've been on the painful end of those dark corner cases several
> times in the last few months.
> 
> It's also worth pointing out that mpage_readpages() already works on
> an extent basis - it overloads bufferheads to provide a "map_bh" that
> can point to a range of blocks in the same state. The code then iterates
> the map_bh range a page at a time building bios (i.e. not even using
> buffer heads) from that map..

One issue I have with the current nobh and mpage stuff is that it
requires multiple calls into get_block (first to prepare write, then
to writepage), it doesn't allow filesystems to attach resources
required for writeout at prepare_write time, and it doesn't play nicely
with buffers in general. (not to mention that nobh error handling is
buggy).

I haven't done any mpage-like code for fsblocks yet, but I think they
wouldn't be too much trouble, and wouldn't have any of the above
problems...
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC:PATCH] How best to handle implicit clearing of setuid/setgid bits on NFS?

2007-06-27 Thread Jeff Layton
On Wed, 27 Jun 2007 18:15:55 -0400
Trond Myklebust <[EMAIL PROTECTED]> wrote:

> On Tue, 2007-05-29 at 12:47 -0400, Jeff Layton wrote:
> > I've been looking at issue of clearing setuid/setgid bits when a file
> > is written to on NFS. Here's the problem in a nutshell:
> > 
> > We have 2 users. test1 and test2. Both are members of the group
> > "testgrp":
> > 
> > [EMAIL PROTECTED] ls -l f1
> > -rwxrwsr-x 1 test1 testgrp 2 2007-05-29 12:23 f1
> > [EMAIL PROTECTED] echo foo > f1
> > -bash: f1: Permission denied
> > 
> > ...and f1 is unchanged. The problem is that the VFS calls remove_suid
> > to wipe the setgid bit. This ends up causing a SETATTR call, which
> > fails on NFS because we're attempting to remove these bits as user
> > "test2".
> > 
> > Until recently, the situation here was worse. The VFS would truncate
> > the file first and then try to clear the setgid bit. The truncate would
> > succeed, but the perm change would fail. You'd end up with a zero-length
> > file. This was fixed my making the size change and bit-clearing go via
> > the same setattr call, so the whole operation just errors out now.
> > 
> > My question is -- Is there anything we can do to make this work as it
> > does on a local filesystem? Ideally there would be some way to tell the
> > server "clear the setuid/gid bits", without actually modifying the
> > contents of the file. Is there a NFS call we can use that would do this?
> > 
> > The only thing I can think of is to read the first byte of the file and
> > then overwrite it with the same data, but that seems racy and may have
> > other problems (and what do you do with a zero-length, setuid file?).
> > 
> > Any suggestions appreciated...
> 
> The answer should be simple: leave it to the server. All servers I know
> (except possibly Hummingbird) will clear the setuid/setgid bit whenever
> the file is modified. Anything else would be _extremely_ racy anyway:
> Consider the scenario where you are preparing to write to the file, when
> suddenly a user on another client makes the file setuid after you have
> open()ed the file, but before you actually issue the write().
> 
> IOW: calling remove_suid() on the client is completely redundant, and
> should be suppressed.
> 
> Trond
> 

Ok. This is a bit more complex now since we remove suid bits on
truncate, but don't set ATTR_FORCE.

Here's a patch that should do this. I know there's a general
aversion to adding new flags to vfs structures, but I couldn't think of
a way to cleanly do this without adding one.

Note that I've not tested this patch at all so this is just a RFC.

CC'ing Al here since he's expressed interest in this problem as well.

Thoughts?

Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index ca20d3c..afdd82e 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -71,7 +71,7 @@ static struct file_system_type nfs_fs_type = {
.name   = "nfs",
.get_sb = nfs_get_sb,
.kill_sb= nfs_kill_super,
-   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID,
 };
 
 struct file_system_type nfs_xdev_fs_type = {
@@ -79,7 +79,7 @@ struct file_system_type nfs_xdev_fs_type = {
.name   = "nfs",
.get_sb = nfs_xdev_get_sb,
.kill_sb= nfs_kill_super,
-   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID,
 };
 
 static const struct super_operations nfs_sops = {
@@ -107,7 +107,7 @@ static struct file_system_type nfs4_fs_type = {
.name   = "nfs4",
.get_sb = nfs4_get_sb,
.kill_sb= nfs4_kill_super,
-   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID,
 };
 
 struct file_system_type nfs4_xdev_fs_type = {
@@ -115,7 +115,7 @@ struct file_system_type nfs4_xdev_fs_type = {
.name   = "nfs4",
.get_sb = nfs4_xdev_get_sb,
.kill_sb= nfs4_kill_super,
-   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID,
 };
 
 struct file_system_type nfs4_referral_fs_type = {
@@ -123,7 +123,7 @@ struct file_system_type nfs4_referral_fs_type = {
.name   = "nfs4",
.get_sb = nfs4_referral_get_sb,
.kill_sb= nfs4_kill_super,
-   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+   .fs_flags   = 
FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID,
 };
 
 static const struct super_operations nfs4_sops = {
diff --git a/include/lin

Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-06-27 Thread Nathan Scott
On Thu, 2007-06-28 at 10:39 +1000, David Chinner wrote:
> 
> 
> I don't think it does - swapfile I/O looks like it goes direct to
> bio without passing through the filesystem.  When the swapfile is
> mapped, it scans and records the extent map of the entire swapfile
> in a separate structure and AFAICT the swap code uses that built map
> without touching the filesystem at all.
> 
> If that is true then the written/unwritten state of the extents is
> irrelevant; all we need is allocated disk space for the file and
> swapping should work. And it's not like anyone should be reading
> the contents of that swapfile through the filesystem, either. ;)

Ah, yes, good point - thats true.  Unwritten extents are ideal for
this then, as attempts to read swap via the regular interfaces will
return zeros instead of random swapped out memory contents.

cheers.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-06-27 Thread David Chinner
On Thu, Jun 28, 2007 at 09:28:36AM +1000, Nathan Scott wrote:
> On Wed, 2007-06-27 at 23:36 +1000, David Chinner wrote:
> >  Allows setup_swap_extents() to use preallocated files on XFS
> > filesystems for swap files without ever needing to convert them.
> 
> Using unwritten extents (as opposed to the MKSWAP flag mentioned
> earlier) has the unfortunate down side of requiring transactions,
> possibly additional IO, and memory allocation during swap.  (but,
> this patch should probably go in regardless, as teaching generic
> code about unwritten extents is not a bad idea).

I don't think it does - swapfile I/O looks like it goes direct to
bio without passing through the filesystem.  When the swapfile is
mapped, it scans and records the extent map of the entire swapfile
in a separate structure and AFAICT the swap code uses that built map
without touching the filesystem at all.

If that is true then the written/unwritten state of the extents is
irrelevant; all we need is allocated disk space for the file and
swapping should work. And it's not like anyone should be reading
the contents of that swapfile through the filesystem, either. ;)

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread David Miller
From: Casey Schaufler <[EMAIL PROTECTED]>
Date: Wed, 27 Jun 2007 17:27:17 -0700 (PDT)

> --- David Miller <[EMAIL PROTECTED]> wrote:
> 
> > Neither of those are reasons why something should go into the tree.
> 
> They reflect the corporate reality of the open source community.
> If you're going to go down the "open source isn't for money"
> rathole please take it elsewhere. I've heard the arguments so many
> times I can sing them to the tune of "Lady Madonna".

That's a wonderful technical reason for apparmour to go into the tree.
Thanks for sharing and making it even clearer what is really behind
this submission.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Casey Schaufler

--- David Miller <[EMAIL PROTECTED]> wrote:

> From: Crispin Cowan <[EMAIL PROTECTED]>
> Date: Wed, 27 Jun 2007 15:46:57 -0700
> 
> > But we do not want to prevent other people from using SELinux if it
> > suits them. Linux is about choice, and that is especially vital in
> > security. As Linus himself observed when LSM was started, there are a
> > lot of security models, they have various strengths and weaknesses, and
> > often are not compatible with each other. That is why it is important
> > that LSM persist, that SELinux not be the only in-tree user of LSM, and
> > why we think AppArmor should be included upstream, so that non-SUSE
> > users can also use AppArmor if it suits them.
> 
> Anyone can apply the apparmour patch to their tree, they get the
> choice that way.  Nobody is currently prevented from using apparmour
> if they want to, any such suggestion is pure rubbish.

The exact same argument was made prior to SELinux going upstream.
Look, if you can't be right, try at least to be original.

> It is even more incredulious to imply that just by having apparmour
> in the upstream kernel all the userland bits will magically appear
> on every user's distribution.

Just like all the SELinux userland magically appeared in everyone's
distribution? Nope, didn't happen.

> Give me a break.

No. You are out of line and spewing ignorance.

> What you get by the code going into the upstream kernel tree is that
> it a) adds some pseudo legitimacy to AppArmour (which I don't
> personally think is warranted) and b) gets the work of keeping
> apparmour working with upstream largely off of your back and in the
> hands of the upstream community.

Duh. Those are pretty much the reasons anyone goes through the
trouble of getting anything upstream.

> Neither of those are reasons why something should go into the tree.

They reflect the corporate reality of the open source community.
If you're going to go down the "open source isn't for money"
rathole please take it elsewhere. I've heard the arguments so many
times I can sing them to the tune of "Lady Madonna".

> Frankly I think AppArmour is a joke,

"SELinux, AppArmor, and Hilary Clinton walk into a bar ..."

> and all of this integration with
> LSM business is just a face saving effort, nothing more.  And saving
> face is not, and has never been, a reason for something to be put into
> the upstream tree.

Believe what you will. Crispin has been working with LSM from the
inception those many years ago. He's been working on getting this
module in for over a year. If you don't like his module go write
your own and put him out of business.


Casey Schaufler
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-06-27 Thread Nathan Scott
On Wed, 2007-06-27 at 23:36 +1000, David Chinner wrote:
>  Allows setup_swap_extents() to use preallocated files on XFS
> filesystems for swap files without ever needing to convert them.

Using unwritten extents (as opposed to the MKSWAP flag mentioned
earlier) has the unfortunate down side of requiring transactions,
possibly additional IO, and memory allocation during swap.  (but,
this patch should probably go in regardless, as teaching generic
code about unwritten extents is not a bad idea).

cheers.

--
Nathan

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread David Miller
From: Crispin Cowan <[EMAIL PROTECTED]>
Date: Wed, 27 Jun 2007 15:46:57 -0700

> But we do not want to prevent other people from using SELinux if it
> suits them. Linux is about choice, and that is especially vital in
> security. As Linus himself observed when LSM was started, there are a
> lot of security models, they have various strengths and weaknesses, and
> often are not compatible with each other. That is why it is important
> that LSM persist, that SELinux not be the only in-tree user of LSM, and
> why we think AppArmor should be included upstream, so that non-SUSE
> users can also use AppArmor if it suits them.

Anyone can apply the apparmour patch to their tree, they get the
choice that way.  Nobody is currently prevented from using apparmour
if they want to, any such suggestion is pure rubbish.

It is even more incredulious to imply that just by having apparmour
in the upstream kernel all the userland bits will magically appear
on every user's distribution.

Give me a break.

What you get by the code going into the upstream kernel tree is that
it a) adds some pseudo legitimacy to AppArmour (which I don't
personally think is warranted) and b) gets the work of keeping
apparmour working with upstream largely off of your back and in the
hands of the upstream community.

Neither of those are reasons why something should go into the tree.

Frankly I think AppArmour is a joke, and all of this integration with
LSM business is just a face saving effort, nothing more.  And saving
face is not, and has never been, a reason for something to be put into
the upstream tree.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Crispin Cowan
Sean wrote:
> On Wed, 27 Jun 2007 14:06:04 -0700
> Crispin Cowan <[EMAIL PROTECTED]> wrote:
>   
>> I am hoping for a reconciliation where the people who don't like
>> AppArmor live with it by not using it. AppArmor is not intended to
>> replace SELinux, it is intended to address a different set of goals.
>> 
> You keep saying that.   But for that to be true you'd have to believe
> _everyone_ using Novell distributions has needs that align exactly
> with AppArmor.  Otherwise, how to explain that you don't offer and
> support both SELinux and AppArmor to your users?
>   
They are meant to co-exist in the Linux kernel source tree.

It is a fact that there exist use cases where AppArmor is incapable of
meeting the need and SELinux is just the right thing. It is Novell's
business judgment that there are not enough of those situations in our
customer base to be worth the additional expense at this time.

But we do not want to prevent other people from using SELinux if it
suits them. Linux is about choice, and that is especially vital in
security. As Linus himself observed when LSM was started, there are a
lot of security models, they have various strengths and weaknesses, and
often are not compatible with each other. That is why it is important
that LSM persist, that SELinux not be the only in-tree user of LSM, and
why we think AppArmor should be included upstream, so that non-SUSE
users can also use AppArmor if it suits them.

Crispin

-- 
Crispin Cowan, Ph.D.   http://crispincowan.com/~crispin/
Director of Software Engineering   http://novell.com
AppArmor Chat: irc.oftc.net/#apparmor

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fsblock

2007-06-27 Thread David Chinner
On Wed, Jun 27, 2007 at 07:50:56AM -0400, Chris Mason wrote:
> On Wed, Jun 27, 2007 at 07:32:45AM +0200, Nick Piggin wrote:
> > On Tue, Jun 26, 2007 at 08:34:49AM -0400, Chris Mason wrote:
> > > On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote:
> > > > On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote:
> > > 
> > > [ ... fsblocks vs extent range mapping ]
> > > 
> > > > iomaps can double as range locks simply because iomaps are
> > > > expressions of ranges within the file.  Seeing as you can only
> > > > access a given range exclusively to modify it, inserting an empty
> > > > mapping into the tree as a range lock gives an effective method of
> > > > allowing safe parallel reads, writes and allocation into the file.
> > > > 
> > > > The fsblocks and the vm page cache interface cannot be used to
> > > > facilitate this because a radix tree is the wrong type of tree to
> > > > store this information in. A sparse, range based tree (e.g. btree)
> > > > is the right way to do this and it matches very well with
> > > > a range based API.
> > > 
> > > I'm really not against the extent based page cache idea, but I kind of
> > > assumed it would be too big a change for this kind of generic setup.  At
> > > any rate, if we'd like to do it, it may be best to ditch the idea of
> > > "attach mapping information to a page", and switch to "lookup mapping
> > > information and range locking for a page".
> > 
> > Well the get_block equivalent API is extent based one now, and I'll
> > look at what is required in making map_fsblock a more generic call
> > that could be used for an extent-based scheme.
> > 
> > An extent based thing IMO really isn't appropriate as the main generic
> > layer here though. If it is really useful and popular, then it could
> > be turned into generic code and sit along side fsblock or underneath
> > fsblock...
> 
> Lets look at a typical example of how IO actually gets done today,
> starting with sys_write():
> 
> sys_write(file, buffer, 1MB)
> for each page:
> prepare_write()
>   allocate contiguous chunks of disk
> attach buffers
> copy_from_user()
> commit_write()
> dirty buffers
> 
> pdflush:
> writepages()
> find pages with contiguous chunks of disk
>   build and submit large bios
> 
> So, we replace prepare_write and commit_write with an extent based api,
> but we keep the dirty each buffer part.  writepages has to turn that
> back into extents (bio sized), and the result is completely full of dark
> dark corner cases.

Yup - I've been on the painful end of those dark corner cases several
times in the last few months.

It's also worth pointing out that mpage_readpages() already works on
an extent basis - it overloads bufferheads to provide a "map_bh" that
can point to a range of blocks in the same state. The code then iterates
the map_bh range a page at a time building bios (i.e. not even using
buffer heads) from that map..

> I do think fsblocks is a nice cleanup on its own, but Dave has a good
> point that it makes sense to look for ways generalize things even more.

*nod*

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Sean
On Wed, 27 Jun 2007 14:06:04 -0700
Crispin Cowan <[EMAIL PROTECTED]> wrote:

> I am hoping for a reconciliation where the people who don't like
> AppArmor live with it by not using it. AppArmor is not intended to
> replace SELinux, it is intended to address a different set of goals.

You keep saying that.   But for that to be true you'd have to believe
_everyone_ using Novell distributions has needs that align exactly
with AppArmor.  Otherwise, how to explain that you don't offer and
support both SELinux and AppArmor to your users?

It seems as far as Novell is concerned, AppArmor _is_ meant to replace
SELinux.  Not that there is really anything wrong with that, but it's
a little disingenuous to then argue that they're meant to coexist.

Sean
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Crispin Cowan
Adrian Bunk wrote:
> On Tue, Jun 26, 2007 at 07:47:00PM -0700, Andrew Morton wrote:
>   
>> Do you agree with the "irreconcilable" part?  I think I do. 
I am hoping for a reconciliation where the people who don't like
AppArmor live with it by not using it. AppArmor is not intended to
replace SELinux, it is intended to address a different set of goals.

>> I suspect that we're at the stage of having to decide between
>>
>> a) set aside the technical issues and grudgingly merge this stuff as a
>>service to Suse and to their users (both of which entities are very
>>important to us) and leave it all as an object lesson in
>>how-not-to-develop-kernel-features.
>>
>>Minimisation of the impact on the rest of the kernel is of course
>>very important here.
>>
>> versus
>>
>> b) leave it out and require that Suse wear the permanent cost and
>>quality impact of maintaining it out-of-tree.  It will still be an
>>object lesson in how-not-to-develop-kernel-features.
>> ...
> versus
>
> c) if [1] AppArmor is considered to be something that wouldn't 
>be merged if it wasn't already widely deployed by Suse: leave it out, 
>work on an ideal solution [2], and let Suse wear the one-time cost
>of migrating their users to the ideal solution
>   
We argue that the proposed patch is a viable solution for providing
AppArmor functionality. We would be happy for specific suggestions on
how to make it better.

> I'm not claiming to understand the technical details, but from both 
> slightly reading over the previous discussions and the "What are the 
> advantages of AppArmor over SELinux?" section in the AppArmor FAQ [3] my 
> impression is that a main advantage of AppArmor are more user friendly 
> userspace tools. Therefore, if [1] AppArmor is considered technically 
> inferior to SELinux, it might still become more popular than SELinux 
> simply because it's easier to use - and although it's technically 
> inferior.  
AppArmor's advantages come from the model, not the tools. AppArmor is
not inferior to SELinux, it is different than SELinux. Neither can
replace the other without horrid kludges.

Crispin

-- 
Crispin Cowan, Ph.D.   http://crispincowan.com/~crispin/
Director of Software Engineering   http://novell.com
AppArmor Chat: irc.oftc.net/#apparmor

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD 0/4] AppArmor - Don't pass NULL nameidata to vfs_create/lookup/permission IOPs

2007-06-27 Thread Andreas Gruenbacher
On Wednesday 27 June 2007 01:46, Trond Myklebust wrote:
> On Tue, 2007-06-26 at 16:15 -0700, [EMAIL PROTECTED] wrote:
> > To remove conditionally passing of vfsmounts to the LSM, a nameidata
> > struct can be instantiated in the nfsd and mqueue filesystems.  This
> > however results in useless information being passed down, as not
> > all fields in the nameidata struct will be meaingful.  The nameidata
> > struct is split creating struct nameidata2 that contains only the
> > fields
> > that will carry meaningful information.
> 
> I don't object to the concept per se, but could you please give it a
> more descriptive name please? "struct vfs_intent" would be a lot more
> accurate than "nameidata2".

Yes, the name is pretty arbitrary. vfs_intent is better, even though the 
struct doesn't only include the intent data.

Thanks,
Andreas
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fsblock

2007-06-27 Thread Anton Altaparmakov

On 27 Jun 2007, at 12:50, Chris Mason wrote:

On Wed, Jun 27, 2007 at 07:32:45AM +0200, Nick Piggin wrote:

On Tue, Jun 26, 2007 at 08:34:49AM -0400, Chris Mason wrote:

On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote:

On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote:


[ ... fsblocks vs extent range mapping ]


iomaps can double as range locks simply because iomaps are
expressions of ranges within the file.  Seeing as you can only
access a given range exclusively to modify it, inserting an empty
mapping into the tree as a range lock gives an effective method of
allowing safe parallel reads, writes and allocation into the file.

The fsblocks and the vm page cache interface cannot be used to
facilitate this because a radix tree is the wrong type of tree to
store this information in. A sparse, range based tree (e.g. btree)
is the right way to do this and it matches very well with
a range based API.


I'm really not against the extent based page cache idea, but I  
kind of
assumed it would be too big a change for this kind of generic  
setup.  At

any rate, if we'd like to do it, it may be best to ditch the idea of
"attach mapping information to a page", and switch to "lookup  
mapping

information and range locking for a page".


Well the get_block equivalent API is extent based one now, and I'll
look at what is required in making map_fsblock a more generic call
that could be used for an extent-based scheme.

An extent based thing IMO really isn't appropriate as the main  
generic

layer here though. If it is really useful and popular, then it could
be turned into generic code and sit along side fsblock or underneath
fsblock...


Lets look at a typical example of how IO actually gets done today,
starting with sys_write():


Yes, this is very inefficient which is one of the reasons I don't use  
the generic file write helpers in NTFS.  The other reasons are that  
supporting larger logical block sizes than PAGE_CACHE_SIZE becomes a  
pain if it is not done this way when the write targets a hole as that  
requires all pages in the hole to be locked simultaneously which  
would mean dropping the page lock to acquire the others that are of  
lower page index and to then re-take the page lock which is horrible  
- much better to lock all at once from the outset and the other  
reason is that in NTFS there is such a thing as the initialized size  
of an attribute which basically states "anything past this byte  
offset must be returned as 0 on read, i.e. it does not have to be  
read from disk at all, and on write beyond the initialized_size you  
have to zero on disk everything between the old initialized size and  
the start of the write before you begin writing and certainly before  
you update the initalized_size otherwise a concurrent read would see  
random old data from the disk.


For NTFS this effectively becomes:


sys_write(file, buffer, 1MB)


allocate space for the entire 1MB write

if write offset past the initialized_size zero out on disk starting  
at initialized_size up to the start offset for the write and update  
the initialized size to be equal to the start offset of the write


do {
	if (current position is in a hole and the NTFS logical block size is  
> PAGE_CACHE_SIZE) {

work on (NTFS logical block size / PAGE_CACHE_SIZE) pages in 
one go;
do_pages = vol->cluster_size / PAGE_CACHE_SIZE;
} else {
work on only one page;
do_pages = 1;
}
	fault in for read (do_pages*PAGE_CACHE_SIZE) bytes worth of source  
pages

grab do_pages worth of pages
prepare_write - attach buffers to grabbed pages
copy data from source to grabbed&prepared pages
commit_write the copied pages by dirtying their buffers
} while (data left to write);

The allocation in advance is a huge win both in terms of avoiding  
fragmentation (NTFS still uses a very simple/stupid allocator so you  
get a lot of fragmentation if two processes write to different files  
simultaneously and do so in small chunks) and in terms of performance.


I have wondered whether I should perhaps turn on the "multi page"  
stuff on for all writes rather than just for ones that go into a hole  
and the logical size is greater than the PAGE_CACHE_SIZE as that  
might improve performance even further but I haven't had the time/ 
inclination to experiment...


And I have also wondered whether to go direct to bio/wholes pages at  
once instead of bothering with dirtying each buffer but the buffers  
(which are always 512 bytes on NTFS) allow me to easily support  
dirtying smaller parts of the page which is desired at least on  
volumes with a logical block size < PAGE_CACHE_SIZE as different bits  
of the page could then reside on completely different locations on  
disk so writing out unneeded bits of the page could result in a lot  
of wasted disk head seek times.


Best regards,

Anton


for each page:

Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Adrian Bunk
On Tue, Jun 26, 2007 at 07:47:00PM -0700, Andrew Morton wrote:
> On Tue, 26 Jun 2007 19:24:03 -0700 John Johansen <[EMAIL PROTECTED]> wrote:
> 
> > > 
> > > so...  where do we stand with this?  Fundamental, irreconcilable
> > > differences over the use of pathname-based security?
> > > 
> > There certainly seems to be some differences of opinion over the use
> > of pathname-based-security.
> 
> I was refreshed to have not been cc'ed on a lkml thread for once.  I guess
> it couldn't last.
> 
> Do you agree with the "irreconcilable" part?  I think I do.
> 
> I suspect that we're at the stage of having to decide between
> 
> a) set aside the technical issues and grudgingly merge this stuff as a
>service to Suse and to their users (both of which entities are very
>important to us) and leave it all as an object lesson in
>how-not-to-develop-kernel-features.
> 
>Minimisation of the impact on the rest of the kernel is of course
>very important here.
> 
> versus
> 
> b) leave it out and require that Suse wear the permanent cost and
>quality impact of maintaining it out-of-tree.  It will still be an
>object lesson in how-not-to-develop-kernel-features.
>...

versus

c) if [1] AppArmor is considered to be something that wouldn't 
   be merged if it wasn't already widely deployed by Suse: leave it out, 
   work on an ideal solution [2], and let Suse wear the one-time cost
   of migrating their users to the ideal solution

One important point is that if AppArmor gets merged there will be much 
more distribution support for it, and many people on !Suse will start 
using it.

I'm not claiming to understand the technical details, but from both 
slightly reading over the previous discussions and the "What are the 
advantages of AppArmor over SELinux?" section in the AppArmor FAQ [3] my 
impression is that a main advantage of AppArmor are more user friendly 
userspace tools. Therefore, if [1] AppArmor is considered technically 
inferior to SELinux, it might still become more popular than SELinux 
simply because it's easier to use - and although it's technically 
inferior.

cu
Adrian

[1] note the "if"
[2] could be, but not necessarily, SELinux
[3] http://developer.novell.com/wiki/index.php/Apparmor_FAQ

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Andreas Gruenbacher
On Wednesday 27 June 2007 12:58, Kyle Moffett wrote:
> I seem to recall you could actually end up racing and building a path
> to the file in those directories as "a/d/0/3" or some other path at
> which it never even remotely existed. I'd love to be wrong,

Cheer up, you recall wrong.

> but I can't help but see this problem in any reverse-pathname-generation
> proposal which gets the locking right.

Have a look at how __d_path() is implemented (with the fixes): It takes the 
dcache_lock, and the vfsmount_lock where necessary, and this ensures that the 
pathname can't change under it, neither because of a rename nor unlink nor 
remount. The pathname computed is *exactly* the name the file has at that 
specific point time.

A few more details about how pathnames work are explained in the tech doc at:
http://forge.novell.com/modules/xfcontent/downloads.php/apparmor/LKML_Submission-May_07

Andreas
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-06-27 Thread David Chinner
On Tue, Jun 26, 2007 at 11:49:15PM -0400, Andreas Dilger wrote:
> On Jun 27, 2007  09:14 +1000, David Chinner wrote:
> > Someone on the XFs list had an interesting request - preallocated
> > swap files. You can't use unwritten extents for this because
> > of sys_swapon()s use of bmap() (XFS returns holes for reading
> > unwritten extents), so we need a method of preallocating that does
> > not zero or mark the extent unread. i.e. FA_MKSWAP.
> 
> Is there a reason why unwritten extents return 0 to bmap()?

It's a fallout of xfs_get_blocks not mapping unwritten extents
on read because we want do_mpage_readpage() to treat them
as a hole. i.e. zero fill them instead of doing I/O. This is
the way XFS was shoehorned into the generic read path :/

> This
> would seem to be the only impediment from using fallocated files
> for swap files.  Maybe if FIEMAP was used by mkswap to get an
> "UNWRITTEN" flag back instead of "HOLE" it wouldn't be a problem.

Probably. If we taught do_mpage_readpage() about unwritten mappings,
then would could map them on read if and then sys_swapon can remain
blissfully unaware of unwritten extents.

I think this is pretty much all I need to do to acheive that is
(untested):

---

Teach do_mpage_readpage() about unwritten extents so we can
always map them in get_blocks rather than they are are holes on
read. Allows setup_swap_extents() to use preallocated files on XFS
filesystems for swap files without ever needing to convert them.

Signed-Off-By: Dave Chinner <[EMAIL PROTECTED]>

---
 fs/mpage.c  |5 +++--
 fs/xfs/linux-2.6/xfs_aops.c |   13 +++--
 2 files changed, 6 insertions(+), 12 deletions(-)

Index: 2.6.x-xfs-new/fs/mpage.c
===
--- 2.6.x-xfs-new.orig/fs/mpage.c   2007-05-29 16:17:59.0 +1000
+++ 2.6.x-xfs-new/fs/mpage.c2007-06-27 22:39:35.568852270 +1000
@@ -207,7 +207,8 @@ do_mpage_readpage(struct bio *bio, struc
 * Map blocks using the result from the previous get_blocks call first.
 */
nblocks = map_bh->b_size >> blkbits;
-   if (buffer_mapped(map_bh) && block_in_file > *first_logical_block &&
+   if (buffer_mapped(map_bh) && !buffer_unwritten(map_bh) &&
+   block_in_file > *first_logical_block &&
block_in_file < (*first_logical_block + nblocks)) {
unsigned map_offset = block_in_file - *first_logical_block;
unsigned last = nblocks - map_offset;
@@ -242,7 +243,7 @@ do_mpage_readpage(struct bio *bio, struc
*first_logical_block = block_in_file;
}
 
-   if (!buffer_mapped(map_bh)) {
+   if (!buffer_mapped(map_bh) || buffer_unwritten(map_bh)) {
fully_mapped = 0;
if (first_hole == blocks_per_page)
first_hole = page_block;
Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_aops.c
===
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_aops.c  2007-06-05 
22:14:39.0 +1000
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_aops.c   2007-06-27 22:39:29.545636749 
+1000
@@ -1340,16 +1340,9 @@ __xfs_get_blocks(
return 0;
 
if (iomap.iomap_bn != IOMAP_DADDR_NULL) {
-   /*
-* For unwritten extents do not report a disk address on
-* the read case (treat as if we're reading into a hole).
-*/
-   if (create || !(iomap.iomap_flags & IOMAP_UNWRITTEN)) {
-   xfs_map_buffer(bh_result, &iomap, offset,
-  inode->i_blkbits);
-   }
-   if (create && (iomap.iomap_flags & IOMAP_UNWRITTEN)) {
-   if (direct)
+   xfs_map_buffer(bh_result, &iomap, offset, inode->i_blkbits);
+   if (iomap.iomap_flags & IOMAP_UNWRITTEN) {
+   if (create && direct)
bh_result->b_private = inode;
set_buffer_unwritten(bh_result);
}


Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fsblock

2007-06-27 Thread Kyle Moffett

On Jun 26, 2007, at 07:14:14, Nick Piggin wrote:

On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote:
Can we call it a block mapping layer or something like that? e.g.  
struct blkmap?


I'm not fixed on fsblock, but blkmap doesn't grab me either. It is  
a map from the pagecache to the block layer, but blkmap sounds like  
it is a map from the block to somewhere.


fsblkmap ;)


vmblock? pgblock?

Cheers,
Kyle Moffett

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fsblock

2007-06-27 Thread Chris Mason
On Wed, Jun 27, 2007 at 07:32:45AM +0200, Nick Piggin wrote:
> On Tue, Jun 26, 2007 at 08:34:49AM -0400, Chris Mason wrote:
> > On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote:
> > > On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote:
> > 
> > [ ... fsblocks vs extent range mapping ]
> > 
> > > iomaps can double as range locks simply because iomaps are
> > > expressions of ranges within the file.  Seeing as you can only
> > > access a given range exclusively to modify it, inserting an empty
> > > mapping into the tree as a range lock gives an effective method of
> > > allowing safe parallel reads, writes and allocation into the file.
> > > 
> > > The fsblocks and the vm page cache interface cannot be used to
> > > facilitate this because a radix tree is the wrong type of tree to
> > > store this information in. A sparse, range based tree (e.g. btree)
> > > is the right way to do this and it matches very well with
> > > a range based API.
> > 
> > I'm really not against the extent based page cache idea, but I kind of
> > assumed it would be too big a change for this kind of generic setup.  At
> > any rate, if we'd like to do it, it may be best to ditch the idea of
> > "attach mapping information to a page", and switch to "lookup mapping
> > information and range locking for a page".
> 
> Well the get_block equivalent API is extent based one now, and I'll
> look at what is required in making map_fsblock a more generic call
> that could be used for an extent-based scheme.
> 
> An extent based thing IMO really isn't appropriate as the main generic
> layer here though. If it is really useful and popular, then it could
> be turned into generic code and sit along side fsblock or underneath
> fsblock...

Lets look at a typical example of how IO actually gets done today,
starting with sys_write():

sys_write(file, buffer, 1MB)
for each page:
prepare_write()
allocate contiguous chunks of disk
attach buffers
copy_from_user()
commit_write()
dirty buffers

pdflush:
writepages()
find pages with contiguous chunks of disk
build and submit large bios

So, we replace prepare_write and commit_write with an extent based api,
but we keep the dirty each buffer part.  writepages has to turn that
back into extents (bio sized), and the result is completely full of dark
dark corner cases.

I do think fsblocks is a nice cleanup on its own, but Dave has a good
point that it makes sense to look for ways generalize things even more.

-chris
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-27 Thread Kyle Moffett

On Jun 26, 2007, at 22:24:03, John Johansen wrote:

other issues that have been raised are:
- the use of d_path to generate the pathname used for mediation when a
  file is opened.
  - Generating the pathname using a reverse walk is considered ugly


A little more than "ugly".  In this basic concurrent rename() and  
path-lookup load:


mkdir -p /a/b/0
mkdir -p /a/b/2
mkdir -p /c
touch /a/b/0/1

cd /a/b
while true; mv 0/1 2/3; mv 2/3 0/1; done &
cd /
while true; do mv a/b c/d; mv c/d a/b; done &
while true; do cat a/b/0/1 & done
while true; do cat a/b/2/3 & done
while true; do cat c/d/0/1 & done
while true; do cat c/d/2/3 & done

I seem to recall you could actually end up racing and building a path  
to the file in those directories as "a/d/0/3" or some other path at  
which it never even remotely existed.  I'd love to be wrong, but I  
can't help but see this problem in any reverse-pathname-generation  
proposal which gets the locking right.


Cheers,
Kyle Moffett

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html