Re: [AppArmor 00/44] AppArmor security module overview
Any chance you can remove linux-fsdevel from the CC list? I don't think this has anything to do with filesystems. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
DIO panic on 2.6.21.5
Hi Zach, One of our perf. team ran into this while doing some runs. I didn't see anything obvious - it looks like we converted async IO to synchronous one. I didn't spend much time digging around. Is this a known issue ? Any ideas ? Thanks, Badari [ cut here ] kernel BUG at fs/direct-io.c:1113! invalid opcode: [1] SMP CPU 11 Modules linked in: oprofile raw capability commoncap qla2xxx ipv6 button battery ac loop dm_mod tg3 ext3 jbd edd fan thermal processor scsi_transport_fc sg aic94xx libsas firmware_class scsi_transport_sas serverworks sd_mod scsePid: 9709, comm: db2sysc Not tainted 2.6.21.5-smp #1 RIP: 0010:[] [] __blockdev_direct_IO+0x96c/0xa4a RSP: 0018:810c3d3efc08 EFLAGS: 00010202 RAX: 0246 RBX: 810041768b24 RCX: 80406918 RDX: RSI: 0246 RDI: 810041768b24 RBP: 810041768800 R08: 0086 R09: 81003f0a2370 R10: 810c3f3ee9d8 R11: 802e9716 R12: 0001 R13: fdef R14: 810c5beb42b0 R15: FS: 2b3d14ffe940() GS:810c444c95c0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 2b3e84f57000 CR3: 4511c000 CR4: 06e0 Process db2sysc (pid: 9709, threadinfo 810c3d3ee000, task 810041fee760) Stack: 810c42f0fbc0 4c1f4000 811836306d48 0011803e7f95 0009 0008 810041fee760 00090001 0001 810024ac9948 Call Trace: [] blkdev_direct_IO+0x45/0x4a [] blkdev_get_blocks+0x0/0x96 [] generic_file_direct_IO+0xbd/0xfb [] generic_file_direct_write+0x60/0xf5 [] __generic_file_aio_write_nolock+0x2e7/0x410 [] aio_read_evt+0x9a/0x108 [] generic_file_aio_write_nolock+0x34/0x80 [] generic_file_aio_write_nolock+0x0/0x80 [] aio_rw_vect_retry+0x72/0x14a [] aio_run_iocb+0x9a/0x134 [] io_submit_one+0x311/0x35e [] sys_io_submit+0x9b/0x101 [] default_wake_function+0x0/0xe [] system_call+0x7e/0x83 Code: 0f 0b eb fe 4d 85 e4 75 1d 48 8b 74 24 08 44 89 ea 48 89 ef RIP [] __blockdev_direct_IO+0x96c/0xa4a RSP - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] fsblock
On Thu, Jun 28, 2007 at 08:35:48AM +1000, David Chinner wrote: > On Wed, Jun 27, 2007 at 07:50:56AM -0400, Chris Mason wrote: > > Lets look at a typical example of how IO actually gets done today, > > starting with sys_write(): > > > > sys_write(file, buffer, 1MB) > > for each page: > > prepare_write() > > allocate contiguous chunks of disk > > attach buffers > > copy_from_user() > > commit_write() > > dirty buffers > > > > pdflush: > > writepages() > > find pages with contiguous chunks of disk > > build and submit large bios > > > > So, we replace prepare_write and commit_write with an extent based api, > > but we keep the dirty each buffer part. writepages has to turn that > > back into extents (bio sized), and the result is completely full of dark > > dark corner cases. That's true but I don't think an extent data structure means we can become too far divorced from the pagecache or the native block size -- what will end up happening is that often we'll need "stuff" to map between all those as well, even if it is only at IO-time. But the point is taken, and I do believe that at least for APIs, extent based seems like the best way to go. And that should allow fsblock to be replaced or augmented in future without _too_ much pain. > Yup - I've been on the painful end of those dark corner cases several > times in the last few months. > > It's also worth pointing out that mpage_readpages() already works on > an extent basis - it overloads bufferheads to provide a "map_bh" that > can point to a range of blocks in the same state. The code then iterates > the map_bh range a page at a time building bios (i.e. not even using > buffer heads) from that map.. One issue I have with the current nobh and mpage stuff is that it requires multiple calls into get_block (first to prepare write, then to writepage), it doesn't allow filesystems to attach resources required for writeout at prepare_write time, and it doesn't play nicely with buffers in general. (not to mention that nobh error handling is buggy). I haven't done any mpage-like code for fsblocks yet, but I think they wouldn't be too much trouble, and wouldn't have any of the above problems... - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC:PATCH] How best to handle implicit clearing of setuid/setgid bits on NFS?
On Wed, 27 Jun 2007 18:15:55 -0400 Trond Myklebust <[EMAIL PROTECTED]> wrote: > On Tue, 2007-05-29 at 12:47 -0400, Jeff Layton wrote: > > I've been looking at issue of clearing setuid/setgid bits when a file > > is written to on NFS. Here's the problem in a nutshell: > > > > We have 2 users. test1 and test2. Both are members of the group > > "testgrp": > > > > [EMAIL PROTECTED] ls -l f1 > > -rwxrwsr-x 1 test1 testgrp 2 2007-05-29 12:23 f1 > > [EMAIL PROTECTED] echo foo > f1 > > -bash: f1: Permission denied > > > > ...and f1 is unchanged. The problem is that the VFS calls remove_suid > > to wipe the setgid bit. This ends up causing a SETATTR call, which > > fails on NFS because we're attempting to remove these bits as user > > "test2". > > > > Until recently, the situation here was worse. The VFS would truncate > > the file first and then try to clear the setgid bit. The truncate would > > succeed, but the perm change would fail. You'd end up with a zero-length > > file. This was fixed my making the size change and bit-clearing go via > > the same setattr call, so the whole operation just errors out now. > > > > My question is -- Is there anything we can do to make this work as it > > does on a local filesystem? Ideally there would be some way to tell the > > server "clear the setuid/gid bits", without actually modifying the > > contents of the file. Is there a NFS call we can use that would do this? > > > > The only thing I can think of is to read the first byte of the file and > > then overwrite it with the same data, but that seems racy and may have > > other problems (and what do you do with a zero-length, setuid file?). > > > > Any suggestions appreciated... > > The answer should be simple: leave it to the server. All servers I know > (except possibly Hummingbird) will clear the setuid/setgid bit whenever > the file is modified. Anything else would be _extremely_ racy anyway: > Consider the scenario where you are preparing to write to the file, when > suddenly a user on another client makes the file setuid after you have > open()ed the file, but before you actually issue the write(). > > IOW: calling remove_suid() on the client is completely redundant, and > should be suppressed. > > Trond > Ok. This is a bit more complex now since we remove suid bits on truncate, but don't set ATTR_FORCE. Here's a patch that should do this. I know there's a general aversion to adding new flags to vfs structures, but I couldn't think of a way to cleanly do this without adding one. Note that I've not tested this patch at all so this is just a RFC. CC'ing Al here since he's expressed interest in this problem as well. Thoughts? Signed-off-by: Jeff Layton <[EMAIL PROTECTED]> diff --git a/fs/nfs/super.c b/fs/nfs/super.c index ca20d3c..afdd82e 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -71,7 +71,7 @@ static struct file_system_type nfs_fs_type = { .name = "nfs", .get_sb = nfs_get_sb, .kill_sb= nfs_kill_super, - .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, + .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID, }; struct file_system_type nfs_xdev_fs_type = { @@ -79,7 +79,7 @@ struct file_system_type nfs_xdev_fs_type = { .name = "nfs", .get_sb = nfs_xdev_get_sb, .kill_sb= nfs_kill_super, - .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, + .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID, }; static const struct super_operations nfs_sops = { @@ -107,7 +107,7 @@ static struct file_system_type nfs4_fs_type = { .name = "nfs4", .get_sb = nfs4_get_sb, .kill_sb= nfs4_kill_super, - .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, + .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID, }; struct file_system_type nfs4_xdev_fs_type = { @@ -115,7 +115,7 @@ struct file_system_type nfs4_xdev_fs_type = { .name = "nfs4", .get_sb = nfs4_xdev_get_sb, .kill_sb= nfs4_kill_super, - .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, + .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID, }; struct file_system_type nfs4_referral_fs_type = { @@ -123,7 +123,7 @@ struct file_system_type nfs4_referral_fs_type = { .name = "nfs4", .get_sb = nfs4_referral_get_sb, .kill_sb= nfs4_kill_super, - .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, + .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA|FS_DONT_REMOVE_SUID, }; static const struct super_operations nfs4_sops = { diff --git a/include/lin
Re: [PATCH 4/7][TAKE5] support new modes in fallocate
On Thu, 2007-06-28 at 10:39 +1000, David Chinner wrote: > > > I don't think it does - swapfile I/O looks like it goes direct to > bio without passing through the filesystem. When the swapfile is > mapped, it scans and records the extent map of the entire swapfile > in a separate structure and AFAICT the swap code uses that built map > without touching the filesystem at all. > > If that is true then the written/unwritten state of the extents is > irrelevant; all we need is allocated disk space for the file and > swapping should work. And it's not like anyone should be reading > the contents of that swapfile through the filesystem, either. ;) Ah, yes, good point - thats true. Unwritten extents are ideal for this then, as attempts to read swap via the regular interfaces will return zeros instead of random swapped out memory contents. cheers. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/7][TAKE5] support new modes in fallocate
On Thu, Jun 28, 2007 at 09:28:36AM +1000, Nathan Scott wrote: > On Wed, 2007-06-27 at 23:36 +1000, David Chinner wrote: > > Allows setup_swap_extents() to use preallocated files on XFS > > filesystems for swap files without ever needing to convert them. > > Using unwritten extents (as opposed to the MKSWAP flag mentioned > earlier) has the unfortunate down side of requiring transactions, > possibly additional IO, and memory allocation during swap. (but, > this patch should probably go in regardless, as teaching generic > code about unwritten extents is not a bad idea). I don't think it does - swapfile I/O looks like it goes direct to bio without passing through the filesystem. When the swapfile is mapped, it scans and records the extent map of the entire swapfile in a separate structure and AFAICT the swap code uses that built map without touching the filesystem at all. If that is true then the written/unwritten state of the extents is irrelevant; all we need is allocated disk space for the file and swapping should work. And it's not like anyone should be reading the contents of that swapfile through the filesystem, either. ;) Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AppArmor 00/44] AppArmor security module overview
From: Casey Schaufler <[EMAIL PROTECTED]> Date: Wed, 27 Jun 2007 17:27:17 -0700 (PDT) > --- David Miller <[EMAIL PROTECTED]> wrote: > > > Neither of those are reasons why something should go into the tree. > > They reflect the corporate reality of the open source community. > If you're going to go down the "open source isn't for money" > rathole please take it elsewhere. I've heard the arguments so many > times I can sing them to the tune of "Lady Madonna". That's a wonderful technical reason for apparmour to go into the tree. Thanks for sharing and making it even clearer what is really behind this submission. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AppArmor 00/44] AppArmor security module overview
--- David Miller <[EMAIL PROTECTED]> wrote: > From: Crispin Cowan <[EMAIL PROTECTED]> > Date: Wed, 27 Jun 2007 15:46:57 -0700 > > > But we do not want to prevent other people from using SELinux if it > > suits them. Linux is about choice, and that is especially vital in > > security. As Linus himself observed when LSM was started, there are a > > lot of security models, they have various strengths and weaknesses, and > > often are not compatible with each other. That is why it is important > > that LSM persist, that SELinux not be the only in-tree user of LSM, and > > why we think AppArmor should be included upstream, so that non-SUSE > > users can also use AppArmor if it suits them. > > Anyone can apply the apparmour patch to their tree, they get the > choice that way. Nobody is currently prevented from using apparmour > if they want to, any such suggestion is pure rubbish. The exact same argument was made prior to SELinux going upstream. Look, if you can't be right, try at least to be original. > It is even more incredulious to imply that just by having apparmour > in the upstream kernel all the userland bits will magically appear > on every user's distribution. Just like all the SELinux userland magically appeared in everyone's distribution? Nope, didn't happen. > Give me a break. No. You are out of line and spewing ignorance. > What you get by the code going into the upstream kernel tree is that > it a) adds some pseudo legitimacy to AppArmour (which I don't > personally think is warranted) and b) gets the work of keeping > apparmour working with upstream largely off of your back and in the > hands of the upstream community. Duh. Those are pretty much the reasons anyone goes through the trouble of getting anything upstream. > Neither of those are reasons why something should go into the tree. They reflect the corporate reality of the open source community. If you're going to go down the "open source isn't for money" rathole please take it elsewhere. I've heard the arguments so many times I can sing them to the tune of "Lady Madonna". > Frankly I think AppArmour is a joke, "SELinux, AppArmor, and Hilary Clinton walk into a bar ..." > and all of this integration with > LSM business is just a face saving effort, nothing more. And saving > face is not, and has never been, a reason for something to be put into > the upstream tree. Believe what you will. Crispin has been working with LSM from the inception those many years ago. He's been working on getting this module in for over a year. If you don't like his module go write your own and put him out of business. Casey Schaufler [EMAIL PROTECTED] - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/7][TAKE5] support new modes in fallocate
On Wed, 2007-06-27 at 23:36 +1000, David Chinner wrote: > Allows setup_swap_extents() to use preallocated files on XFS > filesystems for swap files without ever needing to convert them. Using unwritten extents (as opposed to the MKSWAP flag mentioned earlier) has the unfortunate down side of requiring transactions, possibly additional IO, and memory allocation during swap. (but, this patch should probably go in regardless, as teaching generic code about unwritten extents is not a bad idea). cheers. -- Nathan - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AppArmor 00/44] AppArmor security module overview
From: Crispin Cowan <[EMAIL PROTECTED]> Date: Wed, 27 Jun 2007 15:46:57 -0700 > But we do not want to prevent other people from using SELinux if it > suits them. Linux is about choice, and that is especially vital in > security. As Linus himself observed when LSM was started, there are a > lot of security models, they have various strengths and weaknesses, and > often are not compatible with each other. That is why it is important > that LSM persist, that SELinux not be the only in-tree user of LSM, and > why we think AppArmor should be included upstream, so that non-SUSE > users can also use AppArmor if it suits them. Anyone can apply the apparmour patch to their tree, they get the choice that way. Nobody is currently prevented from using apparmour if they want to, any such suggestion is pure rubbish. It is even more incredulious to imply that just by having apparmour in the upstream kernel all the userland bits will magically appear on every user's distribution. Give me a break. What you get by the code going into the upstream kernel tree is that it a) adds some pseudo legitimacy to AppArmour (which I don't personally think is warranted) and b) gets the work of keeping apparmour working with upstream largely off of your back and in the hands of the upstream community. Neither of those are reasons why something should go into the tree. Frankly I think AppArmour is a joke, and all of this integration with LSM business is just a face saving effort, nothing more. And saving face is not, and has never been, a reason for something to be put into the upstream tree. - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AppArmor 00/44] AppArmor security module overview
Sean wrote: > On Wed, 27 Jun 2007 14:06:04 -0700 > Crispin Cowan <[EMAIL PROTECTED]> wrote: > >> I am hoping for a reconciliation where the people who don't like >> AppArmor live with it by not using it. AppArmor is not intended to >> replace SELinux, it is intended to address a different set of goals. >> > You keep saying that. But for that to be true you'd have to believe > _everyone_ using Novell distributions has needs that align exactly > with AppArmor. Otherwise, how to explain that you don't offer and > support both SELinux and AppArmor to your users? > They are meant to co-exist in the Linux kernel source tree. It is a fact that there exist use cases where AppArmor is incapable of meeting the need and SELinux is just the right thing. It is Novell's business judgment that there are not enough of those situations in our customer base to be worth the additional expense at this time. But we do not want to prevent other people from using SELinux if it suits them. Linux is about choice, and that is especially vital in security. As Linus himself observed when LSM was started, there are a lot of security models, they have various strengths and weaknesses, and often are not compatible with each other. That is why it is important that LSM persist, that SELinux not be the only in-tree user of LSM, and why we think AppArmor should be included upstream, so that non-SUSE users can also use AppArmor if it suits them. Crispin -- Crispin Cowan, Ph.D. http://crispincowan.com/~crispin/ Director of Software Engineering http://novell.com AppArmor Chat: irc.oftc.net/#apparmor - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] fsblock
On Wed, Jun 27, 2007 at 07:50:56AM -0400, Chris Mason wrote: > On Wed, Jun 27, 2007 at 07:32:45AM +0200, Nick Piggin wrote: > > On Tue, Jun 26, 2007 at 08:34:49AM -0400, Chris Mason wrote: > > > On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote: > > > > On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote: > > > > > > [ ... fsblocks vs extent range mapping ] > > > > > > > iomaps can double as range locks simply because iomaps are > > > > expressions of ranges within the file. Seeing as you can only > > > > access a given range exclusively to modify it, inserting an empty > > > > mapping into the tree as a range lock gives an effective method of > > > > allowing safe parallel reads, writes and allocation into the file. > > > > > > > > The fsblocks and the vm page cache interface cannot be used to > > > > facilitate this because a radix tree is the wrong type of tree to > > > > store this information in. A sparse, range based tree (e.g. btree) > > > > is the right way to do this and it matches very well with > > > > a range based API. > > > > > > I'm really not against the extent based page cache idea, but I kind of > > > assumed it would be too big a change for this kind of generic setup. At > > > any rate, if we'd like to do it, it may be best to ditch the idea of > > > "attach mapping information to a page", and switch to "lookup mapping > > > information and range locking for a page". > > > > Well the get_block equivalent API is extent based one now, and I'll > > look at what is required in making map_fsblock a more generic call > > that could be used for an extent-based scheme. > > > > An extent based thing IMO really isn't appropriate as the main generic > > layer here though. If it is really useful and popular, then it could > > be turned into generic code and sit along side fsblock or underneath > > fsblock... > > Lets look at a typical example of how IO actually gets done today, > starting with sys_write(): > > sys_write(file, buffer, 1MB) > for each page: > prepare_write() > allocate contiguous chunks of disk > attach buffers > copy_from_user() > commit_write() > dirty buffers > > pdflush: > writepages() > find pages with contiguous chunks of disk > build and submit large bios > > So, we replace prepare_write and commit_write with an extent based api, > but we keep the dirty each buffer part. writepages has to turn that > back into extents (bio sized), and the result is completely full of dark > dark corner cases. Yup - I've been on the painful end of those dark corner cases several times in the last few months. It's also worth pointing out that mpage_readpages() already works on an extent basis - it overloads bufferheads to provide a "map_bh" that can point to a range of blocks in the same state. The code then iterates the map_bh range a page at a time building bios (i.e. not even using buffer heads) from that map.. > I do think fsblocks is a nice cleanup on its own, but Dave has a good > point that it makes sense to look for ways generalize things even more. *nod* Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AppArmor 00/44] AppArmor security module overview
On Wed, 27 Jun 2007 14:06:04 -0700 Crispin Cowan <[EMAIL PROTECTED]> wrote: > I am hoping for a reconciliation where the people who don't like > AppArmor live with it by not using it. AppArmor is not intended to > replace SELinux, it is intended to address a different set of goals. You keep saying that. But for that to be true you'd have to believe _everyone_ using Novell distributions has needs that align exactly with AppArmor. Otherwise, how to explain that you don't offer and support both SELinux and AppArmor to your users? It seems as far as Novell is concerned, AppArmor _is_ meant to replace SELinux. Not that there is really anything wrong with that, but it's a little disingenuous to then argue that they're meant to coexist. Sean - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AppArmor 00/44] AppArmor security module overview
Adrian Bunk wrote: > On Tue, Jun 26, 2007 at 07:47:00PM -0700, Andrew Morton wrote: > >> Do you agree with the "irreconcilable" part? I think I do. I am hoping for a reconciliation where the people who don't like AppArmor live with it by not using it. AppArmor is not intended to replace SELinux, it is intended to address a different set of goals. >> I suspect that we're at the stage of having to decide between >> >> a) set aside the technical issues and grudgingly merge this stuff as a >>service to Suse and to their users (both of which entities are very >>important to us) and leave it all as an object lesson in >>how-not-to-develop-kernel-features. >> >>Minimisation of the impact on the rest of the kernel is of course >>very important here. >> >> versus >> >> b) leave it out and require that Suse wear the permanent cost and >>quality impact of maintaining it out-of-tree. It will still be an >>object lesson in how-not-to-develop-kernel-features. >> ... > versus > > c) if [1] AppArmor is considered to be something that wouldn't >be merged if it wasn't already widely deployed by Suse: leave it out, >work on an ideal solution [2], and let Suse wear the one-time cost >of migrating their users to the ideal solution > We argue that the proposed patch is a viable solution for providing AppArmor functionality. We would be happy for specific suggestions on how to make it better. > I'm not claiming to understand the technical details, but from both > slightly reading over the previous discussions and the "What are the > advantages of AppArmor over SELinux?" section in the AppArmor FAQ [3] my > impression is that a main advantage of AppArmor are more user friendly > userspace tools. Therefore, if [1] AppArmor is considered technically > inferior to SELinux, it might still become more popular than SELinux > simply because it's easier to use - and although it's technically > inferior. AppArmor's advantages come from the model, not the tools. AppArmor is not inferior to SELinux, it is different than SELinux. Neither can replace the other without horrid kludges. Crispin -- Crispin Cowan, Ph.D. http://crispincowan.com/~crispin/ Director of Software Engineering http://novell.com AppArmor Chat: irc.oftc.net/#apparmor - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFD 0/4] AppArmor - Don't pass NULL nameidata to vfs_create/lookup/permission IOPs
On Wednesday 27 June 2007 01:46, Trond Myklebust wrote: > On Tue, 2007-06-26 at 16:15 -0700, [EMAIL PROTECTED] wrote: > > To remove conditionally passing of vfsmounts to the LSM, a nameidata > > struct can be instantiated in the nfsd and mqueue filesystems. This > > however results in useless information being passed down, as not > > all fields in the nameidata struct will be meaingful. The nameidata > > struct is split creating struct nameidata2 that contains only the > > fields > > that will carry meaningful information. > > I don't object to the concept per se, but could you please give it a > more descriptive name please? "struct vfs_intent" would be a lot more > accurate than "nameidata2". Yes, the name is pretty arbitrary. vfs_intent is better, even though the struct doesn't only include the intent data. Thanks, Andreas - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] fsblock
On 27 Jun 2007, at 12:50, Chris Mason wrote: On Wed, Jun 27, 2007 at 07:32:45AM +0200, Nick Piggin wrote: On Tue, Jun 26, 2007 at 08:34:49AM -0400, Chris Mason wrote: On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote: On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote: [ ... fsblocks vs extent range mapping ] iomaps can double as range locks simply because iomaps are expressions of ranges within the file. Seeing as you can only access a given range exclusively to modify it, inserting an empty mapping into the tree as a range lock gives an effective method of allowing safe parallel reads, writes and allocation into the file. The fsblocks and the vm page cache interface cannot be used to facilitate this because a radix tree is the wrong type of tree to store this information in. A sparse, range based tree (e.g. btree) is the right way to do this and it matches very well with a range based API. I'm really not against the extent based page cache idea, but I kind of assumed it would be too big a change for this kind of generic setup. At any rate, if we'd like to do it, it may be best to ditch the idea of "attach mapping information to a page", and switch to "lookup mapping information and range locking for a page". Well the get_block equivalent API is extent based one now, and I'll look at what is required in making map_fsblock a more generic call that could be used for an extent-based scheme. An extent based thing IMO really isn't appropriate as the main generic layer here though. If it is really useful and popular, then it could be turned into generic code and sit along side fsblock or underneath fsblock... Lets look at a typical example of how IO actually gets done today, starting with sys_write(): Yes, this is very inefficient which is one of the reasons I don't use the generic file write helpers in NTFS. The other reasons are that supporting larger logical block sizes than PAGE_CACHE_SIZE becomes a pain if it is not done this way when the write targets a hole as that requires all pages in the hole to be locked simultaneously which would mean dropping the page lock to acquire the others that are of lower page index and to then re-take the page lock which is horrible - much better to lock all at once from the outset and the other reason is that in NTFS there is such a thing as the initialized size of an attribute which basically states "anything past this byte offset must be returned as 0 on read, i.e. it does not have to be read from disk at all, and on write beyond the initialized_size you have to zero on disk everything between the old initialized size and the start of the write before you begin writing and certainly before you update the initalized_size otherwise a concurrent read would see random old data from the disk. For NTFS this effectively becomes: sys_write(file, buffer, 1MB) allocate space for the entire 1MB write if write offset past the initialized_size zero out on disk starting at initialized_size up to the start offset for the write and update the initialized size to be equal to the start offset of the write do { if (current position is in a hole and the NTFS logical block size is > PAGE_CACHE_SIZE) { work on (NTFS logical block size / PAGE_CACHE_SIZE) pages in one go; do_pages = vol->cluster_size / PAGE_CACHE_SIZE; } else { work on only one page; do_pages = 1; } fault in for read (do_pages*PAGE_CACHE_SIZE) bytes worth of source pages grab do_pages worth of pages prepare_write - attach buffers to grabbed pages copy data from source to grabbed&prepared pages commit_write the copied pages by dirtying their buffers } while (data left to write); The allocation in advance is a huge win both in terms of avoiding fragmentation (NTFS still uses a very simple/stupid allocator so you get a lot of fragmentation if two processes write to different files simultaneously and do so in small chunks) and in terms of performance. I have wondered whether I should perhaps turn on the "multi page" stuff on for all writes rather than just for ones that go into a hole and the logical size is greater than the PAGE_CACHE_SIZE as that might improve performance even further but I haven't had the time/ inclination to experiment... And I have also wondered whether to go direct to bio/wholes pages at once instead of bothering with dirtying each buffer but the buffers (which are always 512 bytes on NTFS) allow me to easily support dirtying smaller parts of the page which is desired at least on volumes with a logical block size < PAGE_CACHE_SIZE as different bits of the page could then reside on completely different locations on disk so writing out unneeded bits of the page could result in a lot of wasted disk head seek times. Best regards, Anton for each page:
Re: [AppArmor 00/44] AppArmor security module overview
On Tue, Jun 26, 2007 at 07:47:00PM -0700, Andrew Morton wrote: > On Tue, 26 Jun 2007 19:24:03 -0700 John Johansen <[EMAIL PROTECTED]> wrote: > > > > > > > so... where do we stand with this? Fundamental, irreconcilable > > > differences over the use of pathname-based security? > > > > > There certainly seems to be some differences of opinion over the use > > of pathname-based-security. > > I was refreshed to have not been cc'ed on a lkml thread for once. I guess > it couldn't last. > > Do you agree with the "irreconcilable" part? I think I do. > > I suspect that we're at the stage of having to decide between > > a) set aside the technical issues and grudgingly merge this stuff as a >service to Suse and to their users (both of which entities are very >important to us) and leave it all as an object lesson in >how-not-to-develop-kernel-features. > >Minimisation of the impact on the rest of the kernel is of course >very important here. > > versus > > b) leave it out and require that Suse wear the permanent cost and >quality impact of maintaining it out-of-tree. It will still be an >object lesson in how-not-to-develop-kernel-features. >... versus c) if [1] AppArmor is considered to be something that wouldn't be merged if it wasn't already widely deployed by Suse: leave it out, work on an ideal solution [2], and let Suse wear the one-time cost of migrating their users to the ideal solution One important point is that if AppArmor gets merged there will be much more distribution support for it, and many people on !Suse will start using it. I'm not claiming to understand the technical details, but from both slightly reading over the previous discussions and the "What are the advantages of AppArmor over SELinux?" section in the AppArmor FAQ [3] my impression is that a main advantage of AppArmor are more user friendly userspace tools. Therefore, if [1] AppArmor is considered technically inferior to SELinux, it might still become more popular than SELinux simply because it's easier to use - and although it's technically inferior. cu Adrian [1] note the "if" [2] could be, but not necessarily, SELinux [3] http://developer.novell.com/wiki/index.php/Apparmor_FAQ -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AppArmor 00/44] AppArmor security module overview
On Wednesday 27 June 2007 12:58, Kyle Moffett wrote: > I seem to recall you could actually end up racing and building a path > to the file in those directories as "a/d/0/3" or some other path at > which it never even remotely existed. I'd love to be wrong, Cheer up, you recall wrong. > but I can't help but see this problem in any reverse-pathname-generation > proposal which gets the locking right. Have a look at how __d_path() is implemented (with the fixes): It takes the dcache_lock, and the vfsmount_lock where necessary, and this ensures that the pathname can't change under it, neither because of a rename nor unlink nor remount. The pathname computed is *exactly* the name the file has at that specific point time. A few more details about how pathnames work are explained in the tech doc at: http://forge.novell.com/modules/xfcontent/downloads.php/apparmor/LKML_Submission-May_07 Andreas - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/7][TAKE5] support new modes in fallocate
On Tue, Jun 26, 2007 at 11:49:15PM -0400, Andreas Dilger wrote: > On Jun 27, 2007 09:14 +1000, David Chinner wrote: > > Someone on the XFs list had an interesting request - preallocated > > swap files. You can't use unwritten extents for this because > > of sys_swapon()s use of bmap() (XFS returns holes for reading > > unwritten extents), so we need a method of preallocating that does > > not zero or mark the extent unread. i.e. FA_MKSWAP. > > Is there a reason why unwritten extents return 0 to bmap()? It's a fallout of xfs_get_blocks not mapping unwritten extents on read because we want do_mpage_readpage() to treat them as a hole. i.e. zero fill them instead of doing I/O. This is the way XFS was shoehorned into the generic read path :/ > This > would seem to be the only impediment from using fallocated files > for swap files. Maybe if FIEMAP was used by mkswap to get an > "UNWRITTEN" flag back instead of "HOLE" it wouldn't be a problem. Probably. If we taught do_mpage_readpage() about unwritten mappings, then would could map them on read if and then sys_swapon can remain blissfully unaware of unwritten extents. I think this is pretty much all I need to do to acheive that is (untested): --- Teach do_mpage_readpage() about unwritten extents so we can always map them in get_blocks rather than they are are holes on read. Allows setup_swap_extents() to use preallocated files on XFS filesystems for swap files without ever needing to convert them. Signed-Off-By: Dave Chinner <[EMAIL PROTECTED]> --- fs/mpage.c |5 +++-- fs/xfs/linux-2.6/xfs_aops.c | 13 +++-- 2 files changed, 6 insertions(+), 12 deletions(-) Index: 2.6.x-xfs-new/fs/mpage.c === --- 2.6.x-xfs-new.orig/fs/mpage.c 2007-05-29 16:17:59.0 +1000 +++ 2.6.x-xfs-new/fs/mpage.c2007-06-27 22:39:35.568852270 +1000 @@ -207,7 +207,8 @@ do_mpage_readpage(struct bio *bio, struc * Map blocks using the result from the previous get_blocks call first. */ nblocks = map_bh->b_size >> blkbits; - if (buffer_mapped(map_bh) && block_in_file > *first_logical_block && + if (buffer_mapped(map_bh) && !buffer_unwritten(map_bh) && + block_in_file > *first_logical_block && block_in_file < (*first_logical_block + nblocks)) { unsigned map_offset = block_in_file - *first_logical_block; unsigned last = nblocks - map_offset; @@ -242,7 +243,7 @@ do_mpage_readpage(struct bio *bio, struc *first_logical_block = block_in_file; } - if (!buffer_mapped(map_bh)) { + if (!buffer_mapped(map_bh) || buffer_unwritten(map_bh)) { fully_mapped = 0; if (first_hole == blocks_per_page) first_hole = page_block; Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_aops.c === --- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_aops.c 2007-06-05 22:14:39.0 +1000 +++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_aops.c 2007-06-27 22:39:29.545636749 +1000 @@ -1340,16 +1340,9 @@ __xfs_get_blocks( return 0; if (iomap.iomap_bn != IOMAP_DADDR_NULL) { - /* -* For unwritten extents do not report a disk address on -* the read case (treat as if we're reading into a hole). -*/ - if (create || !(iomap.iomap_flags & IOMAP_UNWRITTEN)) { - xfs_map_buffer(bh_result, &iomap, offset, - inode->i_blkbits); - } - if (create && (iomap.iomap_flags & IOMAP_UNWRITTEN)) { - if (direct) + xfs_map_buffer(bh_result, &iomap, offset, inode->i_blkbits); + if (iomap.iomap_flags & IOMAP_UNWRITTEN) { + if (create && direct) bh_result->b_private = inode; set_buffer_unwritten(bh_result); } Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] fsblock
On Jun 26, 2007, at 07:14:14, Nick Piggin wrote: On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote: Can we call it a block mapping layer or something like that? e.g. struct blkmap? I'm not fixed on fsblock, but blkmap doesn't grab me either. It is a map from the pagecache to the block layer, but blkmap sounds like it is a map from the block to somewhere. fsblkmap ;) vmblock? pgblock? Cheers, Kyle Moffett - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] fsblock
On Wed, Jun 27, 2007 at 07:32:45AM +0200, Nick Piggin wrote: > On Tue, Jun 26, 2007 at 08:34:49AM -0400, Chris Mason wrote: > > On Tue, Jun 26, 2007 at 07:23:09PM +1000, David Chinner wrote: > > > On Tue, Jun 26, 2007 at 01:55:11PM +1000, Nick Piggin wrote: > > > > [ ... fsblocks vs extent range mapping ] > > > > > iomaps can double as range locks simply because iomaps are > > > expressions of ranges within the file. Seeing as you can only > > > access a given range exclusively to modify it, inserting an empty > > > mapping into the tree as a range lock gives an effective method of > > > allowing safe parallel reads, writes and allocation into the file. > > > > > > The fsblocks and the vm page cache interface cannot be used to > > > facilitate this because a radix tree is the wrong type of tree to > > > store this information in. A sparse, range based tree (e.g. btree) > > > is the right way to do this and it matches very well with > > > a range based API. > > > > I'm really not against the extent based page cache idea, but I kind of > > assumed it would be too big a change for this kind of generic setup. At > > any rate, if we'd like to do it, it may be best to ditch the idea of > > "attach mapping information to a page", and switch to "lookup mapping > > information and range locking for a page". > > Well the get_block equivalent API is extent based one now, and I'll > look at what is required in making map_fsblock a more generic call > that could be used for an extent-based scheme. > > An extent based thing IMO really isn't appropriate as the main generic > layer here though. If it is really useful and popular, then it could > be turned into generic code and sit along side fsblock or underneath > fsblock... Lets look at a typical example of how IO actually gets done today, starting with sys_write(): sys_write(file, buffer, 1MB) for each page: prepare_write() allocate contiguous chunks of disk attach buffers copy_from_user() commit_write() dirty buffers pdflush: writepages() find pages with contiguous chunks of disk build and submit large bios So, we replace prepare_write and commit_write with an extent based api, but we keep the dirty each buffer part. writepages has to turn that back into extents (bio sized), and the result is completely full of dark dark corner cases. I do think fsblocks is a nice cleanup on its own, but Dave has a good point that it makes sense to look for ways generalize things even more. -chris - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [AppArmor 00/44] AppArmor security module overview
On Jun 26, 2007, at 22:24:03, John Johansen wrote: other issues that have been raised are: - the use of d_path to generate the pathname used for mediation when a file is opened. - Generating the pathname using a reverse walk is considered ugly A little more than "ugly". In this basic concurrent rename() and path-lookup load: mkdir -p /a/b/0 mkdir -p /a/b/2 mkdir -p /c touch /a/b/0/1 cd /a/b while true; mv 0/1 2/3; mv 2/3 0/1; done & cd / while true; do mv a/b c/d; mv c/d a/b; done & while true; do cat a/b/0/1 & done while true; do cat a/b/2/3 & done while true; do cat c/d/0/1 & done while true; do cat c/d/2/3 & done I seem to recall you could actually end up racing and building a path to the file in those directories as "a/d/0/3" or some other path at which it never even remotely existed. I'd love to be wrong, but I can't help but see this problem in any reverse-pathname-generation proposal which gets the locking right. Cheers, Kyle Moffett - To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html