Re: [RFC] fsblock

2007-06-28 Thread Nick Piggin
On Thu, Jun 28, 2007 at 08:20:31AM -0400, Chris Mason wrote:
> On Thu, Jun 28, 2007 at 04:44:43AM +0200, Nick Piggin wrote:
> > 
> > That's true but I don't think an extent data structure means we can
> > become too far divorced from the pagecache or the native block size
> > -- what will end up happening is that often we'll need "stuff" to map
> > between all those as well, even if it is only at IO-time.
> 
> I think the fundamental difference is that fsblock still does:
> mapping_info = page->something, where something is attached on a per
> page basis.  What we really want is mapping_info = lookup_mapping(page),
> where that function goes and finds something stored on a per extent
> basis, with extra bits for tracking dirty and locked state.
> 
> Ideally, in at least some of the cases the dirty and locked state could
> be at an extent granularity (streaming IO) instead of the block
> granularity (random IO).
> 
> In my little brain, even block based filesystems should be able to take
> advantage of this...but such things are always easier to believe in
> before the coding starts.

Now I wouldn't for a minute deny that at least some of the block
information would be well to store in extent/tree format (if XFS 
does it it must be good!).

And yes, I'm sure filesystems with even basic block based allocation
could get a reasonable ratio of blocks to extents.

However I think it is fundamentally another layer or at least
more complexity... fsblocks uses the existing pagecache mapping as
(much of) the data structure and uses the existing pagecache locking
for the locking. And it fundamentally just provides a block access
and IO layer into the pagecache for the filesystem, which I think will
often be needed anyway.

But that said, I would like to see a generic extent mapping layer
sitting between fsblock and the filesystem (I might even have a crack
at it myself)... and I could be proven completely wrong and it may be
that fsblock isn't required at all after such a layer goes in. So I
will try to keep all the APIs extent based.

The first thing I actually looked at for "get_blocks" was for the
filesystem to build up a tree of mappings itself, completely unconnected
from the pagecache. It just ended up being a little more work and
locking but the idea isn't insane :)


> > One issue I have with the current nobh and mpage stuff is that it
> > requires multiple calls into get_block (first to prepare write, then
> > to writepage), it doesn't allow filesystems to attach resources
> > required for writeout at prepare_write time, and it doesn't play nicely
> > with buffers in general. (not to mention that nobh error handling is
> > buggy).
> > 
> > I haven't done any mpage-like code for fsblocks yet, but I think they
> > wouldn't be too much trouble, and wouldn't have any of the above
> > problems...
> 
> Could be, but the fundamental issue of sometimes pages have mappings
> attached and sometimes they don't is still there.  The window is
> smaller, but non-zero.

The aim for fsblocks is that any page under IO will always have fsblocks,
which I hope is going to make this easy. In the fsblocks patch I sent out
there is a window (with mmapped pages), however that's a bug wich can be
fixed rather than a fundamental problem. So writepages will be less problem.

Readpages may indeed be more efficient and block mapping with extents than
individual fsblocks (or it could be, if it were an extent based API itself).

Well I don't know. Extents are always going to have benefits, but I don't
know if it means the fsblock part could go away completely. I'll keep
it in mind though.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fsblock

2007-06-28 Thread David Chinner
On Thu, Jun 28, 2007 at 08:20:31AM -0400, Chris Mason wrote:
> On Thu, Jun 28, 2007 at 04:44:43AM +0200, Nick Piggin wrote:
> > That's true but I don't think an extent data structure means we can
> > become too far divorced from the pagecache or the native block size
> > -- what will end up happening is that often we'll need "stuff" to map
> > between all those as well, even if it is only at IO-time.
> 
> I think the fundamental difference is that fsblock still does:
> mapping_info = page->something, where something is attached on a per
> page basis.  What we really want is mapping_info = lookup_mapping(page),

lookup_block_mapping(page) ;)

But yes, that is the essence of what I was saying. Thanks for
describing it so concisely, Chris.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-06-28 Thread David Chinner
On Thu, Jun 28, 2007 at 11:49:13PM +0530, Amit K. Arora wrote:
> On Wed, Jun 27, 2007 at 09:18:04AM +1000, David Chinner wrote:
> > On Tue, Jun 26, 2007 at 11:34:13AM -0400, Andreas Dilger wrote:
> > > On Jun 26, 2007  16:02 +0530, Amit K. Arora wrote:
> > > > On Mon, Jun 25, 2007 at 03:46:26PM -0600, Andreas Dilger wrote:
> > > > > Can you clarify - what is the current behaviour when ENOSPC (or some 
> > > > > other
> > > > > error) is hit?  Does it keep the current fallocate() or does it free 
> > > > > it?
> > > > 
> > > > Currently it is left on the file system implementation. In ext4, we do
> > > > not undo preallocation if some error (say, ENOSPC) is hit. Hence it may
> > > > end up with partial (pre)allocation. This is inline with dd and
> > > > posix_fallocate, which also do not free the partially allocated space.
> > > 
> > > Since I believe the XFS allocation ioctls do it the opposite way (free
> > > preallocated space on error) this should be encoded into the flags.
> > > Having it "filesystem dependent" just means that nobody will be happy.
> > 
> > No, XFs does not free preallocated space on error. it is up to the
> > application to clean up.
> 
> Since XFS also does not free preallocated space on error and this
> behavior is inline with dd, posix_fallocate() and the current ext4
> implementation, do we still need FA_FL_FREE_ENOSPC flag ?

Not at the moment.

> > > What I mean is that any data read from the file should have the 
> > > "appearance"
> > > of being zeroed (whether zeroes are actually written to disk or not).  
> > > What
> > > I _think_ David is proposing is to allow fallocate() to return without
> > > marking the blocks even "uninitialized" and subsequent reads would return
> > > the old data from the disk.
> > 
> > Correct, but for swap files that's not an issue - no user should be able
> > too read them, and FA_MKSWAP would really need root privileges to execute.
> 
> Will the FA_MKSWAP mode still be required with your suggested change of
> teaching do_mpage_readpage() about unwritten extents being in place ?
> Or, will you still like to have FA_MKSWAP mode ?

budgie:/mnt/test # xfs_io -f -c "resvsp 0 1048576" -c "truncate 1048576" 
swap_file
budgie:/mnt/test # mkswap swap_file
Setting up swapspace version 1, size = 1032 kB
budgie:/mnt/test # swapon -v swap_file
swapon on swap_file
budgie:/mnt/test # swapon -s
FilenameTypeSizeUsedPriority
/dev/sda2   partition   9437152 0   -1
/mnt/test/swap_file file992 0   -2
budgie:/mnt/test # xfs_bmap -vvp swap_file
swap_file:
 EXT: FILE-OFFSET  BLOCK-RANGE  AG AG-OFFSETTOTAL FLAGS
   0: [0..31]: 96..127   0 (96..127)   32
   1: [32..2047]:  128..2143 0 (128..2143)   2016 1
 FLAG Values:
01 Unwritten preallocated extent
001000 Doesn't begin on stripe unit
000100 Doesn't end   on stripe unit
10 Doesn't begin on stripe width
01 Doesn't end   on stripe width

Looks like the changes work, so FA_MKSWAP is not necessary for XFS.
We can drop that for the moment unless anyone else sees a need for it.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-06-28 Thread Nathan Scott
On Thu, 2007-06-28 at 23:49 +0530, Amit K. Arora wrote:
> 
> > Correct, but for swap files that's not an issue - no user should be
> able
> > too read them, and FA_MKSWAP would really need root privileges to
> execute.
> 
> Will the FA_MKSWAP mode still be required with your suggested change
> of
> teaching do_mpage_readpage() about unwritten extents being in place ?
> Or, will you still like to have FA_MKSWAP mode ? 

There's no need for a MKSWAP flag.

cheers.

--
Nathan

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6][TAKE5] fallocate system call

2007-06-28 Thread Andreas Dilger
On Jun 28, 2007  23:27 +0530, Amit K. Arora wrote:
> On Thu, Jun 28, 2007 at 02:55:43AM -0700, Andrew Morton wrote:
> > Are we all supposed to re-review the entire patchset (or at least #4 and
> > #7) again?
> 
> As I mentioned in the note above, only patches #4 and #7 were new and
> thus these needed to be reviewed. Other patches are _not_ replacements
> of any of the patches which are already part of -mm and/or in Ted's
> patch queue. They were posted again as just "placeholders" so that the
> two new patches (#4 & #7) could be reviewed. Sorry for any confusion.

The new patches are definitely a big improvement over the previous API,
and need to go in before fallocate() goes into mainline.  This last set
of changes allows the behaviour of these syscalls to accomodate the various
different semantics desired by XFS in a sensible manner instead of tying
all of the individual behaviours (time update, size update, alloc/free, etc)
into monolithic modes that will never make everyone happy.

My understanding is that you only need to grab #4 and #7 to get your tree
into get fallocate in sync with the ext4 patch queue (i.e. they are
incremental over the previous set).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6][TAKE5] fallocate system call

2007-06-28 Thread Jeff Garzik

Andrew Morton wrote:

b) We do what we normally don't do and reserve the syscall slots in mainline.


If everyone agrees it's going to happen... why not?

Jeff


-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6][TAKE5] fallocate system call

2007-06-28 Thread Dave Kleikamp
On Thu, 2007-06-28 at 11:33 -0700, Andrew Morton wrote:
> On Thu, 28 Jun 2007 23:27:57 +0530 "Amit K. Arora" <[EMAIL PROTECTED]> wrote:
> 
> > > Please drop the non-ext4 patches from the ext4 tree and send incremental
> > > patches against the (non-ext4) fallocate patches in -mm.
> > 
> > Please let us know what you think of Mingming's suggestion of posting
> > all the fallocate patches including the ext4 ones as incremental ones
> > against the -mm.
> 
> I think Mingming was asking that Ted move the current quilt tree into git,
> presumably because she's working off git.

I moved the fallocate patches to the very end of the series in the quilt
tree.  This way the patches will be in the quilt tree for testing, but
Ted can easily leave them out of the git tree so you and Linus won't
pull them with the ext4 patches.

Fortunately, the ext4-specific fallocate patches don't conflict with the
other patches in the queue, so they can (at least for now) be handled
independently in the -mm tree.

> I'm not sure what to do, really.  The core kernel patches need to be in
> Ted's tree for testing but that'll create a mess for me.
> 
> ug.
> 
> Options might be
> 
> a) I drop the fallocate patches from -mm and from the ext4 tree, hack up
>any needed build fixes, then just wait for it all to mature and then
>think about it again
> 
> b) We do what we normally don't do and reserve the syscall slots in mainline.

-- 
David Kleikamp
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6][TAKE5] fallocate system call

2007-06-28 Thread Andrew Morton
On Thu, 28 Jun 2007 23:27:57 +0530 "Amit K. Arora" <[EMAIL PROTECTED]> wrote:

> > Please drop the non-ext4 patches from the ext4 tree and send incremental
> > patches against the (non-ext4) fallocate patches in -mm.
> 
> Please let us know what you think of Mingming's suggestion of posting
> all the fallocate patches including the ext4 ones as incremental ones
> against the -mm.

I think Mingming was asking that Ted move the current quilt tree into git,
presumably because she's working off git.

I'm not sure what to do, really.  The core kernel patches need to be in
Ted's tree for testing but that'll create a mess for me.

ug.

Options might be

a) I drop the fallocate patches from -mm and from the ext4 tree, hack up
   any needed build fixes, then just wait for it all to mature and then
   think about it again

b) We do what we normally don't do and reserve the syscall slots in mainline.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/7][TAKE5] support new modes in fallocate

2007-06-28 Thread Amit K. Arora
On Wed, Jun 27, 2007 at 09:18:04AM +1000, David Chinner wrote:
> On Tue, Jun 26, 2007 at 11:34:13AM -0400, Andreas Dilger wrote:
> > On Jun 26, 2007  16:02 +0530, Amit K. Arora wrote:
> > > On Mon, Jun 25, 2007 at 03:46:26PM -0600, Andreas Dilger wrote:
> > > > Can you clarify - what is the current behaviour when ENOSPC (or some 
> > > > other
> > > > error) is hit?  Does it keep the current fallocate() or does it free it?
> > > 
> > > Currently it is left on the file system implementation. In ext4, we do
> > > not undo preallocation if some error (say, ENOSPC) is hit. Hence it may
> > > end up with partial (pre)allocation. This is inline with dd and
> > > posix_fallocate, which also do not free the partially allocated space.
> > 
> > Since I believe the XFS allocation ioctls do it the opposite way (free
> > preallocated space on error) this should be encoded into the flags.
> > Having it "filesystem dependent" just means that nobody will be happy.
> 
> No, XFs does not free preallocated space on error. it is up to the
> application to clean up.

Since XFS also does not free preallocated space on error and this
behavior is inline with dd, posix_fallocate() and the current ext4
implementation, do we still need FA_FL_FREE_ENOSPC flag ?
 
> > What I mean is that any data read from the file should have the "appearance"
> > of being zeroed (whether zeroes are actually written to disk or not).  What
> > I _think_ David is proposing is to allow fallocate() to return without
> > marking the blocks even "uninitialized" and subsequent reads would return
> > the old data from the disk.
> 
> Correct, but for swap files that's not an issue - no user should be able
> too read them, and FA_MKSWAP would really need root privileges to execute.

Will the FA_MKSWAP mode still be required with your suggested change of
teaching do_mpage_readpage() about unwritten extents being in place ?
Or, will you still like to have FA_MKSWAP mode ?

--
Regards,
Amit Arora
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7][TAKE5] ext4: support new modes

2007-06-28 Thread Amit K. Arora
On Wed, Jun 27, 2007 at 10:04:56AM +1000, David Chinner wrote:
> On Wed, Jun 27, 2007 at 12:59:08AM +0530, Amit K. Arora wrote:
> > On Tue, Jun 26, 2007 at 12:14:00PM -0400, Andreas Dilger wrote:
> > > On Jun 26, 2007  17:37 +0530, Amit K. Arora wrote:
> > > > I think, modifying ctime/mtime should be dependent on the other flags.
> > > > E.g., if we do not zero out data blocks on allocation/deallocation,
> > > > update only ctime. Otherwise, update ctime and mtime both.
> > > 
> > > I'm only being the advocate for requirements David Chinner has put
> > > forward due to existing behaviour in XFS.  This is one of the reasons
> > > why I think the "flags" mechanism we now have - we can encode the
> > > various different behaviours in any way we want and leave it to the
> > > caller.
> > 
> > I understand. May be we can confirm once more with David Chinner if this
> > is really required. Will it really be a compatibility issue if new XFS
> > preallocations (ie. via fallocate) update mtime/ctime?
> 
> It should be left up to the filesystem to decide. Only the
> filesystem knows whether something changed and the timestamp should
> or should not be updated.

Since Andreas had suggested FA_FL_NO_MTIME flag thinking it as a
requirement from XFS (whereas XFS does not need this flag), I don't think
we need to add this new flag.

Please let know if someone still feels FA_FL_NO_MTIME flag can be
useful.

--
Regards,
Amit Arora

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6][TAKE5] fallocate system call

2007-06-28 Thread Amit K. Arora
On Thu, Jun 28, 2007 at 02:55:43AM -0700, Andrew Morton wrote:
> On Mon, 25 Jun 2007 18:58:10 +0530 "Amit K. Arora" <[EMAIL PROTECTED]> wrote:
> 
> > N O T E: 
> > ---
> > 1) Only Patches 4/7 and 7/7 are NEW. Rest of them are _already_ part
> >of ext4 patch queue git tree hosted by Ted.
> 
> Why the heck are replacements for these things being sent out again when
> they're already in -mm and they're already in Ted's queue (from which I
> need to diligently drop them each time I remerge)?
> 
> Are we all supposed to re-review the entire patchset (or at least #4 and
> #7) again?

As I mentioned in the note above, only patches #4 and #7 were new and
thus these needed to be reviewed. Other patches are _not_ replacements
of any of the patches which are already part of -mm and/or in Ted's
patch queue. They were posted again as just "placeholders" so that the
two new patches (#4 & #7) could be reviewed. Sorry for any confusion.
 
> Please drop the non-ext4 patches from the ext4 tree and send incremental
> patches against the (non-ext4) fallocate patches in -mm.

Please let us know what you think of Mingming's suggestion of posting
all the fallocate patches including the ext4 ones as incremental ones
against the -mm.

--
Regards,
Amit Arora
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 32/44] Enable LSM hooks to distinguish operations on file descriptors from operations on pathnames

2007-06-28 Thread James Morris
On Tue, 26 Jun 2007, [EMAIL PROTECTED] wrote:

> Struct iattr already contains ia_file since commit cc4e69de from 
> Miklos (which is related to commit befc649c). Use this to pass
> struct file down the setattr hooks. This allows LSMs to distinguish
> operations on file descriptors from operations on paths.

I'm not quite sure I understand this.

Why would you distinguish operations based on whether you have a pathname 
or an inode for an object ?

Are you trying to cater for the case where you're holding an open fd for a 
file which has been deleted, and thus has no pathname?




- James
-- 
James Morris
<[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Adding subroot information to /proc/mounts, or obtaining that through other means

2007-06-28 Thread H. Peter Anvin

Pavel Machek wrote:

Hi!


... or, alternatively, add a subfield to the first field (which would
entail escaping whatever separator we choose):

/dev/md6 /export ext3 rw,data=ordered 0 0
/dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
/dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0

Hell, no.  The first field is in principle impossible to parse unless
you know the fs type.

How about making a new file with sane format?  From the very


Well, what about /sysfs, with its one value per file rule?



There are two reasons not to do it that way:

- atomicity
- backwards compatibility

Of these, I would argue the former is the most important.

Additionally, I don't think sysfs has the ability to present different 
structures on a per-process basis; keep in mind this isn't really 
/proc/mounts, but really /proc//mounts.


-hpa
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Adding subroot information to /proc/mounts, or obtaining that through other means

2007-06-28 Thread Pavel Machek
Hi!

> > ... or, alternatively, add a subfield to the first field (which would
> > entail escaping whatever separator we choose):
> > 
> > /dev/md6 /export ext3 rw,data=ordered 0 0
> > /dev/md6:/users/foo /home/foo ext3 rw,data=ordered 0 0
> > /dev/md6:/users/bar /home/bar ext3 rw,data=ordered 0 0
> 
> Hell, no.  The first field is in principle impossible to parse unless
> you know the fs type.
> 
> How about making a new file with sane format?  From the very

Well, what about /sysfs, with its one value per file rule?

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6][TAKE5] fallocate system call

2007-06-28 Thread Mingming Cao
On Thu, 2007-06-28 at 02:55 -0700, Andrew Morton wrote:

> Please drop the non-ext4 patches from the ext4 tree and send incremental
> patches against the (non-ext4) fallocate patches in -mm.
> 
The ext4 fallocate() patches are dependent on the core fallocate()
patches, so ext4 patch-queue and git tree won't compile (it's not based
on mm tree) without the core changes.

We can send ext4 fallocate patches (incremental patches against mm tree)
and drop the full fallocate patches(ext4 and non ext4 part) from ext4
patch queue if you prefer this way.

> And try to get the code finished?  Time is pressing.
> 
I looked at the mm tree, there are other ext4 features/changes that are
currently in ext4-patch-queue(not ext4 git tree) that not in part of
ext4 series yet. Ted, can you merge those patches to your git tree?
Thanks!


Thanks for your patience.

Mingming.

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC:PATCH] How best to handle implicit clearing of setuid/setgid bits on NFS?

2007-06-28 Thread Trond Myklebust
On Wed, 2007-06-27 at 22:13 -0400, Jeff Layton wrote:
> Ok. This is a bit more complex now since we remove suid bits on
> truncate, but don't set ATTR_FORCE.
> 
> Here's a patch that should do this. I know there's a general
> aversion to adding new flags to vfs structures, but I couldn't think of
> a way to cleanly do this without adding one.
> 
> Note that I've not tested this patch at all so this is just a RFC.
> 
> CC'ing Al here since he's expressed interest in this problem as well.
> 
> Thoughts?

We don't really need to do this with extra VFS flags. Here is my
preferred approach for dealing with this problem.

http://article.gmane.org/gmane.linux.nfs/8511/match=attr%5fkill%5fsuid

As you can see, that still allows the filesystem to determine how it
should deal with the ATTR_KILL_SUID/ATTR_KILL_SGID flags. The default
behaviour is provided by inode_setattr(), and is the same as today. Only
filesystems that don't use inode_setattr() will need to be audited for
whether or not they need a fix.

Cheers,
  Trond

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] fsblock

2007-06-28 Thread Chris Mason
On Thu, Jun 28, 2007 at 04:44:43AM +0200, Nick Piggin wrote:
> On Thu, Jun 28, 2007 at 08:35:48AM +1000, David Chinner wrote:
> > On Wed, Jun 27, 2007 at 07:50:56AM -0400, Chris Mason wrote:
> > > Lets look at a typical example of how IO actually gets done today,
> > > starting with sys_write():
> > > 
> > > sys_write(file, buffer, 1MB)
> > > for each page:
> > > prepare_write()
> > >   allocate contiguous chunks of disk
> > > attach buffers
> > > copy_from_user()
> > > commit_write()
> > > dirty buffers
> > > 
> > > pdflush:
> > > writepages()
> > > find pages with contiguous chunks of disk
> > >   build and submit large bios
> > > 
> > > So, we replace prepare_write and commit_write with an extent based api,
> > > but we keep the dirty each buffer part.  writepages has to turn that
> > > back into extents (bio sized), and the result is completely full of dark
> > > dark corner cases.
> 
> That's true but I don't think an extent data structure means we can
> become too far divorced from the pagecache or the native block size
> -- what will end up happening is that often we'll need "stuff" to map
> between all those as well, even if it is only at IO-time.

I think the fundamental difference is that fsblock still does:
mapping_info = page->something, where something is attached on a per
page basis.  What we really want is mapping_info = lookup_mapping(page),
where that function goes and finds something stored on a per extent
basis, with extra bits for tracking dirty and locked state.

Ideally, in at least some of the cases the dirty and locked state could
be at an extent granularity (streaming IO) instead of the block
granularity (random IO).

In my little brain, even block based filesystems should be able to take
advantage of this...but such things are always easier to believe in
before the coding starts.

> 
> But the point is taken, and I do believe that at least for APIs, extent
> based seems like the best way to go. And that should allow fsblock to
> be replaced or augmented in future without _too_ much pain.
> 
>  
> > Yup - I've been on the painful end of those dark corner cases several
> > times in the last few months.
> > 
> > It's also worth pointing out that mpage_readpages() already works on
> > an extent basis - it overloads bufferheads to provide a "map_bh" that
> > can point to a range of blocks in the same state. The code then iterates
> > the map_bh range a page at a time building bios (i.e. not even using
> > buffer heads) from that map..
> 
> One issue I have with the current nobh and mpage stuff is that it
> requires multiple calls into get_block (first to prepare write, then
> to writepage), it doesn't allow filesystems to attach resources
> required for writeout at prepare_write time, and it doesn't play nicely
> with buffers in general. (not to mention that nobh error handling is
> buggy).
> 
> I haven't done any mpage-like code for fsblocks yet, but I think they
> wouldn't be too much trouble, and wouldn't have any of the above
> problems...

Could be, but the fundamental issue of sometimes pages have mappings
attached and sometimes they don't is still there.  The window is
smaller, but non-zero.

-chris

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [AppArmor 00/44] AppArmor security module overview

2007-06-28 Thread Alan Cox
> > Anyone can apply the apparmour patch to their tree, they get the
> > choice that way.  Nobody is currently prevented from using apparmour
> > if they want to, any such suggestion is pure rubbish.
> 
> The exact same argument was made prior to SELinux going upstream.

Its made for every thing before it goes upstream. It shouldn't be going
uptream until it works, is reliable and does something useful. Then if it
ever makes that grade it can go and sit in -mm for a bit to shake down .

> > Frankly I think AppArmour is a joke,
> 
> "SELinux, AppArmor, and Hilary Clinton walk into a bar ..."


SELinux orders a beer object
AppArmor order a /beer
Hilary says "You are both under 21 you can't"
SELinux orders a shandy object
AppArmor orders a /shandy

SELinux is refused because the shandy mixer opened a beer object and
shandy inherited beer typing
AppArmor gets drunk because /shandy and /beer are clearly different


-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/6][TAKE5] fallocate system call

2007-06-28 Thread Andrew Morton
On Mon, 25 Jun 2007 18:58:10 +0530 "Amit K. Arora" <[EMAIL PROTECTED]> wrote:

> N O T E: 
> ---
> 1) Only Patches 4/7 and 7/7 are NEW. Rest of them are _already_ part
>of ext4 patch queue git tree hosted by Ted.

Why the heck are replacements for these things being sent out again when
they're already in -mm and they're already in Ted's queue (from which I
need to diligently drop them each time I remerge)?

Are we all supposed to re-review the entire patchset (or at least #4 and
#7) again?

The core kernel changes are not appropriate to the ext4 tree.

For a start, the syscall numbers in Ted's queue are wrong (other new
syscalls are pending).

Patches which add syscalls are an utter PITA to carry due to all the patch
conflicts and to the relatively frequent syscall renumbering (they don't
get numbered in time-of-arrival order due to differing rates at which patches
mature).

Please drop the non-ext4 patches from the ext4 tree and send incremental
patches against the (non-ext4) fallocate patches in -mm.

And try to get the code finished?  Time is pressing.

Thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html