Re: [PATCH] ext3: htree entry integrity checking

2006-11-16 Thread Jeff Mahoney
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Andreas Dilger wrote:
> On Nov 16, 2006  11:50 -0500, Jeff Mahoney wrote:
>>  Currently, if a corrupted directory entry with rec_len=0 is encountered,
>>  we still trust that the data is valid. This can cause an infinite loop
>>  in htree_dirblock_to_tree() since the iteration loop will never make any
>>  progress.
> 
> Actually, I think Eric Sandeen was working on similar fixes already, and
> instead of doing a per-item check each time we look at the entry it does
> a full-block check the first time it is read (as ext2 does).
> 
>>  This fixes the problem described at:
>>  http://projects.info-pull.com/mokb/MOKB-10-11-2006.html
> 
> Would also be good to CC linux-ext4, where the ext3 maintainers live.

Ok, thanks. If that's already in -mm, I'll use that one.

- -Jeff

- --
Jeff Mahoney
SUSE Labs
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFFXQIwLPWxlyuTD7IRApH7AJ9+/SFmd9bf8E741wvxw/6vdrUrdwCeJNEG
eHZMo5RWUrLW5iDEqehjRlI=
=lGRM
-END PGP SIGNATURE-
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 15/23] ext3 -nobh option causes oops

2006-11-16 Thread Badari Pulavarty

On Thu, 2006-11-16 at 23:51 +0100, Adrian Bunk wrote:
> It seems this patch that went into 2.6.17.8 should also be included in 
> the 2.6.16.x branch, or do I miss anything?
> 

Yes. This is needed for 2.6.16.x branch also.

Thanks,
Badari


> On Thu, Aug 03, 2006 at 10:40:03PM -0700, Greg KH wrote:
> > -stable review patch.  If anyone has any objections, please let us know.
> > 
> > --
> > From: Badari Pulavarty <[EMAIL PROTECTED]>
> > 
> > For files other than IFREG, nobh option doesn't make sense.  Modifications
> > to them are journalled and needs buffer heads to do that.  Without this
> > patch, we get kernel oops in page_buffers().
> > 
> > Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> > Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
> > 
> > ---
> >  fs/ext3/inode.c |6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > --- linux-2.6.17.7.orig/fs/ext3/inode.c
> > +++ linux-2.6.17.7/fs/ext3/inode.c
> > @@ -1159,7 +1159,7 @@ retry:
> > ret = PTR_ERR(handle);
> > goto out;
> > }
> > -   if (test_opt(inode->i_sb, NOBH))
> > +   if (test_opt(inode->i_sb, NOBH) && ext3_should_writeback_data(inode))
> > ret = nobh_prepare_write(page, from, to, ext3_get_block);
> > else
> > ret = block_prepare_write(page, from, to, ext3_get_block);
> > @@ -1245,7 +1245,7 @@ static int ext3_writeback_commit_write(s
> > if (new_i_size > EXT3_I(inode)->i_disksize)
> > EXT3_I(inode)->i_disksize = new_i_size;
> >  
> > -   if (test_opt(inode->i_sb, NOBH))
> > +   if (test_opt(inode->i_sb, NOBH) && ext3_should_writeback_data(inode))
> > ret = nobh_commit_write(file, page, from, to);
> > else
> > ret = generic_commit_write(file, page, from, to);
> > @@ -1495,7 +1495,7 @@ static int ext3_writeback_writepage(stru
> > goto out_fail;
> > }
> >  
> > -   if (test_opt(inode->i_sb, NOBH))
> > +   if (test_opt(inode->i_sb, NOBH) && ext3_should_writeback_data(inode))
> > ret = nobh_writepage(page, ext3_get_block, wbc);
> > else
> > ret = block_write_full_page(page, ext3_get_block, wbc);
> > 

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 15/23] ext3 -nobh option causes oops

2006-11-16 Thread Adrian Bunk
It seems this patch that went into 2.6.17.8 should also be included in 
the 2.6.16.x branch, or do I miss anything?

TIA
Adrian


On Thu, Aug 03, 2006 at 10:40:03PM -0700, Greg KH wrote:
> -stable review patch.  If anyone has any objections, please let us know.
> 
> --
> From: Badari Pulavarty <[EMAIL PROTECTED]>
> 
> For files other than IFREG, nobh option doesn't make sense.  Modifications
> to them are journalled and needs buffer heads to do that.  Without this
> patch, we get kernel oops in page_buffers().
> 
> Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]>
> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
> Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>
> 
> ---
>  fs/ext3/inode.c |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> --- linux-2.6.17.7.orig/fs/ext3/inode.c
> +++ linux-2.6.17.7/fs/ext3/inode.c
> @@ -1159,7 +1159,7 @@ retry:
>   ret = PTR_ERR(handle);
>   goto out;
>   }
> - if (test_opt(inode->i_sb, NOBH))
> + if (test_opt(inode->i_sb, NOBH) && ext3_should_writeback_data(inode))
>   ret = nobh_prepare_write(page, from, to, ext3_get_block);
>   else
>   ret = block_prepare_write(page, from, to, ext3_get_block);
> @@ -1245,7 +1245,7 @@ static int ext3_writeback_commit_write(s
>   if (new_i_size > EXT3_I(inode)->i_disksize)
>   EXT3_I(inode)->i_disksize = new_i_size;
>  
> - if (test_opt(inode->i_sb, NOBH))
> + if (test_opt(inode->i_sb, NOBH) && ext3_should_writeback_data(inode))
>   ret = nobh_commit_write(file, page, from, to);
>   else
>   ret = generic_commit_write(file, page, from, to);
> @@ -1495,7 +1495,7 @@ static int ext3_writeback_writepage(stru
>   goto out_fail;
>   }
>  
> - if (test_opt(inode->i_sb, NOBH))
> + if (test_opt(inode->i_sb, NOBH) && ext3_should_writeback_data(inode))
>   ret = nobh_writepage(page, ext3_get_block, wbc);
>   else
>   ret = block_write_full_page(page, ext3_get_block, wbc);
> 
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext3: htree entry integrity checking

2006-11-16 Thread Eric Sandeen
Andreas Dilger wrote:
> On Nov 16, 2006  11:50 -0500, Jeff Mahoney wrote:
>>  Currently, if a corrupted directory entry with rec_len=0 is encountered,
>>  we still trust that the data is valid. This can cause an infinite loop
>>  in htree_dirblock_to_tree() since the iteration loop will never make any
>>  progress.
> 
> Actually, I think Eric Sandeen was working on similar fixes already, and
> instead of doing a per-item check each time we look at the entry it does
> a full-block check the first time it is read (as ext2 does).
> 
>>  This fixes the problem described at:
>>  http://projects.info-pull.com/mokb/MOKB-10-11-2006.html
> 
> Would also be good to CC linux-ext4, where the ext3 maintainers live.
> Hmm, maybe we need to update MAINTAINERS with the new list address?

This should already be fixed, in some fashion, in -mm:

http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc5/2.6.19-rc5-mm2/broken-out/handle-ext3-directory-corruption-better.patch

I have been looking at doing a check only when the block is first read,
but other things have come up & taken some time, and that is a bit on
the back burner now...

-Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ext3: htree entry integrity checking

2006-11-16 Thread Andreas Dilger
On Nov 16, 2006  11:50 -0500, Jeff Mahoney wrote:
>  Currently, if a corrupted directory entry with rec_len=0 is encountered,
>  we still trust that the data is valid. This can cause an infinite loop
>  in htree_dirblock_to_tree() since the iteration loop will never make any
>  progress.

Actually, I think Eric Sandeen was working on similar fixes already, and
instead of doing a per-item check each time we look at the entry it does
a full-block check the first time it is read (as ext2 does).

>  This fixes the problem described at:
>  http://projects.info-pull.com/mokb/MOKB-10-11-2006.html

Would also be good to CC linux-ext4, where the ext3 maintainers live.
Hmm, maybe we need to update MAINTAINERS with the new list address?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Boot failure with ext2 and initrds

2006-11-16 Thread Andrew Morton
On Thu, 16 Nov 2006 12:15:16 -0800
Mingming Cao <[EMAIL PROTECTED]> wrote:

> On Thu, 2006-11-16 at 01:13 -0800, Andrew Morton wrote:
> > On Thu, 16 Nov 2006 00:49:20 -0800
> > Mingming Cao <[EMAIL PROTECTED]> wrote:
> > 
> > > On Wed, 2006-11-15 at 23:22 -0800, Andrew Morton wrote:
> > > > On Wed, 15 Nov 2006 22:55:43 -0800
> > > > Mingming Cao <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > Hmm, maxblocks, in bitmap_search_next_usable_block(),  is the end 
> > > > > block 
> > > > > number of the range  to search, not the lengh of the range. maxblocks 
> > > > > get passed to ext2_find_next_zero_bit(), where it expecting to take 
> > > > > the 
> > > > > _size_ of the range to search instead...
> > > > > 
> > > > > Something like this: (this is not a patch)
> > > > >   @@ -524,7 +524,7 @@ bitmap_search_next_usable_block(ext2_grp
> > > > >   ext2_grpblk_t next;
> > > > > 
> > > > >-  next = ext2_find_next_zero_bit(bh->b_data, maxblocks, 
> > > > > start);
> > > > >+  next = ext2_find_next_zero_bit(bh->b_data, 
> > > > > maxblocks-start + 1, start);
> > > > >   if (next >= maxblocks)
> > > > >   return -1;
> > > > >   return next;
> > > > >}
> > > > 
> > > > yes, the `size' arg to find_next_zero_bit() represents the number of 
> > > > bits
> > > > to scan at `offset'.
> > > > 
> > > > So I think your change is correctish.  But we don't want the "+ 1", do 
> > > > we?
> > > > 
> > > I think we still need the "+1", maxblocks here is the ending block of
> > > the reservation window, so the number of bits to scan =end-start+1.
> > > 
> > > > If we're right then this bug could cause the code to scan off the end 
> > > > of the
> > > > bitmap.  But it won't explain Hugh's bug, because of the if (next >= 
> > > > maxblocks).
> > > > 
> > > 
> > > Yeah.. at first I thought it might be related, then, thinked it over,
> > > the bug only makes the bits to scan larger, so if find_next_zero_bit()
> > > returns something off the end of bitmap, that is fine, it just
> > > indicating that there is no free bit left in the rest of bitmap, which
> > > is expected behavior. So bitmap_search_next_usable_block() fail is the
> > > expected. It will move on to next block group and try to create a new
> > > reservation window there.
> > 
> > I wonder why it's never oopsed.  Perhaps there's always a zero in there for
> > some reason.
> > 
> 
> Why you think it should oopsed?  Even if find_next_zero_bit() finds a
> zero bit beyond of the end of bitmap, the check next > maxblocks will
> catch this and make sure we are not taking a zero bit out of the bitmap
> range, so it fails as expected.

If it can read off the end of the buffer, it can oops.  With
CONFIG_DEBUG_PAGEALLOC, especially.


> > > That does not explain the repeated reservation window add and remove
> > > behavior Huge has reported. 
> > 
> > I spent quite some time comparing with ext3.  I'm a bit stumped and I'm
> > suspecting that the simplistic porting the code is now OK, but something's
> > just wrong.
> > 
> > I assume that the while (1) loop in ext3_try_to_allocate_with_rsv() has
> > gone infinite.  I don't see why, but more staring is needed.
> > 
> 
> The loop should not go forever, it will stops when there is no window
> with free bit to reserve in the given block group.

It seems to have done so in Hugh's testing, but there's some question there
now.  Although I didn't check to see if there's a significant difference
between Hugh's patch and mine.


> > What lock protects the fields in struct ext[234]_reserve_window from being
> > concurrently modified by two CPUs?  None, it seems.  Ditto
> > ext[234]_reserve_window_node.  i_mutex will cover it for write(), but not
> > for pageout over a file hole.  If we end up with a zero- or negative-sized
> > window then odd things might happen.
> > 
> 
> Yes, trucate_mutex protect both struct ext[234]_reserve_window and ext
> [234]_reserve_window_node, and struct ext[234]_block_alloc_info.
> Actually I think truncate_mutex protects all data structures related to
> block allocation/mapping structures.

Yes.  I guess ext2 needs a new mutex for this.  Sad.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Boot failure with ext2 and initrds

2006-11-16 Thread Mingming Cao
On Thu, 2006-11-16 at 01:13 -0800, Andrew Morton wrote:
> On Thu, 16 Nov 2006 00:49:20 -0800
> Mingming Cao <[EMAIL PROTECTED]> wrote:
> 
> > On Wed, 2006-11-15 at 23:22 -0800, Andrew Morton wrote:
> > > On Wed, 15 Nov 2006 22:55:43 -0800
> > > Mingming Cao <[EMAIL PROTECTED]> wrote:
> > > 
> > > > Hmm, maxblocks, in bitmap_search_next_usable_block(),  is the end block 
> > > > number of the range  to search, not the lengh of the range. maxblocks 
> > > > get passed to ext2_find_next_zero_bit(), where it expecting to take the 
> > > > _size_ of the range to search instead...
> > > > 
> > > > Something like this: (this is not a patch)
> > > >   @@ -524,7 +524,7 @@ bitmap_search_next_usable_block(ext2_grp
> > > > ext2_grpblk_t next;
> > > > 
> > > >-next = ext2_find_next_zero_bit(bh->b_data, maxblocks, start);
> > > >+next = ext2_find_next_zero_bit(bh->b_data, maxblocks-start + 1, 
> > > > start);
> > > > if (next >= maxblocks)
> > > > return -1;
> > > > return next;
> > > >}
> > > 
> > > yes, the `size' arg to find_next_zero_bit() represents the number of bits
> > > to scan at `offset'.
> > > 
> > > So I think your change is correctish.  But we don't want the "+ 1", do we?
> > > 
> > I think we still need the "+1", maxblocks here is the ending block of
> > the reservation window, so the number of bits to scan =end-start+1.
> > 
> > > If we're right then this bug could cause the code to scan off the end of 
> > > the
> > > bitmap.  But it won't explain Hugh's bug, because of the if (next >= 
> > > maxblocks).
> > > 
> > 
> > Yeah.. at first I thought it might be related, then, thinked it over,
> > the bug only makes the bits to scan larger, so if find_next_zero_bit()
> > returns something off the end of bitmap, that is fine, it just
> > indicating that there is no free bit left in the rest of bitmap, which
> > is expected behavior. So bitmap_search_next_usable_block() fail is the
> > expected. It will move on to next block group and try to create a new
> > reservation window there.
> 
> I wonder why it's never oopsed.  Perhaps there's always a zero in there for
> some reason.
> 

Why you think it should oopsed?  Even if find_next_zero_bit() finds a
zero bit beyond of the end of bitmap, the check next > maxblocks will
catch this and make sure we are not taking a zero bit out of the bitmap
range, so it fails as expected.

> > That does not explain the repeated reservation window add and remove
> > behavior Huge has reported. 
> 
> I spent quite some time comparing with ext3.  I'm a bit stumped and I'm
> suspecting that the simplistic porting the code is now OK, but something's
> just wrong.
> 
> I assume that the while (1) loop in ext3_try_to_allocate_with_rsv() has
> gone infinite.  I don't see why, but more staring is needed.
> 

The loop should not go forever, it will stops when there is no window
with free bit to reserve in the given block group.

> What lock protects the fields in struct ext[234]_reserve_window from being
> concurrently modified by two CPUs?  None, it seems.  Ditto
> ext[234]_reserve_window_node.  i_mutex will cover it for write(), but not
> for pageout over a file hole.  If we end up with a zero- or negative-sized
> window then odd things might happen.
> 

Yes, trucate_mutex protect both struct ext[234]_reserve_window and ext
[234]_reserve_window_node, and struct ext[234]_block_alloc_info.
Actually I think truncate_mutex protects all data structures related to
block allocation/mapping structures.



Mingming

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Boot failure with ext2 and initrds

2006-11-16 Thread Hugh Dickins
On Thu, 16 Nov 2006, Andrew Morton wrote:
> On Thu, 16 Nov 2006 00:49:20 -0800
> Mingming Cao <[EMAIL PROTECTED]> wrote:
> 
> > That does not explain the repeated reservation window add and remove
> > behavior Huge has reported. 
> 
> I spent quite some time comparing with ext3.  I'm a bit stumped and I'm
> suspecting that the simplistic porting the code is now OK, but something's
> just wrong.
> 
> I assume that the while (1) loop in ext3_try_to_allocate_with_rsv() has
> gone infinite.  I don't see why, but more staring is needed.

Just to report that similar tests on three machines have each run
for 20 hours so far without any such infinite loop reoccurring.

Well, I broke off the x86_64 for a couple of hours: wondered if I'd got
confused, forgotten to "rmmod ext2" at one stage, and saw that behaviour
with my simple "ext2fs_blk_t ret_block" patch, rather than your more
extensive patch to ext2_new_blocks() that I'd believed I was testing.
I didn't give it long enough to be conclusive, but the problem didn't
show up with that either, so I've gone back to testing with yours.

I'd have kept the hang around for longer if I'd guessed it would be
hard to reproduce, and that it would be puzzling even to you: sorry.
kdb is in, if it comes again I can peer at it more closely.

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Boot failure with ext2 and initrds

2006-11-16 Thread Russell King
On Wed, Nov 15, 2006 at 11:22:28PM -0800, Andrew Morton wrote:
> On Wed, 15 Nov 2006 22:55:43 -0800
> Mingming Cao <[EMAIL PROTECTED]> wrote:
> 
> > Hmm, maxblocks, in bitmap_search_next_usable_block(),  is the end block 
> > number of the range  to search, not the lengh of the range. maxblocks 
> > get passed to ext2_find_next_zero_bit(), where it expecting to take the 
> > _size_ of the range to search instead...
> > 
> > Something like this: (this is not a patch)
> >   @@ -524,7 +524,7 @@ bitmap_search_next_usable_block(ext2_grp
> > ext2_grpblk_t next;
> > 
> >-next = ext2_find_next_zero_bit(bh->b_data, maxblocks, start);
> >+next = ext2_find_next_zero_bit(bh->b_data, maxblocks-start + 1, 
> > start);
> > if (next >= maxblocks)
> > return -1;
> > return next;
> >}
> 
> yes, the `size' arg to find_next_zero_bit() represents the number of bits
> to scan at `offset'.

Are you sure?  That's not the way it's implemented in many architectures.
find_next_*_bit() has always taken "address, maximum offset, starting offset"
and always has returned "next offset".

Just look at arch/i386/lib/bitops.c:

int find_next_zero_bit(const unsigned long *addr, int size, int offset)
{
unsigned long * p = ((unsigned long *) addr) + (offset >> 5);
int set = 0, bit = offset & 31, res;
...
/*
 * No zero yet, search remaining full bytes for a zero
 */
res = find_first_zero_bit (p, size - 32 * (p - (unsigned long *) addr));
return (offset + set + res);
}

So for the case that "offset" is aligned to a "long" boundary, that gives us:

res = find_first_zero_bit(addr + (offset>>5),
size - 32 * (addr + (offset>>5) - addr));

or:

res = find_first_zero_bit(addr + (offset>>5), size - (offset & ~31));

So, size _excludes_ offset.

Now, considering the return value, "res" above will be relative to
"addr + (offset>>5)".  However, we add "offset" on to that, so it's
relative to addr + (offset bits).

-- 
Russell King
 Linux kernel2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Boot failure with ext2 and initrds

2006-11-16 Thread Andrew Morton
On Thu, 16 Nov 2006 01:48:09 -0800
Andrew Morton <[EMAIL PROTECTED]> wrote:

> On Thu, 16 Nov 2006 12:37:17 +0300
> Alex Tomas <[EMAIL PROTECTED]> wrote:
> 
> > > Andrew Morton (AM) writes:
> > 
> >  AM> What lock protects the fields in struct ext[234]_reserve_window from 
> > being
> >  AM> concurrently modified by two CPUs?  None, it seems.  Ditto
> >  AM> ext[234]_reserve_window_node.  i_mutex will cover it for write(), but 
> > not
> >  AM> for pageout over a file hole.  If we end up with a zero- or 
> > negative-sized
> >  AM> window then odd things might happen.
> > 
> > truncate_mutex?
> > 
> 
> yes.  hmm.

by which I mean "ext2 doesn't have a truncate_mutex".
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Boot failure with ext2 and initrds

2006-11-16 Thread Andrew Morton
On Thu, 16 Nov 2006 12:37:17 +0300
Alex Tomas <[EMAIL PROTECTED]> wrote:

> > Andrew Morton (AM) writes:
> 
>  AM> What lock protects the fields in struct ext[234]_reserve_window from 
> being
>  AM> concurrently modified by two CPUs?  None, it seems.  Ditto
>  AM> ext[234]_reserve_window_node.  i_mutex will cover it for write(), but not
>  AM> for pageout over a file hole.  If we end up with a zero- or 
> negative-sized
>  AM> window then odd things might happen.
> 
> truncate_mutex?
> 

yes.  hmm.
-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Boot failure with ext2 and initrds

2006-11-16 Thread Alex Tomas
> Andrew Morton (AM) writes:

 AM> What lock protects the fields in struct ext[234]_reserve_window from being
 AM> concurrently modified by two CPUs?  None, it seems.  Ditto
 AM> ext[234]_reserve_window_node.  i_mutex will cover it for write(), but not
 AM> for pageout over a file hole.  If we end up with a zero- or negative-sized
 AM> window then odd things might happen.

truncate_mutex?

thanks, Alex

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Boot failure with ext2 and initrds

2006-11-16 Thread Andrew Morton
On Thu, 16 Nov 2006 00:49:20 -0800
Mingming Cao <[EMAIL PROTECTED]> wrote:

> On Wed, 2006-11-15 at 23:22 -0800, Andrew Morton wrote:
> > On Wed, 15 Nov 2006 22:55:43 -0800
> > Mingming Cao <[EMAIL PROTECTED]> wrote:
> > 
> > > Hmm, maxblocks, in bitmap_search_next_usable_block(),  is the end block 
> > > number of the range  to search, not the lengh of the range. maxblocks 
> > > get passed to ext2_find_next_zero_bit(), where it expecting to take the 
> > > _size_ of the range to search instead...
> > > 
> > > Something like this: (this is not a patch)
> > >   @@ -524,7 +524,7 @@ bitmap_search_next_usable_block(ext2_grp
> > >   ext2_grpblk_t next;
> > > 
> > >-  next = ext2_find_next_zero_bit(bh->b_data, maxblocks, start);
> > >+  next = ext2_find_next_zero_bit(bh->b_data, maxblocks-start + 1, 
> > > start);
> > >   if (next >= maxblocks)
> > >   return -1;
> > >   return next;
> > >}
> > 
> > yes, the `size' arg to find_next_zero_bit() represents the number of bits
> > to scan at `offset'.
> > 
> > So I think your change is correctish.  But we don't want the "+ 1", do we?
> > 
> I think we still need the "+1", maxblocks here is the ending block of
> the reservation window, so the number of bits to scan =end-start+1.
> 
> > If we're right then this bug could cause the code to scan off the end of the
> > bitmap.  But it won't explain Hugh's bug, because of the if (next >= 
> > maxblocks).
> > 
> 
> Yeah.. at first I thought it might be related, then, thinked it over,
> the bug only makes the bits to scan larger, so if find_next_zero_bit()
> returns something off the end of bitmap, that is fine, it just
> indicating that there is no free bit left in the rest of bitmap, which
> is expected behavior. So bitmap_search_next_usable_block() fail is the
> expected. It will move on to next block group and try to create a new
> reservation window there.

I wonder why it's never oopsed.  Perhaps there's always a zero in there for
some reason.

> That does not explain the repeated reservation window add and remove
> behavior Huge has reported. 

I spent quite some time comparing with ext3.  I'm a bit stumped and I'm
suspecting that the simplistic porting the code is now OK, but something's
just wrong.

I assume that the while (1) loop in ext3_try_to_allocate_with_rsv() has
gone infinite.  I don't see why, but more staring is needed.

What lock protects the fields in struct ext[234]_reserve_window from being
concurrently modified by two CPUs?  None, it seems.  Ditto
ext[234]_reserve_window_node.  i_mutex will cover it for write(), but not
for pageout over a file hole.  If we end up with a zero- or negative-sized
window then odd things might happen.

> > btw, how come try_to_extend_reservation() uses spin_trylock?
> 
> Since locks are all allocated from reservation window, when ext3
> multiple blocks allocation was added, we added try_to_extend_reservation
> () to ext3, which trying to extend the reservation window size to at
> least match the number of blocks to allocate. So we have better chance
> to allocating multiple blocks from the window at a time.
> 
> Since all the multiple block allocation is based on best effort basis,
> the same applied to try_to_extend_reservation(). It seems no need to
> wait for the reservation tree lock if it's not avaible at that moment.
> 

I suspect that was not a good idea - it has added rather different
behaviour in the once-in-a-million case.


-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Boot failure with ext2 and initrds

2006-11-16 Thread Mingming Cao
On Wed, 2006-11-15 at 23:22 -0800, Andrew Morton wrote:
> On Wed, 15 Nov 2006 22:55:43 -0800
> Mingming Cao <[EMAIL PROTECTED]> wrote:
> 
> > Hmm, maxblocks, in bitmap_search_next_usable_block(),  is the end block 
> > number of the range  to search, not the lengh of the range. maxblocks 
> > get passed to ext2_find_next_zero_bit(), where it expecting to take the 
> > _size_ of the range to search instead...
> > 
> > Something like this: (this is not a patch)
> >   @@ -524,7 +524,7 @@ bitmap_search_next_usable_block(ext2_grp
> > ext2_grpblk_t next;
> > 
> >-next = ext2_find_next_zero_bit(bh->b_data, maxblocks, start);
> >+next = ext2_find_next_zero_bit(bh->b_data, maxblocks-start + 1, 
> > start);
> > if (next >= maxblocks)
> > return -1;
> > return next;
> >}
> 
> yes, the `size' arg to find_next_zero_bit() represents the number of bits
> to scan at `offset'.
> 
> So I think your change is correctish.  But we don't want the "+ 1", do we?
> 
I think we still need the "+1", maxblocks here is the ending block of
the reservation window, so the number of bits to scan =end-start+1.

> If we're right then this bug could cause the code to scan off the end of the
> bitmap.  But it won't explain Hugh's bug, because of the if (next >= 
> maxblocks).
> 

Yeah.. at first I thought it might be related, then, thinked it over,
the bug only makes the bits to scan larger, so if find_next_zero_bit()
returns something off the end of bitmap, that is fine, it just
indicating that there is no free bit left in the rest of bitmap, which
is expected behavior. So bitmap_search_next_usable_block() fail is the
expected. It will move on to next block group and try to create a new
reservation window there.

That does not explain the repeated reservation window add and remove
behavior Huge has reported. 

> btw, how come try_to_extend_reservation() uses spin_trylock?

Since locks are all allocated from reservation window, when ext3
multiple blocks allocation was added, we added try_to_extend_reservation
() to ext3, which trying to extend the reservation window size to at
least match the number of blocks to allocate. So we have better chance
to allocating multiple blocks from the window at a time.

Since all the multiple block allocation is based on best effort basis,
the same applied to try_to_extend_reservation(). It seems no need to
wait for the reservation tree lock if it's not avaible at that moment.

Mingming

-
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html