Re: [PATCH] ext3: htree entry integrity checking
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Andreas Dilger wrote: > On Nov 16, 2006 11:50 -0500, Jeff Mahoney wrote: >> Currently, if a corrupted directory entry with rec_len=0 is encountered, >> we still trust that the data is valid. This can cause an infinite loop >> in htree_dirblock_to_tree() since the iteration loop will never make any >> progress. > > Actually, I think Eric Sandeen was working on similar fixes already, and > instead of doing a per-item check each time we look at the entry it does > a full-block check the first time it is read (as ext2 does). > >> This fixes the problem described at: >> http://projects.info-pull.com/mokb/MOKB-10-11-2006.html > > Would also be good to CC linux-ext4, where the ext3 maintainers live. Ok, thanks. If that's already in -mm, I'll use that one. - -Jeff - -- Jeff Mahoney SUSE Labs -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.2 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iD8DBQFFXQIwLPWxlyuTD7IRApH7AJ9+/SFmd9bf8E741wvxw/6vdrUrdwCeJNEG eHZMo5RWUrLW5iDEqehjRlI= =lGRM -END PGP SIGNATURE- - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 15/23] ext3 -nobh option causes oops
On Thu, 2006-11-16 at 23:51 +0100, Adrian Bunk wrote: > It seems this patch that went into 2.6.17.8 should also be included in > the 2.6.16.x branch, or do I miss anything? > Yes. This is needed for 2.6.16.x branch also. Thanks, Badari > On Thu, Aug 03, 2006 at 10:40:03PM -0700, Greg KH wrote: > > -stable review patch. If anyone has any objections, please let us know. > > > > -- > > From: Badari Pulavarty <[EMAIL PROTECTED]> > > > > For files other than IFREG, nobh option doesn't make sense. Modifications > > to them are journalled and needs buffer heads to do that. Without this > > patch, we get kernel oops in page_buffers(). > > > > Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]> > > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> > > Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]> > > > > --- > > fs/ext3/inode.c |6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > --- linux-2.6.17.7.orig/fs/ext3/inode.c > > +++ linux-2.6.17.7/fs/ext3/inode.c > > @@ -1159,7 +1159,7 @@ retry: > > ret = PTR_ERR(handle); > > goto out; > > } > > - if (test_opt(inode->i_sb, NOBH)) > > + if (test_opt(inode->i_sb, NOBH) && ext3_should_writeback_data(inode)) > > ret = nobh_prepare_write(page, from, to, ext3_get_block); > > else > > ret = block_prepare_write(page, from, to, ext3_get_block); > > @@ -1245,7 +1245,7 @@ static int ext3_writeback_commit_write(s > > if (new_i_size > EXT3_I(inode)->i_disksize) > > EXT3_I(inode)->i_disksize = new_i_size; > > > > - if (test_opt(inode->i_sb, NOBH)) > > + if (test_opt(inode->i_sb, NOBH) && ext3_should_writeback_data(inode)) > > ret = nobh_commit_write(file, page, from, to); > > else > > ret = generic_commit_write(file, page, from, to); > > @@ -1495,7 +1495,7 @@ static int ext3_writeback_writepage(stru > > goto out_fail; > > } > > > > - if (test_opt(inode->i_sb, NOBH)) > > + if (test_opt(inode->i_sb, NOBH) && ext3_should_writeback_data(inode)) > > ret = nobh_writepage(page, ext3_get_block, wbc); > > else > > ret = block_write_full_page(page, ext3_get_block, wbc); > > - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 15/23] ext3 -nobh option causes oops
It seems this patch that went into 2.6.17.8 should also be included in the 2.6.16.x branch, or do I miss anything? TIA Adrian On Thu, Aug 03, 2006 at 10:40:03PM -0700, Greg KH wrote: > -stable review patch. If anyone has any objections, please let us know. > > -- > From: Badari Pulavarty <[EMAIL PROTECTED]> > > For files other than IFREG, nobh option doesn't make sense. Modifications > to them are journalled and needs buffer heads to do that. Without this > patch, we get kernel oops in page_buffers(). > > Signed-off-by: Badari Pulavarty <[EMAIL PROTECTED]> > Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> > Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]> > > --- > fs/ext3/inode.c |6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > --- linux-2.6.17.7.orig/fs/ext3/inode.c > +++ linux-2.6.17.7/fs/ext3/inode.c > @@ -1159,7 +1159,7 @@ retry: > ret = PTR_ERR(handle); > goto out; > } > - if (test_opt(inode->i_sb, NOBH)) > + if (test_opt(inode->i_sb, NOBH) && ext3_should_writeback_data(inode)) > ret = nobh_prepare_write(page, from, to, ext3_get_block); > else > ret = block_prepare_write(page, from, to, ext3_get_block); > @@ -1245,7 +1245,7 @@ static int ext3_writeback_commit_write(s > if (new_i_size > EXT3_I(inode)->i_disksize) > EXT3_I(inode)->i_disksize = new_i_size; > > - if (test_opt(inode->i_sb, NOBH)) > + if (test_opt(inode->i_sb, NOBH) && ext3_should_writeback_data(inode)) > ret = nobh_commit_write(file, page, from, to); > else > ret = generic_commit_write(file, page, from, to); > @@ -1495,7 +1495,7 @@ static int ext3_writeback_writepage(stru > goto out_fail; > } > > - if (test_opt(inode->i_sb, NOBH)) > + if (test_opt(inode->i_sb, NOBH) && ext3_should_writeback_data(inode)) > ret = nobh_writepage(page, ext3_get_block, wbc); > else > ret = block_write_full_page(page, ext3_get_block, wbc); > - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ext3: htree entry integrity checking
Andreas Dilger wrote: > On Nov 16, 2006 11:50 -0500, Jeff Mahoney wrote: >> Currently, if a corrupted directory entry with rec_len=0 is encountered, >> we still trust that the data is valid. This can cause an infinite loop >> in htree_dirblock_to_tree() since the iteration loop will never make any >> progress. > > Actually, I think Eric Sandeen was working on similar fixes already, and > instead of doing a per-item check each time we look at the entry it does > a full-block check the first time it is read (as ext2 does). > >> This fixes the problem described at: >> http://projects.info-pull.com/mokb/MOKB-10-11-2006.html > > Would also be good to CC linux-ext4, where the ext3 maintainers live. > Hmm, maybe we need to update MAINTAINERS with the new list address? This should already be fixed, in some fashion, in -mm: http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.19-rc5/2.6.19-rc5-mm2/broken-out/handle-ext3-directory-corruption-better.patch I have been looking at doing a check only when the block is first read, but other things have come up & taken some time, and that is a bit on the back burner now... -Eric - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ext3: htree entry integrity checking
On Nov 16, 2006 11:50 -0500, Jeff Mahoney wrote: > Currently, if a corrupted directory entry with rec_len=0 is encountered, > we still trust that the data is valid. This can cause an infinite loop > in htree_dirblock_to_tree() since the iteration loop will never make any > progress. Actually, I think Eric Sandeen was working on similar fixes already, and instead of doing a per-item check each time we look at the entry it does a full-block check the first time it is read (as ext2 does). > This fixes the problem described at: > http://projects.info-pull.com/mokb/MOKB-10-11-2006.html Would also be good to CC linux-ext4, where the ext3 maintainers live. Hmm, maybe we need to update MAINTAINERS with the new list address? Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Boot failure with ext2 and initrds
On Thu, 16 Nov 2006 12:15:16 -0800 Mingming Cao <[EMAIL PROTECTED]> wrote: > On Thu, 2006-11-16 at 01:13 -0800, Andrew Morton wrote: > > On Thu, 16 Nov 2006 00:49:20 -0800 > > Mingming Cao <[EMAIL PROTECTED]> wrote: > > > > > On Wed, 2006-11-15 at 23:22 -0800, Andrew Morton wrote: > > > > On Wed, 15 Nov 2006 22:55:43 -0800 > > > > Mingming Cao <[EMAIL PROTECTED]> wrote: > > > > > > > > > Hmm, maxblocks, in bitmap_search_next_usable_block(), is the end > > > > > block > > > > > number of the range to search, not the lengh of the range. maxblocks > > > > > get passed to ext2_find_next_zero_bit(), where it expecting to take > > > > > the > > > > > _size_ of the range to search instead... > > > > > > > > > > Something like this: (this is not a patch) > > > > > @@ -524,7 +524,7 @@ bitmap_search_next_usable_block(ext2_grp > > > > > ext2_grpblk_t next; > > > > > > > > > >- next = ext2_find_next_zero_bit(bh->b_data, maxblocks, > > > > > start); > > > > >+ next = ext2_find_next_zero_bit(bh->b_data, > > > > > maxblocks-start + 1, start); > > > > > if (next >= maxblocks) > > > > > return -1; > > > > > return next; > > > > >} > > > > > > > > yes, the `size' arg to find_next_zero_bit() represents the number of > > > > bits > > > > to scan at `offset'. > > > > > > > > So I think your change is correctish. But we don't want the "+ 1", do > > > > we? > > > > > > > I think we still need the "+1", maxblocks here is the ending block of > > > the reservation window, so the number of bits to scan =end-start+1. > > > > > > > If we're right then this bug could cause the code to scan off the end > > > > of the > > > > bitmap. But it won't explain Hugh's bug, because of the if (next >= > > > > maxblocks). > > > > > > > > > > Yeah.. at first I thought it might be related, then, thinked it over, > > > the bug only makes the bits to scan larger, so if find_next_zero_bit() > > > returns something off the end of bitmap, that is fine, it just > > > indicating that there is no free bit left in the rest of bitmap, which > > > is expected behavior. So bitmap_search_next_usable_block() fail is the > > > expected. It will move on to next block group and try to create a new > > > reservation window there. > > > > I wonder why it's never oopsed. Perhaps there's always a zero in there for > > some reason. > > > > Why you think it should oopsed? Even if find_next_zero_bit() finds a > zero bit beyond of the end of bitmap, the check next > maxblocks will > catch this and make sure we are not taking a zero bit out of the bitmap > range, so it fails as expected. If it can read off the end of the buffer, it can oops. With CONFIG_DEBUG_PAGEALLOC, especially. > > > That does not explain the repeated reservation window add and remove > > > behavior Huge has reported. > > > > I spent quite some time comparing with ext3. I'm a bit stumped and I'm > > suspecting that the simplistic porting the code is now OK, but something's > > just wrong. > > > > I assume that the while (1) loop in ext3_try_to_allocate_with_rsv() has > > gone infinite. I don't see why, but more staring is needed. > > > > The loop should not go forever, it will stops when there is no window > with free bit to reserve in the given block group. It seems to have done so in Hugh's testing, but there's some question there now. Although I didn't check to see if there's a significant difference between Hugh's patch and mine. > > What lock protects the fields in struct ext[234]_reserve_window from being > > concurrently modified by two CPUs? None, it seems. Ditto > > ext[234]_reserve_window_node. i_mutex will cover it for write(), but not > > for pageout over a file hole. If we end up with a zero- or negative-sized > > window then odd things might happen. > > > > Yes, trucate_mutex protect both struct ext[234]_reserve_window and ext > [234]_reserve_window_node, and struct ext[234]_block_alloc_info. > Actually I think truncate_mutex protects all data structures related to > block allocation/mapping structures. Yes. I guess ext2 needs a new mutex for this. Sad. - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Boot failure with ext2 and initrds
On Thu, 2006-11-16 at 01:13 -0800, Andrew Morton wrote: > On Thu, 16 Nov 2006 00:49:20 -0800 > Mingming Cao <[EMAIL PROTECTED]> wrote: > > > On Wed, 2006-11-15 at 23:22 -0800, Andrew Morton wrote: > > > On Wed, 15 Nov 2006 22:55:43 -0800 > > > Mingming Cao <[EMAIL PROTECTED]> wrote: > > > > > > > Hmm, maxblocks, in bitmap_search_next_usable_block(), is the end block > > > > number of the range to search, not the lengh of the range. maxblocks > > > > get passed to ext2_find_next_zero_bit(), where it expecting to take the > > > > _size_ of the range to search instead... > > > > > > > > Something like this: (this is not a patch) > > > > @@ -524,7 +524,7 @@ bitmap_search_next_usable_block(ext2_grp > > > > ext2_grpblk_t next; > > > > > > > >-next = ext2_find_next_zero_bit(bh->b_data, maxblocks, start); > > > >+next = ext2_find_next_zero_bit(bh->b_data, maxblocks-start + 1, > > > > start); > > > > if (next >= maxblocks) > > > > return -1; > > > > return next; > > > >} > > > > > > yes, the `size' arg to find_next_zero_bit() represents the number of bits > > > to scan at `offset'. > > > > > > So I think your change is correctish. But we don't want the "+ 1", do we? > > > > > I think we still need the "+1", maxblocks here is the ending block of > > the reservation window, so the number of bits to scan =end-start+1. > > > > > If we're right then this bug could cause the code to scan off the end of > > > the > > > bitmap. But it won't explain Hugh's bug, because of the if (next >= > > > maxblocks). > > > > > > > Yeah.. at first I thought it might be related, then, thinked it over, > > the bug only makes the bits to scan larger, so if find_next_zero_bit() > > returns something off the end of bitmap, that is fine, it just > > indicating that there is no free bit left in the rest of bitmap, which > > is expected behavior. So bitmap_search_next_usable_block() fail is the > > expected. It will move on to next block group and try to create a new > > reservation window there. > > I wonder why it's never oopsed. Perhaps there's always a zero in there for > some reason. > Why you think it should oopsed? Even if find_next_zero_bit() finds a zero bit beyond of the end of bitmap, the check next > maxblocks will catch this and make sure we are not taking a zero bit out of the bitmap range, so it fails as expected. > > That does not explain the repeated reservation window add and remove > > behavior Huge has reported. > > I spent quite some time comparing with ext3. I'm a bit stumped and I'm > suspecting that the simplistic porting the code is now OK, but something's > just wrong. > > I assume that the while (1) loop in ext3_try_to_allocate_with_rsv() has > gone infinite. I don't see why, but more staring is needed. > The loop should not go forever, it will stops when there is no window with free bit to reserve in the given block group. > What lock protects the fields in struct ext[234]_reserve_window from being > concurrently modified by two CPUs? None, it seems. Ditto > ext[234]_reserve_window_node. i_mutex will cover it for write(), but not > for pageout over a file hole. If we end up with a zero- or negative-sized > window then odd things might happen. > Yes, trucate_mutex protect both struct ext[234]_reserve_window and ext [234]_reserve_window_node, and struct ext[234]_block_alloc_info. Actually I think truncate_mutex protects all data structures related to block allocation/mapping structures. Mingming - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Boot failure with ext2 and initrds
On Thu, 16 Nov 2006, Andrew Morton wrote: > On Thu, 16 Nov 2006 00:49:20 -0800 > Mingming Cao <[EMAIL PROTECTED]> wrote: > > > That does not explain the repeated reservation window add and remove > > behavior Huge has reported. > > I spent quite some time comparing with ext3. I'm a bit stumped and I'm > suspecting that the simplistic porting the code is now OK, but something's > just wrong. > > I assume that the while (1) loop in ext3_try_to_allocate_with_rsv() has > gone infinite. I don't see why, but more staring is needed. Just to report that similar tests on three machines have each run for 20 hours so far without any such infinite loop reoccurring. Well, I broke off the x86_64 for a couple of hours: wondered if I'd got confused, forgotten to "rmmod ext2" at one stage, and saw that behaviour with my simple "ext2fs_blk_t ret_block" patch, rather than your more extensive patch to ext2_new_blocks() that I'd believed I was testing. I didn't give it long enough to be conclusive, but the problem didn't show up with that either, so I've gone back to testing with yours. I'd have kept the hang around for longer if I'd guessed it would be hard to reproduce, and that it would be puzzling even to you: sorry. kdb is in, if it comes again I can peer at it more closely. Hugh - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Boot failure with ext2 and initrds
On Wed, Nov 15, 2006 at 11:22:28PM -0800, Andrew Morton wrote: > On Wed, 15 Nov 2006 22:55:43 -0800 > Mingming Cao <[EMAIL PROTECTED]> wrote: > > > Hmm, maxblocks, in bitmap_search_next_usable_block(), is the end block > > number of the range to search, not the lengh of the range. maxblocks > > get passed to ext2_find_next_zero_bit(), where it expecting to take the > > _size_ of the range to search instead... > > > > Something like this: (this is not a patch) > > @@ -524,7 +524,7 @@ bitmap_search_next_usable_block(ext2_grp > > ext2_grpblk_t next; > > > >-next = ext2_find_next_zero_bit(bh->b_data, maxblocks, start); > >+next = ext2_find_next_zero_bit(bh->b_data, maxblocks-start + 1, > > start); > > if (next >= maxblocks) > > return -1; > > return next; > >} > > yes, the `size' arg to find_next_zero_bit() represents the number of bits > to scan at `offset'. Are you sure? That's not the way it's implemented in many architectures. find_next_*_bit() has always taken "address, maximum offset, starting offset" and always has returned "next offset". Just look at arch/i386/lib/bitops.c: int find_next_zero_bit(const unsigned long *addr, int size, int offset) { unsigned long * p = ((unsigned long *) addr) + (offset >> 5); int set = 0, bit = offset & 31, res; ... /* * No zero yet, search remaining full bytes for a zero */ res = find_first_zero_bit (p, size - 32 * (p - (unsigned long *) addr)); return (offset + set + res); } So for the case that "offset" is aligned to a "long" boundary, that gives us: res = find_first_zero_bit(addr + (offset>>5), size - 32 * (addr + (offset>>5) - addr)); or: res = find_first_zero_bit(addr + (offset>>5), size - (offset & ~31)); So, size _excludes_ offset. Now, considering the return value, "res" above will be relative to "addr + (offset>>5)". However, we add "offset" on to that, so it's relative to addr + (offset bits). -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 Serial core - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Boot failure with ext2 and initrds
On Thu, 16 Nov 2006 01:48:09 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > On Thu, 16 Nov 2006 12:37:17 +0300 > Alex Tomas <[EMAIL PROTECTED]> wrote: > > > > Andrew Morton (AM) writes: > > > > AM> What lock protects the fields in struct ext[234]_reserve_window from > > being > > AM> concurrently modified by two CPUs? None, it seems. Ditto > > AM> ext[234]_reserve_window_node. i_mutex will cover it for write(), but > > not > > AM> for pageout over a file hole. If we end up with a zero- or > > negative-sized > > AM> window then odd things might happen. > > > > truncate_mutex? > > > > yes. hmm. by which I mean "ext2 doesn't have a truncate_mutex". - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Boot failure with ext2 and initrds
On Thu, 16 Nov 2006 12:37:17 +0300 Alex Tomas <[EMAIL PROTECTED]> wrote: > > Andrew Morton (AM) writes: > > AM> What lock protects the fields in struct ext[234]_reserve_window from > being > AM> concurrently modified by two CPUs? None, it seems. Ditto > AM> ext[234]_reserve_window_node. i_mutex will cover it for write(), but not > AM> for pageout over a file hole. If we end up with a zero- or > negative-sized > AM> window then odd things might happen. > > truncate_mutex? > yes. hmm. - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Boot failure with ext2 and initrds
> Andrew Morton (AM) writes: AM> What lock protects the fields in struct ext[234]_reserve_window from being AM> concurrently modified by two CPUs? None, it seems. Ditto AM> ext[234]_reserve_window_node. i_mutex will cover it for write(), but not AM> for pageout over a file hole. If we end up with a zero- or negative-sized AM> window then odd things might happen. truncate_mutex? thanks, Alex - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Boot failure with ext2 and initrds
On Thu, 16 Nov 2006 00:49:20 -0800 Mingming Cao <[EMAIL PROTECTED]> wrote: > On Wed, 2006-11-15 at 23:22 -0800, Andrew Morton wrote: > > On Wed, 15 Nov 2006 22:55:43 -0800 > > Mingming Cao <[EMAIL PROTECTED]> wrote: > > > > > Hmm, maxblocks, in bitmap_search_next_usable_block(), is the end block > > > number of the range to search, not the lengh of the range. maxblocks > > > get passed to ext2_find_next_zero_bit(), where it expecting to take the > > > _size_ of the range to search instead... > > > > > > Something like this: (this is not a patch) > > > @@ -524,7 +524,7 @@ bitmap_search_next_usable_block(ext2_grp > > > ext2_grpblk_t next; > > > > > >- next = ext2_find_next_zero_bit(bh->b_data, maxblocks, start); > > >+ next = ext2_find_next_zero_bit(bh->b_data, maxblocks-start + 1, > > > start); > > > if (next >= maxblocks) > > > return -1; > > > return next; > > >} > > > > yes, the `size' arg to find_next_zero_bit() represents the number of bits > > to scan at `offset'. > > > > So I think your change is correctish. But we don't want the "+ 1", do we? > > > I think we still need the "+1", maxblocks here is the ending block of > the reservation window, so the number of bits to scan =end-start+1. > > > If we're right then this bug could cause the code to scan off the end of the > > bitmap. But it won't explain Hugh's bug, because of the if (next >= > > maxblocks). > > > > Yeah.. at first I thought it might be related, then, thinked it over, > the bug only makes the bits to scan larger, so if find_next_zero_bit() > returns something off the end of bitmap, that is fine, it just > indicating that there is no free bit left in the rest of bitmap, which > is expected behavior. So bitmap_search_next_usable_block() fail is the > expected. It will move on to next block group and try to create a new > reservation window there. I wonder why it's never oopsed. Perhaps there's always a zero in there for some reason. > That does not explain the repeated reservation window add and remove > behavior Huge has reported. I spent quite some time comparing with ext3. I'm a bit stumped and I'm suspecting that the simplistic porting the code is now OK, but something's just wrong. I assume that the while (1) loop in ext3_try_to_allocate_with_rsv() has gone infinite. I don't see why, but more staring is needed. What lock protects the fields in struct ext[234]_reserve_window from being concurrently modified by two CPUs? None, it seems. Ditto ext[234]_reserve_window_node. i_mutex will cover it for write(), but not for pageout over a file hole. If we end up with a zero- or negative-sized window then odd things might happen. > > btw, how come try_to_extend_reservation() uses spin_trylock? > > Since locks are all allocated from reservation window, when ext3 > multiple blocks allocation was added, we added try_to_extend_reservation > () to ext3, which trying to extend the reservation window size to at > least match the number of blocks to allocate. So we have better chance > to allocating multiple blocks from the window at a time. > > Since all the multiple block allocation is based on best effort basis, > the same applied to try_to_extend_reservation(). It seems no need to > wait for the reservation tree lock if it's not avaible at that moment. > I suspect that was not a good idea - it has added rather different behaviour in the once-in-a-million case. - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Boot failure with ext2 and initrds
On Wed, 2006-11-15 at 23:22 -0800, Andrew Morton wrote: > On Wed, 15 Nov 2006 22:55:43 -0800 > Mingming Cao <[EMAIL PROTECTED]> wrote: > > > Hmm, maxblocks, in bitmap_search_next_usable_block(), is the end block > > number of the range to search, not the lengh of the range. maxblocks > > get passed to ext2_find_next_zero_bit(), where it expecting to take the > > _size_ of the range to search instead... > > > > Something like this: (this is not a patch) > > @@ -524,7 +524,7 @@ bitmap_search_next_usable_block(ext2_grp > > ext2_grpblk_t next; > > > >-next = ext2_find_next_zero_bit(bh->b_data, maxblocks, start); > >+next = ext2_find_next_zero_bit(bh->b_data, maxblocks-start + 1, > > start); > > if (next >= maxblocks) > > return -1; > > return next; > >} > > yes, the `size' arg to find_next_zero_bit() represents the number of bits > to scan at `offset'. > > So I think your change is correctish. But we don't want the "+ 1", do we? > I think we still need the "+1", maxblocks here is the ending block of the reservation window, so the number of bits to scan =end-start+1. > If we're right then this bug could cause the code to scan off the end of the > bitmap. But it won't explain Hugh's bug, because of the if (next >= > maxblocks). > Yeah.. at first I thought it might be related, then, thinked it over, the bug only makes the bits to scan larger, so if find_next_zero_bit() returns something off the end of bitmap, that is fine, it just indicating that there is no free bit left in the rest of bitmap, which is expected behavior. So bitmap_search_next_usable_block() fail is the expected. It will move on to next block group and try to create a new reservation window there. That does not explain the repeated reservation window add and remove behavior Huge has reported. > btw, how come try_to_extend_reservation() uses spin_trylock? Since locks are all allocated from reservation window, when ext3 multiple blocks allocation was added, we added try_to_extend_reservation () to ext3, which trying to extend the reservation window size to at least match the number of blocks to allocate. So we have better chance to allocating multiple blocks from the window at a time. Since all the multiple block allocation is based on best effort basis, the same applied to try_to_extend_reservation(). It seems no need to wait for the reservation tree lock if it's not avaible at that moment. Mingming - To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html