Re: [PATCH] f2fs: Remove BUG_ON in dec_valid_node_count

2013-05-15 Thread Chris Fries
Hi,

Please abandon my patch in favor of Jaegeuk's recovery accounting fix
(mine has a copy-paste error anyway):

[f2fs-dev] [PATCH 1/2] f2fs: fix inconsistency of block count during recovery

This patch solves the issue completely.
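
For the record, the copy-paste error in mine is in the patch's
inode->i_blocks branch, which clamps count to sbi->total_valid_node_count
instead of inode->i_blocks.  Presumably the branch was meant to read:

	if (inode->i_blocks < count) {
		WARN(1, "F2FS: inode->i_blocks too small - %d vs %d\n",
			(unsigned int)inode->i_blocks, count);
		count = inode->i_blocks;
	}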

-Chris


On Tue, May 14, 2013 at 2:26 PM, Russell Knize wrote:
>
> The node accounting errors seem to be creeping in during recovery.
> Once the error is introduced in the total counts, it gets recorded in
> the next checkpoint and is carried forward.  Oftentimes the error
> crops up after a failed recovery attempt, such as when recovery
> attempts to recover an inode belonging to a subdirectory that was
> previously removed.
>
> We've also seen accounting errors elsewhere, such as the block bitmap
> management in update_sit_entry().  In one case, we bug out on some
> random unlink operation because truncate_data_blocks_range() tries to
> invalidate a block whose bit is not set in the bitmap:
>
> kernel BUG at ../../../../../../kernel/fs/f2fs/segment.c:199!
> ...
> do_unlinkat()
> evict()
> f2fs_evict_inode()
> f2fs_truncate()
> truncate_data_blocks_range()
> invalidate_blocks()
> update_sit_entry()
>
> In another case, we bug out during recovery while trying to allocate
> the dummy page for an fsync.  In this case, the page is already
> allocated:
>
> kernel BUG at ../../../../../../kernel/fs/f2fs/segment.c:211!
> ...
> vfs_kern_mount()
> mount_fs()
> f2fs_mount()
> mount_bdev()
> f2fs_fill_super()
> recover_fsync_data()
> recover_data_page()
> update_sit_entry()
>
>
> I've experimented with some changes to deal with the bug-outs in
> update_sit_entry() and regenerate the block counts that are maintained
> by the seg_entry and sec_entry structures, but I am not confident that
> a disparity between block allocations and their respective nodes
> won't remain.
>
> Some sort of consistency checker would be handy.  ;)
>
> We're going to run these torture tests with disable_roll_forward to
> try to at least confirm that we should be focused on the recovery
> process.
>
> Russ W. Knize
> Mobile Devices Android Platform
> Desk: 847-523-9381
> Mobile: 224-622-1391
>
>
> On Mon, May 13, 2013 at 10:41 AM, Chris Fries wrote:
> > Yes, your observations are correct, total_valid_node_count is 0, and the
> > caller is truncate_node.  It could be a bug in recovery accounting
> > somewhere, but I haven't been able to save a copy of the disk image just
> > before the recovery or rmdir failure, which is the best place to start
> > investigating the bug.  It could also be some bad state of the disk
> > during panic restart rather than a straight recovery bug.
> >
> > Adding some traces or logs into recovery might give clues too.
> >
> > The panics I've seen are both during mount and during rmdir.
> >
> > -- truncate_node
> >   -- remove_inode_page
> > -- f2fs_evict_inode
> >   -- evict
> > -- recover_orphan_inodes
> >   -- f2fs_fill_super
> > -- mount_bdev
> >
> > or
> >
> > -- truncate_node
> >   -- remove_inode_page
> > -- f2fs_evict_inode
> >   -- evict
> > -- d_delete
> >   -- vfs_rmdir
> >
> >
> >
> >
> > On Mon, May 13, 2013 at 2:05 AM, Haicheng Li wrote:
> >>
> >>
> >> On Sun, May 12, 2013 at 10:46:27PM -0500, Chris Fries wrote:
> >> > From: Chris Fries 
> >> >
> >> > Panic-and-restart looping while running LTP fsstress has been able
> >> > to get a disk into two different panic loops from dec_valid_node_count.
> >> > f2fs.h:714  BUG_ON(sbi->total_valid_node_count < count);
> >>
> >> This is an interesting catch.
> >> From the code, dec_valid_node_count() is only called by
> >> truncate_node():
> >> dec_valid_node_count(sbi, dn->inode, 1);
> >>
> >> So the failure in your test means that sbi->total_valid_node_count < 1,
> >> i.e., equal to 0, which is an unexpected state.
> >>
> >> I think a better solution would be to avoid such an over-truncation
> >> in truncate_node() in the first place.
> >>
> >> What do you think?
> >>
> >> > Once, it happened during recovery itself, and the disk would panic
> >> > every time it was mounted.
> >> >
> >> > Another time, it happened during garbage collection, so the disk
> >> > would panic within 200 seconds of mounting.
> >> >
> >> > Removing this BUG_ON hasn't shown any side effects, so let's take it
> >> > out and monitor.
> >> >
> >> > Signed-off-by: Chris Fries 
> >> >
> >> > ---
> >> >  fs/f2fs/f2fs.h | 18 +++++++++++++++---
> >> >  1 file changed, 15 insertions(+), 3 deletions(-)
> >> >
> >> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >> > index e80a87c..b8e9679 100644
> >> > --- a/fs/f2fs/f2fs.h
> >> > +++ b/fs/f2fs/f2fs.h
> >> > @@ -722,9 +722,21 @@ static inline void dec_valid_node_count(struct f2fs_sb_info *sbi,
> >> >  {
> >> >   spin_lock(&sbi->stat_lock);
> >> >
> >> > - BUG_ON(sbi->total_valid_block_count < count);

Re: [PATCH] f2fs: Remove BUG_ON in dec_valid_node_count

2013-05-14 Thread Russell Knize
The node accounting errors seem to be creeping in during recovery.
Once the error is introduced in the total counts, it gets recorded in
the next checkpoint and is carried forward.  Oftentimes the error
crops up after a failed recovery attempt, such as when recovery
attempts to recover an inode belonging to a subdirectory that was
previously removed.

We've also seen accounting errors elsewhere, such as the block bitmap
management in update_sit_entry().  In one case, we bug out on some
random unlink operation because truncate_data_blocks_range() tries to
invalidate a block whose bit is not set in the bitmap:

kernel BUG at ../../../../../../kernel/fs/f2fs/segment.c:199!
...
do_unlinkat()
evict()
f2fs_evict_inode()
f2fs_truncate()
truncate_data_blocks_range()
invalidate_blocks()
update_sit_entry()

In another case, we bug out during recovery while trying to allocate
the dummy page for an fsync.  In this case, the page is already
allocated:

kernel BUG at ../../../../../../kernel/fs/f2fs/segment.c:211!
...
vfs_kern_mount()
mount_fs()
f2fs_mount()
mount_bdev()
f2fs_fill_super()
recover_fsync_data()
recover_data_page()
update_sit_entry()


I've experimented with some changes to deal with the bug-outs in
update_sit_entry() and regenerate the block counts that are maintained
by the seg_entry and sec_entry structures, but I am not confident that
a disparity between block allocations and their respective nodes
won't remain.
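
To sketch the shape of that experiment (illustrative only, not the
actual diff I ran; se/segno/offset/del and the f2fs_set_bit()/
f2fs_clear_bit() helpers follow update_sit_entry() in fs/f2fs of this
era): check the bitmap before touching the cached count, and warn
instead of BUG() on a mismatch:

	struct seg_entry *se = get_seg_entry(sbi, segno);

	if (del > 0) {
		if (f2fs_set_bit(offset, se->cur_valid_map)) {
			/* block already marked valid: report it and skip
			 * the count update instead of calling BUG() */
			WARN(1, "F2FS: segno %u offset %u already valid\n",
							segno, offset);
			del = 0;
		}
	} else {
		if (!f2fs_clear_bit(offset, se->cur_valid_map)) {
			WARN(1, "F2FS: segno %u offset %u already invalid\n",
							segno, offset);
			del = 0;
		}
	}
	se->valid_blocks += del;	/* apply the delta only when consistent */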

Some sort of consistency checker would be handy.  ;)
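
Something like this, say (a rough, untested sketch; get_seg_entry(),
TOTAL_SEGS(), SIT_VBLOCK_MAP_SIZE, and the seg_entry fields are from
fs/f2fs, while check_sit_consistency() itself is made up):

	/* Recount each segment's validity bitmap and compare it against
	 * the cached valid_blocks count kept in the seg_entry. */
	static void check_sit_consistency(struct f2fs_sb_info *sbi)
	{
		unsigned int segno;

		for (segno = 0; segno < TOTAL_SEGS(sbi); segno++) {
			struct seg_entry *se = get_seg_entry(sbi, segno);
			unsigned int i, bits = 0;

			for (i = 0; i < SIT_VBLOCK_MAP_SIZE; i++)
				bits += hweight8(se->cur_valid_map[i]);

			if (bits != se->valid_blocks)
				pr_warn("F2FS: segno %u: bitmap %u vs valid_blocks %u\n",
					segno, bits, se->valid_blocks);
		}
	}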

We're going to run these torture tests with disable_roll_forward to
try to at least confirm that we should be focused on the recovery
process.

Russ W. Knize
Mobile Devices Android Platform
Desk: 847-523-9381
Mobile: 224-622-1391


> On Mon, May 13, 2013 at 10:41 AM, Chris Fries wrote:
> Yes, your observations are correct, total_valid_node_count is 0, and the
> caller is truncate_node.  It could be a bug in recovery accounting
> somewhere, but I haven't been able to save a copy of the disk image just
> before the recovery or rmdir failure, which is the best place to start
> investigating the bug.  It could also be some bad state of the disk during
> panic restart rather than a straight recovery bug.
>
> Adding some traces or logs into recovery might give clues too.
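>
> E.g. something as simple as this around the roll-forward call in
> f2fs_fill_super() (illustrative only; the counters are the same sbi
> fields the BUG_ONs check, and recover_fsync_data() is the roll-forward
> entry called from f2fs_fill_super()):
>
> 	pr_info("F2FS: pre-recovery:  nodes %u blocks %u\n",
> 		sbi->total_valid_node_count,
> 		(unsigned int)sbi->total_valid_block_count);
> 	recover_fsync_data(sbi);
> 	pr_info("F2FS: post-recovery: nodes %u blocks %u\n",
> 		sbi->total_valid_node_count,
> 		(unsigned int)sbi->total_valid_block_count);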
>
> The panics I've seen are both during mount and during rmdir.
>
> -- truncate_node
>   -- remove_inode_page
> -- f2fs_evict_inode
>   -- evict
> -- recover_orphan_inodes
>   -- f2fs_fill_super
> -- mount_bdev
>
> or
>
> -- truncate_node
>   -- remove_inode_page
> -- f2fs_evict_inode
>   -- evict
> -- d_delete
>   -- vfs_rmdir
>
>
>
>
> On Mon, May 13, 2013 at 2:05 AM, Haicheng Li wrote:
>>
>>
>> On Sun, May 12, 2013 at 10:46:27PM -0500, Chris Fries wrote:
>> > From: Chris Fries 
>> >
>> > Panic-and-restart looping while running LTP fsstress has been able
>> > to get a disk into two different panic loops from dec_valid_node_count.
>> > f2fs.h:714  BUG_ON(sbi->total_valid_node_count < count);
>>
>> This is an interesting catch.
>> From the code, dec_valid_node_count() is only called by truncate_node():
>> dec_valid_node_count(sbi, dn->inode, 1);
>>
>> So the failure in your test means that sbi->total_valid_node_count < 1,
>> i.e., equal to 0, which is an unexpected state.
>>
>> I think a better solution would be to avoid such an over-truncation
>> in truncate_node() in the first place.
>>
>> What do you think?
>>
>> > Once, it happened during recovery itself, and the disk would panic
>> > every time it was mounted.
>> >
>> > Another time, it happened during garbage collection, so the disk
>> > would panic within 200 seconds of mounting.
>> >
>> > Removing this BUG_ON hasn't shown any side effects, so let's take it
>> > out and monitor.
>> >
>> > Signed-off-by: Chris Fries 
>> >
>> > ---
>> >  fs/f2fs/f2fs.h | 18 +++++++++++++++---
>> >  1 file changed, 15 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> > index e80a87c..b8e9679 100644
>> > --- a/fs/f2fs/f2fs.h
>> > +++ b/fs/f2fs/f2fs.h
>> > @@ -722,9 +722,21 @@ static inline void dec_valid_node_count(struct f2fs_sb_info *sbi,
>> >  {
>> >   spin_lock(&sbi->stat_lock);
>> >
>> > - BUG_ON(sbi->total_valid_block_count < count);
>> > - BUG_ON(sbi->total_valid_node_count < count);
>> > - BUG_ON(inode->i_blocks < count);
>> > + if (sbi->total_valid_block_count < count) {
>> > + WARN(1, "F2FS: total_valid_block_count too small- %d vs
>> > %d\n",
>> > + (unsigned int)sbi->total_valid_block_count,
>> > count);
>> > + count = sbi->total_valid_block_count;
>> > + }
>> > + if (sbi->total_valid_node_count < count) {
>> > + WARN(1, "F2FS: total_valid_node_count too small- 

Re: [PATCH] f2fs: Remove BUG_ON in dec_valid_node_count

2013-05-13 Thread Haicheng Li

On Sun, May 12, 2013 at 10:46:27PM -0500, Chris Fries wrote:
> From: Chris Fries 
> 
> Panic-and-restart looping while running LTP fsstress has been able
> to get a disk into two different panic loops from dec_valid_node_count.
> f2fs.h:714  BUG_ON(sbi->total_valid_node_count < count);

This is an interesting catch.
From the code, dec_valid_node_count() is only called by truncate_node():
dec_valid_node_count(sbi, dn->inode, 1);

So the failure in your test means that sbi->total_valid_node_count < 1,
i.e., equal to 0, which is an unexpected state.

I think a better solution would be to avoid such an over-truncation in truncate_node() in the first place.

What do you think?
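
To illustrate that direction (a hypothetical sketch only, not a tested
proposal): dec_valid_node_count() could refuse to underflow and return
an error, so that truncate_node() can bail out instead of pushing the
counters further out of sync:

	static inline int dec_valid_node_count(struct f2fs_sb_info *sbi,
						struct inode *inode,
						unsigned int count)
	{
		spin_lock(&sbi->stat_lock);
		if (WARN_ON(sbi->total_valid_block_count < count ||
				sbi->total_valid_node_count < count ||
				inode->i_blocks < count)) {
			spin_unlock(&sbi->stat_lock);
			return -EINVAL;	/* caller decides how to recover */
		}
		inode->i_blocks -= count;
		sbi->total_valid_node_count -= count;
		sbi->total_valid_block_count -= (block_t)count;
		spin_unlock(&sbi->stat_lock);
		return 0;
	}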

> Once, it happened during recovery itself, and the disk would panic
> every time it was mounted.
> 
> Another time, it happened during garbage collection, so the disk
> would panic within 200 seconds of mounting.
> 
> Removing this BUG_ON hasn't shown any side effects, so let's take it
> out and monitor.
> 
> Signed-off-by: Chris Fries 
> 
> ---
>  fs/f2fs/f2fs.h | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index e80a87c..b8e9679 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -722,9 +722,21 @@ static inline void dec_valid_node_count(struct f2fs_sb_info *sbi,
>  {
>   spin_lock(&sbi->stat_lock);
>  
> - BUG_ON(sbi->total_valid_block_count < count);
> - BUG_ON(sbi->total_valid_node_count < count);
> - BUG_ON(inode->i_blocks < count);
> + if (sbi->total_valid_block_count < count) {
> + WARN(1, "F2FS: total_valid_block_count too small- %d vs %d\n",
> + (unsigned int)sbi->total_valid_block_count, count);
> + count = sbi->total_valid_block_count;
> + }
> + if (sbi->total_valid_node_count < count) {
> + WARN(1, "F2FS: total_valid_node_count too small- %d vs %d\n",
> + sbi->total_valid_node_count, count);
> + count = sbi->total_valid_node_count;
> + }
> + if (inode->i_blocks < count) {
> + WARN(1, "F2FS: inode->i_blocks too small - %d vs %d\n",
> + (unsigned int)inode->i_blocks, count);
> + count = sbi->total_valid_node_count;
> + }
>  
>   inode->i_blocks -= count;
>   sbi->total_valid_node_count -= count;
> -- 
> 1.8.0
> 


[PATCH] f2fs: Remove BUG_ON in dec_valid_node_count

2013-05-12 Thread Chris Fries
From: Chris Fries 

Panic-and-restart looping while running LTP fsstress has been able
to get a disk into two different panic loops from dec_valid_node_count.
f2fs.h:714  BUG_ON(sbi->total_valid_node_count < count);

Once, it happened during recovery itself, and the disk would panic
every time it was mounted.

Another time, it happened during garbage collection, so the disk
would panic within 200 seconds of mounting.

Removing this BUG_ON hasn't shown any side effects, so let's take it
out and monitor.

Signed-off-by: Chris Fries 

---
 fs/f2fs/f2fs.h | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e80a87c..b8e9679 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -722,9 +722,21 @@ static inline void dec_valid_node_count(struct f2fs_sb_info *sbi,
 {
	spin_lock(&sbi->stat_lock);
 
-   BUG_ON(sbi->total_valid_block_count < count);
-   BUG_ON(sbi->total_valid_node_count < count);
-   BUG_ON(inode->i_blocks < count);
+   if (sbi->total_valid_block_count < count) {
+   WARN(1, "F2FS: total_valid_block_count too small- %d vs %d\n",
+   (unsigned int)sbi->total_valid_block_count, count);
+   count = sbi->total_valid_block_count;
+   }
+   if (sbi->total_valid_node_count < count) {
+   WARN(1, "F2FS: total_valid_node_count too small- %d vs %d\n",
+   sbi->total_valid_node_count, count);
+   count = sbi->total_valid_node_count;
+   }
+   if (inode->i_blocks < count) {
+   WARN(1, "F2FS: inode->i_blocks too small - %d vs %d\n",
+   (unsigned int)inode->i_blocks, count);
+   count = sbi->total_valid_node_count;
+   }
 
inode->i_blocks -= count;
sbi->total_valid_node_count -= count;
-- 
1.8.0
