Re: [RFC PATCH 2/2] f2fs: fix allocation failure

2016-10-14 Thread Jaegeuk Kim
On Fri, Oct 14, 2016 at 10:09:29PM +0800, Chao Yu wrote:
...

> >> Finally it needs to update the dirty time of inode into inode page,
> >> and writeback the page, however, before that, we didn't count the inode
> >> as imeta data. So f2fs won't be aware of dirty metadata page count is
> >> exceeded watermark of GC, result in encountering panic when allocating
> >> free segment.
> >>
> >> There is an easy way to produce this bug:
> >> 1. mount with lazytime option
> >> 2. fragment space
> >> 3. touch all files in the image
> >> 4. umount
> > 
> > I think modifying has_not_enough_secs() is enough like this.
> 
> Seems it won't solve this problem as I tested, the root cause here is that if
> huge number of inode updates due to time changes, actually inodes won't be set
> dirty as we return directly if flags is I_DIRTY_TIME in f2fs_dirty_inode, then
> once inode cache is been shrunk, inodes in lru list will be set dirty in iput:
> 
> In iput()
>   if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
>   atomic_inc(>i_count);
>   inode->i_state &= ~I_DIRTY_TIME;
>   spin_unlock(>i_lock);
>   trace_writeback_lazytime_iput(inode);
>   mark_inode_dirty_sync(inode);
>   goto retry;
> 
> After that once someone calls write_checkpoint(), if number of dirty imeta 
> data
> is exceeded remain blocks in free segments, we will encounter this bug.
> 
> In order to fix this bug, I try to account these delayed dirtied inodes to
> detect actual dirty metadata number, by this way we can set delayed dirtied
> inode dirty and flush them in advance to avoid the dirty metadata number
> exceeding blocks number in free segments, finally allocation failure issue can
> be solved.

Yup, I did understand like that before. But actually, I tested this patch and
unfortunately I got a deadlock when running fsstress quickly. So, this makes me
rethink the whole design in terms of how to reuse the existing codes.

I think we're able to consider inode cache apart from FS consistency. The
IMETA list was actually keeping all the dirty inodes for FS consistency, but
it seems we don't need to store all of them which will consume free segments
dramatically during checkpoint.

I wrote a patch to add dirty inodes into IMETA list selectively.
The goal is to avoid adding dirty inodes by mark_inode_dirty_sync() called by
vfs, not by f2fs. Instead, those changes will be done by f2fs_write_inode().

Could you take a look at this?

>From bc165410f30274dce76e053b922d39a61b012e43 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim 
Date: Fri, 14 Oct 2016 11:51:23 -0700
Subject: [PATCH] f2fs: keep dirty inodes selectively for checkpoint

This is to avoid no free segment bug during checkpoint caused by a number of
dirty inodes.

The case was reported by Chao like this.

1. mount with lazytime option
2. fragment space
3. touch all files in the image
4. umount

In this case, we actually don't need to flush dirty inode to inode page during
checkpoint.

Reported-by: Chao Yu 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/acl.c  |  2 +-
 fs/f2fs/dir.c  |  6 +++---
 fs/f2fs/extent_cache.c |  2 +-
 fs/f2fs/f2fs.h | 25 +
 fs/f2fs/file.c |  8 
 fs/f2fs/inline.c   |  2 +-
 fs/f2fs/inode.c| 10 +-
 fs/f2fs/namei.c|  6 +++---
 fs/f2fs/super.c|  3 +++
 fs/f2fs/xattr.c|  4 ++--
 10 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
index 1e29630..6266fba 100644
--- a/fs/f2fs/acl.c
+++ b/fs/f2fs/acl.c
@@ -386,7 +386,7 @@ int f2fs_init_acl(struct inode *inode, struct inode *dir, 
struct page *ipage,
if (error)
return error;
 
-   f2fs_mark_inode_dirty_sync(inode);
+   f2fs_mark_inode_dirty_sync(inode, true);
 
if (default_acl) {
error = __f2fs_set_acl(inode, ACL_TYPE_DEFAULT, default_acl,
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index a46079a..e71793c 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -313,7 +313,7 @@ void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry 
*de,
set_page_dirty(page);
 
dir->i_mtime = dir->i_ctime = CURRENT_TIME;
-   f2fs_mark_inode_dirty_sync(dir);
+   f2fs_mark_inode_dirty_sync(dir, false);
f2fs_put_page(page, 1);
 }
 
@@ -466,7 +466,7 @@ void update_parent_metadata(struct inode *dir, struct inode 
*inode,
clear_inode_flag(inode, FI_NEW_INODE);
}
dir->i_mtime = dir->i_ctime = CURRENT_TIME;
-   f2fs_mark_inode_dirty_sync(dir);
+   f2fs_mark_inode_dirty_sync(dir, false);
 
if (F2FS_I(dir)->i_current_depth != current_depth)
f2fs_i_depth_write(dir, current_depth);
@@ -731,7 +731,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, 
struct page *page,

Re: [RFC PATCH 2/2] f2fs: fix allocation failure

2016-10-14 Thread Jaegeuk Kim
On Fri, Oct 14, 2016 at 10:09:29PM +0800, Chao Yu wrote:
...

> >> Finally it needs to update the dirty time of inode into inode page,
> >> and writeback the page, however, before that, we didn't count the inode
> >> as imeta data. So f2fs won't be aware of dirty metadata page count is
> >> exceeded watermark of GC, result in encountering panic when allocating
> >> free segment.
> >>
> >> There is an easy way to produce this bug:
> >> 1. mount with lazytime option
> >> 2. fragment space
> >> 3. touch all files in the image
> >> 4. umount
> > 
> > I think modifying has_not_enough_secs() is enough like this.
> 
> Seems it won't solve this problem as I tested, the root cause here is that if
> huge number of inode updates due to time changes, actually inodes won't be set
> dirty as we return directly if flags is I_DIRTY_TIME in f2fs_dirty_inode, then
> once inode cache is been shrunk, inodes in lru list will be set dirty in iput:
> 
> In iput()
>   if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
>   atomic_inc(>i_count);
>   inode->i_state &= ~I_DIRTY_TIME;
>   spin_unlock(>i_lock);
>   trace_writeback_lazytime_iput(inode);
>   mark_inode_dirty_sync(inode);
>   goto retry;
> 
> After that once someone calls write_checkpoint(), if number of dirty imeta 
> data
> is exceeded remain blocks in free segments, we will encounter this bug.
> 
> In order to fix this bug, I try to account these delayed dirtied inodes to
> detect actual dirty metadata number, by this way we can set delayed dirtied
> inode dirty and flush them in advance to avoid the dirty metadata number
> exceeding blocks number in free segments, finally allocation failure issue can
> be solved.

Yup, I did understand like that before. But actually, I tested this patch and
unfortunately I got a deadlock when running fsstress quickly. So, this makes me
rethink the whole design in terms of how to reuse the existing codes.

I think we're able to consider inode cache apart from FS consistency. The
IMETA list was actually keeping all the dirty inodes for FS consistency, but
it seems we don't need to store all of them which will consume free segments
dramatically during checkpoint.

I wrote a patch to add dirty inodes into IMETA list selectively.
The goal is to avoid adding dirty inodes by mark_inode_dirty_sync() called by
vfs, not by f2fs. Instead, those changes will be done by f2fs_write_inode().

Could you take a look at this?

>From bc165410f30274dce76e053b922d39a61b012e43 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim 
Date: Fri, 14 Oct 2016 11:51:23 -0700
Subject: [PATCH] f2fs: keep dirty inodes selectively for checkpoint

This is to avoid no free segment bug during checkpoint caused by a number of
dirty inodes.

The case was reported by Chao like this.

1. mount with lazytime option
2. fragment space
3. touch all files in the image
4. umount

In this case, we actually don't need to flush dirty inode to inode page during
checkpoint.

Reported-by: Chao Yu 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/acl.c  |  2 +-
 fs/f2fs/dir.c  |  6 +++---
 fs/f2fs/extent_cache.c |  2 +-
 fs/f2fs/f2fs.h | 25 +
 fs/f2fs/file.c |  8 
 fs/f2fs/inline.c   |  2 +-
 fs/f2fs/inode.c| 10 +-
 fs/f2fs/namei.c|  6 +++---
 fs/f2fs/super.c|  3 +++
 fs/f2fs/xattr.c|  4 ++--
 10 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
index 1e29630..6266fba 100644
--- a/fs/f2fs/acl.c
+++ b/fs/f2fs/acl.c
@@ -386,7 +386,7 @@ int f2fs_init_acl(struct inode *inode, struct inode *dir, 
struct page *ipage,
if (error)
return error;
 
-   f2fs_mark_inode_dirty_sync(inode);
+   f2fs_mark_inode_dirty_sync(inode, true);
 
if (default_acl) {
error = __f2fs_set_acl(inode, ACL_TYPE_DEFAULT, default_acl,
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index a46079a..e71793c 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -313,7 +313,7 @@ void f2fs_set_link(struct inode *dir, struct f2fs_dir_entry 
*de,
set_page_dirty(page);
 
dir->i_mtime = dir->i_ctime = CURRENT_TIME;
-   f2fs_mark_inode_dirty_sync(dir);
+   f2fs_mark_inode_dirty_sync(dir, false);
f2fs_put_page(page, 1);
 }
 
@@ -466,7 +466,7 @@ void update_parent_metadata(struct inode *dir, struct inode 
*inode,
clear_inode_flag(inode, FI_NEW_INODE);
}
dir->i_mtime = dir->i_ctime = CURRENT_TIME;
-   f2fs_mark_inode_dirty_sync(dir);
+   f2fs_mark_inode_dirty_sync(dir, false);
 
if (F2FS_I(dir)->i_current_depth != current_depth)
f2fs_i_depth_write(dir, current_depth);
@@ -731,7 +731,7 @@ void f2fs_delete_entry(struct f2fs_dir_entry *dentry, 
struct page *page,
set_page_dirty(page);
 
dir->i_ctime = dir->i_mtime = 

Re: [RFC PATCH 2/2] f2fs: fix allocation failure

2016-10-14 Thread Chao Yu
Hi Jaegeuk,

On 2016/10/14 4:49, Jaegeuk Kim wrote:
> Hi Chao,
> 
> On Thu, Oct 13, 2016 at 12:14:27AM +0800, Chao Yu wrote:
>> From: Chao Yu 
>>
>> tests/generic/251 of fstest reports a f2fs bug in below message:
>>
>> [ cut here ]
>> invalid opcode:  [#1] PREEMPT SMP
>> CPU: 1 PID: 109 Comm: kworker/u8:2 Tainted: GW  O4.8.0-rc4+ #22
>> Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>> Workqueue: writeback wb_workfn (flush-251:1)
>> task: f33c8000 task.stack: f33c6000
>> EIP: 0060:[] EFLAGS: 00010246 CPU: 1
>> EIP is at new_curseg+0x2c9/0x2d0 [f2fs]
>> EAX: 03f3 EBX: ee3e5000 ECX: 0400 EDX: 03f3
>> ESI:  EDI: 0008 EBP: f33c78c4 ESP: f33c7890
>>  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
>> CR0: 80050033 CR2: b6706000 CR3: 2e8d70c0 CR4: 000406f0
>> Stack:
>>  eb29480c 0004 f27a0290 03f3 0008 f33c78c0 0001 eb294800
>>  0001  ee3e5000 f899dbb4 0290 f33c7924 f89926d6 c10b42b6
>>  0246  eed513e8 0246 f33c78e8 c10b7b4b f33c7924 c178c304
>> Call Trace:
>>  [] allocate_segment_by_default+0x3c6/0x3d0 [f2fs]
>>  [] allocate_data_block+0x13a/0x3c0 [f2fs]
>>  [] do_write_page+0x8b/0x230 [f2fs]
>>  [] write_node_page+0x20/0x30 [f2fs]
>>  [] f2fs_write_node_page+0x1a6/0x340 [f2fs]
>>  [] sync_node_pages+0x4a5/0x590 [f2fs]
>>  [] write_checkpoint+0x218/0x720 [f2fs]
>>  [] f2fs_gc+0x4cd/0x6b0 [f2fs]
>>  [] f2fs_balance_fs+0x18e/0x1b0 [f2fs]
>>  [] f2fs_write_data_page+0x197/0x6f0 [f2fs]
>>  [] f2fs_write_data_pages+0x28e/0x7e0 [f2fs]
>>  [] do_writepages+0x1d/0x40
>>  [] __writeback_single_inode+0x55/0x7e0
>>  [] writeback_sb_inodes+0x21b/0x490
>>  [] wb_writeback+0xdc/0x590
>>  [] wb_workfn+0xf8/0x690
>>  [] process_one_work+0x1a1/0x580
>>  [] worker_thread+0x102/0x440
>>  [] kthread+0xa1/0xc0
>>  [] ret_from_kernel_thread+0xe/0x24
>> EIP: [] new_curseg+0x2c9/0x2d0 [f2fs] SS:ESP 0068:f33c7890
>>
>> The reason is after f2fs enabled lazytime by default, when inode time is
>> changed, we do not set this inode dirty through ->f2fs_dirty_inode, so
>> itime updating will be delayed.
>>
>> Finally it needs to update the dirty time of inode into inode page,
>> and writeback the page, however, before that, we didn't count the inode
>> as imeta data. So f2fs won't be aware of dirty metadata page count is
>> exceeded watermark of GC, result in encountering panic when allocating
>> free segment.
>>
>> There is an easy way to produce this bug:
>> 1. mount with lazytime option
>> 2. fragment space
>> 3. touch all files in the image
>> 4. umount
> 
> I think modifying has_not_enough_secs() is enough like this.

Seems it won't solve this problem as I tested, the root cause here is that if
huge number of inode updates due to time changes, actually inodes won't be set
dirty as we return directly if flags is I_DIRTY_TIME in f2fs_dirty_inode, then
once inode cache is been shrunk, inodes in lru list will be set dirty in iput:

In iput()
if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
atomic_inc(>i_count);
inode->i_state &= ~I_DIRTY_TIME;
spin_unlock(>i_lock);
trace_writeback_lazytime_iput(inode);
mark_inode_dirty_sync(inode);
goto retry;

After that once someone calls write_checkpoint(), if number of dirty imeta data
is exceeded remain blocks in free segments, we will encounter this bug.

In order to fix this bug, I try to account these delayed dirtied inodes to
detect actual dirty metadata number, by this way we can set delayed dirtied
inode dirty and flush them in advance to avoid the dirty metadata number
exceeding blocks number in free segments, finally allocation failure issue can
be solved.

Thanks,

> 
> ---
>  fs/f2fs/segment.h | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> index fecb856..a6efb5c 100644
> --- a/fs/f2fs/segment.h
> +++ b/fs/f2fs/segment.h
> @@ -471,11 +471,12 @@ static inline bool need_SSR(struct f2fs_sb_info *sbi)
>  {
>   int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
>   int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
> + int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
>  
>   if (test_opt(sbi, LFS))
>   return false;
>  
> - return free_sections(sbi) <= (node_secs + 2 * dent_secs +
> + return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs +
>   reserved_sections(sbi) + 1);
>  }
>  
> @@ -484,6 +485,7 @@ static inline bool has_not_enough_free_secs(struct 
> f2fs_sb_info *sbi,
>  {
>   int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
>   int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
> + int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
>  
> 

Re: [RFC PATCH 2/2] f2fs: fix allocation failure

2016-10-14 Thread Chao Yu
Hi Jaegeuk,

On 2016/10/14 4:49, Jaegeuk Kim wrote:
> Hi Chao,
> 
> On Thu, Oct 13, 2016 at 12:14:27AM +0800, Chao Yu wrote:
>> From: Chao Yu 
>>
>> tests/generic/251 of fstest reports a f2fs bug in below message:
>>
>> [ cut here ]
>> invalid opcode:  [#1] PREEMPT SMP
>> CPU: 1 PID: 109 Comm: kworker/u8:2 Tainted: GW  O4.8.0-rc4+ #22
>> Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>> Workqueue: writeback wb_workfn (flush-251:1)
>> task: f33c8000 task.stack: f33c6000
>> EIP: 0060:[] EFLAGS: 00010246 CPU: 1
>> EIP is at new_curseg+0x2c9/0x2d0 [f2fs]
>> EAX: 03f3 EBX: ee3e5000 ECX: 0400 EDX: 03f3
>> ESI:  EDI: 0008 EBP: f33c78c4 ESP: f33c7890
>>  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
>> CR0: 80050033 CR2: b6706000 CR3: 2e8d70c0 CR4: 000406f0
>> Stack:
>>  eb29480c 0004 f27a0290 03f3 0008 f33c78c0 0001 eb294800
>>  0001  ee3e5000 f899dbb4 0290 f33c7924 f89926d6 c10b42b6
>>  0246  eed513e8 0246 f33c78e8 c10b7b4b f33c7924 c178c304
>> Call Trace:
>>  [] allocate_segment_by_default+0x3c6/0x3d0 [f2fs]
>>  [] allocate_data_block+0x13a/0x3c0 [f2fs]
>>  [] do_write_page+0x8b/0x230 [f2fs]
>>  [] write_node_page+0x20/0x30 [f2fs]
>>  [] f2fs_write_node_page+0x1a6/0x340 [f2fs]
>>  [] sync_node_pages+0x4a5/0x590 [f2fs]
>>  [] write_checkpoint+0x218/0x720 [f2fs]
>>  [] f2fs_gc+0x4cd/0x6b0 [f2fs]
>>  [] f2fs_balance_fs+0x18e/0x1b0 [f2fs]
>>  [] f2fs_write_data_page+0x197/0x6f0 [f2fs]
>>  [] f2fs_write_data_pages+0x28e/0x7e0 [f2fs]
>>  [] do_writepages+0x1d/0x40
>>  [] __writeback_single_inode+0x55/0x7e0
>>  [] writeback_sb_inodes+0x21b/0x490
>>  [] wb_writeback+0xdc/0x590
>>  [] wb_workfn+0xf8/0x690
>>  [] process_one_work+0x1a1/0x580
>>  [] worker_thread+0x102/0x440
>>  [] kthread+0xa1/0xc0
>>  [] ret_from_kernel_thread+0xe/0x24
>> EIP: [] new_curseg+0x2c9/0x2d0 [f2fs] SS:ESP 0068:f33c7890
>>
>> The reason is after f2fs enabled lazytime by default, when inode time is
>> changed, we do not set this inode dirty through ->f2fs_dirty_inode, so
>> itime updating will be delayed.
>>
>> Finally it needs to update the dirty time of inode into inode page,
>> and writeback the page, however, before that, we didn't count the inode
>> as imeta data. So f2fs won't be aware of dirty metadata page count is
>> exceeded watermark of GC, result in encountering panic when allocating
>> free segment.
>>
>> There is an easy way to produce this bug:
>> 1. mount with lazytime option
>> 2. fragment space
>> 3. touch all files in the image
>> 4. umount
> 
> I think modifying has_not_enough_secs() is enough like this.

Seems it won't solve this problem as I tested, the root cause here is that if
huge number of inode updates due to time changes, actually inodes won't be set
dirty as we return directly if flags is I_DIRTY_TIME in f2fs_dirty_inode, then
once inode cache is been shrunk, inodes in lru list will be set dirty in iput:

In iput()
if (inode->i_nlink && (inode->i_state & I_DIRTY_TIME)) {
atomic_inc(>i_count);
inode->i_state &= ~I_DIRTY_TIME;
spin_unlock(>i_lock);
trace_writeback_lazytime_iput(inode);
mark_inode_dirty_sync(inode);
goto retry;

After that once someone calls write_checkpoint(), if number of dirty imeta data
is exceeded remain blocks in free segments, we will encounter this bug.

In order to fix this bug, I try to account these delayed dirtied inodes to
detect actual dirty metadata number, by this way we can set delayed dirtied
inode dirty and flush them in advance to avoid the dirty metadata number
exceeding blocks number in free segments, finally allocation failure issue can
be solved.

Thanks,

> 
> ---
>  fs/f2fs/segment.h | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> index fecb856..a6efb5c 100644
> --- a/fs/f2fs/segment.h
> +++ b/fs/f2fs/segment.h
> @@ -471,11 +471,12 @@ static inline bool need_SSR(struct f2fs_sb_info *sbi)
>  {
>   int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
>   int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
> + int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
>  
>   if (test_opt(sbi, LFS))
>   return false;
>  
> - return free_sections(sbi) <= (node_secs + 2 * dent_secs +
> + return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs +
>   reserved_sections(sbi) + 1);
>  }
>  
> @@ -484,6 +485,7 @@ static inline bool has_not_enough_free_secs(struct 
> f2fs_sb_info *sbi,
>  {
>   int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
>   int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
> + int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
>  
>   node_secs += 

Re: [RFC PATCH 2/2] f2fs: fix allocation failure

2016-10-13 Thread Jaegeuk Kim
Hi Chao,

On Thu, Oct 13, 2016 at 12:14:27AM +0800, Chao Yu wrote:
> From: Chao Yu 
> 
> tests/generic/251 of fstest reports a f2fs bug in below message:
> 
> [ cut here ]
> invalid opcode:  [#1] PREEMPT SMP
> CPU: 1 PID: 109 Comm: kworker/u8:2 Tainted: GW  O4.8.0-rc4+ #22
> Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> Workqueue: writeback wb_workfn (flush-251:1)
> task: f33c8000 task.stack: f33c6000
> EIP: 0060:[] EFLAGS: 00010246 CPU: 1
> EIP is at new_curseg+0x2c9/0x2d0 [f2fs]
> EAX: 03f3 EBX: ee3e5000 ECX: 0400 EDX: 03f3
> ESI:  EDI: 0008 EBP: f33c78c4 ESP: f33c7890
>  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> CR0: 80050033 CR2: b6706000 CR3: 2e8d70c0 CR4: 000406f0
> Stack:
>  eb29480c 0004 f27a0290 03f3 0008 f33c78c0 0001 eb294800
>  0001  ee3e5000 f899dbb4 0290 f33c7924 f89926d6 c10b42b6
>  0246  eed513e8 0246 f33c78e8 c10b7b4b f33c7924 c178c304
> Call Trace:
>  [] allocate_segment_by_default+0x3c6/0x3d0 [f2fs]
>  [] allocate_data_block+0x13a/0x3c0 [f2fs]
>  [] do_write_page+0x8b/0x230 [f2fs]
>  [] write_node_page+0x20/0x30 [f2fs]
>  [] f2fs_write_node_page+0x1a6/0x340 [f2fs]
>  [] sync_node_pages+0x4a5/0x590 [f2fs]
>  [] write_checkpoint+0x218/0x720 [f2fs]
>  [] f2fs_gc+0x4cd/0x6b0 [f2fs]
>  [] f2fs_balance_fs+0x18e/0x1b0 [f2fs]
>  [] f2fs_write_data_page+0x197/0x6f0 [f2fs]
>  [] f2fs_write_data_pages+0x28e/0x7e0 [f2fs]
>  [] do_writepages+0x1d/0x40
>  [] __writeback_single_inode+0x55/0x7e0
>  [] writeback_sb_inodes+0x21b/0x490
>  [] wb_writeback+0xdc/0x590
>  [] wb_workfn+0xf8/0x690
>  [] process_one_work+0x1a1/0x580
>  [] worker_thread+0x102/0x440
>  [] kthread+0xa1/0xc0
>  [] ret_from_kernel_thread+0xe/0x24
> EIP: [] new_curseg+0x2c9/0x2d0 [f2fs] SS:ESP 0068:f33c7890
> 
> The reason is after f2fs enabled lazytime by default, when inode time is
> changed, we do not set this inode dirty through ->f2fs_dirty_inode, so
> itime updating will be delayed.
> 
> Finally it needs to update the dirty time of inode into inode page,
> and writeback the page, however, before that, we didn't count the inode
> as imeta data. So f2fs won't be aware of dirty metadata page count is
> exceeded watermark of GC, result in encountering panic when allocating
> free segment.
> 
> There is an easy way to produce this bug:
> 1. mount with lazytime option
> 2. fragment space
> 3. touch all files in the image
> 4. umount

I think modifying has_not_enough_secs() is enough like this.

---
 fs/f2fs/segment.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index fecb856..a6efb5c 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -471,11 +471,12 @@ static inline bool need_SSR(struct f2fs_sb_info *sbi)
 {
int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
+   int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
 
if (test_opt(sbi, LFS))
return false;
 
-   return free_sections(sbi) <= (node_secs + 2 * dent_secs +
+   return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs +
reserved_sections(sbi) + 1);
 }
 
@@ -484,6 +485,7 @@ static inline bool has_not_enough_free_secs(struct 
f2fs_sb_info *sbi,
 {
int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
+   int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
 
node_secs += get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
 
@@ -491,7 +493,8 @@ static inline bool has_not_enough_free_secs(struct 
f2fs_sb_info *sbi,
return false;
 
return (free_sections(sbi) + freed) <=
-   (node_secs + 2 * dent_secs + reserved_sections(sbi) + needed);
+   (node_secs + 2 * dent_secs + imeta_secs +
+   reserved_sections(sbi) + needed);
 }
 
 static inline bool excess_prefree_segs(struct f2fs_sb_info *sbi)
-- 
2.8.3

Thanks,

> 
> Introduce itime data type and related flow to trace & flush delayed
> updating inode to fix this issue.
> 
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/checkpoint.c | 37 
>  fs/f2fs/f2fs.h   |  5 +
>  fs/f2fs/file.c   |  1 +
>  fs/f2fs/namei.c  | 38 +
>  fs/f2fs/segment.h| 11 ++
>  fs/f2fs/super.c  | 59 
> +---
>  6 files changed, 139 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index d95fada..e27c64f 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -928,6 +928,43 @@ int f2fs_sync_inode_meta(struct f2fs_sb_info *sbi)
>   return 0;
>  }
>  
> +int 

Re: [RFC PATCH 2/2] f2fs: fix allocation failure

2016-10-13 Thread Jaegeuk Kim
Hi Chao,

On Thu, Oct 13, 2016 at 12:14:27AM +0800, Chao Yu wrote:
> From: Chao Yu 
> 
> tests/generic/251 of fstest reports a f2fs bug in below message:
> 
> [ cut here ]
> invalid opcode:  [#1] PREEMPT SMP
> CPU: 1 PID: 109 Comm: kworker/u8:2 Tainted: GW  O4.8.0-rc4+ #22
> Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
> Workqueue: writeback wb_workfn (flush-251:1)
> task: f33c8000 task.stack: f33c6000
> EIP: 0060:[] EFLAGS: 00010246 CPU: 1
> EIP is at new_curseg+0x2c9/0x2d0 [f2fs]
> EAX: 03f3 EBX: ee3e5000 ECX: 0400 EDX: 03f3
> ESI:  EDI: 0008 EBP: f33c78c4 ESP: f33c7890
>  DS: 007b ES: 007b FS: 00d8 GS:  SS: 0068
> CR0: 80050033 CR2: b6706000 CR3: 2e8d70c0 CR4: 000406f0
> Stack:
>  eb29480c 0004 f27a0290 03f3 0008 f33c78c0 0001 eb294800
>  0001  ee3e5000 f899dbb4 0290 f33c7924 f89926d6 c10b42b6
>  0246  eed513e8 0246 f33c78e8 c10b7b4b f33c7924 c178c304
> Call Trace:
>  [] allocate_segment_by_default+0x3c6/0x3d0 [f2fs]
>  [] allocate_data_block+0x13a/0x3c0 [f2fs]
>  [] do_write_page+0x8b/0x230 [f2fs]
>  [] write_node_page+0x20/0x30 [f2fs]
>  [] f2fs_write_node_page+0x1a6/0x340 [f2fs]
>  [] sync_node_pages+0x4a5/0x590 [f2fs]
>  [] write_checkpoint+0x218/0x720 [f2fs]
>  [] f2fs_gc+0x4cd/0x6b0 [f2fs]
>  [] f2fs_balance_fs+0x18e/0x1b0 [f2fs]
>  [] f2fs_write_data_page+0x197/0x6f0 [f2fs]
>  [] f2fs_write_data_pages+0x28e/0x7e0 [f2fs]
>  [] do_writepages+0x1d/0x40
>  [] __writeback_single_inode+0x55/0x7e0
>  [] writeback_sb_inodes+0x21b/0x490
>  [] wb_writeback+0xdc/0x590
>  [] wb_workfn+0xf8/0x690
>  [] process_one_work+0x1a1/0x580
>  [] worker_thread+0x102/0x440
>  [] kthread+0xa1/0xc0
>  [] ret_from_kernel_thread+0xe/0x24
> EIP: [] new_curseg+0x2c9/0x2d0 [f2fs] SS:ESP 0068:f33c7890
> 
> The reason is after f2fs enabled lazytime by default, when inode time is
> changed, we do not set this inode dirty through ->f2fs_dirty_inode, so
> itime updating will be delayed.
> 
> Finally it needs to update the dirty time of inode into inode page,
> and writeback the page, however, before that, we didn't count the inode
> as imeta data. So f2fs won't be aware of dirty metadata page count is
> exceeded watermark of GC, result in encountering panic when allocating
> free segment.
> 
> There is an easy way to produce this bug:
> 1. mount with lazytime option
> 2. fragment space
> 3. touch all files in the image
> 4. umount

I think modifying has_not_enough_secs() is enough like this.

---
 fs/f2fs/segment.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index fecb856..a6efb5c 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -471,11 +471,12 @@ static inline bool need_SSR(struct f2fs_sb_info *sbi)
 {
int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
+   int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
 
if (test_opt(sbi, LFS))
return false;
 
-   return free_sections(sbi) <= (node_secs + 2 * dent_secs +
+   return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs +
reserved_sections(sbi) + 1);
 }
 
@@ -484,6 +485,7 @@ static inline bool has_not_enough_free_secs(struct 
f2fs_sb_info *sbi,
 {
int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
+   int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
 
node_secs += get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
 
@@ -491,7 +493,8 @@ static inline bool has_not_enough_free_secs(struct 
f2fs_sb_info *sbi,
return false;
 
return (free_sections(sbi) + freed) <=
-   (node_secs + 2 * dent_secs + reserved_sections(sbi) + needed);
+   (node_secs + 2 * dent_secs + imeta_secs +
+   reserved_sections(sbi) + needed);
 }
 
 static inline bool excess_prefree_segs(struct f2fs_sb_info *sbi)
-- 
2.8.3

Thanks,

> 
> Introduce itime data type and related flow to trace & flush delayed
> updating inode to fix this issue.
> 
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/checkpoint.c | 37 
>  fs/f2fs/f2fs.h   |  5 +
>  fs/f2fs/file.c   |  1 +
>  fs/f2fs/namei.c  | 38 +
>  fs/f2fs/segment.h| 11 ++
>  fs/f2fs/super.c  | 59 
> +---
>  6 files changed, 139 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index d95fada..e27c64f 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -928,6 +928,43 @@ int f2fs_sync_inode_meta(struct f2fs_sb_info *sbi)
>   return 0;
>  }
>  
> +int f2fs_sync_dirty_itime(struct f2fs_sb_info *sbi)
> +{
> +