Re: [f2fs-dev] [PATCH v5 1/2] f2fs: fix to avoid broken of dnode block list

2018-07-28 Thread Chao Yu
On 2018/7/29 10:49, Jaegeuk Kim wrote:
> On 07/29, Chao Yu wrote:
>> On 2018/7/29 9:33, Jaegeuk Kim wrote:
>>> On 07/28, Chao Yu wrote:
 f2fs recovery flow is relying on dnode block link list, it means fsynced
 file recovery depends on previous dnode's persistence in the list, so
 during fsync() we should wait on all regular inode's dnode writebacked
 before issuing flush.

 By this way, we can avoid dnode block list being broken by out-of-order
 IO submission due to IO scheduler or driver.

 Sheng Yong helps to do the test with this patch:

 Target:/data (f2fs, -)
 64MB / 32768KB / 4KB / 8

 1 / PERSIST / Index

 Base:
SEQ-RD(MB/s)SEQ-WR(MB/s)RND-RD(IOPS)RND-WR(IOPS)
 Insert(TPS) Update(TPS) Delete(TPS)
 1  867.82  204.15  41440.0341370.54680.8   
 1025.94 1031.08
 2  871.87  205.87  41370.3 40275.2 791.14  
 1065.84 1101.7
 3  866.52  205.69  41795.6740596.16694.69  
 1037.16 1031.48
 Avg868.737 205.237 41535.3 40747.3 
 722.21  1042.98 1054.75

 After:
SEQ-RD(MB/s)SEQ-WR(MB/s)RND-RD(IOPS)RND-WR(IOPS)
 Insert(TPS) Update(TPS) Delete(TPS)
 1  798.81  202.5   41143   40613.87602.71  
 838.08  913.83
 2  805.79  206.47  40297.2 41291.46604.44  
 840.75  924.27
 3  814.83  206.17  41209.5740453.62602.85  
 834.66  927.91
 Avg806.477 205.047 40883.25667 40786.31667 
 603.333 837.83  922.003

 Patched/Original:
0.928332713 0.999074239 0.984300676 1.000957528 
 0.835398753 0.803303994 0.874141189
>>>
>>> I expect Sheng will provide more test results tho, at least it seems SEQ-RD
>>> in Base shows better than two After results, even if it doesn't matter with
>>> the issue. Please confirm it first in order for anybody to say there is no
>>> regression.
>>
>> Agreed.
> 
> Hmm, this patch breaks fault injection test where gives a panic in put_super
> having fsync_node_num.

Let me do the test and fix it.

Thanks,

> 
>>
>> Thanks,
>>
>>>

 It looks like atomic write will suffer performance regression.

 I suspect that the criminal is that we forcing to wait all dnode being in
 storage cache before we issue PREFLUSH+FUA.

 BTW, will commit ("f2fs: don't need to wait for node writes for atomic 
 write")
 cause the problem: we will lose data of last transaction after SPO, even if
 atomic write return no error:

 - atomic_open();
 - write() P1, P2, P3;
 - atomic_commit();
  - writeback data: P1, P2, P3;
  - writeback node: N1, N2, N3;  <--- If N1, N2 is not writebacked, N3 with 
 fsync_mark is
 writebacked, In SPOR, we won't find N3 since node chain is broken, turns 
 out that losing
 last transaction.
  - preflush + fua;
 - power-cut

 If we don't wait dnode writeback for atomic_write:

SEQ-RD(MB/s)SEQ-WR(MB/s)RND-RD(IOPS)RND-WR(IOPS)
 Insert(TPS) Update(TPS) Delete(TPS)
 1  779.91  206.03  41621.5 40333.16716.9   
 1038.21 1034.85
 2  848.51  204.35  40082.4439486.17791.83  
 1119.96 1083.77
 3  772.12  206.27  41335.2541599.65723.29  
 1055.07 971.92
 Avg800.18  205.55  41013.06333 40472.99333 
 744.007 1071.08 1030.18

 Patched/Original:
0.92108464  1.001526693 0.987425886 0.993268102 
 1.030180511 1.026942031 0.976702294

 SQLite's performance recovers.

 Jaegeuk:
 "Practically, I don't see db corruption becase of this. We can excuse to 
 lose
 the last transaction."

 Finally, we decide to keep original implementation of atomic write 
 interface
 sematics that we don't wait all dnode writeback before preflush+fua 
 submission.

 Tested-by: Sheng Yong 
 Signed-off-by: Chao Yu 
 ---
 v5:
 - add missing Tested-by.
 - fix f2fs_reset_fsync_node_info() to reset sbi->fsync_seg_id correctly.
  fs/f2fs/checkpoint.c |   2 +
  fs/f2fs/data.c   |   2 +
  fs/f2fs/f2fs.h   |  21 ++-
  fs/f2fs/file.c   |   5 +-
  fs/f2fs/node.c   | 144 +++
  fs/f2fs/super.c  |   4 ++
  6 files changed, 150 insertions(+), 28 deletions(-)


Re: [f2fs-dev] [PATCH v5 1/2] f2fs: fix to avoid broken of dnode block list

2018-07-28 Thread Jaegeuk Kim
On 07/29, Chao Yu wrote:
> On 2018/7/29 9:33, Jaegeuk Kim wrote:
> > On 07/28, Chao Yu wrote:
> >> f2fs recovery flow is relying on dnode block link list, it means fsynced
> >> file recovery depends on previous dnode's persistence in the list, so
> >> during fsync() we should wait on all regular inode's dnode writebacked
> >> before issuing flush.
> >>
> >> By this way, we can avoid dnode block list being broken by out-of-order
> >> IO submission due to IO scheduler or driver.
> >>
> >> Sheng Yong helps to do the test with this patch:
> >>
> >> Target:/data (f2fs, -)
> >> 64MB / 32768KB / 4KB / 8
> >>
> >> 1 / PERSIST / Index
> >>
> >> Base:
> >>SEQ-RD(MB/s)SEQ-WR(MB/s)RND-RD(IOPS)RND-WR(IOPS)
> >> Insert(TPS) Update(TPS) Delete(TPS)
> >> 1  867.82  204.15  41440.0341370.54680.8   
> >> 1025.94 1031.08
> >> 2  871.87  205.87  41370.3 40275.2 791.14  
> >> 1065.84 1101.7
> >> 3  866.52  205.69  41795.6740596.16694.69  
> >> 1037.16 1031.48
> >> Avg868.737 205.237 41535.3 40747.3 
> >> 722.21  1042.98 1054.75
> >>
> >> After:
> >>SEQ-RD(MB/s)SEQ-WR(MB/s)RND-RD(IOPS)RND-WR(IOPS)
> >> Insert(TPS) Update(TPS) Delete(TPS)
> >> 1  798.81  202.5   41143   40613.87602.71  
> >> 838.08  913.83
> >> 2  805.79  206.47  40297.2 41291.46604.44  
> >> 840.75  924.27
> >> 3  814.83  206.17  41209.5740453.62602.85  
> >> 834.66  927.91
> >> Avg806.477 205.047 40883.25667 40786.31667 
> >> 603.333 837.83  922.003
> >>
> >> Patched/Original:
> >>0.928332713 0.999074239 0.984300676 1.000957528 
> >> 0.835398753 0.803303994 0.874141189
> > 
> > I expect Sheng will provide more test results tho, at least it seems SEQ-RD
> > in Base shows better than two After results, even if it doesn't matter with
> > the issue. Please confirm it first in order for anybody to say there is no
> > regression.
> 
> Agreed.

Hmm, this patch breaks fault injection test where gives a panic in put_super
having fsync_node_num.

> 
> Thanks,
> 
> > 
> >>
> >> It looks like atomic write will suffer performance regression.
> >>
> >> I suspect that the criminal is that we forcing to wait all dnode being in
> >> storage cache before we issue PREFLUSH+FUA.
> >>
> >> BTW, will commit ("f2fs: don't need to wait for node writes for atomic 
> >> write")
> >> cause the problem: we will lose data of last transaction after SPO, even if
> >> atomic write return no error:
> >>
> >> - atomic_open();
> >> - write() P1, P2, P3;
> >> - atomic_commit();
> >>  - writeback data: P1, P2, P3;
> >>  - writeback node: N1, N2, N3;  <--- If N1, N2 is not writebacked, N3 with 
> >> fsync_mark is
> >> writebacked, In SPOR, we won't find N3 since node chain is broken, turns 
> >> out that losing
> >> last transaction.
> >>  - preflush + fua;
> >> - power-cut
> >>
> >> If we don't wait dnode writeback for atomic_write:
> >>
> >>SEQ-RD(MB/s)SEQ-WR(MB/s)RND-RD(IOPS)RND-WR(IOPS)
> >> Insert(TPS) Update(TPS) Delete(TPS)
> >> 1  779.91  206.03  41621.5 40333.16716.9   
> >> 1038.21 1034.85
> >> 2  848.51  204.35  40082.4439486.17791.83  
> >> 1119.96 1083.77
> >> 3  772.12  206.27  41335.2541599.65723.29  
> >> 1055.07 971.92
> >> Avg800.18  205.55  41013.06333 40472.99333 
> >> 744.007 1071.08 1030.18
> >>
> >> Patched/Original:
> >>0.92108464  1.001526693 0.987425886 0.993268102 
> >> 1.030180511 1.026942031 0.976702294
> >>
> >> SQLite's performance recovers.
> >>
> >> Jaegeuk:
> >> "Practically, I don't see db corruption becase of this. We can excuse to 
> >> lose
> >> the last transaction."
> >>
> >> Finally, we decide to keep original implementation of atomic write 
> >> interface
> >> sematics that we don't wait all dnode writeback before preflush+fua 
> >> submission.
> >>
> >> Tested-by: Sheng Yong 
> >> Signed-off-by: Chao Yu 
> >> ---
> >> v5:
> >> - add missing Tested-by.
> >> - fix f2fs_reset_fsync_node_info() to reset sbi->fsync_seg_id correctly.
> >>  fs/f2fs/checkpoint.c |   2 +
> >>  fs/f2fs/data.c   |   2 +
> >>  fs/f2fs/f2fs.h   |  21 ++-
> >>  fs/f2fs/file.c   |   5 +-
> >>  fs/f2fs/node.c   | 144 +++
> >>  fs/f2fs/super.c  |   4 ++
> >>  6 files changed, 150 insertions(+), 28 deletions(-)
> >>
> >> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> >> index 

Re: [f2fs-dev] [PATCH v5 1/2] f2fs: fix to avoid broken of dnode block list

2018-07-28 Thread Chao Yu
On 2018/7/29 9:33, Jaegeuk Kim wrote:
> On 07/28, Chao Yu wrote:
>> f2fs recovery flow is relying on dnode block link list, it means fsynced
>> file recovery depends on previous dnode's persistence in the list, so
>> during fsync() we should wait on all regular inode's dnode writebacked
>> before issuing flush.
>>
>> By this way, we can avoid dnode block list being broken by out-of-order
>> IO submission due to IO scheduler or driver.
>>
>> Sheng Yong helps to do the test with this patch:
>>
>> Target:/data (f2fs, -)
>> 64MB / 32768KB / 4KB / 8
>>
>> 1 / PERSIST / Index
>>
>> Base:
>>  SEQ-RD(MB/s)SEQ-WR(MB/s)RND-RD(IOPS)RND-WR(IOPS)
>> Insert(TPS) Update(TPS) Delete(TPS)
>> 1867.82  204.15  41440.0341370.54680.8   
>> 1025.94 1031.08
>> 2871.87  205.87  41370.3 40275.2 791.14  
>> 1065.84 1101.7
>> 3866.52  205.69  41795.6740596.16694.69  
>> 1037.16 1031.48
>> Avg  868.737 205.237 41535.3 40747.3 722.21  
>> 1042.98 1054.75
>>
>> After:
>>  SEQ-RD(MB/s)SEQ-WR(MB/s)RND-RD(IOPS)RND-WR(IOPS)
>> Insert(TPS) Update(TPS) Delete(TPS)
>> 1798.81  202.5   41143   40613.87602.71  
>> 838.08  913.83
>> 2805.79  206.47  40297.2 41291.46604.44  
>> 840.75  924.27
>> 3814.83  206.17  41209.5740453.62602.85  
>> 834.66  927.91
>> Avg  806.477 205.047 40883.25667 40786.31667 
>> 603.333 837.83  922.003
>>
>> Patched/Original:
>>  0.928332713 0.999074239 0.984300676 1.000957528 
>> 0.835398753 0.803303994 0.874141189
> 
> I expect Sheng will provide more test results tho, at least it seems SEQ-RD
> in Base shows better than two After results, even if it doesn't matter with
> the issue. Please confirm it first in order for anybody to say there is no
> regression.

Agreed.

Thanks,

> 
>>
>> It looks like atomic write will suffer performance regression.
>>
>> I suspect that the criminal is that we forcing to wait all dnode being in
>> storage cache before we issue PREFLUSH+FUA.
>>
>> BTW, will commit ("f2fs: don't need to wait for node writes for atomic 
>> write")
>> cause the problem: we will lose data of last transaction after SPO, even if
>> atomic write return no error:
>>
>> - atomic_open();
>> - write() P1, P2, P3;
>> - atomic_commit();
>>  - writeback data: P1, P2, P3;
>>  - writeback node: N1, N2, N3;  <--- If N1, N2 is not writebacked, N3 with 
>> fsync_mark is
>> writebacked, In SPOR, we won't find N3 since node chain is broken, turns out 
>> that losing
>> last transaction.
>>  - preflush + fua;
>> - power-cut
>>
>> If we don't wait dnode writeback for atomic_write:
>>
>>  SEQ-RD(MB/s)SEQ-WR(MB/s)RND-RD(IOPS)RND-WR(IOPS)
>> Insert(TPS) Update(TPS) Delete(TPS)
>> 1779.91  206.03  41621.5 40333.16716.9   
>> 1038.21 1034.85
>> 2848.51  204.35  40082.4439486.17791.83  
>> 1119.96 1083.77
>> 3772.12  206.27  41335.2541599.65723.29  
>> 1055.07 971.92
>> Avg  800.18  205.55  41013.06333 40472.99333 
>> 744.007 1071.08 1030.18
>>
>> Patched/Original:
>>  0.92108464  1.001526693 0.987425886 0.993268102 
>> 1.030180511 1.026942031 0.976702294
>>
>> SQLite's performance recovers.
>>
>> Jaegeuk:
>> "Practically, I don't see db corruption becase of this. We can excuse to lose
>> the last transaction."
>>
>> Finally, we decide to keep original implementation of atomic write interface
>> sematics that we don't wait all dnode writeback before preflush+fua 
>> submission.
>>
>> Tested-by: Sheng Yong 
>> Signed-off-by: Chao Yu 
>> ---
>> v5:
>> - add missing Tested-by.
>> - fix f2fs_reset_fsync_node_info() to reset sbi->fsync_seg_id correctly.
>>  fs/f2fs/checkpoint.c |   2 +
>>  fs/f2fs/data.c   |   2 +
>>  fs/f2fs/f2fs.h   |  21 ++-
>>  fs/f2fs/file.c   |   5 +-
>>  fs/f2fs/node.c   | 144 +++
>>  fs/f2fs/super.c  |   4 ++
>>  6 files changed, 150 insertions(+), 28 deletions(-)
>>
>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>> index 581710760ba6..2136430f9f0d 100644
>> --- a/fs/f2fs/checkpoint.c
>> +++ b/fs/f2fs/checkpoint.c
>> @@ -1410,6 +1410,8 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
>> struct cp_control *cpc)
>>  
>>  f2fs_release_ino_entry(sbi, false);
>>  
>> +f2fs_reset_fsync_node_info(sbi);
>> +
>>  clear_sbi_flag(sbi, SBI_IS_DIRTY);
>>  clear_sbi_flag(sbi, SBI_NEED_CP);
>>  

Re: [f2fs-dev] [PATCH v5 1/2] f2fs: fix to avoid broken of dnode block list

2018-07-28 Thread Jaegeuk Kim
On 07/28, Chao Yu wrote:
> f2fs recovery flow is relying on dnode block link list, it means fsynced
> file recovery depends on previous dnode's persistence in the list, so
> during fsync() we should wait on all regular inode's dnode writebacked
> before issuing flush.
> 
> By this way, we can avoid dnode block list being broken by out-of-order
> IO submission due to IO scheduler or driver.
> 
> Sheng Yong helps to do the test with this patch:
> 
> Target:/data (f2fs, -)
> 64MB / 32768KB / 4KB / 8
> 
> 1 / PERSIST / Index
> 
> Base:
>   SEQ-RD(MB/s)SEQ-WR(MB/s)RND-RD(IOPS)RND-WR(IOPS)
> Insert(TPS) Update(TPS) Delete(TPS)
> 1 867.82  204.15  41440.0341370.54680.8   
> 1025.94 1031.08
> 2 871.87  205.87  41370.3 40275.2 791.14  
> 1065.84 1101.7
> 3 866.52  205.69  41795.6740596.16694.69  
> 1037.16 1031.48
> Avg   868.737 205.237 41535.3 40747.3 722.21  
> 1042.98 1054.75
> 
> After:
>   SEQ-RD(MB/s)SEQ-WR(MB/s)RND-RD(IOPS)RND-WR(IOPS)
> Insert(TPS) Update(TPS) Delete(TPS)
> 1 798.81  202.5   41143   40613.87602.71  
> 838.08  913.83
> 2 805.79  206.47  40297.2 41291.46604.44  
> 840.75  924.27
> 3 814.83  206.17  41209.5740453.62602.85  
> 834.66  927.91
> Avg   806.477 205.047 40883.25667 40786.31667 
> 603.333 837.83  922.003
> 
> Patched/Original:
>   0.928332713 0.999074239 0.984300676 1.000957528 
> 0.835398753 0.803303994 0.874141189

I expect Sheng will provide more test results tho, at least it seems SEQ-RD
in Base shows better than two After results, even if it doesn't matter with
the issue. Please confirm it first in order for anybody to say there is no
regression.

> 
> It looks like atomic write will suffer performance regression.
> 
> I suspect that the criminal is that we forcing to wait all dnode being in
> storage cache before we issue PREFLUSH+FUA.
> 
> BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
> cause the problem: we will lose data of last transaction after SPO, even if
> atomic write return no error:
> 
> - atomic_open();
> - write() P1, P2, P3;
> - atomic_commit();
>  - writeback data: P1, P2, P3;
>  - writeback node: N1, N2, N3;  <--- If N1, N2 is not writebacked, N3 with 
> fsync_mark is
> writebacked, In SPOR, we won't find N3 since node chain is broken, turns out 
> that losing
> last transaction.
>  - preflush + fua;
> - power-cut
> 
> If we don't wait dnode writeback for atomic_write:
> 
>   SEQ-RD(MB/s)SEQ-WR(MB/s)RND-RD(IOPS)RND-WR(IOPS)
> Insert(TPS) Update(TPS) Delete(TPS)
> 1 779.91  206.03  41621.5 40333.16716.9   
> 1038.21 1034.85
> 2 848.51  204.35  40082.4439486.17791.83  
> 1119.96 1083.77
> 3 772.12  206.27  41335.2541599.65723.29  
> 1055.07 971.92
> Avg   800.18  205.55  41013.06333 40472.99333 
> 744.007 1071.08 1030.18
> 
> Patched/Original:
>   0.92108464  1.001526693 0.987425886 0.993268102 
> 1.030180511 1.026942031 0.976702294
> 
> SQLite's performance recovers.
> 
> Jaegeuk:
> "Practically, I don't see db corruption becase of this. We can excuse to lose
> the last transaction."
> 
> Finally, we decide to keep original implementation of atomic write interface
> sematics that we don't wait all dnode writeback before preflush+fua 
> submission.
> 
> Tested-by: Sheng Yong 
> Signed-off-by: Chao Yu 
> ---
> v5:
> - add missing Tested-by.
> - fix f2fs_reset_fsync_node_info() to reset sbi->fsync_seg_id correctly.
>  fs/f2fs/checkpoint.c |   2 +
>  fs/f2fs/data.c   |   2 +
>  fs/f2fs/f2fs.h   |  21 ++-
>  fs/f2fs/file.c   |   5 +-
>  fs/f2fs/node.c   | 144 +++
>  fs/f2fs/super.c  |   4 ++
>  6 files changed, 150 insertions(+), 28 deletions(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 581710760ba6..2136430f9f0d 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -1410,6 +1410,8 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
> struct cp_control *cpc)
>  
>   f2fs_release_ino_entry(sbi, false);
>  
> + f2fs_reset_fsync_node_info(sbi);
> +
>   clear_sbi_flag(sbi, SBI_IS_DIRTY);
>   clear_sbi_flag(sbi, SBI_NEED_CP);
>   __set_cp_next_pack(sbi);
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 6b8ca5011bfd..572c91e43337 100644
> --- a/fs/f2fs/data.c
> +++