[f2fs-dev] [PATCH v2] f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()

2019-09-09 Thread Chao Yu
If an inode is newly created, its inode page may not yet be synchronized
with the inode cache, so fields like .i_inline or .i_extra_isize could be
wrong. In the call path below, we may access such stale fields, resulting
in failure to migrate a valid target block.

Thread A				Thread B
- f2fs_create
 - f2fs_add_link
  - f2fs_add_dentry
   - f2fs_init_inode_metadata
    - f2fs_add_inline_entry
     - f2fs_new_inode_page
     - f2fs_put_page
     : inode page wasn't updated with inode cache
					- gc_data_segment
					 - is_alive
					  - f2fs_get_node_page
					  - datablock_addr
					   - offset_in_addr
					   : access uninitialized fields

Fixes: 7a2af766af15 ("f2fs: enhance on-disk inode structure scalability")
Signed-off-by: Chao Yu 
---
v2:
- update inode page before f2fs_put_page()
 fs/f2fs/dir.c    | 5 +++++
 fs/f2fs/inline.c | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 7afbf8f5ab08..4033778bcbbf 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -682,6 +682,11 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name,
 
if (inode) {
f2fs_i_pino_write(inode, dir->i_ino);
+
+   /* synchronize inode page's data from inode cache */
+   if (is_inode_flag_set(inode, FI_NEW_INODE))
+   f2fs_update_inode(inode, page);
+
f2fs_put_page(page, 1);
}
 
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 16ebdd4d1f2c..896db0416f0e 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -589,6 +589,11 @@ int f2fs_add_inline_entry(struct inode *dir, const struct qstr *new_name,
/* we don't need to mark_inode_dirty now */
if (inode) {
f2fs_i_pino_write(inode, dir->i_ino);
+
+   /* synchronize inode page's data from inode cache */
+   if (is_inode_flag_set(inode, FI_NEW_INODE))
+   f2fs_update_inode(inode, page);
+
f2fs_put_page(page, 1);
}
 
-- 
2.18.0.rc1



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()

2019-09-09 Thread Chao Yu
On 2019/9/9 22:37, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
>> On 2019/9/9 17:33, Jaegeuk Kim wrote:
>>> On 09/09, Chao Yu wrote:
 On 2019/9/9 16:37, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
>> On 2019/9/9 15:58, Chao Yu wrote:
>>> On 2019/9/9 15:44, Jaegeuk Kim wrote:
 On 09/07, Chao Yu wrote:
> On 2019-9-7 7:48, Jaegeuk Kim wrote:
>> On 09/06, Chao Yu wrote:
>>> If inode is newly created, inode page may not synchronize with 
>>> inode cache,
>>> so fields like .i_inline or .i_extra_isize could be wrong, in below 
>>> call
>>> path, we may access such wrong fields, result in failing to migrate 
>>> valid
>>> target block.
>>
>> If data is valid, how can we get new inode page?

 Let me rephrase the question. If inode is newly created, is this data 
 block
 really valid to move in GC?
>>>
>>> I guess it's valid; let me double check that.
>>
>> We can see inode page:
>>
>> - f2fs_create
>>  - f2fs_add_link
>>   - f2fs_add_dentry
>>- f2fs_init_inode_metadata
>> - f2fs_add_inline_entry
>>  - ipage = f2fs_new_inode_page
>>  - f2fs_put_page(ipage)   < after this
>
> Can you print out how many blocks were assigned to this inode?
> 
> Can we update inode before finally putting ipage?

Agreed.

Thanks,

> 

 Add log like this:

	if (!test_and_set_bit(segno, SIT_I(sbi)->invalid_segmap)) {
		if (is_inode) {
			/* 923 == DEF_ADDRS_PER_INODE */
			__le32 *base = blkaddr_in_node(node);

			for (i = 0; i < 923 - 50; i++)
				printk("i:%u, addr:%x\n", i,
					le32_to_cpu(*(base + i)));
			printk("i_inline: %u\n", inode->i_inline);
		}
	}

 It shows:
 ...
 i:10, addr:e66a
 ...
 i:46, addr:e66c
 i:47, addr:e66d
 i:48, addr:e66e
 i:49, addr:e66f
 i:50, addr:e670
 i:51, addr:e671
 i:52, addr:e672
 i:53, addr:e673
 i:54, addr:e674
 i:55, addr:e675
 i:56, addr:e676
 ...
 i:140, addr:2c35<--- we want to migrate this block, however, without 
 correct
 .i_inline and .i_extra_isize value, we can just find i_addr[i:140-6] = 
 NULL_ADDR
>>>
>>> So, the theory is the block is indeed valid and the address was updated 
>>> before
>>> write_inode()?
>>
>> I guess so. :)
>>
>> Thanks,
>>
>>>
 i:141, addr:2c38
 i:142, addr:2c39
 i:143, addr:2c3b
 i:144, addr:2c3e
 i:145, addr:2c40
 i:146, addr:2c44
 i:147, addr:2c48
 i:148, addr:2c4a
 i:149, addr:2c4c
 i:150, addr:2c4f
 i:151, addr:2c59
 i:152, addr:2c5d
 ...
 i:188, addr:e677
 i:189, addr:e678
 i:190, addr:e679
 i:191, addr:e67a
 i:192, addr:e67b
 i:193, addr:e67c
 i:194, addr:e67d
 i:195, addr:e67e
 i:196, addr:e67f
 i:197, addr:e680
 i:198, addr:
 i:199, addr:
 i:200, addr:
 i:201, addr:
 i:202, addr:
 i:203, addr:
 i:204, addr:
 i:205, addr:
 i:206, addr:
 i:207, addr:
 i:208, addr:
 i:209, addr:
 i:210, addr:
 i:211, addr:
 i:212, addr:
 i:213, addr:
 i:214, addr:
 i:215, addr:
 i:216, addr:
 i:217, addr:
 i:218, addr:
 i:219, addr:
 i:220, addr:
 i:221, addr:
 i:222, addr:
 i:223, addr:
 i:224, addr:
 i:225, addr:
 i:226, addr:
 i:227, addr:
 i:228, addr:
 i:229, addr:
 i:230, addr:
 i:231, addr:
 i:232, addr:
 i:233, addr:
 i:234, addr:b032
 i:235, addr:b033
 i:236, addr:b034
 i:237, addr:b035
 i:238, addr:b036
 i:239, addr:b038
 ...
 i:283, addr:e681
 ...
 i_inline: 0

 F2FS-fs (zram1): summary nid: 360, ofs: 134, ver: 0
 F2FS-fs (zram1): blkaddr 2c35 (blkaddr in node 0) <-blkaddr in node is 
 NULL_ADDR
 F2FS-fs (zram1): expect: seg 14, ofs_in_seg: 53
 F2FS-fs (zram1): real: seg 4294967295, ofs_in_seg: 0
 F2FS-fs (zram1): ofs: 53, 0
 F2FS-fs (zram1): node info ino:360, nid:360, nofs:0
 F2FS-fs (zram1): ofs_in_addr: 0
 F2FS-fs (zram1): end 

>
>>
>>>

>
> is_alive()
> {
> ...
>   node_page = f2fs_get_node_page(sbi, nid);  <--- inode page

 Aren't we seeing the below version 

Re: [f2fs-dev] [PATCH 2/2] f2fs: avoid infinite GC loop due to stale atomic files

2019-09-09 Thread Chao Yu
On 2019/9/9 22:34, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
>> On 2019/9/9 16:38, Jaegeuk Kim wrote:
>>> On 09/09, Chao Yu wrote:
 On 2019/9/9 16:21, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
>> On 2019/9/9 16:01, Jaegeuk Kim wrote:
>>> On 09/09, Chao Yu wrote:
 On 2019/9/9 15:30, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
>> On 2019/9/9 9:25, Jaegeuk Kim wrote:
>>> If committing atomic pages fails in f2fs_do_sync_file(), we can
>>> get committed pages but atomic_file still set, like:
>>>
>>> - inmem:0, atomic IO:4 (Max.   10), volatile IO:0 (Max. 
>>>0)
>>>
>>> If GC selects this block, we can get an infinite loop like this:
>>>
>>> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 
>>> 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type 
>>> = COLD_DATA
>>> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, 
>>> sector = 18533696, size = 4096
>>> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = 
>>> (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, 
>>> ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
>>> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
>>> i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
>>> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 
>>> 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type 
>>> = COLD_DATA
>>> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, 
>>> sector = 18533696, size = 4096
>>> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = 
>>> (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, 
>>> ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
>>> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
>>> i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
>>>
>>> In that moment, we can observe:
>>>
>>> [Before]
>>> Try to move 5084219 blocks (BG: 384508)
>>>   - data blocks : 4962373 (274483)
>>>   - node blocks : 121846 (110025)
>>> Skipped : atomic write 4534686 (10)
>>>
>>> [After]
>>> Try to move 5088973 blocks (BG: 384508)
>>>   - data blocks : 4967127 (274483)
>>>   - node blocks : 121846 (110025)
>>> Skipped : atomic write 4539440 (10)
>>>
>>> Signed-off-by: Jaegeuk Kim 
>>> ---
>>>  fs/f2fs/file.c | 10 +++++-----
>>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>> index 7ae2f3bd8c2f..68b6da734e5f 100644
>>> --- a/fs/f2fs/file.c
>>> +++ b/fs/f2fs/file.c
>>> @@ -1997,11 +1997,11 @@ static int 
>>> f2fs_ioc_commit_atomic_write(struct file *filp)
>>> goto err_out;
>>>  
>>> ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
>>> -   if (!ret) {
>>> -   clear_inode_flag(inode, FI_ATOMIC_FILE);
>>> -   F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] 
>>> = 0;
>>> -   stat_dec_atomic_write(inode);
>>> -   }
>>> +
>>> +   /* doesn't need to check error */
>>> +   clear_inode_flag(inode, FI_ATOMIC_FILE);
>>> +   F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
>>> +   stat_dec_atomic_write(inode);
>>
>> If there are still valid atomic write pages linked in .inmem_pages, 
>> it may cause
>> memory leak when we just clear FI_ATOMIC_FILE flag.
>
> f2fs_commit_inmem_pages() should have flushed them.

 Oh, we failed to flush its nodes.

 However we won't clear such info if we failed to flush inmem pages, which
 looks inconsistent.

 Is any interface needed to drop inmem pages or clear the ATOMIC_FILE flag in
 those two error paths? I'm not very clear on how sqlite handles such errors.
>>>
>>> f2fs_drop_inmem_pages() did that, but not in this case.
>>
>> What I mean is, for any error returned from atomic_commit() interface, 
>> should
>> userspace application handle it with consistent way, like trigger
>> f2fs_drop_inmem_pages(), so we don't need to handle it inside 
>> atomic_commit().
>
> f2fs_ioc_abort_volatile_write() will be triggered.

 If userspace can do this, we can get rid of this patch, or am I missing 
 sth?
>>>
>>> We don't know when that will come. And, other threads are waiting for GC 

Re: [PATCH] f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()

2019-09-09 Thread Jaegeuk Kim
On 09/09, Chao Yu wrote:
> On 2019/9/9 17:33, Jaegeuk Kim wrote:
> > On 09/09, Chao Yu wrote:
> >> On 2019/9/9 16:37, Jaegeuk Kim wrote:
> >>> On 09/09, Chao Yu wrote:
>  On 2019/9/9 15:58, Chao Yu wrote:
> > On 2019/9/9 15:44, Jaegeuk Kim wrote:
> >> On 09/07, Chao Yu wrote:
> >>> On 2019-9-7 7:48, Jaegeuk Kim wrote:
>  On 09/06, Chao Yu wrote:
> > If inode is newly created, inode page may not synchronize with 
> > inode cache,
> > so fields like .i_inline or .i_extra_isize could be wrong, in below 
> > call
> > path, we may access such wrong fields, result in failing to migrate 
> > valid
> > target block.
> 
>  If data is valid, how can we get new inode page?
> >>
> >> Let me rephrase the question. If inode is newly created, is this data 
> >> block
> >> really valid to move in GC?
> >
> > I guess it's valid, let double check that.
> 
>  We can see inode page:
> 
>  - f2fs_create
>   - f2fs_add_link
>    - f2fs_add_dentry
> - f2fs_init_inode_metadata
>  - f2fs_add_inline_entry
>   - ipage = f2fs_new_inode_page
>   - f2fs_put_page(ipage)   < after this
> >>>
> >>> Can you print out how many block was assigned to this inode?

Can we update inode before finally putting ipage?

> >>
> >> Add log like this:
> >>
> >>if (!test_and_set_bit(segno, SIT_I(sbi)->invalid_segmap)) {
> >>if (is_inode) {
> >>for (i = 0; i < 923 - 50; i++) {
> >>__le32 *base = blkaddr_in_node(node);
> >>unsigned ofs = offset_in_addr(inode);
> >>
> >>printk("i:%u, addr:%x\n", i,
> >>le32_to_cpu(*(base + i)));
> >>}
> >>printk("i_inline: %u\n", inode->i_inline);
> >>}
> >>
> >> It shows:
> >> ...
> >> i:10, addr:e66a
> >> ...
> >> i:46, addr:e66c
> >> i:47, addr:e66d
> >> i:48, addr:e66e
> >> i:49, addr:e66f
> >> i:50, addr:e670
> >> i:51, addr:e671
> >> i:52, addr:e672
> >> i:53, addr:e673
> >> i:54, addr:e674
> >> i:55, addr:e675
> >> i:56, addr:e676
> >> ...
> >> i:140, addr:2c35<--- we want to migrate this block, however, without 
> >> correct
> >> .i_inline and .i_extra_isize value, we can just find i_addr[i:140-6] = 
> >> NULL_ADDR
> > 
> > So, the theory is the block is indeed valid and the address was updated 
> > before
> > write_inode()?
> 
> I guess so. :)
> 
> Thanks,
> 
> > 
> >> i:141, addr:2c38
> >> i:142, addr:2c39
> >> i:143, addr:2c3b
> >> i:144, addr:2c3e
> >> i:145, addr:2c40
> >> i:146, addr:2c44
> >> i:147, addr:2c48
> >> i:148, addr:2c4a
> >> i:149, addr:2c4c
> >> i:150, addr:2c4f
> >> i:151, addr:2c59
> >> i:152, addr:2c5d
> >> ...
> >> i:188, addr:e677
> >> i:189, addr:e678
> >> i:190, addr:e679
> >> i:191, addr:e67a
> >> i:192, addr:e67b
> >> i:193, addr:e67c
> >> i:194, addr:e67d
> >> i:195, addr:e67e
> >> i:196, addr:e67f
> >> i:197, addr:e680
> >> i:198, addr:
> >> i:199, addr:
> >> i:200, addr:
> >> i:201, addr:
> >> i:202, addr:
> >> i:203, addr:
> >> i:204, addr:
> >> i:205, addr:
> >> i:206, addr:
> >> i:207, addr:
> >> i:208, addr:
> >> i:209, addr:
> >> i:210, addr:
> >> i:211, addr:
> >> i:212, addr:
> >> i:213, addr:
> >> i:214, addr:
> >> i:215, addr:
> >> i:216, addr:
> >> i:217, addr:
> >> i:218, addr:
> >> i:219, addr:
> >> i:220, addr:
> >> i:221, addr:
> >> i:222, addr:
> >> i:223, addr:
> >> i:224, addr:
> >> i:225, addr:
> >> i:226, addr:
> >> i:227, addr:
> >> i:228, addr:
> >> i:229, addr:
> >> i:230, addr:
> >> i:231, addr:
> >> i:232, addr:
> >> i:233, addr:
> >> i:234, addr:b032
> >> i:235, addr:b033
> >> i:236, addr:b034
> >> i:237, addr:b035
> >> i:238, addr:b036
> >> i:239, addr:b038
> >> ...
> >> i:283, addr:e681
> >> ...
> >> i_inline: 0
> >>
> >> F2FS-fs (zram1): summary nid: 360, ofs: 134, ver: 0
> >> F2FS-fs (zram1): blkaddr 2c35 (blkaddr in node 0) <-blkaddr in node is 
> >> NULL_ADDR
> >> F2FS-fs (zram1): expect: seg 14, ofs_in_seg: 53
> >> F2FS-fs (zram1): real: seg 4294967295, ofs_in_seg: 0
> >> F2FS-fs (zram1): ofs: 53, 0
> >> F2FS-fs (zram1): node info ino:360, nid:360, nofs:0
> >> F2FS-fs (zram1): ofs_in_addr: 0
> >> F2FS-fs (zram1): end 
> >>
> >>>
> 
> >
> >>
> >>>
> >>> is_alive()
> >>> {
> >>> ...
> >>>   node_page = f2fs_get_node_page(sbi, nid);  <--- inode page
> >>
> >> Aren't we seeing the below version warnings?
> >>
> >> if (sum->version != dni->version) {
> 

Re: [f2fs-dev] [PATCH 2/2] f2fs: avoid infinite GC loop due to stale atomic files

2019-09-09 Thread Jaegeuk Kim
On 09/09, Chao Yu wrote:
> On 2019/9/9 16:38, Jaegeuk Kim wrote:
> > On 09/09, Chao Yu wrote:
> >> On 2019/9/9 16:21, Jaegeuk Kim wrote:
> >>> On 09/09, Chao Yu wrote:
>  On 2019/9/9 16:01, Jaegeuk Kim wrote:
> > On 09/09, Chao Yu wrote:
> >> On 2019/9/9 15:30, Jaegeuk Kim wrote:
> >>> On 09/09, Chao Yu wrote:
>  On 2019/9/9 9:25, Jaegeuk Kim wrote:
> > If committing atomic pages is failed when doing 
> > f2fs_do_sync_file(), we can
> > get commited pages but atomic_file being still set like:
> >
> > - inmem:0, atomic IO:4 (Max.   10), volatile IO:0 (Max. 
> >0)
> >
> > If GC selects this block, we can get an infinite loop like this:
> >
> > f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 
> > 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type 
> > = COLD_DATA
> > f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, 
> > sector = 18533696, size = 4096
> > f2fs_get_victim: dev = (253,7), type = No TYPE, policy = 
> > (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, 
> > ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
> > f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
> > i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
> > f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 
> > 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type 
> > = COLD_DATA
> > f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, 
> > sector = 18533696, size = 4096
> > f2fs_get_victim: dev = (253,7), type = No TYPE, policy = 
> > (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, 
> > ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
> > f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
> > i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
> >
> > In that moment, we can observe:
> >
> > [Before]
> > Try to move 5084219 blocks (BG: 384508)
> >   - data blocks : 4962373 (274483)
> >   - node blocks : 121846 (110025)
> > Skipped : atomic write 4534686 (10)
> >
> > [After]
> > Try to move 5088973 blocks (BG: 384508)
> >   - data blocks : 4967127 (274483)
> >   - node blocks : 121846 (110025)
> > Skipped : atomic write 4539440 (10)
> >
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  fs/f2fs/file.c | 10 +-
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index 7ae2f3bd8c2f..68b6da734e5f 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -1997,11 +1997,11 @@ static int 
> > f2fs_ioc_commit_atomic_write(struct file *filp)
> > goto err_out;
> >  
> > ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
> > -   if (!ret) {
> > -   clear_inode_flag(inode, FI_ATOMIC_FILE);
> > -   F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] 
> > = 0;
> > -   stat_dec_atomic_write(inode);
> > -   }
> > +
> > +   /* doesn't need to check error */
> > +   clear_inode_flag(inode, FI_ATOMIC_FILE);
> > +   F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
> > +   stat_dec_atomic_write(inode);
> 
>  If there are still valid atomic write pages linked in .inmem_pages, 
>  it may cause
>  memory leak when we just clear FI_ATOMIC_FILE flag.
> >>>
> >>> f2fs_commit_inmem_pages() should have flushed them.
> >>
> >> Oh, we failed to flush its nodes.
> >>
> >> However we won't clear such info if we failed to flush inmen pages, it 
> >> looks
> >> inconsistent.
> >>
> >> Any interface needed to drop inmem pages or clear ATOMIC_FILE flag in 
> >> that two
> >> error path? I'm not very clear how sqlite handle such error.
> >
> > f2fs_drop_inmem_pages() did that, but not in this case.
> 
>  What I mean is, for any error returned from atomic_commit() interface, 
>  should
>  userspace application handle it with consistent way, like trigger
>  f2fs_drop_inmem_pages(), so we don't need to handle it inside 
>  atomic_commit().
> >>>
> >>> f2fs_ioc_abort_volatile_write() will be triggered.
> >>
> >> If userspace can do this, we can get rid of this patch, or am I missing 
> >> sth?
> > 
> > We don't know when that will come. And, other threads are waiting for GC 
> > here.
> 
> Yes, however, even 

Re: [f2fs-dev] [PATCH 1/2] f2fs: do not select same victim right again

2019-09-09 Thread Jaegeuk Kim
On 09/09, Chao Yu wrote:
> On 2019/9/9 16:06, Jaegeuk Kim wrote:
> > On 09/09, Chao Yu wrote:
> >> On 2019/9/9 9:25, Jaegeuk Kim wrote:
> >>> GC must avoid selecting the same victim again.
> >>
> >> Blocks in the previous victim will occupy an additional free segment; I
> >> suspect that after this change, FGGC may encounter out-of-free-space issues
> >> more frequently.
> > 
> > Hmm, actually this change seems wrong given sec_usage_check().
> > We may be able to avoid this only in the suspicious loop?
> > 
> > ---
> >  fs/f2fs/gc.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index e88f98ddf396..5877bd729689 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -1326,7 +1326,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
> > round++;
> > }
> >  
> > -   if (gc_type == FG_GC)
> > +   if (gc_type == FG_GC && seg_freed)
> 
> That's the original solution Sahitya provided to avoid the infinite GC loop,
> but I suggested finding the root cause first; we then added .invalid_segmap
> for that purpose.

I've checked Sahitya's patch. So it seems the problem can happen due to
is_alive or atomic_file.

> 
> Thanks,
> 
> > sbi->cur_victim_sec = NULL_SEGNO;
> >  
> > if (sync)
> > 


Re: [f2fs-dev] [PATCH 1/2] f2fs: do not select same victim right again

2019-09-09 Thread Chao Yu
On 2019/9/9 16:06, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
>> On 2019/9/9 9:25, Jaegeuk Kim wrote:
>>> GC must avoid selecting the same victim again.
>>
>> Blocks in the previous victim will occupy an additional free segment; I
>> suspect that after this change, FGGC may encounter out-of-free-space issues
>> more frequently.
> 
> Hmm, actually this change seems wrong given sec_usage_check().
> We may be able to avoid this only in the suspicious loop?
> 
> ---
>  fs/f2fs/gc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index e88f98ddf396..5877bd729689 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -1326,7 +1326,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>   round++;
>   }
>  
> - if (gc_type == FG_GC)
> + if (gc_type == FG_GC && seg_freed)

That's the original solution Sahitya provided to avoid the infinite GC loop,
but I suggested finding the root cause first; we then added .invalid_segmap
for that purpose.

Thanks,

>   sbi->cur_victim_sec = NULL_SEGNO;
>  
>   if (sync)
> 


Re: [f2fs-dev] [PATCH 2/2] f2fs: avoid infinite GC loop due to stale atomic files

2019-09-09 Thread Chao Yu
On 2019/9/9 16:38, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
>> On 2019/9/9 16:21, Jaegeuk Kim wrote:
>>> On 09/09, Chao Yu wrote:
 On 2019/9/9 16:01, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
>> On 2019/9/9 15:30, Jaegeuk Kim wrote:
>>> On 09/09, Chao Yu wrote:
 On 2019/9/9 9:25, Jaegeuk Kim wrote:
> If committing atomic pages is failed when doing f2fs_do_sync_file(), 
> we can
> get commited pages but atomic_file being still set like:
>
> - inmem:0, atomic IO:4 (Max.   10), volatile IO:0 (Max.   
>  0)
>
> If GC selects this block, we can get an infinite loop like this:
>
> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
> oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, 
> sector = 18533696, size = 4096
> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground 
> GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, 
> pre_victim_secno = 4355, prefree = 0, free = 234
> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
> i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
> oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, 
> sector = 18533696, size = 4096
> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground 
> GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, 
> pre_victim_secno = 4355, prefree = 0, free = 234
> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
> i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
>
> In that moment, we can observe:
>
> [Before]
> Try to move 5084219 blocks (BG: 384508)
>   - data blocks : 4962373 (274483)
>   - node blocks : 121846 (110025)
> Skipped : atomic write 4534686 (10)
>
> [After]
> Try to move 5088973 blocks (BG: 384508)
>   - data blocks : 4967127 (274483)
>   - node blocks : 121846 (110025)
> Skipped : atomic write 4539440 (10)
>
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/file.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 7ae2f3bd8c2f..68b6da734e5f 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -1997,11 +1997,11 @@ static int 
> f2fs_ioc_commit_atomic_write(struct file *filp)
>   goto err_out;
>  
>   ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
> - if (!ret) {
> - clear_inode_flag(inode, FI_ATOMIC_FILE);
> - F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] 
> = 0;
> - stat_dec_atomic_write(inode);
> - }
> +
> + /* doesn't need to check error */
> + clear_inode_flag(inode, FI_ATOMIC_FILE);
> + F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
> + stat_dec_atomic_write(inode);

 If there are still valid atomic write pages linked in .inmem_pages, it 
 may cause
 memory leak when we just clear FI_ATOMIC_FILE flag.
>>>
>>> f2fs_commit_inmem_pages() should have flushed them.
>>
>> Oh, we failed to flush its nodes.
>>
>> However we won't clear such info if we failed to flush inmen pages, it 
>> looks
>> inconsistent.
>>
>> Any interface needed to drop inmem pages or clear ATOMIC_FILE flag in 
>> that two
>> error path? I'm not very clear how sqlite handle such error.
>
> f2fs_drop_inmem_pages() did that, but not in this case.

 What I mean is, for any error returned from atomic_commit() interface, 
 should
 userspace application handle it with consistent way, like trigger
 f2fs_drop_inmem_pages(), so we don't need to handle it inside 
 atomic_commit().
>>>
>>> f2fs_ioc_abort_volatile_write() will be triggered.
>>
>> If userspace can do this, we can get rid of this patch, or am I missing sth?
> 
> We don't know when that will come. And, other threads are waiting for GC here.

Yes; however, atomic_write sometimes won't even be called... that's why we
added handling logic in f2fs_gc().

> 
>>
>> - f2fs_ioc_abort_volatile_write
>>  - f2fs_drop_inmem_pages
>>   - clear_inode_flag(inode, FI_ATOMIC_FILE);
>>   - fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
>>   

Re: [PATCH] f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()

2019-09-09 Thread Chao Yu
On 2019/9/9 17:33, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
>> On 2019/9/9 16:37, Jaegeuk Kim wrote:
>>> On 09/09, Chao Yu wrote:
 On 2019/9/9 15:58, Chao Yu wrote:
> On 2019/9/9 15:44, Jaegeuk Kim wrote:
>> On 09/07, Chao Yu wrote:
>>> On 2019-9-7 7:48, Jaegeuk Kim wrote:
 On 09/06, Chao Yu wrote:
> If inode is newly created, inode page may not synchronize with inode 
> cache,
> so fields like .i_inline or .i_extra_isize could be wrong, in below 
> call
> path, we may access such wrong fields, result in failing to migrate 
> valid
> target block.

 If data is valid, how can we get new inode page?
>>
>> Let me rephrase the question. If inode is newly created, is this data 
>> block
>> really valid to move in GC?
>
> I guess it's valid, let double check that.

 We can see inode page:

 - f2fs_create
  - f2fs_add_link
   - f2fs_add_dentry
- f2fs_init_inode_metadata
 - f2fs_add_inline_entry
  - ipage = f2fs_new_inode_page
  - f2fs_put_page(ipage)   < after this
>>>
>>> Can you print out how many block was assigned to this inode?
>>
>> Add log like this:
>>
>>  if (!test_and_set_bit(segno, SIT_I(sbi)->invalid_segmap)) {
>>  if (is_inode) {
>>  for (i = 0; i < 923 - 50; i++) {
>>  __le32 *base = blkaddr_in_node(node);
>>  unsigned ofs = offset_in_addr(inode);
>>
>>  printk("i:%u, addr:%x\n", i,
>>  le32_to_cpu(*(base + i)));
>>  }
>>  printk("i_inline: %u\n", inode->i_inline);
>>  }
>>
>> It shows:
>> ...
>> i:10, addr:e66a
>> ...
>> i:46, addr:e66c
>> i:47, addr:e66d
>> i:48, addr:e66e
>> i:49, addr:e66f
>> i:50, addr:e670
>> i:51, addr:e671
>> i:52, addr:e672
>> i:53, addr:e673
>> i:54, addr:e674
>> i:55, addr:e675
>> i:56, addr:e676
>> ...
>> i:140, addr:2c35<--- we want to migrate this block, however, without 
>> correct
>> .i_inline and .i_extra_isize value, we can just find i_addr[i:140-6] = 
>> NULL_ADDR
> 
> So, the theory is the block is indeed valid and the address was updated before
> write_inode()?

I guess so. :)

Thanks,

> 
>> i:141, addr:2c38
>> i:142, addr:2c39
>> i:143, addr:2c3b
>> i:144, addr:2c3e
>> i:145, addr:2c40
>> i:146, addr:2c44
>> i:147, addr:2c48
>> i:148, addr:2c4a
>> i:149, addr:2c4c
>> i:150, addr:2c4f
>> i:151, addr:2c59
>> i:152, addr:2c5d
>> ...
>> i:188, addr:e677
>> i:189, addr:e678
>> i:190, addr:e679
>> i:191, addr:e67a
>> i:192, addr:e67b
>> i:193, addr:e67c
>> i:194, addr:e67d
>> i:195, addr:e67e
>> i:196, addr:e67f
>> i:197, addr:e680
>> i:198, addr:
>> i:199, addr:
>> i:200, addr:
>> i:201, addr:
>> i:202, addr:
>> i:203, addr:
>> i:204, addr:
>> i:205, addr:
>> i:206, addr:
>> i:207, addr:
>> i:208, addr:
>> i:209, addr:
>> i:210, addr:
>> i:211, addr:
>> i:212, addr:
>> i:213, addr:
>> i:214, addr:
>> i:215, addr:
>> i:216, addr:
>> i:217, addr:
>> i:218, addr:
>> i:219, addr:
>> i:220, addr:
>> i:221, addr:
>> i:222, addr:
>> i:223, addr:
>> i:224, addr:
>> i:225, addr:
>> i:226, addr:
>> i:227, addr:
>> i:228, addr:
>> i:229, addr:
>> i:230, addr:
>> i:231, addr:
>> i:232, addr:
>> i:233, addr:
>> i:234, addr:b032
>> i:235, addr:b033
>> i:236, addr:b034
>> i:237, addr:b035
>> i:238, addr:b036
>> i:239, addr:b038
>> ...
>> i:283, addr:e681
>> ...
>> i_inline: 0
>>
>> F2FS-fs (zram1): summary nid: 360, ofs: 134, ver: 0
>> F2FS-fs (zram1): blkaddr 2c35 (blkaddr in node 0) <-blkaddr in node is 
>> NULL_ADDR
>> F2FS-fs (zram1): expect: seg 14, ofs_in_seg: 53
>> F2FS-fs (zram1): real: seg 4294967295, ofs_in_seg: 0
>> F2FS-fs (zram1): ofs: 53, 0
>> F2FS-fs (zram1): node info ino:360, nid:360, nofs:0
>> F2FS-fs (zram1): ofs_in_addr: 0
>> F2FS-fs (zram1): end 
>>
>>>

>
>>
>>>
>>> is_alive()
>>> {
>>> ...
>>> node_page = f2fs_get_node_page(sbi, nid);  <--- inode page
>>
>> Aren't we seeing the below version warnings?
>>
>> if (sum->version != dni->version) {
>>  f2fs_warn(sbi, "%s: valid data with mismatched node version.",
>>__func__);
>> set_sbi_flag(sbi, SBI_NEED_FSCK);
>> }

 The version of summary and dni are all zero.
>>>
>>> Then, this node was allocated and removed without being flushed.
>>>

 summary nid: 613, ofs: 111, ver: 0
 blkaddr 2436 (blkaddr in node 0)
 expect: seg 

Re: [PATCH] f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()

2019-09-09 Thread Jaegeuk Kim
On 09/09, Chao Yu wrote:
> On 2019/9/9 16:37, Jaegeuk Kim wrote:
> > On 09/09, Chao Yu wrote:
> >> On 2019/9/9 15:58, Chao Yu wrote:
> >>> On 2019/9/9 15:44, Jaegeuk Kim wrote:
>  On 09/07, Chao Yu wrote:
> > On 2019-9-7 7:48, Jaegeuk Kim wrote:
> >> On 09/06, Chao Yu wrote:
> >>> If inode is newly created, inode page may not synchronize with inode 
> >>> cache,
> >>> so fields like .i_inline or .i_extra_isize could be wrong, in below 
> >>> call
> >>> path, we may access such wrong fields, result in failing to migrate 
> >>> valid
> >>> target block.
> >>
> >> If data is valid, how can we get new inode page?
> 
>  Let me rephrase the question. If inode is newly created, is this data 
>  block
>  really valid to move in GC?
> >>>
> >>> I guess it's valid, let double check that.
> >>
> >> We can see inode page:
> >>
> >> - f2fs_create
> >>  - f2fs_add_link
> >>   - f2fs_add_dentry
> >>- f2fs_init_inode_metadata
> >> - f2fs_add_inline_entry
> >>  - ipage = f2fs_new_inode_page
> >>  - f2fs_put_page(ipage)   < after this
> > 
> > Can you print out how many block was assigned to this inode?
> 
> Add log like this:
> 
>   if (!test_and_set_bit(segno, SIT_I(sbi)->invalid_segmap)) {
>   if (is_inode) {
>   for (i = 0; i < 923 - 50; i++) {
>   __le32 *base = blkaddr_in_node(node);
>   unsigned ofs = offset_in_addr(inode);
> 
>   printk("i:%u, addr:%x\n", i,
>   le32_to_cpu(*(base + i)));
>   }
>   printk("i_inline: %u\n", inode->i_inline);
>   }
> 
> It shows:
> ...
> i:10, addr:e66a
> ...
> i:46, addr:e66c
> i:47, addr:e66d
> i:48, addr:e66e
> i:49, addr:e66f
> i:50, addr:e670
> i:51, addr:e671
> i:52, addr:e672
> i:53, addr:e673
> i:54, addr:e674
> i:55, addr:e675
> i:56, addr:e676
> ...
> i:140, addr:2c35<--- we want to migrate this block, however, without 
> correct
> .i_inline and .i_extra_isize value, we can just find i_addr[i:140-6] = 
> NULL_ADDR

So, the theory is the block is indeed valid and the address was updated before
write_inode()?

> i:141, addr:2c38
> i:142, addr:2c39
> i:143, addr:2c3b
> i:144, addr:2c3e
> i:145, addr:2c40
> i:146, addr:2c44
> i:147, addr:2c48
> i:148, addr:2c4a
> i:149, addr:2c4c
> i:150, addr:2c4f
> i:151, addr:2c59
> i:152, addr:2c5d
> ...
> i:188, addr:e677
> i:189, addr:e678
> i:190, addr:e679
> i:191, addr:e67a
> i:192, addr:e67b
> i:193, addr:e67c
> i:194, addr:e67d
> i:195, addr:e67e
> i:196, addr:e67f
> i:197, addr:e680
> i:198, addr:
> i:199, addr:
> i:200, addr:
> i:201, addr:
> i:202, addr:
> i:203, addr:
> i:204, addr:
> i:205, addr:
> i:206, addr:
> i:207, addr:
> i:208, addr:
> i:209, addr:
> i:210, addr:
> i:211, addr:
> i:212, addr:
> i:213, addr:
> i:214, addr:
> i:215, addr:
> i:216, addr:
> i:217, addr:
> i:218, addr:
> i:219, addr:
> i:220, addr:
> i:221, addr:
> i:222, addr:
> i:223, addr:
> i:224, addr:
> i:225, addr:
> i:226, addr:
> i:227, addr:
> i:228, addr:
> i:229, addr:
> i:230, addr:
> i:231, addr:
> i:232, addr:
> i:233, addr:
> i:234, addr:b032
> i:235, addr:b033
> i:236, addr:b034
> i:237, addr:b035
> i:238, addr:b036
> i:239, addr:b038
> ...
> i:283, addr:e681
> ...
> i_inline: 0
> 
> F2FS-fs (zram1): summary nid: 360, ofs: 134, ver: 0
> F2FS-fs (zram1): blkaddr 2c35 (blkaddr in node 0) <-blkaddr in node is 
> NULL_ADDR
> F2FS-fs (zram1): expect: seg 14, ofs_in_seg: 53
> F2FS-fs (zram1): real: seg 4294967295, ofs_in_seg: 0
> F2FS-fs (zram1): ofs: 53, 0
> F2FS-fs (zram1): node info ino:360, nid:360, nofs:0
> F2FS-fs (zram1): ofs_in_addr: 0
> F2FS-fs (zram1): end 
> 
> > 
> >>
> >>>
> 
> >
> > is_alive()
> > {
> > ...
> > node_page = f2fs_get_node_page(sbi, nid);  <--- inode page
> 
>  Aren't we seeing the below version warnings?
> 
>  if (sum->version != dni->version) {
>   f2fs_warn(sbi, "%s: valid data with mismatched node version.",
> __func__);
>  set_sbi_flag(sbi, SBI_NEED_FSCK);
>  }
> >>
> >> The versions of summary and dni are both zero.
> > 
> > Then, this node was allocated and removed without being flushed.
> > 
> >>
> >> summary nid: 613, ofs: 111, ver: 0
> >> blkaddr 2436 (blkaddr in node 0)
> >> expect: seg 10, ofs_in_seg: 54
> >> real: seg 4294967295, ofs_in_seg: 0
> >> ofs: 54, 0
> >> node info ino:613, nid:613, nofs:0
> >> ofs_in_addr: 0
> >>
> >> Thanks,
> >>
> 
> 

Re: [PATCH] f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()

2019-09-09 Thread Chao Yu
On 2019/9/9 16:37, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
>> On 2019/9/9 15:58, Chao Yu wrote:
>>> On 2019/9/9 15:44, Jaegeuk Kim wrote:
 On 09/07, Chao Yu wrote:
> On 2019-9-7 7:48, Jaegeuk Kim wrote:
>> On 09/06, Chao Yu wrote:
>>> If the inode is newly created, its inode page may not be synchronized
>>> with the inode cache, so fields like .i_inline or .i_extra_isize could
>>> be wrong; in the call path below, we may access such wrong fields,
>>> resulting in failure to migrate a valid target block.
>>
>> If data is valid, how can we get new inode page?

 Let me rephrase the question. If inode is newly created, is this data block
 really valid to move in GC?
>>>
>>> I guess it's valid; let me double-check that.
>>
>> We can see inode page:
>>
>> - f2fs_create
>>  - f2fs_add_link
>>   - f2fs_add_dentry
>>- f2fs_init_inode_metadata
>> - f2fs_add_inline_entry
>>  - ipage = f2fs_new_inode_page
>>  - f2fs_put_page(ipage)   < after this
> 
> Can you print out how many blocks were assigned to this inode?

Add log like this:

if (!test_and_set_bit(segno, SIT_I(sbi)->invalid_segmap)) {
if (is_inode) {
for (i = 0; i < 923 - 50; i++) {
__le32 *base = blkaddr_in_node(node);
unsigned ofs = offset_in_addr(inode);

printk("i:%u, addr:%x\n", i,
le32_to_cpu(*(base + i)));
}
printk("i_inline: %u\n", inode->i_inline);
}

It shows:
...
i:10, addr:e66a
...
i:46, addr:e66c
i:47, addr:e66d
i:48, addr:e66e
i:49, addr:e66f
i:50, addr:e670
i:51, addr:e671
i:52, addr:e672
i:53, addr:e673
i:54, addr:e674
i:55, addr:e675
i:56, addr:e676
...
i:140, addr:2c35 <--- we want to migrate this block; however, without the
correct .i_inline and .i_extra_isize values, we instead look up
i_addr[140 - 6], which is NULL_ADDR
i:141, addr:2c38
i:142, addr:2c39
i:143, addr:2c3b
i:144, addr:2c3e
i:145, addr:2c40
i:146, addr:2c44
i:147, addr:2c48
i:148, addr:2c4a
i:149, addr:2c4c
i:150, addr:2c4f
i:151, addr:2c59
i:152, addr:2c5d
...
i:188, addr:e677
i:189, addr:e678
i:190, addr:e679
i:191, addr:e67a
i:192, addr:e67b
i:193, addr:e67c
i:194, addr:e67d
i:195, addr:e67e
i:196, addr:e67f
i:197, addr:e680
i:198, addr:
i:199, addr:
i:200, addr:
i:201, addr:
i:202, addr:
i:203, addr:
i:204, addr:
i:205, addr:
i:206, addr:
i:207, addr:
i:208, addr:
i:209, addr:
i:210, addr:
i:211, addr:
i:212, addr:
i:213, addr:
i:214, addr:
i:215, addr:
i:216, addr:
i:217, addr:
i:218, addr:
i:219, addr:
i:220, addr:
i:221, addr:
i:222, addr:
i:223, addr:
i:224, addr:
i:225, addr:
i:226, addr:
i:227, addr:
i:228, addr:
i:229, addr:
i:230, addr:
i:231, addr:
i:232, addr:
i:233, addr:
i:234, addr:b032
i:235, addr:b033
i:236, addr:b034
i:237, addr:b035
i:238, addr:b036
i:239, addr:b038
...
i:283, addr:e681
...
i_inline: 0

F2FS-fs (zram1): summary nid: 360, ofs: 134, ver: 0
F2FS-fs (zram1): blkaddr 2c35 (blkaddr in node 0) <-blkaddr in node is NULL_ADDR
F2FS-fs (zram1): expect: seg 14, ofs_in_seg: 53
F2FS-fs (zram1): real: seg 4294967295, ofs_in_seg: 0
F2FS-fs (zram1): ofs: 53, 0
F2FS-fs (zram1): node info ino:360, nid:360, nofs:0
F2FS-fs (zram1): ofs_in_addr: 0
F2FS-fs (zram1): end 

> 
>>
>>>

>
> is_alive()
> {
> ...
>   node_page = f2fs_get_node_page(sbi, nid);  <--- inode page

 Aren't we seeing the below version warnings?

 if (sum->version != dni->version) {
f2fs_warn(sbi, "%s: valid data with mismatched node version.",
__func__);
 set_sbi_flag(sbi, SBI_NEED_FSCK);
 }
>>
> >> The versions of summary and dni are both zero.
> 
> Then, this node was allocated and removed without being flushed.
> 
>>
>> summary nid: 613, ofs: 111, ver: 0
>> blkaddr 2436 (blkaddr in node 0)
>> expect: seg 10, ofs_in_seg: 54
>> real: seg 4294967295, ofs_in_seg: 0
>> ofs: 54, 0
>> node info ino:613, nid:613, nofs:0
>> ofs_in_addr: 0
>>
>> Thanks,
>>

>
>   source_blkaddr = datablock_addr(NULL, node_page, ofs_in_node);

 So, we're getting this? Does this incur infinite loop in GC?

 if (!test_and_set_bit(segno, SIT_I(sbi)->invalid_segmap)) {
f2fs_err(sbi, "mismatched blkaddr %u (source_blkaddr %u) in seg %u\n",
f2fs_bug_on(sbi, 1);
 }
>>>
>>> Yes, I only get this with generic/269, rather than "valid data with 
>>> mismatched
>>> node version.".
> 
> Was this block moved as valid? Either way, is_alive() returns false, no?

Re: [f2fs-dev] [PATCH 2/2] f2fs: avoid infinite GC loop due to stale atomic files

2019-09-09 Thread Jaegeuk Kim
On 09/09, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
> > On 2019/9/9 16:21, Jaegeuk Kim wrote:
> > > On 09/09, Chao Yu wrote:
> > >> On 2019/9/9 16:01, Jaegeuk Kim wrote:
> > >>> On 09/09, Chao Yu wrote:
> >  On 2019/9/9 15:30, Jaegeuk Kim wrote:
> > > On 09/09, Chao Yu wrote:
> > >> On 2019/9/9 9:25, Jaegeuk Kim wrote:
> > >>> If committing atomic pages fails in f2fs_do_sync_file(), we can end
> > >>> up with committed pages but the atomic_file flag still set, like:
> > >>>
> > >>> - inmem:0, atomic IO:4 (Max.   10), volatile IO:0 (Max. 
> > >>>0)
> > >>>
> > >>> If GC selects this block, we can get an infinite loop like this:
> > >>>
> > >>> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 
> > >>> 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type 
> > >>> = COLD_DATA
> > >>> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, 
> > >>> sector = 18533696, size = 4096
> > >>> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = 
> > >>> (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, 
> > >>> ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
> > >>> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
> > >>> i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
> > >>> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 
> > >>> 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type 
> > >>> = COLD_DATA
> > >>> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, 
> > >>> sector = 18533696, size = 4096
> > >>> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = 
> > >>> (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, 
> > >>> ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
> > >>> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
> > >>> i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
> > >>>
> > >>> In that moment, we can observe:
> > >>>
> > >>> [Before]
> > >>> Try to move 5084219 blocks (BG: 384508)
> > >>>   - data blocks : 4962373 (274483)
> > >>>   - node blocks : 121846 (110025)
> > >>> Skipped : atomic write 4534686 (10)
> > >>>
> > >>> [After]
> > >>> Try to move 5088973 blocks (BG: 384508)
> > >>>   - data blocks : 4967127 (274483)
> > >>>   - node blocks : 121846 (110025)
> > >>> Skipped : atomic write 4539440 (10)
> > >>>
> > >>> Signed-off-by: Jaegeuk Kim 
> > >>> ---
> > >>>  fs/f2fs/file.c | 10 +-
> > >>>  1 file changed, 5 insertions(+), 5 deletions(-)
> > >>>
> > >>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > >>> index 7ae2f3bd8c2f..68b6da734e5f 100644
> > >>> --- a/fs/f2fs/file.c
> > >>> +++ b/fs/f2fs/file.c
> > >>> @@ -1997,11 +1997,11 @@ static int 
> > >>> f2fs_ioc_commit_atomic_write(struct file *filp)
> > >>> goto err_out;
> > >>>  
> > >>> ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
> > >>> -   if (!ret) {
> > >>> -   clear_inode_flag(inode, FI_ATOMIC_FILE);
> > >>> -   F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] 
> > >>> = 0;
> > >>> -   stat_dec_atomic_write(inode);
> > >>> -   }
> > >>> +
> > >>> +   /* doesn't need to check error */
> > >>> +   clear_inode_flag(inode, FI_ATOMIC_FILE);
> > >>> +   F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
> > >>> +   stat_dec_atomic_write(inode);
> > >>
> > >> If there are still valid atomic write pages linked in .inmem_pages, 
> > >> it may cause
> > >> memory leak when we just clear FI_ATOMIC_FILE flag.
> > >
> > > f2fs_commit_inmem_pages() should have flushed them.
> > 
> >  Oh, we failed to flush its nodes.
> > 
> >  However, we won't clear such info if we failed to flush inmem pages,
> >  which looks inconsistent.
> > 
> >  Is any interface needed to drop inmem pages or clear the ATOMIC_FILE
> >  flag in those two error paths? I'm not very clear on how SQLite
> >  handles such errors.
> > >>>
> > >>> f2fs_drop_inmem_pages() did that, but not in this case.
> > >>
> > >> What I mean is: for any error returned from the atomic_commit()
> > >> interface, should the userspace application handle it in a consistent
> > >> way, e.g. by triggering f2fs_drop_inmem_pages(), so we don't need to
> > >> handle it inside atomic_commit()?
> > > 
> > > f2fs_ioc_abort_volatile_write() will be triggered.
> > 
> > If userspace can do this, we can get rid of this patch, or am I missing sth?
> 
> We don't know when that will come. And, other threads are waiting for GC here.
> 

Actually, we can call this.

---
 fs/f2fs/file.c | 6 +-
 1 file changed, 1 

Re: [f2fs-dev] [PATCH 2/2] f2fs: avoid infinite GC loop due to stale atomic files

2019-09-09 Thread Jaegeuk Kim
On 09/09, Chao Yu wrote:
> On 2019/9/9 16:21, Jaegeuk Kim wrote:
> > On 09/09, Chao Yu wrote:
> >> On 2019/9/9 16:01, Jaegeuk Kim wrote:
> >>> On 09/09, Chao Yu wrote:
>  On 2019/9/9 15:30, Jaegeuk Kim wrote:
> > On 09/09, Chao Yu wrote:
> >> On 2019/9/9 9:25, Jaegeuk Kim wrote:
> >>> If committing atomic pages fails in f2fs_do_sync_file(), we can end
> >>> up with committed pages but the atomic_file flag still set, like:
> >>>
> >>> - inmem:0, atomic IO:4 (Max.   10), volatile IO:0 (Max.   
> >>>  0)
> >>>
> >>> If GC selects this block, we can get an infinite loop like this:
> >>>
> >>> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
> >>> oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
> >>> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, 
> >>> sector = 18533696, size = 4096
> >>> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground 
> >>> GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, 
> >>> pre_victim_secno = 4355, prefree = 0, free = 234
> >>> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
> >>> i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
> >>> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
> >>> oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
> >>> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, 
> >>> sector = 18533696, size = 4096
> >>> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground 
> >>> GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, 
> >>> pre_victim_secno = 4355, prefree = 0, free = 234
> >>> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
> >>> i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
> >>>
> >>> In that moment, we can observe:
> >>>
> >>> [Before]
> >>> Try to move 5084219 blocks (BG: 384508)
> >>>   - data blocks : 4962373 (274483)
> >>>   - node blocks : 121846 (110025)
> >>> Skipped : atomic write 4534686 (10)
> >>>
> >>> [After]
> >>> Try to move 5088973 blocks (BG: 384508)
> >>>   - data blocks : 4967127 (274483)
> >>>   - node blocks : 121846 (110025)
> >>> Skipped : atomic write 4539440 (10)
> >>>
> >>> Signed-off-by: Jaegeuk Kim 
> >>> ---
> >>>  fs/f2fs/file.c | 10 +-
> >>>  1 file changed, 5 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> >>> index 7ae2f3bd8c2f..68b6da734e5f 100644
> >>> --- a/fs/f2fs/file.c
> >>> +++ b/fs/f2fs/file.c
> >>> @@ -1997,11 +1997,11 @@ static int 
> >>> f2fs_ioc_commit_atomic_write(struct file *filp)
> >>>   goto err_out;
> >>>  
> >>>   ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
> >>> - if (!ret) {
> >>> - clear_inode_flag(inode, FI_ATOMIC_FILE);
> >>> - F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] 
> >>> = 0;
> >>> - stat_dec_atomic_write(inode);
> >>> - }
> >>> +
> >>> + /* doesn't need to check error */
> >>> + clear_inode_flag(inode, FI_ATOMIC_FILE);
> >>> + F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
> >>> + stat_dec_atomic_write(inode);
> >>
> >> If there are still valid atomic write pages linked in .inmem_pages, it 
> >> may cause
> >> memory leak when we just clear FI_ATOMIC_FILE flag.
> >
> > f2fs_commit_inmem_pages() should have flushed them.
> 
>  Oh, we failed to flush its nodes.
> 
>  However, we won't clear such info if we failed to flush inmem pages,
>  which looks inconsistent.
> 
>  Is any interface needed to drop inmem pages or clear the ATOMIC_FILE
>  flag in those two error paths? I'm not very clear on how SQLite
>  handles such errors.
> >>>
> >>> f2fs_drop_inmem_pages() did that, but not in this case.
> >>
> >> What I mean is: for any error returned from the atomic_commit()
> >> interface, should the userspace application handle it in a consistent
> >> way, e.g. by triggering f2fs_drop_inmem_pages(), so we don't need to
> >> handle it inside atomic_commit()?
> > 
> > f2fs_ioc_abort_volatile_write() will be triggered.
> 
> If userspace can do this, we can get rid of this patch, or am I missing sth?

We don't know when that will come. And, other threads are waiting for GC here.

> 
> - f2fs_ioc_abort_volatile_write
>  - f2fs_drop_inmem_pages
>   - clear_inode_flag(inode, FI_ATOMIC_FILE);
>   - fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
>   - stat_dec_atomic_write(inode);
> 
> > 
> >>
> >>>
> 
>  Thanks,
> 
> >
> >>
> >> So my question is why below logic didn't handle such 

Re: [PATCH] f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()

2019-09-09 Thread Jaegeuk Kim
On 09/09, Chao Yu wrote:
> On 2019/9/9 15:58, Chao Yu wrote:
> > On 2019/9/9 15:44, Jaegeuk Kim wrote:
> >> On 09/07, Chao Yu wrote:
> >>> On 2019-9-7 7:48, Jaegeuk Kim wrote:
>  On 09/06, Chao Yu wrote:
> > If the inode is newly created, its inode page may not be synchronized
> > with the inode cache, so fields like .i_inline or .i_extra_isize could
> > be wrong; in the call path below, we may access such wrong fields,
> > resulting in failure to migrate a valid target block.
> 
>  If data is valid, how can we get new inode page?
> >>
> >> Let me rephrase the question. If inode is newly created, is this data block
> >> really valid to move in GC?
> > 
> > I guess it's valid; let me double-check that.
> 
> We can see inode page:
> 
> - f2fs_create
>  - f2fs_add_link
>   - f2fs_add_dentry
>- f2fs_init_inode_metadata
> - f2fs_add_inline_entry
>  - ipage = f2fs_new_inode_page
>  - f2fs_put_page(ipage)   < after this

Can you print out how many blocks were assigned to this inode?

> 
> > 
> >>
> >>>
> >>> is_alive()
> >>> {
> >>> ...
> >>>   node_page = f2fs_get_node_page(sbi, nid);  <--- inode page
> >>
> >> Aren't we seeing the below version warnings?
> >>
> >> if (sum->version != dni->version) {
> >>f2fs_warn(sbi, "%s: valid data with mismatched node version.",
> >>__func__);
> >> set_sbi_flag(sbi, SBI_NEED_FSCK);
> >> }
> 
> The versions of summary and dni are both zero.

Then, this node was allocated and removed without being flushed.

> 
> summary nid: 613, ofs: 111, ver: 0
> blkaddr 2436 (blkaddr in node 0)
> expect: seg 10, ofs_in_seg: 54
> real: seg 4294967295, ofs_in_seg: 0
> ofs: 54, 0
> node info ino:613, nid:613, nofs:0
> ofs_in_addr: 0
> 
> Thanks,
> 
> >>
> >>>
> >>>   source_blkaddr = datablock_addr(NULL, node_page, ofs_in_node);
> >>
> >> So, we're getting this? Does this incur infinite loop in GC?
> >>
> >> if (!test_and_set_bit(segno, SIT_I(sbi)->invalid_segmap)) {
> >>f2fs_err(sbi, "mismatched blkaddr %u (source_blkaddr %u) in seg %u\n",
> >>f2fs_bug_on(sbi, 1);
> >> }
> > 
> > Yes, I only get this with generic/269, rather than "valid data with 
> > mismatched
> > node version.".

Was this block moved as valid? Either way, is_alive() returns false, no?
How about checking i_blocks to detect whether the page is initialized in is_alive()?

> > 
> > With this patch, generic/269 won't panic again.
> > 
> > Thanks,
> > 
> >>
> >>> ...
> >>> }
> >>>
> >>> datablock_addr()
> >>> {
> >>> ...
> >>>   base = offset_in_addr(&raw_node->i);  <--- the base could be wrong
> >>> here due to accessing the uninitialized .i_inline of raw_node->i.
> >>> ...
> >>> }
> >>>
> >>> Thanks,
> >>>
> 
> >
> > - gc_data_segment
> >  - is_alive
> >   - datablock_addr
> >- offset_in_addr
> >
> > Fixes: 7a2af766af15 ("f2fs: enhance on-disk inode structure 
> > scalability")
> > Signed-off-by: Chao Yu 
> > ---
> >  fs/f2fs/dir.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> > index 765f13354d3f..b1840852967e 100644
> > --- a/fs/f2fs/dir.c
> > +++ b/fs/f2fs/dir.c
> > @@ -479,6 +479,9 @@ struct page *f2fs_init_inode_metadata(struct inode 
> > *inode, struct inode *dir,
> > if (IS_ERR(page))
> > return page;
> >  
> > +   /* synchronize inode page's data from inode cache */
> > +   f2fs_update_inode(inode, page);
> > +
> > if (S_ISDIR(inode->i_mode)) {
> > /* in order to handle error case */
> > get_page(page);
> > -- 
> > 2.18.0.rc1
> >> .
> >>


Re: [f2fs-dev] [PATCH 2/2] f2fs: avoid infinite GC loop due to stale atomic files

2019-09-09 Thread Chao Yu
On 2019/9/9 16:21, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
>> On 2019/9/9 16:01, Jaegeuk Kim wrote:
>>> On 09/09, Chao Yu wrote:
 On 2019/9/9 15:30, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
>> On 2019/9/9 9:25, Jaegeuk Kim wrote:
>>> If committing atomic pages fails in f2fs_do_sync_file(), we can end
>>> up with committed pages but the atomic_file flag still set, like:
>>>
>>> - inmem:0, atomic IO:4 (Max.   10), volatile IO:0 (Max.
>>> 0)
>>>
>>> If GC selects this block, we can get an infinite loop like this:
>>>
>>> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
>>> oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
>>> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector 
>>> = 18533696, size = 4096
>>> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground 
>>> GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, 
>>> pre_victim_secno = 4355, prefree = 0, free = 234
>>> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
>>> i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
>>> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
>>> oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
>>> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector 
>>> = 18533696, size = 4096
>>> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground 
>>> GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, 
>>> pre_victim_secno = 4355, prefree = 0, free = 234
>>> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
>>> i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
>>>
>>> In that moment, we can observe:
>>>
>>> [Before]
>>> Try to move 5084219 blocks (BG: 384508)
>>>   - data blocks : 4962373 (274483)
>>>   - node blocks : 121846 (110025)
>>> Skipped : atomic write 4534686 (10)
>>>
>>> [After]
>>> Try to move 5088973 blocks (BG: 384508)
>>>   - data blocks : 4967127 (274483)
>>>   - node blocks : 121846 (110025)
>>> Skipped : atomic write 4539440 (10)
>>>
>>> Signed-off-by: Jaegeuk Kim 
>>> ---
>>>  fs/f2fs/file.c | 10 +-
>>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>> index 7ae2f3bd8c2f..68b6da734e5f 100644
>>> --- a/fs/f2fs/file.c
>>> +++ b/fs/f2fs/file.c
>>> @@ -1997,11 +1997,11 @@ static int f2fs_ioc_commit_atomic_write(struct 
>>> file *filp)
>>> goto err_out;
>>>  
>>> ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
>>> -   if (!ret) {
>>> -   clear_inode_flag(inode, FI_ATOMIC_FILE);
>>> -   F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] 
>>> = 0;
>>> -   stat_dec_atomic_write(inode);
>>> -   }
>>> +
>>> +   /* doesn't need to check error */
>>> +   clear_inode_flag(inode, FI_ATOMIC_FILE);
>>> +   F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
>>> +   stat_dec_atomic_write(inode);
>>
>> If there are still valid atomic write pages linked in .inmem_pages, it 
>> may cause
>> memory leak when we just clear FI_ATOMIC_FILE flag.
>
> f2fs_commit_inmem_pages() should have flushed them.

 Oh, we failed to flush its nodes.

 However, we won't clear such info if we failed to flush inmem pages,
 which looks inconsistent.

 Is any interface needed to drop inmem pages or clear the ATOMIC_FILE
 flag in those two error paths? I'm not very clear on how SQLite
 handles such errors.
>>>
>>> f2fs_drop_inmem_pages() did that, but not in this case.
>>
>> What I mean is: for any error returned from the atomic_commit()
>> interface, should the userspace application handle it in a consistent
>> way, e.g. by triggering f2fs_drop_inmem_pages(), so we don't need to
>> handle it inside atomic_commit()?
> 
> f2fs_ioc_abort_volatile_write() will be triggered.

If userspace can do this, we can get rid of this patch, or am I missing sth?

- f2fs_ioc_abort_volatile_write
 - f2fs_drop_inmem_pages
  - clear_inode_flag(inode, FI_ATOMIC_FILE);
  - fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
  - stat_dec_atomic_write(inode);

> 
>>
>>>

 Thanks,

>
>>
>> So my question is why below logic didn't handle such condition well?
>>
>> f2fs_gc()
>>
>>  if (has_not_enough_free_secs(sbi, sec_freed, 0)) {
>>  if (skipped_round <= MAX_SKIP_GC_COUNT ||
>>  skipped_round * 2 < round) {
>>  segno = NULL_SEGNO;
>>  goto gc_more;
>>   

Re: [f2fs-dev] [PATCH 2/2] f2fs: avoid infinite GC loop due to stale atomic files

2019-09-09 Thread Jaegeuk Kim
On 09/09, Chao Yu wrote:
> On 2019/9/9 16:01, Jaegeuk Kim wrote:
> > On 09/09, Chao Yu wrote:
> >> On 2019/9/9 15:30, Jaegeuk Kim wrote:
> >>> On 09/09, Chao Yu wrote:
>  On 2019/9/9 9:25, Jaegeuk Kim wrote:
> > If committing atomic pages fails in f2fs_do_sync_file(), we can end
> > up with committed pages but the atomic_file flag still set, like:
> >
> > - inmem:0, atomic IO:4 (Max.   10), volatile IO:0 (Max.
> > 0)
> >
> > If GC selects this block, we can get an infinite loop like this:
> >
> > f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
> > oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
> > f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector 
> > = 18533696, size = 4096
> > f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground 
> > GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, 
> > pre_victim_secno = 4355, prefree = 0, free = 234
> > f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
> > i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
> > f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
> > oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
> > f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector 
> > = 18533696, size = 4096
> > f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground 
> > GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, 
> > pre_victim_secno = 4355, prefree = 0, free = 234
> > f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
> > i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
> >
> > In that moment, we can observe:
> >
> > [Before]
> > Try to move 5084219 blocks (BG: 384508)
> >   - data blocks : 4962373 (274483)
> >   - node blocks : 121846 (110025)
> > Skipped : atomic write 4534686 (10)
> >
> > [After]
> > Try to move 5088973 blocks (BG: 384508)
> >   - data blocks : 4967127 (274483)
> >   - node blocks : 121846 (110025)
> > Skipped : atomic write 4539440 (10)
> >
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  fs/f2fs/file.c | 10 +-
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index 7ae2f3bd8c2f..68b6da734e5f 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -1997,11 +1997,11 @@ static int f2fs_ioc_commit_atomic_write(struct 
> > file *filp)
> > goto err_out;
> >  
> > ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
> > -   if (!ret) {
> > -   clear_inode_flag(inode, FI_ATOMIC_FILE);
> > -   F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] 
> > = 0;
> > -   stat_dec_atomic_write(inode);
> > -   }
> > +
> > +   /* doesn't need to check error */
> > +   clear_inode_flag(inode, FI_ATOMIC_FILE);
> > +   F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
> > +   stat_dec_atomic_write(inode);
> 
>  If there are still valid atomic write pages linked in .inmem_pages, it 
>  may cause
>  memory leak when we just clear FI_ATOMIC_FILE flag.
> >>>
> >>> f2fs_commit_inmem_pages() should have flushed them.
> >>
> >> Oh, we failed to flush its nodes.
> >>
> >> However, we won't clear such info if we failed to flush inmem pages,
> >> which looks inconsistent.
> >>
> >> Is any interface needed to drop inmem pages or clear the ATOMIC_FILE
> >> flag in those two error paths? I'm not very clear on how SQLite
> >> handles such errors.
> > 
> > f2fs_drop_inmem_pages() did that, but not in this case.
> 
> What I mean is: for any error returned from the atomic_commit()
> interface, should the userspace application handle it in a consistent
> way, e.g. by triggering f2fs_drop_inmem_pages(), so we don't need to
> handle it inside atomic_commit()?

f2fs_ioc_abort_volatile_write() will be triggered.

> 
> > 
> >>
> >> Thanks,
> >>
> >>>
> 
>  So my question is why below logic didn't handle such condition well?
> 
>  f2fs_gc()
> 
>   if (has_not_enough_free_secs(sbi, sec_freed, 0)) {
>   if (skipped_round <= MAX_SKIP_GC_COUNT ||
>   skipped_round * 2 < round) {
>   segno = NULL_SEGNO;
>   goto gc_more;
>   }
> 
>   if (first_skipped < last_skipped &&
>   (last_skipped - first_skipped) >
>   sbi->skipped_gc_rwsem) {
>   f2fs_drop_inmem_pages_all(sbi, true);
> >>>
> >>> This is doing nothing, since 

Re: [PATCH] f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()

2019-09-09 Thread Chao Yu
On 2019/9/9 15:58, Chao Yu wrote:
> On 2019/9/9 15:44, Jaegeuk Kim wrote:
>> On 09/07, Chao Yu wrote:
>>> On 2019-9-7 7:48, Jaegeuk Kim wrote:
 On 09/06, Chao Yu wrote:
> If the inode is newly created, its inode page may not be synchronized
> with the inode cache, so fields like .i_inline or .i_extra_isize could
> be wrong; in the call path below, we may access such wrong fields,
> resulting in failure to migrate a valid target block.

 If data is valid, how can we get new inode page?
>>
>> Let me rephrase the question. If inode is newly created, is this data block
>> really valid to move in GC?
> 
> I guess it's valid; let me double-check that.

We can see inode page:

- f2fs_create
 - f2fs_add_link
  - f2fs_add_dentry
   - f2fs_init_inode_metadata
- f2fs_add_inline_entry
 - ipage = f2fs_new_inode_page
 - f2fs_put_page(ipage)   < after this

> 
>>
>>>
>>> is_alive()
>>> {
>>> ...
>>> node_page = f2fs_get_node_page(sbi, nid);  <--- inode page
>>
>> Aren't we seeing the below version warnings?
>>
>> if (sum->version != dni->version) {
>>  f2fs_warn(sbi, "%s: valid data with mismatched node version.",
>>__func__);
>> set_sbi_flag(sbi, SBI_NEED_FSCK);
>> }

The versions of summary and dni are both zero.

summary nid: 613, ofs: 111, ver: 0
blkaddr 2436 (blkaddr in node 0)
expect: seg 10, ofs_in_seg: 54
real: seg 4294967295, ofs_in_seg: 0
ofs: 54, 0
node info ino:613, nid:613, nofs:0
ofs_in_addr: 0

Thanks,

>>
>>>
>>> source_blkaddr = datablock_addr(NULL, node_page, ofs_in_node);
>>
>> So, we're getting this? Does this incur infinite loop in GC?
>>
>> if (!test_and_set_bit(segno, SIT_I(sbi)->invalid_segmap)) {
>>  f2fs_err(sbi, "mismatched blkaddr %u (source_blkaddr %u) in seg %u\n",
>>  f2fs_bug_on(sbi, 1);
>> }
> 
> Yes, I only get this with generic/269, rather than "valid data with mismatched
> node version.".
> 
> With this patch, generic/269 won't panic again.
> 
> Thanks,
> 
>>
>>> ...
>>> }
>>>
>>> datablock_addr()
>>> {
>>> ...
>>> base = offset_in_addr(&raw_node->i);  <--- the base could be wrong here
>>> due to accessing the uninitialized .i_inline of raw_node->i.
>>> ...
>>> }
>>>
>>> Thanks,
>>>

>
> - gc_data_segment
>  - is_alive
>   - datablock_addr
>- offset_in_addr
>
> Fixes: 7a2af766af15 ("f2fs: enhance on-disk inode structure scalability")
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/dir.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> index 765f13354d3f..b1840852967e 100644
> --- a/fs/f2fs/dir.c
> +++ b/fs/f2fs/dir.c
> @@ -479,6 +479,9 @@ struct page *f2fs_init_inode_metadata(struct inode 
> *inode, struct inode *dir,
>   if (IS_ERR(page))
>   return page;
>  
> + /* synchronize inode page's data from inode cache */
> + f2fs_update_inode(inode, page);
> +
>   if (S_ISDIR(inode->i_mode)) {
>   /* in order to handle error case */
>   get_page(page);
> -- 
> 2.18.0.rc1
>> .
>>


Re: [f2fs-dev] [PATCH 1/2] f2fs: do not select same victim right again

2019-09-09 Thread Jaegeuk Kim
On 09/09, Chao Yu wrote:
> On 2019/9/9 9:25, Jaegeuk Kim wrote:
> > GC must avoid selecting the same victim again.
> 
> Blocks in the previous victim will occupy an additional free segment; I
> suspect that after this change, FGGC may encounter out-of-free-space
> issues more frequently.

Hmm, actually this change seems wrong because of sec_usage_check().
Maybe we can avoid this only in the suspicious loop?

---
 fs/f2fs/gc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index e88f98ddf396..5877bd729689 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1326,7 +1326,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
round++;
}
 
-   if (gc_type == FG_GC)
+   if (gc_type == FG_GC && seg_freed)
sbi->cur_victim_sec = NULL_SEGNO;
 
if (sync)
-- 
2.19.0.605.g01d371f741-goog



Re: [f2fs-dev] [PATCH 2/2] f2fs: avoid infinite GC loop due to stale atomic files

2019-09-09 Thread Chao Yu
On 2019/9/9 16:01, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
>> On 2019/9/9 15:30, Jaegeuk Kim wrote:
>>> On 09/09, Chao Yu wrote:
 On 2019/9/9 9:25, Jaegeuk Kim wrote:
> If committing atomic pages fails in f2fs_do_sync_file(), we can end
> up with committed pages but the atomic_file flag still set, like:
>
> - inmem:0, atomic IO:4 (Max.   10), volatile IO:0 (Max.0)
>
> If GC selects this block, we can get an infinite loop like this:
>
> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
> oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 
> 18533696, size = 4096
> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, 
> LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, 
> pre_victim_secno = 4355, prefree = 0, free = 234
> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
> i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
> oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 
> 18533696, size = 4096
> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, 
> LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, 
> pre_victim_secno = 4355, prefree = 0, free = 234
> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
> i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
>
> In that moment, we can observe:
>
> [Before]
> Try to move 5084219 blocks (BG: 384508)
>   - data blocks : 4962373 (274483)
>   - node blocks : 121846 (110025)
> Skipped : atomic write 4534686 (10)
>
> [After]
> Try to move 5088973 blocks (BG: 384508)
>   - data blocks : 4967127 (274483)
>   - node blocks : 121846 (110025)
> Skipped : atomic write 4539440 (10)
>
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/file.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 7ae2f3bd8c2f..68b6da734e5f 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -1997,11 +1997,11 @@ static int f2fs_ioc_commit_atomic_write(struct 
> file *filp)
>   goto err_out;
>  
>   ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
> - if (!ret) {
> - clear_inode_flag(inode, FI_ATOMIC_FILE);
> - F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
> - stat_dec_atomic_write(inode);
> - }
> +
> + /* doesn't need to check error */
> + clear_inode_flag(inode, FI_ATOMIC_FILE);
> + F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
> + stat_dec_atomic_write(inode);

 If there are still valid atomic write pages linked in .inmem_pages, it may
 cause a memory leak when we just clear the FI_ATOMIC_FILE flag.
>>>
>>> f2fs_commit_inmem_pages() should have flushed them.
>>
>> Oh, we failed to flush its nodes.
>>
>> However, we won't clear such info if we fail to flush inmem pages; it looks
>> inconsistent.
>>
>> Is any interface needed to drop inmem pages or clear the ATOMIC_FILE flag in
>> those two error paths? I'm not very clear on how sqlite handles such errors.
> 
> f2fs_drop_inmem_pages() did that, but not in this case.

What I mean is: for any error returned from the atomic_commit() interface,
should the userspace application handle it in a consistent way, e.g. by
triggering f2fs_drop_inmem_pages(), so we don't need to handle it inside
atomic_commit()?

> 
>>
>> Thanks,
>>
>>>

 So my question is: why didn't the below logic handle such a condition well?

 f2fs_gc()

if (has_not_enough_free_secs(sbi, sec_freed, 0)) {
if (skipped_round <= MAX_SKIP_GC_COUNT ||
skipped_round * 2 < round) {
segno = NULL_SEGNO;
goto gc_more;
}

if (first_skipped < last_skipped &&
(last_skipped - first_skipped) >
sbi->skipped_gc_rwsem) {
f2fs_drop_inmem_pages_all(sbi, true);
>>>
>>> This is doing nothing, since f2fs_commit_inmem_pages() removed the inode
>>> from inmem list.
>>>
segno = NULL_SEGNO;
goto gc_more;
}
if (gc_type == FG_GC && !is_sbi_flag_set(sbi, SBI_CP_DISABLED))
ret = f2fs_write_checkpoint(sbi, );
}

>   } else {
>   ret = f2fs_do_sync_file(filp, 

Re: [f2fs-dev] [PATCH 2/2] f2fs: avoid infinite GC loop due to stale atomic files

2019-09-09 Thread Jaegeuk Kim
On 09/09, Chao Yu wrote:
> On 2019/9/9 15:30, Jaegeuk Kim wrote:
> > On 09/09, Chao Yu wrote:
> >> On 2019/9/9 9:25, Jaegeuk Kim wrote:
> >>> If committing atomic pages fails in f2fs_do_sync_file(), we can
> >>> get committed pages but atomic_file still set, like:
> >>>
> >>> - inmem:0, atomic IO:4 (Max.   10), volatile IO:0 (Max.0)
> >>>
> >>> If GC selects this block, we can get an infinite loop like this:
> >>>
> >>> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
> >>> oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
> >>> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 
> >>> 18533696, size = 4096
> >>> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, 
> >>> LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, 
> >>> pre_victim_secno = 4355, prefree = 0, free = 234
> >>> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
> >>> i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
> >>> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
> >>> oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
> >>> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 
> >>> 18533696, size = 4096
> >>> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, 
> >>> LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, 
> >>> pre_victim_secno = 4355, prefree = 0, free = 234
> >>> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, 
> >>> i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
> >>>
> >>> In that moment, we can observe:
> >>>
> >>> [Before]
> >>> Try to move 5084219 blocks (BG: 384508)
> >>>   - data blocks : 4962373 (274483)
> >>>   - node blocks : 121846 (110025)
> >>> Skipped : atomic write 4534686 (10)
> >>>
> >>> [After]
> >>> Try to move 5088973 blocks (BG: 384508)
> >>>   - data blocks : 4967127 (274483)
> >>>   - node blocks : 121846 (110025)
> >>> Skipped : atomic write 4539440 (10)
> >>>
> >>> Signed-off-by: Jaegeuk Kim 
> >>> ---
> >>>  fs/f2fs/file.c | 10 +-
> >>>  1 file changed, 5 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> >>> index 7ae2f3bd8c2f..68b6da734e5f 100644
> >>> --- a/fs/f2fs/file.c
> >>> +++ b/fs/f2fs/file.c
> >>> @@ -1997,11 +1997,11 @@ static int f2fs_ioc_commit_atomic_write(struct 
> >>> file *filp)
> >>>   goto err_out;
> >>>  
> >>>   ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
> >>> - if (!ret) {
> >>> - clear_inode_flag(inode, FI_ATOMIC_FILE);
> >>> - F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
> >>> - stat_dec_atomic_write(inode);
> >>> - }
> >>> +
> >>> + /* doesn't need to check error */
> >>> + clear_inode_flag(inode, FI_ATOMIC_FILE);
> >>> + F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
> >>> + stat_dec_atomic_write(inode);
> >>
> >> If there are still valid atomic write pages linked in .inmem_pages, it may
> >> cause a memory leak when we just clear the FI_ATOMIC_FILE flag.
> > 
> > f2fs_commit_inmem_pages() should have flushed them.
> 
> Oh, we failed to flush its nodes.
> 
> However, we won't clear such info if we fail to flush inmem pages; it looks
> inconsistent.
> 
> Is any interface needed to drop inmem pages or clear the ATOMIC_FILE flag in
> those two error paths? I'm not very clear on how sqlite handles such errors.

f2fs_drop_inmem_pages() did that, but not in this case.

> 
> Thanks,
> 
> > 
> >>
> >> So my question is: why didn't the below logic handle such a condition well?
> >>
> >> f2fs_gc()
> >>
> >>if (has_not_enough_free_secs(sbi, sec_freed, 0)) {
> >>if (skipped_round <= MAX_SKIP_GC_COUNT ||
> >>skipped_round * 2 < round) {
> >>segno = NULL_SEGNO;
> >>goto gc_more;
> >>}
> >>
> >>if (first_skipped < last_skipped &&
> >>(last_skipped - first_skipped) >
> >>sbi->skipped_gc_rwsem) {
> >>f2fs_drop_inmem_pages_all(sbi, true);
> > 
> > This is doing nothing, since f2fs_commit_inmem_pages() removed the inode
> > from inmem list.
> > 
> >>segno = NULL_SEGNO;
> >>goto gc_more;
> >>}
> >>if (gc_type == FG_GC && !is_sbi_flag_set(sbi, SBI_CP_DISABLED))
> >>ret = f2fs_write_checkpoint(sbi, );
> >>}
> >>
> >>>   } else {
> >>>   ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 1, false);
> >>>   }
> >>>
> > .
> > 


Re: [PATCH] f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()

2019-09-09 Thread Chao Yu
On 2019/9/9 15:44, Jaegeuk Kim wrote:
> On 09/07, Chao Yu wrote:
>> On 2019-9-7 7:48, Jaegeuk Kim wrote:
>>> On 09/06, Chao Yu wrote:
 If an inode is newly created, its inode page may not be synchronized with the
 inode cache, so fields like .i_inline or .i_extra_isize could be wrong; in the
 below call path, we may access such wrong fields, resulting in failure to
 migrate a valid target block.
>>>
>>> If data is valid, how can we get new inode page?
> 
> Let me rephrase the question. If inode is newly created, is this data block
> really valid to move in GC?

I guess it's valid; let me double check that.

> 
>>
>> is_alive()
>> {
>> ...
>>  node_page = f2fs_get_node_page(sbi, nid);  <--- inode page
> 
> Aren't we seeing the below version warnings?
> 
> if (sum->version != dni->version) {
>   f2fs_warn(sbi, "%s: valid data with mismatched node version.",
>__func__);
> set_sbi_flag(sbi, SBI_NEED_FSCK);
> }
> 
>>
>>  source_blkaddr = datablock_addr(NULL, node_page, ofs_in_node);
> 
> So, we're getting this? Does this incur an infinite loop in GC?
> 
> if (!test_and_set_bit(segno, SIT_I(sbi)->invalid_segmap)) {
>   f2fs_err(sbi, "mismatched blkaddr %u (source_blkaddr %u) in seg %u\n",
>   f2fs_bug_on(sbi, 1);
> }

Yes, I only get this with generic/269, rather than "valid data with mismatched
node version.".

With this patch, generic/269 won't panic again.

Thanks,

> 
>> ...
>> }
>>
>> datablock_addr()
>> {
>> ...
>>  base = offset_in_addr(&raw_node->i);  <--- the base could be wrong here
>> due to accessing uninitialized .i_inline of raw_node->i.
>> ...
>> }
>>
>> Thanks,
>>
>>>

 - gc_data_segment
  - is_alive
   - datablock_addr
- offset_in_addr

 Fixes: 7a2af766af15 ("f2fs: enhance on-disk inode structure scalability")
 Signed-off-by: Chao Yu 
 ---
  fs/f2fs/dir.c | 3 +++
  1 file changed, 3 insertions(+)

 diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
 index 765f13354d3f..b1840852967e 100644
 --- a/fs/f2fs/dir.c
 +++ b/fs/f2fs/dir.c
 @@ -479,6 +479,9 @@ struct page *f2fs_init_inode_metadata(struct inode 
 *inode, struct inode *dir,
if (IS_ERR(page))
return page;
  
 +  /* synchronize inode page's data from inode cache */
 +  f2fs_update_inode(inode, page);
 +
if (S_ISDIR(inode->i_mode)) {
/* in order to handle error case */
get_page(page);
 -- 
 2.18.0.rc1


Re: [f2fs-dev] [PATCH 2/2] f2fs: avoid infinite GC loop due to stale atomic files

2019-09-09 Thread Chao Yu
On 2019/9/9 15:30, Jaegeuk Kim wrote:
> On 09/09, Chao Yu wrote:
>> On 2019/9/9 9:25, Jaegeuk Kim wrote:
>>> If committing atomic pages fails in f2fs_do_sync_file(), we can
>>> get committed pages but atomic_file still set, like:
>>>
>>> - inmem:0, atomic IO:4 (Max.   10), volatile IO:0 (Max.0)
>>>
>>> If GC selects this block, we can get an infinite loop like this:
>>>
>>> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
>>> oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
>>> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 
>>> 18533696, size = 4096
>>> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, 
>>> LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno 
>>> = 4355, prefree = 0, free = 234
>>> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size 
>>> = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
>>> f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
>>> oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
>>> f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 
>>> 18533696, size = 4096
>>> f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, 
>>> LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno 
>>> = 4355, prefree = 0, free = 234
>>> f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size 
>>> = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
>>>
>>> In that moment, we can observe:
>>>
>>> [Before]
>>> Try to move 5084219 blocks (BG: 384508)
>>>   - data blocks : 4962373 (274483)
>>>   - node blocks : 121846 (110025)
>>> Skipped : atomic write 4534686 (10)
>>>
>>> [After]
>>> Try to move 5088973 blocks (BG: 384508)
>>>   - data blocks : 4967127 (274483)
>>>   - node blocks : 121846 (110025)
>>> Skipped : atomic write 4539440 (10)
>>>
>>> Signed-off-by: Jaegeuk Kim 
>>> ---
>>>  fs/f2fs/file.c | 10 +-
>>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>> index 7ae2f3bd8c2f..68b6da734e5f 100644
>>> --- a/fs/f2fs/file.c
>>> +++ b/fs/f2fs/file.c
>>> @@ -1997,11 +1997,11 @@ static int f2fs_ioc_commit_atomic_write(struct file 
>>> *filp)
>>> goto err_out;
>>>  
>>> ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
>>> -   if (!ret) {
>>> -   clear_inode_flag(inode, FI_ATOMIC_FILE);
>>> -   F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
>>> -   stat_dec_atomic_write(inode);
>>> -   }
>>> +
>>> +   /* doesn't need to check error */
>>> +   clear_inode_flag(inode, FI_ATOMIC_FILE);
>>> +   F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
>>> +   stat_dec_atomic_write(inode);
>>
>> If there are still valid atomic write pages linked in .inmem_pages, it may
>> cause a memory leak when we just clear the FI_ATOMIC_FILE flag.
> 
> f2fs_commit_inmem_pages() should have flushed them.

Oh, we failed to flush its nodes.

However, we won't clear such info if we fail to flush inmem pages; it looks
inconsistent.

Is any interface needed to drop inmem pages or clear the ATOMIC_FILE flag in
those two error paths? I'm not very clear on how sqlite handles such errors.

Thanks,

> 
>>
> >> So my question is: why didn't the below logic handle such a condition well?
>>
>> f2fs_gc()
>>
>>  if (has_not_enough_free_secs(sbi, sec_freed, 0)) {
>>  if (skipped_round <= MAX_SKIP_GC_COUNT ||
>>  skipped_round * 2 < round) {
>>  segno = NULL_SEGNO;
>>  goto gc_more;
>>  }
>>
>>  if (first_skipped < last_skipped &&
>>  (last_skipped - first_skipped) >
>>  sbi->skipped_gc_rwsem) {
>>  f2fs_drop_inmem_pages_all(sbi, true);
> 
> This is doing nothing, since f2fs_commit_inmem_pages() removed the inode
> from inmem list.
> 
>>  segno = NULL_SEGNO;
>>  goto gc_more;
>>  }
>>  if (gc_type == FG_GC && !is_sbi_flag_set(sbi, SBI_CP_DISABLED))
>>  ret = f2fs_write_checkpoint(sbi, );
>>  }
>>
>>> } else {
>>> ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 1, false);
>>> }
>>>


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [PATCH] f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()

2019-09-09 Thread Jaegeuk Kim
On 09/07, Chao Yu wrote:
> On 2019-9-7 7:48, Jaegeuk Kim wrote:
> > On 09/06, Chao Yu wrote:
> >> If an inode is newly created, its inode page may not be synchronized with
> >> the inode cache, so fields like .i_inline or .i_extra_isize could be wrong;
> >> in the below call path, we may access such wrong fields, resulting in
> >> failure to migrate a valid target block.
> > 
> > If data is valid, how can we get new inode page?

Let me rephrase the question. If inode is newly created, is this data block
really valid to move in GC?

> 
> is_alive()
> {
> ...
>   node_page = f2fs_get_node_page(sbi, nid);  <--- inode page

Aren't we seeing the below version warnings?

if (sum->version != dni->version) {
f2fs_warn(sbi, "%s: valid data with mismatched node version.",
   __func__);
set_sbi_flag(sbi, SBI_NEED_FSCK);
}

> 
>   source_blkaddr = datablock_addr(NULL, node_page, ofs_in_node);

So, we're getting this? Does this incur an infinite loop in GC?

if (!test_and_set_bit(segno, SIT_I(sbi)->invalid_segmap)) {
f2fs_err(sbi, "mismatched blkaddr %u (source_blkaddr %u) in seg %u\n",
f2fs_bug_on(sbi, 1);
}

> ...
> }
> 
> datablock_addr()
> {
> ...
>   base = offset_in_addr(&raw_node->i);  <--- the base could be wrong here
> due to accessing uninitialized .i_inline of raw_node->i.
> ...
> }
> 
> Thanks,
> 
> > 
> >>
> >> - gc_data_segment
> >>  - is_alive
> >>   - datablock_addr
> >>- offset_in_addr
> >>
> >> Fixes: 7a2af766af15 ("f2fs: enhance on-disk inode structure scalability")
> >> Signed-off-by: Chao Yu 
> >> ---
> >>  fs/f2fs/dir.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> >> index 765f13354d3f..b1840852967e 100644
> >> --- a/fs/f2fs/dir.c
> >> +++ b/fs/f2fs/dir.c
> >> @@ -479,6 +479,9 @@ struct page *f2fs_init_inode_metadata(struct inode 
> >> *inode, struct inode *dir,
> >>if (IS_ERR(page))
> >>return page;
> >>  
> >> +  /* synchronize inode page's data from inode cache */
> >> +  f2fs_update_inode(inode, page);
> >> +
> >>if (S_ISDIR(inode->i_mode)) {
> >>/* in order to handle error case */
> >>get_page(page);
> >> -- 
> >> 2.18.0.rc1


Re: [f2fs-dev] [PATCH 2/2] f2fs: avoid infinite GC loop due to stale atomic files

2019-09-09 Thread Jaegeuk Kim
On 09/09, Chao Yu wrote:
> On 2019/9/9 9:25, Jaegeuk Kim wrote:
> > If committing atomic pages fails in f2fs_do_sync_file(), we can
> > get committed pages but atomic_file still set, like:
> > 
> > - inmem:0, atomic IO:4 (Max.   10), volatile IO:0 (Max.0)
> > 
> > If GC selects this block, we can get an infinite loop like this:
> > 
> > f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
> > oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
> > f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 
> > 18533696, size = 4096
> > f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, 
> > LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno 
> > = 4355, prefree = 0, free = 234
> > f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size 
> > = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
> > f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, 
> > oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
> > f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 
> > 18533696, size = 4096
> > f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, 
> > LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno 
> > = 4355, prefree = 0, free = 234
> > f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size 
> > = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
> > 
> > In that moment, we can observe:
> > 
> > [Before]
> > Try to move 5084219 blocks (BG: 384508)
> >   - data blocks : 4962373 (274483)
> >   - node blocks : 121846 (110025)
> > Skipped : atomic write 4534686 (10)
> > 
> > [After]
> > Try to move 5088973 blocks (BG: 384508)
> >   - data blocks : 4967127 (274483)
> >   - node blocks : 121846 (110025)
> > Skipped : atomic write 4539440 (10)
> > 
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  fs/f2fs/file.c | 10 +-
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > index 7ae2f3bd8c2f..68b6da734e5f 100644
> > --- a/fs/f2fs/file.c
> > +++ b/fs/f2fs/file.c
> > @@ -1997,11 +1997,11 @@ static int f2fs_ioc_commit_atomic_write(struct file 
> > *filp)
> > goto err_out;
> >  
> > ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
> > -   if (!ret) {
> > -   clear_inode_flag(inode, FI_ATOMIC_FILE);
> > -   F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
> > -   stat_dec_atomic_write(inode);
> > -   }
> > +
> > +   /* doesn't need to check error */
> > +   clear_inode_flag(inode, FI_ATOMIC_FILE);
> > +   F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
> > +   stat_dec_atomic_write(inode);
> 
> If there are still valid atomic write pages linked in .inmem_pages, it may
> cause a memory leak when we just clear the FI_ATOMIC_FILE flag.

f2fs_commit_inmem_pages() should have flushed them.

> 
> So my question is: why didn't the below logic handle such a condition well?
> 
> f2fs_gc()
> 
>   if (has_not_enough_free_secs(sbi, sec_freed, 0)) {
>   if (skipped_round <= MAX_SKIP_GC_COUNT ||
>   skipped_round * 2 < round) {
>   segno = NULL_SEGNO;
>   goto gc_more;
>   }
> 
>   if (first_skipped < last_skipped &&
>   (last_skipped - first_skipped) >
>   sbi->skipped_gc_rwsem) {
>   f2fs_drop_inmem_pages_all(sbi, true);

This is doing nothing, since f2fs_commit_inmem_pages() removed the inode
from inmem list.

>   segno = NULL_SEGNO;
>   goto gc_more;
>   }
>   if (gc_type == FG_GC && !is_sbi_flag_set(sbi, SBI_CP_DISABLED))
>   ret = f2fs_write_checkpoint(sbi, );
>   }
> 
> > } else {
> > ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 1, false);
> > }
> >