RE: [f2fs-dev] [PATCH] f2fs: sepearte hot/cold in free nid

2018-04-19 Thread heyunlei


>-Original Message-
>From: Chao Yu [mailto:yuch...@huawei.com]
>Sent: Friday, April 20, 2018 9:53 AM
>To: jaeg...@kernel.org
>Cc: linux-kernel@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net
>Subject: [f2fs-dev] [PATCH] f2fs: sepearte hot/cold in free nid
>
>As most indirect node, dindirect node, and xattr node won't be updated
>after they are created, but inode node and other direct node will change
>more frequently, so store their nat entries mixedly in whole nat table
>will suffer:
>- fragment nat table soon due to different update rate
>- more nat block update due to fragmented nat table
>

BTW, should we enable this patch:  f2fs: reuse nids more aggressively?

I think it will decrease nat area fragment and will decrease io of nat?

>In order to solve above issue, we're trying to separate whole nat table to
>two part:
>a. Hot free nid area:
> - range: [nid #0, nid #x)
> - store node block address for
>   * inode node
>   * other direct node
>b. Cold free nid area:
> - range: [nid #x, max nid)
> - store node block address for
>   * indirect node
>   * dindirect node
>   * xattr node
>
>Allocation strategy example:
>
>Free nid: '-'
>Used nid: '='
>
>1. Initial status:
>Free Nids: 
>|---|
>   ^   ^   ^   
> ^
>Alloc Range:   |---|   
>|---|
>   hot_start   hot_end 
> cold_start  cold_end
>
>2. Free nids have ran out:
>Free Nids: 
>|===-===|
>   ^   ^   ^   
> ^
>Alloc Range:   |===|   
>|===|
>   hot_start   hot_end 
> cold_start  cold_end
>
>3. Expand hot/cold area range:
>Free Nids: 
>|===-===|
>   ^   ^   ^   
> ^
>Alloc Range:   |===|   
>|===|
>   hot_start   hot_end cold_start  
> cold_end
>
>4. Hot free nids have ran out:
>Free Nids: 
>|===-===|
>   ^   ^   ^   
> ^
>Alloc Range:   |===|   
>|===|
>   hot_start   hot_end cold_start  
> cold_end
>
>5. Expand hot area range, hot/cold area boundary has been fixed:
>Free Nids: 
>|===-===|
>   ^   ^   
> ^
>Alloc Range:   
>|===|===|
>   hot_start   hot_end(cold_start) 
> cold_end
>
>Run xfstests with generic/*:
>
>before
>node_write:169660
>cp_count:  60118
>node/cp2.82
>
>after:
>node_write:159145
>cp_count:  84501
>node/cp:   2.64
>
>Signed-off-by: Chao Yu 
>---
> fs/f2fs/checkpoint.c |   4 -
> fs/f2fs/debug.c  |   6 +-
> fs/f2fs/f2fs.h   |  19 +++-
> fs/f2fs/inode.c  |   2 +-
> fs/f2fs/namei.c  |   2 +-
> fs/f2fs/node.c   | 302 ---
> fs/f2fs/node.h   |  17 +--
> fs/f2fs/segment.c|   8 +-
> fs/f2fs/shrinker.c   |   3 +-
> fs/f2fs/xattr.c  |  10 +-
> 10 files changed, 221 insertions(+), 152 deletions(-)
>
>diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>index 96785ffc6181..c17feec72c74 100644
>--- a/fs/f2fs/checkpoint.c
>+++ b/fs/f2fs/checkpoint.c
>@@ -1029,14 +1029,10 @@ int f2fs_sync_inode_meta(struct f2fs_sb_info *sbi)
> static void __prepare_cp_block(struct f2fs_sb_info *sbi)
> {
>   struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
>-  struct f2fs_nm_info *nm_i = NM_I(sbi);
>-  nid_t last_nid = nm_i->next_scan_nid;
>
>-  next_free_nid(sbi, &last_nid);
>   ckpt->valid_block_count = cpu_to_le64(valid_user_blocks(sbi));
>   ckpt->valid_node_count = cpu_to_le32(valid_node_count(sbi));
>   ckpt->valid_inode_count = cpu_to_le32(valid_inode_count(sbi));
>-  ckpt->next_free_nid = cpu_to_le32(last_nid);
> }
>
> /*
>diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
>index 7bb036a3bb81..b13c1d4f110f 100644
>--- a/fs/f2fs/debug.c
>+++ b/fs/f2fs/debug.c
>@@ -100,7 +100,8 @@ static void update_general_status(struct f2fs_sb_info *sbi)
>   si->dirty_nats = NM_I(sbi)->dirty_nat_cnt;
>   si->sits = MAIN_SEGS(sbi);
>   si->dirty_sits = SIT

RE: [PATCH] f2fs: set_code_data in move_data_block

2018-02-10 Thread heyunlei


>-Original Message-
>From: Songyunlong (Euler)
>Sent: Sunday, February 11, 2018 11:35 AM
>To: jaeg...@kernel.org; c...@kernel.org; Yuchao (T); yunlong.s...@icloud.com
>Cc: miaoxie (A); Wangbintian; shengyong (A); heyunlei; 
>linux-f2fs-de...@lists.sourceforge.net; linux-kernel@vger.kernel.org
>Subject: Re: [PATCH] f2fs: set_code_data in move_data_block
>
>Ping...
>
>move_data_block misses set_cold_data, then the F2FS_WB_CP_DATA will
>lack these data pages in move_data_block, and write_checkpoint can
>not make sure this pages committed to the flash.
>

Here, we write encrypted_page back, which belong to meta mapping.

Thanks.


>On 2018/2/8 20:33, Yunlong Song wrote:
>> Signed-off-by: Yunlong Song 
>> ---
>>   fs/f2fs/gc.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>> index b9d93fd..2095630 100644
>> --- a/fs/f2fs/gc.c
>> +++ b/fs/f2fs/gc.c
>> @@ -692,6 +692,7 @@ static void move_data_block(struct inode *inode, block_t 
>> bidx,
>>  fio.op = REQ_OP_WRITE;
>>  fio.op_flags = REQ_SYNC;
>>  fio.new_blkaddr = newaddr;
>> +set_cold_data(fio.page);
>>  err = f2fs_submit_page_write(&fio);
>>  if (err) {
>>  if (PageWriteback(fio.encrypted_page))
>>
>
>--
>Thanks,
>Yunlong Song



Re: [f2fs-dev] [PATCH 2/2] f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE

2017-03-30 Thread heyunlei

Hi Jaegeuk,

Can we split in place update bios into single sbi->f2fs_bio_info for more page
merged in out place update? This case can be show as below:

in place update submit a bio with one page
out place update submit a bio with one page
in place update submit a bio with one page
out place update submit a bio with one page
... ...

just like WB_SYNC_ALL and WB_SYNC_NONE case.

Thanks.

On 2017/3/30 4:48, Jaegeuk Kim wrote:

If two threads try to flush dirty pages in different inodes respectively,
f2fs_write_data_pages() will produce WRITE and WRITE_SYNC one at a time,
resulting in a lot of 4KB seperated IOs.

So, this patch gives higher priority to WB_SYNC_ALL IOs and gathers write
IOs with a big WRITE_SYNC'ed bio.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c  | 15 +--
 fs/f2fs/f2fs.h  |  3 +++
 fs/f2fs/super.c |  2 ++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 8f36080b47c4..b1cac6d85bcb 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1605,8 +1605,10 @@ static int f2fs_write_cache_pages(struct address_space 
*mapping,
last_idx = page->index;
}

-   if (--wbc->nr_to_write <= 0 &&
-   wbc->sync_mode == WB_SYNC_NONE) {
+   /* give a priority to WB_SYNC threads */
+   if ((atomic_read(&F2FS_M_SB(mapping)->wb_sync_req) ||
+   --wbc->nr_to_write <= 0) &&
+   wbc->sync_mode == WB_SYNC_NONE) {
done = 1;
break;
}
@@ -1662,9 +1664,18 @@ static int f2fs_write_data_pages(struct address_space 
*mapping,

trace_f2fs_writepages(mapping->host, wbc, DATA);

+   /* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */
+   if (wbc->sync_mode == WB_SYNC_ALL)
+   atomic_inc(&sbi->wb_sync_req);
+   else if (atomic_read(&sbi->wb_sync_req))
+   goto skip_write;
+
blk_start_plug(&plug);
ret = f2fs_write_cache_pages(mapping, wbc);
blk_finish_plug(&plug);
+
+   if (wbc->sync_mode == WB_SYNC_ALL)
+   atomic_dec(&sbi->wb_sync_req);
/*
 * if some pages were truncated, we cannot guarantee its mapping->host
 * to detect pending bios.
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 32d6f674c114..fd39db681226 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -888,6 +888,9 @@ struct f2fs_sb_info {
/* # of allocated blocks */
struct percpu_counter alloc_valid_block_count;

+   /* writeback control */
+   atomic_t wb_sync_req;   /* count # of WB_SYNC threads */
+
/* valid inode count */
struct percpu_counter total_valid_inode_count;

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 2d78f3c76d18..cb65e6d0d275 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1566,6 +1566,8 @@ static void init_sb_info(struct f2fs_sb_info *sbi)
for (i = 0; i < NR_COUNT_TYPE; i++)
atomic_set(&sbi->nr_pages[i], 0);

+   atomic_set(&sbi->wb_sync_req, 0);
+
INIT_LIST_HEAD(&sbi->s_list);
mutex_init(&sbi->umount_mutex);
mutex_init(&sbi->wio_mutex[NODE]);





Re: [f2fs-dev] [PATCH 2/2] f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE

2017-03-30 Thread heyunlei

Hi Jaegeuk,

I try this patch and find it can fix below case:

   kworker/u16:3-423   [002]    183.812347: submit_bio: kworker/u16:3(423): 
WRITE block 104749352 on mmcblk0p50 (8 sectors)
 fio-2122  [003]    183.812380: submit_bio: fio(2122): WRITE 
block 104749360 on mmcblk0p50 (24 sectors)
   kworker/u16:3-423   [002]    183.812388: submit_bio: kworker/u16:3(423): 
WRITE block 104749384 on mmcblk0p50 (8 sectors)
 fio-2122  [003]    183.812403: submit_bio: fio(2122): WRITE 
block 104749392 on mmcblk0p50 (8 sectors)
   kworker/u16:3-423   [002]    183.812404: submit_bio: kworker/u16:3(423): 
WRITE block 104749400 on mmcblk0p50 (8 sectors)
 fio-2122  [003]    183.812427: submit_bio: fio(2122): WRITE 
block 104749408 on mmcblk0p50 (16 sectors)
   kworker/u16:3-423   [002]    183.812429: submit_bio: kworker/u16:3(423): 
WRITE block 104749424 on mmcblk0p50 (8 sectors)
 fio-2122  [003]    183.812450: submit_bio: fio(2122): WRITE 
block 104749432 on mmcblk0p50 (16 sectors)
   kworker/u16:3-423   [002]    183.812455: submit_bio: kworker/u16:3(423): 
WRITE block 104749448 on mmcblk0p50 (8 sectors)
 fio-2122  [003]    183.812470: submit_bio: fio(2122): WRITE 
block 104749456 on mmcblk0p50 (8 sectors)
   kworker/u16:3-423   [002]    183.812476: submit_bio: kworker/u16:3(423): 
WRITE block 104749464 on mmcblk0p50 (8 sectors)
 fio-2122  [003]    183.812492: submit_bio: fio(2122): WRITE 
block 104749472 on mmcblk0p50 (16 sectors)
   kworker/u16:3-423   [002]    183.812497: submit_bio: kworker/u16:3(423): 
WRITE block 104749488 on mmcblk0p50 (8 sectors)
 fio-2122  [003]    183.812512: submit_bio: fio(2122): WRITE 
block 104749496 on mmcblk0p50 (8 sectors)
   kworker/u16:3-423   [002]    183.812514: submit_bio: kworker/u16:3(423): 
WRITE block 104749504 on mmcblk0p50 (8 sectors)
 fio-2122  [003]    183.812532: submit_bio: fio(2122): WRITE 
block 104749512 on mmcblk0p50 (16 sectors)

   ... ...

Thanks.

On 2017/3/30 4:48, Jaegeuk Kim wrote:

If two threads try to flush dirty pages in different inodes respectively,
f2fs_write_data_pages() will produce WRITE and WRITE_SYNC one at a time,
resulting in a lot of 4KB seperated IOs.

So, this patch gives higher priority to WB_SYNC_ALL IOs and gathers write
IOs with a big WRITE_SYNC'ed bio.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c  | 15 +--
 fs/f2fs/f2fs.h  |  3 +++
 fs/f2fs/super.c |  2 ++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 8f36080b47c4..b1cac6d85bcb 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1605,8 +1605,10 @@ static int f2fs_write_cache_pages(struct address_space 
*mapping,
last_idx = page->index;
}

-   if (--wbc->nr_to_write <= 0 &&
-   wbc->sync_mode == WB_SYNC_NONE) {
+   /* give a priority to WB_SYNC threads */
+   if ((atomic_read(&F2FS_M_SB(mapping)->wb_sync_req) ||
+   --wbc->nr_to_write <= 0) &&
+   wbc->sync_mode == WB_SYNC_NONE) {
done = 1;
break;
}
@@ -1662,9 +1664,18 @@ static int f2fs_write_data_pages(struct address_space 
*mapping,

trace_f2fs_writepages(mapping->host, wbc, DATA);

+   /* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */
+   if (wbc->sync_mode == WB_SYNC_ALL)
+   atomic_inc(&sbi->wb_sync_req);
+   else if (atomic_read(&sbi->wb_sync_req))
+   goto skip_write;
+
blk_start_plug(&plug);
ret = f2fs_write_cache_pages(mapping, wbc);
blk_finish_plug(&plug);
+
+   if (wbc->sync_mode == WB_SYNC_ALL)
+   atomic_dec(&sbi->wb_sync_req);
/*
 * if some pages were truncated, we cannot guarantee its mapping->host
 * to detect pending bios.
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 32d6f674c114..fd39db681226 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -888,6 +888,9 @@ struct f2fs_sb_info {
/* # of allocated blocks */
struct percpu_counter alloc_valid_block_count;

+   /* writeback control */
+   atomic_t wb_sync_req;   /* count # of WB_SYNC threads */
+
/* valid inode count */
struct percpu_counter total_valid_inode_count;

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 2d78f3c76d18..cb65e6d0d275 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1566,6 +1566,8 @@ static void init_sb_info(struct f2fs_sb_info *sbi)
for (i = 0; i < NR_COUNT_TYPE; i++)
atomic_set(&sbi->nr_pages[i], 0);

+   atomic_set(&sbi->wb_sync_req, 0);
+
INIT_LIST_HEAD(&sbi->s_list);
mutex_init(&sb

Re: [f2fs-dev] [PATCH 1/2] f2fs: write small sized IO to hot log

2017-03-30 Thread heyunlei

Hi Jaegeuk,

On 2017/3/30 4:48, Jaegeuk Kim wrote:

It would better split small and large IOs separately in order to get more
consecutive big writes.

The default threshold is set to 64KB, but configurable by sysfs/min_hot_blocks.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c|  9 +
 fs/f2fs/f2fs.h|  2 ++
 fs/f2fs/segment.c | 13 ++---
 fs/f2fs/segment.h |  1 +
 fs/f2fs/super.c   |  2 ++
 5 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 090413236b27..8f36080b47c4 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1432,6 +1432,8 @@ static int __write_data_page(struct page *page, bool 
*submitted,
need_balance_fs = true;
else if (has_not_enough_free_secs(sbi, 0, 0))
goto redirty_out;
+   else
+   set_inode_flag(inode, FI_HOT_DATA);


Why here we need this, can you explain more about this?

Thanks.



err = -EAGAIN;
if (f2fs_has_inline_data(inode)) {
@@ -1457,6 +1459,7 @@ static int __write_data_page(struct page *page, bool 
*submitted,
if (wbc->for_reclaim) {
f2fs_submit_merged_bio_cond(sbi, inode, 0, page->index,
DATA, WRITE);
+   clear_inode_flag(inode, FI_HOT_DATA);
remove_dirty_inode(inode);
submitted = NULL;
}
@@ -1511,6 +1514,12 @@ static int f2fs_write_cache_pages(struct address_space 
*mapping,

pagevec_init(&pvec, 0);

+   if (get_dirty_pages(mapping->host) <=
+   SM_I(F2FS_M_SB(mapping))->min_hot_blocks)
+   set_inode_flag(mapping->host, FI_HOT_DATA);
+   else
+   clear_inode_flag(mapping->host, FI_HOT_DATA);
+
if (wbc->range_cyclic) {
writeback_index = mapping->writeback_index; /* prev offset */
index = writeback_index;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 5a49518ee786..32d6f674c114 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -678,6 +678,7 @@ struct f2fs_sm_info {
unsigned int ipu_policy;/* in-place-update policy */
unsigned int min_ipu_util;  /* in-place-update threshold */
unsigned int min_fsync_blocks;  /* threshold for fsync */
+   unsigned int min_hot_blocks;/* threshold for hot block allocation */

/* for flush command control */
struct flush_cmd_control *fcc_info;
@@ -1717,6 +1718,7 @@ enum {
FI_DO_DEFRAG,   /* indicate defragment is running */
FI_DIRTY_FILE,  /* indicate regular/symlink has dirty pages */
FI_NO_PREALLOC, /* indicate skipped preallocated blocks */
+   FI_HOT_DATA,/* indicate file is hot */
 };

 static inline void __mark_inode_dirty_flag(struct inode *inode,
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index b5b2a4745328..bff3f3bc7827 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1841,18 +1841,16 @@ static int __get_segment_type_6(struct page *page, enum 
page_type p_type)
if (p_type == DATA) {
struct inode *inode = page->mapping->host;

-   if (S_ISDIR(inode->i_mode))
-   return CURSEG_HOT_DATA;
-   else if (is_cold_data(page) || file_is_cold(inode))
+   if (is_cold_data(page) || file_is_cold(inode))
return CURSEG_COLD_DATA;
-   else
-   return CURSEG_WARM_DATA;
+   if (is_inode_flag_set(inode, FI_HOT_DATA))
+   return CURSEG_HOT_DATA;
+   return CURSEG_WARM_DATA;
} else {
if (IS_DNODE(page))
return is_cold_node(page) ? CURSEG_WARM_NODE :
CURSEG_HOT_NODE;
-   else
-   return CURSEG_COLD_NODE;
+   return CURSEG_COLD_NODE;
}
 }

@@ -2959,6 +2957,7 @@ int build_segment_manager(struct f2fs_sb_info *sbi)
sm_info->ipu_policy = 1 << F2FS_IPU_FSYNC;
sm_info->min_ipu_util = DEF_MIN_IPU_UTIL;
sm_info->min_fsync_blocks = DEF_MIN_FSYNC_BLOCKS;
+   sm_info->min_hot_blocks = DEF_MIN_HOT_BLOCKS;

sm_info->trim_sections = DEF_BATCHED_TRIM_SECTIONS;

diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 31846b0fcb95..57e36c1ce7bd 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -540,6 +540,7 @@ static inline int utilization(struct f2fs_sb_info *sbi)
  */
 #define DEF_MIN_IPU_UTIL   70
 #define DEF_MIN_FSYNC_BLOCKS   8
+#define DEF_MIN_HOT_BLOCKS 16

 enum {
F2FS_IPU_FORCE,
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index b4c5c6298698..2d78f3c76d18 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -296,6 +296,7 @@ F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, batched_trim_sections, 
trim_sections);
 F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, ipu_policy, ipu_policy);
 

Re: [f2fs-dev] [PATCH 1/5] f2fs: relax node version check for victim data in gc

2017-03-29 Thread heyunlei

Hi all,

On 2017/3/26 5:27, Jaegeuk Kim wrote:

On 03/25, Chao Yu wrote:

Hi Jaegeuk,

On 2017/3/25 15:59, Jaegeuk Kim wrote:

- has_not_enough_free_secs
node_secs: 0  dent_secs: 0  freed:0  free_segments:103  reserved:104

  - f2fs_gc
 - get_victim_by_default
alloc_mode 0, gc_mode 1, max_search 2672, offset 4654, ofs_unit 1

- do_garbage_collect
start_segno 3976, end_segno 3977   type 0

  - is_alive
nid 22797, blkaddr 2131882, ofs_in_node 0, version 0x8/0x0

   - gc_data_segment 766, segno 3976, block 512/426 not alive

So, this patch fixes subtle corrupted case where node version does not match
to summary version which results in infinite loop by gc.

Reported-by: Yunlei He 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/gc.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 939be88a8833..bbeee41aaf73 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -551,8 +551,10 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct 
f2fs_summary *sum,
get_node_info(sbi, nid, dni);

if (sum->version != dni->version) {


If the node was been truncated, we will increase its version number, since it
was been truncated, so it will never be writebacked to storage, so the version
in summary will not be updated.




The same problem I came across with a node segment:

481 get_node_info(sbi, nid, &ni);
482 if (ni.blk_addr != start_addr + off) {
483 f2fs_put_page(node_page, 1);
484 continue;
485 }

Here, get victim method always selected segno 5169 for garbage collection,

but this section gc failed for upper condition:

gc_node_segment 494, blk_addr 1697572,start_addr 2668544,off 200

I think is same problem with is_alive function.

Thanks.



That's covered by node page lock, so we shouldn't be reached out to this point.
Let's think more about this.

Thanks,


So this case can happen, shouldn't we just set SBI_NEED_FSCK for the case:
sum->version != dni->version - 1

Thanks,


-   f2fs_put_page(node_page, 1);
-   return false;
+   f2fs_msg(sbi->sb, KERN_WARNING,
+   "%s: valid data with mismatched node version.",
+   __func__);
+   set_sbi_flag(sbi, SBI_NEED_FSCK);
}

*nofs = ofs_of_node(node_page);



--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linux-f2fs-devel mailing list
linux-f2fs-de...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel

.





Re: [f2fs-dev] [PATCH 6/6] f2fs: show # of on-going flush and discard bios

2017-01-13 Thread heyunlei

Hi Jaegeuk,

On 2017/1/13 6:44, Jaegeuk Kim wrote:

This patch adds stat information for flush and discard commands.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/debug.c   | 7 +--
 fs/f2fs/f2fs.h| 3 ++-
 fs/f2fs/segment.c | 4 
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index f9f6b0aeba02..e67cab2fafac 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -54,6 +54,8 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->max_aw_cnt = atomic_read(&sbi->max_aw_cnt);
si->nr_wb_cp_data = get_pages(sbi, F2FS_WB_CP_DATA);
si->nr_wb_data = get_pages(sbi, F2FS_WB_DATA);
+   si->nr_flush = atomic_read(&SM_I(sbi)->fcc_info->submit_flush);
+   si->nr_discard = atomic_read(&SM_I(sbi)->dcc_info->submit_discard);
si->total_count = (int)sbi->user_block_count / sbi->blocks_per_seg;
si->rsvd_segs = reserved_segments(sbi);
si->overp_segs = overprovision_segments(sbi);
@@ -318,8 +320,9 @@ static int stat_show(struct seq_file *s, void *v)
seq_printf(s, "  - Inner Struct Count: tree: %d(%d), node: 
%d\n",
si->ext_tree, si->zombie_tree, si->ext_node);
seq_puts(s, "\nBalancing F2FS Async:\n");
-   seq_printf(s, "  - IO (CP: %4d, Data: %4d)\n",
-  si->nr_wb_cp_data, si->nr_wb_data);
+   seq_printf(s, "  - IO (CP: %4d, Data: %4d, Flush: %4d, Discard: 
%4d)\n",
+  si->nr_wb_cp_data, si->nr_wb_data,
+  si->nr_flush, si->nr_discard);
seq_printf(s, "  - inmem: %4d, atomic IO: %4d (Max. %4d)\n",
   si->inmem_pages, si->aw_cnt, si->max_aw_cnt);
seq_printf(s, "  - nodes: %4d in %4d\n",
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ef5e3709c161..d51bc18e7292 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -201,6 +201,7 @@ struct discard_cmd_control {
wait_queue_head_t discard_wait_queue;   /* waiting queue for wake-up */
struct mutex cmd_lock;
int max_discards;   /* max. discards to be issued */
+   atomic_t submit_discard;/* # of issued discard */
 };

 /* for the list of fsync inodes, used only during recovery */
@@ -2254,7 +2255,7 @@ struct f2fs_stat_info {
unsigned int ndirty_dirs, ndirty_files, ndirty_all;
int nats, dirty_nats, sits, dirty_sits, free_nids, alloc_nids;
int total_count, utilization;
-   int bg_gc, nr_wb_cp_data, nr_wb_data;
+   int bg_gc, nr_wb_cp_data, nr_wb_data, nr_flush, nr_discard;
int inline_xattr, inline_inode, inline_dir, orphans;
int aw_cnt, max_aw_cnt;
unsigned int valid_count, valid_node_count, valid_inode_count, 
discard_blks;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 74364006bfa6..5a2d380182d9 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -653,6 +653,8 @@ static void __remove_discard_cmd(struct f2fs_sb_info *sbi, 
struct discard_cmd *d
 {
int err = dc->bio->bi_error;

+   atomic_dec(&(SM_I(sbi)->dcc_info->submit_discard));

Here submit_discard decrease will include discard command without bio submit.

Thanks.

+
if (err == -EOPNOTSUPP)
err = 0;

@@ -678,6 +680,7 @@ void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, 
block_t blkaddr)
if (dc->state == D_PREP) {
dc->state = D_SUBMIT;
submit_bio(dc->bio);
+   atomic_inc(&dcc->submit_discard);
}
wait_for_completion_io(&dc->wait);

@@ -723,6 +726,7 @@ static int issue_discard_thread(void *data)
if (dc->state == D_PREP) {
dc->state = D_SUBMIT;
submit_bio(dc->bio);
+   atomic_inc(&dcc->submit_discard);
if (iter++ > DISCARD_ISSUE_RATE)
break;
} else if (dc->state == D_DONE) {





Re: [f2fs-dev] [PATCH 6/6] f2fs: show # of on-going flush and discard bios

2017-01-13 Thread heyunlei

Hi Jaegeuk Kim,

On 2017/1/13 6:44, Jaegeuk Kim wrote:

This patch adds stat information for flush and discard commands.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/debug.c   | 7 +--
 fs/f2fs/f2fs.h| 3 ++-
 fs/f2fs/segment.c | 4 
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index f9f6b0aeba02..e67cab2fafac 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -54,6 +54,8 @@ static void update_general_status(struct f2fs_sb_info *sbi)
si->max_aw_cnt = atomic_read(&sbi->max_aw_cnt);
si->nr_wb_cp_data = get_pages(sbi, F2FS_WB_CP_DATA);
si->nr_wb_data = get_pages(sbi, F2FS_WB_DATA);
+   si->nr_flush = atomic_read(&SM_I(sbi)->fcc_info->submit_flush);
+   si->nr_discard = atomic_read(&SM_I(sbi)->dcc_info->submit_discard);
si->total_count = (int)sbi->user_block_count / sbi->blocks_per_seg;
si->rsvd_segs = reserved_segments(sbi);
si->overp_segs = overprovision_segments(sbi);
@@ -318,8 +320,9 @@ static int stat_show(struct seq_file *s, void *v)
seq_printf(s, "  - Inner Struct Count: tree: %d(%d), node: 
%d\n",
si->ext_tree, si->zombie_tree, si->ext_node);
seq_puts(s, "\nBalancing F2FS Async:\n");
-   seq_printf(s, "  - IO (CP: %4d, Data: %4d)\n",
-  si->nr_wb_cp_data, si->nr_wb_data);
+   seq_printf(s, "  - IO (CP: %4d, Data: %4d, Flush: %4d, Discard: 
%4d)\n",
+  si->nr_wb_cp_data, si->nr_wb_data,
+  si->nr_flush, si->nr_discard);
seq_printf(s, "  - inmem: %4d, atomic IO: %4d (Max. %4d)\n",
   si->inmem_pages, si->aw_cnt, si->max_aw_cnt);
seq_printf(s, "  - nodes: %4d in %4d\n",
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index ef5e3709c161..d51bc18e7292 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -201,6 +201,7 @@ struct discard_cmd_control {
wait_queue_head_t discard_wait_queue;   /* waiting queue for wake-up */
struct mutex cmd_lock;
int max_discards;   /* max. discards to be issued */
+   atomic_t submit_discard;/* # of issued discard */
 };


Here, we need to initialize submit_discard.

Thanks.


 /* for the list of fsync inodes, used only during recovery */
@@ -2254,7 +2255,7 @@ struct f2fs_stat_info {
unsigned int ndirty_dirs, ndirty_files, ndirty_all;
int nats, dirty_nats, sits, dirty_sits, free_nids, alloc_nids;
int total_count, utilization;
-   int bg_gc, nr_wb_cp_data, nr_wb_data;
+   int bg_gc, nr_wb_cp_data, nr_wb_data, nr_flush, nr_discard;
int inline_xattr, inline_inode, inline_dir, orphans;
int aw_cnt, max_aw_cnt;
unsigned int valid_count, valid_node_count, valid_inode_count, 
discard_blks;
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 74364006bfa6..5a2d380182d9 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -653,6 +653,8 @@ static void __remove_discard_cmd(struct f2fs_sb_info *sbi, 
struct discard_cmd *d
 {
int err = dc->bio->bi_error;

+   atomic_dec(&(SM_I(sbi)->dcc_info->submit_discard));
+
if (err == -EOPNOTSUPP)
err = 0;

@@ -678,6 +680,7 @@ void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, 
block_t blkaddr)
if (dc->state == D_PREP) {
dc->state = D_SUBMIT;
submit_bio(dc->bio);
+   atomic_inc(&dcc->submit_discard);
}
wait_for_completion_io(&dc->wait);

@@ -723,6 +726,7 @@ static int issue_discard_thread(void *data)
if (dc->state == D_PREP) {
dc->state = D_SUBMIT;
submit_bio(dc->bio);
+   atomic_inc(&dcc->submit_discard);
if (iter++ > DISCARD_ISSUE_RATE)
break;
} else if (dc->state == D_DONE) {





Re: [f2fs-dev] [PATCH] f2fs: fix to account total free nid correctly

2016-11-17 Thread heyunlei



On 2016/11/15 4:45, Jaegeuk Kim wrote:

On Mon, Nov 14, 2016 at 07:24:56PM +0800, Chao Yu wrote:

Thread AThread BThread C
- f2fs_create
 - f2fs_new_inode
  - f2fs_lock_op
   - alloc_nid
alloc last nid
  - f2fs_unlock_op
- f2fs_create
 - f2fs_new_inode
  - f2fs_lock_op
   - alloc_nid
as node count still not
be increased, we will
loop in alloc_nid
- f2fs_write_node_pages
 - f2fs_balance_fs_bg
  - f2fs_sync_fs
   - write_checkpoint
- block_operations
 - f2fs_lock_all
 - f2fs_lock_op

While creating new inode, we do not allocate and account nid atomically,
so that when there is almost no free nids left, we may encounter deadloop
like above stack.

In order to avoid that, add nm_i::free_nid_cnt for accounting free nids
and do nid allocation atomically during node creation.


How about using nm_i::avaiable_nids for this?
It seems that we don't need both of variables at the same time.

Thanks,



Signed-off-by: Chao Yu 
---
 fs/f2fs/f2fs.h |  1 +
 fs/f2fs/node.c | 19 +++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 6de1fbf..9de6f20 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -551,6 +551,7 @@ struct f2fs_nm_info {
struct radix_tree_root free_nid_root;/* root of the free_nid cache */
struct list_head nid_list[MAX_NID_LIST];/* lists for free nids */
unsigned int nid_cnt[MAX_NID_LIST]; /* the number of free node id */
+   unsigned int free_nid_cnt;  /* the number of total free nid */
spinlock_t nid_list_lock;   /* protect nid lists ops */
struct mutex build_lock;/* lock for build free nids */

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index d58438f..e412d0e 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1885,11 +1885,13 @@ bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid)
return false;
}
 #endif
-   if (unlikely(sbi->total_valid_node_count + 1 > nm_i->available_nids))
-   return false;
-
spin_lock(&nm_i->nid_list_lock);

+   if (unlikely(nm_i->free_nid_cnt == 0)) {
+   spin_unlock(&nm_i->nid_list_lock);
+   return false;
+   }
+
/* We should not use stale free nids created by build_free_nids */
if (nm_i->nid_cnt[FREE_NID_LIST] && !on_build_free_nids(nm_i)) {
f2fs_bug_on(sbi, list_empty(&nm_i->nid_list[FREE_NID_LIST]));
@@ -1900,6 +1902,7 @@ bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid)
__remove_nid_from_list(sbi, i, FREE_NID_LIST, true);
i->state = NID_ALLOC;
__insert_nid_to_list(sbi, i, ALLOC_NID_LIST, false);
+   nm_i->free_nid_cnt--;
spin_unlock(&nm_i->nid_list_lock);
return true;
}
@@ -1951,6 +1954,9 @@ void alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid)
i->state = NID_NEW;
__insert_nid_to_list(sbi, i, FREE_NID_LIST, false);
}
+
+   nm_i->free_nid_cnt++;
+
spin_unlock(&nm_i->nid_list_lock);

if (need_free)
@@ -,8 +2228,12 @@ static void __flush_nat_entry_set(struct f2fs_sb_info 
*sbi,
raw_nat_from_node_info(raw_ne, &ne->ni);
nat_reset_flag(ne);
__clear_nat_cache_dirty(NM_I(sbi), ne);
-   if (nat_get_blkaddr(ne) == NULL_ADDR)
+   if (nat_get_blkaddr(ne) == NULL_ADDR) {
add_free_nid(sbi, nid, false);
+   spin_lock(&NM_I(sbi)->nid_list_lock);
+   NM_I(sbi)->free_nid_cnt++;

Hi all,
Here, we should consider clean NULL_ADDR nat entry in journal.

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index dcfab29..b22ecb0 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -158,6 +158,13 @@ static void __set_nat_cache_dirty(struct f2fs_nm_info 
*nm_i,
if (get_nat_flag(ne, IS_DIRTY))
return;

+   if (ne->ni.blk_addr == NULL_ADDR) {
+   spin_lock(&nm_i->free_nid_list_lock);
+   nm_i->available_nids--;
+   spin_unlock(&nm_i->free_nid_list_lock);
+   }
+
+

Thanks.


+   spin_unlock(&NM_I(sbi)->nid_list_lock);
+   }
}

if (to_journal)
@@ -2302,6 +2312,7 @@ static int init_node_manager(struct f2fs_sb_info *sbi)
nm_i->nid_cnt[FREE_NID_LIST] = 0;
nm_i->nid_cnt[ALLOC_NID_LIST] = 0;
nm_i->nat_cnt = 0;
+   nm

Re: [f2fs-dev] [PATCH] f2fs: fix to set superblock dirty correctly

2016-08-26 Thread heyunlei

Hi all,

On 2016/8/27 9:01, Jaegeuk Kim wrote:

On Fri, Aug 26, 2016 at 10:20:18PM +0800, Chao Yu wrote:

From: Chao Yu 

tests/generic/251 of fstest suit complains us with below message:

[ cut here ]
invalid opcode:  [#1] PREEMPT SMP
CPU: 2 PID: 7698 Comm: fstrim Tainted: G   O4.7.0+ #21
task: e9f4e000 task.stack: e7262000
EIP: 0060:[] EFLAGS: 00010202 CPU: 2
EIP is at write_checkpoint+0xfde/0x1020 [f2fs]
EAX: f33eb300 EBX: eecac310 ECX: 0001 EDX: 0001
ESI: eecac000 EDI: eecac5f0 EBP: e7263dec ESP: e7263d18
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: b76ab01c CR3: 2eb89de0 CR4: 000406f0
Stack:
 0001 a220fb7b e9f4e000 0002 419ff2d3 b3a05151 0002 e9f4e5d8
 e9f4e000 419ff2d3 b3a05151 eecac310 c10b8154 b3a05151 419ff2d3 c10b78bd
 e9f4e000 e9f4e000 e9f4e5d8 0001 e9f4e000 ec409000 eecac2cc eecac288
Call Trace:
 [] ? __lock_acquire+0x3c4/0x760
 [] ? mark_held_locks+0x5d/0x80
 [] f2fs_trim_fs+0x1c2/0x2e0 [f2fs]
 [] f2fs_ioctl+0x6b6/0x10b0 [f2fs]
 [] ? __this_cpu_preempt_check+0xf/0x20
 [] ? trace_hardirqs_off_caller+0x91/0x120
 [] ? __exchange_data_block+0xd30/0xd30 [f2fs]
 [] do_vfs_ioctl+0x81/0x7f0
 [] ? kmem_cache_free+0x245/0x2e0
 [] ? get_unused_fd_flags+0x40/0x40
 [] ? putname+0x4c/0x50
 [] ? do_sys_open+0x16e/0x1d0
 [] ? do_fast_syscall_32+0x30/0x1c0
 [] ? __this_cpu_preempt_check+0xf/0x20
 [] SyS_ioctl+0x58/0x80
 [] do_fast_syscall_32+0xa1/0x1c0
 [] sysenter_past_esp+0x45/0x74
EIP: [] write_checkpoint+0xfde/0x1020 [f2fs] SS:ESP 0068:e7263d18
---[ end trace 4de95d7e6b3aa7c6 ]---

The reason is: with below call stack, we will encounter BUG_ON during
doing fstrim.

Thread AThread B
- write_checkpoint
 - do_checkpoint
- f2fs_write_inode
 - update_inode_page
  - update_inode
   - set_page_dirty
- f2fs_set_node_page_dirty
 - inc_page_count
  - percpu_counter_inc
  - set_sbi_flag(SBI_IS_DIRTY)
  - clear_sbi_flag(SBI_IS_DIRTY)

Thread CThread D
- f2fs_write_node_page
 - set_node_addr
  - __set_nat_cache_dirty
   - nm_i->dirty_nat_cnt++
- do_vfs_ioctl
 - f2fs_ioctl
  - f2fs_trim_fs
   - write_checkpoint
- f2fs_bug_on(nm_i->dirty_nat_cnt)

Fix it by setting superblock dirty correctly in do_checkpoint and
f2fs_write_node_page.

Signed-off-by: Chao Yu 
---
 fs/f2fs/checkpoint.c | 4 
 fs/f2fs/node.c   | 1 +
 2 files changed, 5 insertions(+)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index cd0443d..68c723c 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1153,6 +1153,10 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
clear_prefree_segments(sbi, cpc);
clear_sbi_flag(sbi, SBI_IS_DIRTY);

+   /* redirty superblock if node page is updated by ->write_inode */
+   if (get_pages(sbi, F2FS_DIRTY_NODES))


Need to check F2FS_DIRTY_IMETA and F2FS_DIRTY_DENTS as well?
And, if we have this, I don't think we need to worry about f2fs_lock_op() for
update_inode_page() as well.

Thanks,

Maybe we can add this:

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 7c8e219..d32539a 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -267,6 +267,9 @@ void f2fs_submit_page_mbio(struct f2fs_io_info *fio)

down_write(&io->io_rwsem);

+   if (!is_read)
+   set_sbi_flag(sbi, SBI_IS_DIRTY);
+
if (io->bio && (io->last_block_in_bio != fio->new_blkaddr - 1 ||
(io->fio.op != fio->op || io->fio.op_flags != fio->op_flags)))
__submit_merged_bio(io);

This is deleted by this patch:

f2fs: use bio count instead of F2FS_WRITEBACK page count

which one is better?

Thanks.






+   set_sbi_flag(sbi, SBI_IS_DIRTY);
+
return 0;
 }

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 8a28800..365c6ff 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1597,6 +1597,7 @@ static int f2fs_write_node_page(struct page *page,
fio.old_blkaddr = ni.blk_addr;
write_node_page(nid, &fio);
set_node_addr(sbi, &ni, fio.new_blkaddr, is_fsync_dnode(page));
+   set_sbi_flag(sbi, SBI_IS_DIRTY);
dec_page_count(sbi, F2FS_DIRTY_NODES);
up_read(&sbi->node_write);

--
2.7.2


--
___
Linux-f2fs-devel mailing list
linux-f2fs-de...@lists.sourceforge.net
https://lists.sourcefor