[f2fs-dev] [PATCH] f2fs_io: fix output of do_read()
echo 1 > file
f2fs_io read 1 0 1 dio 4096 ./file
Read 0 bytes total_time = 17 us, print 4096 bytes:
: ffd537 ffc957 0500 0100
: 0200
: 0300
: ffc10f 0200

For a read that crosses EOF, do_read() did not copy the returned data to print_buf.

After:

f2fs_io read 1 0 1 dio 4096 ./file
pread expected: 4096, readed: 2
Read 2 bytes total_time = 177 us, print 4096 bytes:
: 310a

Signed-off-by: Chao Yu
---
 tools/f2fs_io/f2fs_io.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/f2fs_io/f2fs_io.c b/tools/f2fs_io/f2fs_io.c
index a7b593a..79b4d04 100644
--- a/tools/f2fs_io/f2fs_io.c
+++ b/tools/f2fs_io/f2fs_io.c
@@ -867,8 +867,15 @@ static void do_read(int argc, char **argv, const struct cmd_desc *cmd)
 	if (!do_mmap) {
 		for (i = 0; i < count; i++) {
 			ret = pread(fd, buf, buf_size, offset + buf_size * i);
-			if (ret != buf_size)
+			if (ret != buf_size) {
+				printf("pread expected: %"PRIu64", readed: %"PRIu64"\n",
+						buf_size, ret);
+				if (ret > 0) {
+					read_cnt += ret;
+					memcpy(print_buf, buf, print_bytes);
+				}
 				break;
+			}
 			read_cnt += ret;
 			if (i == 0)
--
2.40.1

___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
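The short-read case described in the patch above can be sketched in isolation. This is a minimal illustration, not code from f2fs_io: the helper name copy_read_result() is hypothetical, and unlike the patch (which copies print_bytes unconditionally) it clamps the copy to the number of bytes pread() actually returned and zeroes the rest.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of f2fs_io): after a pread() that returned
 * `ret` bytes into `buf`, copy at most `print_bytes` of the bytes actually
 * read into `print_buf` and zero the remainder, so stale buffer contents
 * are never printed. Returns the number of valid bytes copied.
 */
static size_t copy_read_result(char *print_buf, size_t print_bytes,
			       const char *buf, ssize_t ret)
{
	size_t valid = 0;

	assert(print_buf != NULL && buf != NULL);

	memset(print_buf, 0, print_bytes);
	if (ret > 0) {
		/* A short read (for example across EOF) yields ret < print_bytes. */
		valid = (size_t)ret < print_bytes ? (size_t)ret : print_bytes;
		memcpy(print_buf, buf, valid);
	}
	return valid;
}
```

A caller would invoke this once after the first pread() and then hex-dump print_buf; whether clamping is preferable to the patch's behaviour is a design choice left open here.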
[f2fs-dev] [PATCH] f2fs: fix to force buffered IO on inline_data inode
A DIO read from an inline_data inode returns all-zero data, because f2fs_iomap_begin() incorrectly assigns IOMAP_HOLE to iomap->type in this case.

We could let the iomap framework handle inline data by assigning iomap->type and iomap->inline_data correctly; however, handling the race between direct IO and buffered IO would then be somewhat complicated.

So, let's force buffered IO to fix this issue.

Cc: sta...@vger.kernel.org
Reported-by: Barry Song
Signed-off-by: Chao Yu
---
 fs/f2fs/file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index db6236f27852..e038910ad1e5 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -851,6 +851,8 @@ static bool f2fs_force_buffered_io(struct inode *inode, int rw)
 		return true;
 	if (f2fs_compressed_file(inode))
 		return true;
+	if (f2fs_has_inline_data(inode))
+		return true;
 
 	/* disallow direct IO if any of devices has unaligned blksize */
 	if (f2fs_is_multi_device(sbi) && !sbi->aligned_blksize)
--
2.40.1
[f2fs-dev] [PATCH 2/2] f2fs: fix to do sanity check on blocks for inline_data inode
An inode can be fuzzed so that it has the F2FS_INLINE_DATA flag set together with valid i_blocks/i_nid values; this patch adds an extra sanity check to detect such a corrupted state.

Signed-off-by: Chao Yu
---
 fs/f2fs/f2fs.h | 2 +-
 fs/f2fs/inline.c | 20 +++-
 fs/f2fs/inode.c | 2 +-
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 1974b6aff397..f463961b497c 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4149,7 +4149,7 @@ extern struct kmem_cache *f2fs_inode_entry_slab;
  * inline.c
  */
 bool f2fs_may_inline_data(struct inode *inode);
-bool f2fs_sanity_check_inline_data(struct inode *inode);
+bool f2fs_sanity_check_inline_data(struct inode *inode, struct page *ipage);
 bool f2fs_may_inline_dentry(struct inode *inode);
 void f2fs_do_read_inline_data(struct folio *folio, struct page *ipage);
 void f2fs_truncate_inline_inode(struct inode *inode,
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 7638d0d7b7ee..0203c3baabb6 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -33,11 +33,29 @@ bool f2fs_may_inline_data(struct inode *inode)
 	return !f2fs_post_read_required(inode);
 }
 
-bool f2fs_sanity_check_inline_data(struct inode *inode)
+static bool inode_has_blocks(struct inode *inode, struct page *ipage)
+{
+	struct f2fs_inode *ri = F2FS_INODE(ipage);
+	int i;
+
+	if (F2FS_HAS_BLOCKS(inode))
+		return true;
+
+	for (i = 0; i < DEF_NIDS_PER_INODE; i++) {
+		if (ri->i_nid[i])
+			return true;
+	}
+	return false;
+}
+
+bool f2fs_sanity_check_inline_data(struct inode *inode, struct page *ipage)
 {
 	if (!f2fs_has_inline_data(inode))
 		return false;
 
+	if (inode_has_blocks(inode, ipage))
+		return false;
+
 	if (!support_inline_data(inode))
 		return true;
 
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 791c06e159fd..4b39aebd3c70 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -344,7 +344,7 @@ static bool sanity_check_inode(struct inode *inode, struct page *node_page)
 		}
 	}
 
-	if (f2fs_sanity_check_inline_data(inode)) {
+	if (f2fs_sanity_check_inline_data(inode, node_page)) {
 		f2fs_warn(sbi, "%s: inode (ino=%lx, mode=%u) should not have inline_data, run fsck to fix",
 			  __func__, inode->i_ino, inode->i_mode);
 		return false;
--
2.40.1
[f2fs-dev] [PATCH 1/2] f2fs: fix to do sanity check on F2FS_INLINE_DATA flag in inode during GC
syzbot reports a f2fs bug as below:

[ cut here ]
kernel BUG at fs/f2fs/inline.c:258!
CPU: 1 PID: 34 Comm: kworker/u8:2 Not tainted 6.9.0-rc6-syzkaller-00012-g9e4bc4bcae01 #0
RIP: 0010:f2fs_write_inline_data+0x781/0x790 fs/f2fs/inline.c:258
Call Trace:
 f2fs_write_single_data_page+0xb65/0x1d60 fs/f2fs/data.c:2834
 f2fs_write_cache_pages fs/f2fs/data.c:3133 [inline]
 __f2fs_write_data_pages fs/f2fs/data.c:3288 [inline]
 f2fs_write_data_pages+0x1efe/0x3a90 fs/f2fs/data.c:3315
 do_writepages+0x35b/0x870 mm/page-writeback.c:2612
 __writeback_single_inode+0x165/0x10b0 fs/fs-writeback.c:1650
 writeback_sb_inodes+0x905/0x1260 fs/fs-writeback.c:1941
 wb_writeback+0x457/0xce0 fs/fs-writeback.c:2117
 wb_do_writeback fs/fs-writeback.c:2264 [inline]
 wb_workfn+0x410/0x1090 fs/fs-writeback.c:2304
 process_one_work kernel/workqueue.c:3254 [inline]
 process_scheduled_works+0xa12/0x17c0 kernel/workqueue.c:3335
 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
 kthread+0x2f2/0x390 kernel/kthread.c:388
 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

The root cause is: inline_data inode can be fuzzed, so that there may be valid blkaddr in its direct node, once f2fs triggers background GC to migrate the block, it will hit f2fs_bug_on() during dirty page writeback.

Let's add sanity check on F2FS_INLINE_DATA flag in inode during GC, so that, it can forbid migrating inline_data inode's data block for fixing.

Reported-by: syzbot+848062ba19c8782ca...@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/d103ce06174d7...@google.com
Signed-off-by: Chao Yu
---
 fs/f2fs/gc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 6066c6eecf41..20e2f989013b 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1563,6 +1563,16 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 				continue;
 			}
 
+			if (f2fs_has_inline_data(inode)) {
+				iput(inode);
+				set_sbi_flag(sbi, SBI_NEED_FSCK);
+				f2fs_err_ratelimited(sbi,
+					"inode %lx has both inline_data flag and "
+					"data block, nid=%u, ofs_in_node=%u",
+					inode->i_ino, dni.nid, ofs_in_node);
+				continue;
+			}
+
 			err = f2fs_gc_pinned_control(inode, gc_type, segno);
 			if (err == -EAGAIN) {
 				iput(inode);
--
2.40.1
[f2fs-dev] [PATCH] f2fs: fix to truncate preallocated blocks in f2fs_file_open()
chenyuwen reports a f2fs bug as below:

Unable to handle kernel NULL pointer dereference at virtual address 0011
 fscrypt_set_bio_crypt_ctx+0x78/0x1e8
 f2fs_grab_read_bio+0x78/0x208
 f2fs_submit_page_read+0x44/0x154
 f2fs_get_read_data_page+0x288/0x5f4
 f2fs_get_lock_data_page+0x60/0x190
 truncate_partial_data_page+0x108/0x4fc
 f2fs_do_truncate_blocks+0x344/0x5f0
 f2fs_truncate_blocks+0x6c/0x134
 f2fs_truncate+0xd8/0x200
 f2fs_iget+0x20c/0x5ac
 do_garbage_collect+0x5d0/0xf6c
 f2fs_gc+0x22c/0x6a4
 f2fs_disable_checkpoint+0xc8/0x310
 f2fs_fill_super+0x14bc/0x1764
 mount_bdev+0x1b4/0x21c
 f2fs_mount+0x20/0x30
 legacy_get_tree+0x50/0xbc
 vfs_get_tree+0x5c/0x1b0
 do_new_mount+0x298/0x4cc
 path_mount+0x33c/0x5fc
 __arm64_sys_mount+0xcc/0x15c
 invoke_syscall+0x60/0x150
 el0_svc_common+0xb8/0xf8
 do_el0_svc+0x28/0xa0
 el0_svc+0x24/0x84
 el0t_64_sync_handler+0x88/0xec

It is because inode.i_crypt_info is not initialized during below path:
- mount
 - f2fs_fill_super
  - f2fs_disable_checkpoint
   - f2fs_gc
    - f2fs_iget
     - f2fs_truncate

So, let's relocate truncation of preallocated blocks to f2fs_file_open(), after fscrypt_file_open().

Fixes: d4dd19ec1ea0 ("f2fs: do not expose unwritten blocks to user by DIO")
Reported-by: chenyuwen
Closes: https://lore.kernel.org/linux-kernel/20240517085327.1188515-1-yuwen.c...@xjmz.com
Signed-off-by: Chao Yu
---
 fs/f2fs/file.c | 28 +++-
 fs/f2fs/inode.c | 8
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index ef4cfb4436ef..058fcc83a2fc 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -554,6 +554,28 @@ static int f2fs_file_mmap(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 
+static int finish_preallocate_blocks(struct inode *inode)
+{
+	int ret;
+
+	if (is_sbi_flag_set(F2FS_I_SB(inode), SBI_POR_DOING))
+		return 0;
+
+	inode_lock(inode);
+	if (!file_should_truncate(inode)) {
+		inode_unlock(inode);
+		return 0;
+	}
+
+	ret = f2fs_truncate(inode);
+	inode_unlock(inode);
+	if (ret)
+		return ret;
+
+	file_dont_truncate(inode);
+	return 0;
+}
+
 static int f2fs_file_open(struct inode *inode, struct file *filp)
 {
 	int err = fscrypt_file_open(inode, filp);
@@ -571,7 +593,11 @@ static int f2fs_file_open(struct inode *inode, struct file *filp)
 	filp->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC;
 	filp->f_mode |= FMODE_CAN_ODIRECT;
 
-	return dquot_file_open(inode, filp);
+	err = dquot_file_open(inode, filp);
+	if (err)
+		return err;
+
+	return finish_preallocate_blocks(inode);
 }
 
 void f2fs_truncate_data_blocks_range(struct dnode_of_data *dn, int count)
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 005dde72aff3..791c06e159fd 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -610,14 +610,6 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
 	}
 	f2fs_set_inode_flags(inode);
 
-	if (file_should_truncate(inode) &&
-			!is_sbi_flag_set(sbi, SBI_POR_DOING)) {
-		ret = f2fs_truncate(inode);
-		if (ret)
-			goto bad_inode;
-		file_dont_truncate(inode);
-	}
-
 	unlock_new_inode(inode);
 	trace_f2fs_iget(inode);
 	return inode;
--
2.40.1
Re: [f2fs-dev] [PATCH] f2fs: fix to check return value of f2fs_allocate_new_section
On 2024/5/17 19:26, Zhiguo Niu wrote:

commit 245930617c9b ("f2fs: fix to handle error paths of {new,change}_curseg()") missed this allocated path, fix it.

Signed-off-by: Zhiguo Niu

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] Reply: [External Mail][PATCH] f2fs: fix panic in f2fs_put_super
On 2024/5/16 18:15, 孙士杰 wrote:

I didn't get it, if there is no cp_err, f2fs_write_checkpoint() in f2fs_put_super() will flush all dirty pages of node_inode, if there is cp_err, below flow will keep all dirty pages being truncated, and there is sanity check on all types of dirty pages.

===> I understand what you mean, so is it better to modify in this way? Please help to check, thank you

Hi, let's figure out the root cause first?

Thanks,

--
From: sunshijie
Sent: 2024-05-16 18:13:38
To: jaeg...@kernel.org; c...@kernel.org; linux-f2fs-devel@lists.sourceforge.net; linux-ker...@vger.kernel.org
Cc: 孙士杰
Subject: [External Mail][PATCH] f2fs: fix panic in f2fs_put_super

When thread A calls kill_f2fs_super(), it first executes sbi->node_inode = NULL. It may then submit a bio from iput(sbi->meta_inode) and enter the D (uninterruptible sleep) state. When that bio completes, f2fs_write_end_io() runs and may trigger a null-ptr-deref in NODE_MAPPING().

Thread A                                 IRQ context
- f2fs_put_super
- sbi->node_inode = NULL;
- iput(sbi->meta_inode);
- iput_final
- write_inode_now
- writeback_single_inode
- __writeback_single_inode
- filemap_fdatawait
- filemap_fdatawait_range
- __kcfi_typeid_free_transhuge_page
- __filemap_fdatawait_range
- wait_on_page_writeback
- folio_wait_writeback
- folio_wait_bit
- folio_wait_bit_common
- io_schedule
                                         - __handle_irq_event_percpu
                                         - ufs_qcom_mcq_esi_handler
                                         - ufshcd_mcq_poll_cqe_nolock
                                         - ufshcd_compl_one_cqe
                                         - scsi_done
                                         - scsi_done_internal
                                         - blk_mq_complete_request
                                         - scsi_complete
                                         - scsi_finish_command
                                         - scsi_io_completion
                                         - scsi_end_request
                                         - blk_update_request
                                         - bio_endio
                                         - f2fs_write_end_io
                                         - NODE_MAPPING(sbi)

Signed-off-by: sunshijie
---
 fs/f2fs/super.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index adffc9b80a9c..62d4f229f601 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1642,9 +1642,9 @@ static void f2fs_put_super(struct super_block *sb)
 	f2fs_destroy_compress_inode(sbi);
 
 	iput(sbi->node_inode);
-	sbi->node_inode = NULL;
-
 	iput(sbi->meta_inode);
+
+	sbi->node_inode = NULL;
 	sbi->meta_inode = NULL;
 
 	mutex_unlock(&sbi->umount_mutex);
--
2.34.1
Re: [f2fs-dev] [PATCH] f2fs: fix panic in f2fs_put_super
On 2024/5/16 16:55, sunshijie wrote:

When thread A calls kill_f2fs_super(), it first executes sbi->node_inode = NULL. It may then submit a bio from iput(sbi->meta_inode) and enter the D (uninterruptible sleep) state. When that bio completes, f2fs_write_end_io() runs and may trigger a null-ptr-deref in NODE_MAPPING().

I didn't get it. If there is no cp_err, f2fs_write_checkpoint() in f2fs_put_super() will flush all dirty pages of node_inode; if there is cp_err, the flow below ensures all dirty pages are truncated, and there is a sanity check on all types of dirty pages.

	/* our cp_error case, we can wait for any writeback page */
	f2fs_flush_merged_writes(sbi);

	f2fs_wait_on_all_pages(sbi, F2FS_WB_CP_DATA);

	if (err || f2fs_cp_error(sbi)) {
		truncate_inode_pages_final(NODE_MAPPING(sbi));
		truncate_inode_pages_final(META_MAPPING(sbi));
	}

	for (i = 0; i < NR_COUNT_TYPE; i++) {
		if (!get_pages(sbi, i))
			continue;
		f2fs_err(sbi, "detect filesystem reference count leak during "
			"umount, type: %d, count: %lld", i, get_pages(sbi, i));
		f2fs_bug_on(sbi, 1);
	}

So, is there any case in which a dirty page of node_inode is missed by f2fs_put_super()?

Thanks,

Thread A                                 IRQ context
- f2fs_put_super
- sbi->node_inode = NULL;
- iput(sbi->meta_inode);
- iput_final
- write_inode_now
- writeback_single_inode
- __writeback_single_inode
- filemap_fdatawait
- filemap_fdatawait_range
- __kcfi_typeid_free_transhuge_page
- __filemap_fdatawait_range
- wait_on_page_writeback
- folio_wait_writeback
- folio_wait_bit
- folio_wait_bit_common
- io_schedule
                                         - __handle_irq_event_percpu
                                         - ufs_qcom_mcq_esi_handler
                                         - ufshcd_mcq_poll_cqe_nolock
                                         - ufshcd_compl_one_cqe
                                         - scsi_done
                                         - scsi_done_internal
                                         - blk_mq_complete_request
                                         - scsi_complete
                                         - scsi_finish_command
                                         - scsi_io_completion
                                         - scsi_end_request
                                         - blk_update_request
                                         - bio_endio
                                         - f2fs_write_end_io
                                         - NODE_MAPPING(sbi)

Signed-off-by: sunshijie
---
 fs/f2fs/super.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index adffc9b80a9c..aeb085e11f9a 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1641,12 +1641,12 @@ static void f2fs_put_super(struct super_block *sb)
 
 	f2fs_destroy_compress_inode(sbi);
 
-	iput(sbi->node_inode);
-	sbi->node_inode = NULL;
-
 	iput(sbi->meta_inode);
 	sbi->meta_inode = NULL;
 
+	iput(sbi->node_inode);
+	sbi->node_inode = NULL;
+
 	mutex_unlock(&sbi->umount_mutex);
 
 	/*
Re: [f2fs-dev] [PATCH] f2fs:modify the entering condition for f2fs_migrate_blocks()
On 2024/5/15 16:24, Liao Yuanhong wrote:

Currently, when we allocate a swap file on zoned UFS, the file is created on conventional UFS. If the swap file size is not aligned with the zone size, the last extent enters f2fs_migrate_blocks(), resulting in significant additional I/O overhead and prolonged lock occupancy. In most cases this is unnecessary, because on conventional UFS, as long as the start block of the swap file is aligned with the zone, it is sequentially aligned. To circumvent this issue, we have altered the conditions for entering f2fs_migrate_blocks(). Now, if the start block of the last extent is aligned with the start of a zone, we avoid entering f2fs_migrate_blocks().

Hi,

Is it possible that we can pin swapfile, and fallocate on it aligned to zone size, then mkswap and swapon?

Thanks,

Signed-off-by: Liao Yuanhong
Signed-off-by: Wu Bo
---
 fs/f2fs/data.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 50ceb25b3..4d58fb6c2 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3925,10 +3925,12 @@ static int check_swap_activate(struct swap_info_struct *sis,
 	block_t pblock;
 	block_t lowest_pblock = -1;
 	block_t highest_pblock = 0;
+	block_t blk_start;
 	int nr_extents = 0;
 	unsigned int nr_pblocks;
 	unsigned int blks_per_sec = BLKS_PER_SEC(sbi);
 	unsigned int not_aligned = 0;
+	unsigned int cur_sec;
 	int ret = 0;
 
 	/*
@@ -3965,23 +3967,39 @@ static int check_swap_activate(struct swap_info_struct *sis,
 		pblock = map.m_pblk;
 		nr_pblocks = map.m_len;
 
-		if ((pblock - SM_I(sbi)->main_blkaddr) % blks_per_sec ||
+		blk_start = pblock - SM_I(sbi)->main_blkaddr;
+
+		if (blk_start % blks_per_sec ||
 				nr_pblocks % blks_per_sec ||
 				!f2fs_valid_pinned_area(sbi, pblock)) {
 			bool last_extent = false;
 
 			not_aligned++;
 
+			cur_sec = (blk_start + nr_pblocks) / BLKS_PER_SEC(sbi);
 			nr_pblocks = roundup(nr_pblocks, blks_per_sec);
-			if (cur_lblock + nr_pblocks > sis->max)
+			if (cur_lblock + nr_pblocks > sis->max) {
 				nr_pblocks -= blks_per_sec;
 
+				/* the start address is aligned to section */
+				if (!(blk_start % blks_per_sec))
+					last_extent = true;
+			}
+
 			/* this extent is last one */
 			if (!nr_pblocks) {
 				nr_pblocks = last_lblock - cur_lblock;
 				last_extent = true;
 			}
 
+			/*
+			 * the last extent which located on conventional UFS doesn't
+			 * need migrate
+			 */
+			if (last_extent && f2fs_sb_has_blkzoned(sbi) &&
+				cur_sec < GET_SEC_FROM_SEG(sbi, first_zoned_segno(sbi)))
+				goto next;
+
 			ret = f2fs_migrate_blocks(inode, cur_lblock,
 							nr_pblocks);
 			if (ret) {
@@ -3994,6 +4012,7 @@ static int check_swap_activate(struct swap_info_struct *sis,
 				goto retry;
 		}
 
+next:
 		if (cur_lblock + nr_pblocks >= sis->max)
 			nr_pblocks = sis->max - cur_lblock;
 
--
2.25.1
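The alignment arithmetic that decides whether an extent enters the migration path above can be worked through with a small userspace sketch. The names round_up_to() and extent_is_section_aligned() are hypothetical, the block_t typedef is a local stand-in, and the kernel's additional f2fs_valid_pinned_area() condition is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t block_t;	/* local stand-in for the kernel type */

/* Round n up to the next multiple of step (step must be non-zero). */
static uint32_t round_up_to(uint32_t n, uint32_t step)
{
	assert(step != 0);
	return ((n + step - 1) / step) * step;
}

/*
 * True when an extent starts on a section boundary (relative to the main
 * area) and covers a whole number of sections, i.e. neither modulo test
 * in the patch's condition fires.
 */
static bool extent_is_section_aligned(block_t pblock, block_t main_blkaddr,
				      uint32_t nr_pblocks,
				      uint32_t blks_per_sec)
{
	block_t blk_start = pblock - main_blkaddr;

	assert(blks_per_sec != 0);
	return blk_start % blks_per_sec == 0 &&
	       nr_pblocks % blks_per_sec == 0;
}
```

For example, with 512 blocks per section, an extent of 700 blocks is not aligned and would be rounded up to 1024 blocks before the comparison against the swap area's maximum.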
[f2fs-dev] [PATCH] f2fs: add support for FS_IOC_GETFSSYSFSPATH
The FS_IOC_GETFSSYSFSPATH ioctl expects the sysfs sub-path of a filesystem, in the format "$FSTYP/$SYSFS_IDENTIFIER" under /sys/fs; it helps to standardize the export of sysfs data across filesystems.

This patch wires up FS_IOC_GETFSSYSFSPATH for f2fs, it will output "f2fs/".

Signed-off-by: Chao Yu
---
 fs/f2fs/super.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index daf2c4dbe150..1f0f306cbcac 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4481,6 +4481,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_flags = (sb->s_flags & ~SB_POSIXACL) |
 		(test_opt(sbi, POSIX_ACL) ? SB_POSIXACL : 0);
 	super_set_uuid(sb, (void *) raw_super->uuid, sizeof(raw_super->uuid));
+	super_set_sysfs_name_bdev(sb);
 	sb->s_iflags |= SB_I_CGROUPWB;
 
 	/* init f2fs-specific super block info */
--
2.40.1
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
On 2024/5/15 12:42, Jaegeuk Kim wrote:
On 05/15, Chao Yu wrote:
On 2024/5/15 0:09, Jaegeuk Kim wrote:
On 05/10, Chao Yu wrote:

If lfs mode is on, buffered read may race w/ OPU dio write as below, it may cause buffered read hits unwritten data unexpectedly, and for dio read, the race condition exists as well.

Thread A                        Thread B
- f2fs_file_write_iter
 - f2fs_dio_write_iter
  - __iomap_dio_rw
   - f2fs_iomap_begin
    - f2fs_map_blocks
     - __allocate_data_block
      - allocated blkaddr #x
       - iomap_dio_submit_bio
                                - f2fs_file_read_iter
                                 - filemap_read
                                  - f2fs_read_data_folio
                                   - f2fs_mpage_readpages
                                    - f2fs_map_blocks
                                     : get blkaddr #x
                                    - f2fs_submit_read_bio
                                IRQ
                                - f2fs_read_end_io
                                 : read IO on blkaddr #x complete
IRQ
- iomap_dio_bio_end_io
 : direct write IO on blkaddr #x complete

This patch introduces a new per-inode i_opu_rwsem lock to avoid such race condition.

Wasn't this supposed to be managed by user-land?

Actually, the test case is:

1. mount w/ lfs mode
2. touch file;
3. initialize file w/ 4k zeroed data; fsync;
4. continue triggering dio write 4k zeroed data to file;
5. and meanwhile, continue triggering buf/dio 4k read in file, use md5sum to verify the 4k data;

It expects data is all zero, however it turned out it's not.

Can we check outstanding write bios instead of abusing locks?

I didn't figure out a way to solve this w/o lock, due to:
- write bios can be issued after the outstanding write bios check, which results in the race.
- once read() detects that there are outstanding write bios, we need to delay the read flow rather than fail it, right? It looks like using a lock is more proper here?

Any suggestion?

Thanks,

Thanks,

Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode")
Signed-off-by: Chao Yu
---
v2:
- fix to cover dio read path w/ i_opu_rwsem as well.
 fs/f2fs/f2fs.h | 1 +
 fs/f2fs/file.c | 28 ++--
 fs/f2fs/super.c | 1 +
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 30058e16a5d0..91cf4b3d6bc6 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -847,6 +847,7 @@ struct f2fs_inode_info {
 	/* avoid racing between foreground op and gc */
 	struct f2fs_rwsem i_gc_rwsem[2];
 	struct f2fs_rwsem i_xattr_sem; /* avoid racing between reading and changing EAs */
+	struct f2fs_rwsem i_opu_rwsem; /* avoid racing between buf read and opu dio write */
 
 	int i_extra_isize;		/* size of extra space located in i_addr */
 	kprojid_t i_projid;		/* id for project quota */
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 72ce1a522fb2..4ec260af321f 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4445,6 +4445,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	const loff_t pos = iocb->ki_pos;
 	const size_t count = iov_iter_count(to);
 	struct iomap_dio *dio;
+	bool do_opu = f2fs_lfs_mode(sbi);
 	ssize_t ret;
 
 	if (count == 0)
@@ -4457,8 +4458,14 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 			ret = -EAGAIN;
 			goto out;
 		}
+		if (do_opu && !f2fs_down_read_trylock(&fi->i_opu_rwsem)) {
+			f2fs_up_read(&fi->i_gc_rwsem[READ]);
+			ret = -EAGAIN;
+			goto out;
+		}
 	} else {
 		f2fs_down_read(&fi->i_gc_rwsem[READ]);
+		f2fs_down_read(&fi->i_opu_rwsem);
 	}
 
 	/*
@@ -4477,6 +4484,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		ret = iomap_dio_complete(dio);
 	}
 
+	f2fs_up_read(&fi->i_opu_rwsem);
 	f2fs_up_read(&fi->i_gc_rwsem[READ]);
 
 	file_accessed(file);
@@ -4523,7 +4531,13 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (f2fs_should_use_dio(inode, iocb, to)) {
 		ret = f2fs_dio_read_iter(iocb, to);
 	} else {
+		bool do_opu = f2fs_lfs_mode(F2FS_I_SB(inode));
+
+		if (do_opu)
+			f2fs_down_read(&F2FS_I(inode)->i_opu_rwsem);
 		ret = filemap_read(iocb, to, 0);
+		if (do_opu)
+			f2fs_up_read(&F2FS_I(inode)->i_opu_rwsem);
 		if (ret > 0)
 			f2fs_update_iostat(F2FS_I_SB(inode), inode,
 						APP_BUFFERED_READ_IO, ret);
@@ -4748,14 +4762,22 @@ static ssize_t f2fs_dio_write
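The reader/writer exclusion the v2 patch relies on, including the NOWAIT path that fails with -EAGAIN instead of blocking, can be sketched with a trylock-only lock built on C11 atomics. This is a userspace model under assumptions, not the kernel's f2fs_rwsem: struct mock_rwsem and every mock_* function are hypothetical names, and the blocking down_read/down_write variants are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* state: 0 = free, n > 0 = n readers hold the lock, -1 = one writer */
struct mock_rwsem {
	atomic_int state;
};

/* Take a read reference unless a writer currently holds the lock. */
static bool mock_down_read_trylock(struct mock_rwsem *sem)
{
	int cur = atomic_load(&sem->state);

	while (cur >= 0) {
		/* On failure, cur is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&sem->state, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Take the lock exclusively, only if nobody holds it at all. */
static bool mock_down_write_trylock(struct mock_rwsem *sem)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&sem->state, &expected, -1);
}

static void mock_up_read(struct mock_rwsem *sem)
{
	atomic_fetch_sub(&sem->state, 1);
}

static void mock_up_write(struct mock_rwsem *sem)
{
	atomic_store(&sem->state, 0);
}

/* Models a NOWAIT read: fail with -EAGAIN rather than wait for a writer. */
static int mock_nowait_read(struct mock_rwsem *opu_sem)
{
	if (!mock_down_read_trylock(opu_sem))
		return -EAGAIN;
	/* ... the read IO would be issued and completed here ... */
	mock_up_read(opu_sem);
	return 0;
}
```

In this model an out-of-place direct write would hold the lock exclusively from block allocation until IO completion, so a concurrent read either waits (blocking variant, not shown) or returns -EAGAIN.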
Re: [f2fs-dev] [PATCH 3/3] f2fs: fix to do sanity check on i_nid for inline_data inode
On 2024/5/15 12:39, Jaegeuk Kim wrote:
On 05/15, Chao Yu wrote:
On 2024/5/15 0:07, Jaegeuk Kim wrote:
On 05/11, Chao Yu wrote:
On 2024/5/11 8:38, Jaegeuk Kim wrote:
On 05/10, Chao Yu wrote:
On 2024/5/10 11:36, Jaegeuk Kim wrote:
On 05/10, Chao Yu wrote:
On 2024/5/9 23:52, Jaegeuk Kim wrote:
On 05/06, Chao Yu wrote:

syzbot reports a f2fs bug as below:

[ cut here ]
kernel BUG at fs/f2fs/inline.c:258!
CPU: 1 PID: 34 Comm: kworker/u8:2 Not tainted 6.9.0-rc6-syzkaller-00012-g9e4bc4bcae01 #0
RIP: 0010:f2fs_write_inline_data+0x781/0x790 fs/f2fs/inline.c:258
Call Trace:
 f2fs_write_single_data_page+0xb65/0x1d60 fs/f2fs/data.c:2834
 f2fs_write_cache_pages fs/f2fs/data.c:3133 [inline]
 __f2fs_write_data_pages fs/f2fs/data.c:3288 [inline]
 f2fs_write_data_pages+0x1efe/0x3a90 fs/f2fs/data.c:3315
 do_writepages+0x35b/0x870 mm/page-writeback.c:2612
 __writeback_single_inode+0x165/0x10b0 fs/fs-writeback.c:1650
 writeback_sb_inodes+0x905/0x1260 fs/fs-writeback.c:1941
 wb_writeback+0x457/0xce0 fs/fs-writeback.c:2117
 wb_do_writeback fs/fs-writeback.c:2264 [inline]
 wb_workfn+0x410/0x1090 fs/fs-writeback.c:2304
 process_one_work kernel/workqueue.c:3254 [inline]
 process_scheduled_works+0xa12/0x17c0 kernel/workqueue.c:3335
 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
 kthread+0x2f2/0x390 kernel/kthread.c:388
 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

The root cause is: inline_data inode can be fuzzed, so that there may be valid blkaddr in its direct node, once f2fs triggers background GC to migrate the block, it will hit f2fs_bug_on() during dirty page writeback.

Let's add sanity check on i_nid field for inline_data inode, meanwhile, forbid to migrate inline_data inode's data block to fix this issue.
Reported-by: syzbot+848062ba19c8782ca...@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/d103ce06174d7...@google.com
Signed-off-by: Chao Yu
---
 fs/f2fs/f2fs.h | 2 +-
 fs/f2fs/gc.c | 6 ++
 fs/f2fs/inline.c | 17 -
 fs/f2fs/inode.c | 2 +-
 4 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index fced2b7652f4..c876813b5532 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4146,7 +4146,7 @@ extern struct kmem_cache *f2fs_inode_entry_slab;
  * inline.c
  */
 bool f2fs_may_inline_data(struct inode *inode);
-bool f2fs_sanity_check_inline_data(struct inode *inode);
+bool f2fs_sanity_check_inline_data(struct inode *inode, struct page *ipage);
 bool f2fs_may_inline_dentry(struct inode *inode);
 void f2fs_do_read_inline_data(struct page *page, struct page *ipage);
 void f2fs_truncate_inline_inode(struct inode *inode,
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index e86c7f01539a..041957750478 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1563,6 +1563,12 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 				continue;
 			}
 
+			if (f2fs_has_inline_data(inode)) {
+				iput(inode);
+				set_sbi_flag(sbi, SBI_NEED_FSCK);
+				continue;

Any race condition to get this as false alarm?

Since there is no reproducer for the bug, I suspect it was caused by metadata fuzzing, something like this:

- inline inode has one valid blkaddr in i_addr or in dnode reference by i_nid;
- SIT/SSA entry of the block is valid;
- background GC migrates the block;
- kworker writeback it, and trigger the bug_on().

Wasn't detected by sanity_check_inode?

I fuzzed non-inline inode w/ below metadata fields:
- i_blocks = 1
- i_size = 2048
- i_inline |= 0x02

sanity_check_inode() doesn't complain.

I mean, the below sanity_check_inode() can cover the fuzzed case? I'm wondering

I didn't figure out a generic way in sanity_check_inode() to catch all fuzzed cases.
The patch described:
"The root cause is: inline_data inode can be fuzzed, so that there may be valid blkaddr in its direct node, once f2fs triggers background GC to migrate the block, it will hit f2fs_bug_on() during dirty page writeback."

Do you suspect the node block address was suddenly assigned after f2fs_iget()?

No, I suspect that the image was fuzzed by tools offline, not in runtime after mount().

Otherwise, it looks checking them in sanity_check_inode would be enough.

e.g.
case #1
- blkaddr, its dnode, SSA and SIT are consistent
- dnode.footer.ino points to inline inode
- inline inode doesn't link to the dnode

Something like fuzzed special file, please check details in below commit:

9056d6489f5a ("f2fs: fix to do sanity check on inode type during garbage collection")

case #2
- blkaddr, its dnode, SSA an
Re: [f2fs-dev] [PATCH] f2fs: Add inline to f2fs_build_fault_attr() stub
On 2024/5/13 23:40, Nathan Chancellor wrote:

When building without CONFIG_F2FS_FAULT_INJECTION, there is a warning from each file that includes f2fs.h because the stub for f2fs_build_fault_attr() is missing inline:

 In file included from fs/f2fs/segment.c:21:
 fs/f2fs/f2fs.h:4605:12: warning: 'f2fs_build_fault_attr' defined but not used [-Wunused-function]
  4605 | static int f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned long rate,
       |^

Add the missing inline to resolve all of the warnings for this configuration.

Fixes: 4ed886b187f4 ("f2fs: check validation of fault attrs in f2fs_build_fault_attr()")
Signed-off-by: Nathan Chancellor

Reviewed-by: Chao Yu

Thanks,
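The warning described above comes from a header-defined `static` function that some including files never call; marking the stub `static inline` is the usual way to keep the compiler quiet. A minimal sketch of the pattern, with a hypothetical configuration macro MOCK_FAULT_INJECTION and a hypothetical function name standing in for the real ones:

```c
#include <assert.h>

/* Hypothetical switch standing in for CONFIG_F2FS_FAULT_INJECTION. */
#define MOCK_FAULT_INJECTION 0

#if MOCK_FAULT_INJECTION
/* Feature enabled: the real implementation lives in a .c file. */
int mock_build_fault_attr(unsigned long rate, unsigned long type);
#else
/*
 * Feature disabled: provide a no-op stub in the header. With plain
 * `static`, every file that includes the header without calling the
 * function gets -Wunused-function; `static inline` avoids that.
 */
static inline int mock_build_fault_attr(unsigned long rate,
					unsigned long type)
{
	(void)rate;
	(void)type;
	return 0;
}
#endif
```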
Re: [f2fs-dev] [PATCH] f2fs: initialize last_block_in_bio variable
On 2024/5/14 19:35, Wu Bo wrote:

Initialize last_block_in_bio of struct f2fs_bio_info and clean up code.

Signed-off-by: Wu Bo

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
On 2024/5/15 0:09, Jaegeuk Kim wrote:
On 05/10, Chao Yu wrote:

If lfs mode is on, buffered read may race w/ OPU dio write as below, it may cause buffered read hits unwritten data unexpectedly, and for dio read, the race condition exists as well.

Thread A                        Thread B
- f2fs_file_write_iter
 - f2fs_dio_write_iter
  - __iomap_dio_rw
   - f2fs_iomap_begin
    - f2fs_map_blocks
     - __allocate_data_block
      - allocated blkaddr #x
       - iomap_dio_submit_bio
                                - f2fs_file_read_iter
                                 - filemap_read
                                  - f2fs_read_data_folio
                                   - f2fs_mpage_readpages
                                    - f2fs_map_blocks
                                     : get blkaddr #x
                                    - f2fs_submit_read_bio
                                IRQ
                                - f2fs_read_end_io
                                 : read IO on blkaddr #x complete
IRQ
- iomap_dio_bio_end_io
 : direct write IO on blkaddr #x complete

This patch introduces a new per-inode i_opu_rwsem lock to avoid such race condition.

Wasn't this supposed to be managed by user-land?

Actually, the test case is:

1. mount w/ lfs mode
2. touch file;
3. initialize file w/ 4k zeroed data; fsync;
4. continue triggering dio write 4k zeroed data to file;
5. and meanwhile, continue triggering buf/dio 4k read in file, use md5sum to verify the 4k data;

It expects data is all zero, however it turned out it's not.

Thanks,

Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode")
Signed-off-by: Chao Yu
---
v2:
- fix to cover dio read path w/ i_opu_rwsem as well.
 fs/f2fs/f2fs.h | 1 +
 fs/f2fs/file.c | 28 ++--
 fs/f2fs/super.c | 1 +
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 30058e16a5d0..91cf4b3d6bc6 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -847,6 +847,7 @@ struct f2fs_inode_info {
 	/* avoid racing between foreground op and gc */
 	struct f2fs_rwsem i_gc_rwsem[2];
 	struct f2fs_rwsem i_xattr_sem; /* avoid racing between reading and changing EAs */
+	struct f2fs_rwsem i_opu_rwsem; /* avoid racing between buf read and opu dio write */
 
 	int i_extra_isize;		/* size of extra space located in i_addr */
 	kprojid_t i_projid;		/* id for project quota */
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 72ce1a522fb2..4ec260af321f 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4445,6 +4445,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	const loff_t pos = iocb->ki_pos;
 	const size_t count = iov_iter_count(to);
 	struct iomap_dio *dio;
+	bool do_opu = f2fs_lfs_mode(sbi);
 	ssize_t ret;
 
 	if (count == 0)
@@ -4457,8 +4458,14 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 			ret = -EAGAIN;
 			goto out;
 		}
+		if (do_opu && !f2fs_down_read_trylock(&fi->i_opu_rwsem)) {
+			f2fs_up_read(&fi->i_gc_rwsem[READ]);
+			ret = -EAGAIN;
+			goto out;
+		}
 	} else {
 		f2fs_down_read(&fi->i_gc_rwsem[READ]);
+		f2fs_down_read(&fi->i_opu_rwsem);
 	}
 
 	/*
@@ -4477,6 +4484,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		ret = iomap_dio_complete(dio);
 	}
 
+	f2fs_up_read(&fi->i_opu_rwsem);
 	f2fs_up_read(&fi->i_gc_rwsem[READ]);
 
 	file_accessed(file);
@@ -4523,7 +4531,13 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (f2fs_should_use_dio(inode, iocb, to)) {
 		ret = f2fs_dio_read_iter(iocb, to);
 	} else {
+		bool do_opu = f2fs_lfs_mode(F2FS_I_SB(inode));
+
+		if (do_opu)
+			f2fs_down_read(&F2FS_I(inode)->i_opu_rwsem);
 		ret = filemap_read(iocb, to, 0);
+		if (do_opu)
+			f2fs_up_read(&F2FS_I(inode)->i_opu_rwsem);
 		if (ret > 0)
 			f2fs_update_iostat(F2FS_I_SB(inode), inode,
 						APP_BUFFERED_READ_IO, ret);
@@ -4748,14 +4762,22 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
 			ret = -EAGAIN;
 			goto out;
 		}
+		if (do_opu && !f2fs_down_write_trylock(&fi->i_opu_rwsem)) {
+			f2fs_up_read(&fi->i_gc_rwsem[READ]);
+			f2fs_up_read(&fi->i_gc_rwsem[WRITE]);
+			ret = -EAGAIN;
+			goto out;
+		}
 	} else {
 		ret = f2fs_convert_inline_inode(inode);
 		if (ret)
Re: [f2fs-dev] [PATCH 3/3] f2fs: fix to do sanity check on i_nid for inline_data inode
On 2024/5/15 0:07, Jaegeuk Kim wrote: 外部邮件/External Mail On 05/11, Chao Yu wrote: On 2024/5/11 8:38, Jaegeuk Kim wrote: On 05/10, Chao Yu wrote: On 2024/5/10 11:36, Jaegeuk Kim wrote: On 05/10, Chao Yu wrote: On 2024/5/9 23:52, Jaegeuk Kim wrote: On 05/06, Chao Yu wrote: syzbot reports a f2fs bug as below: [ cut here ] kernel BUG at fs/f2fs/inline.c:258! CPU: 1 PID: 34 Comm: kworker/u8:2 Not tainted 6.9.0-rc6-syzkaller-00012-g9e4bc4bcae01 #0 RIP: 0010:f2fs_write_inline_data+0x781/0x790 fs/f2fs/inline.c:258 Call Trace: f2fs_write_single_data_page+0xb65/0x1d60 fs/f2fs/data.c:2834 f2fs_write_cache_pages fs/f2fs/data.c:3133 [inline] __f2fs_write_data_pages fs/f2fs/data.c:3288 [inline] f2fs_write_data_pages+0x1efe/0x3a90 fs/f2fs/data.c:3315 do_writepages+0x35b/0x870 mm/page-writeback.c:2612 __writeback_single_inode+0x165/0x10b0 fs/fs-writeback.c:1650 writeback_sb_inodes+0x905/0x1260 fs/fs-writeback.c:1941 wb_writeback+0x457/0xce0 fs/fs-writeback.c:2117 wb_do_writeback fs/fs-writeback.c:2264 [inline] wb_workfn+0x410/0x1090 fs/fs-writeback.c:2304 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa12/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f2/0x390 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 The root cause is: inline_data inode can be fuzzed, so that there may be valid blkaddr in its direct node, once f2fs triggers background GC to migrate the block, it will hit f2fs_bug_on() during dirty page writeback. Let's add sanity check on i_nid field for inline_data inode, meanwhile, forbid to migrate inline_data inode's data block to fix this issue. 
Reported-by: syzbot+848062ba19c8782ca...@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/d103ce06174d7...@google.com Signed-off-by: Chao Yu --- fs/f2fs/f2fs.h | 2 +- fs/f2fs/gc.c | 6 ++ fs/f2fs/inline.c | 17 - fs/f2fs/inode.c | 2 +- 4 files changed, 24 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index fced2b7652f4..c876813b5532 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -4146,7 +4146,7 @@ extern struct kmem_cache *f2fs_inode_entry_slab; * inline.c */ bool f2fs_may_inline_data(struct inode *inode); -bool f2fs_sanity_check_inline_data(struct inode *inode); +bool f2fs_sanity_check_inline_data(struct inode *inode, struct page *ipage); bool f2fs_may_inline_dentry(struct inode *inode); void f2fs_do_read_inline_data(struct page *page, struct page *ipage); void f2fs_truncate_inline_inode(struct inode *inode, diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index e86c7f01539a..041957750478 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -1563,6 +1563,12 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, continue; } + if (f2fs_has_inline_data(inode)) { + iput(inode); + set_sbi_flag(sbi, SBI_NEED_FSCK); + continue; Any race condtion to get this as false alarm? Since there is no reproducer for the bug, I doubt it was caused by metadata fuzzing, something like this: - inline inode has one valid blkaddr in i_addr or in dnode reference by i_nid; - SIT/SSA entry of the block is valid; - background GC migrates the block; - kworker writeback it, and trigger the bug_on(). Wasn't detected by sanity_check_inode? I fuzzed non-inline inode w/ below metadata fields: - i_blocks = 1 - i_size = 2048 - i_inline |= 0x02 sanity_check_inode() doesn't complain. I mean, the below sanity_check_inode() can cover the fuzzed case? I'm wondering I didn't figure out a generic way in sanity_check_inode() to catch all fuzzed cases. 
The patch described: "The root cause is: inline_data inode can be fuzzed, so that there may be valid blkaddr in its direct node, once f2fs triggers background GC to migrate the block, it will hit f2fs_bug_on() during dirty page writeback." Do you suspect the node block address was suddenly assigned after f2fs_iget()? No, I suspect that the image was fuzzed by tools offline, not in runtime after mount(). Otherwise, it looks checking them in sanity_check_inode would be enough. e.g. case #1 - blkaddr, its dnode, SSA and SIT are consistent - dnode.footer.ino points to inline inode - inline inode doesn't link to the donde Something like fuzzed special file, please check details in below commit: 9056d6489f5a ("f2fs: fix to do sanity check on inode type during garbage collection") case #2 - blkaddr, its dnode, SSA and SIT are consistent - blkaddr locates in inline inode's i_addr The image status is something like
Re: [f2fs-dev] [PATCH 3/3] f2fs: fix to do sanity check on i_nid for inline_data inode
On 2024/5/11 8:38, Jaegeuk Kim wrote: On 05/10, Chao Yu wrote: On 2024/5/10 11:36, Jaegeuk Kim wrote: On 05/10, Chao Yu wrote: On 2024/5/9 23:52, Jaegeuk Kim wrote: On 05/06, Chao Yu wrote: syzbot reports a f2fs bug as below: [ cut here ] kernel BUG at fs/f2fs/inline.c:258! CPU: 1 PID: 34 Comm: kworker/u8:2 Not tainted 6.9.0-rc6-syzkaller-00012-g9e4bc4bcae01 #0 RIP: 0010:f2fs_write_inline_data+0x781/0x790 fs/f2fs/inline.c:258 Call Trace: f2fs_write_single_data_page+0xb65/0x1d60 fs/f2fs/data.c:2834 f2fs_write_cache_pages fs/f2fs/data.c:3133 [inline] __f2fs_write_data_pages fs/f2fs/data.c:3288 [inline] f2fs_write_data_pages+0x1efe/0x3a90 fs/f2fs/data.c:3315 do_writepages+0x35b/0x870 mm/page-writeback.c:2612 __writeback_single_inode+0x165/0x10b0 fs/fs-writeback.c:1650 writeback_sb_inodes+0x905/0x1260 fs/fs-writeback.c:1941 wb_writeback+0x457/0xce0 fs/fs-writeback.c:2117 wb_do_writeback fs/fs-writeback.c:2264 [inline] wb_workfn+0x410/0x1090 fs/fs-writeback.c:2304 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa12/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f2/0x390 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 The root cause is: inline_data inode can be fuzzed, so that there may be valid blkaddr in its direct node, once f2fs triggers background GC to migrate the block, it will hit f2fs_bug_on() during dirty page writeback. Let's add sanity check on i_nid field for inline_data inode, meanwhile, forbid to migrate inline_data inode's data block to fix this issue. 
Reported-by: syzbot+848062ba19c8782ca...@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/d103ce06174d7...@google.com Signed-off-by: Chao Yu --- fs/f2fs/f2fs.h | 2 +- fs/f2fs/gc.c | 6 ++ fs/f2fs/inline.c | 17 - fs/f2fs/inode.c | 2 +- 4 files changed, 24 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index fced2b7652f4..c876813b5532 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -4146,7 +4146,7 @@ extern struct kmem_cache *f2fs_inode_entry_slab; * inline.c */ bool f2fs_may_inline_data(struct inode *inode); -bool f2fs_sanity_check_inline_data(struct inode *inode); +bool f2fs_sanity_check_inline_data(struct inode *inode, struct page *ipage); bool f2fs_may_inline_dentry(struct inode *inode); void f2fs_do_read_inline_data(struct page *page, struct page *ipage); void f2fs_truncate_inline_inode(struct inode *inode, diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index e86c7f01539a..041957750478 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -1563,6 +1563,12 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, continue; } + if (f2fs_has_inline_data(inode)) { + iput(inode); + set_sbi_flag(sbi, SBI_NEED_FSCK); + continue; Any race condtion to get this as false alarm? Since there is no reproducer for the bug, I doubt it was caused by metadata fuzzing, something like this: - inline inode has one valid blkaddr in i_addr or in dnode reference by i_nid; - SIT/SSA entry of the block is valid; - background GC migrates the block; - kworker writeback it, and trigger the bug_on(). Wasn't detected by sanity_check_inode? I fuzzed non-inline inode w/ below metadata fields: - i_blocks = 1 - i_size = 2048 - i_inline |= 0x02 sanity_check_inode() doesn't complain. I mean, the below sanity_check_inode() can cover the fuzzed case? I'm wondering I didn't figure out a generic way in sanity_check_inode() to catch all fuzzed cases. e.g. 
case #1 - blkaddr, its dnode, SSA and SIT are consistent - dnode.footer.ino points to inline inode - inline inode doesn't link to the donde Something like fuzzed special file, please check details in below commit: 9056d6489f5a ("f2fs: fix to do sanity check on inode type during garbage collection") case #2 - blkaddr, its dnode, SSA and SIT are consistent - blkaddr locates in inline inode's i_addr Thanks, whether we really need to check it in the gc path. Thanks, Thoughts? Thanks, + } + err = f2fs_gc_pinned_control(inode, gc_type, segno); if (err == -EAGAIN) { iput(inode); diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c index ac00423f117b..067600fed3d4 100644 --- a/fs/f2fs/inline.c +++ b/fs/f2fs/inline.c @@ -33,11 +33,26 @@ bool f2fs_may_inline_data(struct inode *inode) return !f2fs_post_read_required(inode); } -bool f2fs_sanity_check_inline_data(struct inode *inode) +static bool has_node_blocks(st
Re: [f2fs-dev] [PATCH 3/3] f2fs: fix to do sanity check on i_nid for inline_data inode
On 2024/5/10 11:36, Jaegeuk Kim wrote: On 05/10, Chao Yu wrote: On 2024/5/9 23:52, Jaegeuk Kim wrote: On 05/06, Chao Yu wrote: syzbot reports a f2fs bug as below: [ cut here ] kernel BUG at fs/f2fs/inline.c:258! CPU: 1 PID: 34 Comm: kworker/u8:2 Not tainted 6.9.0-rc6-syzkaller-00012-g9e4bc4bcae01 #0 RIP: 0010:f2fs_write_inline_data+0x781/0x790 fs/f2fs/inline.c:258 Call Trace: f2fs_write_single_data_page+0xb65/0x1d60 fs/f2fs/data.c:2834 f2fs_write_cache_pages fs/f2fs/data.c:3133 [inline] __f2fs_write_data_pages fs/f2fs/data.c:3288 [inline] f2fs_write_data_pages+0x1efe/0x3a90 fs/f2fs/data.c:3315 do_writepages+0x35b/0x870 mm/page-writeback.c:2612 __writeback_single_inode+0x165/0x10b0 fs/fs-writeback.c:1650 writeback_sb_inodes+0x905/0x1260 fs/fs-writeback.c:1941 wb_writeback+0x457/0xce0 fs/fs-writeback.c:2117 wb_do_writeback fs/fs-writeback.c:2264 [inline] wb_workfn+0x410/0x1090 fs/fs-writeback.c:2304 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa12/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f2/0x390 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 The root cause is: inline_data inode can be fuzzed, so that there may be valid blkaddr in its direct node, once f2fs triggers background GC to migrate the block, it will hit f2fs_bug_on() during dirty page writeback. Let's add sanity check on i_nid field for inline_data inode, meanwhile, forbid to migrate inline_data inode's data block to fix this issue. 
Reported-by: syzbot+848062ba19c8782ca...@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-f2fs-devel/d103ce06174d7...@google.com Signed-off-by: Chao Yu --- fs/f2fs/f2fs.h | 2 +- fs/f2fs/gc.c | 6 ++ fs/f2fs/inline.c | 17 - fs/f2fs/inode.c | 2 +- 4 files changed, 24 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index fced2b7652f4..c876813b5532 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -4146,7 +4146,7 @@ extern struct kmem_cache *f2fs_inode_entry_slab; * inline.c */ bool f2fs_may_inline_data(struct inode *inode); -bool f2fs_sanity_check_inline_data(struct inode *inode); +bool f2fs_sanity_check_inline_data(struct inode *inode, struct page *ipage); bool f2fs_may_inline_dentry(struct inode *inode); void f2fs_do_read_inline_data(struct page *page, struct page *ipage); void f2fs_truncate_inline_inode(struct inode *inode, diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index e86c7f01539a..041957750478 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -1563,6 +1563,12 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum, continue; } + if (f2fs_has_inline_data(inode)) { + iput(inode); + set_sbi_flag(sbi, SBI_NEED_FSCK); + continue; Any race condtion to get this as false alarm? Since there is no reproducer for the bug, I doubt it was caused by metadata fuzzing, something like this: - inline inode has one valid blkaddr in i_addr or in dnode reference by i_nid; - SIT/SSA entry of the block is valid; - background GC migrates the block; - kworker writeback it, and trigger the bug_on(). Wasn't detected by sanity_check_inode? I fuzzed non-inline inode w/ below metadata fields: - i_blocks = 1 - i_size = 2048 - i_inline |= 0x02 sanity_check_inode() doesn't complain. Thanks, Thoughts? 
Thanks, + } + err = f2fs_gc_pinned_control(inode, gc_type, segno); if (err == -EAGAIN) { iput(inode); diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c index ac00423f117b..067600fed3d4 100644 --- a/fs/f2fs/inline.c +++ b/fs/f2fs/inline.c @@ -33,11 +33,26 @@ bool f2fs_may_inline_data(struct inode *inode) return !f2fs_post_read_required(inode); } -bool f2fs_sanity_check_inline_data(struct inode *inode) +static bool has_node_blocks(struct inode *inode, struct page *ipage) +{ + struct f2fs_inode *ri = F2FS_INODE(ipage); + int i; + + for (i = 0; i < DEF_NIDS_PER_INODE; i++) { + if (ri->i_nid[i]) + return true; + } + return false; +} + +bool f2fs_sanity_check_inline_data(struct inode *inode, struct page *ipage) { if (!f2fs_has_inline_data(inode)) return false; + if (has_node_blocks(inode, ipage)) + return false; + if (!support_inline_data(inode)) return true; diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index c26effdce9aa..1423cd27a477 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -343,7 +343,7 @@ static bool sanity_check
[f2fs-dev] [PATCH v2 2/3] f2fs: fix to add missing iput() in gc_data_segment()
During gc_data_segment(), if the inode state is abnormal, iput() is never
called on it. Fix it.

Fixes: b73e52824c89 ("f2fs: reposition unlock_new_inode to prevent accessing invalid inode")
Fixes: 9056d6489f5a ("f2fs: fix to do sanity check on inode type during garbage collection")
Signed-off-by: Chao Yu
---
v2:
- fix wrong fixes tag line.
 fs/f2fs/gc.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index ac4cbbe50c2f..6066c6eecf41 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1554,10 +1554,15 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 			int err;
 
 			inode = f2fs_iget(sb, dni.ino);
-			if (IS_ERR(inode) || is_bad_inode(inode) ||
-					special_file(inode->i_mode))
+			if (IS_ERR(inode))
 				continue;
 
+			if (is_bad_inode(inode) ||
+					special_file(inode->i_mode)) {
+				iput(inode);
+				continue;
+			}
+
 			err = f2fs_gc_pinned_control(inode, gc_type, segno);
 			if (err == -EAGAIN) {
 				iput(inode);
-- 
2.40.1
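The reference-counting rule behind this fix can be illustrated outside the kernel. The following is a minimal user-space sketch, not the f2fs code: `mock_inode`, `mock_iget()`, `mock_iput()` and `mock_gc_scan()` are hypothetical names, and the lookup returns NULL on failure instead of an ERR_PTR value as a simplification. It only shows that a failed lookup holds no reference, while a successful lookup that yields an unusable inode still holds one that has to be dropped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted inode. */
struct mock_inode {
	int refcount;
	bool bad;	/* plays the role of is_bad_inode() */
	bool special;	/* plays the role of special_file() */
};

/* Simplified lookup: returns NULL on failure, else takes one reference. */
static struct mock_inode *mock_iget(struct mock_inode *ino)
{
	if (!ino)
		return NULL;
	ino->refcount++;
	return ino;
}

/* Drops one reference, as iput() would. */
static void mock_iput(struct mock_inode *ino)
{
	ino->refcount--;
}

/*
 * Visit each candidate once and return how many were processed.
 * Every successful mock_iget() is paired with a mock_iput() on all paths.
 */
static int mock_gc_scan(struct mock_inode **inodes, size_t n)
{
	int processed = 0;

	for (size_t i = 0; i < n; i++) {
		struct mock_inode *inode = mock_iget(inodes[i]);

		if (!inode)
			continue;	/* lookup failed: no reference to drop */

		if (inode->bad || inode->special) {
			mock_iput(inode);	/* reference was taken, so drop it */
			continue;
		}

		processed++;
		mock_iput(inode);
	}
	return processed;
}
```

After a scan, every `refcount` should be back at its starting value, including those of the skipped inodes; a leak would show up as a non-zero count on a bad or special entry.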
Re: [f2fs-dev] [PATCH 2/3] f2fs: fix to add missing iput() in gc_data_segment()
On 2024/5/9 10:49, Chao Yu wrote:
> On 2024/5/9 8:46, Jaegeuk Kim wrote:
>> On 05/06, Chao Yu wrote:
>>> During gc_data_segment(), if inode state is abnormal, it missed to call
>>> iput(), fix it.
>>>
>>> Fixes: 132e3209789c ("f2fs: remove false alarm on iget failure during GC")

Oh, this line should be replaced w/ the one below; let me revise the patch.

Fixes: b73e52824c89 ("f2fs: reposition unlock_new_inode to prevent accessing invalid inode").

Thanks,

>>> Fixes: 9056d6489f5a ("f2fs: fix to do sanity check on inode type during garbage collection")
>>> Signed-off-by: Chao Yu
>>> ---
>>>  fs/f2fs/gc.c | 9 +++++++--
>>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>> index 8852814dab7f..e86c7f01539a 100644
>>> --- a/fs/f2fs/gc.c
>>> +++ b/fs/f2fs/gc.c
>>> @@ -1554,10 +1554,15 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
>>>  			int err;
>>>  
>>>  			inode = f2fs_iget(sb, dni.ino);
>>> -			if (IS_ERR(inode) || is_bad_inode(inode) ||
>>> -					special_file(inode->i_mode))
>>> +			if (IS_ERR(inode))
>>>  				continue;
>>>  
>>> +			if (is_bad_inode(inode) ||
>>> +					special_file(inode->i_mode)) {
>>> +				iput(inode);
>>
>> iget_failed() called iput()?
>
> It looks like the bad inode was referenced in this context, so it needs
> to be iput()ed here.
>
> The bad inode was made in another thread; please check the description in
> commit b73e52824c89 ("f2fs: reposition unlock_new_inode to prevent
> accessing invalid inode").
>
> Thanks,
>
>>> +				continue;
>>> +			}
>>> +
>>>  			err = f2fs_gc_pinned_control(inode, gc_type, segno);
>>>  			if (err == -EAGAIN) {
>>>  				iput(inode);
>>> -- 
>>> 2.40.1
[f2fs-dev] [PATCH v3 5/5] f2fs: compress: don't allow unaligned truncation on released compress inode
f2fs image may be corrupted after the testcase below:
- mkfs.f2fs -O extra_attr,compression -f /dev/vdb
- mount /dev/vdb /mnt/f2fs
- touch /mnt/f2fs/file
- f2fs_io setflags compression /mnt/f2fs/file
- dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=4
- f2fs_io release_cblocks /mnt/f2fs/file
- truncate -s 8192 /mnt/f2fs/file
- umount /mnt/f2fs
- fsck.f2fs /dev/vdb

[ASSERT] (fsck_chk_inode_blk:1256) --> ino: 0x5 has i_blocks: 0x0002, but has 0x3 blocks
[FSCK] valid_block_count matching with CP [Fail] [0x4, 0x5]
[FSCK] other corrupted bugs [Fail]

The reason is: partial truncation assumes the compressed inode has reserved
blocks. After partial truncation, the valid block count may change w/o
.i_blocks and .total_valid_block_count being updated, resulting in
corruption.

This patch only allows cluster size aligned truncation on a released
compress inode to fix this.

Fixes: c61404153eb6 ("f2fs: introduce FI_COMPRESS_RELEASED instead of using IMMUTABLE bit")
Signed-off-by: Chao Yu
---
v3:
- fix typo in commit description: w/ -> w/o
 fs/f2fs/file.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 7371f485b3f7..15f4222da891 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -952,9 +952,14 @@ int f2fs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 				  ATTR_GID | ATTR_TIMES_SET))))
 		return -EPERM;
 
-	if ((attr->ia_valid & ATTR_SIZE) &&
-		!f2fs_is_compress_backend_ready(inode))
-		return -EOPNOTSUPP;
+	if ((attr->ia_valid & ATTR_SIZE)) {
+		if (!f2fs_is_compress_backend_ready(inode))
+			return -EOPNOTSUPP;
+		if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED) &&
+			!IS_ALIGNED(attr->ia_size,
+			F2FS_BLK_TO_BYTES(F2FS_I(inode)->i_cluster_size)))
+			return -EINVAL;
+	}
 
 	err = setattr_prepare(idmap, dentry, attr);
 	if (err)
-- 
2.40.1
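The alignment test added to f2fs_setattr() can be checked with plain arithmetic. The sketch below is a user-space approximation, not kernel code: `sketch_blk_to_bytes()` and `sketch_check_truncate()` are hypothetical helpers, the 4 KiB block size is an assumption, and the cluster size of 4 blocks used in the note afterwards is assumed to be what the reproducer ends up with.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLKSIZE_BITS 12	/* assumption: 4 KiB blocks */

/* Convert a block count to bytes, in the spirit of F2FS_BLK_TO_BYTES(). */
static uint64_t sketch_blk_to_bytes(uint64_t blocks)
{
	return blocks << SKETCH_BLKSIZE_BITS;
}

/*
 * Returns 0 if truncating to new_size is acceptable, -EINVAL otherwise.
 * cluster_blocks is the cluster size in blocks and must be non-zero.
 */
static int sketch_check_truncate(bool released, uint64_t new_size,
				 uint32_t cluster_blocks)
{
	uint64_t cluster_bytes = sketch_blk_to_bytes(cluster_blocks);

	/* Only a released compress inode is restricted to aligned sizes. */
	if (released && (new_size % cluster_bytes) != 0)
		return -EINVAL;
	return 0;
}
```

With a 4-block cluster, one cluster spans 16384 bytes, so the `truncate -s 8192` step of the testcase would be rejected on a released inode under this check, while truncation to 0 or 16384 would still pass.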
[f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
If lfs mode is on, buffered read may race w/ OPU dio write as below,
which may cause the buffered read to hit unwritten data unexpectedly;
the same race condition exists for dio read as well.

Thread A                                Thread B
- f2fs_file_write_iter
 - f2fs_dio_write_iter
  - __iomap_dio_rw
   - f2fs_iomap_begin
    - f2fs_map_blocks
     - __allocate_data_block
      - allocated blkaddr #x
   - iomap_dio_submit_bio
                                        - f2fs_file_read_iter
                                         - filemap_read
                                          - f2fs_read_data_folio
                                           - f2fs_mpage_readpages
                                            - f2fs_map_blocks
                                             : get blkaddr #x
                                            - f2fs_submit_read_bio
                                        IRQ
                                        - f2fs_read_end_io
                                         : read IO on blkaddr #x complete
IRQ
- iomap_dio_bio_end_io
 : direct write IO on blkaddr #x complete

This patch introduces a new per-inode i_opu_rwsem lock to avoid
such race condition.

Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode")
Signed-off-by: Chao Yu
---
v2:
- fix to cover dio read path w/ i_opu_rwsem as well.
 fs/f2fs/f2fs.h  |  1 +
 fs/f2fs/file.c  | 28 ++++++++++++++++++++++++++--
 fs/f2fs/super.c |  1 +
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 30058e16a5d0..91cf4b3d6bc6 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -847,6 +847,7 @@ struct f2fs_inode_info {
 	/* avoid racing between foreground op and gc */
 	struct f2fs_rwsem i_gc_rwsem[2];
 	struct f2fs_rwsem i_xattr_sem; /* avoid racing between reading and changing EAs */
+	struct f2fs_rwsem i_opu_rwsem; /* avoid racing between buf read and opu dio write */
 
 	int i_extra_isize;		/* size of extra space located in i_addr */
 	kprojid_t i_projid;		/* id for project quota */
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 72ce1a522fb2..4ec260af321f 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4445,6 +4445,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	const loff_t pos = iocb->ki_pos;
 	const size_t count = iov_iter_count(to);
 	struct iomap_dio *dio;
+	bool do_opu = f2fs_lfs_mode(sbi);
 	ssize_t ret;
 
 	if (count == 0)
@@ -4457,8 +4458,14 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 			ret = -EAGAIN;
 			goto out;
 		}
+		if (do_opu && !f2fs_down_read_trylock(&fi->i_opu_rwsem)) {
+			f2fs_up_read(&fi->i_gc_rwsem[READ]);
+			ret = -EAGAIN;
+			goto out;
+		}
 	} else {
 		f2fs_down_read(&fi->i_gc_rwsem[READ]);
+		f2fs_down_read(&fi->i_opu_rwsem);
 	}
 
 	/*
@@ -4477,6 +4484,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		ret = iomap_dio_complete(dio);
 	}
 
+	f2fs_up_read(&fi->i_opu_rwsem);
 	f2fs_up_read(&fi->i_gc_rwsem[READ]);
 
 	file_accessed(file);
@@ -4523,7 +4531,13 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (f2fs_should_use_dio(inode, iocb, to)) {
 		ret = f2fs_dio_read_iter(iocb, to);
 	} else {
+		bool do_opu = f2fs_lfs_mode(F2FS_I_SB(inode));
+
+		if (do_opu)
+			f2fs_down_read(&F2FS_I(inode)->i_opu_rwsem);
 		ret = filemap_read(iocb, to, 0);
+		if (do_opu)
+			f2fs_up_read(&F2FS_I(inode)->i_opu_rwsem);
 		if (ret > 0)
 			f2fs_update_iostat(F2FS_I_SB(inode), inode,
 						APP_BUFFERED_READ_IO, ret);
@@ -4748,14 +4762,22 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
 			ret = -EAGAIN;
 			goto out;
 		}
+		if (do_opu && !f2fs_down_write_trylock(&fi->i_opu_rwsem)) {
+			f2fs_up_read(&fi->i_gc_rwsem[READ]);
+			f2fs_up_read(&fi->i_gc_rwsem[WRITE]);
+			ret = -EAGAIN;
+			goto out;
+		}
 	} else {
 		ret = f2fs_convert_inline_inode(inode);
 		if (ret)
 			goto out;
 
 		f2fs_down_read(&fi->i_gc_rwsem[WRITE]);
-		if (do_opu)
+		if (do_opu) {
 			f2fs_down_read(&fi->i_gc_rwsem[READ]);
+			f2fs_down_write(&fi->i_opu_rwsem);
+		}
 	}
 
 	/*
@@ -4779,8 +4801,10 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *
Re: [f2fs-dev] [PATCH 3/3] f2fs: fix to do sanity check on i_nid for inline_data inode
On 2024/5/9 23:52, Jaegeuk Kim wrote:
> On 05/06, Chao Yu wrote:
>> syzbot reports a f2fs bug as below:
>>
>> ------------[ cut here ]------------
>> kernel BUG at fs/f2fs/inline.c:258!
>> CPU: 1 PID: 34 Comm: kworker/u8:2 Not tainted 6.9.0-rc6-syzkaller-00012-g9e4bc4bcae01 #0
>> RIP: 0010:f2fs_write_inline_data+0x781/0x790 fs/f2fs/inline.c:258
>> Call Trace:
>>  f2fs_write_single_data_page+0xb65/0x1d60 fs/f2fs/data.c:2834
>>  f2fs_write_cache_pages fs/f2fs/data.c:3133 [inline]
>>  __f2fs_write_data_pages fs/f2fs/data.c:3288 [inline]
>>  f2fs_write_data_pages+0x1efe/0x3a90 fs/f2fs/data.c:3315
>>  do_writepages+0x35b/0x870 mm/page-writeback.c:2612
>>  __writeback_single_inode+0x165/0x10b0 fs/fs-writeback.c:1650
>>  writeback_sb_inodes+0x905/0x1260 fs/fs-writeback.c:1941
>>  wb_writeback+0x457/0xce0 fs/fs-writeback.c:2117
>>  wb_do_writeback fs/fs-writeback.c:2264 [inline]
>>  wb_workfn+0x410/0x1090 fs/fs-writeback.c:2304
>>  process_one_work kernel/workqueue.c:3254 [inline]
>>  process_scheduled_works+0xa12/0x17c0 kernel/workqueue.c:3335
>>  worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
>>  kthread+0x2f2/0x390 kernel/kthread.c:388
>>  ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
>>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>>
>> The root cause is: inline_data inode can be fuzzed, so that there may
>> be valid blkaddr in its direct node, once f2fs triggers background GC
>> to migrate the block, it will hit f2fs_bug_on() during dirty page
>> writeback.
>>
>> Let's add sanity check on i_nid field for inline_data inode, meanwhile,
>> forbid to migrate inline_data inode's data block to fix this issue.
>>
>> Reported-by: syzbot+848062ba19c8782ca...@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/linux-f2fs-devel/d103ce06174d7...@google.com
>> Signed-off-by: Chao Yu
>> ---
>>  fs/f2fs/f2fs.h   |  2 +-
>>  fs/f2fs/gc.c     |  6 ++++++
>>  fs/f2fs/inline.c | 17 ++++++++++++++++-
>>  fs/f2fs/inode.c  |  2 +-
>>  4 files changed, 24 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index fced2b7652f4..c876813b5532 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -4146,7 +4146,7 @@ extern struct kmem_cache *f2fs_inode_entry_slab;
>>   * inline.c
>>   */
>>  bool f2fs_may_inline_data(struct inode *inode);
>> -bool f2fs_sanity_check_inline_data(struct inode *inode);
>> +bool f2fs_sanity_check_inline_data(struct inode *inode, struct page *ipage);
>>  bool f2fs_may_inline_dentry(struct inode *inode);
>>  void f2fs_do_read_inline_data(struct page *page, struct page *ipage);
>>  void f2fs_truncate_inline_inode(struct inode *inode,
>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>> index e86c7f01539a..041957750478 100644
>> --- a/fs/f2fs/gc.c
>> +++ b/fs/f2fs/gc.c
>> @@ -1563,6 +1563,12 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
>>  				continue;
>>  			}
>>  
>> +			if (f2fs_has_inline_data(inode)) {
>> +				iput(inode);
>> +				set_sbi_flag(sbi, SBI_NEED_FSCK);
>> +				continue;
>
> Any race condition to get this as false alarm?

Since there is no reproducer for the bug, I suspect it was caused by
metadata fuzzing, something like this:

- the inline inode has one valid blkaddr in i_addr or in a dnode
  referenced by i_nid;
- the SIT/SSA entry of the block is valid;
- background GC migrates the block;
- kworker writes it back, and triggers the bug_on().

Thoughts?

Thanks,

>> +			}
>> +
>>  			err = f2fs_gc_pinned_control(inode, gc_type, segno);
>>  			if (err == -EAGAIN) {
>>  				iput(inode);
>> diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
>> index ac00423f117b..067600fed3d4 100644
>> --- a/fs/f2fs/inline.c
>> +++ b/fs/f2fs/inline.c
>> @@ -33,11 +33,26 @@ bool f2fs_may_inline_data(struct inode *inode)
>>  	return !f2fs_post_read_required(inode);
>>  }
>>  
>> -bool f2fs_sanity_check_inline_data(struct inode *inode)
>> +static bool has_node_blocks(struct inode *inode, struct page *ipage)
>> +{
>> +	struct f2fs_inode *ri = F2FS_INODE(ipage);
>> +	int i;
>> +
>> +	for (i = 0; i < DEF_NIDS_PER_INODE; i++) {
>> +		if (ri->i_nid[i])
>> +			return true;
>> +	}
>> +	return false;
>> +}
>> +
>> +bool f2fs_sanity_check_inline_data(struct inode *inode, struct page *ipage)
>>  {
>>  	if (!f2fs_has_inline_data(inode))
>>  		return false;
>>  
>> +	if (has_node_blocks(inode, ipage))
>> +		return false;
>> +
>>  	if (!support_inline_data(inode))
>>  		return true;
>>  
>> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
>> index c26effdce9aa..1423cd27a477 100644
>> --- a/fs/f2fs/inode.c
>> +++ b/fs/f2fs/inode.c
>> @@ -343,7 +343,7 @@ static bool sanity_check_inode(struct inode *inode, struct page *node_page)
>>  		}
>>  	}
>>  
>> -	if (f2fs_sanity_check_inline_data(inode)) {
>> +	if (f2fs_sanity_check_inline_data(inode, node_page)) {
>>  		f2fs_warn(sbi, "%s: inode (ino=%lx, mode=%u) should not have inline_data, run
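The i_nid scan discussed in this thread is easy to try in isolation. The following user-space sketch uses a trimmed-down, hypothetical `struct sketch_raw_inode` that holds only the node-id slots, with `SKETCH_NIDS_PER_INODE` assumed to match DEF_NIDS_PER_INODE (5). Treating an inline_data inode with any populated slot as suspicious is this sketch's reading of the stated intent, not a copy of how the patch wires the result into sanity_check_inode().

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NIDS_PER_INODE 5	/* assumed to match DEF_NIDS_PER_INODE */

/* Hypothetical, trimmed-down raw inode: only the node-id slots. */
struct sketch_raw_inode {
	uint32_t i_nid[SKETCH_NIDS_PER_INODE];
};

/* True if any direct or indirect node slot is populated. */
static bool sketch_has_node_blocks(const struct sketch_raw_inode *ri)
{
	for (int i = 0; i < SKETCH_NIDS_PER_INODE; i++) {
		/* A zero test is independent of on-disk byte order. */
		if (ri->i_nid[i] != 0)
			return true;
	}
	return false;
}

/* An inline_data inode is not expected to reference node blocks. */
static bool sketch_inline_inode_is_suspicious(bool has_inline_data,
					      const struct sketch_raw_inode *ri)
{
	return has_inline_data && sketch_has_node_blocks(ri);
}
```

A check of this kind only looks at the inode block itself, so it would not catch the cases raised later in the thread where a foreign dnode or an i_addr slot points at the block.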
Re: [f2fs-dev] [PATCH 2/3] f2fs: fix to add missing iput() in gc_data_segment()
On 2024/5/9 8:46, Jaegeuk Kim wrote:
> On 05/06, Chao Yu wrote:
>> During gc_data_segment(), if inode state is abnormal, it missed to call
>> iput(), fix it.
>>
>> Fixes: 132e3209789c ("f2fs: remove false alarm on iget failure during GC")
>> Fixes: 9056d6489f5a ("f2fs: fix to do sanity check on inode type during garbage collection")
>> Signed-off-by: Chao Yu
>> ---
>>  fs/f2fs/gc.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>> index 8852814dab7f..e86c7f01539a 100644
>> --- a/fs/f2fs/gc.c
>> +++ b/fs/f2fs/gc.c
>> @@ -1554,10 +1554,15 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
>>  			int err;
>>  
>>  			inode = f2fs_iget(sb, dni.ino);
>> -			if (IS_ERR(inode) || is_bad_inode(inode) ||
>> -					special_file(inode->i_mode))
>> +			if (IS_ERR(inode))
>>  				continue;
>>  
>> +			if (is_bad_inode(inode) ||
>> +					special_file(inode->i_mode)) {
>> +				iput(inode);
>
> iget_failed() called iput()?

It looks like the bad inode was referenced in this context, so it needs
to be iput()ed here.

The bad inode was made in another thread; please check the description in
commit b73e52824c89 ("f2fs: reposition unlock_new_inode to prevent
accessing invalid inode").

Thanks,

>> +				continue;
>> +			}
>> +
>>  			err = f2fs_gc_pinned_control(inode, gc_type, segno);
>>  			if (err == -EAGAIN) {
>>  				iput(inode);
>> -- 
>> 2.40.1
[f2fs-dev] [PATCH] f2fs: fix to avoid racing in between buffered read and OPU dio write
If lfs mode is on, buffered read may race w/ OPU dio write as below,
which may cause the buffered read to hit unwritten data unexpectedly.

Thread A                                Thread B
- f2fs_file_write_iter
 - f2fs_dio_write_iter
  - __iomap_dio_rw
   - f2fs_iomap_begin
    - f2fs_map_blocks
     - __allocate_data_block
      - allocated blkaddr #x
   - iomap_dio_submit_bio
                                        - f2fs_file_read_iter
                                         - filemap_read
                                          - f2fs_read_data_folio
                                           - f2fs_mpage_readpages
                                            - f2fs_map_blocks
                                             : get blkaddr #x
                                            - f2fs_submit_read_bio
                                        IRQ
                                        - f2fs_read_end_io
                                         : read IO on blkaddr #x complete
IRQ
- iomap_dio_bio_end_io
 : direct write IO on blkaddr #x complete

This patch introduces a new per-inode i_opu_rwsem lock to avoid
such race condition.

Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode")
Signed-off-by: Chao Yu
---
 fs/f2fs/f2fs.h  |  1 +
 fs/f2fs/file.c  | 20 ++++++++++++++++++--
 fs/f2fs/super.c |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 145b985bf252..b69ec1109572 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -847,6 +847,7 @@ struct f2fs_inode_info {
 	/* avoid racing between foreground op and gc */
 	struct f2fs_rwsem i_gc_rwsem[2];
 	struct f2fs_rwsem i_xattr_sem; /* avoid racing between reading and changing EAs */
+	struct f2fs_rwsem i_opu_rwsem; /* avoid racing between buf read and opu dio write */
 
 	int i_extra_isize;		/* size of extra space located in i_addr */
 	kprojid_t i_projid;		/* id for project quota */
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index ef4cfb4436ef..c761db952b37 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4545,7 +4545,13 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	if (f2fs_should_use_dio(inode, iocb, to)) {
 		ret = f2fs_dio_read_iter(iocb, to);
 	} else {
+		bool do_opu = f2fs_lfs_mode(F2FS_I_SB(inode));
+
+		if (do_opu)
+			f2fs_down_read(&F2FS_I(inode)->i_opu_rwsem);
 		ret = filemap_read(iocb, to, 0);
+		if (do_opu)
+			f2fs_up_read(&F2FS_I(inode)->i_opu_rwsem);
 		if (ret > 0)
 			f2fs_update_iostat(F2FS_I_SB(inode), inode,
 						APP_BUFFERED_READ_IO, ret);
@@ -4770,14 +4776,22 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
 			ret = -EAGAIN;
 			goto out;
 		}
+		if (do_opu && !f2fs_down_write_trylock(&fi->i_opu_rwsem)) {
+			f2fs_up_read(&fi->i_gc_rwsem[READ]);
+			f2fs_up_read(&fi->i_gc_rwsem[WRITE]);
+			ret = -EAGAIN;
+			goto out;
+		}
 	} else {
 		ret = f2fs_convert_inline_inode(inode);
 		if (ret)
 			goto out;
 
 		f2fs_down_read(&fi->i_gc_rwsem[WRITE]);
-		if (do_opu)
+		if (do_opu) {
 			f2fs_down_read(&fi->i_gc_rwsem[READ]);
+			f2fs_down_write(&fi->i_opu_rwsem);
+		}
 	}
 
 	/*
@@ -4801,8 +4815,10 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
 		ret = iomap_dio_complete(dio);
 	}
 
-	if (do_opu)
+	if (do_opu) {
+		f2fs_up_write(&fi->i_opu_rwsem);
 		f2fs_up_read(&fi->i_gc_rwsem[READ]);
+	}
 	f2fs_up_read(&fi->i_gc_rwsem[WRITE]);
 
 	if (ret < 0)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index daf2c4dbe150..b4ed3b094366 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1428,6 +1428,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
 	init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
 	init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
 	init_f2fs_rwsem(&fi->i_xattr_sem);
+	init_f2fs_rwsem(&fi->i_opu_rwsem);
 
 	/* Will be used by directory only */
 	fi->i_dir_level = F2FS_SB(sb)->dir_level;
-- 
2.40.1
[f2fs-dev] [PATCH v2 5/5] f2fs: compress: don't allow unaligned truncation on released compress inode
f2fs image may be corrupted after the testcase below:
- mkfs.f2fs -O extra_attr,compression -f /dev/vdb
- mount /dev/vdb /mnt/f2fs
- touch /mnt/f2fs/file
- f2fs_io setflags compression /mnt/f2fs/file
- dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=4
- f2fs_io release_cblocks /mnt/f2fs/file
- truncate -s 8192 /mnt/f2fs/file
- umount /mnt/f2fs
- fsck.f2fs /dev/vdb

[ASSERT] (fsck_chk_inode_blk:1256) --> ino: 0x5 has i_blocks: 0x0002, but has 0x3 blocks
[FSCK] valid_block_count matching with CP [Fail] [0x4, 0x5]
[FSCK] other corrupted bugs [Fail]

The reason is: partial truncation assumes the compressed inode has reserved
blocks; after partial truncation, the valid block count may change w/
.i_blocks and .total_valid_block_count update, resulting in corruption.

This patch only allows cluster-size-aligned truncation on a released
compress inode to fix this.

Fixes: c61404153eb6 ("f2fs: introduce FI_COMPRESS_RELEASED instead of using IMMUTABLE bit")
Signed-off-by: Chao Yu
---
v2:
- fix compile warning reported by lkp.
 fs/f2fs/file.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 3f0db351e976..0c8194dc6807 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -952,9 +952,14 @@ int f2fs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 				  ATTR_GID | ATTR_TIMES_SET))))
 		return -EPERM;
 
-	if ((attr->ia_valid & ATTR_SIZE) &&
-		!f2fs_is_compress_backend_ready(inode))
-		return -EOPNOTSUPP;
+	if ((attr->ia_valid & ATTR_SIZE)) {
+		if (!f2fs_is_compress_backend_ready(inode))
+			return -EOPNOTSUPP;
+		if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED) &&
+			!IS_ALIGNED(attr->ia_size,
+			F2FS_BLK_TO_BYTES(F2FS_I(inode)->i_cluster_size)))
+			return -EINVAL;
+	}
 
 	err = setattr_prepare(idmap, dentry, attr);
 	if (err)
-- 
2.40.1

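For readers following the testcase, the alignment rule this patch enforces can be sketched in userspace C as below. This is only an illustration under stated assumptions: sketch_truncate_allowed() and SKETCH_BLKSIZE are hypothetical names, a 4 KiB block size is assumed, and the cluster size of 4 blocks used in the note after the code is an assumption as well, not something the patch states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed block size for this sketch (4 KiB). */
#define SKETCH_BLKSIZE 4096ULL

/*
 * Hypothetical helper, not a kernel function: returns true when new_size
 * may be applied to the inode. For an inode whose compressed blocks were
 * released, only sizes that are a multiple of the cluster size in bytes
 * are accepted; other inodes are not restricted by this check.
 */
static bool sketch_truncate_allowed(bool released, uint64_t new_size,
				    unsigned int cluster_blocks)
{
	uint64_t cluster_bytes = (uint64_t)cluster_blocks * SKETCH_BLKSIZE;

	if (!released)
		return true;
	return new_size % cluster_bytes == 0;
}
```

With an assumed cluster of 4 blocks (16384 bytes), the "truncate -s 8192" step from the testcase would be refused on a released inode, while sizes 0 and 16384 would pass.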
[f2fs-dev] [PATCH v2] f2fs: check validation of fault attrs in f2fs_build_fault_attr()
- It missed to check validation of fault attrs in parse_options(), let's fix to add check condition in f2fs_build_fault_attr(). - Use f2fs_build_fault_attr() in __sbi_store() to clean up code. Signed-off-by: Chao Yu --- v2: - add static for f2fs_build_fault_attr(). fs/f2fs/f2fs.h | 12 fs/f2fs/super.c | 27 --- fs/f2fs/sysfs.c | 14 ++ 3 files changed, 38 insertions(+), 15 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 95a40d4f778f..a29576f46796 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -72,7 +72,7 @@ enum { struct f2fs_fault_info { atomic_t inject_ops; - unsigned int inject_rate; + int inject_rate; unsigned int inject_type; }; @@ -4597,10 +4597,14 @@ static inline bool f2fs_need_verity(const struct inode *inode, pgoff_t idx) } #ifdef CONFIG_F2FS_FAULT_INJECTION -extern void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate, - unsigned int type); +extern int f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned long rate, + unsigned long type); #else -#define f2fs_build_fault_attr(sbi, rate, type) do { } while (0) +static int f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned long rate, + unsigned long type) +{ + return 0; +} #endif static inline bool is_journalled_quota(struct f2fs_sb_info *sbi) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index a4bc26dfdb1a..94918ae7eddb 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -66,21 +66,31 @@ const char *f2fs_fault_name[FAULT_MAX] = { [FAULT_NO_SEGMENT] = "no free segment", }; -void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate, - unsigned int type) +int f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned long rate, + unsigned long type) { struct f2fs_fault_info *ffi = _OPTION(sbi).fault_info; if (rate) { + if (rate > INT_MAX) + return -EINVAL; atomic_set(>inject_ops, 0); - ffi->inject_rate = rate; + ffi->inject_rate = (int)rate; } - if (type) - ffi->inject_type = type; + if (type) { + if (type >= BIT(FAULT_MAX)) + return -EINVAL; + 
ffi->inject_type = (unsigned int)type; + } if (!rate && !type) memset(ffi, 0, sizeof(struct f2fs_fault_info)); + else + f2fs_info(sbi, + "build fault injection attr: rate: %lu, type: 0x%lx", + rate, type); + return 0; } #endif @@ -886,14 +896,17 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount) case Opt_fault_injection: if (args->from && match_int(args, )) return -EINVAL; - f2fs_build_fault_attr(sbi, arg, F2FS_ALL_FAULT_TYPE); + if (f2fs_build_fault_attr(sbi, arg, + F2FS_ALL_FAULT_TYPE)) + return -EINVAL; set_opt(sbi, FAULT_INJECTION); break; case Opt_fault_type: if (args->from && match_int(args, )) return -EINVAL; - f2fs_build_fault_attr(sbi, 0, arg); + if (f2fs_build_fault_attr(sbi, 0, arg)) + return -EINVAL; set_opt(sbi, FAULT_INJECTION); break; #else diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c index a568ce96cf56..7aa3844e7a80 100644 --- a/fs/f2fs/sysfs.c +++ b/fs/f2fs/sysfs.c @@ -484,10 +484,16 @@ static ssize_t __sbi_store(struct f2fs_attr *a, if (ret < 0) return ret; #ifdef CONFIG_F2FS_FAULT_INJECTION - if (a->struct_type == FAULT_INFO_TYPE && t >= BIT(FAULT_MAX)) - return -EINVAL; - if (a->struct_type == FAULT_INFO_RATE && t >= UINT_MAX) - return -EINVAL; + if (a->struct_type == FAULT_INFO_TYPE) { + if (f2fs_build_fault_attr(sbi, 0, t)) + return -EINVAL; + return count; + } + if (a->struct_type == FAULT_INFO_RATE) { + if (f2fs_build_fault_attr(sbi, t, 0)) + return -EINVAL; + return count; + } #endif if (a->struct_type == RESERVED_BLOCKS) { spi
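The range checks that the patch moves into f2fs_build_fault_attr() can be modelled in plain C roughly as follows. This is a userspace sketch, not the kernel code: struct sketch_fault_info, sketch_build_fault_attr() and SKETCH_FAULT_MAX are hypothetical, and the value 20 for the number of fault types is an arbitrary assumption.

```c
#include <errno.h>
#include <limits.h>

/* Assumed number of fault types; the real FAULT_MAX may differ. */
#define SKETCH_FAULT_MAX 20

/* Hypothetical stand-in for struct f2fs_fault_info. */
struct sketch_fault_info {
	int inject_rate;
	unsigned int inject_type;
};

/*
 * Mirrors the order of checks in the patch: a non-zero rate must fit in
 * an int, a non-zero type must stay below BIT(FAULT_MAX), and passing
 * zero for both clears the settings.
 */
static int sketch_build_fault_attr(struct sketch_fault_info *ffi,
				   unsigned long rate, unsigned long type)
{
	if (rate) {
		if (rate > INT_MAX)
			return -EINVAL;
		ffi->inject_rate = (int)rate;
	}
	if (type) {
		if (type >= (1UL << SKETCH_FAULT_MAX))
			return -EINVAL;
		ffi->inject_type = (unsigned int)type;
	}
	if (!rate && !type) {
		ffi->inject_rate = 0;
		ffi->inject_type = 0;
	}
	return 0;
}
```

Having one validating helper means both the mount-option path and the sysfs path reject the same out-of-range values, which is the cleanup the second bullet of the commit message describes.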
[f2fs-dev] [PATCH v2 1/3] f2fs: fix to release node block count in error path of f2fs_new_node_page()
f2fs_new_node_page() missed calling dec_valid_node_count() to release the
node block count in its error path, fix it.

Fixes: 141170b759e0 ("f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page()")
Signed-off-by: Chao Yu
---
v2:
- avoid compile warning if CONFIG_F2FS_CHECK_FS is off.
 fs/f2fs/node.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index b3de6d6cdb02..7df5ad84cb5e 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1319,6 +1319,7 @@ struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs)
 	}
 	if (unlikely(new_ni.blk_addr != NULL_ADDR)) {
 		err = -EFSCORRUPTED;
+		dec_valid_node_count(sbi, dn->inode, !ofs);
 		set_sbi_flag(sbi, SBI_NEED_FSCK);
 		f2fs_handle_error(sbi, ERROR_INVALID_BLKADDR);
 		goto fail;
@@ -1345,7 +1346,6 @@ struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs)
 	if (ofs == 0)
 		inc_valid_inode_count(sbi);
 	return page;
-
 fail:
 	clear_node_page_dirty(page);
 	f2fs_put_page(page, 1);
-- 
2.40.1

[f2fs-dev] [PATCH] f2fs: use f2fs_{err, info}_ratelimited() for cleanup
Commit b1c9d3f833ba ("f2fs: support printk_ratelimited() in f2fs_printk()") missed some cases, cover all remains for cleanup. Signed-off-by: Chao Yu --- fs/f2fs/compress.c | 54 +- fs/f2fs/segment.c | 5 ++--- 2 files changed, 26 insertions(+), 33 deletions(-) diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c index 8892c8262141..3c70a9697063 100644 --- a/fs/f2fs/compress.c +++ b/fs/f2fs/compress.c @@ -198,8 +198,8 @@ static int lzo_compress_pages(struct compress_ctx *cc) ret = lzo1x_1_compress(cc->rbuf, cc->rlen, cc->cbuf->cdata, >clen, cc->private); if (ret != LZO_E_OK) { - printk_ratelimited("%sF2FS-fs (%s): lzo compress failed, ret:%d\n", - KERN_ERR, F2FS_I_SB(cc->inode)->sb->s_id, ret); + f2fs_err_ratelimited(F2FS_I_SB(cc->inode), + "lzo compress failed, ret:%d", ret); return -EIO; } return 0; @@ -212,17 +212,15 @@ static int lzo_decompress_pages(struct decompress_io_ctx *dic) ret = lzo1x_decompress_safe(dic->cbuf->cdata, dic->clen, dic->rbuf, >rlen); if (ret != LZO_E_OK) { - printk_ratelimited("%sF2FS-fs (%s): lzo decompress failed, ret:%d\n", - KERN_ERR, F2FS_I_SB(dic->inode)->sb->s_id, ret); + f2fs_err_ratelimited(F2FS_I_SB(dic->inode), + "lzo decompress failed, ret:%d", ret); return -EIO; } if (dic->rlen != PAGE_SIZE << dic->log_cluster_size) { - printk_ratelimited("%sF2FS-fs (%s): lzo invalid rlen:%zu, " - "expected:%lu\n", KERN_ERR, - F2FS_I_SB(dic->inode)->sb->s_id, - dic->rlen, - PAGE_SIZE << dic->log_cluster_size); + f2fs_err_ratelimited(F2FS_I_SB(dic->inode), + "lzo invalid rlen:%zu, expected:%lu", + dic->rlen, PAGE_SIZE << dic->log_cluster_size); return -EIO; } return 0; @@ -294,16 +292,15 @@ static int lz4_decompress_pages(struct decompress_io_ctx *dic) ret = LZ4_decompress_safe(dic->cbuf->cdata, dic->rbuf, dic->clen, dic->rlen); if (ret < 0) { - printk_ratelimited("%sF2FS-fs (%s): lz4 decompress failed, ret:%d\n", - KERN_ERR, F2FS_I_SB(dic->inode)->sb->s_id, ret); + f2fs_err_ratelimited(F2FS_I_SB(dic->inode), + "lz4 decompress failed, ret:%d", 
ret); return -EIO; } if (ret != PAGE_SIZE << dic->log_cluster_size) { - printk_ratelimited("%sF2FS-fs (%s): lz4 invalid ret:%d, " - "expected:%lu\n", KERN_ERR, - F2FS_I_SB(dic->inode)->sb->s_id, ret, - PAGE_SIZE << dic->log_cluster_size); + f2fs_err_ratelimited(F2FS_I_SB(dic->inode), + "lz4 invalid ret:%d, expected:%lu", + ret, PAGE_SIZE << dic->log_cluster_size); return -EIO; } return 0; @@ -350,9 +347,8 @@ static int zstd_init_compress_ctx(struct compress_ctx *cc) stream = zstd_init_cstream(, 0, workspace, workspace_size); if (!stream) { - printk_ratelimited("%sF2FS-fs (%s): %s zstd_init_cstream failed\n", - KERN_ERR, F2FS_I_SB(cc->inode)->sb->s_id, - __func__); + f2fs_err_ratelimited(F2FS_I_SB(cc->inode), + "%s zstd_init_cstream failed", __func__); kvfree(workspace); return -EIO; } @@ -390,16 +386,16 @@ static int zstd_compress_pages(struct compress_ctx *cc) ret = zstd_compress_stream(stream, , ); if (zstd_is_error(ret)) { - printk_ratelimited("%sF2FS-fs (%s): %s zstd_compress_stream failed, ret: %d\n", - KERN_ERR, F2FS_I_SB(cc->inode)->sb->s_id, + f2fs_err_ratelimited(F2FS_I_SB(cc->inode), + "%s zstd_compress_stream failed, ret: %d", __func__, zstd_get_error_code(ret));
[f2fs-dev] [PATCH] f2fs: check validation of fault attrs in f2fs_build_fault_attr()
- It missed to check validation of fault attrs in parse_options(), let's fix to add check condition in f2fs_build_fault_attr(). - Use f2fs_build_fault_attr() in __sbi_store() to clean up code. Signed-off-by: Chao Yu --- fs/f2fs/f2fs.h | 12 fs/f2fs/super.c | 27 --- fs/f2fs/sysfs.c | 14 ++ 3 files changed, 38 insertions(+), 15 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 95a40d4f778f..b03d75e4eedc 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -72,7 +72,7 @@ enum { struct f2fs_fault_info { atomic_t inject_ops; - unsigned int inject_rate; + int inject_rate; unsigned int inject_type; }; @@ -4597,10 +4597,14 @@ static inline bool f2fs_need_verity(const struct inode *inode, pgoff_t idx) } #ifdef CONFIG_F2FS_FAULT_INJECTION -extern void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate, - unsigned int type); +extern int f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned long rate, + unsigned long type); #else -#define f2fs_build_fault_attr(sbi, rate, type) do { } while (0) +int f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned long rate, + unsigned long type) +{ + return 0; +} #endif static inline bool is_journalled_quota(struct f2fs_sb_info *sbi) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index a4bc26dfdb1a..94918ae7eddb 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -66,21 +66,31 @@ const char *f2fs_fault_name[FAULT_MAX] = { [FAULT_NO_SEGMENT] = "no free segment", }; -void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate, - unsigned int type) +int f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned long rate, + unsigned long type) { struct f2fs_fault_info *ffi = _OPTION(sbi).fault_info; if (rate) { + if (rate > INT_MAX) + return -EINVAL; atomic_set(>inject_ops, 0); - ffi->inject_rate = rate; + ffi->inject_rate = (int)rate; } - if (type) - ffi->inject_type = type; + if (type) { + if (type >= BIT(FAULT_MAX)) + return -EINVAL; + ffi->inject_type = (unsigned int)type; + } if (!rate && 
!type) memset(ffi, 0, sizeof(struct f2fs_fault_info)); + else + f2fs_info(sbi, + "build fault injection attr: rate: %lu, type: 0x%lx", + rate, type); + return 0; } #endif @@ -886,14 +896,17 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount) case Opt_fault_injection: if (args->from && match_int(args, )) return -EINVAL; - f2fs_build_fault_attr(sbi, arg, F2FS_ALL_FAULT_TYPE); + if (f2fs_build_fault_attr(sbi, arg, + F2FS_ALL_FAULT_TYPE)) + return -EINVAL; set_opt(sbi, FAULT_INJECTION); break; case Opt_fault_type: if (args->from && match_int(args, )) return -EINVAL; - f2fs_build_fault_attr(sbi, 0, arg); + if (f2fs_build_fault_attr(sbi, 0, arg)) + return -EINVAL; set_opt(sbi, FAULT_INJECTION); break; #else diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c index a568ce96cf56..7aa3844e7a80 100644 --- a/fs/f2fs/sysfs.c +++ b/fs/f2fs/sysfs.c @@ -484,10 +484,16 @@ static ssize_t __sbi_store(struct f2fs_attr *a, if (ret < 0) return ret; #ifdef CONFIG_F2FS_FAULT_INJECTION - if (a->struct_type == FAULT_INFO_TYPE && t >= BIT(FAULT_MAX)) - return -EINVAL; - if (a->struct_type == FAULT_INFO_RATE && t >= UINT_MAX) - return -EINVAL; + if (a->struct_type == FAULT_INFO_TYPE) { + if (f2fs_build_fault_attr(sbi, 0, t)) + return -EINVAL; + return count; + } + if (a->struct_type == FAULT_INFO_RATE) { + if (f2fs_build_fault_attr(sbi, t, 0)) + return -EINVAL; + return count; + } #endif if (a->struct_type == RESERVED_BLOCKS) { spin_lock(>stat_lock); -- 2.40.1
[f2fs-dev] [PATCH 2/2] f2fs: fix to limit gc_pin_file_threshold
The types of f2fs_inode.i_gc_failures, f2fs_inode_info.i_gc_failures, and
f2fs_sb_info.gc_pin_file_threshold are __le16, unsigned int, and u64
respectively, so truncation can happen during comparison and persistence.

Unify the type of these three variables to unsigned short, and add an
upper boundary limitation for gc_pin_file_threshold.

Signed-off-by: Chao Yu
---
 Documentation/ABI/testing/sysfs-fs-f2fs |  2 +-
 fs/f2fs/f2fs.h                          |  4 ++--
 fs/f2fs/file.c                          | 11 ++++++-----
 fs/f2fs/gc.h                            |  1 +
 fs/f2fs/sysfs.c                         |  7 +++++++
 5 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
index 1a4d83953379..cad6c3dc1f9c 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -331,7 +331,7 @@ Date: January 2018
 Contact:	Jaegeuk Kim
 Description:	This indicates how many GC can be failed for the pinned
 		file. If it exceeds this, F2FS doesn't guarantee its pinning
-		state. 2048 trials is set by default.
+		state. 2048 trials is set by default, and 65535 as maximum.
 
 What:		/sys/fs/f2fs//extension_list
 Date:		February 2018
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 400ff8e1abe0..3dff45cd6cde 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -813,7 +813,7 @@ struct f2fs_inode_info {
 	unsigned char i_dir_level;	/* use for dentry level for large dir */
 	union {
 		unsigned int i_current_depth;	/* only for directory depth */
-		unsigned int i_gc_failures;	/* for gc failure statistic */
+		unsigned short i_gc_failures;	/* for gc failure statistic */
 	};
 	unsigned int i_pino;		/* parent inode number */
 	umode_t i_acl_mode;		/* keep file acl mode temporarily */
@@ -1672,7 +1672,7 @@ struct f2fs_sb_info {
 	unsigned long long skipped_gc_rwsem;		/* FG_GC only */
 
 	/* threshold for gc trials on pinned files */
-	u64 gc_pin_file_threshold;
+	unsigned short gc_pin_file_threshold;
 	struct f2fs_rwsem pin_sem;
 
 	/* maximum # of trials to find a victim segment for SSR and GC */
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 200cafc75dce..1b1b08923f7d 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3194,16 +3194,17 @@ int f2fs_pin_file_control(struct inode *inode, bool inc)
 	struct f2fs_inode_info *fi = F2FS_I(inode);
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 
-	/* Use i_gc_failures for normal file as a risk signal. */
-	if (inc)
-		f2fs_i_gc_failures_write(inode, fi->i_gc_failures + 1);
-
-	if (fi->i_gc_failures > sbi->gc_pin_file_threshold) {
+	if (fi->i_gc_failures >= sbi->gc_pin_file_threshold) {
 		f2fs_warn(sbi, "%s: Enable GC = ino %lx after %x GC trials",
 			  __func__, inode->i_ino, fi->i_gc_failures);
 		clear_inode_flag(inode, FI_PIN_FILE);
 		return -EAGAIN;
 	}
+
+	/* Use i_gc_failures for normal file as a risk signal. */
+	if (inc)
+		f2fs_i_gc_failures_write(inode, fi->i_gc_failures + 1);
+
 	return 0;
 }
 
diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
index 9c0d06c4d19a..a8ea3301b815 100644
--- a/fs/f2fs/gc.h
+++ b/fs/f2fs/gc.h
@@ -26,6 +26,7 @@
 #define LIMIT_FREE_BLOCK	40 /* percentage over invalid + free space */
 
 #define DEF_GC_FAILED_PINNED_FILES	2048
+#define MAX_GC_FAILED_PINNED_FILES	USHRT_MAX
 
 /* Search max. number of dirty segments to select a victim segment */
 #define DEF_MAX_VICTIM_SEARCH 4096 /* covers 8GB */
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index 7aa3844e7a80..09d3ecfaa4f1 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -681,6 +681,13 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
 		return count;
 	}
 
+	if (!strcmp(a->attr.name, "gc_pin_file_threshold")) {
+		if (t > MAX_GC_FAILED_PINNED_FILES)
+			return -EINVAL;
+		sbi->gc_pin_file_threshold = t;
+		return count;
+	}
+
 	if (!strcmp(a->attr.name, "gc_reclaimed_segments")) {
 		if (t != 0)
 			return -EINVAL;
-- 
2.40.1

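The truncation the commit message refers to can be illustrated with a small userspace model. The helpers below are hypothetical (sketch_persist() stands in for storing the in-memory counter into the 16-bit on-disk field, sketch_threshold_valid() for the new sysfs bound); they only show why a counter or threshold above 65535 cannot be represented once it passes through a 16-bit field.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of writing an in-memory failure counter into a 16-bit on-disk
 * field: bits above the low 16 are silently dropped.
 */
static uint16_t sketch_persist(unsigned int in_memory_failures)
{
	return (uint16_t)in_memory_failures;
}

/* Hypothetical bound check mirroring the upper limit added for sysfs. */
static bool sketch_threshold_valid(uint64_t t)
{
	return t <= UINT16_MAX;
}
```

For example, a counter of 65536 would read back as 0 after persistence, so a threshold larger than 65535 could never be reached reliably; bounding the threshold avoids that situation.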
[f2fs-dev] [PATCH 1/2] f2fs: remove unused GC_FAILURE_PIN
After commit 3db1de0e582c ("f2fs: change the current atomic write way"), we removed all GC_FAILURE_ATOMIC usage, let's change i_gc_failures[] array to i_pin_failure for cleanup. Meanwhile, let's define i_current_depth and i_gc_failures as union variable due to they won't be valid at the same time. Signed-off-by: Chao Yu --- fs/f2fs/f2fs.h | 14 +- fs/f2fs/file.c | 12 +--- fs/f2fs/inode.c| 6 ++ fs/f2fs/recovery.c | 3 +-- 4 files changed, 13 insertions(+), 22 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index b03d75e4eedc..400ff8e1abe0 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -765,11 +765,6 @@ enum { #define DEF_DIR_LEVEL 0 -enum { - GC_FAILURE_PIN, - MAX_GC_FAILURE -}; - /* used for f2fs_inode_info->flags */ enum { FI_NEW_INODE, /* indicate newly allocated inode */ @@ -816,9 +811,10 @@ struct f2fs_inode_info { unsigned long i_flags; /* keep an inode flags for ioctl */ unsigned char i_advise; /* use to give file attribute hints */ unsigned char i_dir_level; /* use for dentry level for large dir */ - unsigned int i_current_depth; /* only for directory depth */ - /* for gc failure statistic */ - unsigned int i_gc_failures[MAX_GC_FAILURE]; + union { + unsigned int i_current_depth; /* only for directory depth */ + unsigned int i_gc_failures; /* for gc failure statistic */ + }; unsigned int i_pino;/* parent inode number */ umode_t i_acl_mode; /* keep file acl mode temporarily */ @@ -3133,7 +3129,7 @@ static inline void f2fs_i_depth_write(struct inode *inode, unsigned int depth) static inline void f2fs_i_gc_failures_write(struct inode *inode, unsigned int count) { - F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN] = count; + F2FS_I(inode)->i_gc_failures = count; f2fs_mark_inode_dirty_sync(inode, true); } diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index ac9d6380e433..200cafc75dce 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -3196,13 +3196,11 @@ int f2fs_pin_file_control(struct inode *inode, bool inc) /* Use i_gc_failures for normal file as a risk 
signal. */ if (inc) - f2fs_i_gc_failures_write(inode, - fi->i_gc_failures[GC_FAILURE_PIN] + 1); + f2fs_i_gc_failures_write(inode, fi->i_gc_failures + 1); - if (fi->i_gc_failures[GC_FAILURE_PIN] > sbi->gc_pin_file_threshold) { + if (fi->i_gc_failures > sbi->gc_pin_file_threshold) { f2fs_warn(sbi, "%s: Enable GC = ino %lx after %x GC trials", - __func__, inode->i_ino, - fi->i_gc_failures[GC_FAILURE_PIN]); + __func__, inode->i_ino, fi->i_gc_failures); clear_inode_flag(inode, FI_PIN_FILE); return -EAGAIN; } @@ -3266,7 +3264,7 @@ static int f2fs_ioc_set_pin_file(struct file *filp, unsigned long arg) } set_inode_flag(inode, FI_PIN_FILE); - ret = F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN]; + ret = F2FS_I(inode)->i_gc_failures; done: f2fs_update_time(sbi, REQ_TIME); out: @@ -3281,7 +3279,7 @@ static int f2fs_ioc_get_pin_file(struct file *filp, unsigned long arg) __u32 pin = 0; if (is_inode_flag_set(inode, FI_PIN_FILE)) - pin = F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN]; + pin = F2FS_I(inode)->i_gc_failures; return put_user(pin, (u32 __user *)arg); } diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index 1423cd27a477..9a8c2b63f56d 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -408,8 +408,7 @@ static int do_read_inode(struct inode *inode) if (S_ISDIR(inode->i_mode)) fi->i_current_depth = le32_to_cpu(ri->i_current_depth); else if (S_ISREG(inode->i_mode)) - fi->i_gc_failures[GC_FAILURE_PIN] = - le16_to_cpu(ri->i_gc_failures); + fi->i_gc_failures = le16_to_cpu(ri->i_gc_failures); fi->i_xattr_nid = le32_to_cpu(ri->i_xattr_nid); fi->i_flags = le32_to_cpu(ri->i_flags); if (S_ISREG(inode->i_mode)) @@ -679,8 +678,7 @@ void f2fs_update_inode(struct inode *inode, struct page *node_page) ri->i_current_depth = cpu_to_le32(F2FS_I(inode)->i_current_depth); else if (S_ISREG(inode->i_mode)) - ri->i_gc_failures = - cpu_to_le16(F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN]); + ri->i_gc_failures = cpu_to_le16(F2FS_I(inode)->i_gc_failures); ri->i_xattr_nid = cpu_to_le32(F2FS_I(in
[f2fs-dev] [PATCH 1/5] f2fs: compress: fix to update i_compr_blocks correctly
Previously, we accounted both reserved blocks and compressed blocks into
@compr_blocks, so f2fs_i_compr_blocks_update(,compr_blocks) would update
i_compr_blocks incorrectly, fix it.

Meanwhile, for the case where all blocks in the cluster were reserved, fix
to update dn->ofs_in_node correctly.

Fixes: eb8fbaa53374 ("f2fs: compress: fix to check unreleased compressed cluster")
Signed-off-by: Chao Yu
---
 fs/f2fs/file.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 1761ad125f97..6c84485687d3 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3641,7 +3641,8 @@ static int reserve_compress_blocks(struct dnode_of_data *dn, pgoff_t count,
 
 	while (count) {
 		int compr_blocks = 0;
-		blkcnt_t reserved;
+		blkcnt_t reserved = 0;
+		blkcnt_t to_reserved;
 		int ret;
 
 		for (i = 0; i < cluster_size; i++) {
@@ -3661,20 +3662,26 @@ static int reserve_compress_blocks(struct dnode_of_data *dn, pgoff_t count,
 			 * fails in release_compress_blocks(), so NEW_ADDR
 			 * is a possible case.
 			 */
-			if (blkaddr == NEW_ADDR ||
-				__is_valid_data_blkaddr(blkaddr)) {
+			if (blkaddr == NEW_ADDR) {
+				reserved++;
+				continue;
+			}
+			if (__is_valid_data_blkaddr(blkaddr)) {
 				compr_blocks++;
 				continue;
 			}
 		}
 
-		reserved = cluster_size - compr_blocks;
+		to_reserved = cluster_size - compr_blocks - reserved;
 
 		/* for the case all blocks in cluster were reserved */
-		if (reserved == 1)
+		if (to_reserved == 1) {
+			dn->ofs_in_node += cluster_size;
 			goto next;
+		}
 
-		ret = inc_valid_block_count(sbi, dn->inode, &reserved, false);
+		ret = inc_valid_block_count(sbi, dn->inode,
+						&to_reserved, false);
 		if (unlikely(ret))
 			return ret;
 
@@ -3685,7 +3692,7 @@ static int reserve_compress_blocks(struct dnode_of_data *dn, pgoff_t count,
 
 		f2fs_i_compr_blocks_update(dn->inode, compr_blocks, true);
 
-		*reserved_blocks += reserved;
+		*reserved_blocks += to_reserved;
 next:
 		count -= cluster_size;
 	}
-- 
2.40.1

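The counting change in this patch can be shown with a userspace sketch of one cluster scan. Everything here is illustrative: the sk_ names are hypothetical, the sentinel values are assumptions chosen to resemble f2fs's NULL_ADDR, NEW_ADDR and COMPRESS_ADDR, and any other non-zero address is simply treated as valid compressed data.

```c
#include <stdint.h>

/* Sentinel block addresses; values are assumptions for this sketch. */
#define SK_NULL_ADDR     0u
#define SK_NEW_ADDR      0xFFFFFFFFu
#define SK_COMPRESS_ADDR 0xFFFFFFFEu

struct sk_cluster_count {
	unsigned int compr_blocks;	/* slots holding compressed data */
	unsigned int reserved;		/* slots already reserved (NEW_ADDR) */
	unsigned int to_reserve;	/* cluster_size - compr_blocks - reserved */
};

/*
 * Scan one cluster the way the patched loop does: NEW_ADDR slots count as
 * reserved and other non-null slots count as compressed blocks. Slot 0
 * holds the cluster marker and is skipped by the scan, so it stays in
 * to_reserve (which is why a fully reserved cluster yields 1).
 */
static struct sk_cluster_count sk_count_cluster(const uint32_t *blkaddr,
						unsigned int cluster_size)
{
	struct sk_cluster_count c = { 0, 0, 0 };
	unsigned int i;

	for (i = 1; i < cluster_size; i++) {
		if (blkaddr[i] == SK_NEW_ADDR)
			c.reserved++;
		else if (blkaddr[i] != SK_NULL_ADDR)
			c.compr_blocks++;
	}
	c.to_reserve = cluster_size - c.compr_blocks - c.reserved;
	return c;
}
```

For a cluster of { marker, data, NEW_ADDR, NULL_ADDR } this gives compr_blocks = 1 and reserved = 1, whereas the pre-patch logic would have counted the NEW_ADDR slot into compr_blocks as well, which is the miscount the commit message describes.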
[f2fs-dev] [PATCH 2/5] f2fs: compress: fix error path of inc_valid_block_count()
If inc_valid_block_count() cannot allocate all requested blocks, it needs
to release the block count in .total_valid_block_count and the reservation
blocks in the inode.

Fixes: 54607494875e ("f2fs: compress: fix to avoid inconsistence bewteen i_blocks and dnode")
Signed-off-by: Chao Yu
---
 fs/f2fs/f2fs.h | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c876813b5532..95a40d4f778f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2309,7 +2309,7 @@ static inline void f2fs_i_blocks_write(struct inode *, block_t, bool, bool);
 static inline int inc_valid_block_count(struct f2fs_sb_info *sbi,
 				 struct inode *inode, blkcnt_t *count, bool partial)
 {
-	blkcnt_t diff = 0, release = 0;
+	long long diff = 0, release = 0;
 	block_t avail_user_block_count;
 	int ret;
 
@@ -2329,26 +2329,27 @@ static inline int inc_valid_block_count(struct f2fs_sb_info *sbi,
 	percpu_counter_add(&sbi->alloc_valid_block_count, (*count));
 
 	spin_lock(&sbi->stat_lock);
-	sbi->total_valid_block_count += (block_t)(*count);
-	avail_user_block_count = get_available_block_count(sbi, inode, true);
 
-	if (unlikely(sbi->total_valid_block_count > avail_user_block_count)) {
+	avail_user_block_count = get_available_block_count(sbi, inode, true);
+	diff = (long long)sbi->total_valid_block_count + *count -
+						avail_user_block_count;
+	if (unlikely(diff > 0)) {
 		if (!partial) {
 			spin_unlock(&sbi->stat_lock);
+			release = *count;
 			goto enospc;
 		}
-
-		diff = sbi->total_valid_block_count - avail_user_block_count;
 		if (diff > *count)
 			diff = *count;
 		*count -= diff;
 		release = diff;
-		sbi->total_valid_block_count -= diff;
 		if (!*count) {
 			spin_unlock(&sbi->stat_lock);
 			goto enospc;
 		}
 	}
+	sbi->total_valid_block_count += (block_t)(*count);
+
 	spin_unlock(&sbi->stat_lock);
 
 	if (unlikely(release)) {
-- 
2.40.1

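The reordered accounting can be modelled in userspace roughly as follows. This is a sketch under assumptions, not the kernel function: struct sk_sb and sk_inc_valid_block_count() are hypothetical, locking and the per-inode bookkeeping are left out, and the amount to give back is returned through an extra release parameter purely for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the superblock counters used by the patch. */
struct sk_sb {
	long long total_valid;	/* models .total_valid_block_count */
	long long avail;	/* models the available user block count */
};

/*
 * Compute the overshoot first and only then commit the granted amount.
 * On failure, *release reports how many requested blocks must be given
 * back by the caller; total_valid is left untouched in that case.
 */
static int sk_inc_valid_block_count(struct sk_sb *sb, long long *count,
				    bool partial, long long *release)
{
	long long diff = sb->total_valid + *count - sb->avail;

	*release = 0;
	if (diff > 0) {
		if (!partial) {
			*release = *count;
			return -ENOSPC;
		}
		if (diff > *count)
			diff = *count;
		*count -= diff;
		*release = diff;
		if (!*count)
			return -ENOSPC;
	}
	sb->total_valid += *count;
	return 0;
}
```

Because the counter is only increased after the checks, a failed or partially granted request never leaves total_valid above what was actually granted, and the non-partial failure path now reports the full request as the amount to release.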
[f2fs-dev] [PATCH 5/5] f2fs: compress: don't allow unaligned truncation on released compress inode
f2fs image may be corrupted after the testcase below:
- mkfs.f2fs -O extra_attr,compression -f /dev/vdb
- mount /dev/vdb /mnt/f2fs
- touch /mnt/f2fs/file
- f2fs_io setflags compression /mnt/f2fs/file
- dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=4
- f2fs_io release_cblocks /mnt/f2fs/file
- truncate -s 8192 /mnt/f2fs/file
- umount /mnt/f2fs
- fsck.f2fs /dev/vdb

[ASSERT] (fsck_chk_inode_blk:1256) --> ino: 0x5 has i_blocks: 0x0002, but has 0x3 blocks
[FSCK] valid_block_count matching with CP [Fail] [0x4, 0x5]
[FSCK] other corrupted bugs [Fail]

The reason is: partial truncation assumes the compressed inode has reserved
blocks; after partial truncation, the valid block count may change w/
.i_blocks and .total_valid_block_count update, resulting in corruption.

This patch only allows cluster-size-aligned truncation on a released
compress inode to fix this.

Fixes: c61404153eb6 ("f2fs: introduce FI_COMPRESS_RELEASED instead of using IMMUTABLE bit")
Signed-off-by: Chao Yu
---
 fs/f2fs/file.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 3f0db351e976..ac9d6380e433 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -952,9 +952,14 @@ int f2fs_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
 				  ATTR_GID | ATTR_TIMES_SET))))
 		return -EPERM;
 
-	if ((attr->ia_valid & ATTR_SIZE) &&
-		!f2fs_is_compress_backend_ready(inode))
-		return -EOPNOTSUPP;
+	if ((attr->ia_valid & ATTR_SIZE)) {
+		if (!f2fs_is_compress_backend_ready(inode))
+			return -EOPNOTSUPP;
+		if (is_inode_flag_set(inode, FI_COMPRESS_RELEASED) &&
+			(attr->ia_size %
+			F2FS_BLK_TO_BYTES(F2FS_I(inode)->i_cluster_size)))
+			return -EINVAL;
+	}
 
 	err = setattr_prepare(idmap, dentry, attr);
 	if (err)
-- 
2.40.1

[f2fs-dev] [PATCH 3/5] f2fs: compress: fix typo in f2fs_reserve_compress_blocks()
s/released/reserved.

Signed-off-by: Chao Yu
---
 fs/f2fs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 6c84485687d3..e77e958a9f92 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3785,7 +3785,7 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg)
 	} else if (reserved_blocks &&
 			atomic_read(&F2FS_I(inode)->i_compr_blocks)) {
 		set_sbi_flag(sbi, SBI_NEED_FSCK);
-		f2fs_warn(sbi, "%s: partial blocks were released i_ino=%lx "
+		f2fs_warn(sbi, "%s: partial blocks were reserved i_ino=%lx "
 			"iblocks=%llu, reserved=%u, compr_blocks=%u, "
 			"run fsck to fix.",
 			__func__, inode->i_ino, inode->i_blocks,
-- 
2.40.1

[f2fs-dev] [PATCH 4/5] f2fs: compress: fix to cover {reserve, release}_compress_blocks() w/ cp_rwsem lock
{reserve,release}_compress_blocks() need to be covered w/ cp_rwsem lock to
avoid racing with checkpoint, otherwise, filesystem metadata including
blkaddr in dnode, inode fields and .total_valid_block_count may be
corrupted after an SPO (sudden power-off) case.

Fixes: ef8d563f184e ("f2fs: introduce F2FS_IOC_RELEASE_COMPRESS_BLOCKS")
Fixes: c75488fb4d82 ("f2fs: introduce F2FS_IOC_RESERVE_COMPRESS_BLOCKS")
Signed-off-by: Chao Yu
---
 fs/f2fs/file.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index e77e958a9f92..3f0db351e976 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3570,9 +3570,12 @@ static int f2fs_release_compress_blocks(struct file *filp, unsigned long arg)
 		struct dnode_of_data dn;
 		pgoff_t end_offset, count;
 
+		f2fs_lock_op(sbi);
+
 		set_new_dnode(&dn, inode, NULL, NULL, 0);
 		ret = f2fs_get_dnode_of_data(&dn, page_idx, LOOKUP_NODE);
 		if (ret) {
+			f2fs_unlock_op(sbi);
 			if (ret == -ENOENT) {
 				page_idx = f2fs_get_next_page_offset(&dn,
 								page_idx);
@@ -3590,6 +3593,8 @@ static int f2fs_release_compress_blocks(struct file *filp, unsigned long arg)
 
 		f2fs_put_dnode(&dn);
 
+		f2fs_unlock_op(sbi);
+
 		if (ret < 0)
 			break;
 
@@ -3742,9 +3747,12 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg)
 		struct dnode_of_data dn;
 		pgoff_t end_offset, count;
 
+		f2fs_lock_op(sbi);
+
 		set_new_dnode(&dn, inode, NULL, NULL, 0);
 		ret = f2fs_get_dnode_of_data(&dn, page_idx, LOOKUP_NODE);
 		if (ret) {
+			f2fs_unlock_op(sbi);
 			if (ret == -ENOENT) {
 				page_idx = f2fs_get_next_page_offset(&dn,
 								page_idx);
@@ -3762,6 +3770,8 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg)
 
 		f2fs_put_dnode(&dn);
 
+		f2fs_unlock_op(sbi);
+
 		if (ret < 0)
 			break;
 
-- 
2.40.1

[f2fs-dev] [PATCH 2/3] f2fs: fix to add missing iput() in gc_data_segment()
During gc_data_segment(), if the inode state is abnormal, it missed calling
iput(), fix it.

Fixes: 132e3209789c ("f2fs: remove false alarm on iget failure during GC")
Fixes: 9056d6489f5a ("f2fs: fix to do sanity check on inode type during garbage collection")
Signed-off-by: Chao Yu
---
 fs/f2fs/gc.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 8852814dab7f..e86c7f01539a 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1554,10 +1554,15 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 			int err;
 
 			inode = f2fs_iget(sb, dni.ino);
-			if (IS_ERR(inode) || is_bad_inode(inode) ||
-					special_file(inode->i_mode))
+			if (IS_ERR(inode))
 				continue;
 
+			if (is_bad_inode(inode) ||
+					special_file(inode->i_mode)) {
+				iput(inode);
+				continue;
+			}
+
 			err = f2fs_gc_pinned_control(inode, gc_type, segno);
 			if (err == -EAGAIN) {
 				iput(inode);
-- 
2.40.1

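The leak fixed here follows a common pattern: a lookup that succeeds takes a reference, so every early "continue" after a successful lookup has to drop it. A minimal userspace model, with hypothetical sk_ names and a plain integer in place of the real inode reference count, might look like this:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode with a reference count. */
struct sk_inode {
	int refcount;
	bool bad;	/* models is_bad_inode() / special_file() */
};

/* Models a lookup: returns NULL on failure, otherwise takes a reference. */
static struct sk_inode *sk_iget(struct sk_inode *inode)
{
	if (!inode)
		return NULL;
	inode->refcount++;
	return inode;
}

static void sk_iput(struct sk_inode *inode)
{
	inode->refcount--;
}

/* Returns how many inodes were processed; no reference may be left over. */
static int sk_gc_pass(struct sk_inode **table, int n)
{
	int processed = 0;
	int k;

	for (k = 0; k < n; k++) {
		struct sk_inode *inode = sk_iget(table[k]);

		if (!inode)
			continue;		/* lookup failed: nothing to drop */
		if (inode->bad) {
			sk_iput(inode);		/* drop the reference before skipping */
			continue;
		}
		processed++;
		sk_iput(inode);
	}
	return processed;
}
```

Splitting the failed-lookup case from the "looked up but unusable" case is what lets the second case release its reference, as the patch does with iput().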
[f2fs-dev] [PATCH 3/3] f2fs: fix to do sanity check on i_nid for inline_data inode
syzbot reports a f2fs bug as below: [ cut here ] kernel BUG at fs/f2fs/inline.c:258! CPU: 1 PID: 34 Comm: kworker/u8:2 Not tainted 6.9.0-rc6-syzkaller-00012-g9e4bc4bcae01 #0 RIP: 0010:f2fs_write_inline_data+0x781/0x790 fs/f2fs/inline.c:258 Call Trace: f2fs_write_single_data_page+0xb65/0x1d60 fs/f2fs/data.c:2834 f2fs_write_cache_pages fs/f2fs/data.c:3133 [inline] __f2fs_write_data_pages fs/f2fs/data.c:3288 [inline] f2fs_write_data_pages+0x1efe/0x3a90 fs/f2fs/data.c:3315 do_writepages+0x35b/0x870 mm/page-writeback.c:2612 __writeback_single_inode+0x165/0x10b0 fs/fs-writeback.c:1650 writeback_sb_inodes+0x905/0x1260 fs/fs-writeback.c:1941 wb_writeback+0x457/0xce0 fs/fs-writeback.c:2117 wb_do_writeback fs/fs-writeback.c:2264 [inline] wb_workfn+0x410/0x1090 fs/fs-writeback.c:2304 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa12/0x17c0 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f2/0x390 kernel/kthread.c:388 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 The root cause is: inline_data inode can be fuzzed, so that there may be valid blkaddr in its direct node, once f2fs triggers background GC to migrate the block, it will hit f2fs_bug_on() during dirty page writeback. Let's add sanity check on i_nid field for inline_data inode, meanwhile, forbid to migrate inline_data inode's data block to fix this issue. 
Reported-by: syzbot+848062ba19c8782ca...@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/d103ce06174d7...@google.com
Signed-off-by: Chao Yu
---
 fs/f2fs/f2fs.h   | 2 +-
 fs/f2fs/gc.c     | 6 ++
 fs/f2fs/inline.c | 17 -
 fs/f2fs/inode.c  | 2 +-
 4 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index fced2b7652f4..c876813b5532 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4146,7 +4146,7 @@ extern struct kmem_cache *f2fs_inode_entry_slab;
  * inline.c
  */
 bool f2fs_may_inline_data(struct inode *inode);
-bool f2fs_sanity_check_inline_data(struct inode *inode);
+bool f2fs_sanity_check_inline_data(struct inode *inode, struct page *ipage);
 bool f2fs_may_inline_dentry(struct inode *inode);
 void f2fs_do_read_inline_data(struct page *page, struct page *ipage);
 void f2fs_truncate_inline_inode(struct inode *inode,
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index e86c7f01539a..041957750478 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1563,6 +1563,12 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
 				continue;
 			}
 
+			if (f2fs_has_inline_data(inode)) {
+				iput(inode);
+				set_sbi_flag(sbi, SBI_NEED_FSCK);
+				continue;
+			}
+
 			err = f2fs_gc_pinned_control(inode, gc_type, segno);
 			if (err == -EAGAIN) {
 				iput(inode);
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index ac00423f117b..067600fed3d4 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -33,11 +33,26 @@ bool f2fs_may_inline_data(struct inode *inode)
 	return !f2fs_post_read_required(inode);
 }
 
-bool f2fs_sanity_check_inline_data(struct inode *inode)
+static bool has_node_blocks(struct inode *inode, struct page *ipage)
+{
+	struct f2fs_inode *ri = F2FS_INODE(ipage);
+	int i;
+
+	for (i = 0; i < DEF_NIDS_PER_INODE; i++) {
+		if (ri->i_nid[i])
+			return true;
+	}
+	return false;
+}
+
+bool f2fs_sanity_check_inline_data(struct inode *inode, struct page *ipage)
 {
 	if (!f2fs_has_inline_data(inode))
 		return false;
 
+	if (has_node_blocks(inode, ipage))
+		return false;
+
 	if (!support_inline_data(inode))
 		return true;
 
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index c26effdce9aa..1423cd27a477 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -343,7 +343,7 @@ static bool sanity_check_inode(struct inode *inode, struct page *node_page)
 		}
 	}
 
-	if (f2fs_sanity_check_inline_data(inode)) {
+	if (f2fs_sanity_check_inline_data(inode, node_page)) {
 		f2fs_warn(sbi, "%s: inode (ino=%lx, mode=%u) should not have inline_data, run fsck to fix",
 			  __func__, inode->i_ino, inode->i_mode);
 		return false;
-- 
2.40.1
[f2fs-dev] [PATCH 1/3] f2fs: fix to release node block count in error path of f2fs_new_node_page()
dec_valid_node_count() is not called to release the node block count in one error path. Fix it.

Fixes: 141170b759e0 ("f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page()")
Signed-off-by: Chao Yu
---
 fs/f2fs/node.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index b3de6d6cdb02..ae39971825bc 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1313,15 +1313,14 @@ struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs)
 
 #ifdef CONFIG_F2FS_CHECK_FS
 	err = f2fs_get_node_info(sbi, dn->nid, &new_ni, false);
-	if (err) {
-		dec_valid_node_count(sbi, dn->inode, !ofs);
-		goto fail;
-	}
+	if (err)
+		goto out_dec;
+
 	if (unlikely(new_ni.blk_addr != NULL_ADDR)) {
 		err = -EFSCORRUPTED;
 		set_sbi_flag(sbi, SBI_NEED_FSCK);
 		f2fs_handle_error(sbi, ERROR_INVALID_BLKADDR);
-		goto fail;
+		goto out_dec;
 	}
 #endif
 	new_ni.nid = dn->nid;
@@ -1345,7 +1344,8 @@ struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs)
 	if (ofs == 0)
 		inc_valid_inode_count(sbi);
 	return page;
-
+out_dec:
+	dec_valid_node_count(sbi, dn->inode, !ofs);
 fail:
 	clear_node_page_dirty(page);
 	f2fs_put_page(page, 1);
-- 
2.40.1
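The two labels undo the setup steps in reverse order, so a failure after the node count has been raised jumps to out_dec, and an earlier failure jumps straight to fail. A minimal user-space sketch of that shape, assuming hypothetical counters in `mock_sbi` and a hypothetical `new_node()` in place of the real function:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for the accounting the kernel keeps. */
struct mock_sbi {
	int valid_node_count;	/* raised by the inc_valid_node_count() step */
	int pages_held;		/* raised when the page is grabbed */
};

/*
 * fail_early: fail before the node count is raised (jump to fail).
 * fail_late:  fail after the node count is raised (jump to out_dec).
 * Returns 0 on success, a negative value on failure.
 */
static int new_node(struct mock_sbi *sbi, bool fail_early, bool fail_late)
{
	int err;

	sbi->pages_held++;		/* step 1: grab the page */
	if (fail_early) {
		err = -1;
		goto fail;
	}

	sbi->valid_node_count++;	/* step 2: account for the node block */
	if (fail_late) {
		err = -2;
		goto out_dec;
	}

	return 0;			/* success keeps both resources */

out_dec:
	sbi->valid_node_count--;	/* undo step 2 */
fail:
	sbi->pages_held--;		/* undo step 1 */
	return err;
}
```

Falling through from out_dec into fail is what keeps each cleanup step written only once, which is the design choice the patch makes.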
Re: [f2fs-dev] [PATCH] mkfs.f2fs: align each device to zone size
On 2024/4/10 20:38, Sheng Yong wrote:
> For multiple devices, each device should be aligned to the zone size,
> instead of aligning the total size.
>
> Signed-off-by: Sheng Yong

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH v2] f2fs: fix block migration when section is not aligned to pow2
On 2024/4/29 11:51, Wu Bo wrote:
> For zoned UFS, the f2fs section size is forced to the zone size, and the
> zone size may not be aligned to a power of 2.
>
> Fixes: 859fca6b706e ("f2fs: swap: support migrating swapfile in aligned write mode")
> Signed-off-by: Liao Yuanhong
> Signed-off-by: Wu Bo

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH] f2fs:remove the restriction on zone sector being align to pow2
On 2024/4/28 19:14, Liao Yuanhong wrote:
> For zoned-UFS, sector size may not be aligned to pow2, so we need to remove
> the pow2 limitation.
>
> Signed-off-by: Liao Yuanhong
> ---
>  drivers/md/dm-table.c | 4
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 41f1d731ae5a..823f2f6a2d53 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c

Hi, please discuss this in dm-de...@lists.linux.dev, thanks.

Thanks,

> @@ -1663,10 +1663,6 @@ static int validate_hardware_zoned(struct dm_table *t, bool zoned,
>  		return -EINVAL;
>  	}
>
> -	/* Check zone size validity and compatibility */
> -	if (!zone_sectors || !is_power_of_2(zone_sectors))
> -		return -EINVAL;
> -
>  	if (dm_table_any_dev_attr(t, device_not_matches_zone_sectors, &zone_sectors)) {
>  		DMERR("%s: zone sectors is not consistent across all zoned devices",
>  		      dm_device_name(t->md));
Re: [f2fs-dev] [PATCH 3/3] f2fs: fix false alarm on invalid block address
On 2024/4/28 9:23, Daeho Jeong wrote:
> I have a question. Is it okay for META_GENERIC?

It seems all users of META_GENERIC come from IO paths:

a) f2fs_merge_page_bio
b) f2fs_submit_page_bio
c) f2fs_submit_page_write
 - verify_fio_blkaddr

They are all impossible cases, right? So is it fine to record the error for this case?

Thanks,
Re: [f2fs-dev] [PATCH v2 3/8] f2fs: drop usage of page_index
On 2024/4/24 6:58, Matthew Wilcox wrote:
> On Wed, Apr 24, 2024 at 01:03:34AM +0800, Kairui Song wrote:
> > @@ -4086,8 +4086,7 @@ void f2fs_clear_page_cache_dirty_tag(struct page *page)
> >  	unsigned long flags;
> >
> >  	xa_lock_irqsave(&mapping->i_pages, flags);
> > -	__xa_clear_mark(&mapping->i_pages, page_index(page),
> > -						PAGECACHE_TAG_DIRTY);
> > +	__xa_clear_mark(&mapping->i_pages, page->index, PAGECACHE_TAG_DIRTY);
> >  	xa_unlock_irqrestore(&mapping->i_pages, flags);
> >  }
>
> I just sent a patch which is going to conflict with this:
>
> https://lore.kernel.org/linux-mm/20240423225552.4113447-3-wi...@infradead.org/
>
> Chao Yu, Jaegeuk Kim; what are your plans for converting f2fs to use
> folios? This is getting quite urgent.

Hi Matthew,

I've converted .read_folio and .readahead of f2fs to use folios with the patchset below, and I'll take a look at how to support and enable large folios...

https://lore.kernel.org/linux-f2fs-devel/20240422062417.2421616-1-c...@kernel.org/

Thanks,
[f2fs-dev] [PATCH v2] f2fs: zone: fix to don't trigger OPU on pinfile for direct IO
Otherwise, it breaks pinfile's semantics.

Cc: Daeho Jeong
Signed-off-by: Chao Yu
---
v2:
- fix to disallow OPU on pinfile no matter what device type f2fs uses.
 fs/f2fs/data.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index d8e4434e8801..56600dd43834 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1595,8 +1595,9 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
 	}
 
 	/* use out-place-update for direct IO under LFS mode */
-	if (map->m_may_create &&
-	    (is_hole || (f2fs_lfs_mode(sbi) && flag == F2FS_GET_BLOCK_DIO))) {
+	if (map->m_may_create && (is_hole ||
+		(flag == F2FS_GET_BLOCK_DIO && f2fs_lfs_mode(sbi) &&
+		!f2fs_is_pinned_file(inode)))) {
 		if (unlikely(f2fs_cp_error(sbi))) {
 			err = -EIO;
 			goto sync_out;
-- 
2.40.1
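The v2 condition can be read as a plain boolean predicate. The sketch below restates it with hypothetical flag parameters instead of the kernel helpers, purely to make the truth table easy to check; it is not the kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical restatement of the v2 condition:
 *   may_create - the mapping request is allowed to allocate blocks
 *   is_hole    - the target block is not allocated yet
 *   is_dio     - the request comes from the direct IO path
 *   lfs_mode   - the filesystem runs in LFS mode
 *   pinned     - the file is a pinned file
 */
static bool wants_new_block(bool may_create, bool is_hole, bool is_dio,
			    bool lfs_mode, bool pinned)
{
	return may_create &&
	       (is_hole || (is_dio && lfs_mode && !pinned));
}

/* A few spot checks of the truth table. */
static void check_examples(void)
{
	/* Direct IO overwrite in LFS mode: out-of-place update. */
	assert(wants_new_block(true, false, true, true, false));
	/* Same request on a pinned file: stays in place. */
	assert(!wants_new_block(true, false, true, true, true));
	/* A hole still needs a block, pinned or not. */
	assert(wants_new_block(true, true, true, true, true));
}
```

Note that the pinned check only guards the direct IO branch; a hole is still filled even for a pinned file, which matches the patch.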
Re: [f2fs-dev] [PATCH] f2fs: fix block migration when section is not aligned to pow2
On 2024/4/26 18:41, Wu Bo wrote:
> For zoned UFS, the f2fs section size is forced to the zone size, and the
> zone size may not be aligned to a power of 2.
>
> Fixes: 859fca6b706e ("f2fs: swap: support migrating swapfile in aligned write mode")
> Signed-off-by: Liao Yuanhong
> Signed-off-by: Wu Bo

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH] f2fs: zone: fix to don't trigger OPU on pinfile for direct IO
On 2024/4/26 19:30, Zhiguo Niu wrote:
> Dear Chao,
>
> On Fri, Apr 26, 2024 at 6:37 PM Chao Yu wrote:
> >
> > Otherwise, it breaks pinfile's semantics.
> >
> > Cc: Daeho Jeong
> > Signed-off-by: Chao Yu
> > ---
> >  fs/f2fs/data.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index bee1e45f76b8..e29000d83d52 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -1596,7 +1596,8 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
> >
> >  	/* use out-place-update for direct IO under LFS mode */
> >  	if (map->m_may_create &&
> > -	    (is_hole || (f2fs_lfs_mode(sbi) && flag == F2FS_GET_BLOCK_DIO))) {
> > +	    (is_hole || (flag == F2FS_GET_BLOCK_DIO && (f2fs_lfs_mode(sbi) &&
> > +	    (!f2fs_sb_has_blkzoned(sbi) || !f2fs_is_pinned_file(inode)))))) {
>
> Excuse me, I have a small question: should pinned files not be written in OPU
> mode regardless of device type (conventional or zoned)?

Agreed, so it looks like we need to remove the !f2fs_sb_has_blkzoned condition here...

Thanks,

> thanks!
>
> >  		if (unlikely(f2fs_cp_error(sbi))) {
> >  			err = -EIO;
> >  			goto sync_out;
> > --
> > 2.40.1
Re: [f2fs-dev] [PATCH] f2fs: zone: fix to don't trigger OPU on pinfile for direct IO
On 2024/4/26 22:14, Daeho Jeong wrote:
> On Fri, Apr 26, 2024 at 3:35 AM Chao Yu wrote:
> >
> > Otherwise, it breaks pinfile's semantics.
> >
> > Cc: Daeho Jeong
> > Signed-off-by: Chao Yu
> > ---
> >  fs/f2fs/data.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index bee1e45f76b8..e29000d83d52 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -1596,7 +1596,8 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
> >
> >  	/* use out-place-update for direct IO under LFS mode */
> >  	if (map->m_may_create &&
> > -	    (is_hole || (f2fs_lfs_mode(sbi) && flag == F2FS_GET_BLOCK_DIO))) {
> > +	    (is_hole || (flag == F2FS_GET_BLOCK_DIO && (f2fs_lfs_mode(sbi) &&
> > +	    (!f2fs_sb_has_blkzoned(sbi) || !f2fs_is_pinned_file(inode)))))) {
> >  		if (unlikely(f2fs_cp_error(sbi))) {
> >  			err = -EIO;
> >  			goto sync_out;
> > --
> > 2.40.1
>
> So, we block overwrite io for the pinfile here.

I guess you mean we blocked append write for pinfile, right?

> static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
> {
> ...
> 	if (f2fs_is_pinned_file(inode) &&
> 	    !f2fs_overwrite_io(inode, pos, count)) {

If !f2fs_overwrite_io() is true, it means it may trigger append write on pinfile?

Thanks,

> 		ret = -EIO;
> 		goto out_unlock;
> 	}
Re: [f2fs-dev] [syzbot] [f2fs?] KASAN: slab-out-of-bounds Read in f2fs_get_node_info
#syz test git://git.kernel.org/pub/scm/linux/kernel/git/chao/linux.git bugfix/syzbot On 2024/4/25 15:59, syzbot wrote: Hello, syzbot found the following issue on: HEAD commit:ed30a4a51bb1 Linux 6.9-rc5 git tree: upstream console+strace: https://syzkaller.appspot.com/x/log.txt?x=1116bc3098 kernel config: https://syzkaller.appspot.com/x/.config?x=5a05c230e142f2bc dashboard link: https://syzkaller.appspot.com/bug?extid=3694e283cf5c40df6d14 compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1128486b18 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1516bc3098 Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/7a2e1a02882c/disk-ed30a4a5.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/329966999344/vmlinux-ed30a4a5.xz kernel image: https://storage.googleapis.com/syzbot-assets/1befbdf4dcac/bzImage-ed30a4a5.xz mounted in repro: https://storage.googleapis.com/syzbot-assets/42ddf2738cf7/mount_0.gz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+3694e283cf5c40df6...@syzkaller.appspotmail.com F2FS-fs (loop0): Mounted with checkpoint version = 48b305e4 == BUG: KASAN: slab-out-of-bounds in f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline] BUG: KASAN: slab-out-of-bounds in current_nat_addr fs/f2fs/node.h:213 [inline] BUG: KASAN: slab-out-of-bounds in f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600 Read of size 1 at addr 88807a58c76c by task syz-executor280/5076 CPU: 1 PID: 5076 Comm: syz-executor280 Not tainted 6.9.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 f2fs_test_bit fs/f2fs/f2fs.h:2933 
[inline] current_nat_addr fs/f2fs/node.h:213 [inline] f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600 f2fs_xattr_fiemap fs/f2fs/data.c:1848 [inline] f2fs_fiemap+0x55d/0x1ee0 fs/f2fs/data.c:1925 ioctl_fiemap fs/ioctl.c:220 [inline] do_vfs_ioctl+0x1c07/0x2e50 fs/ioctl.c:838 __do_sys_ioctl fs/ioctl.c:902 [inline] __se_sys_ioctl+0x81/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f60d34ae739 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 61 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:7ffc9f2f1148 EFLAGS: 0246 ORIG_RAX: 0010 RAX: ffda RBX: 7ffc9f2f1318 RCX: 7f60d34ae739 RDX: 2040 RSI: c020660b RDI: 0004 RBP: 7f60d3527610 R08: R09: 7ffc9f2f1318 R10: 551a R11: 0246 R12: 0001 R13: 7ffc9f2f1308 R14: 0001 R15: 0001 Allocated by task 5076: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:370 [inline] __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387 kasan_kmalloc include/linux/kasan.h:211 [inline] __do_kmalloc_node mm/slub.c:3966 [inline] __kmalloc_node_track_caller+0x24e/0x4e0 mm/slub.c:3986 kmemdup+0x2a/0x60 mm/util.c:131 init_node_manager fs/f2fs/node.c:3268 [inline] f2fs_build_node_manager+0x8cc/0x2870 fs/f2fs/node.c:3329 f2fs_fill_super+0x583c/0x8120 fs/f2fs/super.c:4540 mount_bdev+0x20a/0x2d0 fs/super.c:1658 legacy_get_tree+0xee/0x190 fs/fs_context.c:662 vfs_get_tree+0x90/0x2a0 fs/super.c:1779 do_new_mount+0x2be/0xb40 fs/namespace.c:3352 do_mount fs/namespace.c:3692 [inline] __do_sys_mount fs/namespace.c:3898 [inline] __se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3875 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The buggy address belongs to the 
object at 88807a58c700 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 44 bytes to the right of allocated 64-byte region [88807a58c700, 88807a58c740) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping: index:0x0 pfn:0x7a58c flags: 0xfff8000800(slab|node=0|zone=1|lastcpupid=0xfff) page_type: 0x() raw: 00fff8000800 888015041640 eaaa6400 dead0004 raw: 00200020 0001 page dumped because: kasan: bad access detected page_owner tracks the page as allocated
[f2fs-dev] [PATCH] f2fs: zone: fix to don't trigger OPU on pinfile for direct IO
Otherwise, it breaks pinfile's semantics.

Cc: Daeho Jeong
Signed-off-by: Chao Yu
---
 fs/f2fs/data.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index bee1e45f76b8..e29000d83d52 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1596,7 +1596,8 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map, int flag)
 
 	/* use out-place-update for direct IO under LFS mode */
 	if (map->m_may_create &&
-	    (is_hole || (f2fs_lfs_mode(sbi) && flag == F2FS_GET_BLOCK_DIO))) {
+	    (is_hole || (flag == F2FS_GET_BLOCK_DIO && (f2fs_lfs_mode(sbi) &&
+	    (!f2fs_sb_has_blkzoned(sbi) || !f2fs_is_pinned_file(inode)))))) {
 		if (unlikely(f2fs_cp_error(sbi))) {
 			err = -EIO;
 			goto sync_out;
-- 
2.40.1
Re: [f2fs-dev] [PATCH] f2fs: remove redundant parameter in is_next_segment_free()
On 2024/4/25 22:55, Yifan Zhao wrote:
> is_next_segment_free() takes a redundant `type` parameter. Remove it.
>
> Signed-off-by: Yifan Zhao

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH 2/2] f2fs: remove unnecessary block size check in init_f2fs_fs()
On 2024/4/16 19:12, Zhiguo Niu wrote:
> On Tue, Apr 16, 2024 at 3:22 PM Chao Yu wrote:
> >
> > After commit d7e9a9037de2 ("f2fs: Support Block Size == Page Size"),
> > F2FS_BLKSIZE equals PAGE_SIZE, so remove the unnecessary check.
> >
> > Signed-off-by: Chao Yu
> > ---
> >  fs/f2fs/super.c | 6 --
> >  1 file changed, 6 deletions(-)
> >
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index 6d1e4fc629e2..32aa6d6fa871 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -4933,12 +4933,6 @@ static int __init init_f2fs_fs(void)
> >  {
> >  	int err;
> >
> > -	if (PAGE_SIZE != F2FS_BLKSIZE) {
> > -		printk("F2FS not supported on PAGE_SIZE(%lu) != BLOCK_SIZE(%lu)\n",
> > -				PAGE_SIZE, F2FS_BLKSIZE);
> > -		return -EINVAL;
> > -	}
> > -
> >  	err = init_inodecache();
> >  	if (err)
> >  		goto fail;
>
> Dear Chao,
>
> Can you help modify the following comments together with this patch?
> They are also related to commit d7e9a9037de2 ("f2fs: Support Block Size == Page Size").
> If you think there is a more suitable description, please modify it directly.

Zhiguo, I missed replying to this. I guess you can update "f2fs: fix some ambiguous comments".

> thanks!
>
> diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
> index a357287..241e7b18 100644
> --- a/include/linux/f2fs_fs.h
> +++ b/include/linux/f2fs_fs.h
> @@ -394,7 +394,8 @@ struct f2fs_nat_block {
>  /*
>   * F2FS uses 4 bytes to represent block address. As a result, supported size of
> - * disk is 16 TB and it equals to 16 * 1024 * 1024 / 2 segments.
> + * disk is 16 TB for a 4K page size and 64 TB for a 16K page size and it equals

disk is 16 TB for 4K size block and 64 TB for 16K size block and it equals to (1 << 32) / 512 segments.

#define F2FS_MAX_SEGMENT	((1 << 32) / 512)

Thanks,

> + * to 16 * 1024 * 1024 / 2 segments.
>   */
>  #define F2FS_MAX_SEGMENT ((16 * 1024 * 1024) / 2)
>
> @@ -424,8 +425,10 @@ struct f2fs_sit_block {
>  /*
>   * For segment summary
>   *
> - * One summary block contains exactly 512 summary entries, which represents
> - * exactly one segment by default. Not allow to change the basic units.
> + * One summary block with 4KB size contains exactly 512 summary entries, which
> + * represents exactly one segment with 2MB size.
> + * Similarly, in the case of 16k block size, it represents one segment with 8MB size.
> + * Not allow to change the basic units.
>   *
>   * NOTE: For initializing fields, you must use set_summary
>   *
> @@ -556,6 +559,7 @@ struct f2fs_summary_block {
>
>  /*
>   * space utilization of regular dentry and inline dentry (w/o extra reservation)
> + * when block size is 4KB.
> --
> 2.40.1
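The numbers in the comments under discussion follow from two constants: a 4-byte block address (2^32 addressable blocks) and 512 blocks per segment. The sketch below works through that arithmetic in user space; the macro and function names are hypothetical. It uses 1ULL rather than a plain 1 because shifting a 32-bit int by 32 bits is undefined in C, so the literal `(1 << 32)` from the suggestion would need a wider type if it were used as code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the two constants the comments rely on. */
#define MOCK_MAX_BLOCKS		(1ULL << 32)	/* 4-byte block address */
#define MOCK_BLOCKS_PER_SEG	512ULL		/* one summary block covers 512 blocks */

/* Both spellings of the segment limit give the same value (2^23). */
static_assert(MOCK_MAX_BLOCKS / MOCK_BLOCKS_PER_SEG == (16ULL * 1024 * 1024) / 2,
	      "(1 << 32) / 512 equals 16 * 1024 * 1024 / 2");

/* Largest addressable volume, in bytes, for a given block size. */
static uint64_t max_disk_bytes(uint64_t block_size)
{
	return MOCK_MAX_BLOCKS * block_size;
}

/* Size of one segment, in bytes, for a given block size. */
static uint64_t segment_bytes(uint64_t block_size)
{
	return MOCK_BLOCKS_PER_SEG * block_size;
}
```

With a 4 KiB block this gives 16 TiB and 2 MiB segments; with a 16 KiB block it gives 64 TiB and 8 MiB segments, while the segment count limit stays the same.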
[f2fs-dev] [PATCH] f2fs: fix to avoid allocating WARM_DATA segment for direct IO
If active_logs is not 6, the WARM_DATA segment is never used, so avoid allocating a WARM_DATA segment for direct IO.

Signed-off-by: Yunlei He
Signed-off-by: Chao Yu
---
 fs/f2fs/data.c    | 3 ++-
 fs/f2fs/f2fs.h    | 2 +-
 fs/f2fs/file.c    | 5 +++--
 fs/f2fs/segment.c | 11 +--
 4 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index bee1e45f76b8..0c516c653f05 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -4179,7 +4179,8 @@ static int f2fs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	map.m_lblk = bytes_to_blks(inode, offset);
 	map.m_len = bytes_to_blks(inode, offset + length - 1) - map.m_lblk + 1;
 	map.m_next_pgofs = &next_pgofs;
-	map.m_seg_type = f2fs_rw_hint_to_seg_type(inode->i_write_hint);
+	map.m_seg_type = f2fs_rw_hint_to_seg_type(F2FS_I_SB(inode),
+						inode->i_write_hint);
 	if (flags & IOMAP_WRITE)
 		map.m_may_create = true;
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e8ff301eaf32..6dd50a6075c0 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3747,7 +3747,7 @@ int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
 void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
 int __init f2fs_create_segment_manager_caches(void);
 void f2fs_destroy_segment_manager_caches(void);
-int f2fs_rw_hint_to_seg_type(enum rw_hint hint);
+int f2fs_rw_hint_to_seg_type(struct f2fs_sb_info *sbi, enum rw_hint hint);
 enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
 			enum page_type type, enum temp_type temp);
 unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 856a5d3bd6bf..23601d747716 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4643,7 +4643,8 @@ static int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *iter,
 
 	map.m_may_create = true;
 	if (dio) {
-		map.m_seg_type = f2fs_rw_hint_to_seg_type(inode->i_write_hint);
+		map.m_seg_type = f2fs_rw_hint_to_seg_type(sbi,
+						inode->i_write_hint);
 		flag = F2FS_GET_BLOCK_PRE_DIO;
 	} else {
 		map.m_seg_type = NO_CHECK_TYPE;
@@ -4696,7 +4697,7 @@ static void f2fs_dio_write_submit_io(const struct iomap_iter *iter,
 {
 	struct inode *inode = iter->inode;
 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
-	int seg_type = f2fs_rw_hint_to_seg_type(inode->i_write_hint);
+	int seg_type = f2fs_rw_hint_to_seg_type(sbi, inode->i_write_hint);
 	enum temp_type temp = f2fs_get_segment_temp(seg_type);
 
 	bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi, DATA, temp);
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 8313d6aeaf41..94f3380be04c 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3358,8 +3358,14 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range)
 	return err;
 }
 
-int f2fs_rw_hint_to_seg_type(enum rw_hint hint)
+int f2fs_rw_hint_to_seg_type(struct f2fs_sb_info *sbi, enum rw_hint hint)
 {
+	if (F2FS_OPTION(sbi).active_logs == 2)
+		return CURSEG_HOT_DATA;
+	else if (F2FS_OPTION(sbi).active_logs == 4)
+		return CURSEG_COLD_DATA;
+
+	/* active_log == 6 */
 	switch (hint) {
 	case WRITE_LIFE_SHORT:
 		return CURSEG_HOT_DATA;
@@ -3499,7 +3505,8 @@ static int __get_segment_type_6(struct f2fs_io_info *fio)
 				is_inode_flag_set(inode, FI_HOT_DATA) ||
 				f2fs_is_cow_file(inode))
 			return CURSEG_HOT_DATA;
-		return f2fs_rw_hint_to_seg_type(inode->i_write_hint);
+		return f2fs_rw_hint_to_seg_type(F2FS_I_SB(inode),
+						inode->i_write_hint);
 	} else {
 		if (IS_DNODE(fio->page))
 			return is_cold_node(fio->page) ? CURSEG_WARM_NODE :
-- 
2.40.1
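The effect of the new early returns can be sketched as a small mapping function. This is a user-space illustration with hypothetical enums (`mock_rw_hint`, `mock_seg_type`), not the kernel definitions. Only the WRITE_LIFE_SHORT arm of the switch is visible in the diff; the other two arms below are assumptions of this sketch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel enums; the values are illustrative. */
enum mock_rw_hint { HINT_NOT_SET, HINT_SHORT, HINT_MEDIUM, HINT_LONG, HINT_EXTREME };
enum mock_seg_type { SEG_HOT_DATA, SEG_WARM_DATA, SEG_COLD_DATA };

static enum mock_seg_type hint_to_seg_type(int active_logs, enum mock_rw_hint hint)
{
	/* The sketch only models the three log counts the patch mentions. */
	assert(active_logs == 2 || active_logs == 4 || active_logs == 6);

	/* With fewer than six logs, WARM_DATA is never chosen. */
	if (active_logs == 2)
		return SEG_HOT_DATA;
	if (active_logs == 4)
		return SEG_COLD_DATA;

	/* active_logs == 6: map the write-lifetime hint to a temperature. */
	switch (hint) {
	case HINT_SHORT:
		return SEG_HOT_DATA;
	case HINT_EXTREME:		/* assumed arm, not shown in the diff */
		return SEG_COLD_DATA;
	default:			/* assumed arm, not shown in the diff */
		return SEG_WARM_DATA;
	}
}
```

Passing the log count in explicitly mirrors the patch's choice to thread `sbi` through every caller instead of reading a global.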
Re: [f2fs-dev] [PATCH] f2fs: use helper to print zone condition
On 2024/4/23 19:27, Wu Bo wrote:
> To make the code cleaner, use blk_zone_cond_str() to print debug information.
>
> Signed-off-by: Wu Bo

Reviewed-by: Chao Yu

Thanks,
[f2fs-dev] [PATCH] f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode()
syzbot reports a kernel bug as below:

F2FS-fs (loop0): Mounted with checkpoint version = 48b305e4
==
BUG: KASAN: slab-out-of-bounds in f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline]
BUG: KASAN: slab-out-of-bounds in current_nat_addr fs/f2fs/node.h:213 [inline]
BUG: KASAN: slab-out-of-bounds in f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600
Read of size 1 at addr 88807a58c76c by task syz-executor280/5076

CPU: 1 PID: 5076 Comm: syz-executor280 Not tainted 6.9.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:488
 kasan_report+0x143/0x180 mm/kasan/report.c:601
 f2fs_test_bit fs/f2fs/f2fs.h:2933 [inline]
 current_nat_addr fs/f2fs/node.h:213 [inline]
 f2fs_get_node_info+0xece/0x1200 fs/f2fs/node.c:600
 f2fs_xattr_fiemap fs/f2fs/data.c:1848 [inline]
 f2fs_fiemap+0x55d/0x1ee0 fs/f2fs/data.c:1925
 ioctl_fiemap fs/ioctl.c:220 [inline]
 do_vfs_ioctl+0x1c07/0x2e50 fs/ioctl.c:838
 __do_sys_ioctl fs/ioctl.c:902 [inline]
 __se_sys_ioctl+0x81/0x170 fs/ioctl.c:890
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

The root cause is that f2fs_iget() does no sanity check on i_xattr_nid. As a result, in the fiemap() path, current_nat_addr() accesses nat_bitmap with an offset derived from the invalid i_xattr_nid, which triggers the KASAN report. Fix it.

Reported-by: syzbot+3694e283cf5c40df6...@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/94036c0616e72...@google.com
Signed-off-by: Chao Yu
---
 fs/f2fs/inode.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index d7a5a88a1a5e..7968b14d49f4 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -362,6 +362,12 @@ static bool sanity_check_inode(struct inode *inode, struct page *node_page)
 		return false;
 	}
 
+	if (fi->i_xattr_nid && f2fs_check_nid_range(sbi, fi->i_xattr_nid)) {
+		f2fs_warn(sbi, "%s: inode (ino=%lx) has corrupted i_xattr_nid: %u, run fsck to fix.",
+			  __func__, inode->i_ino, fi->i_xattr_nid);
+		return false;
+	}
+
 	return true;
 }
 
-- 
2.40.1
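The added check has two parts: a zero i_xattr_nid means the inode has no xattr node and is accepted, while any nonzero value must fall inside the valid nid range before it is used to index the NAT bitmap. A minimal user-space sketch of that shape follows; `xattr_nid_is_sane()` and its `root_ino` and `max_nid` bounds are hypothetical parameters chosen for illustration, and the exact bounds used by f2fs_check_nid_range() are not shown in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical range check: a nonzero nid is accepted only if it lies in
 * [root_ino, max_nid). Both bounds are assumptions of this sketch.
 */
static bool xattr_nid_is_sane(uint32_t xattr_nid, uint32_t root_ino, uint32_t max_nid)
{
	assert(root_ino < max_nid);

	if (xattr_nid == 0)
		return true;	/* no xattr node attached: nothing to validate */

	return xattr_nid >= root_ino && xattr_nid < max_nid;
}
```

Rejecting the inode at load time means later paths such as fiemap never see an out-of-range nid, which is the approach the patch takes.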
Re: [f2fs-dev] [PATCH 3/3] f2fs: fix false alarm on invalid block address
On 2024/4/19 18:27, Juhyung Park wrote:
On Sat, Apr 13, 2024 at 5:57 AM Jaegeuk Kim wrote:
On 04/11, Chao Yu wrote:
On 2024/4/10 4:34, Jaegeuk Kim wrote:
f2fs_ra_meta_pages can try to read ahead on invalid block address which is not the corruption case.

In which case we will read ahead invalid meta pages? recovery w/ META_POR?

In my case, it seems like it's META_SIT, and it's triggered right after mount.

Ah, I see, actually it hits at this case, thanks for the information.

Thanks,

fsck detects invalid_blkaddr, and when the kernel mounts it, it immediately flags invalid_blkaddr again:

[6.333498] init: [libfs_mgr] Running /system/bin/fsck.f2fs -a -c 1 --debug-cache /dev/block/sda13
[6.337671] fsck.f2fs: Info: Fix the reported corruption.
[6.337947] fsck.f2fs: Info: not exist /proc/version!
[6.338010] fsck.f2fs: Info: can't find /sys, assuming normal block device
[6.338294] fsck.f2fs: Info: MKFS version
[6.338319] fsck.f2fs: "5.10.160-android12-9-ge5cfec41c8e2"
[6.338366] fsck.f2fs: Info: FSCK version
[6.338380] fsck.f2fs: from "5.10-arter97"
[6.338393] fsck.f2fs: to "5.10-arter97"
[6.338414] fsck.f2fs: Info: superblock features = 1499 : encrypt verity extra_attr project_quota quota_ino casefold
[6.338429] fsck.f2fs: Info: superblock encrypt level = 0, salt =
[6.338442] fsck.f2fs: Info: checkpoint stop reason: shutdown(180)
[6.338455] fsck.f2fs: Info: fs errors: invalid_blkaddr
[6.338468] fsck.f2fs: Info: Segments per section = 1
[6.338480] fsck.f2fs: Info: Sections per zone = 1
[6.338492] fsck.f2fs: Info: total FS sectors = 58971571 (230357 MB)
[6.340599] fsck.f2fs: Info: CKPT version = 2b7e3b29
[6.340620] fsck.f2fs: Info: version timestamp cur: 19789296, prev: 18407008
[6.677041] fsck.f2fs: Info: checkpoint state = 46 : crc compacted_summary orphan_inodes sudden-power-off
[6.677052] fsck.f2fs: [FSCK] Check node 1 / 712937 (0.00%)
[8.997922] fsck.f2fs: [FSCK] Check node 71294 / 712937 (10.00%)
[ 10.629205] fsck.f2fs: [FSCK] Check node 142587 / 712937 (20.00%)
[ 12.278186] fsck.f2fs: [FSCK] Check node 213880 / 712937 (30.00%)
[ 13.768177] fsck.f2fs: [FSCK] Check node 285173 / 712937 (40.00%)
[ 17.446971] fsck.f2fs: [FSCK] Check node 356466 / 712937 (50.00%)
[ 19.891623] fsck.f2fs: [FSCK] Check node 427759 / 712937 (60.00%)
[ 23.251327] fsck.f2fs: [FSCK] Check node 499052 / 712937 (70.00%)
[ 28.493457] fsck.f2fs: [FSCK] Check node 570345 / 712937 (80.00%)
[ 29.640800] fsck.f2fs: [FSCK] Check node 641638 / 712937 (90.00%)
[ 30.718347] fsck.f2fs: [FSCK] Check node 712931 / 712937 (100.00%)
[ 30.724176] fsck.f2fs:
[ 30.737160] fsck.f2fs: [FSCK] Max image size: 167506 MB, Free space: 62850 MB
[ 30.737164] fsck.f2fs: [FSCK] Unreachable nat entries [Ok..] [0x0]
[ 30.737638] fsck.f2fs: [FSCK] SIT valid block bitmap checking [Ok..]
[ 30.737640] fsck.f2fs: [FSCK] Hard link checking for regular file [Ok..] [0xd]
[ 30.737641] fsck.f2fs: [FSCK] valid_block_count matching with CP [Ok..] [0x28b98e6]
[ 30.737644] fsck.f2fs: [FSCK] valid_node_count matching with CP (de lookup) [Ok..] [0xae0e9]
[ 30.737646] fsck.f2fs: [FSCK] valid_node_count matching with CP (nat lookup) [Ok..] [0xae0e9]
[ 30.737647] fsck.f2fs: [FSCK] valid_inode_count matched with CP [Ok..] [0xa74a3]
[ 30.737649] fsck.f2fs: [FSCK] free segment_count matched with CP [Ok..] [0x7aa3]
[ 30.737662] fsck.f2fs: [FSCK] next block offset is free [Ok..]
[ 30.737663] fsck.f2fs: [FSCK] fixing SIT types
[ 30.737867] fsck.f2fs: [FSCK] other corrupted bugs [Ok..]
[ 30.737893] fsck.f2fs: [update_superblock: 765] Info: Done to update superblock
[ 30.960610] fsck.f2fs:
[ 30.960618] fsck.f2fs: Done: 24.622956 secs
[ 30.960620] fsck.f2fs:
[ 30.960622] fsck.f2fs: c, u, RA, CH, CM, Repl=
[ 30.960627] fsck.f2fs: 1 1 43600517 42605434 995083 985083
[ 30.963274] F2FS-fs (sda13): Using encoding defined by superblock: utf8-12.1.0 with flags 0x0
[ 30.995360] __f2fs_is_valid_blkaddr: type=2
(Manually added that print ^)
[ 30.995369] [ cut here ]
[ 30.995375] WARNING: CPU: 7 PID: 1 at f2fs_handle_error+0x18/0x3c
[ 30.995378] CPU: 7 PID: 1 Comm: init Tainted: G S W 5.10.209-arter97-r15-kernelsu-g0867d0e4f1d2 #6
[ 30.995379] Hardware name: Qualcomm Technologies, Inc. Cape QRD with PM8010 (DT)
[ 30.995380] pstate: 2245 (nzCv daif +PAN -UAO +TCO BTYPE=--)
[ 30.995382] pc : f2fs_handle_error+0x18/0x3c
[ 30.995384] lr : __f2fs_is_valid_blkaddr+0x2a4/0x2b0
[ 30.995385] sp : ff80209e79b0
[ 30.995386] x29: ff80209e79b0 x28: 0037
[ 30.995388] x27: 01c7 x26: 20120121
[ 30.995389] x25: 00d9 x24:
[ 30.995390] x23: 00f1a700 x22: 0
Re: [f2fs-dev] [PATCH 2/3 v2] f2fs: clear writeback when compression failed
On 2024/4/17 0:49, Jaegeuk Kim wrote:
Let's stop issuing compressed writes and clear their writeback flags.

Signed-off-by: Jaegeuk Kim

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH] f2fs: fix false alarm on invalid block address
On 2024/4/25 1:35, Jaegeuk Kim wrote:
f2fs_ra_meta_pages can try to read ahead on invalid block address which is not the corruption case.

Cc: # v6.9+
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=218770
Fixes: 31f85ccc84b8 ("f2fs: unify the error handling of f2fs_is_valid_blkaddr")
Signed-off-by: Jaegeuk Kim

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH v2 1/2] f2fs: use per-log target_bitmap to improve lookup performace of ssr allocation
Jaegeuk, any comments for this series?

On 2024/4/11 16:23, Chao Yu wrote:
After commit 899fee36fac0 ("f2fs: fix to avoid data corruption by forbidding SSR overwrite"), the valid block bitmap of the currently opened segment is fixed, so let's introduce a per-log bitmap instead of the temp bitmap to avoid unnecessary calculation overhead whenever allocating a free slot w/ the SSR allocator.

Signed-off-by: Chao Yu
---
v2:
- rebase to last dev-test branch.
 fs/f2fs/segment.c | 30 ++
 fs/f2fs/segment.h | 1 +
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 6474b7338e81..af716925db19 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2840,31 +2840,39 @@ static int new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec)
 	return 0;
 }

-static int __next_free_blkoff(struct f2fs_sb_info *sbi,
-					int segno, block_t start)
+static void __get_segment_bitmap(struct f2fs_sb_info *sbi,
+					unsigned long *target_map,
+					int segno)
 {
 	struct seg_entry *se = get_seg_entry(sbi, segno);
 	int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long);
-	unsigned long *target_map = SIT_I(sbi)->tmp_map;
 	unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map;
 	unsigned long *cur_map = (unsigned long *)se->cur_valid_map;
 	int i;

 	for (i = 0; i < entries; i++)
 		target_map[i] = ckpt_map[i] | cur_map[i];
+}
+
+static int __next_free_blkoff(struct f2fs_sb_info *sbi, unsigned long *bitmap,
+					int segno, block_t start)
+{
+	__get_segment_bitmap(sbi, bitmap, segno);

-	return __find_rev_next_zero_bit(target_map, BLKS_PER_SEG(sbi), start);
+	return __find_rev_next_zero_bit(bitmap, BLKS_PER_SEG(sbi), start);
 }

 static int f2fs_find_next_ssr_block(struct f2fs_sb_info *sbi,
-		struct curseg_info *seg)
+					struct curseg_info *seg)
 {
-	return __next_free_blkoff(sbi, seg->segno, seg->next_blkoff + 1);
+	return __find_rev_next_zero_bit(seg->target_map,
+				BLKS_PER_SEG(sbi), seg->next_blkoff + 1);
 }

 bool f2fs_segment_has_free_slot(struct f2fs_sb_info *sbi, int segno)
 {
-	return __next_free_blkoff(sbi, segno, 0) < BLKS_PER_SEG(sbi);
+	return __next_free_blkoff(sbi, SIT_I(sbi)->tmp_map, segno, 0) <
+							BLKS_PER_SEG(sbi);
 }

 /*
@@ -2890,7 +2898,8 @@ static int change_curseg(struct f2fs_sb_info *sbi, int type)
 	reset_curseg(sbi, type, 1);
 	curseg->alloc_type = SSR;
-	curseg->next_blkoff = __next_free_blkoff(sbi, curseg->segno, 0);
+	curseg->next_blkoff = __next_free_blkoff(sbi, curseg->target_map,
+							curseg->segno, 0);

 	sum_page = f2fs_get_sum_page(sbi, new_segno);
 	if (IS_ERR(sum_page)) {
@@ -4635,6 +4644,10 @@ static int build_curseg(struct f2fs_sb_info *sbi)
 				sizeof(struct f2fs_journal), GFP_KERNEL);
 		if (!array[i].journal)
 			return -ENOMEM;
+		array[i].target_map = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE,
+								GFP_KERNEL);
+		if (!array[i].target_map)
+			return -ENOMEM;
 		if (i < NR_PERSISTENT_LOG)
 			array[i].seg_type = CURSEG_HOT_DATA + i;
 		else if (i == CURSEG_COLD_DATA_PINNED)
@@ -5453,6 +5466,7 @@ static void destroy_curseg(struct f2fs_sb_info *sbi)
 	for (i = 0; i < NR_CURSEG_TYPE; i++) {
 		kfree(array[i].sum_blk);
 		kfree(array[i].journal);
+		kfree(array[i].target_map);
 	}
 	kfree(array);
 }
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index e1c0f418aa11..10f3e44f036f 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -292,6 +292,7 @@ struct curseg_info {
 	struct f2fs_summary_block *sum_blk;	/* cached summary block */
 	struct rw_semaphore journal_rwsem;	/* protect journal area */
 	struct f2fs_journal *journal;		/* cached journal info */
+	unsigned long *target_map;		/* bitmap for SSR allocator */
 	unsigned char alloc_type;		/* current allocation type */
 	unsigned short seg_type;		/* segment type like CURSEG_XXX_TYPE */
 	unsigned int segno;			/* current segment number */
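
The idea in the patch above can be sketched in user space: build the per-log target map once as the OR of the checkpointed and current valid maps, then answer later "next free slot" queries by scanning only that cached map. This is an illustrative sketch under assumptions, not the kernel implementation. The toy_* names and TOY_BLKS_PER_SEG are hypothetical, and the bit order here is plain LSB-first rather than the reversed order implied by __find_rev_next_zero_bit().

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical segment size for this sketch. */
#define TOY_BLKS_PER_SEG 128u
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TOY_MAP_LONGS \
	((TOY_BLKS_PER_SEG + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* A block is unusable for SSR if it is valid in either map. */
static void toy_build_target_map(unsigned long *target,
				 const unsigned long *ckpt,
				 const unsigned long *cur)
{
	for (size_t i = 0; i < TOY_MAP_LONGS; i++)
		target[i] = ckpt[i] | cur[i];
}

/* Mark one block as used in a map. */
static void toy_set_bit(unsigned long *map, unsigned int bit)
{
	map[bit / TOY_BITS_PER_LONG] |= 1UL << (bit % TOY_BITS_PER_LONG);
}

/* First zero bit at or after start, or TOY_BLKS_PER_SEG if none is left. */
static unsigned int toy_find_next_zero_bit(const unsigned long *map,
					   unsigned int start)
{
	for (unsigned int bit = start; bit < TOY_BLKS_PER_SEG; bit++) {
		unsigned long word = map[bit / TOY_BITS_PER_LONG];

		if (!((word >> (bit % TOY_BITS_PER_LONG)) & 1UL))
			return bit;
	}
	return TOY_BLKS_PER_SEG;
}
```

The saving comes from calling the equivalent of toy_build_target_map() once when the log switches to a segment, instead of on every block allocation.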
Re: [f2fs-dev] [PATCH v4] f2fs: zone: don't block IO if there is remained open zone
On 2024/4/22 1:28, Juhyung Park wrote:
Hi Chao, a small nit.. :)

s/openned/opened/g

Juhyung, thanks for the report, I've fixed it in v5. :)

Thanks,

$ git grep openned v6.9-rc1 | wc -l
2
$ git grep opened v6.9-rc1 | wc -l
2130

On Thu, Apr 11, 2024 at 5:33 PM Chao Yu wrote:
max open zone may be larger than log header number of f2fs, for such case, it doesn't need to wait last IO in previous zone, let's introduce available_open_zone semaphore, and reduce it once we submit first write IO in a zone, and increase it after completion of last IO in the zone.

Cc: Daeho Jeong
Signed-off-by: Chao Yu
---
v4:
- avoid unneeded condition in f2fs_blkzoned_submit_merged_write().
 fs/f2fs/data.c    | 105 ++
 fs/f2fs/f2fs.h    | 34 ---
 fs/f2fs/iostat.c  | 7
 fs/f2fs/iostat.h  | 2 +
 fs/f2fs/segment.c | 43 ---
 fs/f2fs/segment.h | 12 +-
 fs/f2fs/super.c   | 2 +
 7 files changed, 156 insertions(+), 49 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 60056b9a51be..71472ab6b7e7 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -373,11 +373,10 @@ static void f2fs_write_end_io(struct bio *bio)
 #ifdef CONFIG_BLK_DEV_ZONED
 static void f2fs_zone_write_end_io(struct bio *bio)
 {
-	struct f2fs_bio_info *io = (struct f2fs_bio_info *)bio->bi_private;
+	struct f2fs_sb_info *sbi = iostat_get_bio_private(bio);

-	bio->bi_private = io->bi_private;
-	complete(&io->zone_wait);
 	f2fs_write_end_io(bio);
+	up(&sbi->available_open_zones);
 }
 #endif

@@ -531,6 +530,24 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
 	if (!io->bio)
 		return;

+#ifdef CONFIG_BLK_DEV_ZONED
+	if (io->open_zone) {
+		/*
+		 * if there is no open zone, it will wait for last IO in
+		 * previous zone before submitting new IO.
+		 */
+		down(&io->sbi->available_open_zones);
+		io->open_zone = false;
+		io->zone_openned = true;
+	}
+
+	if (io->close_zone) {
+		io->bio->bi_end_io = f2fs_zone_write_end_io;
+		io->zone_openned = false;
+		io->close_zone = false;
+	}
+#endif
+
 	if (is_read_io(fio->op)) {
 		trace_f2fs_prepare_read_bio(io->sbi->sb, fio->type, io->bio);
 		f2fs_submit_read_bio(io->sbi, io->bio, fio->type);
@@ -601,9 +618,9 @@ int f2fs_init_write_merge_io(struct f2fs_sb_info *sbi)
 			INIT_LIST_HEAD(&sbi->write_io[i][j].bio_list);
 			init_f2fs_rwsem(&sbi->write_io[i][j].bio_list_lock);
 #ifdef CONFIG_BLK_DEV_ZONED
-			init_completion(&sbi->write_io[i][j].zone_wait);
-			sbi->write_io[i][j].zone_pending_bio = NULL;
-			sbi->write_io[i][j].bi_private = NULL;
+			sbi->write_io[i][j].open_zone = false;
+			sbi->write_io[i][j].zone_openned = false;
+			sbi->write_io[i][j].close_zone = false;
 #endif
 		}
 	}
@@ -634,6 +651,31 @@ static void __f2fs_submit_merged_write(struct f2fs_sb_info *sbi,
 	f2fs_up_write(&io->io_rwsem);
 }

+void f2fs_blkzoned_submit_merged_write(struct f2fs_sb_info *sbi, int type)
+{
+#ifdef CONFIG_BLK_DEV_ZONED
+	struct f2fs_bio_info *io;
+
+	if (!f2fs_sb_has_blkzoned(sbi))
+		return;
+
+	io = sbi->write_io[PAGE_TYPE(type)] + type_to_temp(type);
+
+	f2fs_down_write(&io->io_rwsem);
+	if (io->zone_openned) {
+		if (io->bio) {
+			io->close_zone = true;
+			__submit_merged_bio(io);
+		} else {
+			up(&sbi->available_open_zones);
+			io->zone_openned = false;
+		}
+	}
+	f2fs_up_write(&io->io_rwsem);
+#endif
+
+}
+
 static void __submit_merged_write_cond(struct f2fs_sb_info *sbi,
 				struct inode *inode, struct page *page,
 				nid_t ino, enum page_type type, bool force)
@@ -918,22 +960,16 @@ int f2fs_merge_page_bio(struct f2fs_io_info *fio)
 }

 #ifdef CONFIG_BLK_DEV_ZONED
-static bool is_end_zone_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr)
+static bool is_blkaddr_zone_boundary(struct f2fs_sb_info *sbi,
+					block_t blkaddr, bool start)
 {
-	int devi = 0;
+	if (!f2fs_blkaddr_in_seqzone(sbi, blkaddr))
+		return false;
+
+	if (start)
+		return (blkaddr % sbi->blocks_per_blkz) == 0;
+	return (blkaddr % sbi->blocks_per_blkz == sbi->blocks_per_blkz - 1);

-	if (f2fs_is_multi_device(sbi)) {
-		devi = f2fs_targe
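
The zone-boundary helper in the patch above reduces to modular arithmetic on the block address. A minimal stand-alone sketch of just that arithmetic follows. The function name toy_is_zone_boundary is hypothetical, the zero-size guard is an addition for the sketch, and the sequential-zone membership test that the patch performs first is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * start == true:  is blkaddr the first block of its zone?
 * start == false: is blkaddr the last block of its zone?
 */
static bool toy_is_zone_boundary(uint32_t blkaddr, uint32_t blocks_per_zone,
				 bool start)
{
	if (blocks_per_zone == 0)
		return false;	/* guard added for the sketch only */

	if (start)
		return blkaddr % blocks_per_zone == 0;
	return blkaddr % blocks_per_zone == blocks_per_zone - 1;
}
```

For example, with 512 blocks per zone, block 1024 starts a zone and block 1023 ends the previous one.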
[f2fs-dev] [PATCH v5] f2fs: zone: don't block IO if there is remained open zone
The maximum number of open zones may be larger than the number of f2fs log headers. In that case there is no need to wait for the last IO in the previous zone. Introduce the available_open_zones semaphore: decrease it when the first write IO in a zone is submitted, and increase it after the last IO in the zone completes.

Cc: Daeho Jeong
Signed-off-by: Chao Yu
Reviewed-by: Daeho Jeong
---
v5:
- fix `openned` typo pointed out by Juhyung Park
 fs/f2fs/data.c    | 105 ++
 fs/f2fs/f2fs.h    | 31 +++---
 fs/f2fs/iostat.c  | 7
 fs/f2fs/iostat.h  | 2 +
 fs/f2fs/segment.c | 37 +++-
 fs/f2fs/segment.h | 3 +-
 fs/f2fs/super.c   | 2 +
 7 files changed, 143 insertions(+), 44 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index d01345af5f3e..657579358498 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -373,11 +373,10 @@ static void f2fs_write_end_io(struct bio *bio)
 #ifdef CONFIG_BLK_DEV_ZONED
 static void f2fs_zone_write_end_io(struct bio *bio)
 {
-	struct f2fs_bio_info *io = (struct f2fs_bio_info *)bio->bi_private;
+	struct f2fs_sb_info *sbi = iostat_get_bio_private(bio);

-	bio->bi_private = io->bi_private;
-	complete(&io->zone_wait);
 	f2fs_write_end_io(bio);
+	up(&sbi->available_open_zones);
 }
 #endif

@@ -533,6 +532,24 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
 	if (!io->bio)
 		return;

+#ifdef CONFIG_BLK_DEV_ZONED
+	if (io->open_zone) {
+		/*
+		 * if there is no open zone, it will wait for last IO in
+		 * previous zone before submitting new IO.
+		 */
+		down(&io->sbi->available_open_zones);
+		io->open_zone = false;
+		io->zone_opened = true;
+	}
+
+	if (io->close_zone) {
+		io->bio->bi_end_io = f2fs_zone_write_end_io;
+		io->zone_opened = false;
+		io->close_zone = false;
+	}
+#endif
+
 	if (is_read_io(fio->op)) {
 		trace_f2fs_prepare_read_bio(io->sbi->sb, fio->type, io->bio);
 		f2fs_submit_read_bio(io->sbi, io->bio, fio->type);
@@ -603,9 +620,9 @@ int f2fs_init_write_merge_io(struct f2fs_sb_info *sbi)
 			INIT_LIST_HEAD(&sbi->write_io[i][j].bio_list);
 			init_f2fs_rwsem(&sbi->write_io[i][j].bio_list_lock);
 #ifdef CONFIG_BLK_DEV_ZONED
-			init_completion(&sbi->write_io[i][j].zone_wait);
-			sbi->write_io[i][j].zone_pending_bio = NULL;
-			sbi->write_io[i][j].bi_private = NULL;
+			sbi->write_io[i][j].open_zone = false;
+			sbi->write_io[i][j].zone_opened = false;
+			sbi->write_io[i][j].close_zone = false;
 #endif
 		}
 	}
@@ -636,6 +653,31 @@ static void __f2fs_submit_merged_write(struct f2fs_sb_info *sbi,
 	f2fs_up_write(&io->io_rwsem);
 }

+void f2fs_blkzoned_submit_merged_write(struct f2fs_sb_info *sbi, int type)
+{
+#ifdef CONFIG_BLK_DEV_ZONED
+	struct f2fs_bio_info *io;
+
+	if (!f2fs_sb_has_blkzoned(sbi))
+		return;
+
+	io = sbi->write_io[PAGE_TYPE(type)] + type_to_temp(type);
+
+	f2fs_down_write(&io->io_rwsem);
+	if (io->zone_opened) {
+		if (io->bio) {
+			io->close_zone = true;
+			__submit_merged_bio(io);
+		} else {
+			up(&sbi->available_open_zones);
+			io->zone_opened = false;
+		}
+	}
+	f2fs_up_write(&io->io_rwsem);
+#endif
+
+}
+
 static void __submit_merged_write_cond(struct f2fs_sb_info *sbi,
 				struct inode *inode, struct page *page,
 				nid_t ino, enum page_type type, bool force)
@@ -920,22 +962,16 @@ int f2fs_merge_page_bio(struct f2fs_io_info *fio)
 }

 #ifdef CONFIG_BLK_DEV_ZONED
-static bool is_end_zone_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr)
+static bool is_blkaddr_zone_boundary(struct f2fs_sb_info *sbi,
+					block_t blkaddr, bool start)
 {
-	int devi = 0;
+	if (!f2fs_blkaddr_in_seqzone(sbi, blkaddr))
+		return false;
+
+	if (start)
+		return (blkaddr % sbi->blocks_per_blkz) == 0;
+	return (blkaddr % sbi->blocks_per_blkz == sbi->blocks_per_blkz - 1);

-	if (f2fs_is_multi_device(sbi)) {
-		devi = f2fs_target_device_index(sbi, blkaddr);
-		if (blkaddr < FDEV(devi).start_blk ||
-		    blkaddr > FDEV(devi).end_blk) {
-			f2fs_err(sbi, "Invalid block %x", blkaddr);
-			return false;
-		}
-		blkaddr -= FDEV(devi).start_blk;
-	}
-	ret
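
The open-zone accounting described in the commit message above can be modelled without any kernel machinery. The sketch below is a single-threaded model of the bookkeeping only, under assumptions: it uses a plain counter where the patch uses a semaphore with down() and up(), it reports "would block" instead of sleeping, and every name in it (toy_zone_budget, toy_log, toy_open_zone, toy_close_zone) is hypothetical.

```c
#include <stdbool.h>

/* Number of zones that may still be opened on the device. */
struct toy_zone_budget {
	unsigned int available;
};

/* Per-log state: does this log currently hold an open zone? */
struct toy_log {
	bool zone_opened;
};

/*
 * First write into a new zone: consume one slot, like down().
 * Returns false where the real code would have to wait.
 */
static bool toy_open_zone(struct toy_zone_budget *b, struct toy_log *log)
{
	if (log->zone_opened)
		return true;
	if (b->available == 0)
		return false;
	b->available--;
	log->zone_opened = true;
	return true;
}

/* Last write in the zone has completed: give the slot back, like up(). */
static void toy_close_zone(struct toy_zone_budget *b, struct toy_log *log)
{
	if (!log->zone_opened)
		return;
	log->zone_opened = false;
	b->available++;
}
```

As long as the budget is larger than the number of logs, toy_open_zone() never fails, which corresponds to the case where no IO has to wait for the previous zone to finish.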
[f2fs-dev] [PATCH v2 2/4] f2fs: convert f2fs_read_single_page() to use folio
Convert f2fs_read_single_page() to use folio and related functionality.

Signed-off-by: Chao Yu
---
v2:
- no change.
 fs/f2fs/data.c | 27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 6419cf020327..bb6c0e955d7e 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2063,7 +2063,7 @@ static inline loff_t f2fs_readpage_limit(struct inode *inode)
 	return i_size_read(inode);
 }

-static int f2fs_read_single_page(struct inode *inode, struct page *page,
+static int f2fs_read_single_page(struct inode *inode, struct folio *folio,
 					unsigned nr_pages,
 					struct f2fs_map_blocks *map,
 					struct bio **bio_ret,
@@ -2076,9 +2076,10 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page,
 	sector_t last_block;
 	sector_t last_block_in_file;
 	sector_t block_nr;
+	pgoff_t index = folio_index(folio);
 	int ret = 0;

-	block_in_file = (sector_t)page_index(page);
+	block_in_file = (sector_t)index;
 	last_block = block_in_file + nr_pages;
 	last_block_in_file = bytes_to_blks(inode,
 			f2fs_readpage_limit(inode) + blocksize - 1);
@@ -2109,7 +2110,7 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page,
 got_it:
 	if ((map->m_flags & F2FS_MAP_MAPPED)) {
 		block_nr = map->m_pblk + block_in_file - map->m_lblk;
-		SetPageMappedToDisk(page);
+		folio_set_mappedtodisk(folio);

 		if (!f2fs_is_valid_blkaddr(F2FS_I_SB(inode), block_nr,
 						DATA_GENERIC_ENHANCE_READ)) {
@@ -2118,15 +2119,15 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page,
 		}
 	} else {
 zero_out:
-		zero_user_segment(page, 0, PAGE_SIZE);
-		if (f2fs_need_verity(inode, page->index) &&
-		    !fsverity_verify_page(page)) {
+		folio_zero_segment(folio, 0, folio_size(folio));
+		if (f2fs_need_verity(inode, index) &&
+		    !fsverity_verify_folio(folio)) {
 			ret = -EIO;
 			goto out;
 		}
-		if (!PageUptodate(page))
-			SetPageUptodate(page);
-		unlock_page(page);
+		if (!folio_test_uptodate(folio))
+			folio_mark_uptodate(folio);
+		folio_unlock(folio);
 		goto out;
 	}

@@ -2136,14 +2137,14 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page,
 	 */
 	if (bio && (!page_is_mergeable(F2FS_I_SB(inode), bio,
 				       *last_block_in_bio, block_nr) ||
-		    !f2fs_crypt_mergeable_bio(bio, inode, page->index, NULL))) {
+		    !f2fs_crypt_mergeable_bio(bio, inode, index, NULL))) {
 submit_and_realloc:
 		f2fs_submit_read_bio(F2FS_I_SB(inode), bio, DATA);
 		bio = NULL;
 	}
 	if (bio == NULL) {
 		bio = f2fs_grab_read_bio(inode, block_nr, nr_pages,
-				is_readahead ? REQ_RAHEAD : 0, page->index,
+				is_readahead ? REQ_RAHEAD : 0, index,
 				false);
 		if (IS_ERR(bio)) {
 			ret = PTR_ERR(bio);
@@ -2158,7 +2159,7 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page,
 	 */
 	f2fs_wait_on_block_writeback(inode, block_nr);

-	if (bio_add_page(bio, page, blocksize, 0) < blocksize)
+	if (!bio_add_folio(bio, folio, blocksize, 0))
 		goto submit_and_realloc;

 	inc_page_count(F2FS_I_SB(inode), F2FS_RD_DATA);
@@ -2423,7 +2424,7 @@ static int f2fs_mpage_readpages(struct inode *inode,
 			goto next_page;
 read_single_page:
 #endif
-		ret = f2fs_read_single_page(inode, &folio->page, max_nr_pages, &map,
+		ret = f2fs_read_single_page(inode, folio, max_nr_pages, &map,
 					&bio, &last_block_in_bio, rac);
 		if (ret) {
 #ifdef CONFIG_F2FS_FS_COMPRESSION
--
2.40.1
[f2fs-dev] [PATCH v2 3/4] f2fs: convert f2fs_read_inline_data() to use folio
Convert f2fs_read_inline_data() to use folio and related functionality, and also convert its caller to use folio. Signed-off-by: Chao Yu --- v2: - no change. fs/f2fs/data.c | 11 +-- fs/f2fs/f2fs.h | 4 ++-- fs/f2fs/inline.c | 34 +- 3 files changed, 24 insertions(+), 25 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index bb6c0e955d7e..24f9a39ffd56 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -2457,20 +2457,19 @@ static int f2fs_mpage_readpages(struct inode *inode, static int f2fs_read_data_folio(struct file *file, struct folio *folio) { - struct page *page = >page; - struct inode *inode = page_file_mapping(page)->host; + struct inode *inode = folio_file_mapping(folio)->host; int ret = -EAGAIN; - trace_f2fs_readpage(page, DATA); + trace_f2fs_readpage(>page, DATA); if (!f2fs_is_compress_backend_ready(inode)) { - unlock_page(page); + folio_unlock(folio); return -EOPNOTSUPP; } /* If the file has inline data, try to read it directly */ if (f2fs_has_inline_data(inode)) - ret = f2fs_read_inline_data(inode, page); + ret = f2fs_read_inline_data(inode, folio); if (ret == -EAGAIN) ret = f2fs_mpage_readpages(inode, NULL, folio); return ret; @@ -3399,7 +3398,7 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi, if (f2fs_has_inline_data(inode)) { if (pos + len <= MAX_INLINE_DATA(inode)) { - f2fs_do_read_inline_data(page, ipage); + f2fs_do_read_inline_data(page_folio(page), ipage); set_inode_flag(inode, FI_DATA_EXIST); if (inode->i_nlink) set_page_private_inline(ipage); diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 3f7196122574..a0ae99bcca39 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -4154,10 +4154,10 @@ extern struct kmem_cache *f2fs_inode_entry_slab; bool f2fs_may_inline_data(struct inode *inode); bool f2fs_sanity_check_inline_data(struct inode *inode); bool f2fs_may_inline_dentry(struct inode *inode); -void f2fs_do_read_inline_data(struct page *page, struct page *ipage); +void f2fs_do_read_inline_data(struct folio *folio, struct page 
*ipage); void f2fs_truncate_inline_inode(struct inode *inode, struct page *ipage, u64 from); -int f2fs_read_inline_data(struct inode *inode, struct page *page); +int f2fs_read_inline_data(struct inode *inode, struct folio *folio); int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page); int f2fs_convert_inline_inode(struct inode *inode); int f2fs_try_convert_inline_dir(struct inode *dir, struct dentry *dentry); diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c index 3d3218a4b29d..7638d0d7b7ee 100644 --- a/fs/f2fs/inline.c +++ b/fs/f2fs/inline.c @@ -61,22 +61,22 @@ bool f2fs_may_inline_dentry(struct inode *inode) return true; } -void f2fs_do_read_inline_data(struct page *page, struct page *ipage) +void f2fs_do_read_inline_data(struct folio *folio, struct page *ipage) { - struct inode *inode = page->mapping->host; + struct inode *inode = folio_file_mapping(folio)->host; - if (PageUptodate(page)) + if (folio_test_uptodate(folio)) return; - f2fs_bug_on(F2FS_P_SB(page), page->index); + f2fs_bug_on(F2FS_I_SB(inode), folio_index(folio)); - zero_user_segment(page, MAX_INLINE_DATA(inode), PAGE_SIZE); + folio_zero_segment(folio, MAX_INLINE_DATA(inode), folio_size(folio)); /* Copy the whole inline data block */ - memcpy_to_page(page, 0, inline_data_addr(inode, ipage), + memcpy_to_folio(folio, 0, inline_data_addr(inode, ipage), MAX_INLINE_DATA(inode)); - if (!PageUptodate(page)) - SetPageUptodate(page); + if (!folio_test_uptodate(folio)) + folio_mark_uptodate(folio); } void f2fs_truncate_inline_inode(struct inode *inode, @@ -97,13 +97,13 @@ void f2fs_truncate_inline_inode(struct inode *inode, clear_inode_flag(inode, FI_DATA_EXIST); } -int f2fs_read_inline_data(struct inode *inode, struct page *page) +int f2fs_read_inline_data(struct inode *inode, struct folio *folio) { struct page *ipage; ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino); if (IS_ERR(ipage)) { - unlock_page(page); + folio_unlock(folio); return PTR_ERR(ipage); } @@ -112,15 +112,15 @@ 
int f2fs_read_inline_data(struct inode *inode, struct page *page) return -EAGAIN; } - if (page->index) - zero_user_segment(page, 0, PAGE_SIZE); + if (folio_index(folio)) + folio_zer
[f2fs-dev] [PATCH v2 1/4] f2fs: convert f2fs_mpage_readpages() to use folio
Convert f2fs_mpage_readpages() to use folio and related functionality. Signed-off-by: Chao Yu --- v2: - fix compile warning w/o CONFIG_F2FS_FS_COMPRESSION reported by lkp fs/f2fs/data.c | 81 +- 1 file changed, 40 insertions(+), 41 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index ed7d08785fcf..6419cf020327 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -2345,7 +2345,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret, * Major change was from block_size == page_size in f2fs by default. */ static int f2fs_mpage_readpages(struct inode *inode, - struct readahead_control *rac, struct page *page) + struct readahead_control *rac, struct folio *folio) { struct bio *bio = NULL; sector_t last_block_in_bio = 0; @@ -2362,6 +2362,7 @@ static int f2fs_mpage_readpages(struct inode *inode, .nr_cpages = 0, }; pgoff_t nc_cluster_idx = NULL_CLUSTER; + pgoff_t index; #endif unsigned nr_pages = rac ? readahead_count(rac) : 1; unsigned max_nr_pages = nr_pages; @@ -2378,64 +2379,62 @@ static int f2fs_mpage_readpages(struct inode *inode, for (; nr_pages; nr_pages--) { if (rac) { - page = readahead_page(rac); - prefetchw(>flags); + folio = readahead_folio(rac); + prefetchw(>flags); } #ifdef CONFIG_F2FS_FS_COMPRESSION - if (f2fs_compressed_file(inode)) { - /* there are remained compressed pages, submit them */ - if (!f2fs_cluster_can_merge_page(, page->index)) { - ret = f2fs_read_multi_pages(, , - max_nr_pages, - _block_in_bio, - rac != NULL, false); - f2fs_destroy_compress_ctx(, false); - if (ret) - goto set_error_page; - } - if (cc.cluster_idx == NULL_CLUSTER) { - if (nc_cluster_idx == - page->index >> cc.log_cluster_size) { - goto read_single_page; - } - - ret = f2fs_is_compressed_cluster(inode, page->index); - if (ret < 0) - goto set_error_page; - else if (!ret) { - nc_cluster_idx = - page->index >> cc.log_cluster_size; - goto read_single_page; - } - - nc_cluster_idx = NULL_CLUSTER; - } - ret = f2fs_init_compress_ctx(); + index = 
folio_index(folio); + + if (!f2fs_compressed_file(inode)) + goto read_single_page; + + /* there are remained compressed pages, submit them */ + if (!f2fs_cluster_can_merge_page(, index)) { + ret = f2fs_read_multi_pages(, , + max_nr_pages, + _block_in_bio, + rac != NULL, false); + f2fs_destroy_compress_ctx(, false); if (ret) goto set_error_page; + } + if (cc.cluster_idx == NULL_CLUSTER) { + if (nc_cluster_idx == index >> cc.log_cluster_size) + goto read_single_page; - f2fs_compress_ctx_add_page(, page); + ret = f2fs_is_compressed_cluster(inode, index); + if (ret < 0) + goto set_error_page; + else if (!ret) { + nc_cluster_idx = + index >> cc.log_cluster_size; + goto read_single_page; + } - goto next_page; + nc_cluster_idx = NULL_CLUSTER; } + ret = f2fs_init_compress_ctx(); + if (ret) + goto set_error_page; + + f2fs_compress_ctx_add_page(, >page); + + goto next_page; read_single_page: #endif - - ret = f2fs_read_single_page(inode, page, max_nr_pages, , + ret
[f2fs-dev] [PATCH v2 4/4] f2fs: convert f2fs__page tracepoint class to use folio
Convert f2fs__page tracepoint class() and its instances to use folio and related functionality, and rename it to f2fs__folio(). Signed-off-by: Chao Yu --- v2: - no change. fs/f2fs/checkpoint.c| 4 ++-- fs/f2fs/data.c | 10 - fs/f2fs/node.c | 4 ++-- include/trace/events/f2fs.h | 42 ++--- 4 files changed, 30 insertions(+), 30 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index eac698b8dd38..5d05a413f451 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -345,7 +345,7 @@ static int __f2fs_write_meta_page(struct page *page, { struct f2fs_sb_info *sbi = F2FS_P_SB(page); - trace_f2fs_writepage(page, META); + trace_f2fs_writepage(page_folio(page), META); if (unlikely(f2fs_cp_error(sbi))) { if (is_sbi_flag_set(sbi, SBI_IS_CLOSE)) { @@ -492,7 +492,7 @@ long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, static bool f2fs_dirty_meta_folio(struct address_space *mapping, struct folio *folio) { - trace_f2fs_set_page_dirty(>page, META); + trace_f2fs_set_page_dirty(folio, META); if (!folio_test_uptodate(folio)) folio_mark_uptodate(folio); diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 24f9a39ffd56..21d4c1c9b25b 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -2460,7 +2460,7 @@ static int f2fs_read_data_folio(struct file *file, struct folio *folio) struct inode *inode = folio_file_mapping(folio)->host; int ret = -EAGAIN; - trace_f2fs_readpage(>page, DATA); + trace_f2fs_readpage(folio, DATA); if (!f2fs_is_compress_backend_ready(inode)) { folio_unlock(folio); @@ -2709,7 +2709,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio) } else { set_inode_flag(inode, FI_UPDATE_WRITE); } - trace_f2fs_do_write_data_page(fio->page, IPU); + trace_f2fs_do_write_data_page(page_folio(page), IPU); return err; } @@ -2738,7 +2738,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio) /* LFS mode write path */ f2fs_outplace_write_data(, fio); - trace_f2fs_do_write_data_page(page, OPU); + 
trace_f2fs_do_write_data_page(page_folio(page), OPU); set_inode_flag(inode, FI_APPEND_WRITE); out_writepage: f2fs_put_dnode(); @@ -2785,7 +2785,7 @@ int f2fs_write_single_data_page(struct page *page, int *submitted, .last_block = last_block, }; - trace_f2fs_writepage(page, DATA); + trace_f2fs_writepage(page_folio(page), DATA); /* we should bypass data pages to proceed the kworker jobs */ if (unlikely(f2fs_cp_error(sbi))) { @@ -3759,7 +3759,7 @@ static bool f2fs_dirty_data_folio(struct address_space *mapping, { struct inode *inode = mapping->host; - trace_f2fs_set_page_dirty(>page, DATA); + trace_f2fs_set_page_dirty(folio, DATA); if (!folio_test_uptodate(folio)) folio_mark_uptodate(folio); diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 3b9eb5693683..95cecf08cb37 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1624,7 +1624,7 @@ static int __write_node_page(struct page *page, bool atomic, bool *submitted, }; unsigned int seq; - trace_f2fs_writepage(page, NODE); + trace_f2fs_writepage(page_folio(page), NODE); if (unlikely(f2fs_cp_error(sbi))) { /* keep node pages in remount-ro mode */ @@ -2171,7 +2171,7 @@ static int f2fs_write_node_pages(struct address_space *mapping, static bool f2fs_dirty_node_folio(struct address_space *mapping, struct folio *folio) { - trace_f2fs_set_page_dirty(>page, NODE); + trace_f2fs_set_page_dirty(folio, NODE); if (!folio_test_uptodate(folio)) folio_mark_uptodate(folio); diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h index 7ed0fc430dc6..371ba28415f5 100644 --- a/include/trace/events/f2fs.h +++ b/include/trace/events/f2fs.h @@ -1304,11 +1304,11 @@ TRACE_EVENT(f2fs_write_end, __entry->copied) ); -DECLARE_EVENT_CLASS(f2fs__page, +DECLARE_EVENT_CLASS(f2fs__folio, - TP_PROTO(struct page *page, int type), + TP_PROTO(struct folio *folio, int type), - TP_ARGS(page, type), + TP_ARGS(folio, type), TP_STRUCT__entry( __field(dev_t, dev) @@ -1321,14 +1321,14 @@ DECLARE_EVENT_CLASS(f2fs__page, ), TP_fast_assign( - 
__entry->dev= page_file_mapping(page)->host->i_sb->s_dev; - __entry->ino= page_file_mapping(page)->host->i_ino; + __entry->dev= folio_file_mapping(folio)->host->i_sb->s_dev; + __entry->ino= folio_fil
Re: [f2fs-dev] [PATCH] f2fs: assign write hint in direct write IO path
On 2024/4/20 1:53, Jaegeuk Kim wrote:
> Thanks, Chao,
>
> If you don't mind, can I merge this into my patch. Ok?

No problem. :)

Thanks,

> On 04/18, Chao Yu wrote:
>> f2fs has its own write_hint policy, let's assign write hint for
>> direct write bio.
>>
>> Cc: Hyunchul Lee
>> Signed-off-by: Chao Yu
>> ---
>>  fs/f2fs/f2fs.h    |  1 +
>>  fs/f2fs/file.c    | 15 ++++++++++++++-
>>  fs/f2fs/segment.c | 17 +++++++++++------
>>  3 files changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index b3b878acc86b..3f7196122574 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -3722,6 +3722,7 @@ void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data *dn,
>>  			block_t old_addr, block_t new_addr,
>>  			unsigned char version, bool recover_curseg,
>>  			bool recover_newaddr);
>> +int f2fs_get_segment_temp(int seg_type);
>>  int f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
>>  			block_t old_blkaddr, block_t *new_blkaddr,
>>  			struct f2fs_summary *sum, int type,
>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>> index ac1ae85f3cc3..d382f8bc2fbe 100644
>> --- a/fs/f2fs/file.c
>> +++ b/fs/f2fs/file.c
>> @@ -4685,8 +4685,21 @@ static int f2fs_dio_write_end_io(struct kiocb *iocb, ssize_t size, int error,
>>  	return 0;
>>  }
>>  
>> +static void f2fs_dio_write_submit_io(const struct iomap_iter *iter,
>> +					struct bio *bio, loff_t file_offset)
>> +{
>> +	struct inode *inode = iter->inode;
>> +	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>> +	int seg_type = f2fs_rw_hint_to_seg_type(inode->i_write_hint);
>> +	enum temp_type temp = f2fs_get_segment_temp(seg_type);
>> +
>> +	bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi, DATA, temp);
>> +	submit_bio(bio);
>> +}
>> +
>>  static const struct iomap_dio_ops f2fs_iomap_dio_write_ops = {
>> -	.end_io = f2fs_dio_write_end_io,
>> +	.end_io		= f2fs_dio_write_end_io,
>> +	.submit_io	= f2fs_dio_write_submit_io,
>>  };
>>  
>>  static void f2fs_flush_buffered_write(struct address_space *mapping,
>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> index daa94669f7ee..2206199e8099 100644
>> --- a/fs/f2fs/segment.c
>> +++ b/fs/f2fs/segment.c
>> @@ -3502,6 +3502,15 @@ static int __get_segment_type_6(struct f2fs_io_info *fio)
>>  	}
>>  }
>>  
>> +int f2fs_get_segment_temp(int seg_type)
>> +{
>> +	if (IS_HOT(seg_type))
>> +		return HOT;
>> +	else if (IS_WARM(seg_type))
>> +		return WARM;
>> +	return COLD;
>> +}
>> +
>>  static int __get_segment_type(struct f2fs_io_info *fio)
>>  {
>>  	int type = 0;
>> @@ -3520,12 +3529,8 @@ static int __get_segment_type(struct f2fs_io_info *fio)
>>  		f2fs_bug_on(fio->sbi, true);
>>  	}
>>  
>> -	if (IS_HOT(type))
>> -		fio->temp = HOT;
>> -	else if (IS_WARM(type))
>> -		fio->temp = WARM;
>> -	else
>> -		fio->temp = COLD;
>> +	fio->temp = f2fs_get_segment_temp(type);
>> +
>>  	return type;
>>  }
>>  
>> --
>> 2.40.1
[f2fs-dev] [PATCH] f2fs: assign write hint in direct write IO path
f2fs has its own write_hint policy, let's assign write hint for
direct write bio.

Cc: Hyunchul Lee
Signed-off-by: Chao Yu
---
 fs/f2fs/f2fs.h    |  1 +
 fs/f2fs/file.c    | 15 ++++++++++++++-
 fs/f2fs/segment.c | 17 +++++++++++------
 3 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index b3b878acc86b..3f7196122574 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3722,6 +3722,7 @@ void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data *dn,
 			block_t old_addr, block_t new_addr,
 			unsigned char version, bool recover_curseg,
 			bool recover_newaddr);
+int f2fs_get_segment_temp(int seg_type);
 int f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
 			block_t old_blkaddr, block_t *new_blkaddr,
 			struct f2fs_summary *sum, int type,
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index ac1ae85f3cc3..d382f8bc2fbe 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4685,8 +4685,21 @@ static int f2fs_dio_write_end_io(struct kiocb *iocb, ssize_t size, int error,
 	return 0;
 }
 
+static void f2fs_dio_write_submit_io(const struct iomap_iter *iter,
+					struct bio *bio, loff_t file_offset)
+{
+	struct inode *inode = iter->inode;
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	int seg_type = f2fs_rw_hint_to_seg_type(inode->i_write_hint);
+	enum temp_type temp = f2fs_get_segment_temp(seg_type);
+
+	bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi, DATA, temp);
+	submit_bio(bio);
+}
+
 static const struct iomap_dio_ops f2fs_iomap_dio_write_ops = {
-	.end_io = f2fs_dio_write_end_io,
+	.end_io		= f2fs_dio_write_end_io,
+	.submit_io	= f2fs_dio_write_submit_io,
 };
 
 static void f2fs_flush_buffered_write(struct address_space *mapping,
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index daa94669f7ee..2206199e8099 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3502,6 +3502,15 @@ static int __get_segment_type_6(struct f2fs_io_info *fio)
 	}
 }
 
+int f2fs_get_segment_temp(int seg_type)
+{
+	if (IS_HOT(seg_type))
+		return HOT;
+	else if (IS_WARM(seg_type))
+		return WARM;
+	return COLD;
+}
+
 static int __get_segment_type(struct f2fs_io_info *fio)
 {
 	int type = 0;
@@ -3520,12 +3529,8 @@ static int __get_segment_type(struct f2fs_io_info *fio)
 		f2fs_bug_on(fio->sbi, true);
 	}
 
-	if (IS_HOT(type))
-		fio->temp = HOT;
-	else if (IS_WARM(type))
-		fio->temp = WARM;
-	else
-		fio->temp = COLD;
+	fio->temp = f2fs_get_segment_temp(type);
+
 	return type;
 }
 
--
2.40.1
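For readers who want to try the helper factored out by this patch outside the kernel, here is a minimal user-space sketch. The enum values and the IS_HOT()/IS_WARM() macros below are local stand-ins written for this example; they are assumed to mirror the f2fs definitions but are not copied from the kernel tree, and the function is renamed get_segment_temp() to make clear it is not the kernel symbol.

```c
#include <assert.h>

/* Stand-ins for the f2fs current-segment types (assumed, for illustration). */
enum curseg_type {
	CURSEG_HOT_DATA,
	CURSEG_WARM_DATA,
	CURSEG_COLD_DATA,
	CURSEG_HOT_NODE,
	CURSEG_WARM_NODE,
	CURSEG_COLD_NODE,
	NR_CURSEG_TYPE,
};

enum temp_type { HOT, WARM, COLD };

/* Assumed to match the kernel macros: temperature ignores data vs. node. */
#define IS_HOT(t)	((t) == CURSEG_HOT_NODE || (t) == CURSEG_HOT_DATA)
#define IS_WARM(t)	((t) == CURSEG_WARM_NODE || (t) == CURSEG_WARM_DATA)

/* Same control flow as the helper added by the patch. */
int get_segment_temp(int seg_type)
{
	assert(seg_type >= 0 && seg_type < NR_CURSEG_TYPE);

	if (IS_HOT(seg_type))
		return HOT;
	else if (IS_WARM(seg_type))
		return WARM;
	return COLD;
}
```

Both the buffered path (__get_segment_type()) and the new direct IO submit hook can then share this one mapping instead of open-coding the if/else chain.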
Re: [f2fs-dev] [PATCH] f2fs: assign the write hint per stream by default
On 2024/4/18 5:12, Jaegeuk Kim wrote:
> This reverts commit 930e2607638d ("f2fs: remove obsolete whint_mode"), as we
> decide to pass write hints to the disk.
>
> Signed-off-by: Jaegeuk Kim
> ---
>  Documentation/filesystems/f2fs.rst | 29 +++
>  fs/f2fs/data.c                     |  2 +
>  fs/f2fs/f2fs.h                     |  2 +
>  fs/f2fs/segment.c                  | 59 ++
>  4 files changed, 92 insertions(+)
>
> diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst
> index efc3493fd6f8..68a0885fb5e6 100644
> --- a/Documentation/filesystems/f2fs.rst
> +++ b/Documentation/filesystems/f2fs.rst
> @@ -774,6 +774,35 @@ In order to identify whether the data in the victim segment are valid or not,
>  F2FS manages a bitmap. Each bit represents the validity of a block, and the
>  bitmap is composed of a bit stream covering whole blocks in main area.
>  
> +Write-hint Policy
> +-----------------
> +
> +F2FS sets the whint all the time with the below policy.

No user-based mode?

Thanks,

> +
> +===================== ======================== ===================
> +User                  F2FS                     Block
> +===================== ======================== ===================
> +N/A                   META                     WRITE_LIFE_NONE|REQ_META
> +N/A                   HOT_NODE                 WRITE_LIFE_NONE
> +N/A                   WARM_NODE                WRITE_LIFE_MEDIUM
> +N/A                   COLD_NODE                WRITE_LIFE_LONG
> +ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
> +extension list        "                        "
> +
> +-- buffered io
> +N/A                   COLD_DATA                WRITE_LIFE_EXTREME
> +N/A                   HOT_DATA                 WRITE_LIFE_SHORT
> +N/A                   WARM_DATA                WRITE_LIFE_NOT_SET
> +
> +-- direct io
> +WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
> +WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
> +WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
> +WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
> +WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
> +WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG
> +===================== ======================== ===================
> +
>  Fallocate(2) Policy
>  -------------------
>  
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 5d641fac02ba..ed7d08785fcf 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -465,6 +465,8 @@ static struct bio *__bio_alloc(struct f2fs_io_info *fio, int npages)
>  	} else {
>  		bio->bi_end_io = f2fs_write_end_io;
>  		bio->bi_private = sbi;
> +		bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi,
> +						fio->type, fio->temp);
>  	}
>  	iostat_alloc_and_bind_ctx(sbi, bio, NULL);
>  
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index dd530dc70005..b3b878acc86b 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3745,6 +3745,8 @@ void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
>  int __init f2fs_create_segment_manager_caches(void);
>  void f2fs_destroy_segment_manager_caches(void);
>  int f2fs_rw_hint_to_seg_type(enum rw_hint hint);
> +enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
> +			enum page_type type, enum temp_type temp);
>  unsigned int f2fs_usable_segs_in_sec(struct f2fs_sb_info *sbi,
>  			unsigned int segno);
>  unsigned int f2fs_usable_blks_in_seg(struct f2fs_sb_info *sbi,
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index f0da516ba8dc..daa94669f7ee 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -3364,6 +3364,65 @@ int f2fs_rw_hint_to_seg_type(enum rw_hint hint)
>  	}
>  }
>  
> +/*
> + * This returns write hints for each segment type. This hints will be
> + * passed down to block layer as below by default.
> + *
> + * User                  F2FS                     Block
> + * ----                  ----                     -----
> + *                       META                     WRITE_LIFE_NONE|REQ_META
> + *                       HOT_NODE                 WRITE_LIFE_NONE
> + *                       WARM_NODE                WRITE_LIFE_MEDIUM
> + *                       COLD_NODE                WRITE_LIFE_LONG
> + * ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
> + * extension list        "                        "
> + *
> + * -- buffered io
> + *                       COLD_DATA                WRITE_LIFE_EXTREME
> + *                       HOT_DATA                 WRITE_LIFE_SHORT
> + *                       WARM_DATA                WRITE_LIFE_NOT_SET
> + *
> + * -- direct io
> + * WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
> + * WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
> + * WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
> + * WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
> + * WRITE_LIFE_MEDIUM     "
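The part of the policy table that is keyed by F2FS page type and temperature can be written down as a small lookup. The following is a user-space sketch derived only from the table quoted above, not from the body of f2fs_io_type_to_rw_hint() (which is cut off in this message); the enums are local stand-ins, the sbi argument is dropped, and the function name io_type_to_rw_hint() is made up for this example. It does not model the REQ_META flag or the direct IO rows where the user hint is shown as passed through.

```c
#include <assert.h>

/* Local stand-ins for the kernel enums (assumed, for illustration only). */
enum page_type { DATA, NODE, META };
enum temp_type { HOT, WARM, COLD };

enum rw_hint {
	WRITE_LIFE_NOT_SET,
	WRITE_LIFE_NONE,
	WRITE_LIFE_SHORT,
	WRITE_LIFE_MEDIUM,
	WRITE_LIFE_LONG,
	WRITE_LIFE_EXTREME,
};

/* Map (page type, temperature) to the block-layer hint listed in the table. */
enum rw_hint io_type_to_rw_hint(enum page_type type, enum temp_type temp)
{
	switch (type) {
	case DATA:
		switch (temp) {
		case HOT:
			return WRITE_LIFE_SHORT;
		case WARM:
			return WRITE_LIFE_NOT_SET;
		case COLD:
			return WRITE_LIFE_EXTREME;
		}
		break;
	case NODE:
		switch (temp) {
		case HOT:
			return WRITE_LIFE_NONE;
		case WARM:
			return WRITE_LIFE_MEDIUM;
		case COLD:
			return WRITE_LIFE_LONG;
		}
		break;
	case META:
		return WRITE_LIFE_NONE;
	}
	/* Unknown combination: fall back to "no hint" in this sketch. */
	return WRITE_LIFE_NONE;
}
```

For example, a cold data page gets WRITE_LIFE_EXTREME while a warm node page gets WRITE_LIFE_MEDIUM, matching the COLD_DATA and WARM_NODE rows.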
Re: [f2fs-dev] [PATCH v3] f2fs: zone: don't block IO if there is remained open zone
On 2024/4/17 0:51, Jaegeuk Kim wrote: On 04/16, Chao Yu wrote: On 2024/4/15 22:01, Chao Yu wrote: On 2024/4/15 11:26, Chao Yu wrote: On 2024/4/14 23:19, Jaegeuk Kim wrote: It seems this caused kernel hang. Chao, have you tested this patch enough? Jaegeuk, Oh, I've checked this patch w/ fsstress before submitting it, but missed the SPO testcase... do you encounter kernel hang w/ SPO testcase? I did see any hang issue w/ por_fsstress testcase, which testcase do you use? Sorry, I mean I haven't reproduced it yet... I'd prefer to check this patch later. Have you tested on Zoned device with nullblk? Yes, I enabled blkzoned feature w/ nullblk device, and set /sys/kernel/config/nullb/nullb0/zone_max_open to six, so that it can emulate ZUFS' configuration. Thanks, Thanks, Thanks, Anyway, let me test it more. Thanks, On 04/13, Chao Yu wrote: On 2024/4/13 5:11, Jaegeuk Kim wrote: On 04/07, Chao Yu wrote: max open zone may be larger than log header number of f2fs, for such case, it doesn't need to wait last IO in previous zone, let's introduce available_open_zone semaphore, and reduce it once we submit first write IO in a zone, and increase it after completion of last IO in the zone. Cc: Daeho Jeong Signed-off-by: Chao Yu --- v3: - avoid race condition in between __submit_merged_bio() and __allocate_new_segment(). 
fs/f2fs/data.c | 105 ++ fs/f2fs/f2fs.h | 34 --- fs/f2fs/iostat.c | 7 fs/f2fs/iostat.h | 2 + fs/f2fs/segment.c | 43 --- fs/f2fs/segment.h | 12 +- fs/f2fs/super.c | 2 + 7 files changed, 156 insertions(+), 49 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 0d88649c60a5..18a4ac0a06bc 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -373,11 +373,10 @@ static void f2fs_write_end_io(struct bio *bio) #ifdef CONFIG_BLK_DEV_ZONED static void f2fs_zone_write_end_io(struct bio *bio) { - struct f2fs_bio_info *io = (struct f2fs_bio_info *)bio->bi_private; + struct f2fs_sb_info *sbi = iostat_get_bio_private(bio); - bio->bi_private = io->bi_private; - complete(>zone_wait); f2fs_write_end_io(bio); + up(>available_open_zones); } #endif @@ -531,6 +530,24 @@ static void __submit_merged_bio(struct f2fs_bio_info *io) if (!io->bio) return; +#ifdef CONFIG_BLK_DEV_ZONED + if (io->open_zone) { + /* + * if there is no open zone, it will wait for last IO in + * previous zone before submitting new IO. 
+ */ + down(>sbi->available_open_zones); + io->open_zone = false; + io->zone_openned = true; + } + + if (io->close_zone) { + io->bio->bi_end_io = f2fs_zone_write_end_io; + io->zone_openned = false; + io->close_zone = false; + } +#endif + if (is_read_io(fio->op)) { trace_f2fs_prepare_read_bio(io->sbi->sb, fio->type, io->bio); f2fs_submit_read_bio(io->sbi, io->bio, fio->type); @@ -601,9 +618,9 @@ int f2fs_init_write_merge_io(struct f2fs_sb_info *sbi) INIT_LIST_HEAD(>write_io[i][j].bio_list); init_f2fs_rwsem(>write_io[i][j].bio_list_lock); #ifdef CONFIG_BLK_DEV_ZONED - init_completion(>write_io[i][j].zone_wait); - sbi->write_io[i][j].zone_pending_bio = NULL; - sbi->write_io[i][j].bi_private = NULL; + sbi->write_io[i][j].open_zone = false; + sbi->write_io[i][j].zone_openned = false; + sbi->write_io[i][j].close_zone = false; #endif } } @@ -634,6 +651,31 @@ static void __f2fs_submit_merged_write(struct f2fs_sb_info *sbi, f2fs_up_write(>io_rwsem); } +void f2fs_blkzoned_submit_merged_write(struct f2fs_sb_info *sbi, int type) +{ +#ifdef CONFIG_BLK_DEV_ZONED + struct f2fs_bio_info *io; + + if (!f2fs_sb_has_blkzoned(sbi)) + return; + + io = sbi->write_io[PAGE_TYPE(type)] + type_to_temp(type); + + f2fs_down_write(>io_rwsem); + if (io->zone_openned) { + if (io->bio) { + io->close_zone = true; + __submit_merged_bio(io); + } else if (io->zone_openned) { + up(>available_open_zones); + io->zone_openned = false; + } + } + f2fs_up_write(>io_rwsem); +#endif + +} + static void __submit_merged_write_cond(struct f2fs_sb_info *sbi, struct inode *inode, struct page *page, nid_t ino, enum page_type type, bool force) @@ -918,22 +960,16 @@ int f2fs_merge_page_bio(struct f2fs_io_info *fio) } #ifdef CONFIG_BLK_DEV_ZONED -static bool is_end_zone_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr) +static bool is_blkaddr_zone_boundary(struct f2fs_sb_info *sbi, + block_t blkaddr,
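The accounting idea in the commit message (take one open-zone slot when the first write IO of a zone is submitted, give it back when the last IO of that zone completes) can be modelled without any kernel machinery. The sketch below is a single-threaded model with hypothetical names (zone_budget, zone_budget_open(), zone_budget_close()); the patch itself uses a kernel semaphore, where the caller would sleep in down() instead of getting false back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the available_open_zones counter. */
struct zone_budget {
	int available;	/* open-zone slots still free */
	int max_open;	/* device limit on concurrently open zones */
};

void zone_budget_init(struct zone_budget *zb, int max_open)
{
	assert(max_open > 0);
	zb->available = max_open;
	zb->max_open = max_open;
}

/*
 * First write IO into a zone: take one slot. Returns false in the
 * situation where the semaphore-based code would block in down().
 */
bool zone_budget_open(struct zone_budget *zb)
{
	if (zb->available == 0)
		return false;
	zb->available--;
	return true;
}

/* Completion of the last IO in a zone: return the slot, like up(). */
void zone_budget_close(struct zone_budget *zb)
{
	assert(zb->available < zb->max_open);
	zb->available++;
}
```

With a limit of six open zones, as in the nullblk setup mentioned in the thread, six logs can each start writing into a new zone without waiting, and a seventh open succeeds only after some zone's last IO has completed.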
Re: [f2fs-dev] [PATCH] common/quota: fix keywords of quota feature in _require_prjquota() for f2fs
On 2024/4/16 16:49, Zorro Lang wrote:
> On Tue, Apr 16, 2024 at 03:18:19PM +0800, Chao Yu wrote:
>> Previously, in f2fs, sysfile quota feature has different name:
>> - "quota" in mkfs.f2fs
>> - and "quota_ino" in dump.f2fs
>>
>> Now, it has unified the name to "quota" since commit 92cc5edeb7
>> ("f2fs-tools: reuse feature_table to clean up print_sb_state()").
>>
>> It needs to fix keywords in _require_prjquota() for f2fs, Otherwise,
>> quota testcase will fail.
>>
>> generic/383 1s ... [not run] quota sysfile not enabled in this device /dev/vdc
>>
>> Cc: Jaegeuk Kim
>> Signed-off-by: Chao Yu
>> ---
>>  common/quota | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/common/quota b/common/quota
>> index 6b529bf4..cfe3276f 100644
>> --- a/common/quota
>> +++ b/common/quota
>> @@ -145,7 +145,7 @@ _require_prjquota()
>>  	if [ "$FSTYP" == "f2fs" ]; then
>>  		dump.f2fs $_dev 2>&1 | grep -qw project_quota
>>  		[ $? -ne 0 ] && _notrun "Project quota not enabled in this device $_dev"
>> -		dump.f2fs $_dev 2>&1 | grep -qw quota_ino
>> +		dump.f2fs $_dev 2>&1 | grep -qw quota
>
> This will _notrun on old f2fs-tools, due to `grep -w quota` doesn't match
> old "quota_ino". So how about grep -Eqw "quota|quota_ino", or any better
> idea you have.

Thanks for your suggestion, I fix this in v2, I've tested v2 w/ old
f2fs-tools, it works fine.

Thanks,

>
> Thanks,
> Zorro
>
>>  		[ $? -ne 0 ] && _notrun "quota sysfile not enabled in this device $_dev"
>>  		cat /sys/fs/f2fs/features/project_quota | grep -qw supported
>>  		[ $? -ne 0 ] && _notrun "Installed kernel does not support project quotas"
>> --
>> 2.40.1
[f2fs-dev] [PATCH v2] common/quota: update keywords of quota feature in _require_prjquota() for f2fs
Previously, in f2fs, the sysfile quota feature had different names:
- "quota" in mkfs.f2fs
- and "quota_ino" in dump.f2fs

Now the name has been unified to "quota" since commit 92cc5edeb7
("f2fs-tools: reuse feature_table to clean up print_sb_state()").

The keyword "quota" needs to be added in _require_prjquota() for f2fs,
otherwise the quota testcases will fail as below.

generic/383 1s ... [not run] quota sysfile not enabled in this device /dev/vdc

This patch keeps the keyword "quota_ino" in _require_prjquota() for
compatibility with old f2fs-tools.

Cc: Jaegeuk Kim
Signed-off-by: Chao Yu
---
v2:
- keep the keyword "quota_ino" for compatibility with old f2fs-tools,
as suggested by Zorro Lang.
 common/quota | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/common/quota b/common/quota
index 6b529bf4..4c1d3dcd 100644
--- a/common/quota
+++ b/common/quota
@@ -145,7 +145,7 @@ _require_prjquota()
 	if [ "$FSTYP" == "f2fs" ]; then
 		dump.f2fs $_dev 2>&1 | grep -qw project_quota
 		[ $? -ne 0 ] && _notrun "Project quota not enabled in this device $_dev"
-		dump.f2fs $_dev 2>&1 | grep -qw quota_ino
+		dump.f2fs $_dev 2>&1 | grep -Eqw "quota|quota_ino"
 		[ $? -ne 0 ] && _notrun "quota sysfile not enabled in this device $_dev"
 		cat /sys/fs/f2fs/features/project_quota | grep -qw supported
 		[ $? -ne 0 ] && _notrun "Installed kernel does not support project quotas"
--
2.40.1
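Why a bare "quota" keyword is not enough for old f2fs-tools comes down to how whole-word matching treats the underscore: it counts as a word character, so "quota" is not a whole word inside "quota_ino" or "project_quota". The sketch below approximates that rule in C with POSIX regex.h; has_word() is a hypothetical helper written for this note, and the feature strings in the usage line are made-up samples, not captured dump.f2fs output.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Approximation of `grep -Eqw WORDS`: report whether any alternative in
 * `words` (an ERE such as "quota|quota_ino") occurs in `line` as a whole
 * word, i.e. not adjacent to a letter, digit, or underscore.
 */
bool has_word(const char *line, const char *words)
{
	char pattern[256];
	regex_t re;
	int n, rc;

	/* Wrap the alternatives in explicit word-boundary checks. */
	n = snprintf(pattern, sizeof(pattern),
		     "(^|[^[:alnum:]_])(%s)([^[:alnum:]_]|$)", words);
	assert(n > 0 && (size_t)n < sizeof(pattern));

	rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
	assert(rc == 0);

	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

Under this model, has_word("encrypt quota_ino project_quota", "quota") is false, which corresponds to the _notrun on old tools, while the v2 keyword list "quota|quota_ino" matches both the old and the new feature name.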
[f2fs-dev] [PATCH 4/4] f2fs: convert f2fs__page tracepoint class to use folio
Convert f2fs__page tracepoint class() and its instances to use folio and related functionality, and rename it to f2fs__folio(). Signed-off-by: Chao Yu --- fs/f2fs/checkpoint.c| 4 ++-- fs/f2fs/data.c | 10 - fs/f2fs/node.c | 4 ++-- include/trace/events/f2fs.h | 42 ++--- 4 files changed, 30 insertions(+), 30 deletions(-) diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index eac698b8dd38..5d05a413f451 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -345,7 +345,7 @@ static int __f2fs_write_meta_page(struct page *page, { struct f2fs_sb_info *sbi = F2FS_P_SB(page); - trace_f2fs_writepage(page, META); + trace_f2fs_writepage(page_folio(page), META); if (unlikely(f2fs_cp_error(sbi))) { if (is_sbi_flag_set(sbi, SBI_IS_CLOSE)) { @@ -492,7 +492,7 @@ long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, static bool f2fs_dirty_meta_folio(struct address_space *mapping, struct folio *folio) { - trace_f2fs_set_page_dirty(>page, META); + trace_f2fs_set_page_dirty(folio, META); if (!folio_test_uptodate(folio)) folio_mark_uptodate(folio); diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 3eb90b9b0f8b..cf6d31e3e630 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -2490,7 +2490,7 @@ static int f2fs_read_data_folio(struct file *file, struct folio *folio) struct inode *inode = folio_file_mapping(folio)->host; int ret = -EAGAIN; - trace_f2fs_readpage(>page, DATA); + trace_f2fs_readpage(folio, DATA); if (!f2fs_is_compress_backend_ready(inode)) { folio_unlock(folio); @@ -2739,7 +2739,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio) } else { set_inode_flag(inode, FI_UPDATE_WRITE); } - trace_f2fs_do_write_data_page(fio->page, IPU); + trace_f2fs_do_write_data_page(page_folio(page), IPU); return err; } @@ -2768,7 +2768,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio) /* LFS mode write path */ f2fs_outplace_write_data(, fio); - trace_f2fs_do_write_data_page(page, OPU); + trace_f2fs_do_write_data_page(page_folio(page), OPU); 
set_inode_flag(inode, FI_APPEND_WRITE); out_writepage: f2fs_put_dnode(); @@ -2815,7 +2815,7 @@ int f2fs_write_single_data_page(struct page *page, int *submitted, .last_block = last_block, }; - trace_f2fs_writepage(page, DATA); + trace_f2fs_writepage(page_folio(page), DATA); /* we should bypass data pages to proceed the kworker jobs */ if (unlikely(f2fs_cp_error(sbi))) { @@ -3789,7 +3789,7 @@ static bool f2fs_dirty_data_folio(struct address_space *mapping, { struct inode *inode = mapping->host; - trace_f2fs_set_page_dirty(>page, DATA); + trace_f2fs_set_page_dirty(folio, DATA); if (!folio_test_uptodate(folio)) folio_mark_uptodate(folio); diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 3b9eb5693683..95cecf08cb37 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1624,7 +1624,7 @@ static int __write_node_page(struct page *page, bool atomic, bool *submitted, }; unsigned int seq; - trace_f2fs_writepage(page, NODE); + trace_f2fs_writepage(page_folio(page), NODE); if (unlikely(f2fs_cp_error(sbi))) { /* keep node pages in remount-ro mode */ @@ -2171,7 +2171,7 @@ static int f2fs_write_node_pages(struct address_space *mapping, static bool f2fs_dirty_node_folio(struct address_space *mapping, struct folio *folio) { - trace_f2fs_set_page_dirty(>page, NODE); + trace_f2fs_set_page_dirty(folio, NODE); if (!folio_test_uptodate(folio)) folio_mark_uptodate(folio); diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h index 7ed0fc430dc6..371ba28415f5 100644 --- a/include/trace/events/f2fs.h +++ b/include/trace/events/f2fs.h @@ -1304,11 +1304,11 @@ TRACE_EVENT(f2fs_write_end, __entry->copied) ); -DECLARE_EVENT_CLASS(f2fs__page, +DECLARE_EVENT_CLASS(f2fs__folio, - TP_PROTO(struct page *page, int type), + TP_PROTO(struct folio *folio, int type), - TP_ARGS(page, type), + TP_ARGS(folio, type), TP_STRUCT__entry( __field(dev_t, dev) @@ -1321,14 +1321,14 @@ DECLARE_EVENT_CLASS(f2fs__page, ), TP_fast_assign( - __entry->dev= page_file_mapping(page)->host->i_sb->s_dev; - 
__entry->ino= page_file_mapping(page)->host->i_ino; + __entry->dev= folio_file_mapping(folio)->host->i_sb->s_dev; + __entry->ino= folio_file_mapping
[f2fs-dev] [PATCH 1/4] f2fs: convert f2fs_mpage_readpages() to use folio
Convert f2fs_mpage_readpages() to use folio and related functionality. Signed-off-by: Chao Yu --- fs/f2fs/data.c | 80 +- 1 file changed, 40 insertions(+), 40 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 9c5512be1a1b..14dcd621acaa 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -2374,7 +2374,7 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret, * Major change was from block_size == page_size in f2fs by default. */ static int f2fs_mpage_readpages(struct inode *inode, - struct readahead_control *rac, struct page *page) + struct readahead_control *rac, struct folio *folio) { struct bio *bio = NULL; sector_t last_block_in_bio = 0; @@ -2394,6 +2394,7 @@ static int f2fs_mpage_readpages(struct inode *inode, #endif unsigned nr_pages = rac ? readahead_count(rac) : 1; unsigned max_nr_pages = nr_pages; + pgoff_t index; int ret = 0; map.m_pblk = 0; @@ -2407,64 +2408,63 @@ static int f2fs_mpage_readpages(struct inode *inode, for (; nr_pages; nr_pages--) { if (rac) { - page = readahead_page(rac); - prefetchw(>flags); + folio = readahead_folio(rac); + prefetchw(>flags); } -#ifdef CONFIG_F2FS_FS_COMPRESSION - if (f2fs_compressed_file(inode)) { - /* there are remained compressed pages, submit them */ - if (!f2fs_cluster_can_merge_page(, page->index)) { - ret = f2fs_read_multi_pages(, , - max_nr_pages, - _block_in_bio, - rac != NULL, false); - f2fs_destroy_compress_ctx(, false); - if (ret) - goto set_error_page; - } - if (cc.cluster_idx == NULL_CLUSTER) { - if (nc_cluster_idx == - page->index >> cc.log_cluster_size) { - goto read_single_page; - } - - ret = f2fs_is_compressed_cluster(inode, page->index); - if (ret < 0) - goto set_error_page; - else if (!ret) { - nc_cluster_idx = - page->index >> cc.log_cluster_size; - goto read_single_page; - } + index = folio_index(folio); - nc_cluster_idx = NULL_CLUSTER; - } - ret = f2fs_init_compress_ctx(); +#ifdef CONFIG_F2FS_FS_COMPRESSION + if (!f2fs_compressed_file(inode)) + goto read_single_page; 
+ + /* there are remained compressed pages, submit them */ + if (!f2fs_cluster_can_merge_page(, index)) { + ret = f2fs_read_multi_pages(, , + max_nr_pages, + _block_in_bio, + rac != NULL, false); + f2fs_destroy_compress_ctx(, false); if (ret) goto set_error_page; + } + if (cc.cluster_idx == NULL_CLUSTER) { + if (nc_cluster_idx == index >> cc.log_cluster_size) + goto read_single_page; - f2fs_compress_ctx_add_page(, page); + ret = f2fs_is_compressed_cluster(inode, index); + if (ret < 0) + goto set_error_page; + else if (!ret) { + nc_cluster_idx = + index >> cc.log_cluster_size; + goto read_single_page; + } - goto next_page; + nc_cluster_idx = NULL_CLUSTER; } + ret = f2fs_init_compress_ctx(); + if (ret) + goto set_error_page; + + f2fs_compress_ctx_add_page(, >page); + + goto next_page; read_single_page: #endif - ret = f2fs_read_single_page(inode, page, max_nr_pages, , + ret = f2fs_read_single_page(inode, >page, max_nr_pages, ,
[f2fs-dev] [PATCH 3/4] f2fs: convert f2fs_read_inline_data() to use folio
Convert f2fs_read_inline_data() to use folio and related functionality, and also convert its caller to use folio. Signed-off-by: Chao Yu --- fs/f2fs/data.c | 11 +-- fs/f2fs/f2fs.h | 4 ++-- fs/f2fs/inline.c | 34 +- 3 files changed, 24 insertions(+), 25 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index c35107657c97..3eb90b9b0f8b 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -2487,20 +2487,19 @@ static int f2fs_mpage_readpages(struct inode *inode, static int f2fs_read_data_folio(struct file *file, struct folio *folio) { - struct page *page = >page; - struct inode *inode = page_file_mapping(page)->host; + struct inode *inode = folio_file_mapping(folio)->host; int ret = -EAGAIN; - trace_f2fs_readpage(page, DATA); + trace_f2fs_readpage(>page, DATA); if (!f2fs_is_compress_backend_ready(inode)) { - unlock_page(page); + folio_unlock(folio); return -EOPNOTSUPP; } /* If the file has inline data, try to read it directly */ if (f2fs_has_inline_data(inode)) - ret = f2fs_read_inline_data(inode, page); + ret = f2fs_read_inline_data(inode, folio); if (ret == -EAGAIN) ret = f2fs_mpage_readpages(inode, NULL, folio); return ret; @@ -3429,7 +3428,7 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi, if (f2fs_has_inline_data(inode)) { if (pos + len <= MAX_INLINE_DATA(inode)) { - f2fs_do_read_inline_data(page, ipage); + f2fs_do_read_inline_data(page_folio(page), ipage); set_inode_flag(inode, FI_DATA_EXIST); if (inode->i_nlink) set_page_private_inline(ipage); diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 34acd791c198..13dee521fbe8 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -4153,10 +4153,10 @@ extern struct kmem_cache *f2fs_inode_entry_slab; bool f2fs_may_inline_data(struct inode *inode); bool f2fs_sanity_check_inline_data(struct inode *inode); bool f2fs_may_inline_dentry(struct inode *inode); -void f2fs_do_read_inline_data(struct page *page, struct page *ipage); +void f2fs_do_read_inline_data(struct folio *folio, struct page *ipage); void 
f2fs_truncate_inline_inode(struct inode *inode, struct page *ipage, u64 from); -int f2fs_read_inline_data(struct inode *inode, struct page *page); +int f2fs_read_inline_data(struct inode *inode, struct folio *folio); int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page); int f2fs_convert_inline_inode(struct inode *inode); int f2fs_try_convert_inline_dir(struct inode *dir, struct dentry *dentry); diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c index 3d3218a4b29d..7638d0d7b7ee 100644 --- a/fs/f2fs/inline.c +++ b/fs/f2fs/inline.c @@ -61,22 +61,22 @@ bool f2fs_may_inline_dentry(struct inode *inode) return true; } -void f2fs_do_read_inline_data(struct page *page, struct page *ipage) +void f2fs_do_read_inline_data(struct folio *folio, struct page *ipage) { - struct inode *inode = page->mapping->host; + struct inode *inode = folio_file_mapping(folio)->host; - if (PageUptodate(page)) + if (folio_test_uptodate(folio)) return; - f2fs_bug_on(F2FS_P_SB(page), page->index); + f2fs_bug_on(F2FS_I_SB(inode), folio_index(folio)); - zero_user_segment(page, MAX_INLINE_DATA(inode), PAGE_SIZE); + folio_zero_segment(folio, MAX_INLINE_DATA(inode), folio_size(folio)); /* Copy the whole inline data block */ - memcpy_to_page(page, 0, inline_data_addr(inode, ipage), + memcpy_to_folio(folio, 0, inline_data_addr(inode, ipage), MAX_INLINE_DATA(inode)); - if (!PageUptodate(page)) - SetPageUptodate(page); + if (!folio_test_uptodate(folio)) + folio_mark_uptodate(folio); } void f2fs_truncate_inline_inode(struct inode *inode, @@ -97,13 +97,13 @@ void f2fs_truncate_inline_inode(struct inode *inode, clear_inode_flag(inode, FI_DATA_EXIST); } -int f2fs_read_inline_data(struct inode *inode, struct page *page) +int f2fs_read_inline_data(struct inode *inode, struct folio *folio) { struct page *ipage; ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino); if (IS_ERR(ipage)) { - unlock_page(page); + folio_unlock(folio); return PTR_ERR(ipage); } @@ -112,15 +112,15 @@ int 
f2fs_read_inline_data(struct inode *inode, struct page *page) return -EAGAIN; } - if (page->index) - zero_user_segment(page, 0, PAGE_SIZE); + if (folio_index(folio)) + folio_zero_segment(folio,
[f2fs-dev] [PATCH 2/4] f2fs: convert f2fs_read_single_page() to use folio
Convert f2fs_read_single_page() to use folio and related functionality. Signed-off-by: Chao Yu --- fs/f2fs/data.c | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 14dcd621acaa..c35107657c97 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -2092,7 +2092,7 @@ static inline loff_t f2fs_readpage_limit(struct inode *inode) return i_size_read(inode); } -static int f2fs_read_single_page(struct inode *inode, struct page *page, +static int f2fs_read_single_page(struct inode *inode, struct folio *folio, unsigned nr_pages, struct f2fs_map_blocks *map, struct bio **bio_ret, @@ -2105,9 +2105,10 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page, sector_t last_block; sector_t last_block_in_file; sector_t block_nr; + pgoff_t index = folio_index(folio); int ret = 0; - block_in_file = (sector_t)page_index(page); + block_in_file = (sector_t)index; last_block = block_in_file + nr_pages; last_block_in_file = bytes_to_blks(inode, f2fs_readpage_limit(inode) + blocksize - 1); @@ -2138,7 +2139,7 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page, got_it: if ((map->m_flags & F2FS_MAP_MAPPED)) { block_nr = map->m_pblk + block_in_file - map->m_lblk; - SetPageMappedToDisk(page); + folio_set_mappedtodisk(folio); if (!f2fs_is_valid_blkaddr(F2FS_I_SB(inode), block_nr, DATA_GENERIC_ENHANCE_READ)) { @@ -2147,15 +2148,15 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page, } } else { zero_out: - zero_user_segment(page, 0, PAGE_SIZE); - if (f2fs_need_verity(inode, page->index) && - !fsverity_verify_page(page)) { + folio_zero_segment(folio, 0, folio_size(folio)); + if (f2fs_need_verity(inode, index) && + !fsverity_verify_folio(folio)) { ret = -EIO; goto out; } - if (!PageUptodate(page)) - SetPageUptodate(page); - unlock_page(page); + if (!folio_test_uptodate(folio)) + folio_mark_uptodate(folio); + folio_unlock(folio); goto out; } @@ -2165,14 +2166,14 @@ static int 
f2fs_read_single_page(struct inode *inode, struct page *page, */ if (bio && (!page_is_mergeable(F2FS_I_SB(inode), bio, *last_block_in_bio, block_nr) || - !f2fs_crypt_mergeable_bio(bio, inode, page->index, NULL))) { + !f2fs_crypt_mergeable_bio(bio, inode, index, NULL))) { submit_and_realloc: f2fs_submit_read_bio(F2FS_I_SB(inode), bio, DATA); bio = NULL; } if (bio == NULL) { bio = f2fs_grab_read_bio(inode, block_nr, nr_pages, - is_readahead ? REQ_RAHEAD : 0, page->index, + is_readahead ? REQ_RAHEAD : 0, index, false); if (IS_ERR(bio)) { ret = PTR_ERR(bio); @@ -2187,7 +2188,7 @@ static int f2fs_read_single_page(struct inode *inode, struct page *page, */ f2fs_wait_on_block_writeback(inode, block_nr); - if (bio_add_page(bio, page, blocksize, 0) < blocksize) + if (!bio_add_folio(bio, folio, blocksize, 0)) goto submit_and_realloc; inc_page_count(F2FS_I_SB(inode), F2FS_RD_DATA); @@ -2453,7 +2454,7 @@ static int f2fs_mpage_readpages(struct inode *inode, read_single_page: #endif - ret = f2fs_read_single_page(inode, >page, max_nr_pages, , + ret = f2fs_read_single_page(inode, folio, max_nr_pages, , , _block_in_bio, rac); if (ret) { #ifdef CONFIG_F2FS_FS_COMPRESSION -- 2.40.1 ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
[f2fs-dev] [PATCH 2/2] f2fs: remove unnecessary block size check in init_f2fs_fs()
After commit d7e9a9037de2 ("f2fs: Support Block Size == Page Size"),
F2FS_BLKSIZE always equals PAGE_SIZE, so remove the unnecessary check.

Signed-off-by: Chao Yu
---
 fs/f2fs/super.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 6d1e4fc629e2..32aa6d6fa871 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4933,12 +4933,6 @@ static int __init init_f2fs_fs(void)
 {
 	int err;
 
-	if (PAGE_SIZE != F2FS_BLKSIZE) {
-		printk("F2FS not supported on PAGE_SIZE(%lu) != BLOCK_SIZE(%lu)\n",
-				PAGE_SIZE, F2FS_BLKSIZE);
-		return -EINVAL;
-	}
-
 	err = init_inodecache();
 	if (err)
 		goto fail;
--
2.40.1
[f2fs-dev] [PATCH 1/2] f2fs: fix comment in sanity_check_raw_super()
Commit d7e9a9037de2 ("f2fs: Support Block Size == Page Size") missed
adjusting the comment in sanity_check_raw_super(); fix it.

Signed-off-by: Chao Yu
---
 fs/f2fs/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 0a34c8746782..6d1e4fc629e2 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3456,7 +3456,7 @@ static int sanity_check_raw_super(struct f2fs_sb_info *sbi,
 		}
 	}

-	/* Currently, support only 4KB block size */
+	/* only support block_size equals to PAGE_SIZE */
 	if (le32_to_cpu(raw_super->log_blocksize) != F2FS_BLKSIZE_BITS) {
 		f2fs_info(sbi, "Invalid log_blocksize (%u), supports only %u",
 			  le32_to_cpu(raw_super->log_blocksize),
-- 
2.40.1
[f2fs-dev] [PATCH] common/quota: fix keywords of quota feature in _require_prjquota() for f2fs
Previously, in f2fs, the sysfile quota feature had different names:
- "quota" in mkfs.f2fs
- "quota_ino" in dump.f2fs

The name has been unified to "quota" since commit 92cc5edeb7
("f2fs-tools: reuse feature_table to clean up print_sb_state()").

The keyword in _require_prjquota() needs to be fixed for f2fs; otherwise,
quota testcases will fail:

generic/383 1s ... [not run] quota sysfile not enabled in this device /dev/vdc

Cc: Jaegeuk Kim
Signed-off-by: Chao Yu
---
 common/quota | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/common/quota b/common/quota
index 6b529bf4..cfe3276f 100644
--- a/common/quota
+++ b/common/quota
@@ -145,7 +145,7 @@ _require_prjquota()
 	if [ "$FSTYP" == "f2fs" ]; then
 		dump.f2fs $_dev 2>&1 | grep -qw project_quota
 		[ $? -ne 0 ] && _notrun "Project quota not enabled in this device $_dev"
-		dump.f2fs $_dev 2>&1 | grep -qw quota_ino
+		dump.f2fs $_dev 2>&1 | grep -qw quota
 		[ $? -ne 0 ] && _notrun "quota sysfile not enabled in this device $_dev"
 		cat /sys/fs/f2fs/features/project_quota | grep -qw supported
 		[ $? -ne 0 ] && _notrun "Installed kernel does not support project quotas"
-- 
2.40.1
[f2fs-dev] [PATCH] mkfs.f2fs: add description for ro feature in manual
Add the missing description of the readonly feature to the mkfs.f2fs
manual.

Signed-off-by: Chao Yu
---
 man/mkfs.f2fs.8 | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/man/mkfs.f2fs.8 b/man/mkfs.f2fs.8
index 0dc367b..1f0c724 100644
--- a/man/mkfs.f2fs.8
+++ b/man/mkfs.f2fs.8
@@ -208,6 +208,9 @@ Enable casefolding support in the filesystem. Optional flags can be passed with
 .TP
 .B compression
 Enable support for filesystem level compression. Requires extra attr.
+.TP
+.B ro
+Enable readonly feature to eliminate OVP/SSA on-disk layout for small readonly partition.
 .RE
 .TP
 .BI \-C " encoding:flags"
-- 
2.40.1
Re: [f2fs-dev] [PATCH v3] f2fs: zone: don't block IO if there is remained open zone
On 2024/4/15 22:01, Chao Yu wrote:
> On 2024/4/15 11:26, Chao Yu wrote:
>> On 2024/4/14 23:19, Jaegeuk Kim wrote:
>>> It seems this caused kernel hang. Chao, have you tested this patch enough?
>>
>> Jaegeuk,
>>
>> Oh, I've checked this patch w/ fsstress before submitting it, but missed
>> the SPO testcase... do you encounter kernel hang w/ SPO testcase?
>
> I did see any hang issue w/ por_fsstress testcase, which testcase do you
> use?

Sorry, I mean I haven't reproduced it yet...

Thanks,

> Thanks,
>
>> Anyway, let me test it more.
>>
>> Thanks,
>>
>>> On 04/13, Chao Yu wrote:
>>>> On 2024/4/13 5:11, Jaegeuk Kim wrote:
>>>>> On 04/07, Chao Yu wrote:
>>>>>> max open zone may be larger than log header number of f2fs, for
>>>>>> such case, it doesn't need to wait last IO in previous zone, let's
>>>>>> introduce available_open_zone semaphore, and reduce it once we
>>>>>> submit first write IO in a zone, and increase it after completion
>>>>>> of last IO in the zone.
>>>>>>
>>>>>> Cc: Daeho Jeong
>>>>>> Signed-off-by: Chao Yu
>>>>>> ---
>>>>>> v3:
>>>>>> - avoid race condition in between __submit_merged_bio()
>>>>>> and __allocate_new_segment().
>>>>>>  fs/f2fs/data.c    | 105 ++
>>>>>>  fs/f2fs/f2fs.h    | 34 ---
>>>>>>  fs/f2fs/iostat.c  | 7
>>>>>>  fs/f2fs/iostat.h  | 2 +
>>>>>>  fs/f2fs/segment.c | 43 ---
>>>>>>  fs/f2fs/segment.h | 12 +-
>>>>>>  fs/f2fs/super.c   | 2 +
>>>>>>  7 files changed, 156 insertions(+), 49 deletions(-)
>>>>>>
>>>>>> [...]
Re: [f2fs-dev] [PATCH v3] f2fs: zone: don't block IO if there is remained open zone
On 2024/4/15 11:26, Chao Yu wrote:
> On 2024/4/14 23:19, Jaegeuk Kim wrote:
>> It seems this caused kernel hang. Chao, have you tested this patch enough?
>
> Jaegeuk,
>
> Oh, I've checked this patch w/ fsstress before submitting it, but missed
> the SPO testcase... do you encounter kernel hang w/ SPO testcase?

I did see any hang issue w/ por_fsstress testcase, which testcase do you
use?

Thanks,

> Anyway, let me test it more.
>
> Thanks,
>
>> On 04/13, Chao Yu wrote:
>>> On 2024/4/13 5:11, Jaegeuk Kim wrote:
>>>> On 04/07, Chao Yu wrote:
>>>>> max open zone may be larger than log header number of f2fs, for
>>>>> such case, it doesn't need to wait last IO in previous zone, let's
>>>>> introduce available_open_zone semaphore, and reduce it once we
>>>>> submit first write IO in a zone, and increase it after completion
>>>>> of last IO in the zone.
>>>>>
>>>>> Cc: Daeho Jeong
>>>>> Signed-off-by: Chao Yu
>>>>> ---
>>>>> v3:
>>>>> - avoid race condition in between __submit_merged_bio()
>>>>> and __allocate_new_segment().
>>>>>
>>>>> [...]
Re: [f2fs-dev] [PATCH v3] f2fs: zone: don't block IO if there is remained open zone
On 2024/4/14 23:19, Jaegeuk Kim wrote:
> It seems this caused kernel hang. Chao, have you tested this patch enough?

Jaegeuk,

Oh, I've checked this patch w/ fsstress before submitting it, but missed
the SPO testcase... do you encounter kernel hang w/ SPO testcase?

Anyway, let me test it more.

Thanks,

> On 04/13, Chao Yu wrote:
>> On 2024/4/13 5:11, Jaegeuk Kim wrote:
>>> On 04/07, Chao Yu wrote:
>>>> max open zone may be larger than log header number of f2fs, for
>>>> such case, it doesn't need to wait last IO in previous zone, let's
>>>> introduce available_open_zone semaphore, and reduce it once we
>>>> submit first write IO in a zone, and increase it after completion
>>>> of last IO in the zone.
>>>>
>>>> Cc: Daeho Jeong
>>>> Signed-off-by: Chao Yu
>>>> ---
>>>> v3:
>>>> - avoid race condition in between __submit_merged_bio()
>>>> and __allocate_new_segment().
>>>>
>>>> [...]
Re: [f2fs-dev] [PATCH 2/2] f2fs: allow direct io of pinned files for zoned storage
On 2024/4/12 2:37, Daeho Jeong wrote:
> From: Daeho Jeong
>
> Since the allocation happens in conventional LU for zoned storage, we
> can allow direct io for that.
>
> Signed-off-by: Daeho Jeong

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH 1/2] f2fs: prevent writing without fallocate() for pinned files
On 2024/4/12 1:54, Daeho Jeong wrote:
> From: Daeho Jeong
>
> In a case writing without fallocate(), we can't guarantee it's allocated
> in the conventional area for zoned storage. To make it consistent across
> storage devices, we disallow it regardless of storage device types.
>
> Signed-off-by: Daeho Jeong

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH v3] f2fs: zone: don't block IO if there is remained open zone
On 2024/4/13 5:11, Jaegeuk Kim wrote:
> On 04/07, Chao Yu wrote:
>> max open zone may be larger than log header number of f2fs, for
>> such case, it doesn't need to wait last IO in previous zone, let's
>> introduce available_open_zone semaphore, and reduce it once we
>> submit first write IO in a zone, and increase it after completion
>> of last IO in the zone.
>>
>> Cc: Daeho Jeong
>> Signed-off-by: Chao Yu
>> ---
>> v3:
>> - avoid race condition in between __submit_merged_bio()
>> and __allocate_new_segment().
>>  fs/f2fs/data.c    | 105 ++
>>  fs/f2fs/f2fs.h    | 34 ---
>>  fs/f2fs/iostat.c  | 7
>>  fs/f2fs/iostat.h  | 2 +
>>  fs/f2fs/segment.c | 43 ---
>>  fs/f2fs/segment.h | 12 +-
>>  fs/f2fs/super.c   | 2 +
>>  7 files changed, 156 insertions(+), 49 deletions(-)
>>
>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>> index 0d88649c60a5..18a4ac0a06bc 100644
>> --- a/fs/f2fs/data.c
>> +++ b/fs/f2fs/data.c
>> @@ -373,11 +373,10 @@ static void f2fs_write_end_io(struct bio *bio)
>>  #ifdef CONFIG_BLK_DEV_ZONED
>>  static void f2fs_zone_write_end_io(struct bio *bio)
>>  {
>> -	struct f2fs_bio_info *io = (struct f2fs_bio_info *)bio->bi_private;
>> +	struct f2fs_sb_info *sbi = iostat_get_bio_private(bio);
>>
>> -	bio->bi_private = io->bi_private;
>> -	complete(&io->zone_wait);
>>  	f2fs_write_end_io(bio);
>> +	up(&sbi->available_open_zones);
>>  }
>>  #endif
>>
>> @@ -531,6 +530,24 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
>>  	if (!io->bio)
>>  		return;
>>
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +	if (io->open_zone) {
>> +		/*
>> +		 * if there is no open zone, it will wait for last IO in
>> +		 * previous zone before submitting new IO.
>> +		 */
>> +		down(&io->sbi->available_open_zones);
>> +		io->open_zone = false;
>> +		io->zone_openned = true;
>> +	}
>> +
>> +	if (io->close_zone) {
>> +		io->bio->bi_end_io = f2fs_zone_write_end_io;
>> +		io->zone_openned = false;
>> +		io->close_zone = false;
>> +	}
>> +#endif
>> +
>>  	if (is_read_io(fio->op)) {
>>  		trace_f2fs_prepare_read_bio(io->sbi->sb, fio->type, io->bio);
>>  		f2fs_submit_read_bio(io->sbi, io->bio, fio->type);
>> @@ -601,9 +618,9 @@ int f2fs_init_write_merge_io(struct f2fs_sb_info *sbi)
>>  			INIT_LIST_HEAD(&sbi->write_io[i][j].bio_list);
>>  			init_f2fs_rwsem(&sbi->write_io[i][j].bio_list_lock);
>>  #ifdef CONFIG_BLK_DEV_ZONED
>> -			init_completion(&sbi->write_io[i][j].zone_wait);
>> -			sbi->write_io[i][j].zone_pending_bio = NULL;
>> -			sbi->write_io[i][j].bi_private = NULL;
>> +			sbi->write_io[i][j].open_zone = false;
>> +			sbi->write_io[i][j].zone_openned = false;
>> +			sbi->write_io[i][j].close_zone = false;
>>  #endif
>>  		}
>>  	}
>> @@ -634,6 +651,31 @@ static void __f2fs_submit_merged_write(struct f2fs_sb_info *sbi,
>>  	f2fs_up_write(&io->io_rwsem);
>>  }
>>
>> +void f2fs_blkzoned_submit_merged_write(struct f2fs_sb_info *sbi, int type)
>> +{
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> +	struct f2fs_bio_info *io;
>> +
>> +	if (!f2fs_sb_has_blkzoned(sbi))
>> +		return;
>> +
>> +	io = sbi->write_io[PAGE_TYPE(type)] + type_to_temp(type);
>> +
>> +	f2fs_down_write(&io->io_rwsem);
>> +	if (io->zone_openned) {
>> +		if (io->bio) {
>> +			io->close_zone = true;
>> +			__submit_merged_bio(io);
>> +		} else if (io->zone_openned) {
>> +			up(&sbi->available_open_zones);
>> +			io->zone_openned = false;
>> +		}
>> +	}
>> +	f2fs_up_write(&io->io_rwsem);
>> +#endif
>> +
>> +}
>> +
>>  static void __submit_merged_write_cond(struct f2fs_sb_info *sbi,
>>  				struct inode *inode, struct page *page,
>>  				nid_t ino, enum page_type type, bool force)
>> @@ -918,22 +960,16 @@ int f2fs_merge_page_bio(struct f2fs_io_info *fio)
>>  }
>>
>>  #ifdef CONFIG_BLK_DEV_ZONED
>> -static bool is_end_zone_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr)
>> +static bool is_blkaddr_zone_boundary(struct f2fs_sb_info *sbi,
>> +					block_t blkaddr, bool start)
>>  {
>> -	int devi = 0;
>> +	if (!f2fs_blkaddr_in_seqzone(sbi, blkaddr))
>> +		return false;
>> +
>> +	if (start)
>> +		return (blkaddr % sbi->blocks_per_blkz) == 0;
>> +	return (blkaddr % sbi->blocks_per_blkz == sbi->blocks_per_blkz - 1);
>>
>> -	if (f2fs_is_multi_device(sbi)) {
>> -		devi = f2fs_target_device_index(sbi, blkaddr);
>> -		if (blkaddr < FDEV(devi).start_blk ||
>> -		    blkaddr > FDEV(devi).end_blk) {
>> -			f2fs_err(sbi, "Invalid block %x", bl
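The zone-boundary test in the quoted hunk reduces to modular arithmetic on the block address. A minimal userspace sketch of that arithmetic follows. The struct zone_geom type and the function name are hypothetical stand-ins (the patch reads sbi->blocks_per_blkz), and the sequential-zone check done by f2fs_blkaddr_in_seqzone() is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the sbi->blocks_per_blkz field used by the patch. */
struct zone_geom {
	uint32_t blocks_per_zone;
};

/*
 * Return true if blkaddr is the first block of its zone (start == true)
 * or the last block of its zone (start == false). Unlike the patch, this
 * sketch does not check whether the address lies in a sequential zone.
 */
static bool blkaddr_on_zone_boundary(const struct zone_geom *g,
				     uint32_t blkaddr, bool start)
{
	assert(g->blocks_per_zone > 0);

	if (start)
		return blkaddr % g->blocks_per_zone == 0;
	return blkaddr % g->blocks_per_zone == g->blocks_per_zone - 1;
}
```

With 1024 blocks per zone, address 2048 is a zone start and address 2047 is a zone end, which is where the patch would set open_zone or close_zone respectively.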
Re: [f2fs-dev] [PATCH] f2fs: Fix incorrect return value
On 2024/4/9 14:47, wangjianjian (C) via Linux-f2fs-devel wrote:
> On 2024/4/7 14:23, Chao Yu wrote:
>> On 2024/4/4 21:47, Wang Jianjian wrote:
>>> dquot_mark_dquot_dirty returns old dirty state not the error code.
>>
>> I think it's fine to just pass return value of dquot_mark_dquot_dirty()
>> to caller, because caller can distinguish status from return value:
>> 1) < 0, there is an error,
>> 2) >= 0, there is no error, previously it is dirty if it is 1.
>
> mark_all_dquot_dirty uses if return value is 0 to save error code. It may
> cause mess.

I didn't get your point...

No caller of mark_all_dquot_dirty() cares about its return value, so,
I think there is no practical problem now.

> By the way, I am fine don't change it.

Thanks,

>>> Signed-off-by: Wang Jianjian
>>> ---
>>>  fs/f2fs/super.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>> index a6867f26f141..af07027475d9 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -3063,13 +3063,13 @@ static int f2fs_dquot_mark_dquot_dirty(struct dquot *dquot)
>>>  {
>>>  	struct super_block *sb = dquot->dq_sb;
>>>  	struct f2fs_sb_info *sbi = F2FS_SB(sb);
>>> -	int ret = dquot_mark_dquot_dirty(dquot);
>>> +	dquot_mark_dquot_dirty(dquot);
>>>
>>>  	/* if we are using journalled quota */
>>>  	if (is_journalled_quota(sbi))
>>>  		set_sbi_flag(sbi, SBI_QUOTA_NEED_FLUSH);
>>>
>>> -	return ret;
>>> +	return 0;
>>>  }
>>>
>>>  static int f2fs_dquot_commit_info(struct super_block *sb, int type)
Re: [f2fs-dev] [PATCH 3/3] f2fs: fix false alarm on invalid block address
On 2024/4/10 4:34, Jaegeuk Kim wrote:
> f2fs_ra_meta_pages can try to read ahead on invalid block address which is
> not the corruption case.

In which case we will read ahead invalid meta pages? recovery w/ META_POR?

Thanks,

> Fixes: 31f85ccc84b8 ("f2fs: unify the error handling of f2fs_is_valid_blkaddr")
> Signed-off-by: Jaegeuk Kim
> ---
>  fs/f2fs/checkpoint.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index eac698b8dd38..b01320502624 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -179,22 +179,22 @@ static bool __f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi,
>  		break;
>  	case META_SIT:
>  		if (unlikely(blkaddr >= SIT_BLK_CNT(sbi)))
> -			goto err;
> +			goto check_only;
>  		break;
>  	case META_SSA:
>  		if (unlikely(blkaddr >= MAIN_BLKADDR(sbi) ||
>  			blkaddr < SM_I(sbi)->ssa_blkaddr))
> -			goto err;
> +			goto check_only;
>  		break;
>  	case META_CP:
>  		if (unlikely(blkaddr >= SIT_I(sbi)->sit_base_addr ||
>  			blkaddr < __start_cp_addr(sbi)))
> -			goto err;
> +			goto check_only;
>  		break;
>  	case META_POR:
>  		if (unlikely(blkaddr >= MAX_BLKADDR(sbi) ||
>  			blkaddr < MAIN_BLKADDR(sbi)))
> -			goto err;
> +			goto check_only;
>  		break;
>  	case DATA_GENERIC:
>  	case DATA_GENERIC_ENHANCE:
> @@ -228,6 +228,7 @@ static bool __f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi,
>  	return true;
>  err:
>  	f2fs_handle_error(sbi, ERROR_INVALID_BLKADDR);
> +check_only:
>  	return false;
>  }
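The quoted patch separates two outcomes of a failed range check: "invalid and worth recording as corruption" (the err label) versus "invalid, just say no" (the new check_only label). A minimal sketch of that split is shown below. Every name in it (struct fake_sbi, enum check_kind, fake_is_valid_blkaddr) is hypothetical, and the single range used here is a simplification of the per-type ranges in __f2fs_is_valid_blkaddr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the filesystem state touched by the check. */
struct fake_sbi {
	uint32_t main_blkaddr;		/* first valid address (inclusive) */
	uint32_t max_blkaddr;		/* end of valid range (exclusive) */
	unsigned int error_reports;	/* how often corruption was recorded */
};

enum check_kind {
	CHECK_READAHEAD,	/* speculative probe: out of range is not corruption */
	CHECK_DATA,		/* real reference: out of range is corruption */
};

/*
 * Return whether blkaddr is in range. Only CHECK_DATA failures are
 * recorded as errors; CHECK_READAHEAD failures return false silently,
 * which models the check_only exit in the patch.
 */
static bool fake_is_valid_blkaddr(struct fake_sbi *sbi, uint32_t blkaddr,
				  enum check_kind kind)
{
	assert(sbi->main_blkaddr < sbi->max_blkaddr);

	if (blkaddr >= sbi->main_blkaddr && blkaddr < sbi->max_blkaddr)
		return true;

	if (kind == CHECK_DATA)
		sbi->error_reports++;
	return false;
}
```

The design point is that the boolean result is the same in both cases; only the side effect (recording an error) differs, so readahead probing past the valid range no longer raises a false alarm.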
Re: [f2fs-dev] [PATCH 2/3] f2fs: clear writeback when compression failed
On 2024/4/10 4:34, Jaegeuk Kim wrote:
> Let's stop issuing compressed writes and clear their writeback flags.
>
> Signed-off-by: Jaegeuk Kim
> ---
>  fs/f2fs/compress.c | 33 +++--
>  1 file changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
> index d67c471ab5df..3a8ecc6aee84 100644
> --- a/fs/f2fs/compress.c
> +++ b/fs/f2fs/compress.c
> @@ -1031,6 +1031,25 @@ static void set_cluster_writeback(struct compress_ctx *cc)
>  	}
>  }
>
> +static void cancel_cluster_writeback(struct compress_ctx *cc, int submitted)
> +{
> +	int i;
> +
> +	for (i = 0; i < cc->cluster_size; i++) {
> +		if (!cc->rpages[i])
> +			continue;
> +		if (i < submitted) {
> +			if (i)
> +				f2fs_wait_on_page_writeback(cc->rpages[i],
> +						DATA, true, true);
> +			inode_inc_dirty_pages(cc->inode);
> +			lock_page(cc->rpages[i]);
> +		}
> +		clear_page_private_gcing(cc->rpages[i]);
> +		end_page_writeback(cc->rpages[i]);
> +	}
> +}
> +
>  static void set_cluster_dirty(struct compress_ctx *cc)
>  {
>  	int i;
> @@ -1232,7 +1251,6 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
>  		.page = NULL,
>  		.encrypted_page = NULL,
>  		.compressed_page = NULL,
> -		.submitted = 0,
>  		.io_type = io_type,
>  		.io_wbc = wbc,
>  		.encrypted = fscrypt_inode_uses_fs_layer_crypto(cc->inode) ?
> @@ -1358,7 +1376,15 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
>  		fio.compressed_page = cc->cpages[i - 1];
>
>  		cc->cpages[i - 1] = NULL;
> +		fio.submitted = 0;
>  		f2fs_outplace_write_data(&dn, &fio);
> +		if (unlikely(!fio.submitted)) {
> +			cancel_cluster_writeback(cc, i);
> +
> +			/* To call fscrypt_finalize_bounce_page */
> +			i = cc->valid_nr_cpages;

*submitted = 0; ?

Thanks,

> +			goto out_destroy_crypt;
> +		}
>  		(*submitted)++;
>  unlock_continue:
>  		inode_dec_dirty_pages(cc->inode);
> @@ -1392,8 +1418,11 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
>  out_destroy_crypt:
>  	page_array_free(cc->inode, cic->rpages, cc->cluster_size);
>
> -	for (--i; i >= 0; i--)
> +	for (--i; i >= 0; i--) {
> +		if (!cc->cpages[i])
> +			continue;
>  		fscrypt_finalize_bounce_page(&cc->cpages[i]);
> +	}
>  out_put_cic:
>  	kmem_cache_free(cic_entry_slab, cic);
>  out_put_dnode:
Re: [f2fs-dev] [PATCH 1/3] f2fs: use folio_test_writeback
On 2024/4/10 4:34, Jaegeuk Kim wrote:
> Let's convert PageWriteback to folio_test_writeback.
>
> Signed-off-by: Jaegeuk Kim

Reviewed-by: Chao Yu

Thanks,
[f2fs-dev] [PATCH v4] f2fs: zone: don't block IO if there is remained open zone
max open zone may be larger than log header number of f2fs, for
such case, it doesn't need to wait last IO in previous zone, let's
introduce available_open_zone semaphore, and reduce it once we
submit first write IO in a zone, and increase it after completion
of last IO in the zone.

Cc: Daeho Jeong
Signed-off-by: Chao Yu
---
v4:
- avoid unneeded condition in f2fs_blkzoned_submit_merged_write().
 fs/f2fs/data.c    | 105 ++
 fs/f2fs/f2fs.h    | 34 ---
 fs/f2fs/iostat.c  | 7
 fs/f2fs/iostat.h  | 2 +
 fs/f2fs/segment.c | 43 ---
 fs/f2fs/segment.h | 12 +-
 fs/f2fs/super.c   | 2 +
 7 files changed, 156 insertions(+), 49 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 60056b9a51be..71472ab6b7e7 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -373,11 +373,10 @@ static void f2fs_write_end_io(struct bio *bio)
 #ifdef CONFIG_BLK_DEV_ZONED
 static void f2fs_zone_write_end_io(struct bio *bio)
 {
-	struct f2fs_bio_info *io = (struct f2fs_bio_info *)bio->bi_private;
+	struct f2fs_sb_info *sbi = iostat_get_bio_private(bio);

-	bio->bi_private = io->bi_private;
-	complete(&io->zone_wait);
 	f2fs_write_end_io(bio);
+	up(&sbi->available_open_zones);
 }
 #endif

@@ -531,6 +530,24 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
 	if (!io->bio)
 		return;

+#ifdef CONFIG_BLK_DEV_ZONED
+	if (io->open_zone) {
+		/*
+		 * if there is no open zone, it will wait for last IO in
+		 * previous zone before submitting new IO.
+		 */
+		down(&io->sbi->available_open_zones);
+		io->open_zone = false;
+		io->zone_openned = true;
+	}
+
+	if (io->close_zone) {
+		io->bio->bi_end_io = f2fs_zone_write_end_io;
+		io->zone_openned = false;
+		io->close_zone = false;
+	}
+#endif
+
 	if (is_read_io(fio->op)) {
 		trace_f2fs_prepare_read_bio(io->sbi->sb, fio->type, io->bio);
 		f2fs_submit_read_bio(io->sbi, io->bio, fio->type);
@@ -601,9 +618,9 @@ int f2fs_init_write_merge_io(struct f2fs_sb_info *sbi)
 			INIT_LIST_HEAD(&sbi->write_io[i][j].bio_list);
 			init_f2fs_rwsem(&sbi->write_io[i][j].bio_list_lock);
 #ifdef CONFIG_BLK_DEV_ZONED
-			init_completion(&sbi->write_io[i][j].zone_wait);
-			sbi->write_io[i][j].zone_pending_bio = NULL;
-			sbi->write_io[i][j].bi_private = NULL;
+			sbi->write_io[i][j].open_zone = false;
+			sbi->write_io[i][j].zone_openned = false;
+			sbi->write_io[i][j].close_zone = false;
 #endif
 		}
 	}
@@ -634,6 +651,31 @@ static void __f2fs_submit_merged_write(struct f2fs_sb_info *sbi,
 	f2fs_up_write(&io->io_rwsem);
 }

+void f2fs_blkzoned_submit_merged_write(struct f2fs_sb_info *sbi, int type)
+{
+#ifdef CONFIG_BLK_DEV_ZONED
+	struct f2fs_bio_info *io;
+
+	if (!f2fs_sb_has_blkzoned(sbi))
+		return;
+
+	io = sbi->write_io[PAGE_TYPE(type)] + type_to_temp(type);
+
+	f2fs_down_write(&io->io_rwsem);
+	if (io->zone_openned) {
+		if (io->bio) {
+			io->close_zone = true;
+			__submit_merged_bio(io);
+		} else {
+			up(&sbi->available_open_zones);
+			io->zone_openned = false;
+		}
+	}
+	f2fs_up_write(&io->io_rwsem);
+#endif
+
+}
+
 static void __submit_merged_write_cond(struct f2fs_sb_info *sbi,
 				struct inode *inode, struct page *page,
 				nid_t ino, enum page_type type, bool force)
@@ -918,22 +960,16 @@ int f2fs_merge_page_bio(struct f2fs_io_info *fio)
 }

 #ifdef CONFIG_BLK_DEV_ZONED
-static bool is_end_zone_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr)
+static bool is_blkaddr_zone_boundary(struct f2fs_sb_info *sbi,
+					block_t blkaddr, bool start)
 {
-	int devi = 0;
+	if (!f2fs_blkaddr_in_seqzone(sbi, blkaddr))
+		return false;
+
+	if (start)
+		return (blkaddr % sbi->blocks_per_blkz) == 0;
+	return (blkaddr % sbi->blocks_per_blkz == sbi->blocks_per_blkz - 1);

-	if (f2fs_is_multi_device(sbi)) {
-		devi = f2fs_target_device_index(sbi, blkaddr);
-		if (blkaddr < FDEV(devi).start_blk ||
-		    blkaddr > FDEV(devi).end_blk) {
-			f2fs_err(sbi, "Invalid block %x", blkaddr);
-			return false;
-		}
-		blkaddr -= FDEV(devi).start_blk;
-	}
-	ret
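The accounting idea in the commit message (take one slot when the first write of a zone is submitted, return it when the last write of that zone completes) can be sketched in userspace as follows. This is a single-threaded model under stated assumptions: a plain counter stands in for the kernel semaphore, the "try" variant returns false where the patch would sleep in down(), and struct zone_budget, struct log_io and both functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-log state; mirrors the role of io->zone_openned. */
struct log_io {
	bool zone_opened;	/* this log currently holds one open-zone slot */
};

/* Plain counter standing in for the available_open_zones semaphore. */
struct zone_budget {
	unsigned int available;
};

/*
 * First write into a new zone: take a slot if one is free.
 * Returns false when no slot is free; the patch would block in down().
 */
static bool zone_open_try(struct zone_budget *b, struct log_io *io)
{
	assert(!io->zone_opened);

	if (b->available == 0)
		return false;
	b->available--;
	io->zone_opened = true;
	return true;
}

/* Completion of the last write in the zone: give the slot back. */
static void zone_close(struct zone_budget *b, struct log_io *io)
{
	assert(io->zone_opened);

	io->zone_opened = false;
	b->available++;
}
```

With a budget of two, a third log has to wait until one of the first two finishes its zone; with a budget at least as large as the number of logs, no writer ever waits, which is the case the patch is meant to stop penalizing.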
[f2fs-dev] [PATCH v2 2/2] f2fs: introduce written_map to indicate written datas
Currently, __exchange_data_block() will check checkpointed state of data,
if it is not checkpointed, it will try to exchange blkaddrs directly in
dnode.

However, after commit 899fee36fac0 ("f2fs: fix to avoid data corruption by
forbidding SSR overwrite"), in order to disallow SSR allocator to reuse
all written data/node type blocks, all written blocks were set as
checkpointed.

In order to reenable metadata exchange functionality, let's introduce
written_map to indicate all written blocks including checkpointed one,
or newly written and invalidated one, and use it for SSR allocation,
and then ckpt_valid_bitmap can indicate real checkpointed status, and
we can use it correctly in __exchange_data_block().

[testcase]
xfs_io -f /mnt/f2fs/src -c "pwrite 0 2m"
xfs_io -f /mnt/f2fs/dst -c "pwrite 0 2m"
xfs_io /mnt/f2fs/src -c "fiemap -v"
xfs_io /mnt/f2fs/dst -c "fiemap -v"
f2fs_io move_range /mnt/f2fs/src /mnt/f2fs/dst 0 0 2097152
xfs_io /mnt/f2fs/src -c "fiemap -v"
xfs_io /mnt/f2fs/dst -c "fiemap -v"

[before]
/mnt/f2fs/src:
 EXT: FILE-OFFSET      BLOCK-RANGE       TOTAL FLAGS
   0: [0..4095]:       8445952..8450047   4096 0x1001
/mnt/f2fs/dst:
 EXT: FILE-OFFSET      BLOCK-RANGE       TOTAL FLAGS
   0: [0..4095]:       143360..147455     4096 0x1001

/mnt/f2fs/src:
/mnt/f2fs/dst:
 EXT: FILE-OFFSET      BLOCK-RANGE       TOTAL FLAGS
   0: [0..4095]:       4284416..4288511   4096 0x1001

[after]
/mnt/f2fs/src:
 EXT: FILE-OFFSET      BLOCK-RANGE       TOTAL FLAGS
   0: [0..4095]:       147456..151551     4096 0x1001
/mnt/f2fs/dst:
 EXT: FILE-OFFSET      BLOCK-RANGE       TOTAL FLAGS
   0: [0..4095]:       151552..155647     4096 0x1001

/mnt/f2fs/src:
/mnt/f2fs/dst:
 EXT: FILE-OFFSET      BLOCK-RANGE       TOTAL FLAGS
   0: [0..4095]:       147456..151551     4096 0x1001

Signed-off-by: Chao Yu
---
v2:
- introduce written_blocks in struct seg_entry for ssr allocator.
 fs/f2fs/gc.c      | 2 +-
 fs/f2fs/segment.c | 22 --
 fs/f2fs/segment.h | 19 ++-
 3 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 8852814dab7f..ea7b5ca6f09b 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -366,7 +366,7 @@ static inline unsigned int get_gc_cost(struct f2fs_sb_info *sbi,
 			unsigned int segno, struct victim_sel_policy *p)
 {
 	if (p->alloc_mode == SSR)
-		return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
+		return get_seg_entry(sbi, segno)->written_blocks;

 	/* alloc_mode == LFS */
 	if (p->gc_mode == GC_GREEDY)
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index af716925db19..0d110908e383 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2456,12 +2456,13 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
 			sbi->discard_blks--;

 		/*
-		 * SSR should never reuse block which is checkpointed
-		 * or newly invalidated.
+		 * if CP disabling is enable, it allows SSR to reuse newly
+		 * invalidated block, otherwise forbidding it to pretect fsyned
+		 * datas.
 		 */
 		if (!is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
-			if (!f2fs_test_and_set_bit(offset, se->ckpt_valid_map))
-				se->ckpt_valid_blocks++;
+			if (!f2fs_test_and_set_bit(offset, se->written_map))
+				se->written_blocks++;
 		}
 	} else {
 		exist = f2fs_test_and_clear_bit(offset, se->cur_valid_map);
@@ -2498,8 +2499,6 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
 				f2fs_test_and_clear_bit(offset, se->discard_map))
 			sbi->discard_blks++;
 	}
-	if (!f2fs_test_bit(offset, se->ckpt_valid_map))
-		se->ckpt_valid_blocks += del;

 	__mark_sit_entry_dirty(sbi, segno);

@@ -2847,11 +2846,11 @@ static void __get_segment_bitmap(struct f2fs_sb_info *sbi,
 	struct seg_entry *se = get_seg_entry(sbi, segno);
 	int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long);
 	unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map;
-	unsigned long *cur_map = (unsigned long *)se->cur_valid_map;
+	unsigned long *written_map = (unsigned long *)se->written_map;
 	int i;

 	for (i = 0; i < entries; i++)
-		target_map[i] = ckpt_map[i] | cur_map[i];
+		target_map[i] = ckpt_map[i] | written_map[i];
 }

 static int __next_free_blkoff(struct f2fs_sb_info *sbi, unsigned long *bitmap,
@@ -4512,9 +4511,9 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
 		return -ENOMEM;

 #ifdef CONFIG_F2FS_CHECK_FS
-	bitmap_size = MAIN_SEGS(s
[f2fs-dev] [PATCH v2 1/2] f2fs: use per-log target_bitmap to improve lookup performance of ssr allocation
After commit 899fee36fac0 ("f2fs: fix to avoid data corruption by
forbidding SSR overwrite"), the valid block bitmap of a currently opened
segment is fixed, so let's introduce a per-log bitmap instead of the temp
bitmap to avoid unnecessary calculation overhead whenever allocating a
free slot w/ the SSR allocator.

Signed-off-by: Chao Yu
---
v2:
- rebase to last dev-test branch.
 fs/f2fs/segment.c | 30 ++
 fs/f2fs/segment.h |  1 +
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 6474b7338e81..af716925db19 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2840,31 +2840,39 @@ static int new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec)
 	return 0;
 }
 
-static int __next_free_blkoff(struct f2fs_sb_info *sbi,
-					int segno, block_t start)
+static void __get_segment_bitmap(struct f2fs_sb_info *sbi,
+					unsigned long *target_map,
+					int segno)
 {
 	struct seg_entry *se = get_seg_entry(sbi, segno);
 	int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long);
-	unsigned long *target_map = SIT_I(sbi)->tmp_map;
 	unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map;
 	unsigned long *cur_map = (unsigned long *)se->cur_valid_map;
 	int i;
 
 	for (i = 0; i < entries; i++)
 		target_map[i] = ckpt_map[i] | cur_map[i];
+}
+
+static int __next_free_blkoff(struct f2fs_sb_info *sbi, unsigned long *bitmap,
+					int segno, block_t start)
+{
+	__get_segment_bitmap(sbi, bitmap, segno);
 
-	return __find_rev_next_zero_bit(target_map, BLKS_PER_SEG(sbi), start);
+	return __find_rev_next_zero_bit(bitmap, BLKS_PER_SEG(sbi), start);
 }
 
 static int f2fs_find_next_ssr_block(struct f2fs_sb_info *sbi,
-					struct curseg_info *seg)
+				struct curseg_info *seg)
 {
-	return __next_free_blkoff(sbi, seg->segno, seg->next_blkoff + 1);
+	return __find_rev_next_zero_bit(seg->target_map,
+				BLKS_PER_SEG(sbi), seg->next_blkoff + 1);
 }
 
 bool f2fs_segment_has_free_slot(struct f2fs_sb_info *sbi, int segno)
 {
-	return __next_free_blkoff(sbi, segno, 0) < BLKS_PER_SEG(sbi);
+	return __next_free_blkoff(sbi, SIT_I(sbi)->tmp_map, segno, 0) <
+					BLKS_PER_SEG(sbi);
 }
 
 /*
@@ -2890,7 +2898,8 @@ static int change_curseg(struct f2fs_sb_info *sbi, int type)
 
 	reset_curseg(sbi, type, 1);
 	curseg->alloc_type = SSR;
-	curseg->next_blkoff = __next_free_blkoff(sbi, curseg->segno, 0);
+	curseg->next_blkoff = __next_free_blkoff(sbi, curseg->target_map,
+							curseg->segno, 0);
 
 	sum_page = f2fs_get_sum_page(sbi, new_segno);
 	if (IS_ERR(sum_page)) {
@@ -4635,6 +4644,10 @@ static int build_curseg(struct f2fs_sb_info *sbi)
 				sizeof(struct f2fs_journal), GFP_KERNEL);
 		if (!array[i].journal)
 			return -ENOMEM;
+		array[i].target_map = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE,
+								GFP_KERNEL);
+		if (!array[i].target_map)
+			return -ENOMEM;
 		if (i < NR_PERSISTENT_LOG)
 			array[i].seg_type = CURSEG_HOT_DATA + i;
 		else if (i == CURSEG_COLD_DATA_PINNED)
@@ -5453,6 +5466,7 @@ static void destroy_curseg(struct f2fs_sb_info *sbi)
 	for (i = 0; i < NR_CURSEG_TYPE; i++) {
 		kfree(array[i].sum_blk);
 		kfree(array[i].journal);
+		kfree(array[i].target_map);
 	}
 	kfree(array);
 }
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index e1c0f418aa11..10f3e44f036f 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -292,6 +292,7 @@ struct curseg_info {
 	struct f2fs_summary_block *sum_blk;	/* cached summary block */
 	struct rw_semaphore journal_rwsem;	/* protect journal area */
 	struct f2fs_journal *journal;		/* cached journal info */
+	unsigned long *target_map;		/* bitmap for SSR allocator */
 	unsigned char alloc_type;		/* current allocation type */
 	unsigned short seg_type;		/* segment type like CURSEG_XXX_TYPE */
 	unsigned int segno;			/* current segment number */
-- 
2.40.1
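The idea in this patch can be illustrated outside the kernel. The following is a minimal userspace sketch, not the f2fs implementation: the struct names, the 512-block segment size, and the plain LSB-first bit order are all assumptions made for illustration (f2fs itself uses __find_rev_next_zero_bit with a reversed bit order). It shows the OR of the two bitmaps being computed once per log, with later lookups scanning only the cached result.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BLKS_PER_SEG  512	/* hypothetical segment size in blocks */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS     ((BLKS_PER_SEG + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for the per-segment bitmaps kept in seg_entry. */
struct seg_maps {
	unsigned long ckpt[MAP_WORDS];	/* blocks valid at last checkpoint */
	unsigned long cur[MAP_WORDS];	/* blocks currently valid */
};

/* Hypothetical stand-in for curseg_info with its cached target map. */
struct log {
	unsigned long target[MAP_WORDS];
	unsigned int next_blkoff;
};

/* Build the cached map once, when the log switches to a segment. */
void log_build_target(struct log *lg, const struct seg_maps *se)
{
	for (size_t i = 0; i < MAP_WORDS; i++)
		lg->target[i] = se->ckpt[i] | se->cur[i];
}

/* Return the first clear bit at or after start, or size if none. */
unsigned int next_zero_bit(const unsigned long *map, unsigned int size,
			   unsigned int start)
{
	assert(size <= BLKS_PER_SEG);
	for (unsigned int b = start; b < size; b++)
		if (!(map[b / BITS_PER_WORD] & (1UL << (b % BITS_PER_WORD))))
			return b;
	return size;
}
```

A caller would run log_build_target() once per segment switch and then call next_zero_bit(lg->target, BLKS_PER_SEG, lg->next_blkoff + 1) for each allocation, which avoids recomputing the OR on every lookup.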
Re: [f2fs-dev] [PATCH] f2fs_io: support unset subcommand for pinfile
Ping,

Did this patch get missed?

On 2024/3/29 18:25, Chao Yu wrote:
> This patch adds unset subcommand for pinfile command.
>
> Usage: f2fs_io pinfile unset [target_file]
>
> Signed-off-by: Chao Yu
> ---
>  man/f2fs_io.8           |  2 +-
>  tools/f2fs_io/f2fs_io.c | 11 +--
>  2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/man/f2fs_io.8 b/man/f2fs_io.8
> index f097bde..b9c9dc8 100644
> --- a/man/f2fs_io.8
> +++ b/man/f2fs_io.8
> @@ -44,7 +44,7 @@ going down with metadata flush
>  going down with fsck mark
>  .RE
>  .TP
> -\fBpinfile\fR \fI[get|set] [file]\fR
> +\fBpinfile\fR \fI[get|set|unset] [file]\fR
>  Get or set the pinning status on a file.
>  .TP
>  \fBfadvise\fR \fI[advice] [offset] [length] [file]\fR
> diff --git a/tools/f2fs_io/f2fs_io.c b/tools/f2fs_io/f2fs_io.c
> index b8e4f02..a7b593a 100644
> --- a/tools/f2fs_io/f2fs_io.c
> +++ b/tools/f2fs_io/f2fs_io.c
> @@ -442,7 +442,7 @@ static void do_fadvise(int argc, char **argv, const struct cmd_desc *cmd)
>  
>  #define pinfile_desc "pin file control"
>  #define pinfile_help						\
> -"f2fs_io pinfile [get|set] [file]\n\n"			\
> +"f2fs_io pinfile [get|set|unset] [file]\n\n"			\
>  "get/set pinning given the file\n"				\
>  
>  static void do_pinfile(int argc, char **argv, const struct cmd_desc *cmd)
> @@ -464,7 +464,14 @@ static void do_pinfile(int argc, char **argv, const struct cmd_desc *cmd)
>  		ret = ioctl(fd, F2FS_IOC_SET_PIN_FILE, &pin);
>  		if (ret != 0)
>  			die_errno("F2FS_IOC_SET_PIN_FILE failed");
> -		printf("set_pin_file: %u blocks moved in %s\n", ret, argv[2]);
> +		printf("%s pinfile: %u blocks moved in %s\n",
> +					argv[1], ret, argv[2]);
> +	} else if (!strcmp(argv[1], "unset")) {
> +		pin = 0;
> +		ret = ioctl(fd, F2FS_IOC_SET_PIN_FILE, &pin);
> +		if (ret != 0)
> +			die_errno("F2FS_IOC_SET_PIN_FILE failed");
> +		printf("%s pinfile in %s\n", argv[1], argv[2]);
>  	} else if (!strcmp(argv[1], "get")) {
>  		unsigned int flags;
>  
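For readers who want to try the set/unset behaviour without f2fs_io, here is a small standalone sketch. It is an illustration only: pin_value_for() and apply_pin() are hypothetical helper names that do not exist in f2fs-tools, and the ioctl number is defined locally on the assumption that it matches the kernel's F2FS UAPI definition (magic 0xf5, command 13, __u32 argument). The one point taken from the patch is that "unset" issues the same F2FS_IOC_SET_PIN_FILE ioctl with a value of 0.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Assumed to match the kernel UAPI; defined here to stay self-contained. */
#ifndef F2FS_IOC_SET_PIN_FILE
#define F2FS_IOCTL_MAGIC	0xf5
#define F2FS_IOC_SET_PIN_FILE	_IOW(F2FS_IOCTL_MAGIC, 13, __u32)
#endif

/* Map a pinfile subcommand to the ioctl argument; -1 if not set/unset. */
int pin_value_for(const char *subcmd)
{
	assert(subcmd != NULL);
	if (!strcmp(subcmd, "set"))
		return 1;
	if (!strcmp(subcmd, "unset"))
		return 0;
	return -1;
}

/*
 * Apply "set" or "unset" to an open file descriptor.
 * Returns -1 for an unknown subcommand without touching the fd,
 * otherwise returns whatever ioctl() returns.
 */
int apply_pin(int fd, const char *subcmd)
{
	int value = pin_value_for(subcmd);
	__u32 pin;

	if (value < 0)
		return -1;
	pin = (__u32)value;
	return ioctl(fd, F2FS_IOC_SET_PIN_FILE, &pin);
}
```

On a file that does not live on f2fs, the ioctl is expected to fail, so apply_pin() is only meaningful when fd refers to a file on an f2fs mount.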
Re: [f2fs-dev] [PATCH] f2fs: write missing last sum blk of file pinning section
On 2024/4/10 7:34, Daeho Jeong wrote:
> From: Daeho Jeong
>
> When not allocating a new section in advance for the file pinning area,
> I missed that we should write the sum block for the last segment of a
> file pinning section.
>
> Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices")
> Signed-off-by: Daeho Jeong

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH] f2fs: don't set RO when shutting down f2fs
On 2024/4/10 0:21, Jaegeuk Kim wrote:
> On 04/09, Chao Yu wrote:
>> On 2024/4/5 3:52, Jaegeuk Kim wrote:
>>> Shutdown does not check the error of thaw_super due to readonly, which
>>> causes a deadlock like below.
>>>
>>> f2fs_ioc_shutdown(F2FS_GOING_DOWN_FULLSYNC)        issue_discard_thread
>>>  - bdev_freeze
>>>   - freeze_super
>>>  - f2fs_stop_checkpoint()
>>>   - f2fs_handle_critical_error                     - sb_start_write
>>>     - set RO                                         - waiting
>>>  - bdev_thaw
>>>   - thaw_super_locked
>>>     - return -EINVAL, if sb_rdonly()
>>>  - f2fs_stop_discard_thread
>>>   -> wait for kthread_stop(discard_thread);
>>>
>>> Reported-by: "Light Hsieh (謝明燈)"
>>> Signed-off-by: Jaegeuk Kim
>>> ---
>>>  fs/f2fs/super.c | 11 +--
>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>> index df9765b41dac..ba6288e870c5 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -4135,9 +4135,16 @@ void f2fs_handle_critical_error(struct f2fs_sb_info *sbi, unsigned char reason,
>>>  	if (shutdown)
>>>  		set_sbi_flag(sbi, SBI_IS_SHUTDOWN);
>>>  
>>> -	/* continue filesystem operators if errors=continue */
>>> -	if (continue_fs || f2fs_readonly(sb))
>>> +	/*
>>> +	 * Continue filesystem operators if errors=continue. Should not set
>>> +	 * RO by shutdown, since RO bypasses thaw_super which can hang the
>>> +	 * system.
>>> +	 */
>>> +	if (continue_fs || f2fs_readonly(sb) ||
>>> +				reason == STOP_CP_REASON_SHUTDOWN) {
>>> +		f2fs_warn(sbi, "Stopped filesystem due to reason: %d", reason);
>>>  		return;
>>
>> Do we need to set RO after bdev_thaw() in f2fs_do_shutdown()?
>
> IIRC, shutdown doesn't need to set RO as we stopped the checkpoint. I'm more
> concerned about any side effect caused by this RO change.

Okay, I just wondered whether we need to follow the errors=remount-ro
semantics, but it looks fine, since a shutdown operation simulated by ioctl
should not be treated as an internal critical error:

errors=%s	Specify f2fs behavior on critical errors. This supports modes:
		"panic", "continue" and "remount-ro", respectively, trigger
		panic immediately, continue without doing anything, and remount
		the partition in read-only mode. By default it uses "continue"
		mode.

Also, it keeps the behavior consistent w/ what we do for the errors=panic
case.

	if (F2FS_OPTION(sbi).errors == MOUNT_ERRORS_PANIC &&
				!shutdown && !system_going_down() &&
				^
				!is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN))
		panic("F2FS-fs (device %s): panic forced after error\n",
							sb->s_id);

Thanks,

>> Thanks,
>>
>>> +	}
>>>  
>>>  	f2fs_warn(sbi, "Remounting filesystem read-only");
>>>  	/*
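The outcome discussed in this thread can be summarized as a small decision function. This is a simplified userspace model, not the kernel code: the enum and function names are hypothetical, and it deliberately leaves out details visible in the quoted snippets, such as the SBI_IS_SHUTDOWN flag check and however continue_fs is actually derived. It only encodes the three outcomes named above: panic, continue, or remount read-only, with shutdown never leading to read-only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names modelling the errors= mount option. */
enum errors_mode { ERRORS_CONTINUE, ERRORS_REMOUNT_RO, ERRORS_PANIC };

/* Hypothetical names for what the error handler ends up doing. */
enum action { ACT_CONTINUE, ACT_REMOUNT_RO, ACT_PANIC };

/*
 * Simplified model of the decision after the patch:
 *  - errors=panic panics unless the error comes from a shutdown request
 *    or the system is already going down;
 *  - errors=continue, an already read-only filesystem, or a shutdown
 *    request all return without setting read-only;
 *  - everything else remounts read-only.
 */
enum action critical_error_action(enum errors_mode mode, bool already_ro,
				  bool shutdown, bool system_going_down)
{
	if (mode == ERRORS_PANIC && !shutdown && !system_going_down)
		return ACT_PANIC;
	if (mode == ERRORS_CONTINUE || already_ro || shutdown)
		return ACT_CONTINUE;
	return ACT_REMOUNT_RO;
}
```

Under this model a shutdown request takes the same "continue" path whether the mount option is errors=remount-ro or errors=panic, which is the consistency argument made in the reply.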
Re: [f2fs-dev] [PATCH v2] f2fs: don't set RO when shutting down f2fs
On 2024/4/10 0:20, Jaegeuk Kim wrote:
> Shutdown does not check the error of thaw_super due to readonly, which
> causes a deadlock like below.
>
> f2fs_ioc_shutdown(F2FS_GOING_DOWN_FULLSYNC)        issue_discard_thread
>  - bdev_freeze
>   - freeze_super
>  - f2fs_stop_checkpoint()
>   - f2fs_handle_critical_error                     - sb_start_write
>     - set RO                                         - waiting
>  - bdev_thaw
>   - thaw_super_locked
>     - return -EINVAL, if sb_rdonly()
>  - f2fs_stop_discard_thread
>   -> wait for kthread_stop(discard_thread);
>
> Reported-by: "Light Hsieh (謝明燈)"
> Signed-off-by: Jaegeuk Kim

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH] f2fs-tools: give 6 sections for overprovision buffer
On 2024/4/3 7:54, Jaegeuk Kim wrote:
> This addresses high GC cost at runtime.
>
> Signed-off-by: Jaegeuk Kim

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH] f2fs-tools: print extension list properly
On 2024/4/8 21:11, Sheng Yong wrote:
> The "hot file extensions" list does not print properly.
>
> **Before**
> extension_count                 [0x      23 : 35]
> cold file extentsions
>                                 [mp      wm      og      jp      ]
>                                 [avi     m4v     m4p     mkv     ]
>                                 [mov     webm    wav     m4a     ]
>                                 [3gp     opus    flac    gif     ]
>                                 [png     svg     webp    jar     ]
>                                 [deb     iso     gz      xz      ]
>                                 [zst     pdf     pyc     ttc     ]
>                                 [ttf     exe     apk     cnt     ]
>                                 [exo     odex    vdex    ]
> hot_ext_count                   [0x       1 : 1]
> hot file extentsions
> db      ]
> cp_payload                      [0x       0 : 0]
>
> **After**
> extension_count                 [0x      23 : 35]
> cold file extentsions
>                                 [mp      wm      og      jp      ]
>                                 [avi     m4v     m4p     mkv     ]
>                                 [mov     webm    wav     m4a     ]
>                                 [3gp     opus    flac    gif     ]
>                                 [png     svg     webp    jar     ]
>                                 [deb     iso     gz      xz      ]
>                                 [zst     pdf     pyc     ttc     ]
>                                 [ttf     exe     apk     cnt     ]
>                                 [exo     odex    vdex    ]
> hot_ext_count                   [0x       1 : 1]
> hot file extentsions
>                                 [db      ]
> cp_payload                      [0x       0 : 0]
>
> Signed-off-by: Sheng Yong

Reviewed-by: Chao Yu

Thanks,
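The patch body is not included in this reply, so the actual fix is not shown here. As a rough illustration of the kind of loop that yields the "After" output, the sketch below formats an extension list four entries per row, opening a bracket at the start of every row (including the first) and closing it at the end of each row or after the final entry. The function name, the 8-character field width, and the row size are assumptions, not taken from f2fs-tools, and the leading column indentation is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EXTS_PER_ROW 4	/* assumed row size, matching the sample output */

/*
 * Format count extensions into out as bracketed rows.
 * Returns the number of bytes written, or -1 if out is too small.
 */
int format_ext_list(char *out, size_t cap, const char *const *exts, int count)
{
	size_t len = 0;

	if (cap == 0)
		return -1;
	out[0] = '\0';

	for (int i = 0; i < count; i++) {
		/* Open a bracket on the first entry of every row. */
		const char *open = (i % EXTS_PER_ROW == 0) ? "[" : "";
		/* Close on the last entry of a row or of the whole list. */
		const char *close = (i % EXTS_PER_ROW == EXTS_PER_ROW - 1 ||
				     i == count - 1) ? "]\n" : "";
		int n = snprintf(out + len, cap - len, "%s%-8s%s",
				 open, exts[i], close);

		if (n < 0 || (size_t)n >= cap - len)
			return -1;
		len += (size_t)n;
	}
	return (int)len;
}
```

Deciding the opening bracket from the entry index alone, rather than from state carried over from the previous list, is one way to keep a second list (such as the hot extensions) from losing its leading "[".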