Re: [PATCH] f2fs: fix to cover allocate_segment() with lock
On 2021/4/20 0:57, Jaegeuk Kim wrote:
On 04/14, Chao Yu wrote:
As we did for other cases, in fix_curseg_write_pointer(), let's change as below:
- use callback function s_ops->allocate_segment() instead of raw function allocate_segment_by_default();
- cover allocate_segment() with curseg_lock and sentry_lock.

Signed-off-by: Chao Yu
---
 fs/f2fs/segment.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index b2ee6b7791b0..daf9531ec58f 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -4848,7 +4848,12 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type) f2fs_notice(sbi, "Assign new section to curseg[%d]: " "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff); - allocate_segment_by_default(sbi, type, true); + + down_read(&SM_I(sbi)->curseg_lock); + down_write(&SIT_I(sbi)->sentry_lock); + SIT_I(sbi)->s_ops->allocate_segment(sbi, type, true); + up_write(&SIT_I(sbi)->sentry_lock); + up_read(&SM_I(sbi)->curseg_lock);

Seems f2fs_allocate_new_section()?

f2fs_allocate_new_section() will allocate a new section only when the current section has been initialized and has a valid block/ckpt_block. It looks like fix_curseg_write_pointer() wants to force migrating the current segment to a new section whenever the write pointer and curseg->next_blkoff are inconsistent. So how about adding a parameter to force f2fs_allocate_new_section() to allocate a new section?

Thanks,

/* check consistency of the zone curseg pointed to */ if (check_zone_write_pointer(sbi, zbd, &zone)) -- 2.29.2
Re: [f2fs-dev] [PATCH] fs: f2fs: Remove unnecessary struct declaration
On 2021/4/19 10:20, Wan Jiabing wrote: struct dnode_of_data is defined at line 897. The declaration here is unnecessary. Remove it. Signed-off-by: Wan Jiabing Reviewed-by: Chao Yu Thanks,
[RFC PATCH] f2fs: reduce expensive checkpoint trigger frequency
We may trigger highly frequent checkpoints for the case below:

1. mkdir /mnt/dir1; set dir1 encrypted
2. touch /mnt/file1; fsync /mnt/file1
3. mkdir /mnt/dir2; set dir2 encrypted
4. touch /mnt/file2; fsync /mnt/file2
...

Although the newly created dirs and files are not related, due to commit bbf156f7afa7 ("f2fs: fix lost xattrs of directories"), we will trigger a checkpoint whenever fsync() comes after a new encrypted dir is created. In order to avoid such a condition, let's record an entry including the directory's ino into a global cache when we initialize the encryption policy in a checkpointed directory, and then trigger checkpoint() only when the target file's parent has a non-persisted encryption policy; for the case where its parent is not checkpointed, need_do_checkpoint() has covered that by verifying it with f2fs_is_checkpointed_node().

Signed-off-by: Chao Yu
---
 fs/f2fs/f2fs.h | 2 ++ fs/f2fs/file.c | 3 +++ fs/f2fs/xattr.c | 6 -- include/trace/events/f2fs.h | 3 ++- 4 files changed, 11 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 87d734f5589d..34487e527d12 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -246,6 +246,7 @@ enum { APPEND_INO, /* for append ino list */ UPDATE_INO, /* for update ino list */ TRANS_DIR_INO, /* for trasactions dir ino list */ + ENC_DIR_INO,/* for encrypted dir ino list */ FLUSH_INO, /* for multiple device flushing */ MAX_INO_ENTRY, /* max.
list */ }; @@ -1090,6 +1091,7 @@ enum cp_reason_type { CP_FASTBOOT_MODE, CP_SPEC_LOG_NUM, CP_RECOVER_DIR, + CP_ENC_DIR, }; enum iostat_type { diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 6284b2f4a60b..a6c38d8b1ec3 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -218,6 +218,9 @@ static inline enum cp_reason_type need_do_checkpoint(struct inode *inode) f2fs_exist_written_data(sbi, F2FS_I(inode)->i_pino, TRANS_DIR_INO)) cp_reason = CP_RECOVER_DIR; + else if (f2fs_exist_written_data(sbi, F2FS_I(inode)->i_pino, + ENC_DIR_INO)) + cp_reason = CP_ENC_DIR; return cp_reason; } diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c index c8f34decbf8e..38796d488d15 100644 --- a/fs/f2fs/xattr.c +++ b/fs/f2fs/xattr.c @@ -630,6 +630,7 @@ static int __f2fs_setxattr(struct inode *inode, int index, const char *name, const void *value, size_t size, struct page *ipage, int flags) { + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct f2fs_xattr_entry *here, *last; void *base_addr, *last_base_addr; int found, newsize; @@ -745,8 +746,9 @@ static int __f2fs_setxattr(struct inode *inode, int index, !strcmp(name, F2FS_XATTR_NAME_ENCRYPTION_CONTEXT)) f2fs_set_encrypted_inode(inode); f2fs_mark_inode_dirty_sync(inode, true); - if (!error && S_ISDIR(inode->i_mode)) - set_sbi_flag(F2FS_I_SB(inode), SBI_NEED_CP); + if (!error && S_ISDIR(inode->i_mode) && + f2fs_is_checkpointed_node(sbi, inode->i_ino)) + f2fs_add_ino_entry(sbi, inode->i_ino, ENC_DIR_INO); same: if (is_inode_flag_set(inode, FI_ACL_MODE)) { diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h index 56b113e3cd6a..ca0cf12226e9 100644 --- a/include/trace/events/f2fs.h +++ b/include/trace/events/f2fs.h @@ -145,7 +145,8 @@ TRACE_DEFINE_ENUM(CP_RESIZE); { CP_NODE_NEED_CP, "node needs cp" }, \ { CP_FASTBOOT_MODE, "fastboot mode" }, \ { CP_SPEC_LOG_NUM, "log type is 2" }, \ - { CP_RECOVER_DIR, "dir needs recovery" }) + { CP_RECOVER_DIR, "dir needs recovery" }, \ + { CP_ENC_DIR, "persist encryption policy" }) #define 
show_shutdown_mode(type) \ __print_symbolic(type, \ -- 2.29.2
Re: [PATCH] direct-io: use read lock for DIO_LOCKING flag
On 2021/4/16 8:43, Al Viro wrote:
On Thu, Apr 15, 2021 at 12:24:13PM +0200, Jan Kara wrote:
On Thu 15-04-21 17:43:32, Chao Yu wrote:
9902af79c01a ("parallel lookups: actual switch to rwsem") changes inode lock from mutex to rwsem, however, we forgot to adjust lock for DIO_LOCKING flag in do_blockdev_direct_IO(),

The change in question had nothing to do with the use of ->i_mutex for regular files data access.

so let's change to hold read lock to mitigate performance regression in the case of read DIO vs read DIO, meanwhile it still keeps original functionality of avoiding buffered access vs direct access. Signed-off-by: Chao Yu

Thanks for the patch but this is not safe. Originally we had exclusive lock (with i_mutex), switching to rwsem doesn't change that requirement. It may be OK for some filesystems to actually use shared acquisition of rwsem for DIO reads but it is not clear that is fine for all filesystems (and I suspect those filesystems that actually do care already don't use DIO_LOCKING flag or were already converted to iomap_dio_rw()). So unless you do audit of all filesystems using do_blockdev_direct_IO() with DIO_LOCKING flag and make sure they are all fine with inode lock in shared mode, this is a no-go.

Aye. Frankly, I would expect that anyone bothering with that kind of analysis for given filesystem (and there are fairly unpleasant ones in the list) would just use the fruits of those efforts to convert it over to iomap.

Actually, I was misguided by the DIO_LOCKING comment more or less; it looks like it was introduced only to avoid races between buffered IO and DIO.

/* need locking between buffered and direct access */
DIO_LOCKING = 0x01,

I don't think it is easy for me to analyse the usage scenarios/restrictions of all DIO_LOCKING users and get their developers' acks for this change. Converting filesystems to use iomap_dio_rw() looks like a better option to me, thanks, Jan and Al.
:) Thanks, "Read DIO" does not mean that accesses to private in-core data structures used by given filesystem can be safely done in parallel. So blanket patch like that is not safe at all. .
[PATCH] direct-io: use read lock for DIO_LOCKING flag
9902af79c01a ("parallel lookups: actual switch to rwsem") changed the inode lock from a mutex to an rwsem; however, we forgot to adjust the lock for the DIO_LOCKING flag in do_blockdev_direct_IO(). Let's change it to hold the read lock, to mitigate the performance regression in the read DIO vs read DIO case, while still keeping the original functionality of avoiding buffered access vs direct access races.

Signed-off-by: Chao Yu
---
 fs/direct-io.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index b2e86e739d7a..93ff912f2749 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -1166,7 +1166,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, dio->flags = flags; if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) { /* will be released by direct_io_worker */ - inode_lock(inode); + inode_lock_shared(inode); } /* Once we sampled i_size check for reads beyond EOF */ @@ -1316,7 +1316,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, * of protecting us from looking up uninitialized blocks. */ if (iov_iter_rw(iter) == READ && (dio->flags & DIO_LOCKING)) - inode_unlock(dio->inode); + inode_unlock_shared(dio->inode); /* * The only time we want to leave bios in flight is when a successful @@ -1341,7 +1341,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, fail_dio: if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) - inode_unlock(inode); + inode_unlock_shared(inode); kmem_cache_free(dio_cache, dio); return retval; -- 2.29.2
[PATCH] f2fs: fix to cover allocate_segment() with lock
As we did for other cases, in fix_curseg_write_pointer(), let's change as below: - use callback function s_ops->allocate_segment() instead of raw function allocate_segment_by_default(); - cover allocate_segment() with curseg_lock and sentry_lock. Signed-off-by: Chao Yu --- fs/f2fs/segment.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index b2ee6b7791b0..daf9531ec58f 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -4848,7 +4848,12 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type) f2fs_notice(sbi, "Assign new section to curseg[%d]: " "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff); - allocate_segment_by_default(sbi, type, true); + + down_read(&SM_I(sbi)->curseg_lock); + down_write(&SIT_I(sbi)->sentry_lock); + SIT_I(sbi)->s_ops->allocate_segment(sbi, type, true); + up_write(&SIT_I(sbi)->sentry_lock); + up_read(&SM_I(sbi)->curseg_lock); /* check consistency of the zone curseg pointed to */ if (check_zone_write_pointer(sbi, zbd, &zone)) -- 2.29.2
[PATCH] f2fs: document: add description about compressed space handling
Users or developers may still be confused about why f2fs doesn't expose compressed space to userspace; add a description of the compressed space handling policy to the f2fs documentation.

Signed-off-by: Chao Yu
---
 Documentation/filesystems/f2fs.rst | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst index 63c0c49b726d..992bf91eeec8 100644 --- a/Documentation/filesystems/f2fs.rst +++ b/Documentation/filesystems/f2fs.rst @@ -819,6 +819,14 @@ Compression implementation * chattr +c file * chattr +c dir; touch dir/file * mount w/ -o compress_extension=ext; touch file.ext + * mount w/ -o compress_extension=*; touch any_file + +- At this point, compression feature doesn't expose compressed space to user + directly in order to guarantee potential data updates later to the space. + Instead, the main goal is to reduce data writes to flash disk as much as + possible, resulting in extending disk life time as well as relaxing IO + congestion. Alternatively, we've added ioctl interface to reclaim compressed + space and show it to user after putting the immutable bit. Compress metadata layout:: -- 2.29.2
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()
On 2021/4/13 10:59, Jaegeuk Kim wrote: On 04/11, Chao Yu wrote: Hi Jaegeuk, Could you please help to merge below cleanup diff into original patch? or merge this separately if it is too late since it is near rc7. I didn't review this tho, this gives an error in xfstests/083. From 59c2bd34fb0c77bcede2af7e5d308c014c544a1e Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Sun, 11 Apr 2021 14:29:34 +0800 Subject: [PATCH] f2fs: avoid duplicated codes for cleanup f2fs_segment_has_free_slot() was copied and modified from __next_free_blkoff(), they are almost the same, clean up to reuse common code as much as possible. Signed-off-by: Chao Yu --- - fix to assign .next_blkoff in change_curseg() fs/f2fs/segment.c | 32 ++-- 1 file changed, 10 insertions(+), 22 deletions(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index d6c6c13feb43..6e740ecf0814 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2638,22 +2638,20 @@ static void new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec) curseg->alloc_type = LFS; } -static void __next_free_blkoff(struct f2fs_sb_info *sbi, - struct curseg_info *seg, block_t start) +static int __next_free_blkoff(struct f2fs_sb_info *sbi, + int segno, block_t start) { - struct seg_entry *se = get_seg_entry(sbi, seg->segno); + struct seg_entry *se = get_seg_entry(sbi, segno); int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); unsigned long *target_map = SIT_I(sbi)->tmp_map; unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; unsigned long *cur_map = (unsigned long *)se->cur_valid_map; - int i, pos; + int i; for (i = 0; i < entries; i++) target_map[i] = ckpt_map[i] | cur_map[i]; - pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start); - - seg->next_blkoff = pos; + return __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start); } /* @@ -2665,26 +2663,16 @@ static void __refresh_next_blkoff(struct f2fs_sb_info *sbi, struct curseg_info *seg) { if (seg->alloc_type == SSR) - __next_free_blkoff(sbi, 
seg, seg->next_blkoff + 1); + seg->next_blkoff = + __next_free_blkoff(sbi, seg->segno, + seg->next_blkoff + 1); else seg->next_blkoff++; } bool f2fs_segment_has_free_slot(struct f2fs_sb_info *sbi, int segno) { - struct seg_entry *se = get_seg_entry(sbi, segno); - int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); - unsigned long *target_map = SIT_I(sbi)->tmp_map; - unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; - unsigned long *cur_map = (unsigned long *)se->cur_valid_map; - int i, pos; - - for (i = 0; i < entries; i++) - target_map[i] = ckpt_map[i] | cur_map[i]; - - pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, 0); - - return pos < sbi->blocks_per_seg; + return __next_free_blkoff(sbi, segno, 0) < sbi->blocks_per_seg; } /* @@ -2712,7 +2700,7 @@ static void change_curseg(struct f2fs_sb_info *sbi, int type, bool flush) reset_curseg(sbi, type, 1); curseg->alloc_type = SSR; - __next_free_blkoff(sbi, curseg, 0); + curseg->next_blkoff = __next_free_blkoff(sbi, curseg->segno, 0); sum_page = f2fs_get_sum_page(sbi, new_segno); if (IS_ERR(sum_page)) { -- 2.29.2
[PATCH] f2fs: fix to avoid NULL pointer dereference
From: Yi Chen

Unable to handle kernel NULL pointer dereference at virtual address
pc : f2fs_put_page+0x1c/0x26c
lr : __revoke_inmem_pages+0x544/0x75c
f2fs_put_page+0x1c/0x26c
__revoke_inmem_pages+0x544/0x75c
__f2fs_commit_inmem_pages+0x364/0x3c0
f2fs_commit_inmem_pages+0xc8/0x1a0
f2fs_ioc_commit_atomic_write+0xa4/0x15c
f2fs_ioctl+0x5b0/0x1574
file_ioctl+0x154/0x320
do_vfs_ioctl+0x164/0x740
__arm64_sys_ioctl+0x78/0xa4
el0_svc_common+0xbc/0x1d0
el0_svc_handler+0x74/0x98
el0_svc+0x8/0xc

In f2fs_put_page, we access page->mapping, which is NULL. The root cause is: in some cases, the page refcount and the ATOMIC_WRITTEN_PAGE flag are missed to set because the page-private flag has already been set. We add f2fs_bug_on like this:

f2fs_register_inmem_page()
{
...
f2fs_set_page_private(page, ATOMIC_WRITTEN_PAGE);
f2fs_bug_on(F2FS_I_SB(inode), !IS_ATOMIC_WRITTEN_PAGE(page));
...
}

The bug-on stack follows like this:
PC is at f2fs_register_inmem_page+0x238/0x2b4
LR is at f2fs_register_inmem_page+0x2a8/0x2b4
f2fs_register_inmem_page+0x238/0x2b4
f2fs_set_data_page_dirty+0x104/0x164
set_page_dirty+0x78/0xc8
f2fs_write_end+0x1b4/0x444
generic_perform_write+0x144/0x1cc
__generic_file_write_iter+0xc4/0x174
f2fs_file_write_iter+0x2c0/0x350
__vfs_write+0x104/0x134
vfs_write+0xe8/0x19c
SyS_pwrite64+0x78/0xb8

To fix this issue, let's add the page refcount and page-private flag. The case where the page-private flag is not cleared needs further analysis.
Signed-off-by: Chao Yu Signed-off-by: Ge Qiu Signed-off-by: Dehe Gu Signed-off-by: Yi Chen --- fs/f2fs/segment.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 0cb1ca88d4aa..d6c6c13feb43 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -186,7 +186,10 @@ void f2fs_register_inmem_page(struct inode *inode, struct page *page) { struct inmem_pages *new; - f2fs_set_page_private(page, ATOMIC_WRITTEN_PAGE); + if (PagePrivate(page)) + set_page_private(page, (unsigned long)ATOMIC_WRITTEN_PAGE); + else + f2fs_set_page_private(page, ATOMIC_WRITTEN_PAGE); new = f2fs_kmem_cache_alloc(inmem_entry_slab, GFP_NOFS); -- 2.29.2
Re: [PATCH v3] f2fs: fix to keep isolation of atomic write
On 2021/4/13 11:27, Jaegeuk Kim wrote:
On 04/12, Chao Yu wrote:
As Yi Chen reported, there is a potential race case described as below:

Thread A				Thread B
- f2fs_ioc_start_atomic_write
					- mkwrite
					 - set_page_dirty
					  - f2fs_set_page_private(page, 0)
 - set_inode_flag(FI_ATOMIC_FILE)
					- mkwrite same page
					 - set_page_dirty
					  - f2fs_register_inmem_page
					   - f2fs_set_page_private(ATOMIC_WRITTEN_PAGE)
					     failed due to PagePrivate flag has been set
					   - list_add_tail
					- truncate_inode_pages
					 - f2fs_invalidate_page
					  - clear page private but w/o removing it from inmem_list
					  - set page->mapping to NULL
- f2fs_ioc_commit_atomic_write
 - __f2fs_commit_inmem_pages
  - __revoke_inmem_pages
   - f2fs_put_page
     panic as page->mapping is NULL

The root cause is that we missed to keep isolation of atomic write in the case of start_atomic_write vs mkwrite; let start_atomic_write hold the i_mmap_sem lock to avoid this issue.

My only concern is performance regression. Could you please verify the numbers?

Do you have a specific test script? IIRC, the scenario you mean is multi-threads write/mmap the same db, right?
Thanks, Reported-by: Yi Chen Signed-off-by: Chao Yu --- v3: - rebase to last dev branch - update commit message because this patch fixes a different racing issue of atomic write fs/f2fs/file.c| 3 +++ fs/f2fs/segment.c | 6 ++ 2 files changed, 9 insertions(+) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index d697c8900fa7..6284b2f4a60b 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -2054,6 +2054,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) goto out; down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_mmap_sem); /* * Should wait end_io to count F2FS_WB_CP_DATA correctly by @@ -2064,6 +2065,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) inode->i_ino, get_dirty_pages(inode)); ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX); if (ret) { + up_write(&F2FS_I(inode)->i_mmap_sem); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); goto out; } @@ -2077,6 +2079,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) /* add inode in inmem_list first and set atomic_file */ set_inode_flag(inode, FI_ATOMIC_FILE); clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST); + up_write(&F2FS_I(inode)->i_mmap_sem); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 0cb1ca88d4aa..78c8342f52fd 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -325,6 +325,7 @@ void f2fs_drop_inmem_pages(struct inode *inode) struct f2fs_inode_info *fi = F2FS_I(inode); do { + down_write(&F2FS_I(inode)->i_mmap_sem); mutex_lock(&fi->inmem_lock); if (list_empty(&fi->inmem_pages)) { fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0; @@ -339,11 +340,13 @@ void f2fs_drop_inmem_pages(struct inode *inode) spin_unlock(&sbi->inode_lock[ATOMIC_FILE]); mutex_unlock(&fi->inmem_lock); + up_write(&F2FS_I(inode)->i_mmap_sem); break; } __revoke_inmem_pages(inode, &fi->inmem_pages, true, false, true); mutex_unlock(&fi->inmem_lock); + 
up_write(&F2FS_I(inode)->i_mmap_sem); } while (1); } @@ -468,6 +471,7 @@ int f2fs_commit_inmem_pages(struct inode *inode) f2fs_balance_fs(sbi, true); down_write(&fi->i_gc_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_mmap_sem); f2fs_lock_op(sbi); set_inode_flag(inode, FI_ATOMIC_COMMIT); @@ -479,6 +483,8 @@ int f2fs_commit_inmem_pages(struct inode *inode) clear_inode_flag(inode, FI_ATOMIC_COMMIT); f2fs_unlock_op(sbi); + + up_write(&F2FS_I(inode)->i_mmap_sem); up_write(&fi->i_gc_rwsem[WRITE]); return err; -- 2.29.2 .
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()
On 2021/4/13 10:59, Jaegeuk Kim wrote: @@ -2701,7 +2689,7 @@ static void change_curseg(struct f2fs_sb_info *sbi, int type, bool flush) reset_curseg(sbi, type, 1); curseg->alloc_type = SSR; - __next_free_blkoff(sbi, curseg, 0); + __next_free_blkoff(sbi, curseg->segno, 0);

I forgot to assign curseg->next_blkoff here. I checked generic/083 and it passed; let me run all the testcases.

Thanks,

sum_page = f2fs_get_sum_page(sbi, new_segno); if (IS_ERR(sum_page)) {
Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()
On 2021/4/13 10:59, Jaegeuk Kim wrote: On 04/11, Chao Yu wrote: Hi Jaegeuk, Could you please help to merge below cleanup diff into original patch? or merge this separately if it is too late since it is near rc7. I didn't review this tho, this gives an error in xfstests/083. My bad, I hit this issue too, let me check this. Thanks, From 5a342a8f332a1b3281ec0e2b4d41b5287689c8ed Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Sun, 11 Apr 2021 14:29:34 +0800 Subject: [PATCH] f2fs: avoid duplicated codes for cleanup f2fs_segment_has_free_slot() was copied from __next_free_blkoff(), the main implementation of them is almost the same, clean up them to reuse common code as much as possible. Signed-off-by: Chao Yu --- fs/f2fs/segment.c | 32 ++-- 1 file changed, 10 insertions(+), 22 deletions(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index b33273aa5c22..bd9056165d62 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2627,22 +2627,20 @@ static void new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec) curseg->alloc_type = LFS; } -static void __next_free_blkoff(struct f2fs_sb_info *sbi, - struct curseg_info *seg, block_t start) +static int __next_free_blkoff(struct f2fs_sb_info *sbi, + int segno, block_t start) { - struct seg_entry *se = get_seg_entry(sbi, seg->segno); + struct seg_entry *se = get_seg_entry(sbi, segno); int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); unsigned long *target_map = SIT_I(sbi)->tmp_map; unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; unsigned long *cur_map = (unsigned long *)se->cur_valid_map; - int i, pos; + int i; for (i = 0; i < entries; i++) target_map[i] = ckpt_map[i] | cur_map[i]; - pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start); - - seg->next_blkoff = pos; + return __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start); } /* @@ -2654,26 +2652,16 @@ static void __refresh_next_blkoff(struct f2fs_sb_info *sbi, struct curseg_info *seg) { if (seg->alloc_type == 
SSR) - __next_free_blkoff(sbi, seg, seg->next_blkoff + 1); + seg->next_blkoff = + __next_free_blkoff(sbi, seg->segno, + seg->next_blkoff + 1); else seg->next_blkoff++; } bool f2fs_segment_has_free_slot(struct f2fs_sb_info *sbi, int segno) { - struct seg_entry *se = get_seg_entry(sbi, segno); - int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); - unsigned long *target_map = SIT_I(sbi)->tmp_map; - unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; - unsigned long *cur_map = (unsigned long *)se->cur_valid_map; - int i, pos; - - for (i = 0; i < entries; i++) - target_map[i] = ckpt_map[i] | cur_map[i]; - - pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, 0); - - return pos < sbi->blocks_per_seg; + return __next_free_blkoff(sbi, segno, 0) < sbi->blocks_per_seg; } /* @@ -2701,7 +2689,7 @@ static void change_curseg(struct f2fs_sb_info *sbi, int type, bool flush) reset_curseg(sbi, type, 1); curseg->alloc_type = SSR; - __next_free_blkoff(sbi, curseg, 0); + __next_free_blkoff(sbi, curseg->segno, 0); sum_page = f2fs_get_sum_page(sbi, new_segno); if (IS_ERR(sum_page)) { -- 2.22.1 ___ Linux-f2fs-devel mailing list linux-f2fs-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel .
[PATCH v3] f2fs: fix to keep isolation of atomic write
As Yi Chen reported, there is a potential race case described as below:

Thread A				Thread B
- f2fs_ioc_start_atomic_write
					- mkwrite
					 - set_page_dirty
					  - f2fs_set_page_private(page, 0)
 - set_inode_flag(FI_ATOMIC_FILE)
					- mkwrite same page
					 - set_page_dirty
					  - f2fs_register_inmem_page
					   - f2fs_set_page_private(ATOMIC_WRITTEN_PAGE)
					     failed due to PagePrivate flag has been set
					   - list_add_tail
					- truncate_inode_pages
					 - f2fs_invalidate_page
					  - clear page private but w/o removing it from inmem_list
					  - set page->mapping to NULL
- f2fs_ioc_commit_atomic_write
 - __f2fs_commit_inmem_pages
  - __revoke_inmem_pages
   - f2fs_put_page
     panic as page->mapping is NULL

The root cause is that we missed to keep isolation of atomic write in the case of start_atomic_write vs mkwrite; let start_atomic_write hold the i_mmap_sem lock to avoid this issue.

Reported-by: Yi Chen
Signed-off-by: Chao Yu
---
v3:
- rebase to last dev branch
- update commit message because this patch fixes a different racing issue of atomic write

 fs/f2fs/file.c| 3 +++ fs/f2fs/segment.c | 6 ++ 2 files changed, 9 insertions(+) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index d697c8900fa7..6284b2f4a60b 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -2054,6 +2054,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) goto out; down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_mmap_sem); /* * Should wait end_io to count F2FS_WB_CP_DATA correctly by @@ -2064,6 +2065,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) inode->i_ino, get_dirty_pages(inode)); ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX); if (ret) { + up_write(&F2FS_I(inode)->i_mmap_sem); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); goto out; } @@ -2077,6 +2079,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp) /* add inode in inmem_list first and set atomic_file */ set_inode_flag(inode, FI_ATOMIC_FILE); clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST); + up_write(&F2FS_I(inode)->i_mmap_sem);
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); f2fs_update_time(F2FS_I_SB(inode), REQ_TIME); diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 0cb1ca88d4aa..78c8342f52fd 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -325,6 +325,7 @@ void f2fs_drop_inmem_pages(struct inode *inode) struct f2fs_inode_info *fi = F2FS_I(inode); do { + down_write(&F2FS_I(inode)->i_mmap_sem); mutex_lock(&fi->inmem_lock); if (list_empty(&fi->inmem_pages)) { fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0; @@ -339,11 +340,13 @@ void f2fs_drop_inmem_pages(struct inode *inode) spin_unlock(&sbi->inode_lock[ATOMIC_FILE]); mutex_unlock(&fi->inmem_lock); + up_write(&F2FS_I(inode)->i_mmap_sem); break; } __revoke_inmem_pages(inode, &fi->inmem_pages, true, false, true); mutex_unlock(&fi->inmem_lock); + up_write(&F2FS_I(inode)->i_mmap_sem); } while (1); } @@ -468,6 +471,7 @@ int f2fs_commit_inmem_pages(struct inode *inode) f2fs_balance_fs(sbi, true); down_write(&fi->i_gc_rwsem[WRITE]); + down_write(&F2FS_I(inode)->i_mmap_sem); f2fs_lock_op(sbi); set_inode_flag(inode, FI_ATOMIC_COMMIT); @@ -479,6 +483,8 @@ int f2fs_commit_inmem_pages(struct inode *inode) clear_inode_flag(inode, FI_ATOMIC_COMMIT); f2fs_unlock_op(sbi); + + up_write(&F2FS_I(inode)->i_mmap_sem); up_write(&fi->i_gc_rwsem[WRITE]); return err; -- 2.29.2
Re: [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()
Hi Jaegeuk, Could you please help to merge below cleanup diff into original patch? or merge this separately if it is too late since it is near rc7. From 5a342a8f332a1b3281ec0e2b4d41b5287689c8ed Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Sun, 11 Apr 2021 14:29:34 +0800 Subject: [PATCH] f2fs: avoid duplicated codes for cleanup f2fs_segment_has_free_slot() was copied from __next_free_blkoff(), the main implementation of them is almost the same, clean up them to reuse common code as much as possible. Signed-off-by: Chao Yu --- fs/f2fs/segment.c | 32 ++-- 1 file changed, 10 insertions(+), 22 deletions(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index b33273aa5c22..bd9056165d62 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2627,22 +2627,20 @@ static void new_curseg(struct f2fs_sb_info *sbi, int type, bool new_sec) curseg->alloc_type = LFS; } -static void __next_free_blkoff(struct f2fs_sb_info *sbi, - struct curseg_info *seg, block_t start) +static int __next_free_blkoff(struct f2fs_sb_info *sbi, + int segno, block_t start) { - struct seg_entry *se = get_seg_entry(sbi, seg->segno); + struct seg_entry *se = get_seg_entry(sbi, segno); int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); unsigned long *target_map = SIT_I(sbi)->tmp_map; unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; unsigned long *cur_map = (unsigned long *)se->cur_valid_map; - int i, pos; + int i; for (i = 0; i < entries; i++) target_map[i] = ckpt_map[i] | cur_map[i]; - pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start); - - seg->next_blkoff = pos; + return __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, start); } /* @@ -2654,26 +2652,16 @@ static void __refresh_next_blkoff(struct f2fs_sb_info *sbi, struct curseg_info *seg) { if (seg->alloc_type == SSR) - __next_free_blkoff(sbi, seg, seg->next_blkoff + 1); + seg->next_blkoff = + __next_free_blkoff(sbi, seg->segno, + seg->next_blkoff + 1); else seg->next_blkoff++; } bool 
f2fs_segment_has_free_slot(struct f2fs_sb_info *sbi, int segno) { - struct seg_entry *se = get_seg_entry(sbi, segno); - int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); - unsigned long *target_map = SIT_I(sbi)->tmp_map; - unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; - unsigned long *cur_map = (unsigned long *)se->cur_valid_map; - int i, pos; - - for (i = 0; i < entries; i++) - target_map[i] = ckpt_map[i] | cur_map[i]; - - pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, 0); - - return pos < sbi->blocks_per_seg; + return __next_free_blkoff(sbi, segno, 0) < sbi->blocks_per_seg; } /* @@ -2701,7 +2689,7 @@ static void change_curseg(struct f2fs_sb_info *sbi, int type, bool flush) reset_curseg(sbi, type, 1); curseg->alloc_type = SSR; - __next_free_blkoff(sbi, curseg, 0); + __next_free_blkoff(sbi, curseg->segno, 0); sum_page = f2fs_get_sum_page(sbi, new_segno); if (IS_ERR(sum_page)) { -- 2.22.1
Re: [PATCH v2] f2fs: fix the periodic wakeups of discard thread
On 2021/4/6 17:09, Sahitya Tummala wrote: Fix the unnecessary periodic wakeups of discard thread that happens under below two conditions - 1. When f2fs is heavily utilized over 80%, the current discard policy sets the max sleep timeout of discard thread as 50ms (DEF_MIN_DISCARD_ISSUE_TIME). But this is set even when there are no pending discard commands to be issued. 2. In the issue_discard_thread() path when there are no pending discard commands, it fails to reset the wait_ms to max timeout value. Signed-off-by: Sahitya Tummala Reviewed-by: Chao Yu Thanks,
Re: [PATCH v2 00/10] erofs: add big pcluster compression support
Repo for big pcluster: https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git -b experimental-bigpcluster-compact

Thanks for your time on reading this!

Nice job!

Acked-by: Chao Yu

Thanks,

Thanks, Gao Xiang

changes since v1:
- add a missing vunmap in erofs_pcpubuf_exit();
- refine comments and commit messages.

(btw, I'll apply this patchset for -next first for further integration test, which will be aimed to 5.13-rc1.)

Gao Xiang (10): erofs: reserve physical_clusterbits[] erofs: introduce multipage per-CPU buffers erofs: introduce physical cluster slab pools erofs: fix up inplace I/O pointer for big pcluster erofs: add big physical cluster definition erofs: adjust per-CPU buffers according to max_pclusterblks erofs: support parsing big pcluster compress indexes erofs: support parsing big pcluster compact indexes erofs: support decompress big pcluster for lz4 backend erofs: enable big pcluster feature

 fs/erofs/Kconfig| 14 --- fs/erofs/Makefile | 2 +- fs/erofs/decompressor.c | 216 +--- fs/erofs/erofs_fs.h | 31 -- fs/erofs/internal.h | 31 ++ fs/erofs/pcpubuf.c | 134 + fs/erofs/super.c| 1 + fs/erofs/utils.c| 12 --- fs/erofs/zdata.c| 193 ++- fs/erofs/zdata.h| 14 +-- fs/erofs/zmap.c | 155 ++-- 11 files changed, 560 insertions(+), 243 deletions(-) create mode 100644 fs/erofs/pcpubuf.c
[PATCH v2] f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block()
Callers may pass a fio parameter with NULL value to f2fs_allocate_data_block(),
so we should make sure to access fio's fields only after fio's validation
check.

Fixes: f608c38c59c6 ("f2fs: clean up parameter of f2fs_allocate_data_block()")
Signed-off-by: Chao Yu
---
v2:
- relocate fio->retry assignment to avoid race condition.

 fs/f2fs/segment.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index c517e689a9a3..44897cfecb1e 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3417,12 +3417,12 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
 		f2fs_inode_chksum_set(sbi, page);
 	}
 
-	if (F2FS_IO_ALIGNED(sbi))
-		fio->retry = false;
-
 	if (fio) {
 		struct f2fs_bio_info *io;
 
+		if (F2FS_IO_ALIGNED(sbi))
+			fio->retry = false;
+
 		INIT_LIST_HEAD(&fio->list);
 		fio->in_list = true;
 		io = sbi->write_io[fio->type] + fio->temp;
-- 
2.29.2
Re: [f2fs-dev] [PATCH] f2fs: set checkpoint_merge by default
On 2021/4/2 8:42, Jaegeuk Kim wrote:
> Once we introduced checkpoint_merge, we've seen some contention w/o the
> option. In order to avoid it, let's set it by default.
>
> Signed-off-by: Jaegeuk Kim

Reviewed-by: Chao Yu

Thanks,
[PATCH] f2fs: fix to avoid GC/mmap race with f2fs_truncate()
We missed holding i_gc_rwsem and i_mmap_sem around f2fs_truncate() in
f2fs_file_write_iter() to avoid racing with background GC and mmap; fix it.

Signed-off-by: Chao Yu
---
 fs/f2fs/file.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index dc79694e512c..f3ca63b55843 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4443,8 +4443,13 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		clear_inode_flag(inode, FI_NO_PREALLOC);
 
 		/* if we couldn't write data, we should deallocate blocks. */
-		if (preallocated && i_size_read(inode) < target_size)
+		if (preallocated && i_size_read(inode) < target_size) {
+			down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+			down_write(&F2FS_I(inode)->i_mmap_sem);
 			f2fs_truncate(inode);
+			up_write(&F2FS_I(inode)->i_mmap_sem);
+			up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+		}
 
 		if (ret > 0)
 			f2fs_update_iostat(F2FS_I_SB(inode), APP_WRITE_IO, ret);
-- 
2.29.2
[PATCH] f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block()
Callers may pass a fio parameter with NULL value to f2fs_allocate_data_block(),
so we should make sure to access fio's fields only after fio's validation
check.

Fixes: f608c38c59c6 ("f2fs: clean up parameter of f2fs_allocate_data_block()")
Signed-off-by: Chao Yu
---
 fs/f2fs/segment.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index c517e689a9a3..d076ae03b5dc 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3417,9 +3417,6 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
 		f2fs_inode_chksum_set(sbi, page);
 	}
 
-	if (F2FS_IO_ALIGNED(sbi))
-		fio->retry = false;
-
 	if (fio) {
 		struct f2fs_bio_info *io;
 
@@ -3429,6 +3426,9 @@ void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
 		spin_lock(&io->io_lock);
 		list_add_tail(&fio->list, &io->io_list);
 		spin_unlock(&io->io_lock);
+
+		if (F2FS_IO_ALIGNED(sbi))
+			fio->retry = false;
 	}
 
 	mutex_unlock(&curseg->curseg_mutex);
-- 
2.29.2
Re: [f2fs-dev] [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/31 9:57, Jaegeuk Kim wrote: On 03/27, Chao Yu wrote: On 2021/3/27 9:52, Chao Yu wrote: On 2021/3/27 1:30, Jaegeuk Kim wrote: On 03/26, Chao Yu wrote: On 2021/3/26 9:19, Jaegeuk Kim wrote: On 03/26, Chao Yu wrote: On 2021/3/25 9:59, Chao Yu wrote: On 2021/3/25 6:44, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 12:22, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789. Because that commit fails generic/050 testcase which expect failure during mount a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition, Well, not sure we can change that testcase, since it restricts all generic filesystems behavior. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, valid checkpoint can allow for user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. My point is, after mount with ro, there'll be no data write which preserves the current status. So, in the next time, we can recover fsync'ed data later, if user succeeds to mount as rw. 
Another point is, with the current checkpoint, we should not have any corrupted metadata. So, why not giving a chance to show what data remained to user? I think this can be doable only with CoW filesystems. I guess we're talking about the different things... Let me declare two different readonly status: 1. filesystem readonly: file system is mount with ro mount option, and app from userspace can not modify any thing of filesystem, but filesystem itself can modify data on device since device may be writable. 2. device readonly: device is set to readonly status via 'blockdev --setro' command, and then filesystem should never issue any write IO to the device. So, what I mean is, *when device is readonly*, rather than f2fs mountpoint is readonly (f2fs_hw_is_readonly() returns true as below code, instead of f2fs_readonly() returns true), in this condition, we should not issue any write IO to device anyway, because, AFAIK, write IO will fail due to bio_check_ro() check. In that case, mount(2) will try readonly, no? Yes, if device is readonly, mount (2) can not mount/remount device to rw mountpoint. Any other concern about this patch? Indeed we're talking about different things. :) This case is mount(ro) with device(ro) having some data to recover. My point is why not giving a chance to mount(ro) to show the current data covered by a valid checkpoint. This doesn't change anything in the disk, Got your idea. IMO, it has potential issue in above condition: Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. e.g. Recovery writes one inode and then triggers a checkpoint, all writes fail I'm confused. Currently we don't trigger the roll-forward recovery. Oh, my miss, sorry. 
:-P My point is in this condition we can return error and try to notice user to mount with disable_roll_forward or norecovery option, then at least user can know he should not expect last fsynced data in newly mounted image. Or we can use f2fs_recover_fsync_data() to check whether there is fsynced data, if there is no such data, then let mount() succeed. Something like this, maybe: --- fs/f2fs/super.c | 17 + 1 file changed, 13 insertions(+), 4 de
[PATCH] f2fs: fix to restrict mount condition on readonly block device
When we mount an unclean f2fs image on a readonly block device, let's make
mount() succeed only when there is no recoverable data in that image;
otherwise, after mount(), fsynced files won't be recovered as the user
expects.

Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition")
Signed-off-by: Chao Yu
---
 fs/f2fs/super.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 954b1fe97d67..14239e2b7ae7 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3966,10 +3966,18 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 	 * previous checkpoint was not done by clean system shutdown.
 	 */
 	if (f2fs_hw_is_readonly(sbi)) {
-		if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG))
-			f2fs_err(sbi, "Need to recover fsync data, but write access unavailable");
-		else
-			f2fs_info(sbi, "write access unavailable, skipping recovery");
+		if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
+			err = f2fs_recover_fsync_data(sbi, true);
+			if (err > 0) {
+				err = -EROFS;
+				f2fs_err(sbi, "Need to recover fsync data, but "
+					"write access unavailable, please try "
+					"mount w/ disable_roll_forward or norecovery");
+			}
+			if (err < 0)
+				goto free_meta;
+		}
+		f2fs_info(sbi, "write access unavailable, skipping recovery");
 		goto reset_checkpoint;
 	}
-- 
2.29.2
Re: [PATCH v2 4/4] erofs: add on-disk compression configurations
On 2021/3/29 14:55, Gao Xiang wrote:
> I found a reference here,
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-style.rst?h=v5.11#n99
> also vaguely remembered some threads from Linus, but it's hard to find
> them now :-(

Sure, we can follow this rule for erofs code style. :)

Thanks,
Re: [PATCH v2 4/4] erofs: add on-disk compression configurations
On 2021/3/29 14:36, Gao Xiang wrote: Hi Chao, On Mon, Mar 29, 2021 at 02:26:05PM +0800, Chao Yu wrote: On 2021/3/29 9:23, Gao Xiang wrote: From: Gao Xiang Add a bitmap for available compression algorithms and a variable-sized on-disk table for compression options in preparation for upcoming big pcluster and LZMA algorithm, which follows the end of super block. To parse the compression options, the bitmap is scanned one by one. For each available algorithm, there is data followed by 2-byte `length' correspondingly (it's enough for most cases, or entire fs blocks should be used.) With such available algorithm bitmap, kernel itself can also refuse to mount such filesystem if any unsupported compression algorithm exists. Note that COMPR_CFGS feature will be enabled with BIG_PCLUSTER. Signed-off-by: Gao Xiang --- fs/erofs/decompressor.c | 2 +- fs/erofs/erofs_fs.h | 14 ++-- fs/erofs/internal.h | 5 +- fs/erofs/super.c| 143 +++- 4 files changed, 157 insertions(+), 7 deletions(-) diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c index 97538ff24a19..27aa6a99b371 100644 --- a/fs/erofs/decompressor.c +++ b/fs/erofs/decompressor.c @@ -41,7 +41,7 @@ int z_erofs_load_lz4_config(struct super_block *sb, } distance = le16_to_cpu(lz4->max_distance); } else { - distance = le16_to_cpu(dsb->lz4_max_distance); + distance = le16_to_cpu(dsb->u1.lz4_max_distance); } EROFS_SB(sb)->lz4.max_distance_pages = distance ? diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h index e0f3c0db1f82..5a126493d4d9 100644 --- a/fs/erofs/erofs_fs.h +++ b/fs/erofs/erofs_fs.h @@ -18,15 +18,16 @@ * be incompatible with this kernel version. 
*/ #define EROFS_FEATURE_INCOMPAT_LZ4_0PADDING 0x0001 +#define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x0002 #define EROFS_ALL_FEATURE_INCOMPAT EROFS_FEATURE_INCOMPAT_LZ4_0PADDING -/* 128-byte erofs on-disk super block */ +/* erofs on-disk super block (currently 128 bytes) */ struct erofs_super_block { __le32 magic; /* file system magic number */ __le32 checksum;/* crc32c(super_block) */ __le32 feature_compat; __u8 blkszbits; /* support block_size == PAGE_SIZE only */ - __u8 reserved; + __u8 sb_extslots; /* superblock size = 128 + sb_extslots * 16 */ __le16 root_nid;/* nid of root directory */ __le64 inos;/* total valid ino # (== f_files - f_favail) */ @@ -39,8 +40,12 @@ struct erofs_super_block { __u8 uuid[16]; /* 128-bit uuid for volume */ __u8 volume_name[16]; /* volume name */ __le32 feature_incompat; - /* customized lz4 sliding window size instead of 64k by default */ - __le16 lz4_max_distance; + union { + /* bitmap for available compression algorithms */ + __le16 available_compr_algs; + /* customized sliding window size instead of 64k by default */ + __le16 lz4_max_distance; + } __packed u1; __u8 reserved2[42]; }; @@ -196,6 +201,7 @@ enum { Z_EROFS_COMPRESSION_LZ4 = 0, Z_EROFS_COMPRESSION_MAX }; +#define Z_EROFS_ALL_COMPR_ALGS (1 << (Z_EROFS_COMPRESSION_MAX - 1)) /* 14 bytes (+ length field = 16 bytes) */ struct z_erofs_lz4_cfgs { diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 46b977f348eb..f3fa895d809f 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -75,6 +75,7 @@ struct erofs_sb_info { struct xarray managed_pslots; unsigned int shrinker_run_no; + u16 available_compr_algs; /* pseudo inode to manage cached pages */ struct inode *managed_cache; @@ -90,6 +91,7 @@ struct erofs_sb_info { /* inode slot unit size in bit shift */ unsigned char islotbits; + u32 sb_size;/* total superblock size */ u32 build_time_nsec; u64 build_time; @@ -233,6 +235,7 @@ static inline bool erofs_sb_has_##name(struct erofs_sb_info *sbi) \ } 
EROFS_FEATURE_FUNCS(lz4_0padding, incompat, INCOMPAT_LZ4_0PADDING) +EROFS_FEATURE_FUNCS(compr_cfgs, incompat, INCOMPAT_COMPR_CFGS) EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM) /* atomic flag definitions */ @@ -454,7 +457,7 @@ static inline int z_erofs_load_lz4_config(struct super_block *sb, struct erofs_super_block *dsb, struct z_erofs_lz4_cfgs *lz4, int len) { - if (lz4 || dsb->lz4_max_distance) { + if (lz4 || dsb->u1.lz4_max_distance) { erofs_err(sb, "lz4 algorithm isn't enabled"); return -EINVAL; } diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 1ca8da3f2125..628c751634fe 100644 --- a/
Re: [PATCH v2 4/4] erofs: add on-disk compression configurations
return true; } +#ifdef CONFIG_EROFS_FS_ZIP +/* read variable-sized metadata, offset will be aligned by 4-byte */ +static void *erofs_read_metadata(struct super_block *sb, struct page **pagep, +erofs_off_t *offset, int *lengthp) +{ + struct page *page = *pagep; + u8 *buffer, *ptr; + int len, i, cnt; + erofs_blk_t blk; + + *offset = round_up(*offset, 4); + blk = erofs_blknr(*offset); + + if (!page || page->index != blk) { + if (page) { + unlock_page(page); + put_page(page); + } + page = erofs_get_meta_page(sb, blk); + if (IS_ERR(page)) + goto err_nullpage; + } + + ptr = kmap(page); + len = le16_to_cpu(*(__le16 *)&ptr[erofs_blkoff(*offset)]); + if (!len) + len = U16_MAX + 1; + buffer = kmalloc(len, GFP_KERNEL); + if (!buffer) { + buffer = ERR_PTR(-ENOMEM); + goto out; + } + *offset += sizeof(__le16); + *lengthp = len; + + for (i = 0; i < len; i += cnt) { + cnt = min(EROFS_BLKSIZ - (int)erofs_blkoff(*offset), len - i); + blk = erofs_blknr(*offset); + + if (!page || page->index != blk) { + if (page) { + kunmap(page); + unlock_page(page); + put_page(page); + } + page = erofs_get_meta_page(sb, blk); + if (IS_ERR(page)) { + kfree(buffer); + goto err_nullpage; + } + ptr = kmap(page); + } + memcpy(buffer + i, ptr + erofs_blkoff(*offset), cnt); + *offset += cnt; + } +out: + kunmap(page); + *pagep = page; + return buffer; +err_nullpage: + *pagep = NULL; + return page; +} + +static int erofs_load_compr_cfgs(struct super_block *sb, +struct erofs_super_block *dsb) +{ + struct erofs_sb_info *sbi; + struct page *page; + unsigned int algs, alg; + erofs_off_t offset; + int size, ret; + + sbi = EROFS_SB(sb); + sbi->available_compr_algs = le16_to_cpu(dsb->u1.available_compr_algs); + + if (sbi->available_compr_algs & ~Z_EROFS_ALL_COMPR_ALGS) { + erofs_err(sb, +"try to load compressed image with unsupported algorithms %x", Minor style issue: "try to load compressed image with unsupported " "algorithms %x", sbi->available_compr_algs & ~Z_EROFS_ALL_COMPR_ALGS); + sbi->available_compr_algs 
& ~Z_EROFS_ALL_COMPR_ALGS); + return -EINVAL; + } + + offset = EROFS_SUPER_OFFSET + sbi->sb_size; + page = NULL; + alg = 0; + ret = 0; + + for (algs = sbi->available_compr_algs; algs; algs >>= 1, ++alg) { + void *data; + + if (!(algs & 1)) + continue; + + data = erofs_read_metadata(sb, &page, &offset, &size); + if (IS_ERR(data)) { + ret = PTR_ERR(data); + goto err; + } + + switch (alg) { + case Z_EROFS_COMPRESSION_LZ4: + ret = z_erofs_load_lz4_config(sb, dsb, data, size); + break; + default: + DBG_BUGON(1); + ret = -EFAULT; + } + kfree(data); + if (ret) + goto err; + } +err: + if (page) { + unlock_page(page); + put_page(page); + } + return ret; +} +#else +static int erofs_load_compr_cfgs(struct super_block *sb, +struct erofs_super_block *dsb) +{ + if (dsb->u1.available_compr_algs) { + erofs_err(sb, +"try to load compressed image when compression is disabled"); Ditto, erofs_err(sb, "try to load compressed image when " "compression is disabled"); + return -EINVAL; + } + return 0; +} +#endif + static int erofs_read_superblock(struct super_block *sb) { struct erofs_sb_info *sbi; @@ -166,6 +298,12 @@ static int erofs_read_superblock(struct super_block *sb) if (!check_layout_compatibility(sb, dsb)) goto out; + sbi->sb_size = 128 + dsb->sb_extslots * 16; sbi->sb_size = sizeof(struct erofs_super_block) + dsb->sb_e
Re: [f2fs-dev] [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/27 9:52, Chao Yu wrote: On 2021/3/27 1:30, Jaegeuk Kim wrote: On 03/26, Chao Yu wrote: On 2021/3/26 9:19, Jaegeuk Kim wrote: On 03/26, Chao Yu wrote: On 2021/3/25 9:59, Chao Yu wrote: On 2021/3/25 6:44, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 12:22, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789. Because that commit fails generic/050 testcase which expect failure during mount a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition, Well, not sure we can change that testcase, since it restricts all generic filesystems behavior. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, valid checkpoint can allow for user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. My point is, after mount with ro, there'll be no data write which preserves the current status. So, in the next time, we can recover fsync'ed data later, if user succeeds to mount as rw. 
Another point is, with the current checkpoint, we should not have any corrupted metadata. So, why not giving a chance to show what data remained to user? I think this can be doable only with CoW filesystems. I guess we're talking about the different things... Let me declare two different readonly status: 1. filesystem readonly: file system is mount with ro mount option, and app from userspace can not modify any thing of filesystem, but filesystem itself can modify data on device since device may be writable. 2. device readonly: device is set to readonly status via 'blockdev --setro' command, and then filesystem should never issue any write IO to the device. So, what I mean is, *when device is readonly*, rather than f2fs mountpoint is readonly (f2fs_hw_is_readonly() returns true as below code, instead of f2fs_readonly() returns true), in this condition, we should not issue any write IO to device anyway, because, AFAIK, write IO will fail due to bio_check_ro() check. In that case, mount(2) will try readonly, no? Yes, if device is readonly, mount (2) can not mount/remount device to rw mountpoint. Any other concern about this patch? Indeed we're talking about different things. :) This case is mount(ro) with device(ro) having some data to recover. My point is why not giving a chance to mount(ro) to show the current data covered by a valid checkpoint. This doesn't change anything in the disk, Got your idea. IMO, it has potential issue in above condition: Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. e.g. Recovery writes one inode and then triggers a checkpoint, all writes fail I'm confused. Currently we don't trigger the roll-forward recovery. Oh, my miss, sorry. 
:-P My point is in this condition we can return error and try to notice user to mount with disable_roll_forward or norecovery option, then at least user can know he should not expect last fsynced data in newly mounted image. Or we can use f2fs_recover_fsync_data() to check whether there is fsynced data, if there is no such data, then let mount() succeed. Something like this, maybe: --- fs/f2fs/super.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 954
[PATCH] f2fs: introduce gc_merge mount option
In this patch, we will add two new mount options: "gc_merge" and "nogc_merge", when background_gc is on, "gc_merge" option can be set to let background GC thread to handle foreground GC requests, it can eliminate the sluggish issue caused by slow foreground GC operation when GC is triggered from a process with limited I/O and CPU resources. Original idea is from Xiang. Signed-off-by: Gao Xiang Signed-off-by: Chao Yu --- Documentation/filesystems/f2fs.rst | 6 ++ fs/f2fs/f2fs.h | 1 + fs/f2fs/gc.c | 26 ++ fs/f2fs/gc.h | 6 ++ fs/f2fs/segment.c | 15 +-- fs/f2fs/super.c| 19 +-- 6 files changed, 65 insertions(+), 8 deletions(-) diff --git a/Documentation/filesystems/f2fs.rst b/Documentation/filesystems/f2fs.rst index 35ed01a5fbc9..63c0c49b726d 100644 --- a/Documentation/filesystems/f2fs.rst +++ b/Documentation/filesystems/f2fs.rst @@ -110,6 +110,12 @@ background_gc=%sTurn on/off cleaning operations, namely garbage on synchronous garbage collection running in background. Default value for this option is on. So garbage collection is on by default. +gc_mergeWhen background_gc is on, this option can be enabled to +let background GC thread to handle foreground GC requests, +it can eliminate the sluggish issue caused by slow foreground +GC operation when GC is triggered from a process with limited +I/O and CPU resources. +nogc_merge Disable GC merge feature. 
disable_roll_forwardDisable the roll-forward recovery routine norecovery Disable the roll-forward recovery routine, mounted read- only (i.e., -o ro,disable_roll_forward) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index fe380bcf8d4d..87d734f5589d 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -97,6 +97,7 @@ extern const char *f2fs_fault_name[FAULT_MAX]; #define F2FS_MOUNT_NORECOVERY 0x0400 #define F2FS_MOUNT_ATGC0x0800 #define F2FS_MOUNT_MERGE_CHECKPOINT0x1000 +#defineF2FS_MOUNT_GC_MERGE 0x2000 #define F2FS_OPTION(sbi) ((sbi)->mount_opt) #define clear_opt(sbi, option) (F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option) diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index a2ca483f9855..5c48825fd12d 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -31,19 +31,24 @@ static int gc_thread_func(void *data) struct f2fs_sb_info *sbi = data; struct f2fs_gc_kthread *gc_th = sbi->gc_thread; wait_queue_head_t *wq = &sbi->gc_thread->gc_wait_queue_head; + wait_queue_head_t *fggc_wq = &sbi->gc_thread->fggc_wq; unsigned int wait_ms; wait_ms = gc_th->min_sleep_time; set_freezable(); do { - bool sync_mode; + bool sync_mode, foreground = false; wait_event_interruptible_timeout(*wq, kthread_should_stop() || freezing(current) || + waitqueue_active(fggc_wq) || gc_th->gc_wake, msecs_to_jiffies(wait_ms)); + if (test_opt(sbi, GC_MERGE) && waitqueue_active(fggc_wq)) + foreground = true; + /* give it a try one time */ if (gc_th->gc_wake) gc_th->gc_wake = 0; @@ -90,7 +95,10 @@ static int gc_thread_func(void *data) goto do_gc; } - if (!down_write_trylock(&sbi->gc_lock)) { + if (foreground) { + down_write(&sbi->gc_lock); + goto do_gc; + } else if (!down_write_trylock(&sbi->gc_lock)) { stat_other_skip_bggc_count(sbi); goto next; } @@ -107,14 +115,22 @@ static int gc_thread_func(void *data) else increase_sleep_time(gc_th, &wait_ms); do_gc: - stat_inc_bggc_count(sbi->stat_info); + if (!foreground) + stat_inc_bggc_count(sbi->stat_info); sync_mode = F2FS_OPTION(sbi).bggc_mode == BGGC_MODE_SYNC; + /* 
foreground GC was been triggered via f2fs_balance_fs() */ + if (foreground) + sync_mode = false; + /* if return value is not zero, no victim was selected */ - if (f2fs_gc(sbi, sync_mode, true, false, NULL_SEGNO)) + if (f2fs_gc(sbi, sync_mode, !foreground,
Re: [PATCH 4/4] erofs: add on-disk compression configurations
On 2021/3/27 11:49, Gao Xiang wrote: From: Gao Xiang Add a bitmap for available compression algorithms and a variable-sized on-disk table for compression options in preparation for upcoming big pcluster and LZMA algorithm, which follows the end of super block. To parse the compression options, the bitmap is scanned one by one. For each available algorithm, there is data followed by 2-byte `length' correspondingly (it's enough for most cases, or entire fs blocks should be used.) With such available algorithm bitmap, kernel itself can also refuse to mount such filesystem if any unsupported compression algorithm exists. Signed-off-by: Gao Xiang --- fs/erofs/decompressor.c | 2 +- fs/erofs/erofs_fs.h | 16 +++-- fs/erofs/internal.h | 5 +- fs/erofs/super.c| 145 +++- 4 files changed, 161 insertions(+), 7 deletions(-) diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c index 97538ff24a19..27aa6a99b371 100644 --- a/fs/erofs/decompressor.c +++ b/fs/erofs/decompressor.c @@ -41,7 +41,7 @@ int z_erofs_load_lz4_config(struct super_block *sb, } distance = le16_to_cpu(lz4->max_distance); } else { - distance = le16_to_cpu(dsb->lz4_max_distance); + distance = le16_to_cpu(dsb->u1.lz4_max_distance); } EROFS_SB(sb)->lz4.max_distance_pages = distance ? diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h index 1322ae63944b..ef3f8a99aa5f 100644 --- a/fs/erofs/erofs_fs.h +++ b/fs/erofs/erofs_fs.h @@ -18,15 +18,18 @@ * be incompatible with this kernel version. 
*/ #define EROFS_FEATURE_INCOMPAT_LZ4_0PADDING 0x0001 -#define EROFS_ALL_FEATURE_INCOMPAT EROFS_FEATURE_INCOMPAT_LZ4_0PADDING +#define EROFS_FEATURE_INCOMPAT_COMPR_CFGS 0x0002 +#define EROFS_ALL_FEATURE_INCOMPAT \ + (EROFS_FEATURE_INCOMPAT_LZ4_0PADDING | \ +EROFS_FEATURE_INCOMPAT_COMPR_CFGS) -/* 128-byte erofs on-disk super block */ +/* erofs on-disk super block (currently 128 bytes) */ struct erofs_super_block { __le32 magic; /* file system magic number */ __le32 checksum;/* crc32c(super_block) */ __le32 feature_compat; __u8 blkszbits; /* support block_size == PAGE_SIZE only */ - __u8 reserved; + __u8 sb_extslots; /* superblock size = 128 + sb_extslots * 16 */ __le16 root_nid; /* nid of root directory */ __le64 inos;/* total valid ino # (== f_files - f_favail) */ @@ -39,7 +42,11 @@ struct erofs_super_block { __u8 uuid[16]; /* 128-bit uuid for volume */ __u8 volume_name[16]; /* volume name */ __le32 feature_incompat; - __le16 lz4_max_distance; + union { + /* bitmap for available compression algorithms */ + __le16 available_compr_algs; + __le16 lz4_max_distance; + } __packed u1; __u8 reserved2[42]; }; @@ -195,6 +202,7 @@ enum { Z_EROFS_COMPRESSION_LZ4 = 0, Z_EROFS_COMPRESSION_MAX }; +#define Z_EROFS_ALL_COMPR_ALGS (1 << (Z_EROFS_COMPRESSION_MAX - 1)) /* 14 bytes (+ length field = 16 bytes) */ struct z_erofs_lz4_cfgs { diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 46b977f348eb..f3fa895d809f 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -75,6 +75,7 @@ struct erofs_sb_info { struct xarray managed_pslots; unsigned int shrinker_run_no; + u16 available_compr_algs; /* pseudo inode to manage cached pages */ struct inode *managed_cache; @@ -90,6 +91,7 @@ struct erofs_sb_info { /* inode slot unit size in bit shift */ unsigned char islotbits; + u32 sb_size; /* total superblock size */ u32 build_time_nsec; u64 build_time; @@ -233,6 +235,7 @@ static inline bool erofs_sb_has_##name(struct erofs_sb_info *sbi) \ } EROFS_FEATURE_FUNCS(lz4_0padding, 
incompat, INCOMPAT_LZ4_0PADDING) +EROFS_FEATURE_FUNCS(compr_cfgs, incompat, INCOMPAT_COMPR_CFGS) EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM) /* atomic flag definitions */ @@ -454,7 +457,7 @@ static inline int z_erofs_load_lz4_config(struct super_block *sb, struct erofs_super_block *dsb, struct z_erofs_lz4_cfgs *lz4, int len) { - if (lz4 || dsb->lz4_max_distance) { + if (lz4 || dsb->u1.lz4_max_distance) { erofs_err(sb, "lz4 algorithm isn't enabled"); return -EINVAL; } diff --git a/fs/erofs/super.c b/fs/erofs/super.c index 1ca8da3f2125..c5e3039f51bf 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -122,6 +122,138 @@ static bool check_layout_compatibility(struct super_block *sb, return true; } +#ifdef CONFIG_EROFS_FS_ZIP +/* read variable-sized metadata, offset will be aligned by 4-byte */ +static
Re: [PATCH 3/4] erofs: introduce on-disk lz4 fs configurations
On 2021/3/27 11:49, Gao Xiang wrote:
> From: Gao Xiang
>
> Introduce z_erofs_lz4_cfgs to store all lz4 configurations. Currently
> it's only max_distance, but it will be used for new features later.
>
> Signed-off-by: Gao Xiang

Reviewed-by: Chao Yu

Thanks,
Re: [PATCH 2/4] erofs: support adjust lz4 history window size
On 2021/3/27 11:49, Gao Xiang wrote: From: Huang Jianan lz4 uses LZ4_DISTANCE_MAX to record history preservation. When using rolling decompression, a block with a higher compression ratio will cause a larger memory allocation (up to 64k). It may cause a large resource burden in extreme cases on devices with small memory and a large number of concurrent IOs. So appropriately reducing this value can improve performance. Decreasing this value will reduce the compression ratio (except when input_size Signed-off-by: Guo Weichao [ Gao Xiang: introduce struct erofs_sb_lz4_info for configurations. ] Signed-off-by: Gao Xiang --- fs/erofs/decompressor.c | 21 + fs/erofs/erofs_fs.h | 3 ++- fs/erofs/internal.h | 19 +++ fs/erofs/super.c| 4 +++- 4 files changed, 41 insertions(+), 6 deletions(-) diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c index 80e8871aef71..93411e9df9b6 100644 --- a/fs/erofs/decompressor.c +++ b/fs/erofs/decompressor.c @@ -28,6 +28,17 @@ struct z_erofs_decompressor { char *name; }; +int z_erofs_load_lz4_config(struct super_block *sb, + struct erofs_super_block *dsb) +{ + u16 distance = le16_to_cpu(dsb->lz4_max_distance); + + EROFS_SB(sb)->lz4.max_distance_pages = distance ? 
+ DIV_ROUND_UP(distance, PAGE_SIZE) + 1 : + LZ4_MAX_DISTANCE_PAGES; + return 0; +} + static int z_erofs_lz4_prepare_destpages(struct z_erofs_decompress_req *rq, struct list_head *pagepool) { @@ -36,6 +47,8 @@ static int z_erofs_lz4_prepare_destpages(struct z_erofs_decompress_req *rq, struct page *availables[LZ4_MAX_DISTANCE_PAGES] = { NULL }; unsigned long bounced[DIV_ROUND_UP(LZ4_MAX_DISTANCE_PAGES, BITS_PER_LONG)] = { 0 }; + unsigned int lz4_max_distance_pages = + EROFS_SB(rq->sb)->lz4.max_distance_pages; void *kaddr = NULL; unsigned int i, j, top; @@ -44,14 +57,14 @@ static int z_erofs_lz4_prepare_destpages(struct z_erofs_decompress_req *rq, struct page *const page = rq->out[i]; struct page *victim; - if (j >= LZ4_MAX_DISTANCE_PAGES) + if (j >= lz4_max_distance_pages) j = 0; /* 'valid' bounced can only be tested after a complete round */ if (test_bit(j, bounced)) { - DBG_BUGON(i < LZ4_MAX_DISTANCE_PAGES); - DBG_BUGON(top >= LZ4_MAX_DISTANCE_PAGES); - availables[top++] = rq->out[i - LZ4_MAX_DISTANCE_PAGES]; + DBG_BUGON(i < lz4_max_distance_pages); + DBG_BUGON(top >= lz4_max_distance_pages); + availables[top++] = rq->out[i - lz4_max_distance_pages]; } if (page) { diff --git a/fs/erofs/erofs_fs.h b/fs/erofs/erofs_fs.h index 9ad1615f4474..b27d0e4e4ab5 100644 --- a/fs/erofs/erofs_fs.h +++ b/fs/erofs/erofs_fs.h @@ -39,7 +39,8 @@ struct erofs_super_block { __u8 uuid[16]; /* 128-bit uuid for volume */ __u8 volume_name[16]; /* volume name */ __le32 feature_incompat; - __u8 reserved2[44]; + __le16 lz4_max_distance; It missed to add comments, otherwise it looks good to me. 
Reviewed-by: Chao Yu Thanks, + __u8 reserved2[42]; }; /* diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index d29fc0c56032..1de60992c3dd 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -59,6 +59,12 @@ struct erofs_fs_context { unsigned int mount_opt; }; +/* all filesystem-wide lz4 configurations */ +struct erofs_sb_lz4_info { + /* # of pages needed for EROFS lz4 rolling decompression */ + u16 max_distance_pages; +}; + struct erofs_sb_info { #ifdef CONFIG_EROFS_FS_ZIP /* list for all registered superblocks, mainly for shrinker */ @@ -72,6 +78,8 @@ struct erofs_sb_info { /* pseudo inode to manage cached pages */ struct inode *managed_cache; + + struct erofs_sb_lz4_info lz4; #endif/* CONFIG_EROFS_FS_ZIP */ u32 blocks; u32 meta_blkaddr; @@ -432,6 +440,8 @@ int erofs_try_to_free_all_cached_pages(struct erofs_sb_info *sbi, struct erofs_workgroup *egrp); int erofs_try_to_free_cached_page(struct address_space *mapping, struct page *page); +int z_erofs_load_lz4_config(struct super_block *sb, + struct erofs_super_block *dsb); #else static inline void erofs_shrinker_register(struct super_block *sb) {} static inline void erofs_shrinker_unregister(struct super_block *sb) {} @@ -439,6 +449,15 @@ static inline int erofs_init_
Re: [PATCH 1/4] erofs: introduce erofs_sb_has_xxx() helpers
On 2021/3/27 11:49, Gao Xiang wrote: From: Gao Xiang Introduce erofs_sb_has_xxx() to make long checks short, especially for later big pcluster & LZMA features. Signed-off-by: Gao Xiang Reviewed-by: Chao Yu Thanks,
Re: [f2fs-dev] [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/27 1:30, Jaegeuk Kim wrote: On 03/26, Chao Yu wrote: On 2021/3/26 9:19, Jaegeuk Kim wrote: On 03/26, Chao Yu wrote: On 2021/3/25 9:59, Chao Yu wrote: On 2021/3/25 6:44, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 12:22, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789. Because that commit fails generic/050 testcase which expect failure during mount a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition, Well, not sure we can change that testcase, since it restricts all generic filesystems behavior. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, valid checkpoint can allow for user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. My point is, after mount with ro, there'll be no data write which preserves the current status. So, in the next time, we can recover fsync'ed data later, if user succeeds to mount as rw. Another point is, with the current checkpoint, we should not have any corrupted metadata. 
So, why not giving a chance to show what data remained to user? I think this can be doable only with CoW filesystems. I guess we're talking about the different things... Let me declare two different readonly status: 1. filesystem readonly: file system is mount with ro mount option, and app from userspace can not modify any thing of filesystem, but filesystem itself can modify data on device since device may be writable. 2. device readonly: device is set to readonly status via 'blockdev --setro' command, and then filesystem should never issue any write IO to the device. So, what I mean is, *when device is readonly*, rather than f2fs mountpoint is readonly (f2fs_hw_is_readonly() returns true as below code, instead of f2fs_readonly() returns true), in this condition, we should not issue any write IO to device anyway, because, AFAIK, write IO will fail due to bio_check_ro() check. In that case, mount(2) will try readonly, no? Yes, if device is readonly, mount (2) can not mount/remount device to rw mountpoint. Any other concern about this patch? Indeed we're talking about different things. :) This case is mount(ro) with device(ro) having some data to recover. My point is why not giving a chance to mount(ro) to show the current data covered by a valid checkpoint. This doesn't change anything in the disk, Got your idea. IMO, it has potential issue in above condition: Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. e.g. Recovery writes one inode and then triggers a checkpoint, all writes fail I'm confused. Currently we don't trigger the roll-forward recovery. Oh, my miss, sorry. :-P My point is in this condition we can return error and try to notice user to mount with disable_roll_forward or norecovery option, then at least user can know he should not expect last fsynced data in newly mounted image. 
Or we can use f2fs_recover_fsync_data() to check whether there is fsynced data, if there is no such data, then let mount() succeed. Thanks, due to device is readonly, once inode cache is reclaimed by vm, user will see old inode when reloading it, or even see corrupted fs if partial meta inode's cache is expired. Thoughts? Thanks, and in the
[PATCH] f2fs: delete empty compress.h
From: Chao Yu Commit 75e91c888989 ("f2fs: compress: fix compression chksum") wrongly introduced an empty compress.h; delete it. Signed-off-by: Chao Yu --- fs/f2fs/compress.h | 0 1 file changed, 0 insertions(+), 0 deletions(-) delete mode 100644 fs/f2fs/compress.h diff --git a/fs/f2fs/compress.h b/fs/f2fs/compress.h deleted file mode 100644 index e69de29bb2d1.. -- 2.22.1
Re: [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()
On 2021/3/25 7:49, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR mode to select victim: 1. LFS is set to find source section during GC, the victim should have no checkpointed data, since after GC, section could not be set free for reuse. Previously, we only check valid ckpt blocks in current segment rather than section, fix it. 2. SSR | AT_SSR are set to find target segment for writes which can be fully filled by checkpointed and newly written blocks, we should never select such segment, otherwise it can cause panic or data corruption during allocation, potential case is described as below: a) target segment has 128 ckpt valid blocks b) GC migrates 'n' (n < 512) valid blocks to other segment (segment is still in dirty list) I missed changing 128 to 'n', so the comments should be updated as below: a) target segment has 'n' (n < 512) ckpt valid blocks b) GC migrates 'n' valid blocks to other segment (segment is still in dirty list) Thanks, c) GC migrates '512 - n' blocks to target segment (segment has 'n' cp_vblocks and '512 - n' vblocks) d) If GC selects target segment via {AT,}SSR allocator, however there is no free space in target segment.
[PATCH] f2fs: fix to cover __allocate_new_section() with curseg_lock
In order to avoid race with f2fs_do_replace_block(). Fixes: f5a53edcf01e ("f2fs: support aligned pinned file") Signed-off-by: Chao Yu --- fs/f2fs/segment.c | 4 1 file changed, 4 insertions(+) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index e533192545b2..f1b0e832e7c2 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2957,19 +2957,23 @@ static void __allocate_new_section(struct f2fs_sb_info *sbi, int type) void f2fs_allocate_new_section(struct f2fs_sb_info *sbi, int type) { + down_read(&SM_I(sbi)->curseg_lock); down_write(&SIT_I(sbi)->sentry_lock); __allocate_new_section(sbi, type); up_write(&SIT_I(sbi)->sentry_lock); + up_read(&SM_I(sbi)->curseg_lock); } void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi) { int i; + down_read(&SM_I(sbi)->curseg_lock); down_write(&SIT_I(sbi)->sentry_lock); for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) __allocate_new_segment(sbi, i, false); up_write(&SIT_I(sbi)->sentry_lock); + up_read(&SM_I(sbi)->curseg_lock); } static const struct segment_allocation default_salloc_ops = { -- 2.29.2
Re: [f2fs-dev] [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/26 9:19, Jaegeuk Kim wrote: On 03/26, Chao Yu wrote: On 2021/3/25 9:59, Chao Yu wrote: On 2021/3/25 6:44, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 12:22, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789. Because that commit fails generic/050 testcase which expect failure during mount a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition, Well, not sure we can change that testcase, since it restricts all generic filesystems behavior. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, valid checkpoint can allow for user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. My point is, after mount with ro, there'll be no data write which preserves the current status. So, in the next time, we can recover fsync'ed data later, if user succeeds to mount as rw. Another point is, with the current checkpoint, we should not have any corrupted metadata. So, why not giving a chance to show what data remained to user? 
I think this can be doable only with CoW filesystems. I guess we're talking about the different things... Let me declare two different readonly status: 1. filesystem readonly: file system is mount with ro mount option, and app from userspace can not modify any thing of filesystem, but filesystem itself can modify data on device since device may be writable. 2. device readonly: device is set to readonly status via 'blockdev --setro' command, and then filesystem should never issue any write IO to the device. So, what I mean is, *when device is readonly*, rather than f2fs mountpoint is readonly (f2fs_hw_is_readonly() returns true as below code, instead of f2fs_readonly() returns true), in this condition, we should not issue any write IO to device anyway, because, AFAIK, write IO will fail due to bio_check_ro() check. In that case, mount(2) will try readonly, no? Yes, if device is readonly, mount (2) can not mount/remount device to rw mountpoint. Any other concern about this patch? Indeed we're talking about different things. :) This case is mount(ro) with device(ro) having some data to recover. My point is why not giving a chance to mount(ro) to show the current data covered by a valid checkpoint. This doesn't change anything in the disk, Got your idea. IMO, it has potential issue in above condition: >>>>>>> Since device is readonly now, all write to the device will fail, checkpoint can >>>>>>> not persist recovered data, after page cache is expired, user can see stale data. e.g. Recovery writes one inode and then triggers a checkpoint, all writes fail due to device is readonly, once inode cache is reclaimed by vm, user will see old inode when reloading it, or even see corrupted fs if partial meta inode's cache is expired. Thoughts? Thanks, and in the next time, it allows mount(rw|ro) with device(rw) to recover the data seamlessly. 
Thanks, Thanks, # blockdev --setro /dev/vdb # mount -t f2fs /dev/vdb /mnt/test/ mount: /mnt/test: WARNING: source write-protected, mounted read-only. if (f2fs_hw_is_readonly(sbi)) { - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { - err = -EROFS; + if (!is_
Re: [f2fs-dev] [PATCH -next] f2fs: fix a typo in inode.c
On 2021/3/25 14:38, Ruiqi Gong wrote: Do a trivial typo fix. s/runing/running Reported-by: Hulk Robot Signed-off-by: Ruiqi Gong Reviewed-by: Chao Yu Thanks,
Re: [f2fs-dev] [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/25 9:59, Chao Yu wrote: On 2021/3/25 6:44, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 12:22, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789. Because that commit fails generic/050 testcase which expect failure during mount a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition, Well, not sure we can change that testcase, since it restricts all generic filesystems behavior. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, valid checkpoint can allow for user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. My point is, after mount with ro, there'll be no data write which preserves the current status. So, in the next time, we can recover fsync'ed data later, if user succeeds to mount as rw. Another point is, with the current checkpoint, we should not have any corrupted metadata. So, why not giving a chance to show what data remained to user? I think this can be doable only with CoW filesystems. 
I guess we're talking about the different things... Let me declare two different readonly status: 1. filesystem readonly: file system is mount with ro mount option, and app from userspace can not modify any thing of filesystem, but filesystem itself can modify data on device since device may be writable. 2. device readonly: device is set to readonly status via 'blockdev --setro' command, and then filesystem should never issue any write IO to the device. So, what I mean is, *when device is readonly*, rather than f2fs mountpoint is readonly (f2fs_hw_is_readonly() returns true as below code, instead of f2fs_readonly() returns true), in this condition, we should not issue any write IO to device anyway, because, AFAIK, write IO will fail due to bio_check_ro() check. In that case, mount(2) will try readonly, no? Yes, if device is readonly, mount (2) can not mount/remount device to rw mountpoint. Any other concern about this patch? Thanks, Thanks, # blockdev --setro /dev/vdb # mount -t f2fs /dev/vdb /mnt/test/ mount: /mnt/test: WARNING: source write-protected, mounted read-only. if (f2fs_hw_is_readonly(sbi)) { - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { - err = -EROFS; + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) f2fs_err(sbi, "Need to recover fsync data, but write access unavailable"); - goto free_meta; - } - f2fs_info(sbi, "write access unavailable, skipping recovery"); + else + f2fs_info(sbi, "write access unavailable, skipping recovery"); goto reset_checkpoint; } For the case of filesystem is readonly and device is writable, it's fine to do recovery in order to let user to see fsynced data. Thanks, Am I missing something? Thanks, Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition") Signed-off-by: Chao Yu --- fs/f2fs/super.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index b48281642e98..2b78ee11f093 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super
Re: [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/25 6:44, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 12:22, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789. Because that commit fails generic/050 testcase which expect failure during mount a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition, Well, not sure we can change that testcase, since it restricts all generic filesystems behavior. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, valid checkpoint can allow for user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. My point is, after mount with ro, there'll be no data write which preserves the current status. So, in the next time, we can recover fsync'ed data later, if user succeeds to mount as rw. Another point is, with the current checkpoint, we should not have any corrupted metadata. So, why not giving a chance to show what data remained to user? I think this can be doable only with CoW filesystems. I guess we're talking about the different things... 
Let me declare two different readonly status: 1. filesystem readonly: file system is mount with ro mount option, and app from userspace can not modify any thing of filesystem, but filesystem itself can modify data on device since device may be writable. 2. device readonly: device is set to readonly status via 'blockdev --setro' command, and then filesystem should never issue any write IO to the device. So, what I mean is, *when device is readonly*, rather than f2fs mountpoint is readonly (f2fs_hw_is_readonly() returns true as below code, instead of f2fs_readonly() returns true), in this condition, we should not issue any write IO to device anyway, because, AFAIK, write IO will fail due to bio_check_ro() check. In that case, mount(2) will try readonly, no? Yes, if device is readonly, mount (2) can not mount/remount device to rw mountpoint. Thanks, # blockdev --setro /dev/vdb # mount -t f2fs /dev/vdb /mnt/test/ mount: /mnt/test: WARNING: source write-protected, mounted read-only. if (f2fs_hw_is_readonly(sbi)) { - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { - err = -EROFS; + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) f2fs_err(sbi, "Need to recover fsync data, but write access unavailable"); - goto free_meta; - } - f2fs_info(sbi, "write access unavailable, skipping recovery"); + else + f2fs_info(sbi, "write access unavailable, skipping recovery"); goto reset_checkpoint; } For the case of filesystem is readonly and device is writable, it's fine to do recovery in order to let user to see fsynced data. Thanks, Am I missing something? Thanks, Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition") Signed-off-by: Chao Yu --- fs/f2fs/super.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index b48281642e98..2b78ee11f093 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -3952,10 +3952,12 @@ static int f2fs_fill_super(struct supe
Re: [PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()
On 2021/3/25 7:49, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR mode to select victim: 1. LFS is set to find source section during GC, the victim should have no checkpointed data, since after GC, section could not be set free for reuse. Previously, we only check valid ckpt blocks in current segment rather than section, fix it. 2. SSR | AT_SSR are set to find target segment for writes which can be fully filled by checkpointed and newly written blocks, we should never select such segment, otherwise it can cause panic or data corruption during allocation, potential case is described as below: a) target segment has 128 ckpt valid blocks b) GC migrates 'n' (n < 512) valid blocks to other segment (segment is still in dirty list) c) GC migrates '512 - n' blocks to target segment (segment has 'n' cp_vblocks and '512 - n' vblocks) d) If GC selects target segment via {AT,}SSR allocator, however there is no free space in target segment. Fixes: 4354994f097d ("f2fs: checkpoint disabling") Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection") Signed-off-by: Chao Yu --- v2: - fix to check checkpointed data in section rather than segment for LFS mode. - update commit title and message.
fs/f2fs/f2fs.h| 1 + fs/f2fs/gc.c | 28 fs/f2fs/segment.c | 39 --- fs/f2fs/segment.h | 14 +- 4 files changed, 58 insertions(+), 24 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index eb154d9cb063..29e634d08a27 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3387,6 +3387,7 @@ block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi); int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable); void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi); int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra); +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno); void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi); diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index d96acc6531f2..4d9616373a4a 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -392,10 +392,6 @@ static void add_victim_entry(struct f2fs_sb_info *sbi, if (p->gc_mode == GC_AT && get_valid_blocks(sbi, segno, true) == 0) return; - - if (p->alloc_mode == AT_SSR && - get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0) - return; } for (i = 0; i < sbi->segs_per_sec; i++) @@ -728,11 +724,27 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, if (sec_usage_check(sbi, secno)) goto next; + /* Don't touch checkpointed data */ - if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) && - get_ckpt_valid_blocks(sbi, segno) && - p.alloc_mode == LFS)) - goto next; + if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) { + if (p.alloc_mode == LFS) { + /* +* LFS is set to find source section during GC. +* The victim should have no checkpointed data. +*/ + if (get_ckpt_valid_blocks(sbi, segno, true)) + goto next; + } else { + /* +* SSR | AT_SSR are set to find target segment +* for writes which can be full by checkpointed +* and newly written blocks. 
+*/ + if (!segment_has_free_slot(sbi, segno)) + goto next; + } + } + if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap)) goto next; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 6e1a5f5657bf..f6a30856ceda 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -865,7 +865,7 @@ static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno) mutex_lock(&dirty_i->seglist_lock); valid_blocks = get_valid_blocks(sbi, segno, false); - ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno); + ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno, false); if (valid_blocks == 0 && (!is_sbi_flag_set(sbi, SBI_CP_DISA
Re: [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/24 12:22, Jaegeuk Kim wrote: On 03/24, Chao Yu wrote: On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789. Because that commit fails generic/050 testcase which expect failure during mount a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition, Well, not sure we can change that testcase, since it restricts all generic filesystems behavior. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, valid checkpoint can allow for user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since device is readonly now, all write to the device will fail, checkpoint can not persist recovered data, after page cache is expired, user can see stale data. My point is, after mount with ro, there'll be no data write which preserves the current status. So, in the next time, we can recover fsync'ed data later, if user succeeds to mount as rw. Another point is, with the current checkpoint, we should not have any corrupted metadata. So, why not giving a chance to show what data remained to user? I think this can be doable only with CoW filesystems. I guess we're talking about the different things... Let me declare two different readonly status: 1. 
filesystem readonly: file system is mount with ro mount option, and app from userspace can not modify any thing of filesystem, but filesystem itself can modify data on device since device may be writable. 2. device readonly: device is set to readonly status via 'blockdev --setro' command, and then filesystem should never issue any write IO to the device. So, what I mean is, *when device is readonly*, rather than f2fs mountpoint is readonly (f2fs_hw_is_readonly() returns true as below code, instead of f2fs_readonly() returns true), in this condition, we should not issue any write IO to device anyway, because, AFAIK, write IO will fail due to bio_check_ro() check. if (f2fs_hw_is_readonly(sbi)) { - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { - err = -EROFS; + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) f2fs_err(sbi, "Need to recover fsync data, but write access unavailable"); - goto free_meta; - } - f2fs_info(sbi, "write access unavailable, skipping recovery"); + else + f2fs_info(sbi, "write access unavailable, skipping recovery"); goto reset_checkpoint; } For the case of filesystem is readonly and device is writable, it's fine to do recovery in order to let user to see fsynced data. Thanks, Am I missing something? Thanks, Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition") Signed-off-by: Chao Yu --- fs/f2fs/super.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index b48281642e98..2b78ee11f093 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -3952,10 +3952,12 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) * previous checkpoint was not done by clean system shutdown. */ if (f2fs_hw_is_readonly(sbi)) { - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { + err = -EROFS;
Re: [PATCH RFC] f2fs: fix to avoid selecting full segment w/ {AT,}SSR allocator
On 2021/3/24 6:59, Jaegeuk Kim wrote: On 03/19, Chao Yu wrote: On 2021/3/19 1:17, Jaegeuk Kim wrote: On 02/20, Chao Yu wrote: In cp disabling mode, there could be a condition - target segment has 128 ckpt valid blocks - GC migrates 128 valid blocks to other segment (segment is still in dirty list) - GC migrates 384 blocks to target segment (segment has 128 cp_vblocks and 384 vblocks) - If GC selects target segment via {AT,}SSR allocator, however there is no free space in targe segment. Fixes: 4354994f097d ("f2fs: checkpoint disabling") Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection") Signed-off-by: Chao Yu --- fs/f2fs/f2fs.h| 1 + fs/f2fs/gc.c | 17 + fs/f2fs/segment.c | 20 3 files changed, 34 insertions(+), 4 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index ed7807103c8e..9c753eff0814 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3376,6 +3376,7 @@ block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi); int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable); void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi); int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra); +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno); void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi); diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 86ba8ed0b8a7..a1d8062cdace 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -392,10 +392,6 @@ static void add_victim_entry(struct f2fs_sb_info *sbi, if (p->gc_mode == GC_AT && get_valid_blocks(sbi, segno, true) == 0) return; - - if (p->alloc_mode == AT_SSR && - get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0) - return; } for (i = 0; i < sbi->segs_per_sec; i++) @@ -736,6 +732,19 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap)) goto next; + if 
(unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) { + /* +* to avoid selecting candidate which has below valid +* block distribution: +* partial blocks are valid and all left ones are valid +* in previous checkpoint. +*/ + if (p.alloc_mode == SSR || p.alloc_mode == AT_SSR) { + if (!segment_has_free_slot(sbi, segno)) + goto next; Do we need to change this to check free_slot instead of get_ckpt_valid_blocks()? Jaegeuk, LFS was assigned only for GC case, in this case we are trying to select source section, rather than target segment for SSR/AT_SSR case, so we don't need to check free_slot. - f2fs_gc - __get_victim - get_victim(sbi, victim, gc_type, NO_CHECK_TYPE, LFS, 0); 732 if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) && 733 get_ckpt_valid_blocks(sbi, segno) && 734 p.alloc_mode == LFS)) BTW, in LFS mode, GC wants to find source section rather than segment, so we should change to check valid ckpt blocks in every segment of targe section here? Alright. I refactored a bit on this patch with new one. Could you please take a look? https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=00152bd7cabd69b4615ebead823ff23887b0e0f7 I see, newly added comment looks good to me. 
One more concern is that the commit title and message are out of date; I've revised them in v2: https://lore.kernel.org/linux-f2fs-devel/20210324031828.67133-1-yuch...@huawei.com/T/#u Thanks, + } + } + if (is_atgc) { add_victim_entry(sbi, &p, segno); goto next; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 2d5a82c4ca15..deaf57e13125 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2650,6 +2650,26 @@ static void __refresh_next_blkoff(struct f2fs_sb_info *sbi, seg->next_blkoff++; } +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno) +{ + struct sit_info *sit = SIT_I(sbi); + struct seg_entry *se = get_seg_entry(sbi, segno); + int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); + unsigned long *target_map = SIT_I(sbi)->tmp_map; + unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; + unsigned long *cur_map = (unsigned
[PATCH] f2fs: fix to update last i_size if fallocate partially succeeds
In the case of expanding pinned file, map.m_lblk and map.m_len will update in each round of section allocation, so in error path, last i_size will be calculated with wrong m_lblk and m_len, fix it. Fixes: f5a53edcf01e ("f2fs: support aligned pinned file") Signed-off-by: Chao Yu --- fs/f2fs/file.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index bd5a77091d23..dc79694e512c 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1619,9 +1619,10 @@ static int expand_inode_data(struct inode *inode, loff_t offset, struct f2fs_map_blocks map = { .m_next_pgofs = NULL, .m_next_extent = NULL, .m_seg_type = NO_CHECK_TYPE, .m_may_create = true }; - pgoff_t pg_end; + pgoff_t pg_start, pg_end; loff_t new_size = i_size_read(inode); loff_t off_end; + block_t expanded = 0; int err; err = inode_newsize_ok(inode, (len + offset)); @@ -1634,11 +1635,12 @@ static int expand_inode_data(struct inode *inode, loff_t offset, f2fs_balance_fs(sbi, true); + pg_start = ((unsigned long long)offset) >> PAGE_SHIFT; pg_end = ((unsigned long long)offset + len) >> PAGE_SHIFT; off_end = (offset + len) & (PAGE_SIZE - 1); - map.m_lblk = ((unsigned long long)offset) >> PAGE_SHIFT; - map.m_len = pg_end - map.m_lblk; + map.m_lblk = pg_start; + map.m_len = pg_end - pg_start; if (off_end) map.m_len++; @@ -1648,7 +1650,6 @@ static int expand_inode_data(struct inode *inode, loff_t offset, if (f2fs_is_pinned_file(inode)) { block_t sec_blks = BLKS_PER_SEC(sbi); block_t sec_len = roundup(map.m_len, sec_blks); - block_t done = 0; map.m_len = sec_blks; next_alloc: @@ -1656,10 +1657,8 @@ static int expand_inode_data(struct inode *inode, loff_t offset, GET_SEC_FROM_SEG(sbi, overprovision_segments(sbi { down_write(&sbi->gc_lock); err = f2fs_gc(sbi, true, false, false, NULL_SEGNO); - if (err && err != -ENODATA && err != -EAGAIN) { - map.m_len = done; + if (err && err != -ENODATA && err != -EAGAIN) goto out_err; - } } down_write(&sbi->pin_sem); @@ -1673,24 
+1672,25 @@ static int expand_inode_data(struct inode *inode, loff_t offset, up_write(&sbi->pin_sem); - done += map.m_len; + expanded += map.m_len; sec_len -= map.m_len; map.m_lblk += map.m_len; if (!err && sec_len) goto next_alloc; - map.m_len = done; + map.m_len = expanded; } else { err = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_AIO); + expanded = map.m_len; } out_err: if (err) { pgoff_t last_off; - if (!map.m_len) + if (!expanded) return err; - last_off = map.m_lblk + map.m_len - 1; + last_off = pg_start + expanded - 1; /* update new size to the failed position */ new_size = (last_off == pg_end) ? offset + len : -- 2.29.2
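The bookkeeping this patch introduces can be exercised outside the kernel. Below is a minimal userspace C model of the allocation loop, with a hypothetical expand_blocks() helper standing in for the f2fs_map_blocks() rounds and a fixed 512-block section assumed. It shows why the error path must derive the last offset from pg_start plus the accumulated 'expanded' count, rather than from the per-round m_lblk/m_len, which mutate every iteration.

```c
#include <assert.h>

/* Hypothetical model of the expand_inode_data() loop for a pinned file:
 * allocate one section (sec_blks blocks) per round, with an injected
 * failure after 'fail_round' successful rounds. On failure, report the
 * last successfully allocated offset via *last_off using the patched
 * formula, or leave it untouched when nothing was allocated. */
static int expand_blocks(unsigned pg_start, unsigned total_len,
                         unsigned sec_blks, int fail_round,
                         unsigned *last_off)
{
    unsigned expanded = 0;        /* blocks allocated so far */
    unsigned sec_len = total_len; /* blocks still to allocate */
    int round = 0;

    while (sec_len) {
        unsigned m_len = sec_len < sec_blks ? sec_len : sec_blks;

        if (round++ == fail_round) {            /* allocation failure */
            if (!expanded)
                return -1;                      /* nothing to report */
            *last_off = pg_start + expanded - 1; /* patched formula */
            return -1;
        }
        expanded += m_len;
        sec_len -= m_len;
    }
    *last_off = pg_start + expanded - 1;
    return 0;
}
```

With pg_start = 10 and two successful 512-block rounds before a failure, the last valid offset is 10 + 1024 - 1 = 1033 — the value the buggy code, using the already-advanced m_lblk, would have miscomputed.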
[PATCH v2] f2fs: fix to avoid touching checkpointed data in get_victim()
In CP disabling mode, there are two issues when using LFS or SSR | AT_SSR mode to select a victim:

1. LFS is set to find a source section during GC; the victim should have no checkpointed data, since after GC the section cannot be set free for reuse. Previously, we only checked valid ckpt blocks in the current segment rather than the whole section; fix it.

2. SSR | AT_SSR are set to find a target segment for writes, which can be fully filled by checkpointed and newly written blocks; we should never select such a segment, otherwise it can cause a panic or data corruption during allocation. A potential case is described below:
a) target segment has 128 ckpt valid blocks
b) GC migrates 'n' (n < 512) valid blocks to another segment (the segment is still in the dirty list)
c) GC migrates '512 - n' blocks to the target segment (the segment now has 'n' cp_vblocks and '512 - n' vblocks)
d) GC selects the target segment via the {AT,}SSR allocator, but there is no free space in the target segment.

Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection")
Signed-off-by: Chao Yu
---
v2:
- fix to check checkpointed data in section rather than segment for LFS mode.
- update commit title and message.
fs/f2fs/f2fs.h| 1 + fs/f2fs/gc.c | 28 fs/f2fs/segment.c | 39 --- fs/f2fs/segment.h | 14 +- 4 files changed, 58 insertions(+), 24 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index eb154d9cb063..29e634d08a27 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3387,6 +3387,7 @@ block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi); int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable); void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi); int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra); +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno); void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi); diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index d96acc6531f2..4d9616373a4a 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -392,10 +392,6 @@ static void add_victim_entry(struct f2fs_sb_info *sbi, if (p->gc_mode == GC_AT && get_valid_blocks(sbi, segno, true) == 0) return; - - if (p->alloc_mode == AT_SSR && - get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0) - return; } for (i = 0; i < sbi->segs_per_sec; i++) @@ -728,11 +724,27 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, if (sec_usage_check(sbi, secno)) goto next; + /* Don't touch checkpointed data */ - if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) && - get_ckpt_valid_blocks(sbi, segno) && - p.alloc_mode == LFS)) - goto next; + if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) { + if (p.alloc_mode == LFS) { + /* +* LFS is set to find source section during GC. +* The victim should have no checkpointed data. +*/ + if (get_ckpt_valid_blocks(sbi, segno, true)) + goto next; + } else { + /* +* SSR | AT_SSR are set to find target segment +* for writes which can be full by checkpointed +* and newly written blocks. 
+*/ + if (!segment_has_free_slot(sbi, segno)) + goto next; + } + } + if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap)) goto next; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 6e1a5f5657bf..f6a30856ceda 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -865,7 +865,7 @@ static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno) mutex_lock(&dirty_i->seglist_lock); valid_blocks = get_valid_blocks(sbi, segno, false); - ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno); + ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno, false); if (valid_blocks == 0 && (!is_sbi_flag_set(sbi, SBI_CP_DISABLED) || ckpt_valid_blocks == usable_blocks)) { @@
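segment_has_free_slot(), declared in the hunk above but truncated here, reduces to a bitmap test: an SSR/AT_SSR target slot exists only if some block is free in both the checkpoint validity map and the current validity map. A userspace sketch of just that bitmap logic (512 blocks per segment assumed; the kernel version additionally takes sentry_lock and materializes the OR into tmp_map):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCKS_PER_SEG 512
#define MAP_WORDS (BLOCKS_PER_SEG / (sizeof(unsigned long) * CHAR_BIT))

/* Model of segment_has_free_slot(): a block can be allocated only if it
 * is free in BOTH the checkpointed map and the current map, i.e. the OR
 * of the two maps still contains a zero bit somewhere. */
static bool segment_has_free_slot(const unsigned long *ckpt_map,
                                  const unsigned long *cur_map)
{
    for (size_t i = 0; i < MAP_WORDS; i++)
        if (~(ckpt_map[i] | cur_map[i]))
            return true;   /* some bit is 0 in both maps */
    return false;
}
```

This is exactly the bug scenario in the commit message: a segment whose blocks are all covered by the union of cp_vblocks and vblocks ORs to all-ones and must never be picked as an SSR target.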
Re: [PATCH 2/2] f2fs: fix error path of f2fs_remount()
Ping, On 2021/3/17 17:56, Chao Yu wrote: In error path of f2fs_remount(), it missed to restart/stop kernel thread or enable/disable checkpoint, then mount option status may not be consistent with real condition of filesystem, so let's reorder remount flow a bit as below and do recovery correctly in error path: 1) handle gc thread 2) handle ckpt thread 3) handle flush thread 4) handle checkpoint disabling Signed-off-by: Chao Yu --- fs/f2fs/super.c | 47 ++- 1 file changed, 34 insertions(+), 13 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 6716af216dca..fa60f08c30bb 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1942,8 +1942,9 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) struct f2fs_mount_info org_mount_opt; unsigned long old_sb_flags; int err; - bool need_restart_gc = false; - bool need_stop_gc = false; + bool need_restart_gc = false, need_stop_gc = false; + bool need_restart_ckpt = false, need_stop_ckpt = false; + bool need_restart_flush = false, need_stop_flush = false; bool no_extent_cache = !test_opt(sbi, EXTENT_CACHE); bool disable_checkpoint = test_opt(sbi, DISABLE_CHECKPOINT); bool no_io_align = !F2FS_IO_ALIGNED(sbi); @@ -2081,19 +2082,10 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) clear_sbi_flag(sbi, SBI_IS_CLOSE); } - if (checkpoint_changed) { - if (test_opt(sbi, DISABLE_CHECKPOINT)) { - err = f2fs_disable_checkpoint(sbi); - if (err) - goto restore_gc; - } else { - f2fs_enable_checkpoint(sbi); - } - } - if ((*flags & SB_RDONLY) || test_opt(sbi, DISABLE_CHECKPOINT) || !test_opt(sbi, MERGE_CHECKPOINT)) { f2fs_stop_ckpt_thread(sbi); + need_restart_ckpt = true; } else { err = f2fs_start_ckpt_thread(sbi); if (err) { @@ -2102,6 +2094,7 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) err); goto restore_gc; } + need_stop_ckpt = true; } /* @@ -2111,11 +2104,24 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) if ((*flags & 
SB_RDONLY) || !test_opt(sbi, FLUSH_MERGE)) { clear_opt(sbi, FLUSH_MERGE); f2fs_destroy_flush_cmd_control(sbi, false); + need_restart_flush = true; } else { err = f2fs_create_flush_cmd_control(sbi); if (err) - goto restore_gc; + goto restore_ckpt; + need_stop_flush = true; } + + if (checkpoint_changed) { + if (test_opt(sbi, DISABLE_CHECKPOINT)) { + err = f2fs_disable_checkpoint(sbi); + if (err) + goto restore_flush; + } else { + f2fs_enable_checkpoint(sbi); + } + } + skip: #ifdef CONFIG_QUOTA /* Release old quota file names */ @@ -2130,6 +2136,21 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) adjust_unusable_cap_perc(sbi); *flags = (*flags & ~SB_LAZYTIME) | (sb->s_flags & SB_LAZYTIME); return 0; +restore_flush: + if (need_restart_flush) { + if (f2fs_create_flush_cmd_control(sbi)) + f2fs_warn(sbi, "background flush thread has stopped"); + } else if (need_stop_flush) { + clear_opt(sbi, FLUSH_MERGE); + f2fs_destroy_flush_cmd_control(sbi, false); + } +restore_ckpt: + if (need_restart_ckpt) { + if (f2fs_start_ckpt_thread(sbi)) + f2fs_warn(sbi, "background ckpt thread has stopped"); + } else if (need_stop_ckpt) { + f2fs_stop_ckpt_thread(sbi); + } restore_gc: if (need_restart_gc) { if (f2fs_start_gc_thread(sbi))
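The ordering this patch establishes — tear down in the reverse order of setup, undoing only what this remount call itself changed, tracked by restart/stop flag pairs — is the classic goto-ladder unwind pattern. A userspace model with stubbed start/stop functions (all names hypothetical, three "threads" standing in for gc/ckpt/flush):

```c
#include <assert.h>
#include <stdbool.h>

static int started[3];  /* gc, ckpt, flush thread states */

static int start_thread(int i, bool fail)
{
    if (fail)
        return -1;
    started[i] = 1;
    return 0;
}

static void stop_thread(int i)
{
    started[i] = 0;
}

/* Model of the remount flow: start gc, ckpt, flush in order; on any
 * failure, unwind only what this call actually started, in reverse. */
static int remount(bool fail_flush)
{
    bool need_stop_gc = false, need_stop_ckpt = false;
    int err;

    if ((err = start_thread(0, false)))
        goto out;
    need_stop_gc = true;

    if ((err = start_thread(1, false)))
        goto restore_gc;
    need_stop_ckpt = true;

    if ((err = start_thread(2, fail_flush)))
        goto restore_ckpt;
    return 0;

restore_ckpt:
    if (need_stop_ckpt)
        stop_thread(1);
restore_gc:
    if (need_stop_gc)
        stop_thread(0);
out:
    return err;
}
```

The point of the flags is the case the original code missed: a resource that was already running before remount (so this call didn't start it) must not be stopped on the error path — hence the need_restart/need_stop split in the real patch.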
Re: [PATCH] Revert "f2fs: give a warning only for readonly partition"
On 2021/3/24 2:39, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789, because that commit fails the generic/050 testcase, which expects a failure when mounting a recoverable readonly partition. I think we need to change generic/050, since f2fs can recover this partition. Well, I'm not sure we can change that testcase, since it constrains the behavior of all generic filesystems. At least, ext4's behavior makes sense to me: journal_dev_ro = bdev_read_only(journal->j_dev); really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; if (journal_dev_ro && !sb_rdonly(sb)) { ext4_msg(sb, KERN_ERR, "journal device read-only, try mounting with '-o ro'"); err = -EROFS; goto err_out; } if (ext4_has_feature_journal_needs_recovery(sb)) { if (sb_rdonly(sb)) { ext4_msg(sb, KERN_INFO, "INFO: recovery " "required on readonly filesystem"); if (really_read_only) { ext4_msg(sb, KERN_ERR, "write access " "unavailable, cannot proceed " "(try mounting with noload)"); err = -EROFS; goto err_out; } ext4_msg(sb, KERN_INFO, "write access will " "be enabled during recovery"); } } even though using it as readonly. And, a valid checkpoint can allow the user to read all the data without problem. if (f2fs_hw_is_readonly(sbi)) { Since the device is readonly now, all writes to the device will fail; the checkpoint cannot persist the recovered data, and after the page cache expires, the user can see stale data. Am I missing something? Thanks, Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition") Signed-off-by: Chao Yu --- fs/f2fs/super.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index b48281642e98..2b78ee11f093 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -3952,10 +3952,12 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) * previous checkpoint was not done by clean system shutdown.
*/ if (f2fs_hw_is_readonly(sbi)) { - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { + err = -EROFS; f2fs_err(sbi, "Need to recover fsync data, but write access unavailable"); - else - f2fs_info(sbi, "write access unavailable, skipping recovery"); + goto free_meta; + } + f2fs_info(sbi, "write access unavailable, skipping recovery"); goto reset_checkpoint; } -- 2.29.2 .
Re: [f2fs-dev] [PATCH -next] f2fs: fix wrong comment of nat_tree_lock
On 2021/3/23 19:41, qiulaibin wrote: Do trivial comment fix of nat_tree_lock. Signed-off-by: qiulaibin Reviewed-by: Chao Yu Thanks,
Re: [PATCH] f2fs: fix to align to section for fallocate() on pinned file
On 2021/3/24 2:32, Jaegeuk Kim wrote: On 03/23, Chao Yu wrote: On 2021/3/5 17:56, Chao Yu wrote: Now, fallocate() on a pinned file only allocates blocks which aligns to segment rather than section, so GC may try to migrate pinned file's block, and after several times of failure, pinned file's block could be migrated to other place, however user won't be aware of such condition, and then old obsolete block address may be readed/written incorrectly. To avoid such condition, let's try to allocate pinned file's blocks with section alignment. Signed-off-by: Chao Yu Jaegeuk, Could you please check and apply below diff into original patch? --- fs/f2fs/file.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 236f3f69681a..24fa68fdcaa0 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1648,13 +1648,13 @@ static int expand_inode_data(struct inode *inode, loff_t offset, return 0; if (f2fs_is_pinned_file(inode)) { - block_t len = (map.m_len >> sbi->log_blocks_per_seg) << - sbi->log_blocks_per_seg; + block_t sec_blks = BLKS_PER_SEC(sbi); + block_t len = rounddown(map.m_len, sec_blks); len is declared above, so let me rephrase this as well. Oh, right. - if (map.m_len % sbi->blocks_per_seg) - len += sbi->blocks_per_seg; + if (map.m_len % sec_blks) + len += sec_blks; is this roundup()? More clean. Could you check this? https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=e1175f02291141bbd924fc578299305fcde35855 Looks good to me. :) Thanks, - map.m_len = sbi->blocks_per_seg; + map.m_len = sec_blks; next_alloc: if (has_not_enough_free_secs(sbi, 0, GET_SEC_FROM_SEG(sbi, overprovision_segments(sbi { -- 2.22.1 .
Re: [PATCH] f2fs: fix to align to section for fallocate() on pinned file
On 2021/3/5 17:56, Chao Yu wrote: Now, fallocate() on a pinned file only allocates blocks which align to the segment size rather than the section size, so GC may try to migrate a pinned file's blocks, and after several failed attempts the pinned file's blocks could be migrated elsewhere; however, the user won't be aware of such a condition, and then an old obsolete block address may be read/written incorrectly. To avoid such a condition, let's try to allocate a pinned file's blocks with section alignment. Signed-off-by: Chao Yu Jaegeuk, Could you please check and apply the below diff into the original patch? --- fs/f2fs/file.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 236f3f69681a..24fa68fdcaa0 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1648,13 +1648,13 @@ static int expand_inode_data(struct inode *inode, loff_t offset, return 0; if (f2fs_is_pinned_file(inode)) { - block_t len = (map.m_len >> sbi->log_blocks_per_seg) << - sbi->log_blocks_per_seg; + block_t sec_blks = BLKS_PER_SEC(sbi); + block_t len = rounddown(map.m_len, sec_blks); - if (map.m_len % sbi->blocks_per_seg) - len += sbi->blocks_per_seg; + if (map.m_len % sec_blks) + len += sec_blks; - map.m_len = sbi->blocks_per_seg; + map.m_len = sec_blks; next_alloc: if (has_not_enough_free_secs(sbi, 0, GET_SEC_FROM_SEG(sbi, overprovision_segments(sbi { -- 2.22.1
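Jaegeuk's roundup() remark in the sibling thread can be checked directly: rounddown plus a remainder bump is exactly roundup. A small sketch of the section-alignment arithmetic, using plain division in place of the kernel's rounddown()/roundup() macros:

```c
#include <assert.h>

/* Align a length up to the next section boundary, the two-step way the
 * diff above writes it: rounddown, then bump by a section if a partial
 * section remains. */
static unsigned align_two_step(unsigned len, unsigned sec_blks)
{
    unsigned out = (len / sec_blks) * sec_blks;  /* rounddown(len, sec_blks) */

    if (len % sec_blks)                          /* partial section left over */
        out += sec_blks;
    return out;
}

/* The same thing as a single roundup() expression. */
static unsigned align_roundup(unsigned len, unsigned sec_blks)
{
    return ((len + sec_blks - 1) / sec_blks) * sec_blks;
}
```

The two forms agree for every length, which is why the review settled on the single roundup() call as the cleaner spelling.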
Re: [LTP] [f2fs] 02eb84b96b: ltp.swapon03.fail
On 2021/3/11 4:49, Jaegeuk Kim wrote: On 03/10, Huang Jianan wrote: Hi Richard, On 2021/3/9 12:01, Matthew Wilcox wrote: On Tue, Mar 09, 2021 at 10:23:35AM +0800, Weichao Guo wrote: Hi Richard, On 2021/3/8 19:53, Richard Palethorpe wrote: Hello, kern :err : [ 187.461914] F2FS-fs (sda1): Swapfile does not align to section commit 02eb84b96bc1b382dd138bf60724edbefe77b025 Author: huangjia...@oppo.com Date: Mon Mar 1 12:58:44 2021 +0800 f2fs: check if swapfile is section-alligned If the swapfile isn't created by pin and fallocate, it can't be guaranteed section-aligned, so it may be selected by f2fs gc. When gc_pin_file_threshold is reached, the address of swapfile may change, but won't be synchronized to swap_extent, so swap will write to wrong address, which will cause data corruption. Signed-off-by: Huang Jianan Signed-off-by: Guo Weichao Reviewed-by: Chao Yu Signed-off-by: Jaegeuk Kim The test uses fallocate to preallocate the swap file and writes zeros to it. I'm not sure what pin refers to? 'pin' refers to pinned file feature in F2FS, the LBA(Logical Block Address) of a file is fixed after pinned. Without this operation before fallocate, the LBA may not align with section(F2FS GC unit), some LBA of the file may be changed by F2FS GC in some extreme cases. For this test case, how about pin the swap file before fallocate for F2FS as following: ioctl(fd, F2FS_IOC_SET_PIN_FILE, true); No special ioctl should be needed. f2fs_swap_activate() should pin the file, just like it converts inline inodes and disables compression. Now f2fs_swap_activate() will pin the file. The problem is that when f2fs_swap_activate() is executed, the file has been created and may not be section-aligned. So I think it would be better to consider aligning the swapfile during f2fs_swap_activate()? 
Does it make sense to reallocate blocks like this in f2fs_swap_activate()? set_inode_flag(inode, FI_PIN_FILE); truncate_pagecache(inode, 0); f2fs_truncate_blocks(inode, 0, true); expand_inode_data(); It will corrupt the swap header info while relocating the whole swapfile...
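The "Swapfile does not align to section" check this thread is discussing can be modelled as a test over the file's physical extents: every extent must start and span whole sections, otherwise GC may relocate part of it and the LBAs cached in the swap extents go stale. A userspace sketch (hypothetical extent struct; the section size is passed in blocks):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical physical extent of a file, in block units. */
struct extent {
    unsigned start_blk;
    unsigned len;
};

/* A swapfile is GC-safe only if each physical extent covers whole
 * sections: the swap layer bypasses the filesystem, so any block that
 * F2FS GC is allowed to move would be written at a stale address. */
static bool swapfile_is_section_aligned(const struct extent *ext,
                                        int nr, unsigned blks_per_sec)
{
    for (int i = 0; i < nr; i++)
        if (ext[i].start_blk % blks_per_sec ||
            ext[i].len % blks_per_sec)
            return false;
    return true;
}
```

Pinning before fallocate (the F2FS_IOC_SET_PIN_FILE ioctl mentioned above) is what makes the allocator hand out extents that pass this kind of check in the first place.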
Re: [PATCH] f2fs: fix to avoid out-of-bounds memory access
Hi butt3rflyh4ck, On 2021/3/23 13:48, butt3rflyh4ck wrote: Hi, I have tested the patch on 5.12.0-rc4+, it seems to fix the problem. Thanks for helping to test this patch. Thanks, Regards, butt3rflyh4ck. On Mon, Mar 22, 2021 at 7:47 PM Chao Yu wrote: butt3rflyh4ck reported a bug found by syzkaller fuzzer with custom modifications in 5.12.0-rc3+ [1]: dump_stack+0xfa/0x151 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x82/0x32c mm/kasan/report.c:232 __kasan_report mm/kasan/report.c:399 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416 f2fs_test_bit fs/f2fs/f2fs.h:2572 [inline] current_nat_addr fs/f2fs/node.h:213 [inline] get_next_nat_page fs/f2fs/node.c:123 [inline] __flush_nat_entry_set fs/f2fs/node.c:2888 [inline] f2fs_flush_nat_entries+0x258e/0x2960 fs/f2fs/node.c:2991 f2fs_write_checkpoint+0x1372/0x6a70 fs/f2fs/checkpoint.c:1640 f2fs_issue_checkpoint+0x149/0x410 fs/f2fs/checkpoint.c:1807 f2fs_sync_fs+0x20f/0x420 fs/f2fs/super.c:1454 __sync_filesystem fs/sync.c:39 [inline] sync_filesystem fs/sync.c:67 [inline] sync_filesystem+0x1b5/0x260 fs/sync.c:48 generic_shutdown_super+0x70/0x370 fs/super.c:448 kill_block_super+0x97/0xf0 fs/super.c:1394 The root cause is, if nat entry in checkpoint journal area is corrupted, e.g. nid of journalled nat entry exceeds max nid value, during checkpoint, once it tries to flush nat journal to NAT area, get_next_nat_page() may access out-of-bounds memory on nat_bitmap due to it uses wrong nid value as bitmap offset. 
[1] https://lore.kernel.org/lkml/cafco6xomwdr8pobek6en6-fs58kg9dorfadgjj-fnf-1x43...@mail.gmail.com/T/#u Reported-by: butt3rflyh4ck Signed-off-by: Chao Yu --- fs/f2fs/node.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index caf43970510e..8311b2367c7c 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -2790,6 +2790,9 @@ static void remove_nats_in_journal(struct f2fs_sb_info *sbi) struct f2fs_nat_entry raw_ne; nid_t nid = le32_to_cpu(nid_in_journal(journal, i)); + if (f2fs_check_nid_range(sbi, nid)) + continue; + raw_ne = nat_in_journal(journal, i); ne = __lookup_nat_cache(nm_i, nid); -- 2.29.2 .
[PATCH] Revert "f2fs: give a warning only for readonly partition"
This reverts commit 938a184265d75ea474f1c6fe1da96a5196163789, because that commit fails the generic/050 testcase, which expects a failure when mounting a recoverable readonly partition. Fixes: 938a184265d7 ("f2fs: give a warning only for readonly partition") Signed-off-by: Chao Yu --- fs/f2fs/super.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index b48281642e98..2b78ee11f093 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -3952,10 +3952,12 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) * previous checkpoint was not done by clean system shutdown. */ if (f2fs_hw_is_readonly(sbi)) { - if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) + if (!is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { + err = -EROFS; f2fs_err(sbi, "Need to recover fsync data, but write access unavailable"); - else - f2fs_info(sbi, "write access unavailable, skipping recovery"); + goto free_meta; + } + f2fs_info(sbi, "write access unavailable, skipping recovery"); goto reset_checkpoint; } -- 2.29.2
[PATCH] f2fs: fix to avoid out-of-bounds memory access
butt3rflyh4ck reported a bug found by syzkaller fuzzer with custom modifications in 5.12.0-rc3+ [1]: dump_stack+0xfa/0x151 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x82/0x32c mm/kasan/report.c:232 __kasan_report mm/kasan/report.c:399 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416 f2fs_test_bit fs/f2fs/f2fs.h:2572 [inline] current_nat_addr fs/f2fs/node.h:213 [inline] get_next_nat_page fs/f2fs/node.c:123 [inline] __flush_nat_entry_set fs/f2fs/node.c:2888 [inline] f2fs_flush_nat_entries+0x258e/0x2960 fs/f2fs/node.c:2991 f2fs_write_checkpoint+0x1372/0x6a70 fs/f2fs/checkpoint.c:1640 f2fs_issue_checkpoint+0x149/0x410 fs/f2fs/checkpoint.c:1807 f2fs_sync_fs+0x20f/0x420 fs/f2fs/super.c:1454 __sync_filesystem fs/sync.c:39 [inline] sync_filesystem fs/sync.c:67 [inline] sync_filesystem+0x1b5/0x260 fs/sync.c:48 generic_shutdown_super+0x70/0x370 fs/super.c:448 kill_block_super+0x97/0xf0 fs/super.c:1394 The root cause is, if nat entry in checkpoint journal area is corrupted, e.g. nid of journalled nat entry exceeds max nid value, during checkpoint, once it tries to flush nat journal to NAT area, get_next_nat_page() may access out-of-bounds memory on nat_bitmap due to it uses wrong nid value as bitmap offset. [1] https://lore.kernel.org/lkml/cafco6xomwdr8pobek6en6-fs58kg9dorfadgjj-fnf-1x43...@mail.gmail.com/T/#u Reported-by: butt3rflyh4ck Signed-off-by: Chao Yu --- fs/f2fs/node.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index caf43970510e..8311b2367c7c 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -2790,6 +2790,9 @@ static void remove_nats_in_journal(struct f2fs_sb_info *sbi) struct f2fs_nat_entry raw_ne; nid_t nid = le32_to_cpu(nid_in_journal(journal, i)); + if (f2fs_check_nid_range(sbi, nid)) + continue; + raw_ne = nat_in_journal(journal, i); ne = __lookup_nat_cache(nm_i, nid); -- 2.29.2
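The shape of the fix — range-check an on-disk index before using it as a bitmap offset — can be modelled in userspace. Below, a hypothetical journal of nids is flushed into a bitmap, skipping any nid that fails the bound check, mirroring the added f2fs_check_nid_range() call:

```c
#include <assert.h>

#define MAX_NID   1024          /* valid nids are [0, MAX_NID) */
#define MAP_BYTES (MAX_NID / 8)

static unsigned char nat_bitmap[MAP_BYTES];

/* Model of the fix: skip any journalled nid that fails the range check
 * instead of using it as a raw bitmap offset (the out-of-bounds write
 * KASAN caught). Returns how many entries were actually flushed. */
static int flush_journal_nids(const unsigned *nids, int nr)
{
    int flushed = 0;

    for (int i = 0; i < nr; i++) {
        unsigned nid = nids[i];

        if (nid >= MAX_NID)     /* corrupted entry: out of range */
            continue;           /* mirrors f2fs_check_nid_range() */
        nat_bitmap[nid / 8] |= 1u << (nid % 8);
        flushed++;
    }
    return flushed;
}
```

Without the bound check, a fuzzed nid of 2000 would index 126 bytes past the end of the 128-byte bitmap here — the same class of access the syzkaller report shows against nat_bitmap.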
[PATCH] f2fs: fix to check ckpt blocks of section in get_victim_by_default()
In image which enables large section, if we disable checkpoint, during background GC, it needs to check whether there is checkpointed data in selected victim section, rather than segment, fix this. Fixes: 4354994f097d ("f2fs: checkpoint disabling") Signed-off-by: Chao Yu --- fs/f2fs/gc.c | 4 ++-- fs/f2fs/segment.c | 18 -- fs/f2fs/segment.h | 15 ++- 3 files changed, 20 insertions(+), 17 deletions(-) diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 318840c69b74..a73db53deb05 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -726,8 +726,8 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, goto next; /* Don't touch checkpointed data */ if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) && - get_ckpt_valid_blocks(sbi, segno) && - p.alloc_mode == LFS)) + get_ckpt_valid_blocks(sbi, segno, true) && + p.alloc_mode == LFS)) goto next; if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap)) goto next; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 81fe335b860e..788aa0f2250e 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -865,7 +865,7 @@ static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno) mutex_lock(&dirty_i->seglist_lock); valid_blocks = get_valid_blocks(sbi, segno, false); - ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno); + ckpt_valid_blocks = get_ckpt_valid_blocks(sbi, segno, false); if (valid_blocks == 0 && (!is_sbi_flag_set(sbi, SBI_CP_DISABLED) || ckpt_valid_blocks == usable_blocks)) { @@ -950,7 +950,7 @@ static unsigned int get_free_segment(struct f2fs_sb_info *sbi) for_each_set_bit(segno, dirty_i->dirty_segmap[DIRTY], MAIN_SEGS(sbi)) { if (get_valid_blocks(sbi, segno, false)) continue; - if (get_ckpt_valid_blocks(sbi, segno)) + if (get_ckpt_valid_blocks(sbi, segno, false)) continue; mutex_unlock(&dirty_i->seglist_lock); return segno; @@ -2933,18 +2933,8 @@ static void __allocate_new_segment(struct f2fs_sb_info *sbi, int type, get_valid_blocks(sbi, curseg->segno, new_sec)) goto alloc; - if (new_sec) 
{ - unsigned int segno = START_SEGNO(curseg->segno); - int i; - - for (i = 0; i < sbi->segs_per_sec; i++, segno++) { - if (get_ckpt_valid_blocks(sbi, segno)) - goto alloc; - } - } else { - if (!get_ckpt_valid_blocks(sbi, curseg->segno)) - return; - } + if (!get_ckpt_valid_blocks(sbi, curseg->segno, new_sec)) + return; alloc: old_segno = curseg->segno; diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index 144980b62f9e..0b4b3796faaa 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -359,8 +359,21 @@ static inline unsigned int get_valid_blocks(struct f2fs_sb_info *sbi, } static inline unsigned int get_ckpt_valid_blocks(struct f2fs_sb_info *sbi, - unsigned int segno) + unsigned int segno, bool use_section) { + if (use_section && __is_large_section(sbi)) { + unsigned int start_segno = START_SEGNO(segno); + unsigned int blocks = 0; + int i; + + for (i = 0; i < sbi->segs_per_sec; i++, start_segno++) { + struct seg_entry *se = get_seg_entry(sbi, start_segno); + + blocks += se->ckpt_valid_blocks; + } + return blocks; + } + return get_seg_entry(sbi, segno)->ckpt_valid_blocks; } -- 2.29.2
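The new use_section flag simply sums the per-segment ckpt_valid_blocks over the section containing segno, since with the large-section feature GC frees whole sections, not individual segments. A userspace model of that accumulation (4 segments per section and a tiny hard-coded per-segment table, both illustrative only):

```c
#include <assert.h>
#include <stdbool.h>

#define SEGS_PER_SEC 4

/* Illustrative ckpt valid-block count per segment, indexed by segno:
 * section 0 holds segments 0..3, section 1 holds segments 4..7. */
static const unsigned ckpt_valid_blocks[] = { 0, 0, 7, 0, 0, 0, 0, 0 };

/* Model of get_ckpt_valid_blocks(sbi, segno, use_section): with
 * use_section set, count over every segment of the section containing
 * segno; otherwise look at the single segment only. */
static unsigned get_ckpt_valid(unsigned segno, bool use_section)
{
    if (use_section) {
        unsigned start = (segno / SEGS_PER_SEC) * SEGS_PER_SEC;
        unsigned blocks = 0;

        for (unsigned i = 0; i < SEGS_PER_SEC; i++)
            blocks += ckpt_valid_blocks[start + i];
        return blocks;
    }
    return ckpt_valid_blocks[segno];
}
```

This captures the bug being fixed: segment 0 alone reports zero ckpt blocks, but its section still holds checkpointed data in segment 2, so the section must not be treated as reclaimable.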
Re: [PATCH RFC] f2fs: fix to avoid selecting full segment w/ {AT,}SSR allocator
On 2021/3/19 1:17, Jaegeuk Kim wrote: On 02/20, Chao Yu wrote: In cp disabling mode, there could be a condition - target segment has 128 ckpt valid blocks - GC migrates 128 valid blocks to other segment (segment is still in dirty list) - GC migrates 384 blocks to target segment (segment has 128 cp_vblocks and 384 vblocks) - If GC selects target segment via {AT,}SSR allocator, however there is no free space in targe segment. Fixes: 4354994f097d ("f2fs: checkpoint disabling") Fixes: 093749e296e2 ("f2fs: support age threshold based garbage collection") Signed-off-by: Chao Yu --- fs/f2fs/f2fs.h| 1 + fs/f2fs/gc.c | 17 + fs/f2fs/segment.c | 20 3 files changed, 34 insertions(+), 4 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index ed7807103c8e..9c753eff0814 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3376,6 +3376,7 @@ block_t f2fs_get_unusable_blocks(struct f2fs_sb_info *sbi); int f2fs_disable_cp_again(struct f2fs_sb_info *sbi, block_t unusable); void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi); int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra); +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno); void f2fs_init_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_save_inmem_curseg(struct f2fs_sb_info *sbi); void f2fs_restore_inmem_curseg(struct f2fs_sb_info *sbi); diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c index 86ba8ed0b8a7..a1d8062cdace 100644 --- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -392,10 +392,6 @@ static void add_victim_entry(struct f2fs_sb_info *sbi, if (p->gc_mode == GC_AT && get_valid_blocks(sbi, segno, true) == 0) return; - - if (p->alloc_mode == AT_SSR && - get_seg_entry(sbi, segno)->ckpt_valid_blocks == 0) - return; } for (i = 0; i < sbi->segs_per_sec; i++) @@ -736,6 +732,19 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi, if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap)) goto next; + if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED))) { + /* +* to avoid selecting 
candidate which has below valid +* block distribution: +* partial blocks are valid and all left ones are valid +* in previous checkpoint. +*/ + if (p.alloc_mode == SSR || p.alloc_mode == AT_SSR) { + if (!segment_has_free_slot(sbi, segno)) + goto next; Do we need to change this to check free_slot instead of get_ckpt_valid_blocks()? Jaegeuk, LFS was assigned only for GC case, in this case we are trying to select source section, rather than target segment for SSR/AT_SSR case, so we don't need to check free_slot. - f2fs_gc - __get_victim - get_victim(sbi, victim, gc_type, NO_CHECK_TYPE, LFS, 0); 732 if (unlikely(is_sbi_flag_set(sbi, SBI_CP_DISABLED) && 733 get_ckpt_valid_blocks(sbi, segno) && 734 p.alloc_mode == LFS)) BTW, in LFS mode, GC wants to find source section rather than segment, so we should change to check valid ckpt blocks in every segment of targe section here? Thanks, + } + } + if (is_atgc) { add_victim_entry(sbi, &p, segno); goto next; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 2d5a82c4ca15..deaf57e13125 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2650,6 +2650,26 @@ static void __refresh_next_blkoff(struct f2fs_sb_info *sbi, seg->next_blkoff++; } +bool segment_has_free_slot(struct f2fs_sb_info *sbi, int segno) +{ + struct sit_info *sit = SIT_I(sbi); + struct seg_entry *se = get_seg_entry(sbi, segno); + int entries = SIT_VBLOCK_MAP_SIZE / sizeof(unsigned long); + unsigned long *target_map = SIT_I(sbi)->tmp_map; + unsigned long *ckpt_map = (unsigned long *)se->ckpt_valid_map; + unsigned long *cur_map = (unsigned long *)se->cur_valid_map; + int i, pos; + + down_write(&sit->sentry_lock); + for (i = 0; i < entries; i++) + target_map[i] = ckpt_map[i] | cur_map[i]; + + pos = __find_rev_next_zero_bit(target_map, sbi->blocks_per_seg, 0); + up_write(&sit->sentry_lock); + + return pos < sbi->blocks_per_seg; +} + /* * This function always allocates a used segment(from dirty seglist) by SSR * manner, so it should recover the existing 
segment information of valid blocks -- 2.29.2 .
Re: [PATCH v2] erofs: fix bio->bi_max_vecs behavior change
On 2021/3/6 12:04, Gao Xiang wrote: From: Gao Xiang Martin reported an issue that directory read could be hung on the latest -rc kernel with some certain image. The root cause is that commit baa2c7c97153 ("block: set .bi_max_vecs as actual allocated vector number") changes .bi_max_vecs behavior. bio->bi_max_vecs is set as actual allocated vector number rather than the requested number now. Let's avoid using .bi_max_vecs completely instead. Reported-by: Martin DEVERA Signed-off-by: Gao Xiang --- change since v1: - since bio->bi_max_vecs doesn't record extent blocks anymore, introduce a remaining extent block to avoid extent excess. fs/erofs/data.c | 28 +++- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/fs/erofs/data.c b/fs/erofs/data.c index f88851c5c250..1249e74b3bf0 100644 --- a/fs/erofs/data.c +++ b/fs/erofs/data.c @@ -129,6 +129,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, struct page *page, erofs_off_t *last_block, unsigned int nblocks, + unsigned int *eblks, bool ra) { struct inode *const inode = mapping->host; @@ -145,8 +146,7 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, /* note that for readpage case, bio also equals to NULL */ if (bio && - /* not continuous */ - *last_block + 1 != current_block) { + (*last_block + 1 != current_block || !*eblks)) { Xiang, I found the below function while checking bi_max_vecs usage in f2fs: /** * bio_full - check if the bio is full * @bio:bio to check * @len:length of one segment to be added * * Return true if @bio is full and one segment with @len bytes can't be * added to the bio, otherwise return false */ static inline bool bio_full(struct bio *bio, unsigned len) { if (bio->bi_vcnt >= bio->bi_max_vecs) return true; if (bio->bi_iter.bi_size > UINT_MAX - len) return true; return false; } Could you please check whether it would be better to use bio_full() rather than the left-space-in-bio counter maintained by erofs itself?
something like: if (bio && (bio_full(bio, PAGE_SIZE) || /* not continuous */ (*last_block + 1 != current_block))) I'm thinking we need to decouple bio implementation details as much as possible, to avoid regressions whenever the bio used/max size definition is updated, though I've no idea how to fix the f2fs case. Let me know if you have other concerns. Thanks, submit_bio_retry: submit_bio(bio); bio = NULL; @@ -216,7 +216,8 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, if (nblocks > DIV_ROUND_UP(map.m_plen, PAGE_SIZE)) nblocks = DIV_ROUND_UP(map.m_plen, PAGE_SIZE); - bio = bio_alloc(GFP_NOIO, bio_max_segs(nblocks)); + *eblks = bio_max_segs(nblocks); + bio = bio_alloc(GFP_NOIO, *eblks); bio->bi_end_io = erofs_readendio; bio_set_dev(bio, sb->s_bdev); @@ -229,16 +230,8 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, /* out of the extent or bio is full */ if (err < PAGE_SIZE) goto submit_bio_retry; - + --*eblks; *last_block = current_block; - - /* shift in advance in case of it followed by too many gaps */ - if (bio->bi_iter.bi_size >= bio->bi_max_vecs * PAGE_SIZE) { - /* err should reassign to 0 after submitting */ - err = 0; - goto submit_bio_out; - } - return bio; err_out: @@ -252,7 +245,6 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, /* if updated manually, continuous pages has a gap */ if (bio) -submit_bio_out: submit_bio(bio); return err ? 
ERR_PTR(err) : NULL; } @@ -264,23 +256,26 @@ static inline struct bio *erofs_read_raw_page(struct bio *bio, static int erofs_raw_access_readpage(struct file *file, struct page *page) { erofs_off_t last_block; + unsigned int eblks; struct bio *bio; trace_erofs_readpage(page, true); bio = erofs_read_raw_page(NULL, page->mapping, - page, &last_block, 1, false); + page, &last_block, 1, &eblks, false); if (IS_ERR(bio)) return PTR_ERR(bio); - DBG_BUGON(bio); /* since we have only one bio -- must be NULL */ + if (bio) + submit_bio(bio); return 0; } static void erofs_raw_access_readahead(struct readahead_control *rac) { erofs_off_t last_block; + unsigned int eblks; struct bio *bio = NULL; struct page *page;
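The thread above weighs two ways to know when a bio has no room left: the erofs patch keeps an external "extent blocks remaining" counter (`*eblks`), while Chao suggests asking the bio itself via `bio_full()`. Here is a minimal userspace sketch (assumed names, not the kernel API) showing the two bookkeeping schemes staying in agreement:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096u

/* Toy stand-in for struct bio: only the fields the comparison needs. */
struct toy_bio {
	unsigned int vcnt;	/* vectors currently used */
	unsigned int max_vecs;	/* vectors actually allocated */
};

/* Modeled after the block layer's bio_full(): full when no vector
 * slot remains (the size-overflow check is omitted in this model). */
static bool toy_bio_full(const struct toy_bio *bio, unsigned int len)
{
	(void)len;
	return bio->vcnt >= bio->max_vecs;
}

/* erofs-style scheme: the caller also decrements its own counter.
 * Returns false when the bio had no room for another page. */
static bool toy_bio_add_page(struct toy_bio *bio, unsigned int *eblks)
{
	if (toy_bio_full(bio, TOY_PAGE_SIZE))
		return false;
	bio->vcnt++;
	(*eblks)--;	/* external bookkeeping stays in sync */
	return true;
}
```

With both checks in place, `!*eblks` and `toy_bio_full()` flip at the same point, which is the equivalence the thread is discussing; the erofs-side counter just avoids depending on `bi_max_vecs` internals.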
[PATCH 1/2] f2fs: don't start checkpoint thread in readonly mountpoint
In a readonly mountpoint, there should be no write IOs, including checkpoint IO, so there is no need to create the kernel checkpoint thread. Signed-off-by: Chao Yu --- fs/f2fs/super.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index ce88db0170fa..6716af216dca 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -2091,8 +2091,10 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) } } - if (!test_opt(sbi, DISABLE_CHECKPOINT) && - test_opt(sbi, MERGE_CHECKPOINT)) { + if ((*flags & SB_RDONLY) || test_opt(sbi, DISABLE_CHECKPOINT) || + !test_opt(sbi, MERGE_CHECKPOINT)) { + f2fs_stop_ckpt_thread(sbi); + } else { err = f2fs_start_ckpt_thread(sbi); if (err) { f2fs_err(sbi, @@ -2100,8 +2102,6 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) err); goto restore_gc; } - } else { - f2fs_stop_ckpt_thread(sbi); } /* @@ -3857,7 +3857,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) /* setup checkpoint request control and start checkpoint issue thread */ f2fs_init_ckpt_req_control(sbi); - if (!test_opt(sbi, DISABLE_CHECKPOINT) && + if (!f2fs_readonly(sb) && !test_opt(sbi, DISABLE_CHECKPOINT) && test_opt(sbi, MERGE_CHECKPOINT)) { err = f2fs_start_ckpt_thread(sbi); if (err) { -- 2.29.2
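The patch's new remount condition boils down to a single predicate: run the checkpoint thread only on a writable mount with checkpointing enabled and checkpoint merging requested. A hedged userspace distillation (the function name is illustrative, not an f2fs helper):

```c
#include <assert.h>
#include <stdbool.h>

/* Distills the condition the patch installs in f2fs_remount() /
 * f2fs_fill_super(): any of readonly, checkpoint-disabled, or
 * no-merge means the checkpoint thread should not run. */
static bool ckpt_thread_wanted(bool rdonly, bool disable_checkpoint,
			       bool merge_checkpoint)
{
	return !rdonly && !disable_checkpoint && merge_checkpoint;
}
```

Writing the condition in its negated form, as the patch does, lets the stop path (`f2fs_stop_ckpt_thread()`) and start path share one if/else instead of duplicating the stop call.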
[PATCH 2/2] f2fs: fix error path of f2fs_remount()
In the error path of f2fs_remount(), we miss restarting/stopping the kernel threads and enabling/disabling checkpoint, so the mount option status may become inconsistent with the real state of the filesystem. Let's reorder the remount flow a bit as below and do recovery correctly in the error path: 1) handle gc thread 2) handle ckpt thread 3) handle flush thread 4) handle checkpoint disabling Signed-off-by: Chao Yu --- fs/f2fs/super.c | 47 ++- 1 file changed, 34 insertions(+), 13 deletions(-) diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 6716af216dca..fa60f08c30bb 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1942,8 +1942,9 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) struct f2fs_mount_info org_mount_opt; unsigned long old_sb_flags; int err; - bool need_restart_gc = false; - bool need_stop_gc = false; + bool need_restart_gc = false, need_stop_gc = false; + bool need_restart_ckpt = false, need_stop_ckpt = false; + bool need_restart_flush = false, need_stop_flush = false; bool no_extent_cache = !test_opt(sbi, EXTENT_CACHE); bool disable_checkpoint = test_opt(sbi, DISABLE_CHECKPOINT); bool no_io_align = !F2FS_IO_ALIGNED(sbi); @@ -2081,19 +2082,10 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) clear_sbi_flag(sbi, SBI_IS_CLOSE); } - if (checkpoint_changed) { - if (test_opt(sbi, DISABLE_CHECKPOINT)) { - err = f2fs_disable_checkpoint(sbi); - if (err) - goto restore_gc; - } else { - f2fs_enable_checkpoint(sbi); - } - } - if ((*flags & SB_RDONLY) || test_opt(sbi, DISABLE_CHECKPOINT) || !test_opt(sbi, MERGE_CHECKPOINT)) { f2fs_stop_ckpt_thread(sbi); + need_restart_ckpt = true; } else { err = f2fs_start_ckpt_thread(sbi); if (err) { @@ -2102,6 +2094,7 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) err); goto restore_gc; } + need_stop_ckpt = true; } /* @@ -2111,11 +2104,24 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) if ((*flags & SB_RDONLY) || !test_opt(sbi, FLUSH_MERGE)) { 
clear_opt(sbi, FLUSH_MERGE); f2fs_destroy_flush_cmd_control(sbi, false); + need_restart_flush = true; } else { err = f2fs_create_flush_cmd_control(sbi); if (err) - goto restore_gc; + goto restore_ckpt; + need_stop_flush = true; } + + if (checkpoint_changed) { + if (test_opt(sbi, DISABLE_CHECKPOINT)) { + err = f2fs_disable_checkpoint(sbi); + if (err) + goto restore_flush; + } else { + f2fs_enable_checkpoint(sbi); + } + } + skip: #ifdef CONFIG_QUOTA /* Release old quota file names */ @@ -2130,6 +2136,21 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data) adjust_unusable_cap_perc(sbi); *flags = (*flags & ~SB_LAZYTIME) | (sb->s_flags & SB_LAZYTIME); return 0; +restore_flush: + if (need_restart_flush) { + if (f2fs_create_flush_cmd_control(sbi)) + f2fs_warn(sbi, "background flush thread has stopped"); + } else if (need_stop_flush) { + clear_opt(sbi, FLUSH_MERGE); + f2fs_destroy_flush_cmd_control(sbi, false); + } +restore_ckpt: + if (need_restart_ckpt) { + if (f2fs_start_ckpt_thread(sbi)) + f2fs_warn(sbi, "background ckpt thread has stopped"); + } else if (need_stop_ckpt) { + f2fs_stop_ckpt_thread(sbi); + } restore_gc: if (need_restart_gc) { if (f2fs_start_gc_thread(sbi)) -- 2.29.2
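The reordered remount flow above follows the kernel's standard goto-based unwinding pattern: each successfully completed stage records which rollback it needs, and a failure jumps to the label that undoes only the already-completed stages, in reverse order. A minimal userspace sketch of that pattern (the stage names are illustrative, not the real f2fs helpers):

```c
#include <assert.h>
#include <stdbool.h>

/* Flags standing in for need_restart_gc / need_stop_ckpt etc. */
static int started_gc, started_ckpt, started_flush;

static int start_stage(int *flag, bool fail)
{
	if (fail)
		return -1;
	*flag = 1;
	return 0;
}

/* Each goto target undoes one stage, then falls through to the
 * earlier targets, so only completed stages are rolled back. */
static int toy_remount(bool fail_gc, bool fail_ckpt, bool fail_flush)
{
	int err;

	err = start_stage(&started_gc, fail_gc);
	if (err)
		goto out;
	err = start_stage(&started_ckpt, fail_ckpt);
	if (err)
		goto restore_gc;
	err = start_stage(&started_flush, fail_flush);
	if (err)
		goto restore_ckpt;
	return 0;

restore_ckpt:
	started_ckpt = 0;	/* undo stage 2 */
restore_gc:
	started_gc = 0;		/* undo stage 1 */
out:
	return err;
}
```

The key property, which the patch restores for the ckpt and flush threads, is that after any failure the system is back in its pre-remount state.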
Re: [PATCH 2/2] erofs: use sync decompression for atomic contexts only
On 2021/3/17 11:54, Huang Jianan via Linux-erofs wrote: Sync decompression was introduced to get rid of additional kworker scheduling overhead. But there is no such overhead in non-atomic contexts. Therefore, it should be better to turn off sync decompression to avoid the current thread waiting in z_erofs_runqueue. Signed-off-by: Huang Jianan Signed-off-by: Guo Weichao Reviewed-by: Gao Xiang Reviewed-by: Chao Yu Thanks,
Re: [PATCH 1/2] erofs: use workqueue decompression for atomic contexts only
On 2021/3/17 11:54, Huang Jianan via Linux-erofs wrote: z_erofs_decompressqueue_endio may not be executed in the atomic context, for example, when dm-verity is turned on. In this scenario, data can be decompressed directly to get rid of additional kworker scheduling overhead. Signed-off-by: Huang Jianan Signed-off-by: Guo Weichao Reviewed-by: Gao Xiang Reviewed-by: Chao Yu Thanks,
Re: [PATCH] zonefs: fix to update .i_wr_refcnt correctly in zonefs_open_zone()
On 2021/3/17 7:30, Damien Le Moal wrote: On 2021/03/16 21:30, Chao Yu wrote: In zonefs_open_zone(), if opened zone count is larger than .s_max_open_zones threshold, we missed to recover .i_wr_refcnt, fix this. Fixes: b5c00e975779 ("zonefs: open/close zone on file open/close") Signed-off-by: Chao Yu --- fs/zonefs/super.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index 0fe76f376dee..be6b99f7de74 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -966,8 +966,7 @@ static int zonefs_open_zone(struct inode *inode) mutex_lock(&zi->i_truncate_mutex); - zi->i_wr_refcnt++; - if (zi->i_wr_refcnt == 1) { + if (zi->i_wr_refcnt == 0) { Nit: if (!zi->i_wr_refcnt) ? I can change that when applying. More clean, thanks. :) Thanks, if (atomic_inc_return(&sbi->s_open_zones) > sbi->s_max_open_zones) { atomic_dec(&sbi->s_open_zones); @@ -978,7 +977,6 @@ static int zonefs_open_zone(struct inode *inode) if (i_size_read(inode) < zi->i_max_size) { ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_OPEN); if (ret) { - zi->i_wr_refcnt--; atomic_dec(&sbi->s_open_zones); goto unlock; } @@ -986,6 +984,8 @@ static int zonefs_open_zone(struct inode *inode) } } + zi->i_wr_refcnt++; + unlock: mutex_unlock(&zi->i_truncate_mutex); Good catch ! Will apply this and check zonefs test suite as this bug went undetected. Thanks.
[PATCH] zonefs: fix to update .i_wr_refcnt correctly in zonefs_open_zone()
In zonefs_open_zone(), if opened zone count is larger than .s_max_open_zones threshold, we missed to recover .i_wr_refcnt, fix this. Fixes: b5c00e975779 ("zonefs: open/close zone on file open/close") Signed-off-by: Chao Yu --- fs/zonefs/super.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index 0fe76f376dee..be6b99f7de74 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -966,8 +966,7 @@ static int zonefs_open_zone(struct inode *inode) mutex_lock(&zi->i_truncate_mutex); - zi->i_wr_refcnt++; - if (zi->i_wr_refcnt == 1) { + if (zi->i_wr_refcnt == 0) { if (atomic_inc_return(&sbi->s_open_zones) > sbi->s_max_open_zones) { atomic_dec(&sbi->s_open_zones); @@ -978,7 +977,6 @@ static int zonefs_open_zone(struct inode *inode) if (i_size_read(inode) < zi->i_max_size) { ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_OPEN); if (ret) { - zi->i_wr_refcnt--; atomic_dec(&sbi->s_open_zones); goto unlock; } @@ -986,6 +984,8 @@ static int zonefs_open_zone(struct inode *inode) } } + zi->i_wr_refcnt++; + unlock: mutex_unlock(&zi->i_truncate_mutex); -- 2.29.2
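The zonefs fix above is an instance of a general pattern: instead of incrementing a refcount first and decrementing it on every failure path, increment it only after the last failure point, so the error paths never have to remember the undo. A userspace model (illustrative names, not the zonefs API; `zone_mgmt_err` simulates the REQ_OP_ZONE_OPEN outcome):

```c
#include <assert.h>

struct toy_zone {
	int wr_refcnt;		/* models zi->i_wr_refcnt */
	int open_zones;		/* models sbi->s_open_zones */
	int max_open_zones;	/* models sbi->s_max_open_zones */
};

static int toy_open_zone(struct toy_zone *z, int zone_mgmt_err)
{
	if (z->wr_refcnt == 0) {
		if (++z->open_zones > z->max_open_zones) {
			z->open_zones--;
			return -1;	/* too many open zones */
		}
		if (zone_mgmt_err) {	/* explicit zone open failed */
			z->open_zones--;
			return zone_mgmt_err;
		}
	}
	z->wr_refcnt++;	/* only reached once all checks passed */
	return 0;
}
```

Because the increment is last, the buggy pre-fix behavior (a leaked `i_wr_refcnt` when `s_max_open_zones` is exceeded) simply cannot occur.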
Re: [PATCH v3] f2fs: allow to change discard policy based on cached discard cmds
On 2021/3/16 17:29, Sahitya Tummala wrote: With the default DPOLICY_BG, the discard thread is io-aware, which prevents the discard thread from issuing the discard commands. On low RAM setups, it is observed that these discard commands in the cache are consuming high memory. This patch aims to relax the memory pressure on the system due to f2fs pending discard cmds by changing the policy to DPOLICY_FORCE based on the configured nm_i->ram_thresh. Signed-off-by: Sahitya Tummala Reviewed-by: Chao Yu Thanks,
Re: [f2fs] ab2dbddfd0: BUG:kernel_NULL_pointer_dereference,address
Hi Sahitya, Node manager was initialized after segment manager's initialization, so f2fs_available_free_memory() called from issue_discard_thread() may access invalid nm_i pointer, could you please check and fix this case? On 2021/3/16 12:58, kernel test robot wrote: Greeting, FYI, we noticed the following commit (built with gcc-9): commit: ab2dbddfd064f2078a7099e4d65cce54f5ef5e71 ("[PATCH v2] f2fs: allow to change discard policy based on cached discard cmds") url: https://github.com/0day-ci/linux/commits/Sahitya-Tummala/f2fs-allow-to-change-discard-policy-based-on-cached-discard-cmds/20210311-170257 in testcase: ltp version: ltp-x86_64-14c1f76-1_20210315 with following parameters: disk: 1HDD fs: f2fs test: io ucode: 0x21 test-description: The LTP testsuite contains a collection of tools for testing the Linux kernel and related features. test-url: http://linux-test-project.github.io/ on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 8G memory caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace): If you fix the issue, kindly add following tag Reported-by: kernel test robot [ 38.378402] BUG: kernel NULL pointer dereference, address: 0010 [ 38.378526] #PF: supervisor read access in kernel mode [ 38.378610] #PF: error_code(0x) - not-present page [ 38.378694] PGD 0 P4D 0 [ 38.378739] Oops: [#1] SMP PTI [ 38.378799] CPU: 2 PID: 2436 Comm: f2fs_discard-8: Not tainted 5.12.0-rc2-1-gab2dbddfd064 #1 [ 38.378940] Hardware name: Hewlett-Packard p6-1451cx/2ADA, BIOS 8.15 02/05/2013 [ 38.379057] RIP: 0010:f2fs_available_free_memory (kbuild/src/consumer/fs/f2fs/node.c:96) f2fs [ 38.379237] Code: 04 00 00 48 0f af d6 48 be c3 f5 28 5c 8f c2 f5 28 48 c1 ea 02 48 89 d0 48 f7 e6 48 c1 ea 03 48 39 ca 0f 97 c0 e9 af fe ff ff <41> 8b 54 24 10 49 63 8d 94 20 00 00 48 0f af d6 48 be c3 f5 28 5c All code 0: 04 00 add$0x0,%al 2: 00 48 0fadd%cl,0xf(%rax) 5: af scas %es:(%rdi),%eax 6: d6 (bad) 7: 48 be c3 f5 28 5c 8fmovabs 
$0x28f5c28f5c28f5c3,%rsi e: c2 f5 28 11: 48 c1 ea 02 shr$0x2,%rdx 15: 48 89 d0mov%rdx,%rax 18: 48 f7 e6mul%rsi 1b: 48 c1 ea 03 shr$0x3,%rdx 1f: 48 39 cacmp%rcx,%rdx 22: 0f 97 c0seta %al 25: e9 af fe ff ff jmpq 0xfed9 2a:* 41 8b 54 24 10 mov0x10(%r12),%edx <-- trapping instruction 2f: 49 63 8d 94 20 00 00movslq 0x2094(%r13),%rcx 36: 48 0f af d6 imul %rsi,%rdx 3a: 48 rex.W 3b: be c3 f5 28 5c mov$0x5c28f5c3,%esi Code starting with the faulting instruction === 0: 41 8b 54 24 10 mov0x10(%r12),%edx 5: 49 63 8d 94 20 00 00movslq 0x2094(%r13),%rcx c: 48 0f af d6 imul %rsi,%rdx 10: 48 rex.W 11: be c3 f5 28 5c mov$0x5c28f5c3,%esi [ 38.379531] RSP: 0018:c96f3dd8 EFLAGS: 00010246 [ 38.379617] RAX: 0106 RBX: 888213317000 RCX: 001e9c8c [ 38.379731] RDX: 88810c84b430 RSI: 001e9c8c RDI: 88810c84b540 [ 38.379844] RBP: 0006 R08: 0106 R09: 88821fb2bc58 [ 38.379958] R10: 032e R11: 88821fb2a144 R12: [ 38.380071] R13: 88820b7e4000 R14: ea60 R15: [ 38.380185] FS: () GS:88821fb0() knlGS: [ 38.380315] CS: 0010 DS: ES: CR0: 80050033 [ 38.380408] CR2: 0010 CR3: 00021e00a003 CR4: 001706e0 [ 38.380522] Call Trace: [ 38.380619] ? del_timer_sync (kbuild/src/consumer/kernel/time/timer.c:1394) [ 38.380686] ? prepare_to_wait_event (kbuild/src/consumer/kernel/sched/wait.c:323 (discriminator 15)) [ 38.380762] ? __next_timer_interrupt (kbuild/src/consumer/kernel/time/timer.c:1816) [ 38.380841] issue_discard_thread (kbuild/src/consumer/fs/f2fs/segment.c:1759 (discriminator 1)) f2fs [ 38.380937] ? finish_wait (kbuild/src/consumer/kernel/sched/wait.c:403) [ 38.380997] ? __issue_discard_cmd (kbuild/src/consumer/fs/f2fs/segment.c:1722) f2fs [ 38.381094] kthread (kbuild/src/consumer/kernel/kthread.c:292) [ 38.381151] ? 
kthread_park (kbuild/src/consumer/kernel/kthread.c:245) [ 38.381213] ret_from_fork (kbuild/src/consumer/arch/x86/entry/entry_64.S:300) [ 38.381276] Modules linked in: dm_mod f2fs netconsole btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c sd_mod t10_pi sg intel_rapl_msr intel_rapl_common i915 x86_pkg_temp_thermal intel_powerclamp coretemp intel_gtt crct10dif_pclmul crc32_pclmul drm_kms_helper crc32c
Re: [PATCH v6 2/2] erofs: decompress in endio if possible
Hi Jianan, On 2021/3/16 11:15, Huang Jianan via Linux-erofs wrote: z_erofs_decompressqueue_endio may not be executed in the atomic context, for example, when dm-verity is turned on. In this scenario, data can be decompressed directly to get rid of additional kworker scheduling overhead. Also, it makes no sense to apply synchronous decompression for such case. It looks like this patch does more than one thing: - combine dm-verity and erofs workqueue - change policy of decompression in context of thread Normally, we do one thing in one patch; this way, we benefit when backporting patches and bisecting a problematic patch with minimum granularity, and it also helps reviewers focus on reviewing a single code logic by following the patch's goal. So IMO, it would be better to separate this patch into two. One more thing: could you explain a little bit more about why we need to change the policy of decompression in thread context? Is it for better performance? BTW, the code looks clean to me. 
:) Thanks, Signed-off-by: Huang Jianan Signed-off-by: Guo Weichao Reviewed-by: Gao Xiang --- fs/erofs/internal.h | 2 ++ fs/erofs/super.c| 1 + fs/erofs/zdata.c| 15 +-- 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h index 67a7ec945686..fbc4040715be 100644 --- a/fs/erofs/internal.h +++ b/fs/erofs/internal.h @@ -50,6 +50,8 @@ struct erofs_fs_context { #ifdef CONFIG_EROFS_FS_ZIP /* current strategy of how to use managed cache */ unsigned char cache_strategy; + /* strategy of sync decompression (false - auto, true - force on) */ + bool readahead_sync_decompress; /* threshold for decompression synchronously */ unsigned int max_sync_decompress_pages; diff --git a/fs/erofs/super.c b/fs/erofs/super.c index d5a6b9b888a5..0445d09b6331 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -200,6 +200,7 @@ static void erofs_default_options(struct erofs_fs_context *ctx) #ifdef CONFIG_EROFS_FS_ZIP ctx->cache_strategy = EROFS_ZIP_CACHE_READAROUND; ctx->max_sync_decompress_pages = 3; + ctx->readahead_sync_decompress = false; #endif #ifdef CONFIG_EROFS_FS_XATTR set_opt(ctx, XATTR_USER); diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c index 6cb356c4217b..25a0c4890d0a 100644 --- a/fs/erofs/zdata.c +++ b/fs/erofs/zdata.c @@ -706,9 +706,12 @@ static int z_erofs_do_read_page(struct z_erofs_decompress_frontend *fe, goto out; } +static void z_erofs_decompressqueue_work(struct work_struct *work); static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io, bool sync, int bios) { + struct erofs_sb_info *const sbi = EROFS_SB(io->sb); + /* wake up the caller thread for sync decompression */ if (sync) { unsigned long flags; @@ -720,8 +723,15 @@ static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io, return; } - if (!atomic_add_return(bios, &io->pending_bios)) + if (atomic_add_return(bios, &io->pending_bios)) + return; + /* Use workqueue and sync decompression for atomic contexts only */ + if 
(in_atomic() || irqs_disabled()) { queue_work(z_erofs_workqueue, &io->u.work); + sbi->ctx.readahead_sync_decompress = true; + return; + } + z_erofs_decompressqueue_work(&io->u.work); } static bool z_erofs_page_is_invalidated(struct page *page) @@ -1333,7 +1343,8 @@ static void z_erofs_readahead(struct readahead_control *rac) struct erofs_sb_info *const sbi = EROFS_I_SB(inode); unsigned int nr_pages = readahead_count(rac); - bool sync = (nr_pages <= sbi->ctx.max_sync_decompress_pages); + bool sync = (sbi->ctx.readahead_sync_decompress && + nr_pages <= sbi->ctx.max_sync_decompress_pages); struct z_erofs_decompress_frontend f = DECOMPRESS_FRONTEND_INIT(inode); struct page *page, *head = NULL; LIST_HEAD(pagepool);
Re: [PATCH v6 1/2] erofs: avoid memory allocation failure during rolling decompression
On 2021/3/16 11:15, Huang Jianan via Linux-erofs wrote: Currently, err would be treated as io error. Therefore, it'd be better to ensure memory allocation during rolling decompression to avoid such io error. In the long term, we might consider adding another !Uptodate case for such case. Signed-off-by: Huang Jianan Signed-off-by: Guo Weichao Reviewed-by: Gao Xiang Reviewed-by: Chao Yu Thanks,
Re: [PATCH v5 1/2] erofs: avoid memory allocation failure during rolling decompression
On 2021/3/5 17:58, Huang Jianan via Linux-erofs wrote: Currently, err would be treated as io error. Therefore, it'd be better to ensure memory allocation during rolling decompression to avoid such io error. In the long term, we might consider adding another !Uptodate case for such case. Signed-off-by: Huang Jianan Signed-off-by: Guo Weichao --- fs/erofs/decompressor.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c index 1cb1ffd10569..3d276a8aad86 100644 --- a/fs/erofs/decompressor.c +++ b/fs/erofs/decompressor.c @@ -73,7 +73,8 @@ static int z_erofs_lz4_prepare_destpages(struct z_erofs_decompress_req *rq, victim = availables[--top]; get_page(victim); } else { - victim = erofs_allocpage(pagepool, GFP_KERNEL); + victim = erofs_allocpage(pagepool, +GFP_KERNEL | __GFP_NOFAIL); if (!victim) return -ENOMEM; It's a little bit weird that we still need to check the return value of erofs_allocpage() after we pass the __GFP_NOFAIL flag. Thanks, set_page_private(victim, Z_EROFS_SHORTLIVED_PAGE);
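Chao's point rests on the `__GFP_NOFAIL` contract: an allocation with that flag loops inside the allocator until it succeeds, so it never returns NULL and the caller's NULL check is dead code. A userspace analogue of that contract (this is a model with a retry loop, not the kernel page allocator):

```c
#include <stdlib.h>

/* Analogue of a __GFP_NOFAIL allocation: retry internally until the
 * allocation succeeds, so the caller never sees NULL. A real kernel
 * nofail allocation would block and trigger reclaim between tries. */
static void *alloc_nofail(size_t size)
{
	void *p;

	do {
		p = malloc(size);
	} while (p == NULL);
	return p;
}
```

Given this contract, the `if (!victim) return -ENOMEM;` branch in the quoted diff can never fire, which is exactly the redundancy being pointed out in the review.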
Re: [PATCH] f2fs: fix the discard thread sleep timeout under high utilization
Hi Sahitya, On 2021/3/15 17:45, Sahitya Tummala wrote: Hi Chao, On Mon, Mar 15, 2021 at 04:10:22PM +0800, Chao Yu wrote: Hi Sahitya, On 2021/3/15 15:46, Sahitya Tummala wrote: Hi Chao, On Mon, Mar 15, 2021 at 02:12:44PM +0800, Chao Yu wrote: Sahitya, On 2021/3/15 12:56, Sahitya Tummala wrote: When f2fs is heavily utilized over 80%, the current discard policy sets the max sleep timeout of discard thread as 50ms (DEF_MIN_DISCARD_ISSUE_TIME). But this is set even when there are no pending discard commands to be issued. This results into unnecessary frequent and periodic wake ups of the discard thread. This patch adds check for pending discard commands in addition to heavy utilization condition to prevent those wake ups. Could this commit fix your issue? https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=43f8c47ea7d59c7b2270835f1d7c4618a1ea27b6 I don't think it will help because we are changing the max timeout of the dpolicy itself in __init_discard_policy() when util > 80% as below - dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME; And issue_discard_thread() uses this value as wait_ms, when there are no more pending discard commands to be issued. } else { wait_ms = dpolicy.max_interval; } The new patch posted above is not changing anything related to the max_interval. Hence, I think it won't help the unnecessary wakeup problem I am trying to solve for this condition - util > 80% and when there are no pending discards. Please let me know if I am missing something. Copied, thanks for the explanation. But there is another case which can cause this issue in the case of disk util < 80%. wait_ms = DEF_MIN_DISCARD_ISSUE_TIME; do { wait_event_interruptible_timeout(, wait_ms); ... if (!atomic_read(&dcc->discard_cmd_cnt)) [1] new statement continue; } while(); Then the loop will wake up on every 50ms timeout. Yes, only for a short period of time, i.e., until the first discard command is issued. 
Once a discard is issued, it will use wait_ms = dpolicy.max_interval; So, to avoid this case, shouldn't we reset wait_ms to dpolicy.max_interval at [1]? Yes, we can add that to cover the above case. Meanwhile, how about relocating the discard_cmd_cnt check after __init_discard_policy(DPOLICY_FORCE)? and only set .max_interval to DEF_MAX_DISCARD_ISSUE_TIME if there is no discard command, and keep .granularity at 1? There is no need to change .granularity, right? I think so. It will be controlled as per utilization as it is done today. Only max_interval and wait_ms need to be updated. Does this look good? diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index d7076796..958ad1e 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1772,13 +1772,16 @@ static int issue_discard_thread(void *data) wait_ms = dpolicy.max_interval; continue; } - if (!atomic_read(&dcc->discard_cmd_cnt)) - continue; - if (sbi->gc_mode == GC_URGENT_HIGH || !f2fs_available_free_memory(sbi, DISCARD_CACHE)) __init_discard_policy(sbi, &dpolicy, DPOLICY_FORCE, 1); + if (!atomic_read(&dcc->discard_cmd_cnt)) { + dpolicy.max_interval = DEF_MAX_DISCARD_ISSUE_TIME; + wait_ms = dpolicy.max_interval; + continue; + } Hmm.. how about cleaning up to configure discard policy in __init_discard_policy()? 
Something like: --- fs/f2fs/segment.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 592927ccffa7..684463a70eb9 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1118,7 +1118,9 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi, dpolicy->ordered = true; if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) { dpolicy->granularity = 1; - dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME; + if (atomic_read(&SM_I(sbi)->dcc_info->discard_cmd_cnt)) + dpolicy->max_interval = + DEF_MIN_DISCARD_ISSUE_TIME; } } else if (discard_type == DPOLICY_FORCE) { dpolicy->min_interval = DEF_MIN_DISCARD_ISSUE_TIME; @@ -1734,8 +1736,15 @@ static int issue_discard_thread(void *data) set_freezable(); do { - __init_discard_policy(sbi, &dpolicy, DPOLICY_BG, - dcc->discard_granularity); + if (sbi->gc_mode == GC_URGENT_HIGH || + !f2fs_available_free_memor
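The policy the thread converges on can be stated as one small decision: only drop to the short 50ms wakeup when utilization is high AND there are cached discard commands to issue; otherwise sleep for the long interval. A hedged sketch of that decision as a pure function (the constants mirror f2fs defaults; the function itself is illustrative, not f2fs code):

```c
#include <assert.h>

#define DEF_MIN_DISCARD_ISSUE_TIME	50	/* ms */
#define DEF_MAX_DISCARD_ISSUE_TIME	60000	/* ms */
#define DEF_DISCARD_URGENT_UTIL		80	/* percent */

/* Pick the discard thread's sleep interval: wake frequently only
 * when the disk is heavily utilized and there is actual work
 * (pending discard commands) to do. */
static int pick_wait_ms(int utilization, int pending_discard_cmds)
{
	if (utilization > DEF_DISCARD_URGENT_UTIL && pending_discard_cmds)
		return DEF_MIN_DISCARD_ISSUE_TIME;
	return DEF_MAX_DISCARD_ISSUE_TIME;
}
```

This captures both fixes under discussion: the `discard_cmd_cnt` guard added to `__init_discard_policy()` and the reset of `wait_ms` back to `max_interval` when the command list is empty.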
Re: [PATCH] f2fs: fix the discard thread sleep timeout under high utilization
Hi Sahitya, On 2021/3/15 15:46, Sahitya Tummala wrote: Hi Chao, On Mon, Mar 15, 2021 at 02:12:44PM +0800, Chao Yu wrote: Sahitya, On 2021/3/15 12:56, Sahitya Tummala wrote: When f2fs is heavily utilized over 80%, the current discard policy sets the max sleep timeout of discard thread as 50ms (DEF_MIN_DISCARD_ISSUE_TIME). But this is set even when there are no pending discard commands to be issued. This results into unnecessary frequent and periodic wake ups of the discard thread. This patch adds check for pending discard commands in addition to heavy utilization condition to prevent those wake ups. Could this commit fix your issue? https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=43f8c47ea7d59c7b2270835f1d7c4618a1ea27b6 I don't think it will help because we are changing the max timeout of the dpolicy itself in __init_discard_policy() when util > 80% as below - dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME; And issue_discard_thread() uses this value as wait_ms, when there are no more pending discard commands to be issued. } else { wait_ms = dpolicy.max_interval; } The new patch posted above is not changing anything related to the max_interval. Hence, I think it won't help the unnecessary wakeup problem I am trying to solve for this condition - util > 80% and when there are no pending discards. Please let me know if I am missing something. Copied, thanks for the explanation. But there is another case which can cause this issue in the case of disk util < 80%. wait_ms = DEF_MIN_DISCARD_ISSUE_TIME; do { wait_event_interruptible_timeout(, wait_ms); ... if (!atomic_read(&dcc->discard_cmd_cnt)) [1] new statement continue; } while(); Then the loop will wake up on every 50ms timeout. So, to avoid this case, shouldn't we reset wait_ms to dpolicy.max_interval at [1]? Meanwhile, how about relocating the discard_cmd_cnt check after __init_discard_policy(DPOLICY_FORCE)? 
and only set .max_interval to DEF_MAX_DISCARD_ISSUE_TIME if there is no discard command, and keep .granularity at 1? Thanks, Thanks, Sahitya. Thanks, Signed-off-by: Sahitya Tummala --- fs/f2fs/segment.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index dced46c..df30220 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1112,6 +1112,8 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi, struct discard_policy *dpolicy, int discard_type, unsigned int granularity) { + struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info; + /* common policy */ dpolicy->type = discard_type; dpolicy->sync = true; @@ -1129,7 +1131,8 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi, dpolicy->io_aware = true; dpolicy->sync = false; dpolicy->ordered = true; - if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) { + if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL && + atomic_read(&dcc->discard_cmd_cnt)) { dpolicy->granularity = 1; dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME; }
Re: [PATCH] f2fs: fix the discard thread sleep timeout under high utilization
Sahitya, On 2021/3/15 12:56, Sahitya Tummala wrote: When f2fs is heavily utilized over 80%, the current discard policy sets the max sleep timeout of discard thread as 50ms (DEF_MIN_DISCARD_ISSUE_TIME). But this is set even when there are no pending discard commands to be issued. This results into unnecessary frequent and periodic wake ups of the discard thread. This patch adds check for pending discard commands in addition to heavy utilization condition to prevent those wake ups. Could this commit fix your issue? https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=43f8c47ea7d59c7b2270835f1d7c4618a1ea27b6 Thanks, Signed-off-by: Sahitya Tummala --- fs/f2fs/segment.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index dced46c..df30220 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1112,6 +1112,8 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi, struct discard_policy *dpolicy, int discard_type, unsigned int granularity) { + struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info; + /* common policy */ dpolicy->type = discard_type; dpolicy->sync = true; @@ -1129,7 +1131,8 @@ static void __init_discard_policy(struct f2fs_sb_info *sbi, dpolicy->io_aware = true; dpolicy->sync = false; dpolicy->ordered = true; - if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) { + if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL && + atomic_read(&dcc->discard_cmd_cnt)) { dpolicy->granularity = 1; dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME; }
Re: [f2fs-dev] [PATCH v3] f2fs: add sysfs nodes to get runtime compression stat
On 2021/3/11 10:32, Daeho Jeong wrote: From: Daeho Jeong I've added new sysfs nodes to show runtime compression stat since mount. compr_written_block - show the block count written after compression compr_saved_block - show the saved block count with compression compr_new_inode - show the count of inode newly enabled for compression Signed-off-by: Daeho Jeong --- v2: thanks to kernel test robot , fixed compile issue related to kernel config. v3: changed sysfs nodes' names and made them runtime stat, not persistent on disk --- Documentation/ABI/testing/sysfs-fs-f2fs | 20 + fs/f2fs/compress.c | 1 + fs/f2fs/f2fs.h | 19 fs/f2fs/super.c | 7 +++ fs/f2fs/sysfs.c | 58 + 5 files changed, 105 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs index cbeac1bebe2f..f2981eb319cb 100644 --- a/Documentation/ABI/testing/sysfs-fs-f2fs +++ b/Documentation/ABI/testing/sysfs-fs-f2fs @@ -409,3 +409,23 @@ Description: Give a way to change checkpoint merge daemon's io priority. I/O priority "3". We can select the class between "rt" and "be", and set the I/O priority within valid range of it. "," delimiter is necessary in between I/O class and priority number. + +What: /sys/fs/f2fs//compr_written_block +Date: March 2021 +Contact: "Daeho Jeong" +Description: Show the block count written after compression since mount. + If you write "0" here, you can initialize compr_written_block and + compr_saved_block to "0". + +What: /sys/fs/f2fs//compr_saved_block +Date: March 2021 +Contact: "Daeho Jeong" +Description: Show the saved block count with compression since mount. + If you write "0" here, you can initialize compr_written_block and + compr_saved_block to "0". + +What: /sys/fs/f2fs//compr_new_inode +Date: March 2021 +Contact: "Daeho Jeong" +Description: Show the count of inode newly enabled for compression since mount. + If you write "0" here, you can initialize compr_new_inode to "0". 
diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c index 77fa342de38f..3c9d797dbdd6 100644 --- a/fs/f2fs/compress.c +++ b/fs/f2fs/compress.c @@ -1353,6 +1353,7 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc, if (fio.compr_blocks) f2fs_i_compr_blocks_update(inode, fio.compr_blocks - 1, false); f2fs_i_compr_blocks_update(inode, cc->nr_cpages, true); + add_compr_block_stat(inode, cc->nr_cpages); If compressed cluster was overwritten as normal cluster, compr_saved_block value won't be decreased, is it fine? set_inode_flag(cc->inode, FI_APPEND_WRITE); if (cc->cluster_idx == 0) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index e2d302ae3a46..2c989f8caf05 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -1623,6 +1623,11 @@ struct f2fs_sb_info { #ifdef CONFIG_F2FS_FS_COMPRESSION struct kmem_cache *page_array_slab; /* page array entry */ unsigned int page_array_slab_size; /* default page array slab size */ + + /* For runtime compression statistics */ + atomic64_t compr_written_block; + atomic64_t compr_saved_block; + atomic_t compr_new_inode; #endif }; @@ -3955,6 +3960,18 @@ int f2fs_init_page_array_cache(struct f2fs_sb_info *sbi); void f2fs_destroy_page_array_cache(struct f2fs_sb_info *sbi); int __init f2fs_init_compress_cache(void); void f2fs_destroy_compress_cache(void); +#define inc_compr_inode_stat(inode)\ + do {\ + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);\ + atomic_inc(&sbi->compr_new_inode); \ + } while (0) +#define add_compr_block_stat(inode, blocks)\ + do {\ + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);\ + int diff = F2FS_I(inode)->i_cluster_size - blocks; \ + atomic64_add(blocks, &sbi->compr_written_block); \ + atomic64_add(diff, &sbi->compr_saved_block); \ + } while (0) #else static inline bool f2fs_is_compressed_page(struct page *page) { return false; } static inline bool f2fs_is_compress_backend_ready(struct inode *inode) @@ -3983,6 +4000,7 @@ static inline int f2fs_init_page_array_cache(struct f2fs_sb_info *sbi) { return static 
inline void f2fs_destroy_page_array_cache(struct f2fs_sb_info *sbi) { } static inline int __init f2fs_init_compress_cache(void) { return 0; } static inline void f2fs_destroy_compress_cache(v
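The stat macros in the hunk above boil down to two atomic adds per compressed cluster: the written counter grows by the number of compressed pages, and the saved counter by the difference from the cluster size. For readers outside the kernel tree, a standalone userspace sketch of that accounting (struct and helper names are illustrative, not f2fs's):

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace sketch of add_compr_block_stat(): a cluster of
 * cluster_size pages that compressed down to `blocks` pages adds
 * `blocks` to the written counter and the remainder to the saved
 * counter.  Names are illustrative only. */
struct compr_stat {
	atomic_long written_block;	/* blocks written after compression */
	atomic_long saved_block;	/* blocks saved by compression */
};

static void add_compr_block_stat(struct compr_stat *st,
				 int cluster_size, int blocks)
{
	atomic_fetch_add(&st->written_block, blocks);
	atomic_fetch_add(&st->saved_block, cluster_size - blocks);
}
```

As Chao's review comment notes, an accounting like this only ever adds: if a compressed cluster is later overwritten as a normal cluster, nothing subtracts from the saved counter.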
Re: [PATCH v2] f2fs: allow to change discard policy based on cached discard cmds
On 2021/3/11 16:59, Sahitya Tummala wrote: With the default DPOLICY_BG, the discard thread is io-aware, which prevents it from issuing discard commands. On low RAM setups, it is observed that these discard commands in the cache are consuming high memory. This patch aims to relax the memory pressure on the system due to f2fs pending discard cmds by changing the policy to DPOLICY_FORCE based on the configured nm_i->ram_thresh. Signed-off-by: Sahitya Tummala --- v2: - by mistake the last else condition was modified, fix it now. Oh, yes, Reviewed-by: Chao Yu Thanks,
Re: [PATCH] f2fs: allow to change discard policy based on cached discard cmds
On 2021/3/11 9:47, Sahitya Tummala wrote: With the default DPOLICY_BG, the discard thread is io-aware, which prevents it from issuing discard commands. On low RAM setups, it is observed that these discard commands in the cache are consuming high memory. This patch aims to relax the memory pressure on the system due to f2fs pending discard cmds by changing the policy to DPOLICY_FORCE based on the configured nm_i->ram_thresh. Agreed. Signed-off-by: Sahitya Tummala Reviewed-by: Chao Yu Thanks,
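The switch described in the commit message amounts to a threshold check: once the memory held by cached discard commands crosses the configured limit, the io-aware background policy gives way to DPOLICY_FORCE so the backlog gets issued and freed. A hedged sketch of that decision (the helper and its arguments are illustrative; f2fs's actual check lives in the discard thread and is derived from nm_i->ram_thresh):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative policy pick, not f2fs's exact code. */
enum dpolicy_type { DPOLICY_BG, DPOLICY_FORCE };

static enum dpolicy_type select_discard_policy(unsigned long cached_cmd_bytes,
					       unsigned long mem_threshold)
{
	/* under memory pressure, stop being io-aware and force issuing */
	return cached_cmd_bytes > mem_threshold ? DPOLICY_FORCE : DPOLICY_BG;
}
```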
Re: [PATCH] configfs: Fix use-after-free issue in __configfs_open_file
Hi Joel, Christoph, Al Does any one have time to review this patch, ten days past... Thanks, On 2021/3/5 11:29, Chao Yu wrote: +Cc fsdevel Ping, Any comments one this patch? On 2021/3/1 14:10, Chao Yu wrote: From: Daiyue Zhang Commit b0841eefd969 ("configfs: provide exclusion between IO and removals") uses ->frag_dead to mark the fragment state, thus no bothering with extra refcount on config_item when opening a file. The configfs_get_config_item was removed in __configfs_open_file, but not with config_item_put. So the refcount on config_item will lost its balance, causing use-after-free issues in some occasions like this: Test: 1. Mount configfs on /config with read-only items: drwxrwx--- 289 root root0 2021-04-01 11:55 /config drwxr-xr-x 2 root root0 2021-04-01 11:54 /config/a --w--w--w- 1 root root 4096 2021-04-01 11:53 /config/a/1.txt .. 2. Then run: for file in /config do echo $file grep -R 'key' $file done 3. __configfs_open_file will be called in parallel, the first one got called will do: if (file->f_mode & FMODE_READ) { if (!(inode->i_mode & S_IRUGO)) goto out_put_module; config_item_put(buffer->item); kref_put() package_details_release() kfree() the other one will run into use-after-free issues like this: BUG: KASAN: use-after-free in __configfs_open_file+0x1bc/0x3b0 Read of size 8 at addr fff155f02480 by task grep/13096 CPU: 0 PID: 13096 Comm: grep VIP: 00 Tainted: GW 4.14.116-kasan #1 TGID: 13096 Comm: grep Call trace: dump_stack+0x118/0x160 kasan_report+0x22c/0x294 __asan_load8+0x80/0x88 __configfs_open_file+0x1bc/0x3b0 configfs_open_file+0x28/0x34 do_dentry_open+0x2cc/0x5c0 vfs_open+0x80/0xe0 path_openat+0xd8c/0x2988 do_filp_open+0x1c4/0x2fc do_sys_open+0x23c/0x404 SyS_openat+0x38/0x48 Allocated by task 2138: kasan_kmalloc+0xe0/0x1ac kmem_cache_alloc_trace+0x334/0x394 packages_make_item+0x4c/0x180 configfs_mkdir+0x358/0x740 vfs_mkdir2+0x1bc/0x2e8 SyS_mkdirat+0x154/0x23c el0_svc_naked+0x34/0x38 Freed by task 13096: kasan_slab_free+0xb8/0x194 
kfree+0x13c/0x910 package_details_release+0x524/0x56c kref_put+0xc4/0x104 config_item_put+0x24/0x34 __configfs_open_file+0x35c/0x3b0 configfs_open_file+0x28/0x34 do_dentry_open+0x2cc/0x5c0 vfs_open+0x80/0xe0 path_openat+0xd8c/0x2988 do_filp_open+0x1c4/0x2fc do_sys_open+0x23c/0x404 SyS_openat+0x38/0x48 el0_svc_naked+0x34/0x38 To fix this issue, remove the config_item_put in __configfs_open_file to balance the refcount of config_item. Fixes: b0841eefd969 ("configfs: provide exclusion between IO and removals") Cc: Al Viro Cc: Joel Becker Cc: Christoph Hellwig Signed-off-by: Daiyue Zhang Signed-off-by: Yi Chen Signed-off-by: Ge Qiu Reviewed-by: Chao Yu --- fs/configfs/file.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/configfs/file.c b/fs/configfs/file.c index 1f0270229d7b..da8351d1e455 100644 --- a/fs/configfs/file.c +++ b/fs/configfs/file.c @@ -378,7 +378,7 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type attr = to_attr(dentry); if (!attr) - goto out_put_item; + goto out_free_buffer; if (type & CONFIGFS_ITEM_BIN_ATTR) { buffer->bin_attr = to_bin_attr(dentry); @@ -391,7 +391,7 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type /* Grab the module reference for this attribute if we have one */ error = -ENODEV; if (!try_module_get(buffer->owner)) - goto out_put_item; + goto out_free_buffer; error = -EACCES; if (!buffer->item->ci_type) @@ -435,8 +435,6 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type out_put_module: module_put(buffer->owner); -out_put_item: - config_item_put(buffer->item); out_free_buffer: up_read(&frag->frag_sem); kfree(buffer); .
Re: [f2fs-dev] [PATCH v2] f2fs: add sysfs nodes to get accumulated compression info
On 2021/3/9 21:00, Daeho Jeong wrote: On Tue, Mar 9, 2021 at 6:22 PM, Chao Yu wrote: On 2021/3/5 10:24, Daeho Jeong wrote: From: Daeho Jeong Added acc_compr_inodes to show accumulated compressed inode count and acc_compr_blocks to show accumulated secured block count with I noticed that these stat numbers are recorded in the extra reserved area in the hot node curseg journal, and the journal will be persisted only for umount or fastboot checkpoints, so the numbers are not so accurate... does this satisfy your requirement? Yes, we are satisfied with just getting a rough number of them. But, it Alright, would be better if you could suggest a more accurate way. :) I think this is the cheapest way to store a rough number, otherwise it needs to change the f2fs_checkpoint structure layout or add a new inner inode to persist these stat numbers if we want a more accurate one. compression in sysfs. These can be re-initialized to "0" by writing a "0" value to either one. Why do we allow resetting the stat numbers? Actually, I want to have a way to clear any stale number of them, but I agree we don't need this. Why not cover all the code with the macro CONFIG_F2FS_FS_COMPRESSION, since these numbers are only updated when we enable compression. I wanted to keep the info even in a kernel that doesn't support per-file compression, if those had been written once. What do you think? Sure, if so it's fine to me. :) Thanks, Thanks,
Re: [f2fs-dev] [PATCH v2] f2fs: expose # of overprovision segments
On 2021/3/4 3:28, Jaegeuk Kim wrote: This is useful when checking conditions during checkpoint=disable in Android. Signed-off-by: Jaegeuk Kim Reviewed-by: Chao Yu Thanks,
Re: [f2fs-dev] [PATCH v2] f2fs: add sysfs nodes to get accumulated compression info
On 2021/3/5 10:24, Daeho Jeong wrote: From: Daeho Jeong Added acc_compr_inodes to show accumulated compressed inode count and acc_compr_blocks to show accumulated secured block count with I noticed that these stat numbers are recorded in the extra reserved area in the hot node curseg journal, and the journal will be persisted only for umount or fastboot checkpoints, so the numbers are not so accurate... does this satisfy your requirement? compression in sysfs. These can be re-initialized to "0" by writing a "0" value to either one. Why do we allow resetting the stat numbers? Why not cover all the code with the macro CONFIG_F2FS_FS_COMPRESSION, since these numbers are only updated when we enable compression. Thanks,
Re: [f2fs-dev] [PATCH] f2fs: expose # of overprovision segments
On 2021/3/9 8:07, Jaegeuk Kim wrote: On 03/05, Chao Yu wrote: On 2021/3/5 1:50, Jaegeuk Kim wrote: On 03/04, Chao Yu wrote: On 2021/3/3 2:44, Jaegeuk Kim wrote: On 03/02, Jaegeuk Kim wrote: On 03/02, Chao Yu wrote: On 2021/3/2 13:42, Jaegeuk Kim wrote: This is useful when checking conditions during checkpoint=disable in Android. This sysfs entry is readonly, how about putting this at /sys/fs/f2fs//stat/? Urg.. "stat" is a bit confusing. I'll take a look for better ones. Oh, I mean put it into the "stat" directory, not a "stat" entry, something like this: /sys/fs/f2fs//stat/ovp_segments I meant that too. Why is it like stat, since it's a geometry? Hmm.. I feel a little bit weird treating ovp_segments as the 'stat' class; one reason is that ovp_segments is readonly, matching the readonly attribute of a stat. It seems I don't fully understand what you suggest here. I don't want to add the # of ovp_segments in /stat, since it is not part of status, but put it in / to sync with the other # of free/dirty segments. If you can't read it out easily, I suggest creating symlinks to organize all the current mess. Alright. Taking a look at other entries used in Android, I feel that this one can't be in stat or whatever other location, since I worry about the consistency with the similar dirty/free segments. It seems it's not easy to clean up the existing ones anymore. Well, actually, the number of entries is still increasing continuously; the result is that it becomes slower and harder for me to find a target entry name in that directory. IMO, once a new readonly entry is added to the "" directory, there is no chance to relocate it due to interface compatibility. So I think this is the only chance to put it in the appropriate place. I know, but this will scatter that info into different places. I don't have a big concern about finding a specific entry with this though; how about making symlinks to create a dir structure for your easy access?
Or, using a script would be alternative way. Yes, there should be some alternative ways to help to access f2fs sysfs interface, but from a point view of user, I'm not sure he can figure out those ways. For those fs meta stat, why not adding a single entry to include all info you need rather than adding them one by one? e.g. You can add that in /proc as well, which requires to parse back when retrieving specific values. Copied. Thanks, /proc/fs/f2fs//super_block /proc/fs/f2fs//checkpoint /proc/fs/f2fs//nat_table /proc/fs/f2fs//sit_table ... Thanks, Thanks, Signed-off-by: Jaegeuk Kim --- fs/f2fs/sysfs.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c index e38a7f6921dd..254b6fa17406 100644 --- a/fs/f2fs/sysfs.c +++ b/fs/f2fs/sysfs.c @@ -91,6 +91,13 @@ static ssize_t free_segments_show(struct f2fs_attr *a, (unsigned long long)(free_segments(sbi))); } +static ssize_t ovp_segments_show(struct f2fs_attr *a, + struct f2fs_sb_info *sbi, char *buf) +{ + return sprintf(buf, "%llu\n", + (unsigned long long)(overprovision_segments(sbi))); +} + static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a, struct f2fs_sb_info *sbi, char *buf) { @@ -629,6 +636,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, node_io_flag, node_io_flag); F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, ckpt_thread_ioprio); F2FS_GENERAL_RO_ATTR(dirty_segments); F2FS_GENERAL_RO_ATTR(free_segments); +F2FS_GENERAL_RO_ATTR(ovp_segments); Missed to add document entry in Documentation/ABI/testing/sysfs-fs-f2fs? Yeah, thanks. Thanks, F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes); F2FS_GENERAL_RO_ATTR(features); F2FS_GENERAL_RO_ATTR(current_reserved_blocks); ___ Linux-f2fs-devel mailing list linux-f2fs-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel . . .
Re: [f2fs-dev] [PATCH v4] f2fs: compress: add compress_inode to cache compressed blocks
On 2021/3/9 8:01, Jaegeuk Kim wrote: On 03/05, Chao Yu wrote: On 2021/3/5 4:20, Jaegeuk Kim wrote: On 02/27, Jaegeuk Kim wrote: On 02/04, Chao Yu wrote: Jaegeuk, On 2021/2/2 16:00, Chao Yu wrote: - for (i = 0; i < dic->nr_cpages; i++) { + for (i = 0; i < cc->nr_cpages; i++) { struct page *page = dic->cpages[i]; por_fsstress still hang in this line? I'm stuck on testing the patches, since the latest kernel is panicking somehow. Let me update later, once I can test a bit. :( It seems this works without error. https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=4e6e1364dccba80ed44925870b97fbcf989b96c9 Ah, good news. Thanks for helping to test the patch. :) Hmm, I hit this again. Let me check w/o compress_cache back. :( Oops :(
Re: [PATCH v2] erofs: fix bio->bi_max_vecs behavior change
On 2021/3/8 10:36, Gao Xiang wrote: Hi Chao, On Mon, Mar 08, 2021 at 09:29:30AM +0800, Chao Yu wrote: On 2021/3/6 12:04, Gao Xiang wrote: From: Gao Xiang Martin reported an issue that directory read could be hung on the latest -rc kernel with some certain image. The root cause is that commit baa2c7c97153 ("block: set .bi_max_vecs as actual allocated vector number") changes .bi_max_vecs behavior. bio->bi_max_vecs is set as actual allocated vector number rather than the requested number now. Let's avoid using .bi_max_vecs completely instead. Reported-by: Martin DEVERA Signed-off-by: Gao Xiang Looks good to me, btw, it needs to Cc stable mailing list? Reviewed-by: Chao Yu Thanks for your review. <= 5.11 kernels are not impacted. For now, this only impacts 5.12-rc due to a bio behavior change (see commit baa2c7c97153). So personally I think just leave as it is is fine. Okay, so that's fine if you send pull request before 5.12 formal release. ;) Thanks, Thanks, Gao Xiang Thanks, .
Re: [PATCH v2] erofs: fix bio->bi_max_vecs behavior change
On 2021/3/6 12:04, Gao Xiang wrote: From: Gao Xiang Martin reported an issue that directory read could be hung on the latest -rc kernel with some certain image. The root cause is that commit baa2c7c97153 ("block: set .bi_max_vecs as actual allocated vector number") changes .bi_max_vecs behavior. bio->bi_max_vecs is set as actual allocated vector number rather than the requested number now. Let's avoid using .bi_max_vecs completely instead. Reported-by: Martin DEVERA Signed-off-by: Gao Xiang Looks good to me, btw, it needs to Cc stable mailing list? Reviewed-by: Chao Yu Thanks,
[PATCH] f2fs: fix to align to section for fallocate() on pinned file
Now, fallocate() on a pinned file only allocates blocks which aligns to segment rather than section, so GC may try to migrate pinned file's block, and after several times of failure, pinned file's block could be migrated to other place, however user won't be aware of such condition, and then old obsolete block address may be readed/written incorrectly. To avoid such condition, let's try to allocate pinned file's blocks with section alignment. Signed-off-by: Chao Yu --- fs/f2fs/f2fs.h| 2 +- fs/f2fs/file.c| 2 +- fs/f2fs/segment.c | 34 ++ 3 files changed, 28 insertions(+), 10 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 700ef46e4147..ff6c212bf2c5 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -3399,7 +3399,7 @@ void f2fs_get_new_segment(struct f2fs_sb_info *sbi, unsigned int *newseg, bool new_sec, int dir); void f2fs_allocate_segment_for_resize(struct f2fs_sb_info *sbi, int type, unsigned int start, unsigned int end); -void f2fs_allocate_new_segment(struct f2fs_sb_info *sbi, int type); +void f2fs_allocate_new_section(struct f2fs_sb_info *sbi, int type); void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi); int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range); bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi, diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 1863944f4073..1b70b56434e4 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1666,7 +1666,7 @@ static int expand_inode_data(struct inode *inode, loff_t offset, down_write(&sbi->pin_sem); f2fs_lock_op(sbi); - f2fs_allocate_new_segment(sbi, CURSEG_COLD_DATA_PINNED); + f2fs_allocate_new_section(sbi, CURSEG_COLD_DATA_PINNED); f2fs_unlock_op(sbi); map.m_seg_type = CURSEG_COLD_DATA_PINNED; diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index b26217910b85..ff50e79d2bb7 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2918,7 +2918,8 @@ void f2fs_allocate_segment_for_resize(struct f2fs_sb_info *sbi, int type, up_read(&SM_I(sbi)->curseg_lock); } -static 
void __allocate_new_segment(struct f2fs_sb_info *sbi, int type) +static void __allocate_new_segment(struct f2fs_sb_info *sbi, int type, + bool new_sec) { struct curseg_info *curseg = CURSEG_I(sbi, type); unsigned int old_segno; @@ -2926,10 +2927,22 @@ static void __allocate_new_segment(struct f2fs_sb_info *sbi, int type) if (!curseg->inited) goto alloc; - if (!curseg->next_blkoff && - !get_valid_blocks(sbi, curseg->segno, false) && - !get_ckpt_valid_blocks(sbi, curseg->segno)) - return; + if (curseg->next_blkoff || + get_valid_blocks(sbi, curseg->segno, new_sec)) + goto alloc; + + if (new_sec) { + unsigned int segno = START_SEGNO(curseg->segno); + int i; + + for (i = 0; i < sbi->segs_per_sec; i++, segno++) { + if (get_ckpt_valid_blocks(sbi, segno)) + goto alloc; + } + } else { + if (!get_ckpt_valid_blocks(sbi, curseg->segno)) + return; + } alloc: old_segno = curseg->segno; @@ -2937,10 +2950,15 @@ static void __allocate_new_segment(struct f2fs_sb_info *sbi, int type) locate_dirty_segment(sbi, old_segno); } -void f2fs_allocate_new_segment(struct f2fs_sb_info *sbi, int type) +static void __allocate_new_section(struct f2fs_sb_info *sbi, int type) +{ + __allocate_new_segment(sbi, type, true); +} + +void f2fs_allocate_new_section(struct f2fs_sb_info *sbi, int type) { down_write(&SIT_I(sbi)->sentry_lock); - __allocate_new_segment(sbi, type); + __allocate_new_section(sbi, type); up_write(&SIT_I(sbi)->sentry_lock); } @@ -2950,7 +2968,7 @@ void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi) down_write(&SIT_I(sbi)->sentry_lock); for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) - __allocate_new_segment(sbi, i); + __allocate_new_segment(sbi, i, false); up_write(&SIT_I(sbi)->sentry_lock); } -- 2.29.2
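The new_sec branch added above only treats the current section as reusable when no segment inside it still holds checkpointed valid blocks. Stripped of the sbi plumbing, the scan looks like this standalone sketch (SEGS_PER_SEC and the ckpt_valid array stand in for sbi->segs_per_sec and get_ckpt_valid_blocks()):

```c
#include <assert.h>
#include <stdbool.h>

#define SEGS_PER_SEC 4	/* stand-in for sbi->segs_per_sec */
#define START_SEGNO(segno) (((segno) / SEGS_PER_SEC) * SEGS_PER_SEC)

/* true if any segment in segno's section still has checkpointed
 * valid blocks, i.e. the whole section must not be handed out anew */
static bool section_has_ckpt_blocks(const int *ckpt_valid, int segno)
{
	int start = START_SEGNO(segno);
	int i;

	for (i = 0; i < SEGS_PER_SEC; i++)
		if (ckpt_valid[start + i])
			return true;
	return false;
}
```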
Re: [PATCH] configfs: Fix use-after-free issue in __configfs_open_file
+Cc fsdevel Ping, Any comments one this patch? On 2021/3/1 14:10, Chao Yu wrote: From: Daiyue Zhang Commit b0841eefd969 ("configfs: provide exclusion between IO and removals") uses ->frag_dead to mark the fragment state, thus no bothering with extra refcount on config_item when opening a file. The configfs_get_config_item was removed in __configfs_open_file, but not with config_item_put. So the refcount on config_item will lost its balance, causing use-after-free issues in some occasions like this: Test: 1. Mount configfs on /config with read-only items: drwxrwx--- 289 root root0 2021-04-01 11:55 /config drwxr-xr-x 2 root root0 2021-04-01 11:54 /config/a --w--w--w- 1 root root 4096 2021-04-01 11:53 /config/a/1.txt .. 2. Then run: for file in /config do echo $file grep -R 'key' $file done 3. __configfs_open_file will be called in parallel, the first one got called will do: if (file->f_mode & FMODE_READ) { if (!(inode->i_mode & S_IRUGO)) goto out_put_module; config_item_put(buffer->item); kref_put() package_details_release() kfree() the other one will run into use-after-free issues like this: BUG: KASAN: use-after-free in __configfs_open_file+0x1bc/0x3b0 Read of size 8 at addr fff155f02480 by task grep/13096 CPU: 0 PID: 13096 Comm: grep VIP: 00 Tainted: GW 4.14.116-kasan #1 TGID: 13096 Comm: grep Call trace: dump_stack+0x118/0x160 kasan_report+0x22c/0x294 __asan_load8+0x80/0x88 __configfs_open_file+0x1bc/0x3b0 configfs_open_file+0x28/0x34 do_dentry_open+0x2cc/0x5c0 vfs_open+0x80/0xe0 path_openat+0xd8c/0x2988 do_filp_open+0x1c4/0x2fc do_sys_open+0x23c/0x404 SyS_openat+0x38/0x48 Allocated by task 2138: kasan_kmalloc+0xe0/0x1ac kmem_cache_alloc_trace+0x334/0x394 packages_make_item+0x4c/0x180 configfs_mkdir+0x358/0x740 vfs_mkdir2+0x1bc/0x2e8 SyS_mkdirat+0x154/0x23c el0_svc_naked+0x34/0x38 Freed by task 13096: kasan_slab_free+0xb8/0x194 kfree+0x13c/0x910 package_details_release+0x524/0x56c kref_put+0xc4/0x104 config_item_put+0x24/0x34 __configfs_open_file+0x35c/0x3b0 
configfs_open_file+0x28/0x34 do_dentry_open+0x2cc/0x5c0 vfs_open+0x80/0xe0 path_openat+0xd8c/0x2988 do_filp_open+0x1c4/0x2fc do_sys_open+0x23c/0x404 SyS_openat+0x38/0x48 el0_svc_naked+0x34/0x38 To fix this issue, remove the config_item_put in __configfs_open_file to balance the refcount of config_item. Fixes: b0841eefd969 ("configfs: provide exclusion between IO and removals") Cc: Al Viro Cc: Joel Becker Cc: Christoph Hellwig Signed-off-by: Daiyue Zhang Signed-off-by: Yi Chen Signed-off-by: Ge Qiu Reviewed-by: Chao Yu --- fs/configfs/file.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/configfs/file.c b/fs/configfs/file.c index 1f0270229d7b..da8351d1e455 100644 --- a/fs/configfs/file.c +++ b/fs/configfs/file.c @@ -378,7 +378,7 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type attr = to_attr(dentry); if (!attr) - goto out_put_item; + goto out_free_buffer; if (type & CONFIGFS_ITEM_BIN_ATTR) { buffer->bin_attr = to_bin_attr(dentry); @@ -391,7 +391,7 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type /* Grab the module reference for this attribute if we have one */ error = -ENODEV; if (!try_module_get(buffer->owner)) - goto out_put_item; + goto out_free_buffer; error = -EACCES; if (!buffer->item->ci_type) @@ -435,8 +435,6 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type out_put_module: module_put(buffer->owner); -out_put_item: - config_item_put(buffer->item); out_free_buffer: up_read(&frag->frag_sem); kfree(buffer);
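The imbalance is easy to model outside configfs: once the open path stopped taking its own reference (configfs_get_config_item was removed), the config_item_put() on the error path drops a reference the function never owned, freeing the item under its real owner. A minimal sketch with plain ints instead of kref (purely illustrative):

```c
#include <assert.h>

/* Minimal refcount model of the bug: a put on an error path without
 * a matching get frees the object while another user still holds it. */
struct item {
	int refcount;
	int freed;	/* set where the real code would kfree() */
};

static void item_get(struct item *it)
{
	it->refcount++;
}

static void item_put(struct item *it)
{
	if (--it->refcount == 0)
		it->freed = 1;
}
```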
Re: [f2fs-dev] [PATCH v4] f2fs: compress: add compress_inode to cache compressed blocks
On 2021/3/5 4:20, Jaegeuk Kim wrote: On 02/27, Jaegeuk Kim wrote: On 02/04, Chao Yu wrote: Jaegeuk, On 2021/2/2 16:00, Chao Yu wrote: - for (i = 0; i < dic->nr_cpages; i++) { + for (i = 0; i < cc->nr_cpages; i++) { struct page *page = dic->cpages[i]; por_fsstress still hang in this line? I'm stuck on testing the patches, since the latest kernel is panicking somehow. Let me update later, once I can test a bit. :( It seems this works without error. https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?h=dev&id=4e6e1364dccba80ed44925870b97fbcf989b96c9 Ah, good news. Thanks for helping to test the patch. :) Thanks, Thanks, block_t blkaddr; struct bio_post_read_ctx *ctx; @@ -2201,6 +2207,14 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret, blkaddr = data_blkaddr(dn.inode, dn.node_page, dn.ofs_in_node + i + 1); + f2fs_wait_on_block_writeback(inode, blkaddr); + + if (f2fs_load_compressed_page(sbi, page, blkaddr)) { + if (atomic_dec_and_test(&dic->remaining_pages)) + f2fs_decompress_cluster(dic); + continue; + } + ___ Linux-f2fs-devel mailing list linux-f2fs-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel .
Re: [f2fs-dev] [PATCH] f2fs: expose # of overprovision segments
On 2021/3/5 1:50, Jaegeuk Kim wrote: On 03/04, Chao Yu wrote: On 2021/3/3 2:44, Jaegeuk Kim wrote: On 03/02, Jaegeuk Kim wrote: On 03/02, Chao Yu wrote: On 2021/3/2 13:42, Jaegeuk Kim wrote: This is useful when checking conditions during checkpoint=disable in Android. This sysfs entry is readonly, how about putting this at /sys/fs/f2fs//stat/? Urg.. "stat" is a bit confusing. I'll take a look for better ones. Oh, I mean put it into the "stat" directory, not a "stat" entry, something like this: /sys/fs/f2fs//stat/ovp_segments I meant that too. Why is it like stat, since it's a geometry? Hmm.. I feel a little bit weird treating ovp_segments as the 'stat' class; one reason is that ovp_segments is readonly, matching the readonly attribute of a stat. Taking a look at other entries used in Android, I feel that this one can't be in stat or whatever other location, since I worry about the consistency with the similar dirty/free segments. It seems it's not easy to clean up the existing ones anymore. Well, actually, the number of entries is still increasing continuously; the result is that it becomes slower and harder for me to find a target entry name in that directory. IMO, once a new readonly entry is added to the "" directory, there is no chance to relocate it due to interface compatibility. So I think this is the only chance to put it in the appropriate place. I know, but this will scatter that info into different places. I don't have a big concern about finding a specific entry with this though; how about making symlinks to create a dir structure for your easy access? Or, using a script would be an alternative way. Yes, there should be some alternative ways to help access the f2fs sysfs interface, but from a user's point of view, I'm not sure they can figure out those ways. For those fs meta stats, why not add a single entry to include all the info you need rather than adding them one by one? e.g.
/proc/fs/f2fs//super_block /proc/fs/f2fs//checkpoint /proc/fs/f2fs//nat_table /proc/fs/f2fs//sit_table ... Thanks, Thanks, Signed-off-by: Jaegeuk Kim --- fs/f2fs/sysfs.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c index e38a7f6921dd..254b6fa17406 100644 --- a/fs/f2fs/sysfs.c +++ b/fs/f2fs/sysfs.c @@ -91,6 +91,13 @@ static ssize_t free_segments_show(struct f2fs_attr *a, (unsigned long long)(free_segments(sbi))); } +static ssize_t ovp_segments_show(struct f2fs_attr *a, + struct f2fs_sb_info *sbi, char *buf) +{ + return sprintf(buf, "%llu\n", + (unsigned long long)(overprovision_segments(sbi))); +} + static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a, struct f2fs_sb_info *sbi, char *buf) { @@ -629,6 +636,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, node_io_flag, node_io_flag); F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, ckpt_thread_ioprio); F2FS_GENERAL_RO_ATTR(dirty_segments); F2FS_GENERAL_RO_ATTR(free_segments); +F2FS_GENERAL_RO_ATTR(ovp_segments); Missed to add document entry in Documentation/ABI/testing/sysfs-fs-f2fs? Yeah, thanks. Thanks, F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes); F2FS_GENERAL_RO_ATTR(features); F2FS_GENERAL_RO_ATTR(current_reserved_blocks); ___ Linux-f2fs-devel mailing list linux-f2fs-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel . .
Re: [f2fs-dev] [PATCH] f2fs: fix a redundant call to f2fs_balance_fs if an error occurs
On 2021/3/4 17:21, Colin King wrote: From: Colin Ian King The uninitialized variable dn.node_changed does not get set when a call to f2fs_get_node_page fails. This uninitialized value gets used in the call to f2fs_balance_fs() that may or not may not balances dirty node and dentry pages depending on the uninitialized state of the variable. Fix this by only calling f2fs_balance_fs if err is not set. Thanks to Jaegeuk Kim for suggesting an appropriate fix. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: 2a3407607028 ("f2fs: call f2fs_balance_fs only when node was changed") Signed-off-by: Colin Ian King Reviewed-by: Chao Yu Thanks,
Re: [f2fs-dev] [PATCH] f2fs: expose # of overprovision segments
On 2021/3/3 2:44, Jaegeuk Kim wrote: On 03/02, Jaegeuk Kim wrote: On 03/02, Chao Yu wrote: On 2021/3/2 13:42, Jaegeuk Kim wrote: This is useful when checking conditions during checkpoint=disable in Android. This sysfs entry is readonly, how about putting this at /sys/fs/f2fs//stat/? Urg.. "stat" is a bit confusing. I'll take a look for better ones. Oh, I mean put it into the "stat" directory, not a "stat" entry, something like this: /sys/fs/f2fs//stat/ovp_segments Taking a look at other entries used in Android, I feel that this one can't be in stat or whatever other location, since I worry about the consistency with the similar dirty/free segments. It seems it's not easy to clean up the existing ones anymore. Well, actually, the number of entries is still increasing continuously; the result is that it becomes slower and harder for me to find a target entry name in that directory. IMO, once a new readonly entry is added to the "" directory, there is no chance to relocate it due to interface compatibility. So I think this is the only chance to put it in the appropriate place.
Thanks, Signed-off-by: Jaegeuk Kim --- fs/f2fs/sysfs.c | 8 1 file changed, 8 insertions(+) diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c index e38a7f6921dd..254b6fa17406 100644 --- a/fs/f2fs/sysfs.c +++ b/fs/f2fs/sysfs.c @@ -91,6 +91,13 @@ static ssize_t free_segments_show(struct f2fs_attr *a, (unsigned long long)(free_segments(sbi))); } +static ssize_t ovp_segments_show(struct f2fs_attr *a, + struct f2fs_sb_info *sbi, char *buf) +{ + return sprintf(buf, "%llu\n", + (unsigned long long)(overprovision_segments(sbi))); +} + static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a, struct f2fs_sb_info *sbi, char *buf) { @@ -629,6 +636,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, node_io_flag, node_io_flag); F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, ckpt_thread_ioprio); F2FS_GENERAL_RO_ATTR(dirty_segments); F2FS_GENERAL_RO_ATTR(free_segments); +F2FS_GENERAL_RO_ATTR(ovp_segments); Missed to add document entry in Documentation/ABI/testing/sysfs-fs-f2fs? Yeah, thanks. Thanks, F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes); F2FS_GENERAL_RO_ATTR(features); F2FS_GENERAL_RO_ATTR(current_reserved_blocks); ___ Linux-f2fs-devel mailing list linux-f2fs-de...@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel .
Re: [PATCH] f2fs: fix compile warning
On 2021/3/2 16:44, Chao Yu wrote: node.c:2750:13: note: in expansion of macro ‘min’ nrpages = min(last_offset - i, BIO_MAX_PAGES); Oh, after I rebased to the latest dev branch, the compile warning is gone, as we changed to use bio_max_segs() rather than min(). Please ignore this patch. Thanks,
[PATCH] f2fs: remove unused file_clear_encrypt()
- file_clear_encrypt() was never be used, remove it. - In addition, relocating macros for cleanup. Signed-off-by: Chao Yu --- fs/f2fs/f2fs.h | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 4fc343aa0a08..ddd54f5cf932 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -637,21 +637,26 @@ enum { #define FADVISE_MODIFIABLE_BITS(FADVISE_COLD_BIT | FADVISE_HOT_BIT) #define file_is_cold(inode)is_file(inode, FADVISE_COLD_BIT) -#define file_wrong_pino(inode) is_file(inode, FADVISE_LOST_PINO_BIT) #define file_set_cold(inode) set_file(inode, FADVISE_COLD_BIT) -#define file_lost_pino(inode) set_file(inode, FADVISE_LOST_PINO_BIT) #define file_clear_cold(inode) clear_file(inode, FADVISE_COLD_BIT) + +#define file_wrong_pino(inode) is_file(inode, FADVISE_LOST_PINO_BIT) +#define file_lost_pino(inode) set_file(inode, FADVISE_LOST_PINO_BIT) #define file_got_pino(inode) clear_file(inode, FADVISE_LOST_PINO_BIT) + #define file_is_encrypt(inode) is_file(inode, FADVISE_ENCRYPT_BIT) #define file_set_encrypt(inode)set_file(inode, FADVISE_ENCRYPT_BIT) -#define file_clear_encrypt(inode) clear_file(inode, FADVISE_ENCRYPT_BIT) + #define file_enc_name(inode) is_file(inode, FADVISE_ENC_NAME_BIT) #define file_set_enc_name(inode) set_file(inode, FADVISE_ENC_NAME_BIT) + #define file_keep_isize(inode) is_file(inode, FADVISE_KEEP_SIZE_BIT) #define file_set_keep_isize(inode) set_file(inode, FADVISE_KEEP_SIZE_BIT) + #define file_is_hot(inode) is_file(inode, FADVISE_HOT_BIT) #define file_set_hot(inode)set_file(inode, FADVISE_HOT_BIT) #define file_clear_hot(inode) clear_file(inode, FADVISE_HOT_BIT) + #define file_is_verity(inode) is_file(inode, FADVISE_VERITY_BIT) #define file_set_verity(inode) set_file(inode, FADVISE_VERITY_BIT) -- 2.29.2
[PATCH] f2fs: fix compile warning
node.c: In function ‘f2fs_restore_node_summary’: ./include/linux/minmax.h:18:28: warning: comparison of distinct pointer types lacks a cast (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) ^ ./include/linux/minmax.h:32:4: note: in expansion of macro ‘__typecheck’ (__typecheck(x, y) && __no_side_effects(x, y)) ^ ./include/linux/minmax.h:42:24: note: in expansion of macro ‘__safe_cmp’ __builtin_choose_expr(__safe_cmp(x, y), \ ^ ./include/linux/minmax.h:51:19: note: in expansion of macro ‘__careful_cmp’ #define min(x, y) __careful_cmp(x, y, <) ^ node.c:2750:13: note: in expansion of macro ‘min’ nrpages = min(last_offset - i, BIO_MAX_PAGES); Use min_t() rather than min() to do type cast before comparing. Signed-off-by: Chao Yu --- fs/f2fs/node.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index a8a0fb890e8d..77f9ffaf9b8e 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -2747,7 +2747,8 @@ int f2fs_restore_node_summary(struct f2fs_sb_info *sbi, sum_entry = &sum->entries[0]; for (i = 0; i < last_offset; i += nrpages, addr += nrpages) { - nrpages = min(last_offset - i, BIO_MAX_PAGES); + nrpages = min_t(unsigned long, last_offset - i, + BIO_MAX_PAGES); /* readahead node pages */ f2fs_ra_meta_pages(sbi, addr, nrpages, META_POR, true); -- 2.29.2
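For context, the warning comes from min()'s type check: `last_offset - i` is an unsigned long while BIO_MAX_PAGES is an int, so the kernel macro's typeof comparison trips. min_t() casts both sides to the named type before comparing, which is exactly why the patch switches to it. A userspace approximation of that macro:

```c
#include <assert.h>

/* Userspace approximation of the kernel's min_t(): cast both
 * operands to the requested type first, sidestepping min()'s
 * distinct-type warning.  Illustrative only; the kernel version
 * also avoids double evaluation. */
#define min_t(type, x, y) ((type)(x) < (type)(y) ? (type)(x) : (type)(y))
```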
Re: [f2fs-dev] [PATCH] f2fs: expose # of overprivision segments
On 2021/3/2 13:42, Jaegeuk Kim wrote:

This is useful when checking conditions during checkpoint=disable in Android.

This sysfs entry is readonly, how about putting this at /sys/fs/f2fs//stat/?

Signed-off-by: Jaegeuk Kim
---
 fs/f2fs/sysfs.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index e38a7f6921dd..254b6fa17406 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -91,6 +91,13 @@ static ssize_t free_segments_show(struct f2fs_attr *a,
 		(unsigned long long)(free_segments(sbi)));
 }
 
+static ssize_t ovp_segments_show(struct f2fs_attr *a,
+		struct f2fs_sb_info *sbi, char *buf)
+{
+	return sprintf(buf, "%llu\n",
+		(unsigned long long)(overprovision_segments(sbi)));
+}
+
 static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a,
 		struct f2fs_sb_info *sbi, char *buf)
 {
@@ -629,6 +636,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, node_io_flag, node_io_flag);
 F2FS_RW_ATTR(CPRC_INFO, ckpt_req_control, ckpt_thread_ioprio, ckpt_thread_ioprio);
 F2FS_GENERAL_RO_ATTR(dirty_segments);
 F2FS_GENERAL_RO_ATTR(free_segments);
+F2FS_GENERAL_RO_ATTR(ovp_segments);

Missed to add document entry in Documentation/ABI/testing/sysfs-fs-f2fs?

Thanks,

 F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes);
 F2FS_GENERAL_RO_ATTR(features);
 F2FS_GENERAL_RO_ATTR(current_reserved_blocks);
Re: [f2fs-dev] [PATCH v2 3/3] f2fs: check if swapfile is section-alligned
On 2021/3/1 12:58, Huang Jianan via Linux-f2fs-devel wrote:

If the swapfile isn't created by pin and fallocate, it can't be guaranteed section-aligned, so it may be selected by f2fs gc. When gc_pin_file_threshold is reached, the address of swapfile may change, but won't be synchronized to swap_extent, so swap will write to wrong address, which will cause data corruption.

Signed-off-by: Huang Jianan
Signed-off-by: Guo Weichao

Reviewed-by: Chao Yu

Thanks,
[PATCH] configfs: Fix use-after-free issue in __configfs_open_file
From: Daiyue Zhang

Commit b0841eefd969 ("configfs: provide exclusion between IO and removals") uses ->frag_dead to mark the fragment state, thus no longer bothering with an extra refcount on config_item when opening a file. configfs_get_config_item() was removed from __configfs_open_file(), but the matching config_item_put() was not. So the refcount on config_item will lose its balance, causing use-after-free issues in some occasions like this:

Test:
1. Mount configfs on /config with read-only items:
drwxrwx--- 289 root root    0 2021-04-01 11:55 /config
drwxr-xr-x   2 root root    0 2021-04-01 11:54 /config/a
--w--w--w-   1 root root 4096 2021-04-01 11:53 /config/a/1.txt
..

2. Then run:
for file in /config
do
echo $file
grep -R 'key' $file
done

3. __configfs_open_file will be called in parallel, the first one got called will do:
if (file->f_mode & FMODE_READ) {
	if (!(inode->i_mode & S_IRUGO))
		goto out_put_module;
config_item_put(buffer->item);
	kref_put()
		package_details_release()
			kfree()

the other one will run into use-after-free issues like this:
BUG: KASAN: use-after-free in __configfs_open_file+0x1bc/0x3b0
Read of size 8 at addr fff155f02480 by task grep/13096
CPU: 0 PID: 13096 Comm: grep VIP: 00 Tainted: G W 4.14.116-kasan #1
TGID: 13096 Comm: grep
Call trace:
dump_stack+0x118/0x160
kasan_report+0x22c/0x294
__asan_load8+0x80/0x88
__configfs_open_file+0x1bc/0x3b0
configfs_open_file+0x28/0x34
do_dentry_open+0x2cc/0x5c0
vfs_open+0x80/0xe0
path_openat+0xd8c/0x2988
do_filp_open+0x1c4/0x2fc
do_sys_open+0x23c/0x404
SyS_openat+0x38/0x48

Allocated by task 2138:
kasan_kmalloc+0xe0/0x1ac
kmem_cache_alloc_trace+0x334/0x394
packages_make_item+0x4c/0x180
configfs_mkdir+0x358/0x740
vfs_mkdir2+0x1bc/0x2e8
SyS_mkdirat+0x154/0x23c
el0_svc_naked+0x34/0x38

Freed by task 13096:
kasan_slab_free+0xb8/0x194
kfree+0x13c/0x910
package_details_release+0x524/0x56c
kref_put+0xc4/0x104
config_item_put+0x24/0x34
__configfs_open_file+0x35c/0x3b0
configfs_open_file+0x28/0x34
do_dentry_open+0x2cc/0x5c0
vfs_open+0x80/0xe0
path_openat+0xd8c/0x2988
do_filp_open+0x1c4/0x2fc
do_sys_open+0x23c/0x404
SyS_openat+0x38/0x48
el0_svc_naked+0x34/0x38

To fix this issue, remove the config_item_put() in __configfs_open_file() to balance the refcount of config_item.

Fixes: b0841eefd969 ("configfs: provide exclusion between IO and removals")
Cc: Al Viro
Cc: Joel Becker
Cc: Christoph Hellwig
Signed-off-by: Daiyue Zhang
Signed-off-by: Yi Chen
Signed-off-by: Ge Qiu
Reviewed-by: Chao Yu
---
 fs/configfs/file.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/configfs/file.c b/fs/configfs/file.c
index 1f0270229d7b..da8351d1e455 100644
--- a/fs/configfs/file.c
+++ b/fs/configfs/file.c
@@ -378,7 +378,7 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type
 	attr = to_attr(dentry);
 	if (!attr)
-		goto out_put_item;
+		goto out_free_buffer;
 
 	if (type & CONFIGFS_ITEM_BIN_ATTR) {
 		buffer->bin_attr = to_bin_attr(dentry);
@@ -391,7 +391,7 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type
 	/* Grab the module reference for this attribute if we have one */
 	error = -ENODEV;
 	if (!try_module_get(buffer->owner))
-		goto out_put_item;
+		goto out_free_buffer;
 
 	error = -EACCES;
 	if (!buffer->item->ci_type)
@@ -435,8 +435,6 @@ static int __configfs_open_file(struct inode *inode, struct file *file, int type
 
 out_put_module:
 	module_put(buffer->owner);
-out_put_item:
-	config_item_put(buffer->item);
 out_free_buffer:
 	up_read(&frag->frag_sem);
 	kfree(buffer);
-- 
2.29.2
Re: [f2fs-dev] [PATCH][next] f2fs: Replace one-element array with flexible-array member
On 2021/2/28 13:00, Jaegeuk Kim wrote:

On 02/25, Chao Yu wrote:

Hello, Gustavo,

On 2021/2/25 3:03, Gustavo A. R. Silva wrote:

There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2].

I proposed doing a similar cleanup, and I've no objection on doing this:

https://lore.kernel.org/patchwork/patch/869440/

Let's ask for Jaegeuk's opinion.

Merged, thanks.

This looks like a better reason than code readability. :)

Agreed.

Reviewed-by: Chao Yu

Thanks,

Refactor the code according to the use of a flexible-array member in struct f2fs_checkpoint, instead of a one-element array. Notice that a temporary pointer to void '*tmp_ptr' was used in order to fix the following errors when using a flexible array instead of a one-element array in struct f2fs_checkpoint:

  CC [M]  fs/f2fs/dir.o
In file included from fs/f2fs/dir.c:13:
fs/f2fs/f2fs.h: In function ‘__bitmap_ptr’:
fs/f2fs/f2fs.h:2227:40: error: invalid use of flexible array member
 2227 |   return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
      |                                        ^
fs/f2fs/f2fs.h:2227:49: error: invalid use of flexible array member
 2227 |   return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
      |                                                 ^
fs/f2fs/f2fs.h:2238:40: error: invalid use of flexible array member
 2238 |   return &ckpt->sit_nat_version_bitmap + offset;
      |                                        ^
make[2]: *** [scripts/Makefile.build:287: fs/f2fs/dir.o] Error 1
make[1]: *** [scripts/Makefile.build:530: fs/f2fs] Error 2
make: *** [Makefile:1819: fs] Error 2

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://github.com/KSPP/linux/issues/79
Build-tested-by: kernel test robot
Link: https://lore.kernel.org/lkml/603647e4.deefbl4eqljuwaue%25...@intel.com/
Signed-off-by: Gustavo A. R. Silva
---
 fs/f2fs/f2fs.h          | 5 +++--
 include/linux/f2fs_fs.h | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e2d302ae3a46..3f5cb097c30f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2215,6 +2215,7 @@ static inline block_t __cp_payload(struct f2fs_sb_info *sbi)
 static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
 {
 	struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+	void *tmp_ptr = &ckpt->sit_nat_version_bitmap;
 	int offset;
 
 	if (is_set_ckpt_flags(sbi, CP_LARGE_NAT_BITMAP_FLAG)) {
@@ -2224,7 +2225,7 @@ static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
 		 * if large_nat_bitmap feature is enabled, leave checksum
 		 * protection for all nat/sit bitmaps.
 		 */
-		return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
+		return tmp_ptr + offset + sizeof(__le32);
 	}
 
 	if (__cp_payload(sbi) > 0) {
@@ -2235,7 +2236,7 @@ static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
 	} else {
 		offset = (flag == NAT_BITMAP) ?
 			le32_to_cpu(ckpt->sit_ver_bitmap_bytesize) : 0;
-		return &ckpt->sit_nat_version_bitmap + offset;
+		return tmp_ptr + offset;
 	}
 }
 
diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
index c6cc0a566ef5..5487a80617a3 100644
--- a/include/linux/f2fs_fs.h
+++ b/include/linux/f2fs_fs.h
@@ -168,7 +168,7 @@ struct f2fs_checkpoint {
 	unsigned char alloc_type[MAX_ACTIVE_LOGS];
 
 	/* SIT and NAT version bitmap */
-	unsigned char sit_nat_version_bitmap[1];
+	unsigned char sit_nat_version_bitmap[];
 } __packed;
 
 #define CP_CHKSUM_OFFSET	4092	/* default chksum offset in checkpoint */
Re: [f2fs-dev] [PATCH 3/3] f2fs: check if swapfile is section-alligned
Hi Jianan,

On 2021/2/27 20:02, Huang Jianan via Linux-f2fs-devel wrote:

If the swapfile isn't created by pin and fallocate, it cann't be

Typo: can't

guaranteed section-aligned, so it may be selected by f2fs gc. When gc_pin_file_threshold is reached, the address of swapfile may change, but won't be synchroniz to swap_extent, so swap will write to wrong

synchronized

address, which will cause data corruption.

Signed-off-by: Huang Jianan
Signed-off-by: Guo Weichao
---
 fs/f2fs/data.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 4dbc1cafc55d..3e523d6e4643 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3781,11 +3781,63 @@ int f2fs_migrate_page(struct address_space *mapping,
 #endif
 
 #ifdef CONFIG_SWAP
+static int f2fs_check_file_aligned(struct inode *inode)

f2fs_check_file_alignment() or f2fs_is_file_aligned()?

+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	block_t main_blkaddr = SM_I(sbi)->main_blkaddr;
+	block_t cur_lblock;
+	block_t last_lblock;
+	block_t pblock;
+	unsigned long len;
+	unsigned long nr_pblocks;
+	unsigned int blocks_per_sec = sbi->blocks_per_seg * sbi->segs_per_sec;

	unsigned int blocks_per_sec = BLKS_PER_SEC(sbi);

+	int ret;
+
+	cur_lblock = 0;
+	last_lblock = bytes_to_blks(inode, i_size_read(inode));
+	len = i_size_read(inode);
+
+	while (cur_lblock < last_lblock) {
+		struct f2fs_map_blocks map;
+		pgoff_t next_pgofs;
+
+		memset(&map, 0, sizeof(map));
+		map.m_lblk = cur_lblock;
+		map.m_len = bytes_to_blks(inode, len) - cur_lblock;

	map.m_len = last_lblock - cur_lblock;

+		map.m_next_pgofs = &next_pgofs;

	map.m_next_pgofs = NULL;
	map.m_next_extent = NULL;

+		map.m_seg_type = NO_CHECK_TYPE;

	map.m_may_create = false;

+
+		ret = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_FIEMAP);
+

Unneeded blank line.

+		if (ret)
+			goto err_out;
+
+		/* hole */
+		if (!(map.m_flags & F2FS_MAP_FLAGS))

	ret = -ENOENT;

+			goto err_out;
+
+		pblock = map.m_pblk;
+		nr_pblocks = map.m_len;
+
+		if ((pblock - main_blkaddr) & (blocks_per_sec - 1) ||
+			nr_pblocks & (blocks_per_sec - 1))

	ret = -EINVAL;

+			goto err_out;
+
+		cur_lblock += nr_pblocks;
+	}
+
+	return 0;
+err_out:
+	pr_err("swapon: swapfile isn't section-aligned\n");

We should show the above message only after we fail in the check condition:

	if ((pblock - main_blkaddr) & (blocks_per_sec - 1) ||
		nr_pblocks & (blocks_per_sec - 1)) {
		f2fs_err(sbi, "Swapfile does not align to section");
		goto err_out;
	}

And please use the f2fs_{err,warn,info..} macros rather than pr_{err,warn,info..}.

Could you please fix the above related issues in check_swap_activate_fast() as well.

+	return -EINVAL;

	return ret;

+}
+
 static int check_swap_activate_fast(struct swap_info_struct *sis,
 				struct file *swap_file, sector_t *span)
 {
 	struct address_space *mapping = swap_file->f_mapping;
 	struct inode *inode = mapping->host;
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
 	sector_t cur_lblock;
 	sector_t last_lblock;
 	sector_t pblock;
@@ -3793,6 +3845,7 @@ static int check_swap_activate_fast(struct swap_info_struct *sis,
 	sector_t highest_pblock = 0;
 	int nr_extents = 0;
 	unsigned long nr_pblocks;
+	unsigned int blocks_per_sec = sbi->blocks_per_seg * sbi->segs_per_sec;

Ditto,

 	u64 len;
 	int ret;
 
@@ -3827,6 +3880,13 @@ static int check_swap_activate_fast(struct swap_info_struct *sis,
 		pblock = map.m_pblk;
 		nr_pblocks = map.m_len;
 
+		if ((pblock - SM_I(sbi)->main_blkaddr) & (blocks_per_sec - 1) ||
+			nr_pblocks & (blocks_per_sec - 1)) {
+			pr_err("swapon: swapfile isn't section-aligned\n");

Ditto,

+			ret = -EINVAL;
+			goto out;
+		}
+
 		if (cur_lblock + nr_pblocks >= sis->max)
 			nr_pblocks = sis->max - cur_lblock;
 
@@ -3878,6 +3938,9 @@ static int check_swap_activate(struct swap_info_struct *sis,
 	if (PAGE_SIZE == F2FS_BLKSIZE)
 		return check_swap_activate_fast(sis, swap_file, span);
 
+	if (f2fs_check_file_aligned(inode))

	ret = f2fs_check_file_aligned();
	if (ret)
		return ret;

Thanks,

+		return -EINVAL;
+
 	blocks_per_page = bytes_to_blks(inode, PAGE_SIZE);
 
 	/*
Re: [f2fs-dev] [PATCH 2/3] f2fs: fix last_lblock check in check_swap_activate_fast
On 2021/2/27 20:02, Huang Jianan via Linux-f2fs-devel wrote:

Because page_no < sis->max guarantees that the while loop breaks out normally, the wrong check condition here doesn't cause a problem.

Signed-off-by: Huang Jianan
Signed-off-by: Guo Weichao
---
 fs/f2fs/data.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a1498a1a345c..4dbc1cafc55d 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -3804,7 +3804,7 @@ static int check_swap_activate_fast(struct swap_info_struct *sis,
 	last_lblock = bytes_to_blks(inode, i_size_read(inode));
 	len = i_size_read(inode);
 
-	while (cur_lblock <= last_lblock && cur_lblock < sis->max) {
+	while (cur_lblock + 1 <= last_lblock && cur_lblock < sis->max) {

	while (cur_lblock < last_lblock && cur_lblock < sis->max) {

It's a nitpick; if necessary, I guess Jaegeuk could help to change this while merging.

Reviewed-by: Chao Yu

Thanks,

 		struct f2fs_map_blocks map;
 		pgoff_t next_pgofs;
Re: [f2fs-dev] [PATCH 1/3] f2fs: remove unnecessary IS_SWAPFILE check
On 2021/2/27 20:02, Huang Jianan via Linux-f2fs-devel wrote:

Now swapfile in f2fs directly submits IO to the blockdev according to the swapfile extents reported by f2fs at swapon time, therefore there is no need to check IS_SWAPFILE when executing filesystem operations.

Signed-off-by: Huang Jianan
Signed-off-by: Guo Weichao

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH v3] f2fs: compress: Allow modular (de)compression algorithms
On 2021/2/26 23:51, Geert Uytterhoeven wrote:

If F2FS_FS is modular, enabling the compression options F2FS_FS_{LZ4,LZ4HZ,LZO,LZORLE,ZSTD} will make the (de)compression algorithms {LZ4,LZ4HC,LZO,ZSTD}_{,DE}COMPRESS builtin instead of modular, as the former depend on an intermediate boolean F2FS_FS_COMPRESSION, which in turn depends on tristate F2FS_FS.

Indeed, if a boolean symbol A depends directly on a tristate symbol B and selects another tristate symbol C:

    tristate B

    tristate C

    bool A
    	depends on B
    	select C

and B is modular, then C will also be modular. However, if there is an intermediate boolean D in the dependency chain between A and B:

    tristate B

    tristate C

    bool D
    	depends on B

    bool A
    	depends on D
    	select C

then the modular state won't propagate from B to C, and C will be builtin instead of modular.

As modular dependency propagation through intermediate symbols is obscure, fix this in a robust way by moving the selection of tristate (de)compression algorithms from the boolean compression options to the tristate main F2FS_FS option.

Signed-off-by: Geert Uytterhoeven

Reviewed-by: Chao Yu

Thanks,
Re: [f2fs-dev] [PATCH] f2fs: compress: Allow modular (de)compression algorithms
Hi Geert,

On 2021/2/23 15:42, Geert Uytterhoeven wrote:

I checked the code in menu_finalize(), and this seems to work like this. I discussed the oddity of the select behavior before (https://lore.kernel.org/linux-kbuild/e1a6228d-1341-6264-d97a-e2bd52a65...@infradead.org/), but I was not confident about what the right direction was. Anyway, the behavior is obscure from the current code. If you want to make this more robust, you can write as follows:

config F2FS_FS
	tristate "F2FS filesystem support"
	depends on BLOCK
	select NLS
	select CRYPTO
	select CRYPTO_CRC32
	select F2FS_FS_XATTR if FS_ENCRYPTION
	select FS_ENCRYPTION_ALGS if FS_ENCRYPTION
	select LZO_COMPRESS if F2FS_FS_LZO
	select LZO_DECOMPRESS if F2FS_FS_LZO
	select LZ4_COMPRESS if F2FS_FS_LZ4
	select LZ4_DECOMPRESS if F2FS_FS_LZ4
	select LZ4HC_COMPRESS if F2FS_FS_LZ4HC
	select ZSTD_COMPRESS if F2FS_FS_ZSTD
	select ZSTD_DECOMPRESS if F2FS_FS_ZSTD

The code is a bit clumsy, but it is clear that the module (F2FS_FS) is selecting the compress/decompress libraries.

Actually the above is what I tried first ;-) Works fine. Then I started to look for similar cases in other file systems (e.g. EROFS_FS_ZIP), and discovered the issue doesn't happen there, which sparked my investigation. So I settled on the direct dependency, because it keeps all compression-related logic together.

It looks like the above way is more explicit, how about using your previous implementation?

Thanks,

Gr{oetje,eeting}s,
Re: [f2fs-dev] [PATCH][next] f2fs: Replace one-element array with flexible-array member
Hello, Gustavo,

On 2021/2/25 3:03, Gustavo A. R. Silva wrote:

There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2].

I proposed doing a similar cleanup, and I've no objection on doing this:

https://lore.kernel.org/patchwork/patch/869440/

Let's ask for Jaegeuk's opinion.

Refactor the code according to the use of a flexible-array member in struct f2fs_checkpoint, instead of a one-element array. Notice that a temporary pointer to void '*tmp_ptr' was used in order to fix the following errors when using a flexible array instead of a one-element array in struct f2fs_checkpoint:

  CC [M]  fs/f2fs/dir.o
In file included from fs/f2fs/dir.c:13:
fs/f2fs/f2fs.h: In function ‘__bitmap_ptr’:
fs/f2fs/f2fs.h:2227:40: error: invalid use of flexible array member
 2227 |   return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
      |                                        ^
fs/f2fs/f2fs.h:2227:49: error: invalid use of flexible array member
 2227 |   return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
      |                                                 ^
fs/f2fs/f2fs.h:2238:40: error: invalid use of flexible array member
 2238 |   return &ckpt->sit_nat_version_bitmap + offset;
      |                                        ^
make[2]: *** [scripts/Makefile.build:287: fs/f2fs/dir.o] Error 1
make[1]: *** [scripts/Makefile.build:530: fs/f2fs] Error 2
make: *** [Makefile:1819: fs] Error 2

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.9/process/deprecated.html#zero-length-and-one-element-arrays
Link: https://github.com/KSPP/linux/issues/79
Build-tested-by: kernel test robot
Link: https://lore.kernel.org/lkml/603647e4.deefbl4eqljuwaue%25...@intel.com/
Signed-off-by: Gustavo A. R. Silva
---
 fs/f2fs/f2fs.h          | 5 +++--
 include/linux/f2fs_fs.h | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e2d302ae3a46..3f5cb097c30f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2215,6 +2215,7 @@ static inline block_t __cp_payload(struct f2fs_sb_info *sbi)
 static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
 {
 	struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
+	void *tmp_ptr = &ckpt->sit_nat_version_bitmap;
 	int offset;
 
 	if (is_set_ckpt_flags(sbi, CP_LARGE_NAT_BITMAP_FLAG)) {
@@ -2224,7 +2225,7 @@ static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
 		 * if large_nat_bitmap feature is enabled, leave checksum
 		 * protection for all nat/sit bitmaps.
 		 */
-		return &ckpt->sit_nat_version_bitmap + offset + sizeof(__le32);
+		return tmp_ptr + offset + sizeof(__le32);
 	}
 
 	if (__cp_payload(sbi) > 0) {
@@ -2235,7 +2236,7 @@ static inline void *__bitmap_ptr(struct f2fs_sb_info *sbi, int flag)
 	} else {
 		offset = (flag == NAT_BITMAP) ?
 			le32_to_cpu(ckpt->sit_ver_bitmap_bytesize) : 0;
-		return &ckpt->sit_nat_version_bitmap + offset;
+		return tmp_ptr + offset;
 	}
 }
 
diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
index c6cc0a566ef5..5487a80617a3 100644
--- a/include/linux/f2fs_fs.h
+++ b/include/linux/f2fs_fs.h
@@ -168,7 +168,7 @@ struct f2fs_checkpoint {
 	unsigned char alloc_type[MAX_ACTIVE_LOGS];
 
 	/* SIT and NAT version bitmap */
-	unsigned char sit_nat_version_bitmap[1];
+	unsigned char sit_nat_version_bitmap[];
 } __packed;
 
 #define CP_CHKSUM_OFFSET	4092	/* default chksum offset in checkpoint */