Re: [f2fs-dev] [PATCH 1/4] f2fs: allocate new section if it's not new
LGTM On Mon, Dec 4, 2023 at 10:06 AM Jaegeuk Kim wrote: > > If fsck can allocate a new zone, it'd be better to use that instead of > allocating a new one. > > And, it modifies kernel messages. > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/segment.c | 33 + > 1 file changed, 17 insertions(+), 16 deletions(-) > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index 08e2f44a1264..9081c9af977a 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -4949,20 +4949,18 @@ static int check_zone_write_pointer(struct > f2fs_sb_info *sbi, > return ret; > } > > - if (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { > - /* > -* If there are valid blocks and the write pointer doesn't > match > -* with them, we need to report the inconsistency and fill > -* the zone till the end to close the zone. This inconsistency > -* does not cause write error because the zone will not be > -* selected for write operation until it get discarded. > -*/ > - f2fs_notice(sbi, "Valid blocks are not aligned with write " > + /* > +* If there are valid blocks and the write pointer doesn't match > +* with them, we need to report the inconsistency and fill > +* the zone till the end to close the zone. This inconsistency > +* does not cause write error because the zone will not be > +* selected for write operation until it get discarded. 
> +*/ > + f2fs_notice(sbi, "Valid blocks are not aligned with write " > "pointer: valid block[0x%x,0x%x] wp[0x%x,0x%x]", > GET_SEGNO(sbi, last_valid_block), > GET_BLKOFF_FROM_SEG0(sbi, last_valid_block), > wp_segno, wp_blkoff); > - } > > ret = blkdev_zone_mgmt(fdev->bdev, REQ_OP_ZONE_FINISH, > zone->start, zone->len, GFP_NOFS); > @@ -5053,15 +5051,18 @@ static int fix_curseg_write_pointer(struct > f2fs_sb_info *sbi, int type) > f2fs_notice(sbi, "Unaligned curseg[%d] with write pointer: " > "curseg[0x%x,0x%x] wp[0x%x,0x%x]", type, > cs->segno, > cs->next_blkoff, wp_segno, wp_blkoff); > - } else { > - f2fs_notice(sbi, "Not successfully unmounted in the previous " > - "mount"); > } > > - f2fs_notice(sbi, "Assign new section to curseg[%d]: " > - "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff); > + /* Allocate a new section if it's not new. */ > + if (cs->next_blkoff) { > + unsigned int old_segno = cs->segno, old_blkoff = > cs->next_blkoff; > > - f2fs_allocate_new_section(sbi, type, true); > + f2fs_allocate_new_section(sbi, type, true); > + f2fs_notice(sbi, "Assign new section to curseg[%d]: " > + "[0x%x,0x%x] -> [0x%x,0x%x]", > + type, old_segno, old_blkoff, > + cs->segno, cs->next_blkoff); > + } > > /* check consistency of the zone curseg pointed to */ > if (check_zone_write_pointer(sbi, zbd, )) > -- > 2.43.0.rc2.451.g8631bc7472-goog > > > > ___ > Linux-f2fs-devel mailing list > Linux-f2fs-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
[f2fs-dev] [PATCH] f2fs-tools: skip finishing zones for current zones
From: Daeho Jeong

Do not finish zones that current segments point to.

Signed-off-by: Daeho Jeong
Fixes: 06a25b021d15 ("f2fs-tools: make six open zone check resilient")
---
 fsck/fsck.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fsck/fsck.c b/fsck/fsck.c
index 8acb822..5121a56 100644
--- a/fsck/fsck.c
+++ b/fsck/fsck.c
@@ -3265,8 +3265,9 @@ static int chk_and_fix_wp_with_sit(int UNUSED(i), void *blkzone, void *opaque)
 	struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
 	block_t zone_block, wp_block, wp_blkoff;
 	unsigned int zone_segno, wp_segno;
-	int ret, last_valid_blkoff;
+	int i, ret, last_valid_blkoff;
 	int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
+	unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
 
 	if (blk_zone_conv(blkz))
 		return 0;
@@ -3309,6 +3310,15 @@ static int chk_and_fix_wp_with_sit(int UNUSED(i), void *blkzone, void *opaque)
 		return 0;
 	}
 
+	/* if a curseg points to the zone, do not finish the zone */
+	for (i = 0; i < NO_CHECK_TYPE; i++) {
+		struct curseg_info *cs = CURSEG_I(sbi, i);
+
+		if (zone_segno <= cs->segno &&
+		    cs->segno < zone_segno + segs_per_zone)
+			return 0;
+	}
+
 	/*
 	 * If valid blocks exist in the zone beyond the write pointer, it
 	 * is a bug. No need to fix because the zone is not selected for the
-- 
2.43.0.rc2.451.g8631bc7472-goog


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
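The curseg containment test added by this patch is a half-open range check per segment type. A minimal standalone sketch of that logic, with a plain segment-number array standing in for iterating the f2fs-tools `CURSEG_I()` entries (the helper name and simplified types are ours, not the project's):

```c
#include <assert.h>

/* Return 1 when any current segment falls inside the zone starting at
 * zone_segno and spanning segs_per_zone segments, mirroring the
 * half-open range check [zone_segno, zone_segno + segs_per_zone)
 * from the patch. */
static int zone_holds_curseg(unsigned int zone_segno,
			     unsigned int segs_per_zone,
			     const unsigned int *curseg_segnos, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (zone_segno <= curseg_segnos[i] &&
		    curseg_segnos[i] < zone_segno + segs_per_zone)
			return 1;
	return 0;
}
```

When this returns 1, the sketch (like the patch) would skip finishing the zone, since a curseg is still expected to write into it.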
Re: [f2fs-dev] [PATCH 2/4] f2fs: fix write pointers on zoned device after roll forward
LGTM

On Mon, Dec 4, 2023 at 10:06 AM Jaegeuk Kim wrote:
>
> 1. do roll forward recovery
> 2. update current segments pointers
> 3. fix the entire zones' write pointers
> 4. do checkpoint
>
> Signed-off-by: Jaegeuk Kim
> ---
>  fs/f2fs/recovery.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 16415c770b45..d0f24ccbd1ac 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -917,6 +917,8 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only)
>         if (!err && fix_curseg_write_pointer && !f2fs_readonly(sbi->sb) &&
>                         f2fs_sb_has_blkzoned(sbi)) {
>                 err = f2fs_fix_curseg_write_pointer(sbi);
> +               if (!err)
> +                       err = f2fs_check_write_pointer(sbi);
>                 ret = err;
>         }
>
> --
> 2.43.0.rc2.451.g8631bc7472-goog


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH 4/4] f2fs: let's finish or reset zones all the time
LGTM On Mon, Dec 4, 2023 at 10:07 AM Jaegeuk Kim wrote: > > In order to limit # of open zones, let's finish or reset zones given # of > valid blocks per section and its zone condition. > > Signed-off-by: Jaegeuk Kim > --- > fs/f2fs/segment.c | 74 +++ > 1 file changed, 17 insertions(+), 57 deletions(-) > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index 9081c9af977a..5696a4d381ff 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -4870,82 +4870,44 @@ static int check_zone_write_pointer(struct > f2fs_sb_info *sbi, > struct f2fs_dev_info *fdev, > struct blk_zone *zone) > { > - unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno; > - block_t zone_block, wp_block, last_valid_block; > + unsigned int zone_segno; > + block_t zone_block, wp_block, valid_block_cnt; > unsigned int log_sectors_per_block = sbi->log_blocksize - > SECTOR_SHIFT; > - int i, s, b, ret; > - struct seg_entry *se; > + int ret; > > if (zone->type != BLK_ZONE_TYPE_SEQWRITE_REQ) > return 0; > > wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block); > - wp_segno = GET_SEGNO(sbi, wp_block); > - wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno); > zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block); > zone_segno = GET_SEGNO(sbi, zone_block); > - zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno); > - > - if (zone_segno >= MAIN_SEGS(sbi)) > - return 0; > > /* > * Skip check of zones cursegs point to, since > * fix_curseg_write_pointer() checks them. > */ > - for (i = 0; i < NO_CHECK_TYPE; i++) > - if (zone_secno == GET_SEC_FROM_SEG(sbi, > - CURSEG_I(sbi, i)->segno)) > - return 0; > + if (zone_segno >= MAIN_SEGS(sbi) || > + IS_CURSEC(sbi, GET_SEC_FROM_SEG(sbi, zone_segno))) > + return 0; > > /* > -* Get last valid block of the zone. > +* Get # of valid block of the zone. 
> */ > - last_valid_block = zone_block - 1; > - for (s = sbi->segs_per_sec - 1; s >= 0; s--) { > - segno = zone_segno + s; > - se = get_seg_entry(sbi, segno); > - for (b = sbi->blocks_per_seg - 1; b >= 0; b--) > - if (f2fs_test_bit(b, se->cur_valid_map)) { > - last_valid_block = START_BLOCK(sbi, segno) + > b; > - break; > - } > - if (last_valid_block >= zone_block) > - break; > - } > + valid_block_cnt = get_valid_blocks(sbi, zone_segno, true); > > - /* > -* When safely unmounted in the previous mount, we can trust write > -* pointers. Otherwise, finish zones. > -*/ > - if (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { > - /* > -* The write pointer matches with the valid blocks or > -* already points to the end of the zone. > -*/ > - if ((last_valid_block + 1 == wp_block) || > - (zone->wp == zone->start + zone->len)) > - return 0; > - } > + if ((!valid_block_cnt && zone->cond == BLK_ZONE_COND_EMPTY) || > + (valid_block_cnt && zone->cond == BLK_ZONE_COND_FULL)) > + return 0; > > - if (last_valid_block + 1 == zone_block) { > - if (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { > - /* > -* If there is no valid block in the zone and if write > -* pointer is not at zone start, reset the write > -* pointer. > -*/ > - f2fs_notice(sbi, > - "Zone without valid block has non-zero write " > - "pointer. Reset the write pointer: > wp[0x%x,0x%x]", > - wp_segno, wp_blkoff); > - } > + if (!valid_block_cnt) { > + f2fs_notice(sbi, "Zone without valid block has non-zero write > " > + "pointer. Reset the write pointer: cond[0x%x]", > + zone->cond); > ret = __f2fs_issue_discard_zone(sbi, fdev->bdev, zone_block, > zone->len >> log_sectors_per_block); > if (ret) > f2fs_err(sbi, "Discard zone failed: %s (errno=%d)", > fdev->path, ret); > - > return ret; > } > > @@ -4957,10 +4919,8 @@ static int check_zone_write_pointer(struct > f2fs_sb_info
Re: [f2fs-dev] [PATCH] fsck.f2fs: run full scan if checkpoint is disabled
LGTM

On Mon, Dec 4, 2023 at 10:07 AM Jaegeuk Kim wrote:
>
> Let's fix any inconsistency until checkpoint is enabled back.
>
> Signed-off-by: Jaegeuk Kim
> ---
>  fsck/mount.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fsck/mount.c b/fsck/mount.c
> index e957904494ef..30c62280b281 100644
> --- a/fsck/mount.c
> +++ b/fsck/mount.c
> @@ -1435,6 +1435,7 @@ static int f2fs_should_proceed(struct f2fs_super_block *sb, u32 flag)
>  {
>         if (!c.fix_on && (c.auto_fix || c.preen_mode)) {
>                 if (flag & CP_FSCK_FLAG ||
> +                       flag & CP_DISABLED_FLAG ||
>                         flag & CP_QUOTA_NEED_FSCK_FLAG ||
>                         c.abnormal_stop || c.fs_errors ||
>                         (exist_qf_ino(sb) && (flag & CP_ERROR_FLAG))) {
> --
> 2.43.0.rc2.451.g8631bc7472-goog


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH 3/4] f2fs: check write pointers when checkpoint=disable
LGTM

On Mon, Dec 4, 2023 at 10:06 AM Jaegeuk Kim wrote:
>
> Even if f2fs was rebooted while staying in checkpoint=disable, let's match
> the write pointers all the time.
>
> Signed-off-by: Jaegeuk Kim
> ---
>  fs/f2fs/super.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 617340e9ea7f..9a874b4d1501 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -4741,7 +4741,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>  #ifdef CONFIG_QUOTA
>         f2fs_recover_quota_end(sbi, quota_enabled);
>  #endif
> -
> +reset_checkpoint:
>         /*
>          * If the f2fs is not readonly and fsync data recovery succeeds,
>          * check zoned block devices' write pointer consistency.
> @@ -4752,7 +4752,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
>                         goto free_meta;
>         }
>
> -reset_checkpoint:
>         f2fs_init_inmem_curseg(sbi);
>
>         /* f2fs_recover_fsync_data() cleared this already */
> --
> 2.43.0.rc2.451.g8631bc7472-goog


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
[f2fs-dev] [PATCH v2] f2fs: Restrict max filesize for 16K f2fs
Blocks are tracked by u32, so the max permitted filesize is
U32_MAX * BLOCK_SIZE. Additionally, in order to support crypto data unit
sizes of 4K with a 16K block size with IV_INO_LBLK_{32,63}, we must
further restrict max filesize to U32_MAX * 4096. This does not affect 4K
blocksize f2fs as the natural limit for files are well below that.

Fixes: d7e9a9037de2 ("f2fs: Support Block Size == Page Size")
Signed-off-by: Daniel Rosenberg
---
 fs/f2fs/super.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 033af907c3b1..18a2189a0dc4 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3364,6 +3364,18 @@ loff_t max_file_blocks(struct inode *inode)
 	leaf_count *= NIDS_PER_BLOCK;
 	result += leaf_count;
 
+	/*
+	 * For compatibility with FSCRYPT_POLICY_IV_INO_LBLK_{64,32} with a
+	 * 4K crypto data unit, we must restrict the max filesize to what can
+	 * fit within U32_MAX data units.
+	 *
+	 * Since the blocksize must currently be equal to the page size,
+	 * we can use a constant for that. Note if this is not the case
+	 * in the future that inode is NULL while setting up the superblock.
+	 */
+
+	result = min(result, ((loff_t) U32_MAX * 4096) >> F2FS_BLKSIZE_BITS);
+
 	return result;
 }

base-commit: d346fa09abff46988de9267b67b6900d9913d5a2
-- 
2.43.0.rc2.451.g8631bc7472-goog


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH] fsck.f2fs: run full scan if checkpoint is disabled
Thanks. Let me add reviewed-by. :) On 12/04, Daeho Jeong wrote: > LGTM > > On Mon, Dec 4, 2023 at 10:07 AM Jaegeuk Kim wrote: > > > > Let's fix any inconsistency until checkpint being enabled back. > > > > Signed-off-by: Jaegeuk Kim > > --- > > fsck/mount.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/fsck/mount.c b/fsck/mount.c > > index e957904494ef..30c62280b281 100644 > > --- a/fsck/mount.c > > +++ b/fsck/mount.c > > @@ -1435,6 +1435,7 @@ static int f2fs_should_proceed(struct > > f2fs_super_block *sb, u32 flag) > > { > > if (!c.fix_on && (c.auto_fix || c.preen_mode)) { > > if (flag & CP_FSCK_FLAG || > > + flag & CP_DISABLED_FLAG || > > flag & CP_QUOTA_NEED_FSCK_FLAG || > > c.abnormal_stop || c.fs_errors || > > (exist_qf_ino(sb) && (flag & CP_ERROR_FLAG))) { > > -- > > 2.43.0.rc2.451.g8631bc7472-goog > > > > > > > > ___ > > Linux-f2fs-devel mailing list > > Linux-f2fs-devel@lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH v2] f2fs: Restrict max filesize for 16K f2fs
On Mon, Dec 04, 2023 at 03:46:15PM -0800, Daniel Rosenberg via Linux-f2fs-devel wrote:
> Blocks are tracked by u32, so the max permitted filesize is
> U32_MAX * BLOCK_SIZE. Additionally, in order to support crypto data unit
> sizes of 4K with a 16K block size with IV_INO_LBLK_{32,63}, we must

{32,63} should be {32,64}

> +	/*
> +	 * For compatibility with FSCRYPT_POLICY_IV_INO_LBLK_{64,32} with a
> +	 * 4K crypto data unit, we must restrict the max filesize to what can
> +	 * fit within U32_MAX data units.

FSCRYPT_POLICY_IV_INO_LBLK_{64,32} should be
FSCRYPT_POLICY_FLAG_IV_INO_LBLK_{64,32}

> +	 *
> +	 * Since the blocksize must currently be equal to the page size,
> +	 * we can use a constant for that. Note if this is not the case
> +	 * in the future that inode is NULL while setting up the superblock.

I'm not sure what the last sentence is trying to say.

> +	 */
> +
> +	result = min(result, ((loff_t) U32_MAX * 4096) >> F2FS_BLKSIZE_BITS);

Is it intentional that this is off by 1? If indices can be up to U32_MAX,
then the maximum size is U32_MAX + 1.

It's not a bad idea to go with the lower size, so that max_index + 1 does
not overflow. But that's not what the explanation says, so this seems to
be accidental.

- Eric


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
[f2fs-dev] [PATCH 1/4] f2fs: allocate new section if it's not new
If fsck can allocate a new zone, it'd be better to use that instead of
allocating a new one.

And, it modifies kernel messages.

Signed-off-by: Jaegeuk Kim
---
 fs/f2fs/segment.c | 33 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 08e2f44a1264..9081c9af977a 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -4949,20 +4949,18 @@ static int check_zone_write_pointer(struct f2fs_sb_info *sbi,
 		return ret;
 	}
 
-	if (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
-		/*
-		 * If there are valid blocks and the write pointer doesn't match
-		 * with them, we need to report the inconsistency and fill
-		 * the zone till the end to close the zone. This inconsistency
-		 * does not cause write error because the zone will not be
-		 * selected for write operation until it get discarded.
-		 */
-		f2fs_notice(sbi, "Valid blocks are not aligned with write "
+	/*
+	 * If there are valid blocks and the write pointer doesn't match
+	 * with them, we need to report the inconsistency and fill
+	 * the zone till the end to close the zone. This inconsistency
+	 * does not cause write error because the zone will not be
+	 * selected for write operation until it get discarded.
+	 */
+	f2fs_notice(sbi, "Valid blocks are not aligned with write "
 			"pointer: valid block[0x%x,0x%x] wp[0x%x,0x%x]",
 			GET_SEGNO(sbi, last_valid_block),
 			GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
 			wp_segno, wp_blkoff);
-	}
 
 	ret = blkdev_zone_mgmt(fdev->bdev, REQ_OP_ZONE_FINISH,
 				zone->start, zone->len, GFP_NOFS);
@@ -5053,15 +5051,18 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info *sbi, int type)
 		f2fs_notice(sbi, "Unaligned curseg[%d] with write pointer: "
 			    "curseg[0x%x,0x%x] wp[0x%x,0x%x]", type, cs->segno,
 			    cs->next_blkoff, wp_segno, wp_blkoff);
-	} else {
-		f2fs_notice(sbi, "Not successfully unmounted in the previous "
-			    "mount");
 	}
 
-	f2fs_notice(sbi, "Assign new section to curseg[%d]: "
-		    "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
+	/* Allocate a new section if it's not new. */
+	if (cs->next_blkoff) {
+		unsigned int old_segno = cs->segno, old_blkoff = cs->next_blkoff;
 
-	f2fs_allocate_new_section(sbi, type, true);
+		f2fs_allocate_new_section(sbi, type, true);
+		f2fs_notice(sbi, "Assign new section to curseg[%d]: "
+			    "[0x%x,0x%x] -> [0x%x,0x%x]",
+			    type, old_segno, old_blkoff,
+			    cs->segno, cs->next_blkoff);
+	}
 
 	/* check consistency of the zone curseg pointed to */
 	if (check_zone_write_pointer(sbi, zbd, ))
-- 
2.43.0.rc2.451.g8631bc7472-goog


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
[f2fs-dev] [PATCH 4/4] f2fs: let's finish or reset zones all the time
In order to limit # of open zones, let's finish or reset zones given # of valid blocks per section and its zone condition. Signed-off-by: Jaegeuk Kim --- fs/f2fs/segment.c | 74 +++ 1 file changed, 17 insertions(+), 57 deletions(-) diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 9081c9af977a..5696a4d381ff 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -4870,82 +4870,44 @@ static int check_zone_write_pointer(struct f2fs_sb_info *sbi, struct f2fs_dev_info *fdev, struct blk_zone *zone) { - unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno; - block_t zone_block, wp_block, last_valid_block; + unsigned int zone_segno; + block_t zone_block, wp_block, valid_block_cnt; unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT; - int i, s, b, ret; - struct seg_entry *se; + int ret; if (zone->type != BLK_ZONE_TYPE_SEQWRITE_REQ) return 0; wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block); - wp_segno = GET_SEGNO(sbi, wp_block); - wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno); zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block); zone_segno = GET_SEGNO(sbi, zone_block); - zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno); - - if (zone_segno >= MAIN_SEGS(sbi)) - return 0; /* * Skip check of zones cursegs point to, since * fix_curseg_write_pointer() checks them. */ - for (i = 0; i < NO_CHECK_TYPE; i++) - if (zone_secno == GET_SEC_FROM_SEG(sbi, - CURSEG_I(sbi, i)->segno)) - return 0; + if (zone_segno >= MAIN_SEGS(sbi) || + IS_CURSEC(sbi, GET_SEC_FROM_SEG(sbi, zone_segno))) + return 0; /* -* Get last valid block of the zone. +* Get # of valid block of the zone. 
*/ - last_valid_block = zone_block - 1; - for (s = sbi->segs_per_sec - 1; s >= 0; s--) { - segno = zone_segno + s; - se = get_seg_entry(sbi, segno); - for (b = sbi->blocks_per_seg - 1; b >= 0; b--) - if (f2fs_test_bit(b, se->cur_valid_map)) { - last_valid_block = START_BLOCK(sbi, segno) + b; - break; - } - if (last_valid_block >= zone_block) - break; - } + valid_block_cnt = get_valid_blocks(sbi, zone_segno, true); - /* -* When safely unmounted in the previous mount, we can trust write -* pointers. Otherwise, finish zones. -*/ - if (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { - /* -* The write pointer matches with the valid blocks or -* already points to the end of the zone. -*/ - if ((last_valid_block + 1 == wp_block) || - (zone->wp == zone->start + zone->len)) - return 0; - } + if ((!valid_block_cnt && zone->cond == BLK_ZONE_COND_EMPTY) || + (valid_block_cnt && zone->cond == BLK_ZONE_COND_FULL)) + return 0; - if (last_valid_block + 1 == zone_block) { - if (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) { - /* -* If there is no valid block in the zone and if write -* pointer is not at zone start, reset the write -* pointer. -*/ - f2fs_notice(sbi, - "Zone without valid block has non-zero write " - "pointer. Reset the write pointer: wp[0x%x,0x%x]", - wp_segno, wp_blkoff); - } + if (!valid_block_cnt) { + f2fs_notice(sbi, "Zone without valid block has non-zero write " + "pointer. Reset the write pointer: cond[0x%x]", + zone->cond); ret = __f2fs_issue_discard_zone(sbi, fdev->bdev, zone_block, zone->len >> log_sectors_per_block); if (ret) f2fs_err(sbi, "Discard zone failed: %s (errno=%d)", fdev->path, ret); - return ret; } @@ -4957,10 +4919,8 @@ static int check_zone_write_pointer(struct f2fs_sb_info *sbi, * selected for write operation until it get discarded. */ f2fs_notice(sbi, "Valid blocks are not aligned with write " - "pointer: valid block[0x%x,0x%x] wp[0x%x,0x%x]", - GET_SEGNO(sbi,
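The rewritten check in this patch boils down to a small truth table: a zone needs no fixing when it is EMPTY with zero valid blocks or FULL with at least one. A standalone sketch of that early-return condition (the enum values below are illustrative stand-ins, not the kernel's `BLK_ZONE_COND_*` constants):

```c
#include <assert.h>

/* Illustrative stand-ins for the zone conditions the patch tests;
 * the kernel's BLK_ZONE_COND_* values differ. */
enum zone_cond { COND_EMPTY, COND_OPEN, COND_FULL };

/* A zone is already consistent when its condition agrees with its
 * valid block count, mirroring the early return added by the patch.
 * Any other combination means the zone must be reset (no valid
 * blocks) or finished (some valid blocks). */
static int zone_is_consistent(unsigned int valid_block_cnt,
			      enum zone_cond cond)
{
	return (!valid_block_cnt && cond == COND_EMPTY) ||
	       (valid_block_cnt && cond == COND_FULL);
}
```

This replaces the old logic that trusted write pointers only after a clean unmount; here the zone condition alone decides whether a reset or finish is needed.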
[f2fs-dev] [PATCH] fsck.f2fs: run full scan if checkpoint is disabled
Let's fix any inconsistency until checkpoint is enabled back.

Signed-off-by: Jaegeuk Kim
---
 fsck/mount.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fsck/mount.c b/fsck/mount.c
index e957904494ef..30c62280b281 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -1435,6 +1435,7 @@ static int f2fs_should_proceed(struct f2fs_super_block *sb, u32 flag)
 {
 	if (!c.fix_on && (c.auto_fix || c.preen_mode)) {
 		if (flag & CP_FSCK_FLAG ||
+			flag & CP_DISABLED_FLAG ||
 			flag & CP_QUOTA_NEED_FSCK_FLAG ||
 			c.abnormal_stop || c.fs_errors ||
 			(exist_qf_ino(sb) && (flag & CP_ERROR_FLAG))) {
-- 
2.43.0.rc2.451.g8631bc7472-goog


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
[f2fs-dev] [PATCH 3/4] f2fs: check write pointers when checkpoint=disable
Even if f2fs was rebooted while staying in checkpoint=disable, let's match
the write pointers all the time.

Signed-off-by: Jaegeuk Kim
---
 fs/f2fs/super.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 617340e9ea7f..9a874b4d1501 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4741,7 +4741,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 #ifdef CONFIG_QUOTA
 	f2fs_recover_quota_end(sbi, quota_enabled);
 #endif
-
+reset_checkpoint:
 	/*
 	 * If the f2fs is not readonly and fsync data recovery succeeds,
 	 * check zoned block devices' write pointer consistency.
@@ -4752,7 +4752,6 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 			goto free_meta;
 	}
 
-reset_checkpoint:
 	f2fs_init_inmem_curseg(sbi);
 
 	/* f2fs_recover_fsync_data() cleared this already */
-- 
2.43.0.rc2.451.g8631bc7472-goog


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
[f2fs-dev] [PATCH 2/4] f2fs: fix write pointers on zoned device after roll forward
1. do roll forward recovery
2. update current segments pointers
3. fix the entire zones' write pointers
4. do checkpoint

Signed-off-by: Jaegeuk Kim
---
 fs/f2fs/recovery.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 16415c770b45..d0f24ccbd1ac 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -917,6 +917,8 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only)
 	if (!err && fix_curseg_write_pointer && !f2fs_readonly(sbi->sb) &&
 			f2fs_sb_has_blkzoned(sbi)) {
 		err = f2fs_fix_curseg_write_pointer(sbi);
+		if (!err)
+			err = f2fs_check_write_pointer(sbi);
 		ret = err;
 	}
 
-- 
2.43.0.rc2.451.g8631bc7472-goog


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] Performance drop due to alloc_workqueue() misuse and recent change
Hello, On Mon, Dec 04, 2023 at 04:03:47PM +, Naohiro Aota wrote: > Recently, commit 636b927eba5b ("workqueue: Make unbound workqueues to use > per-cpu pool_workqueues") changed WQ_UNBOUND workqueue's behavior. It > changed the meaning of alloc_workqueue()'s max_active from an upper limit > imposed per NUMA node to a limit per CPU. As a result, massive number of > workers can be running at the same time, especially if the workqueue user > thinks the max_active is a global limit. > > Actually, it is already written it is per-CPU limit in the documentation > before the commit. However, several callers seem to misuse max_active, > maybe thinking it is a global limit. It is an unexpected behavior change > for them. Right, and the behavior has been like that for a very long time and there was no other way to achieve reasonable level of concurrency, so the current situation is expected. > For example, these callers set max_active = num_online_cpus(), which is a > suspicious limit applying to per-CPU. This config means we can have nr_cpu > * nr_cpu active tasks working at the same time. Yeah, that sounds like a good indicator. 
> fs/f2fs/data.c: sbi->post_read_wq = alloc_workqueue("f2fs_post_read_wq", > fs/f2fs/data.c- WQ_UNBOUND | > WQ_HIGHPRI, > fs/f2fs/data.c- num_online_cpus()); > > fs/crypto/crypto.c: fscrypt_read_workqueue = > alloc_workqueue("fscrypt_read_queue", > fs/crypto/crypto.c- WQ_UNBOUND | > WQ_HIGHPRI, > fs/crypto/crypto.c- > num_online_cpus()); > > fs/verity/verify.c: fsverity_read_workqueue = > alloc_workqueue("fsverity_read_queue", > fs/verity/verify.c- WQ_HIGHPRI, > fs/verity/verify.c- > num_online_cpus()); > > drivers/crypto/hisilicon/qm.c: qm->wq = alloc_workqueue("%s", WQ_HIGHPRI | > WQ_MEM_RECLAIM | > drivers/crypto/hisilicon/qm.c- WQ_UNBOUND, > num_online_cpus(), > drivers/crypto/hisilicon/qm.c- pci_name(qm->pdev)); > > block/blk-crypto-fallback.c:blk_crypto_wq = > alloc_workqueue("blk_crypto_wq", > block/blk-crypto-fallback.c-WQ_UNBOUND | > WQ_HIGHPRI | > block/blk-crypto-fallback.c- > WQ_MEM_RECLAIM, num_online_cpus()); > > drivers/md/dm-crypt.c: cc->crypt_queue = > alloc_workqueue("kcryptd/%s", > drivers/md/dm-crypt.c- > WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_UNBOUND, > drivers/md/dm-crypt.c- > num_online_cpus(), devname); Most of these work items are CPU bound but not completley so. e.g. kcrypt_crypt_write_continue() does wait_for_completion(), so setting max_active to 1 likely isn't what they want either. They mostly want some reasonable system-wide concurrency limit w.r.t. the CPU count while keeping some level of flexibility in terms of task placement. The previous max_active wasn't great for this because its meaning changed depending on the number of nodes. Now, the meaning doesn't change but it's not really useful for the above purpose. It's only useful for avoiding melting the system completely. One way to go about it is to declare that concurrency level management for unbound workqueue is on users but that seems not ideal given many use cases would want it anyway. 
Let me think it over but I think the right way to go about it is going the other direction - ie. making max_active apply to the whole system regardless of the number of nodes / ccx's / whatever. > Furthermore, the change affects performance in a certain case. > > Btrfs creates several WQ_UNBOUND workqueues with a default max_active = > min(NRCPUS + 2, 8). As my machine has 96 CPUs with NUMA disabled, this > max_active config allows running over 700 active works. Before the commit, > it is limited to 8 if NUMA is disabled or limited to 16 if NUMA nodes is 2. > > I reverted the workqueue code back to before the commit, and I ran the > following fio command on RAID0 btrfs on 6 SSDs. > > fio --group_reporting --eta=always --eta-interval=30s --eta-newline=30s \ > --rw=write --fallocate=none \ > --direct=1 --ioengine=libaio --iodepth=32 \ > --filesize=100G \ > --blocksize=64k \ > --time_based --runtime=300s \ > --end_fsync=1 \ > --directory=${MNT} \ > --name=writer --numjobs=32 > > By changing workqueue's max_active, the result varies. > > - wq max_active=8 (intended limit by btrfs?) > WRITE: bw=2495MiB/s (2616MB/s), 2495MiB/s-2495MiB/s (2616MB/s-2616MB/s), > io=753GiB (808GB), run=308953-308953msec > - wq max_active=16 (actual limit on 2 NUMA nodes setup) > WRITE: bw=1736MiB/s (1820MB/s), 1736MiB/s-1736MiB/s (1820MB/s-1820MB/s), > io=670GiB (720GB),
[f2fs-dev] [PATCH v3] f2fs: Restrict max filesize for 16K f2fs
Blocks are tracked by u32, so the max permitted filesize is
(U32_MAX + 1) * BLOCK_SIZE. Additionally, in order to support crypto data
unit sizes of 4K with a 16K block with IV_INO_LBLK_{32,64}, we must
further restrict max filesize to (U32_MAX + 1) * 4096. This does not
affect 4K blocksize f2fs as the natural limit for files are well below
that.

Fixes: d7e9a9037de2 ("f2fs: Support Block Size == Page Size")
Signed-off-by: Daniel Rosenberg
---
 fs/f2fs/super.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 033af907c3b1..5dfbc6b4c0ac 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3364,6 +3364,14 @@ loff_t max_file_blocks(struct inode *inode)
 	leaf_count *= NIDS_PER_BLOCK;
 	result += leaf_count;
 
+	/*
+	 * For compatibility with FSCRYPT_POLICY_FLAG_IV_INO_LBLK_{64,32} with
+	 * a 4K crypto data unit, we must restrict the max filesize to what can
+	 * fit within U32_MAX + 1 data units.
+	 */
+
+	result = min(result, (((loff_t)U32_MAX + 1) * 4096) >> F2FS_BLKSIZE_BITS);
+
 	return result;
 }

base-commit: d346fa09abff46988de9267b67b6900d9913d5a2
-- 
2.43.0.rc2.451.g8631bc7472-goog


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
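The arithmetic behind the new cap can be checked in isolation: (U32_MAX + 1) crypto data units of 4096 bytes each, converted down to filesystem blocks. A minimal sketch of the clamp value (the helper name is ours, not f2fs's, and it computes only the crypto-imposed bound, not the indirect-block geometry that `max_file_blocks()` also accounts for):

```c
#include <assert.h>
#include <stdint.h>

/* Cap on file size, in filesystem blocks: (U32_MAX + 1) crypto data
 * units of 4096 bytes each, shifted down to blocks of 1 << blksize_bits
 * bytes. This mirrors the value the patch feeds into min(). */
static uint64_t crypto_block_cap(unsigned int blksize_bits)
{
	return (((uint64_t)UINT32_MAX + 1) * 4096) >> blksize_bits;
}
```

With 4K blocks (blksize_bits = 12) the cap is exactly U32_MAX + 1 blocks, so the existing natural limit is untouched; with 16K blocks (blksize_bits = 14) each block holds four data units, so the cap drops by a factor of four.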
Re: [f2fs-dev] Weird EROFS data corruption
Hi Juhyung, On 2023/12/4 11:41, Juhyung Park wrote: ... - Could you share the full message about the output of `lscpu`? Sure: Architecture:x86_64 CPU op-mode(s):32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order:Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Vendor ID: GenuineIntel BIOS Vendor ID:Intel(R) Corporation Model name:11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz BIOS Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU @ 3.0GHz BIOS CPU family: 198 CPU family: 6 Model: 140 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 Stepping:1 CPU(s) scaling MHz: 60% CPU max MHz: 4800. CPU min MHz: 400. BogoMIPS:5990.40 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_ tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes6 4 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xt pr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_dead line_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowp refetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb st ibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a avx512f avx512dq rdseed adx smap avx512ifma clfl ushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req vnmi avx512vbmi umip pku ospke avx512_vbmi 2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme av x512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2i Sigh, I've been thinking. Here FSRM is the most significant difference between our environments, could you only try the following diff to see if there's any difference anymore? (without the previous disable patch.) 
diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 1b60ae81ecd8..1b52a913233c 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
 #define CHECK_LEN	cmp $0x20, %rdx; jb 1f
 #define MEMMOVE_BYTES	movq %rdx, %rcx; rep movsb; RET
 .Lmemmove_begin_forward:
-	ALTERNATIVE_2 __stringify(CHECK_LEN), \
-		__stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
-		__stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
+	CHECK_LEN

 	/*
 	 * movsq instruction have many startup latency

Thanks,
Gao Xiang
[f2fs-dev] Performance drop due to alloc_workqueue() misuse and recent change
Recently, commit 636b927eba5b ("workqueue: Make unbound workqueues to use
per-cpu pool_workqueues") changed WQ_UNBOUND workqueues' behavior. It
changed the meaning of alloc_workqueue()'s max_active from an upper limit
imposed per NUMA node to a limit per CPU. As a result, a massive number of
workers can be running at the same time, especially if the workqueue user
assumes max_active is a global limit.

To be fair, the documentation already described max_active as a per-CPU
limit before the commit. However, several callers seem to misuse
max_active, perhaps thinking of it as a global limit, so the behavior
change is unexpected for them.

For example, the callers below set max_active = num_online_cpus(), which
is a suspicious value for a per-CPU limit. This configuration allows
nr_cpu * nr_cpu active tasks to be working at the same time.

fs/f2fs/data.c:	sbi->post_read_wq = alloc_workqueue("f2fs_post_read_wq",
fs/f2fs/data.c-						 WQ_UNBOUND | WQ_HIGHPRI,
fs/f2fs/data.c-						 num_online_cpus());
fs/crypto/crypto.c:	fscrypt_read_workqueue = alloc_workqueue("fscrypt_read_queue",
fs/crypto/crypto.c-						  WQ_UNBOUND | WQ_HIGHPRI,
fs/crypto/crypto.c-						  num_online_cpus());
fs/verity/verify.c:	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
fs/verity/verify.c-						  WQ_HIGHPRI,
fs/verity/verify.c-						  num_online_cpus());
drivers/crypto/hisilicon/qm.c:	qm->wq = alloc_workqueue("%s", WQ_HIGHPRI | WQ_MEM_RECLAIM |
drivers/crypto/hisilicon/qm.c-				 WQ_UNBOUND, num_online_cpus(),
drivers/crypto/hisilicon/qm.c-				 pci_name(qm->pdev));
block/blk-crypto-fallback.c:	blk_crypto_wq = alloc_workqueue("blk_crypto_wq",
block/blk-crypto-fallback.c-					WQ_UNBOUND | WQ_HIGHPRI |
block/blk-crypto-fallback.c-					WQ_MEM_RECLAIM, num_online_cpus());
drivers/md/dm-crypt.c:	cc->crypt_queue = alloc_workqueue("kcryptd/%s",
drivers/md/dm-crypt.c-				WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_UNBOUND,
drivers/md/dm-crypt.c-				num_online_cpus(), devname);

Furthermore, the change affects performance in a certain case.
Btrfs creates several WQ_UNBOUND workqueues with a default max_active =
min(NRCPUS + 2, 8). As my machine has 96 CPUs with NUMA disabled, this
max_active config now allows over 700 active works to run at the same
time. Before the commit, it was limited to 8 with NUMA disabled, or to 16
with 2 NUMA nodes.

I reverted the workqueue code back to before the commit, and ran the
following fio command on RAID0 btrfs on 6 SSDs:

fio --group_reporting --eta=always --eta-interval=30s --eta-newline=30s \
    --rw=write --fallocate=none \
    --direct=1 --ioengine=libaio --iodepth=32 \
    --filesize=100G \
    --blocksize=64k \
    --time_based --runtime=300s \
    --end_fsync=1 \
    --directory=${MNT} \
    --name=writer --numjobs=32

The result varies with the workqueue's max_active:

- wq max_active=8 (intended limit by btrfs?)
  WRITE: bw=2495MiB/s (2616MB/s), 2495MiB/s-2495MiB/s (2616MB/s-2616MB/s),
  io=753GiB (808GB), run=308953-308953msec
- wq max_active=16 (actual limit on a 2 NUMA nodes setup)
  WRITE: bw=1736MiB/s (1820MB/s), 1736MiB/s-1736MiB/s (1820MB/s-1820MB/s),
  io=670GiB (720GB), run=395532-395532msec
- wq max_active=768 (simulating the current limit)
  WRITE: bw=1276MiB/s (1338MB/s), 1276MiB/s-1276MiB/s (1338MB/s-1338MB/s),
  io=375GiB (403GB), run=300984-300984msec

The current behavior is 27% slower than the previous limit
(max_active=16), and about 50% slower than the intended limit
(max_active=8). The performance drop might be due to contention on the
btrfs-endio-write works: over 700 kworker instances were created, and
about 100 works were in the 'D' state competing for a lock.

More specifically, I tested the same workload at the commit itself.
- At commit 636b927eba5b ("workqueue: Make unbound workqueues to use
  per-cpu pool_workqueues"):
  WRITE: bw=1191MiB/s (1249MB/s), 1191MiB/s-1191MiB/s (1249MB/s-1249MB/s),
  io=350GiB (376GB), run=300714-300714msec
- At the previous commit, 4cbfd3de73 ("workqueue: Call
  wq_update_unbound_numa() on all CPUs in NUMA node on CPU hotplug"):
  WRITE: bw=1747MiB/s (1832MB/s), 1747MiB/s-1747MiB/s (1832MB/s-1832MB/s),
  io=748GiB (803GB), run=438134-438134msec

So, the commit causes a 31.8% performance drop.

In summary, we misuse max_active by treating it as a global limit, and
the recent commit introduced a huge performance drop in some cases. We
need to review alloc_workqueue() usage to check whether each max_active
setting is proper.