Re: [f2fs-dev] [PATCH 1/4] f2fs: allocate new section if it's not new

2023-12-04 Thread Daeho Jeong
LGTM

On Mon, Dec 4, 2023 at 10:06 AM Jaegeuk Kim  wrote:
>
> If fsck can allocate a new zone, it'd be better to use that instead of
> allocating a new one.
>
> And, it modifies kernel messages.
>
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/segment.c | 33 +
>  1 file changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 08e2f44a1264..9081c9af977a 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -4949,20 +4949,18 @@ static int check_zone_write_pointer(struct 
> f2fs_sb_info *sbi,
> return ret;
> }
>
> -   if (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
> -   /*
> -* If there are valid blocks and the write pointer doesn't 
> match
> -* with them, we need to report the inconsistency and fill
> -* the zone till the end to close the zone. This inconsistency
> -* does not cause write error because the zone will not be
> -* selected for write operation until it get discarded.
> -*/
> -   f2fs_notice(sbi, "Valid blocks are not aligned with write "
> +   /*
> +* If there are valid blocks and the write pointer doesn't match
> +* with them, we need to report the inconsistency and fill
> +* the zone till the end to close the zone. This inconsistency
> +* does not cause write error because the zone will not be
> +* selected for write operation until it get discarded.
> +*/
> +   f2fs_notice(sbi, "Valid blocks are not aligned with write "
> "pointer: valid block[0x%x,0x%x] wp[0x%x,0x%x]",
> GET_SEGNO(sbi, last_valid_block),
> GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
> wp_segno, wp_blkoff);
> -   }
>
> ret = blkdev_zone_mgmt(fdev->bdev, REQ_OP_ZONE_FINISH,
> zone->start, zone->len, GFP_NOFS);
> @@ -5053,15 +5051,18 @@ static int fix_curseg_write_pointer(struct 
> f2fs_sb_info *sbi, int type)
> f2fs_notice(sbi, "Unaligned curseg[%d] with write pointer: "
> "curseg[0x%x,0x%x] wp[0x%x,0x%x]", type, 
> cs->segno,
> cs->next_blkoff, wp_segno, wp_blkoff);
> -   } else {
> -   f2fs_notice(sbi, "Not successfully unmounted in the previous "
> -   "mount");
> }
>
> -   f2fs_notice(sbi, "Assign new section to curseg[%d]: "
> -   "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
> +   /* Allocate a new section if it's not new. */
> +   if (cs->next_blkoff) {
> +   unsigned int old_segno = cs->segno, old_blkoff = 
> cs->next_blkoff;
>
> -   f2fs_allocate_new_section(sbi, type, true);
> +   f2fs_allocate_new_section(sbi, type, true);
> +   f2fs_notice(sbi, "Assign new section to curseg[%d]: "
> +   "[0x%x,0x%x] -> [0x%x,0x%x]",
> +   type, old_segno, old_blkoff,
> +   cs->segno, cs->next_blkoff);
> +   }
>
> /* check consistency of the zone curseg pointed to */
> if (check_zone_write_pointer(sbi, zbd, ))
> --
> 2.43.0.rc2.451.g8631bc7472-goog
>
>
>
> ___
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH] f2fs-tools: skip finishing zones for current zones

2023-12-04 Thread Daeho Jeong
From: Daeho Jeong 

Do not finishing zones for current zones.

Signed-off-by: Daeho Jeong 
Fixes: 06a25b021d15 ("f2fs-tools: make six open zone check resilient")
---
 fsck/fsck.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fsck/fsck.c b/fsck/fsck.c
index 8acb822..5121a56 100644
--- a/fsck/fsck.c
+++ b/fsck/fsck.c
@@ -3265,8 +3265,9 @@ static int chk_and_fix_wp_with_sit(int UNUSED(i), void 
*blkzone, void *opaque)
struct f2fs_fsck *fsck = F2FS_FSCK(sbi);
block_t zone_block, wp_block, wp_blkoff;
unsigned int zone_segno, wp_segno;
-   int ret, last_valid_blkoff;
+   int i, ret, last_valid_blkoff;
int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
+   unsigned int segs_per_zone = sbi->segs_per_sec * sbi->secs_per_zone;
 
if (blk_zone_conv(blkz))
return 0;
@@ -3309,6 +3310,15 @@ static int chk_and_fix_wp_with_sit(int UNUSED(i), void 
*blkzone, void *opaque)
return 0;
}
 
+   /* if a curseg points to the zone, do not finishing zone */
+   for (i = 0; i < NO_CHECK_TYPE; i++) {
+   struct curseg_info *cs = CURSEG_I(sbi, i);
+
+   if (zone_segno <= cs->segno &&
+   cs->segno < zone_segno + segs_per_zone)
+   return 0;
+   }
+
/*
 * If valid blocks exist in the zone beyond the write pointer, it
 * is a bug. No need to fix because the zone is not selected for the
-- 
2.43.0.rc2.451.g8631bc7472-goog



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 2/4] f2fs: fix write pointers on zoned device after roll forward

2023-12-04 Thread Daeho Jeong
LGTM

On Mon, Dec 4, 2023 at 10:06 AM Jaegeuk Kim  wrote:
>
> 1. do roll forward recovery
> 2. update current segments pointers
> 3. fix the entire zones' write pointers
> 4. do checkpoint
>
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/recovery.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 16415c770b45..d0f24ccbd1ac 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -917,6 +917,8 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, 
> bool check_only)
> if (!err && fix_curseg_write_pointer && !f2fs_readonly(sbi->sb) &&
> f2fs_sb_has_blkzoned(sbi)) {
> err = f2fs_fix_curseg_write_pointer(sbi);
> +   if (!err)
> +   err = f2fs_check_write_pointer(sbi);
> ret = err;
> }
>
> --
> 2.43.0.rc2.451.g8631bc7472-goog
>
>
>
> ___
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 4/4] f2fs: let's finish or reset zones all the time

2023-12-04 Thread Daeho Jeong
LGTM

On Mon, Dec 4, 2023 at 10:07 AM Jaegeuk Kim  wrote:
>
> In order to limit # of open zones, let's finish or reset zones given # of
> valid blocks per section and its zone condition.
>
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/segment.c | 74 +++
>  1 file changed, 17 insertions(+), 57 deletions(-)
>
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 9081c9af977a..5696a4d381ff 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -4870,82 +4870,44 @@ static int check_zone_write_pointer(struct 
> f2fs_sb_info *sbi,
> struct f2fs_dev_info *fdev,
> struct blk_zone *zone)
>  {
> -   unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
> -   block_t zone_block, wp_block, last_valid_block;
> +   unsigned int zone_segno;
> +   block_t zone_block, wp_block, valid_block_cnt;
> unsigned int log_sectors_per_block = sbi->log_blocksize - 
> SECTOR_SHIFT;
> -   int i, s, b, ret;
> -   struct seg_entry *se;
> +   int ret;
>
> if (zone->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
> return 0;
>
> wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
> -   wp_segno = GET_SEGNO(sbi, wp_block);
> -   wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
> zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
> zone_segno = GET_SEGNO(sbi, zone_block);
> -   zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
> -
> -   if (zone_segno >= MAIN_SEGS(sbi))
> -   return 0;
>
> /*
>  * Skip check of zones cursegs point to, since
>  * fix_curseg_write_pointer() checks them.
>  */
> -   for (i = 0; i < NO_CHECK_TYPE; i++)
> -   if (zone_secno == GET_SEC_FROM_SEG(sbi,
> -  CURSEG_I(sbi, i)->segno))
> -   return 0;
> +   if (zone_segno >= MAIN_SEGS(sbi) ||
> +   IS_CURSEC(sbi, GET_SEC_FROM_SEG(sbi, zone_segno)))
> +   return 0;
>
> /*
> -* Get last valid block of the zone.
> +* Get # of valid block of the zone.
>  */
> -   last_valid_block = zone_block - 1;
> -   for (s = sbi->segs_per_sec - 1; s >= 0; s--) {
> -   segno = zone_segno + s;
> -   se = get_seg_entry(sbi, segno);
> -   for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
> -   if (f2fs_test_bit(b, se->cur_valid_map)) {
> -   last_valid_block = START_BLOCK(sbi, segno) + 
> b;
> -   break;
> -   }
> -   if (last_valid_block >= zone_block)
> -   break;
> -   }
> +   valid_block_cnt = get_valid_blocks(sbi, zone_segno, true);
>
> -   /*
> -* When safely unmounted in the previous mount, we can trust write
> -* pointers. Otherwise, finish zones.
> -*/
> -   if (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
> -   /*
> -* The write pointer matches with the valid blocks or
> -* already points to the end of the zone.
> -*/
> -   if ((last_valid_block + 1 == wp_block) ||
> -   (zone->wp == zone->start + zone->len))
> -   return 0;
> -   }
> +   if ((!valid_block_cnt && zone->cond == BLK_ZONE_COND_EMPTY) ||
> +   (valid_block_cnt && zone->cond == BLK_ZONE_COND_FULL))
> +   return 0;
>
> -   if (last_valid_block + 1 == zone_block) {
> -   if (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
> -   /*
> -* If there is no valid block in the zone and if write
> -* pointer is not at zone start, reset the write
> -* pointer.
> -*/
> -   f2fs_notice(sbi,
> - "Zone without valid block has non-zero write "
> - "pointer. Reset the write pointer: 
> wp[0x%x,0x%x]",
> - wp_segno, wp_blkoff);
> -   }
> +   if (!valid_block_cnt) {
> +   f2fs_notice(sbi, "Zone without valid block has non-zero write 
> "
> +   "pointer. Reset the write pointer: cond[0x%x]",
> +   zone->cond);
> ret = __f2fs_issue_discard_zone(sbi, fdev->bdev, zone_block,
> zone->len >> log_sectors_per_block);
> if (ret)
> f2fs_err(sbi, "Discard zone failed: %s (errno=%d)",
>  fdev->path, ret);
> -
> return ret;
> }
>
> @@ -4957,10 +4919,8 @@ static int check_zone_write_pointer(struct 
> f2fs_sb_info 

Re: [f2fs-dev] [PATCH] fsck.f2fs: run full scan if checkpoint is disabled

2023-12-04 Thread Daeho Jeong
LGTM

On Mon, Dec 4, 2023 at 10:07 AM Jaegeuk Kim  wrote:
>
> Let's fix any inconsistency until checkpint being enabled back.
>
> Signed-off-by: Jaegeuk Kim 
> ---
>  fsck/mount.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fsck/mount.c b/fsck/mount.c
> index e957904494ef..30c62280b281 100644
> --- a/fsck/mount.c
> +++ b/fsck/mount.c
> @@ -1435,6 +1435,7 @@ static int f2fs_should_proceed(struct f2fs_super_block 
> *sb, u32 flag)
>  {
> if (!c.fix_on && (c.auto_fix || c.preen_mode)) {
> if (flag & CP_FSCK_FLAG ||
> +   flag & CP_DISABLED_FLAG ||
> flag & CP_QUOTA_NEED_FSCK_FLAG ||
> c.abnormal_stop || c.fs_errors ||
> (exist_qf_ino(sb) && (flag & CP_ERROR_FLAG))) {
> --
> 2.43.0.rc2.451.g8631bc7472-goog
>
>
>
> ___
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 3/4] f2fs: check write pointers when checkpoint=disable

2023-12-04 Thread Daeho Jeong
LGTM

On Mon, Dec 4, 2023 at 10:06 AM Jaegeuk Kim  wrote:
>
> Even if f2fs was rebooted as staying checkpoint=disable, let's match the write
> pointers all the time.
>
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/super.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 617340e9ea7f..9a874b4d1501 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -4741,7 +4741,7 @@ static int f2fs_fill_super(struct super_block *sb, void 
> *data, int silent)
>  #ifdef CONFIG_QUOTA
> f2fs_recover_quota_end(sbi, quota_enabled);
>  #endif
> -
> +reset_checkpoint:
> /*
>  * If the f2fs is not readonly and fsync data recovery succeeds,
>  * check zoned block devices' write pointer consistency.
> @@ -4752,7 +4752,6 @@ static int f2fs_fill_super(struct super_block *sb, void 
> *data, int silent)
> goto free_meta;
> }
>
> -reset_checkpoint:
> f2fs_init_inmem_curseg(sbi);
>
> /* f2fs_recover_fsync_data() cleared this already */
> --
> 2.43.0.rc2.451.g8631bc7472-goog
>
>
>
> ___
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH v2] f2fs: Restrict max filesize for 16K f2fs

2023-12-04 Thread Daniel Rosenberg via Linux-f2fs-devel
Blocks are tracked by u32, so the max permitted filesize is
U32_MAX * BLOCK_SIZE. Additionally, in order to support crypto data unit
sizes of 4K with a 16K block size with IV_INO_LBLK_{32,63}, we must
further restrict max filesize to U32_MAX * 4096. This does not affect 4K
blocksize f2fs as the natural limit for files are well below that.

Fixes: d7e9a9037de2 ("f2fs: Support Block Size == Page Size")
Signed-off-by: Daniel Rosenberg 
---
 fs/f2fs/super.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 033af907c3b1..18a2189a0dc4 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3364,6 +3364,18 @@ loff_t max_file_blocks(struct inode *inode)
leaf_count *= NIDS_PER_BLOCK;
result += leaf_count;
 
+   /*
+* For compatibility with FSCRYPT_POLICY_IV_INO_LBLK_{64,32} with a
+* 4K crypto data unit, we must restrict the max filesize to what can
+* fit within U32_MAX data units.
+*
+* Since the blocksize must currently be equal to the page size,
+* we can use a constant for that. Note if this is not the case
+* in the future that inode is NULL while setting up the superblock.
+*/
+
+   result = min(result, ((loff_t) U32_MAX * 4096) >> F2FS_BLKSIZE_BITS);
+
return result;
 }
 

base-commit: d346fa09abff46988de9267b67b6900d9913d5a2
-- 
2.43.0.rc2.451.g8631bc7472-goog



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH] fsck.f2fs: run full scan if checkpoint is disabled

2023-12-04 Thread Jaegeuk Kim
Thanks. Let me add reviewed-by. :)

On 12/04, Daeho Jeong wrote:
> LGTM
> 
> On Mon, Dec 4, 2023 at 10:07 AM Jaegeuk Kim  wrote:
> >
> > Let's fix any inconsistency until checkpint being enabled back.
> >
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  fsck/mount.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/fsck/mount.c b/fsck/mount.c
> > index e957904494ef..30c62280b281 100644
> > --- a/fsck/mount.c
> > +++ b/fsck/mount.c
> > @@ -1435,6 +1435,7 @@ static int f2fs_should_proceed(struct 
> > f2fs_super_block *sb, u32 flag)
> >  {
> > if (!c.fix_on && (c.auto_fix || c.preen_mode)) {
> > if (flag & CP_FSCK_FLAG ||
> > +   flag & CP_DISABLED_FLAG ||
> > flag & CP_QUOTA_NEED_FSCK_FLAG ||
> > c.abnormal_stop || c.fs_errors ||
> > (exist_qf_ino(sb) && (flag & CP_ERROR_FLAG))) {
> > --
> > 2.43.0.rc2.451.g8631bc7472-goog
> >
> >
> >
> > ___
> > Linux-f2fs-devel mailing list
> > Linux-f2fs-devel@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH v2] f2fs: Restrict max filesize for 16K f2fs

2023-12-04 Thread Eric Biggers
On Mon, Dec 04, 2023 at 03:46:15PM -0800, Daniel Rosenberg via Linux-f2fs-devel 
wrote:
> Blocks are tracked by u32, so the max permitted filesize is
> U32_MAX * BLOCK_SIZE. Additionally, in order to support crypto data unit
> sizes of 4K with a 16K block size with IV_INO_LBLK_{32,63}, we must

{32,63} should be {32,64}

> + /*
> +  * For compatibility with FSCRYPT_POLICY_IV_INO_LBLK_{64,32} with a
> +  * 4K crypto data unit, we must restrict the max filesize to what can
> +  * fit within U32_MAX data units.

FSCRYPT_POLICY_IV_INO_LBLK_{64,32} should be
FSCRYPT_POLICY_FLAG_IV_INO_LBLK_{64,32}

> +  *
> +  * Since the blocksize must currently be equal to the page size,
> +  * we can use a constant for that. Note if this is not the case
> +  * in the future that inode is NULL while setting up the superblock.

I'm not sure what the last sentence is trying to say.

> +  */
> +
> + result = min(result, ((loff_t) U32_MAX * 4096) >> F2FS_BLKSIZE_BITS);

Is it intentional that this is off by 1?  If indices can be up to U32_MAX, then
the maximum size is U32_MAX + 1.  It's not a bad idea to go with the lower size,
so that max_index + 1 does not overflow.  But that's not what the explanation
says, so this seems to be accidental.

- Eric


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH 1/4] f2fs: allocate new section if it's not new

2023-12-04 Thread Jaegeuk Kim
If fsck can allocate a new zone, it'd be better to use that instead of
allocating a new one.

And, it modifies kernel messages.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/segment.c | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 08e2f44a1264..9081c9af977a 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -4949,20 +4949,18 @@ static int check_zone_write_pointer(struct f2fs_sb_info 
*sbi,
return ret;
}
 
-   if (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
-   /*
-* If there are valid blocks and the write pointer doesn't match
-* with them, we need to report the inconsistency and fill
-* the zone till the end to close the zone. This inconsistency
-* does not cause write error because the zone will not be
-* selected for write operation until it get discarded.
-*/
-   f2fs_notice(sbi, "Valid blocks are not aligned with write "
+   /*
+* If there are valid blocks and the write pointer doesn't match
+* with them, we need to report the inconsistency and fill
+* the zone till the end to close the zone. This inconsistency
+* does not cause write error because the zone will not be
+* selected for write operation until it get discarded.
+*/
+   f2fs_notice(sbi, "Valid blocks are not aligned with write "
"pointer: valid block[0x%x,0x%x] wp[0x%x,0x%x]",
GET_SEGNO(sbi, last_valid_block),
GET_BLKOFF_FROM_SEG0(sbi, last_valid_block),
wp_segno, wp_blkoff);
-   }
 
ret = blkdev_zone_mgmt(fdev->bdev, REQ_OP_ZONE_FINISH,
zone->start, zone->len, GFP_NOFS);
@@ -5053,15 +5051,18 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info 
*sbi, int type)
f2fs_notice(sbi, "Unaligned curseg[%d] with write pointer: "
"curseg[0x%x,0x%x] wp[0x%x,0x%x]", type, cs->segno,
cs->next_blkoff, wp_segno, wp_blkoff);
-   } else {
-   f2fs_notice(sbi, "Not successfully unmounted in the previous "
-   "mount");
}
 
-   f2fs_notice(sbi, "Assign new section to curseg[%d]: "
-   "curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
+   /* Allocate a new section if it's not new. */
+   if (cs->next_blkoff) {
+   unsigned int old_segno = cs->segno, old_blkoff = 
cs->next_blkoff;
 
-   f2fs_allocate_new_section(sbi, type, true);
+   f2fs_allocate_new_section(sbi, type, true);
+   f2fs_notice(sbi, "Assign new section to curseg[%d]: "
+   "[0x%x,0x%x] -> [0x%x,0x%x]",
+   type, old_segno, old_blkoff,
+   cs->segno, cs->next_blkoff);
+   }
 
/* check consistency of the zone curseg pointed to */
if (check_zone_write_pointer(sbi, zbd, ))
-- 
2.43.0.rc2.451.g8631bc7472-goog



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH 4/4] f2fs: let's finish or reset zones all the time

2023-12-04 Thread Jaegeuk Kim
In order to limit # of open zones, let's finish or reset zones given # of
valid blocks per section and its zone condition.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/segment.c | 74 +++
 1 file changed, 17 insertions(+), 57 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 9081c9af977a..5696a4d381ff 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -4870,82 +4870,44 @@ static int check_zone_write_pointer(struct f2fs_sb_info 
*sbi,
struct f2fs_dev_info *fdev,
struct blk_zone *zone)
 {
-   unsigned int wp_segno, wp_blkoff, zone_secno, zone_segno, segno;
-   block_t zone_block, wp_block, last_valid_block;
+   unsigned int zone_segno;
+   block_t zone_block, wp_block, valid_block_cnt;
unsigned int log_sectors_per_block = sbi->log_blocksize - SECTOR_SHIFT;
-   int i, s, b, ret;
-   struct seg_entry *se;
+   int ret;
 
if (zone->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
return 0;
 
wp_block = fdev->start_blk + (zone->wp >> log_sectors_per_block);
-   wp_segno = GET_SEGNO(sbi, wp_block);
-   wp_blkoff = wp_block - START_BLOCK(sbi, wp_segno);
zone_block = fdev->start_blk + (zone->start >> log_sectors_per_block);
zone_segno = GET_SEGNO(sbi, zone_block);
-   zone_secno = GET_SEC_FROM_SEG(sbi, zone_segno);
-
-   if (zone_segno >= MAIN_SEGS(sbi))
-   return 0;
 
/*
 * Skip check of zones cursegs point to, since
 * fix_curseg_write_pointer() checks them.
 */
-   for (i = 0; i < NO_CHECK_TYPE; i++)
-   if (zone_secno == GET_SEC_FROM_SEG(sbi,
-  CURSEG_I(sbi, i)->segno))
-   return 0;
+   if (zone_segno >= MAIN_SEGS(sbi) ||
+   IS_CURSEC(sbi, GET_SEC_FROM_SEG(sbi, zone_segno)))
+   return 0;
 
/*
-* Get last valid block of the zone.
+* Get # of valid block of the zone.
 */
-   last_valid_block = zone_block - 1;
-   for (s = sbi->segs_per_sec - 1; s >= 0; s--) {
-   segno = zone_segno + s;
-   se = get_seg_entry(sbi, segno);
-   for (b = sbi->blocks_per_seg - 1; b >= 0; b--)
-   if (f2fs_test_bit(b, se->cur_valid_map)) {
-   last_valid_block = START_BLOCK(sbi, segno) + b;
-   break;
-   }
-   if (last_valid_block >= zone_block)
-   break;
-   }
+   valid_block_cnt = get_valid_blocks(sbi, zone_segno, true);
 
-   /*
-* When safely unmounted in the previous mount, we can trust write
-* pointers. Otherwise, finish zones.
-*/
-   if (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
-   /*
-* The write pointer matches with the valid blocks or
-* already points to the end of the zone.
-*/
-   if ((last_valid_block + 1 == wp_block) ||
-   (zone->wp == zone->start + zone->len))
-   return 0;
-   }
+   if ((!valid_block_cnt && zone->cond == BLK_ZONE_COND_EMPTY) ||
+   (valid_block_cnt && zone->cond == BLK_ZONE_COND_FULL))
+   return 0;
 
-   if (last_valid_block + 1 == zone_block) {
-   if (is_set_ckpt_flags(sbi, CP_UMOUNT_FLAG)) {
-   /*
-* If there is no valid block in the zone and if write
-* pointer is not at zone start, reset the write
-* pointer.
-*/
-   f2fs_notice(sbi,
- "Zone without valid block has non-zero write "
- "pointer. Reset the write pointer: wp[0x%x,0x%x]",
- wp_segno, wp_blkoff);
-   }
+   if (!valid_block_cnt) {
+   f2fs_notice(sbi, "Zone without valid block has non-zero write "
+   "pointer. Reset the write pointer: cond[0x%x]",
+   zone->cond);
ret = __f2fs_issue_discard_zone(sbi, fdev->bdev, zone_block,
zone->len >> log_sectors_per_block);
if (ret)
f2fs_err(sbi, "Discard zone failed: %s (errno=%d)",
 fdev->path, ret);
-
return ret;
}
 
@@ -4957,10 +4919,8 @@ static int check_zone_write_pointer(struct f2fs_sb_info 
*sbi,
 * selected for write operation until it get discarded.
 */
f2fs_notice(sbi, "Valid blocks are not aligned with write "
-   "pointer: valid block[0x%x,0x%x] wp[0x%x,0x%x]",
-   GET_SEGNO(sbi, 

[f2fs-dev] [PATCH] fsck.f2fs: run full scan if checkpoint is disabled

2023-12-04 Thread Jaegeuk Kim
Let's fix any inconsistency until checkpint being enabled back.

Signed-off-by: Jaegeuk Kim 
---
 fsck/mount.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fsck/mount.c b/fsck/mount.c
index e957904494ef..30c62280b281 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -1435,6 +1435,7 @@ static int f2fs_should_proceed(struct f2fs_super_block 
*sb, u32 flag)
 {
if (!c.fix_on && (c.auto_fix || c.preen_mode)) {
if (flag & CP_FSCK_FLAG ||
+   flag & CP_DISABLED_FLAG ||
flag & CP_QUOTA_NEED_FSCK_FLAG ||
c.abnormal_stop || c.fs_errors ||
(exist_qf_ino(sb) && (flag & CP_ERROR_FLAG))) {
-- 
2.43.0.rc2.451.g8631bc7472-goog



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH 3/4] f2fs: check write pointers when checkpoint=disable

2023-12-04 Thread Jaegeuk Kim
Even if f2fs was rebooted as staying checkpoint=disable, let's match the write
pointers all the time.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/super.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 617340e9ea7f..9a874b4d1501 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -4741,7 +4741,7 @@ static int f2fs_fill_super(struct super_block *sb, void 
*data, int silent)
 #ifdef CONFIG_QUOTA
f2fs_recover_quota_end(sbi, quota_enabled);
 #endif
-
+reset_checkpoint:
/*
 * If the f2fs is not readonly and fsync data recovery succeeds,
 * check zoned block devices' write pointer consistency.
@@ -4752,7 +4752,6 @@ static int f2fs_fill_super(struct super_block *sb, void 
*data, int silent)
goto free_meta;
}
 
-reset_checkpoint:
f2fs_init_inmem_curseg(sbi);
 
/* f2fs_recover_fsync_data() cleared this already */
-- 
2.43.0.rc2.451.g8631bc7472-goog



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] [PATCH 2/4] f2fs: fix write pointers on zoned device after roll forward

2023-12-04 Thread Jaegeuk Kim
1. do roll forward recovery
2. update current segments pointers
3. fix the entire zones' write pointers
4. do checkpoint

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/recovery.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 16415c770b45..d0f24ccbd1ac 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -917,6 +917,8 @@ int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool 
check_only)
if (!err && fix_curseg_write_pointer && !f2fs_readonly(sbi->sb) &&
f2fs_sb_has_blkzoned(sbi)) {
err = f2fs_fix_curseg_write_pointer(sbi);
+   if (!err)
+   err = f2fs_check_write_pointer(sbi);
ret = err;
}
 
-- 
2.43.0.rc2.451.g8631bc7472-goog



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] Performance drop due to alloc_workqueue() misuse and recent change

2023-12-04 Thread Tejun Heo
Hello,

On Mon, Dec 04, 2023 at 04:03:47PM +, Naohiro Aota wrote:
> Recently, commit 636b927eba5b ("workqueue: Make unbound workqueues to use
> per-cpu pool_workqueues") changed WQ_UNBOUND workqueue's behavior. It
> changed the meaning of alloc_workqueue()'s max_active from an upper limit
> imposed per NUMA node to a limit per CPU. As a result, massive number of
> workers can be running at the same time, especially if the workqueue user
> thinks the max_active is a global limit.
> 
> Actually, it is already written it is per-CPU limit in the documentation
> before the commit. However, several callers seem to misuse max_active,
> maybe thinking it is a global limit. It is an unexpected behavior change
> for them.

Right, and the behavior has been like that for a very long time and there
was no other way to achieve reasonable level of concurrency, so the current
situation is expected.

> For example, these callers set max_active = num_online_cpus(), which is a
> suspicious limit applying to per-CPU. This config means we can have nr_cpu
> * nr_cpu active tasks working at the same time.

Yeah, that sounds like a good indicator.

> fs/f2fs/data.c: sbi->post_read_wq = alloc_workqueue("f2fs_post_read_wq",
> fs/f2fs/data.c-  WQ_UNBOUND | 
> WQ_HIGHPRI,
> fs/f2fs/data.c-  num_online_cpus());
> 
> fs/crypto/crypto.c: fscrypt_read_workqueue = 
> alloc_workqueue("fscrypt_read_queue",
> fs/crypto/crypto.c-  WQ_UNBOUND | 
> WQ_HIGHPRI,
> fs/crypto/crypto.c-  
> num_online_cpus());
> 
> fs/verity/verify.c: fsverity_read_workqueue = 
> alloc_workqueue("fsverity_read_queue",
> fs/verity/verify.c-   WQ_HIGHPRI,
> fs/verity/verify.c-   
> num_online_cpus());
> 
> drivers/crypto/hisilicon/qm.c:  qm->wq = alloc_workqueue("%s", WQ_HIGHPRI | 
> WQ_MEM_RECLAIM |
> drivers/crypto/hisilicon/qm.c-   WQ_UNBOUND, 
> num_online_cpus(),
> drivers/crypto/hisilicon/qm.c-   pci_name(qm->pdev));
> 
> block/blk-crypto-fallback.c:blk_crypto_wq = 
> alloc_workqueue("blk_crypto_wq",
> block/blk-crypto-fallback.c-WQ_UNBOUND | 
> WQ_HIGHPRI |
> block/blk-crypto-fallback.c-
> WQ_MEM_RECLAIM, num_online_cpus());
> 
> drivers/md/dm-crypt.c:  cc->crypt_queue = 
> alloc_workqueue("kcryptd/%s",
> drivers/md/dm-crypt.c-
> WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_UNBOUND,
> drivers/md/dm-crypt.c-
> num_online_cpus(), devname);

Most of these work items are CPU bound but not completley so. e.g.
kcrypt_crypt_write_continue() does wait_for_completion(), so setting
max_active to 1 likely isn't what they want either. They mostly want some
reasonable system-wide concurrency limit w.r.t. the CPU count while keeping
some level of flexibility in terms of task placement.

The previous max_active wasn't great for this because its meaning changed
depending on the number of nodes. Now, the meaning doesn't change but it's
not really useful for the above purpose. It's only useful for avoiding
melting the system completely.

One way to go about it is to declare that concurrency level management for
unbound workqueue is on users but that seems not ideal given many use cases
would want it anyway.

Let me think it over but I think the right way to go about it is going the
other direction - ie. making max_active apply to the whole system regardless
of the number of nodes / ccx's / whatever.

> Furthermore, the change affects performance in a certain case.
> 
> Btrfs creates several WQ_UNBOUND workqueues with a default max_active =
> min(NRCPUS + 2, 8). As my machine has 96 CPUs with NUMA disabled, this
> max_active config allows running over 700 active works. Before the commit,
> it is limited to 8 if NUMA is disabled or limited to 16 if NUMA nodes is 2.
> 
> I reverted the workqueue code back to before the commit, and I ran the
> following fio command on RAID0 btrfs on 6 SSDs.
> 
> fio --group_reporting --eta=always --eta-interval=30s --eta-newline=30s \
> --rw=write --fallocate=none \
> --direct=1 --ioengine=libaio --iodepth=32 \
> --filesize=100G \
> --blocksize=64k \
> --time_based --runtime=300s \
> --end_fsync=1 \
> --directory=${MNT} \
> --name=writer --numjobs=32
> 
> By changing workqueue's max_active, the result varies.
> 
> - wq max_active=8   (intended limit by btrfs?)
>   WRITE: bw=2495MiB/s (2616MB/s), 2495MiB/s-2495MiB/s (2616MB/s-2616MB/s), 
> io=753GiB (808GB), run=308953-308953msec
> - wq max_active=16  (actual limit on 2 NUMA nodes setup)
>   WRITE: bw=1736MiB/s (1820MB/s), 1736MiB/s-1736MiB/s (1820MB/s-1820MB/s), 
> io=670GiB (720GB), 

[f2fs-dev] [PATCH v3] f2fs: Restrict max filesize for 16K f2fs

2023-12-04 Thread Daniel Rosenberg via Linux-f2fs-devel
Blocks are tracked by u32, so the max permitted filesize is
(U32_MAX + 1) * BLOCK_SIZE. Additionally, in order to support crypto
data unit sizes of 4K with a 16K block with IV_INO_LBLK_{32,64}, we must
further restrict max filesize to (U32_MAX + 1) * 4096. This does not
affect 4K blocksize f2fs as the natural limit for files are well below
that.

Fixes: d7e9a9037de2 ("f2fs: Support Block Size == Page Size")
Signed-off-by: Daniel Rosenberg 
---
 fs/f2fs/super.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 033af907c3b1..5dfbc6b4c0ac 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -3364,6 +3364,14 @@ loff_t max_file_blocks(struct inode *inode)
leaf_count *= NIDS_PER_BLOCK;
result += leaf_count;
 
+   /*
+* For compatibility with FSCRYPT_POLICY_FLAG_IV_INO_LBLK_{64,32} with
+* a 4K crypto data unit, we must restrict the max filesize to what can
+* fit within U32_MAX + 1 data units.
+*/
+
+   result = min(result, (((loff_t)U32_MAX + 1) * 4096) >> 
F2FS_BLKSIZE_BITS);
+
return result;
 }
 

base-commit: d346fa09abff46988de9267b67b6900d9913d5a2
-- 
2.43.0.rc2.451.g8631bc7472-goog



___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] Weird EROFS data corruption

2023-12-04 Thread Gao Xiang

Hi Juhyung,

On 2023/12/4 11:41, Juhyung Park wrote:

...




- Could you share the full message about the output of `lscpu`?


Sure:

Architecture:x86_64
   CPU op-mode(s):32-bit, 64-bit
   Address sizes: 39 bits physical, 48 bits virtual
   Byte Order:Little Endian
CPU(s):  8
   On-line CPU(s) list:   0-7
Vendor ID:   GenuineIntel
   BIOS Vendor ID:Intel(R) Corporation
   Model name:11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
 BIOS Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None 
CPU
   @ 3.0GHz
 BIOS CPU family: 198
 CPU family:  6
 Model:   140
 Thread(s) per core:  2
 Core(s) per socket:  4
 Socket(s):   1
 Stepping:1
 CPU(s) scaling MHz:  60%
 CPU max MHz: 4800.
 CPU min MHz: 400.
 BogoMIPS:5990.40
 Flags:   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mc
  a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
  ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc 
art
   arch_perfmon pebs bts rep_good nopl xtopology 
nonstop_
  tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq 
dtes6
  4 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 
xt
  pr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt 
tsc_dead
  line_timer aes xsave avx f16c rdrand lahf_lm abm 
3dnowp
  refetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb 
st
  ibp ibrs_enhanced tpr_shadow flexpriority ept vpid 
ept_
  ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms 
invpcid
   rdt_a avx512f avx512dq rdseed adx smap avx512ifma 
clfl
  ushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl
  xsaveopt xsavec xgetbv1 xsaves split_lock_detect 
dtherm
   ida arat pln pts hwp hwp_notify hwp_act_window 
hwp_epp
   hwp_pkg_req vnmi avx512vbmi umip pku ospke 
avx512_vbmi
  2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme 
av
  x512_vpopcntdq rdpid movdiri movdir64b fsrm 
avx512_vp2i


Sigh, I've been thinking.  Here FSRM is the most significant difference between
our environments, could you only try the following diff to see if there's any
difference anymore? (without the previous disable patch.)

diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
index 1b60ae81ecd8..1b52a913233c 100644
--- a/arch/x86/lib/memmove_64.S
+++ b/arch/x86/lib/memmove_64.S
@@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
 #define CHECK_LEN  cmp $0x20, %rdx; jb 1f
 #define MEMMOVE_BYTES  movq %rdx, %rcx; rep movsb; RET
 .Lmemmove_begin_forward:
-   ALTERNATIVE_2 __stringify(CHECK_LEN), \
- __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
- __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
+   CHECK_LEN
 
 	/*

 * movsq instruction have many startup latency

Thanks,
Gao Xiang


___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


[f2fs-dev] Performance drop due to alloc_workqueue() misuse and recent change

2023-12-04 Thread Naohiro Aota via Linux-f2fs-devel
Recently, commit 636b927eba5b ("workqueue: Make unbound workqueues to use
per-cpu pool_workqueues") changed WQ_UNBOUND workqueue's behavior. It
changed the meaning of alloc_workqueue()'s max_active from an upper limit
imposed per NUMA node to a limit per CPU. As a result, massive number of
workers can be running at the same time, especially if the workqueue user
thinks the max_active is a global limit.

Actually, it is already written it is per-CPU limit in the documentation
before the commit. However, several callers seem to misuse max_active,
maybe thinking it is a global limit. It is an unexpected behavior change
for them.

For example, these callers set max_active = num_online_cpus(), which is a
suspicious limit applying to per-CPU. This config means we can have nr_cpu
* nr_cpu active tasks working at the same time.

fs/f2fs/data.c: sbi->post_read_wq = alloc_workqueue("f2fs_post_read_wq",
fs/f2fs/data.c-  WQ_UNBOUND | 
WQ_HIGHPRI,
fs/f2fs/data.c-  num_online_cpus());

fs/crypto/crypto.c: fscrypt_read_workqueue = 
alloc_workqueue("fscrypt_read_queue",
fs/crypto/crypto.c-  WQ_UNBOUND | 
WQ_HIGHPRI,
fs/crypto/crypto.c-  
num_online_cpus());

fs/verity/verify.c: fsverity_read_workqueue = 
alloc_workqueue("fsverity_read_queue",
fs/verity/verify.c-   WQ_HIGHPRI,
fs/verity/verify.c-   
num_online_cpus());

drivers/crypto/hisilicon/qm.c:  qm->wq = alloc_workqueue("%s", WQ_HIGHPRI | 
WQ_MEM_RECLAIM |
drivers/crypto/hisilicon/qm.c-   WQ_UNBOUND, 
num_online_cpus(),
drivers/crypto/hisilicon/qm.c-   pci_name(qm->pdev));

block/blk-crypto-fallback.c:blk_crypto_wq = alloc_workqueue("blk_crypto_wq",
block/blk-crypto-fallback.c-WQ_UNBOUND | 
WQ_HIGHPRI |
block/blk-crypto-fallback.c-WQ_MEM_RECLAIM, 
num_online_cpus());

drivers/md/dm-crypt.c:  cc->crypt_queue = alloc_workqueue("kcryptd/%s",
drivers/md/dm-crypt.c-
WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_UNBOUND,
drivers/md/dm-crypt.c-
num_online_cpus(), devname);

Furthermore, the change affects performance in a certain case.

Btrfs creates several WQ_UNBOUND workqueues with a default max_active =
min(NRCPUS + 2, 8). As my machine has 96 CPUs with NUMA disabled, this
max_active config allows running over 700 active works. Before the commit,
it is limited to 8 if NUMA is disabled or limited to 16 if NUMA nodes is 2.

I reverted the workqueue code back to before the commit, and I ran the
following fio command on RAID0 btrfs on 6 SSDs.

fio --group_reporting --eta=always --eta-interval=30s --eta-newline=30s \
--rw=write --fallocate=none \
--direct=1 --ioengine=libaio --iodepth=32 \
--filesize=100G \
--blocksize=64k \
--time_based --runtime=300s \
--end_fsync=1 \
--directory=${MNT} \
--name=writer --numjobs=32

By changing workqueue's max_active, the result varies.

- wq max_active=8   (intended limit by btrfs?)
  WRITE: bw=2495MiB/s (2616MB/s), 2495MiB/s-2495MiB/s (2616MB/s-2616MB/s), 
io=753GiB (808GB), run=308953-308953msec
- wq max_active=16  (actual limit on 2 NUMA nodes setup)
  WRITE: bw=1736MiB/s (1820MB/s), 1736MiB/s-1736MiB/s (1820MB/s-1820MB/s), 
io=670GiB (720GB), run=395532-395532msec
- wq max_active=768 (simulating current limit)
  WRITE: bw=1276MiB/s (1338MB/s), 1276MiB/s-1276MiB/s (1338MB/s-1338MB/s), 
io=375GiB (403GB), run=300984-300984msec

The current performance is slower than the previous limit (max_active=16)
by 27%, or it is 50% slower than the intended limit.  The performance drop
might be due to contention of the btrfs-endio-write works. There are over
700 kworker instances were created and 100 works are on the 'D' state
competing for a lock.

More specifically, I tested the same workload on the commit.

- At commit 636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu 
pool_workqueues")
  WRITE: bw=1191MiB/s (1249MB/s), 1191MiB/s-1191MiB/s (1249MB/s-1249MB/s), 
io=350GiB (376GB), run=300714-300714msec
- At the previous commit = 4cbfd3de73 ("workqueue: Call 
wq_update_unbound_numa() on all CPUs in NUMA node on CPU hotplug")
  WRITE: bw=1747MiB/s (1832MB/s), 1747MiB/s-1747MiB/s (1832MB/s-1832MB/s), 
io=748GiB (803GB), run=438134-438134msec

So, it is -31.8% performance down with the commit.

In summary, we misuse max_active, considering it is a global limit. And,
the recent commit introduced a huge performance drop in some cases.  We
need to review alloc_workqueue() usage to check if its max_active setting
is proper or not.

___
Linux-f2fs-devel mailing list