Re: [f2fs-dev] (2) [PATCH v6] f2fs: New victim selection for GC

2024-02-20 Thread Yonggil Song
> On Tue, Feb 13, 2024 at 5:36 PM Yonggil Song  wrote:
> >
> >
> > Overview
> > ========
> >
> > This patch introduces a new way to prefer data sections when selecting
> > GC victims. Migration of data blocks causes invalidation of node blocks.
> > Therefore, in situations where GC is frequent, selecting data blocks as
> > victims can reduce unnecessary block migration by invalidating node blocks.
> 
> Your approach will still allocate new node blocks, though, even as it
> invalidates current node blocks while moving data blocks. While your
> approach may work well with respect to WAF in a specific scenario, such
> as randomly overwriting an entire storage space with a huge file, it is
> important to consider its general applicability. For example, what
> about test performance? Performance optimization should encompass
> a wide range of user scenarios, and I am not convinced that this
> is the most efficient solution for most users. Can you provide more
> information about how your approach addresses the performance needs of
> a broader spectrum of user scenarios?
> 

Thank you for your review and feedback. I agree with your point.
I'll research and refine this approach for broader user scenarios.

> > For exceptional situations where free sections are insufficient, node blocks
> > are selected as victims instead of data blocks to get extra free sections.
> >
> > Problem
> > ===
> >
> > If the total amount of nodes is larger than the size of one section, nodes
> > occupy multiple sections, and node victims are often selected because their
> > GC cost is lowered by data block migration during GC. Since moving data
> > sections causes frequent node victim selection, victim thrashing occurs in
> > the node sections. This results in an increase in WAF.
> >
> > Experiment
> > ==
> >
> > Test environment is as follows.
> >
> > System info
> >   - 3.6GHz, 16 core CPU
> >   - 36GiB Memory
> > Device info
> >   - a conventional null_blk with 228MiB
> >   - a sequential null_blk with 4068 zones of 8MiB
> > Format
> >   - mkfs.f2fs  -c  -m -Z 8 -o 3.89
> > Mount
> >   - mount  
> > Fio script
> >   - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m 
> > --norandommap --overwrite=1 --name=job1 --filename=./mnt/sustain 
> > --io_size=128g
> > WAF calculation
> >   - (IOs on conv. null_blk + IOs on seq. null_blk) / random write 
> > IOs
> >
> > Conclusion
> > ==
> >
> > This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
> > the data section was selected first when selecting GC victims. This was
> > achieved by reducing the migration of the node blocks by 69.4%
> > (253,131,743 blks -> 77,463,278 blks). It is possible to achieve a low
> > WAF with this GC victim selection method in environments where the
> > section size is relatively small.
> >
> > Signed-off-by: Yonggil Song 
> > ---
> >  fs/f2fs/f2fs.h |  1 +
> >  fs/f2fs/gc.c   | 96 +++---
> >  fs/f2fs/gc.h   |  6 
> >  3 files changed, 82 insertions(+), 21 deletions(-)
> >
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 65294e3b0bef..b129f62ba541 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1654,6 +1654,7 @@ struct f2fs_sb_info {
> > struct f2fs_mount_info mount_opt;   /* mount options */
> >
> > /* for cleaning operations */
> > +   bool require_node_gc;   /* flag for node GC */
> > struct f2fs_rwsem gc_lock;  /*
> >  * semaphore for GC, avoid
> >  * race between GC and GC 
> > or CP
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index a079eebfb080..53a51a668567 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -341,6 +341,14 @@ static unsigned int get_cb_cost(struct f2fs_sb_info 
> > *sbi, unsigned int segno)
> > unsigned int i;
> > unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi, 
> > segno);
> >
> > +   /*
> > +* When BG_GC selects victims based on age, it prevents node victims
> > +* from being selected. This is because node blocks can be 
> > invalidated
> > +* by moving data blocks.
> > +*/
> > +   if (__skip_node_gc(sbi, segno))
> > +   return UINT_MAX;

[f2fs-dev] [PATCH v6] f2fs: New victim selection for GC

2024-02-13 Thread Yonggil Song


Overview
========

This patch introduces a new way to prefer data sections when selecting
GC victims. Migration of data blocks causes invalidation of node blocks.
Therefore, in situations where GC is frequent, selecting data blocks as
victims can reduce unnecessary block migration by invalidating node blocks.
For exceptional situations where free sections are insufficient, node blocks
are selected as victims instead of data blocks to get extra free sections.

Problem
===

If the total amount of nodes is larger than the size of one section, nodes
occupy multiple sections, and node victims are often selected because their
GC cost is lowered by data block migration during GC. Since moving data
sections causes frequent node victim selection, victim thrashing occurs in
the node sections. This results in an increase in WAF.

Experiment
==

Test environment is as follows.

System info
  - 3.6GHz, 16 core CPU
  - 36GiB Memory
Device info
  - a conventional null_blk with 228MiB
  - a sequential null_blk with 4068 zones of 8MiB
Format
  - mkfs.f2fs  -c  -m -Z 8 -o 3.89
Mount
  - mount  
Fio script
  - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap 
--overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
WAF calculation
  - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs
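
For scale: the fio job issues --io_size=128g of host random writes, so a
WAF of 18.75 corresponds to roughly 18.75 * 128 GiB = 2400 GiB reaching
the two null_blk devices, while 13.3 corresponds to about 1702 GiB (see
the Conclusion below).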

Conclusion
==

This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
the data section was selected first when selecting GC victims. This was
achieved by reducing the migration of the node blocks by 69.4%
(253,131,743 blks -> 77,463,278 blks). It is possible to achieve a low
WAF with this GC victim selection method in environments where the
section size is relatively small.

Signed-off-by: Yonggil Song 
---
 fs/f2fs/f2fs.h |  1 +
 fs/f2fs/gc.c   | 96 +++---
 fs/f2fs/gc.h   |  6 
 3 files changed, 82 insertions(+), 21 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 65294e3b0bef..b129f62ba541 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1654,6 +1654,7 @@ struct f2fs_sb_info {
struct f2fs_mount_info mount_opt;   /* mount options */
 
/* for cleaning operations */
+   bool require_node_gc;   /* flag for node GC */
struct f2fs_rwsem gc_lock;  /*
 * semaphore for GC, avoid
 * race between GC and GC or CP
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index a079eebfb080..53a51a668567 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -341,6 +341,14 @@ static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, 
unsigned int segno)
unsigned int i;
unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi, segno);
 
+   /*
+* When BG_GC selects victims based on age, it prevents node victims
+* from being selected. This is because node blocks can be invalidated
+* by moving data blocks.
+*/
+   if (__skip_node_gc(sbi, segno))
+   return UINT_MAX;
+
for (i = 0; i < usable_segs_per_sec; i++)
mtime += get_seg_entry(sbi, start + i)->mtime;
vblocks = get_valid_blocks(sbi, segno, true);
@@ -369,10 +377,24 @@ static inline unsigned int get_gc_cost(struct 
f2fs_sb_info *sbi,
return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
 
/* alloc_mode == LFS */
-   if (p->gc_mode == GC_GREEDY)
-   return get_valid_blocks(sbi, segno, true);
-   else if (p->gc_mode == GC_CB)
+   if (p->gc_mode == GC_GREEDY) {
+   /*
+* If a data block that the node block points to is GCed,
+* the node block is invalidated. For this reason, we add a
+* weight to the cost of node victims to give priority to data
+* victims during the gc process. However, in a situation
+* where we run out of free sections, we remove the weight
+* because we need to clean up node blocks.
+*/
+   unsigned int weight = 0;
+
+   if (__skip_node_gc(sbi, segno))
+   weight = BLKS_PER_SEC(sbi);
+
+   return get_valid_blocks(sbi, segno, true) + weight;
+   } else if (p->gc_mode == GC_CB) {
return get_cb_cost(sbi, segno);
+   }
 
f2fs_bug_on(sbi, 1);
return 0;
@@ -557,6 +579,14 @@ static void atgc_lookup_victim(struct f2fs_sb_info *sbi,
if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
goto skip;
 
+   /*
+* When BG_GC selects victims based on age, it prevents node victims
+* from being selected. This is because node blocks can be invalidated
+* by moving data blocks.
+*/

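The weight added in the GC_GREEDY branch above is one full section of
blocks, and a section can never hold more valid blocks than that, so while
require_node_gc stays false every node section costs strictly more than
any data section, and greedy selection drains data sections first. A
minimal toy model of that ordering (standalone C with invented names, not
the actual f2fs symbols):

#include <stdio.h>
#include <stdbool.h>

#define BLKS_PER_SEC 512U	/* assumed section size in blocks */

struct section {
	unsigned int valid_blocks;	/* 0..BLKS_PER_SEC */
	bool is_node;
};

/* Greedy cost: valid blocks, plus one full section of weight for node
 * sections while node GC is not required. Lower cost wins. */
static unsigned int gc_cost(const struct section *s, bool require_node_gc)
{
	unsigned int weight = 0;

	if (s->is_node && !require_node_gc)
		weight = BLKS_PER_SEC;
	return s->valid_blocks + weight;
}

int main(void)
{
	struct section data = { .valid_blocks = 500, .is_node = false };
	struct section node = { .valid_blocks = 10, .is_node = true };

	/* Node GC not required: the nearly full data section (cost 500)
	 * still beats the nearly empty node section (10 + 512 = 522). */
	printf("data=%u node=%u\n",
	       gc_cost(&data, false), gc_cost(&node, false));

	/* Free sections exhausted: the weight is dropped and the node
	 * section (cost 10) becomes the cheaper victim again. */
	printf("data=%u node=%u\n",
	       gc_cost(&data, true), gc_cost(&node, true));
	return 0;
}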
[f2fs-dev] [PATCH v5] f2fs: New victim selection for GC

2024-01-22 Thread Yonggil Song
Overview
========

This patch introduces a new way to prefer data sections when selecting
GC victims. Migration of data blocks causes invalidation of node blocks.
Therefore, in situations where GC is frequent, selecting data blocks as
victims can reduce unnecessary block migration by invalidating node blocks.
For exceptional situations where free sections are insufficient, node blocks
are selected as victims instead of data blocks to get extra free sections.

Problem
===

If the total amount of nodes is larger than the size of one section, nodes
occupy multiple sections, and node victims are often selected because their
GC cost is lowered by data block migration during GC. Since moving data
sections causes frequent node victim selection, victim thrashing occurs in
the node sections. This results in an increase in WAF.

Experiment
==

Test environment is as follows.

System info
  - 3.6GHz, 16 core CPU
  - 36GiB Memory
Device info
  - a conventional null_blk with 228MiB
  - a sequential null_blk with 4068 zones of 8MiB
Format
  - mkfs.f2fs  -c  -m -Z 8 -o 3.89
Mount
  - mount  
Fio script
  - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap 
--overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
WAF calculation
  - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs

Conclusion
==

This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
the data section was selected first when selecting GC victims. This was
achieved by reducing the migration of the node blocks by 69.4%
(253,131,743 blks -> 77,463,278 blks). It is possible to achieve a low
WAF with this GC victim selection method in environments where the
section size is relatively small.

Signed-off-by: Yonggil Song 
---
 fs/f2fs/f2fs.h |  1 +
 fs/f2fs/gc.c   | 96 +++---
 fs/f2fs/gc.h   |  6 
 3 files changed, 82 insertions(+), 21 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 9043cedfa12b..b2c0adfb2704 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1649,6 +1649,7 @@ struct f2fs_sb_info {
struct f2fs_mount_info mount_opt;   /* mount options */
 
/* for cleaning operations */
+   bool require_node_gc;   /* flag for node GC */
struct f2fs_rwsem gc_lock;  /*
 * semaphore for GC, avoid
 * race between GC and GC or CP
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index f550cdeaa663..ae1e960eaf5a 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -341,6 +341,14 @@ static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, 
unsigned int segno)
unsigned int i;
unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi, segno);
 
+   /*
+* When BG_GC selects victims based on age, it prevents node victims
+* from being selected. This is because node blocks can be invalidated
+* by moving data blocks.
+*/
+   if (__skip_node_gc(sbi, segno))
+   return UINT_MAX;
+
for (i = 0; i < usable_segs_per_sec; i++)
mtime += get_seg_entry(sbi, start + i)->mtime;
vblocks = get_valid_blocks(sbi, segno, true);
@@ -369,10 +377,24 @@ static inline unsigned int get_gc_cost(struct 
f2fs_sb_info *sbi,
return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
 
/* alloc_mode == LFS */
-   if (p->gc_mode == GC_GREEDY)
-   return get_valid_blocks(sbi, segno, true);
-   else if (p->gc_mode == GC_CB)
+   if (p->gc_mode == GC_GREEDY) {
+   /*
+* If a data block that the node block points to is GCed,
+* the node block is invalidated. For this reason, we add a
+* weight to the cost of node victims to give priority to data
+* victims during the gc process. However, in a situation
+* where we run out of free sections, we remove the weight
+* because we need to clean up node blocks.
+*/
+   unsigned int weight = 0;
+
+   if (__skip_node_gc(sbi, segno))
+   weight = sbi->segs_per_sec << sbi->log_blocks_per_seg;
+
+   return get_valid_blocks(sbi, segno, true) + weight;
+   } else if (p->gc_mode == GC_CB) {
return get_cb_cost(sbi, segno);
+   }
 
f2fs_bug_on(sbi, 1);
return 0;
@@ -557,6 +579,14 @@ static void atgc_lookup_victim(struct f2fs_sb_info *sbi,
if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
goto skip;
 
+   /*
+* When BG_GC selects victims based on age, it prevents node victims
+* from being selected. This is because node blocks can be invalidated
+* by moving data blocks.
+*/

Re: [f2fs-dev] (2) [PATCH v4] f2fs: New victim selection for GC

2024-01-03 Thread Yonggil Song
> On 12/28, Yonggil Song wrote:
> > >From d08b97183bc830779c82b83d94f8b75ad11cb29a Mon Sep 17 00:00:00 2001
> > From: Yonggil Song 
> > Date: Thu, 7 Dec 2023 16:34:38 +0900
> > Subject: [PATCH v4] f2fs: New victim selection for GC
> > 
> > Overview
> > ========
> > 
> > This patch introduces a new way to prefer data sections when selecting
> > GC victims. Migration of data blocks causes invalidation of node blocks.
> > Therefore, in situations where GC is frequent, selecting data blocks as
> > victims can reduce unnecessary block migration by invalidating node blocks.
> > For exceptional situations where free sections are insufficient, node blocks
> > are selected as victims instead of data blocks to get extra free sections.
> > 
> > Problem
> > ===
> > 
> > If the total amount of nodes is larger than the size of one section, nodes
> > occupy multiple sections, and node victims are often selected because their
> > GC cost is lowered by data block migration during GC. Since moving data
> > sections causes frequent node victim selection, victim thrashing occurs in
> > the node sections. This results in an increase in WAF.
> > 
> > Experiment
> > ==
> > 
> > Test environment is as follows.
> > 
> > System info
> >   - 3.6GHz, 16 core CPU
> >   - 36GiB Memory
> > Device info
> >   - a conventional null_blk with 228MiB
> >   - a sequential null_blk with 4068 zones of 8MiB
> > Format
> >   - mkfs.f2fs  -c  -m -Z 8 -o 3.89
> > Mount
> >   - mount  
> > Fio script
> >   - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap 
> > --overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
> > WAF calculation
> >   - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs
> > 
> > Conclusion
> > ==
> > 
> > This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
> > the data section was selected first when selecting GC victims. This was
> > achieved by reducing the migration of the node blocks by 69.4%
> > (253,131,743 blks -> 77,463,278 blks). It is possible to achieve a low
> > WAF with this GC victim selection method in environments where the
> > section size is relatively small.
> > 
> > Signed-off-by: Yonggil Song 
> > ---
> >  fs/f2fs/f2fs.h |  1 +
> >  fs/f2fs/gc.c   | 99 +++---
> >  fs/f2fs/gc.h   |  6 +++
> >  3 files changed, 85 insertions(+), 21 deletions(-)
> > 
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 9043cedfa12b..b2c0adfb2704 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1649,6 +1649,7 @@ struct f2fs_sb_info {
> > struct f2fs_mount_info mount_opt;   /* mount options */
> >  
> > /* for cleaning operations */
> > +   bool require_node_gc;   /* flag for node GC */
> > struct f2fs_rwsem gc_lock;  /*
> >  * semaphore for GC, avoid
> >  * race between GC and GC or CP
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index f550cdeaa663..d8a81a6ed325 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -341,6 +341,14 @@ static unsigned int get_cb_cost(struct f2fs_sb_info 
> > *sbi, unsigned int segno)
> > unsigned int i;
> > unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi, segno);
> >  
> > +   /*
> > +* When BG_GC selects victims based on age, it prevents node victims
> > +* from being selected. This is because node blocks can be invalidated
> > +* by moving data blocks.
> > +*/
> > +   if (__skip_node_gc(sbi, segno))
> > +   return UINT_MAX;
> > +
> > for (i = 0; i < usable_segs_per_sec; i++)
> > mtime += get_seg_entry(sbi, start + i)->mtime;
> > vblocks = get_valid_blocks(sbi, segno, true);
> > @@ -369,10 +377,24 @@ static inline unsigned int get_gc_cost(struct 
> > f2fs_sb_info *sbi,
> > return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
> >  
> > /* alloc_mode == LFS */
> > -   if (p->gc_mode == GC_GREEDY)
> > -   return get_valid_blocks(sbi, segno, true);
> > -   else if (p->gc_mode == GC_CB)
> > +   if (p->gc_mode == GC_GREEDY) {
> > +   /*
> > +* If a data block that the node block points to is GCed,
> > +* the node block is invalidated. For this reason, we add a
> > +* weight to the cost of node victims to give priority to data
> > +* victims during the gc process. However, in a situation
> > +* where we run out of free sections, we remove the weight
> > +* because we need to clean up node blocks.
> > +*/

[f2fs-dev] [PATCH v4] f2fs: New victim selection for GC

2023-12-27 Thread Yonggil Song
>From d08b97183bc830779c82b83d94f8b75ad11cb29a Mon Sep 17 00:00:00 2001
From: Yonggil Song 
Date: Thu, 7 Dec 2023 16:34:38 +0900
Subject: [PATCH v4] f2fs: New victim selection for GC

Overview
========

This patch introduces a new way to prefer data sections when selecting
GC victims. Migration of data blocks causes invalidation of node blocks.
Therefore, in situations where GC is frequent, selecting data blocks as
victims can reduce unnecessary block migration by invalidating node blocks.
For exceptional situations where free sections are insufficient, node blocks
are selected as victims instead of data blocks to get extra free sections.

Problem
===

If the total amount of nodes is larger than the size of one section, nodes
occupy multiple sections, and node victims are often selected because their
GC cost is lowered by data block migration during GC. Since moving data
sections causes frequent node victim selection, victim thrashing occurs in
the node sections. This results in an increase in WAF.

Experiment
==

Test environment is as follows.

System info
  - 3.6GHz, 16 core CPU
  - 36GiB Memory
Device info
  - a conventional null_blk with 228MiB
  - a sequential null_blk with 4068 zones of 8MiB
Format
  - mkfs.f2fs  -c  -m -Z 8 -o 3.89
Mount
  - mount  
Fio script
  - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap 
--overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
WAF calculation
  - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs

Conclusion
==

This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
the data section was selected first when selecting GC victims. This was
achieved by reducing the migration of the node blocks by 69.4%
(253,131,743 blks -> 77,463,278 blks). It is possible to achieve a low
WAF with this GC victim selection method in environments where the
section size is relatively small.

Signed-off-by: Yonggil Song 
---
 fs/f2fs/f2fs.h |  1 +
 fs/f2fs/gc.c   | 99 +++---
 fs/f2fs/gc.h   |  6 +++
 3 files changed, 85 insertions(+), 21 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 9043cedfa12b..b2c0adfb2704 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1649,6 +1649,7 @@ struct f2fs_sb_info {
struct f2fs_mount_info mount_opt;   /* mount options */
 
/* for cleaning operations */
+   bool require_node_gc;   /* flag for node GC */
struct f2fs_rwsem gc_lock;  /*
 * semaphore for GC, avoid
 * race between GC and GC or CP
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index f550cdeaa663..d8a81a6ed325 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -341,6 +341,14 @@ static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, 
unsigned int segno)
unsigned int i;
unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi, segno);
 
+   /*
+* When BG_GC selects victims based on age, it prevents node victims
+* from being selected. This is because node blocks can be invalidated
+* by moving data blocks.
+*/
+   if (__skip_node_gc(sbi, segno))
+   return UINT_MAX;
+
for (i = 0; i < usable_segs_per_sec; i++)
mtime += get_seg_entry(sbi, start + i)->mtime;
vblocks = get_valid_blocks(sbi, segno, true);
@@ -369,10 +377,24 @@ static inline unsigned int get_gc_cost(struct 
f2fs_sb_info *sbi,
return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
 
/* alloc_mode == LFS */
-   if (p->gc_mode == GC_GREEDY)
-   return get_valid_blocks(sbi, segno, true);
-   else if (p->gc_mode == GC_CB)
+   if (p->gc_mode == GC_GREEDY) {
+   /*
+* If a data block that the node block points to is GCed,
+* the node block is invalidated. For this reason, we add a
+* weight to the cost of node victims to give priority to data
+* victims during the gc process. However, in a situation
+* where we run out of free sections, we remove the weight
+* because we need to clean up node blocks.
+*/
+   unsigned int cost = get_valid_blocks(sbi, segno, true);
+
+   if (__skip_node_gc(sbi, segno))
+   return cost +
+   (sbi->segs_per_sec << sbi->log_blocks_per_seg);
+   return cost;
+   } else if (p->gc_mode == GC_CB) {
return get_cb_cost(sbi, segno);
+   }
 
f2fs_bug_on(sbi, 1);
return 0;
@@ -557,6 +579,14 @@ static void atgc_lookup_victim(struct f2fs_sb_info *sbi,
if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
goto skip;

Re: [f2fs-dev] (2) [PATCH v3] f2fs: New victim selection for GC

2023-12-26 Thread Yonggil Song
> On 12/21, Yonggil Song wrote:
> > Overview
> > ========
> > 
> > This patch introduces a new way to prefer data sections when selecting
> > GC victims. Migration of data blocks causes invalidation of node blocks.
> > Therefore, in situations where GC is frequent, selecting data blocks as
> > victims can reduce unnecessary block migration by invalidating node blocks.
> > For exceptional situations where free sections are insufficient, node blocks
> > are selected as victims instead of data blocks to get extra free sections.
> > 
> > Problem
> > ===
> > 
> > If the total amount of nodes is larger than the size of one section, nodes
> > occupy multiple sections, and node victims are often selected because their
> > GC cost is lowered by data block migration during GC. Since moving data
> > sections causes frequent node victim selection, victim thrashing occurs in
> > the node sections. This results in an increase in WAF.
> > 
> > Experiment
> > ==
> > 
> > Test environment is as follows.
> > 
> > System info
> >   - 3.6GHz, 16 core CPU
> >   - 36GiB Memory
> > Device info
> >   - a conventional null_blk with 228MiB
> >   - a sequential null_blk with 4068 zones of 8MiB
> > Format
> >   - mkfs.f2fs  -c  -m -Z 8 -o 3.89
> > Mount
> >   - mount  
> > Fio script
> >   - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap 
> > --overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
> > WAF calculation
> >   - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs
> > 
> > Conclusion
> > ==
> > 
> > This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
> > the data section was selected first when selecting GC victims. This was
> > achieved by reducing the migration of the node blocks by 69.4%
> > (253,131,743 blks -> 77,463,278 blks). It is possible to achieve a low
> > WAF with this GC victim selection method in environments where the
> > section size is relatively small.
> > 
> > Signed-off-by: Yonggil Song 
> > ---
> >  fs/f2fs/f2fs.h |   1 +
> >  fs/f2fs/gc.c   | 102 +++--
> >  fs/f2fs/gc.h   |   6 +++
> >  3 files changed, 88 insertions(+), 21 deletions(-)
> > 
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 9043cedfa12b..578d57f6022f 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1649,6 +1649,7 @@ struct f2fs_sb_info {
> > struct f2fs_mount_info mount_opt;   /* mount options */
> >  
> > /* for cleaning operations */
> > +   bool need_node_clean;   /* need to clean dirty nodes */
> 
>   bool require_node_gc;
> 
> > struct f2fs_rwsem gc_lock;  /*
> >  * semaphore for GC, avoid
> >  * race between GC and GC or CP
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index f550cdeaa663..da963765e087 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -341,6 +341,14 @@ static unsigned int get_cb_cost(struct f2fs_sb_info 
> > *sbi, unsigned int segno)
> > unsigned int i;
> > unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi, segno);
> >  
> > +   /*
> > +* When BG_GC selects victims based on age, it prevents node victims
> > +* from being selected. This is because node blocks can be invalidated
> > +* by moving data blocks.
> > +*/
> > +   if (is_skip(sbi, segno))
> > +   return UINT_MAX;
> > +
> > for (i = 0; i < usable_segs_per_sec; i++)
> > mtime += get_seg_entry(sbi, start + i)->mtime;
> > vblocks = get_valid_blocks(sbi, segno, true);
> > @@ -369,10 +377,27 @@ static inline unsigned int get_gc_cost(struct 
> > f2fs_sb_info *sbi,
> > return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
> >  
> > /* alloc_mode == LFS */
> > -   if (p->gc_mode == GC_GREEDY)
> > -   return get_valid_blocks(sbi, segno, true);
> > -   else if (p->gc_mode == GC_CB)
> > +   if (p->gc_mode == GC_GREEDY) {
> > +   unsigned int weight = 0;
> > +   unsigned int no_need = sbi->need_node_clean ? 0 : 1;
> > +   bool is_node =
> > +   IS_NODESEG(get_seg_entry(sbi, segno)->type);

[f2fs-dev] [PATCH v3] f2fs: New victim selection for GC

2023-12-20 Thread Yonggil Song
Overview
========

This patch introduces a new way to prefer data sections when selecting
GC victims. Migration of data blocks causes invalidation of node blocks.
Therefore, in situations where GC is frequent, selecting data blocks as
victims can reduce unnecessary block migration by invalidating node blocks.
For exceptional situations where free sections are insufficient, node blocks
are selected as victims instead of data blocks to get extra free sections.

Problem
===

If the total amount of nodes is larger than the size of one section, nodes
occupy multiple sections, and node victims are often selected because their
GC cost is lowered by data block migration during GC. Since moving data
sections causes frequent node victim selection, victim thrashing occurs in
the node sections. This results in an increase in WAF.

Experiment
==

Test environment is as follows.

System info
  - 3.6GHz, 16 core CPU
  - 36GiB Memory
Device info
  - a conventional null_blk with 228MiB
  - a sequential null_blk with 4068 zones of 8MiB
Format
  - mkfs.f2fs  -c  -m -Z 8 -o 3.89
Mount
  - mount  
Fio script
  - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap 
--overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
WAF calculation
  - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs

Conclusion
==

This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
the data section was selected first when selecting GC victims. This was
achieved by reducing the migration of the node blocks by 69.4%
(253,131,743 blks -> 77,463,278 blks). It is possible to achieve a low
WAF with this GC victim selection method in environments where the
section size is relatively small.

Signed-off-by: Yonggil Song 
---
 fs/f2fs/f2fs.h |   1 +
 fs/f2fs/gc.c   | 102 +++--
 fs/f2fs/gc.h   |   6 +++
 3 files changed, 88 insertions(+), 21 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 9043cedfa12b..578d57f6022f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1649,6 +1649,7 @@ struct f2fs_sb_info {
struct f2fs_mount_info mount_opt;   /* mount options */
 
/* for cleaning operations */
+   bool need_node_clean;   /* need to clean dirty nodes */
struct f2fs_rwsem gc_lock;  /*
 * semaphore for GC, avoid
 * race between GC and GC or CP
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index f550cdeaa663..da963765e087 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -341,6 +341,14 @@ static unsigned int get_cb_cost(struct f2fs_sb_info *sbi, 
unsigned int segno)
unsigned int i;
unsigned int usable_segs_per_sec = f2fs_usable_segs_in_sec(sbi, segno);
 
+   /*
+* When BG_GC selects victims based on age, it prevents node victims
+* from being selected. This is because node blocks can be invalidated
+* by moving data blocks.
+*/
+   if (is_skip(sbi, segno))
+   return UINT_MAX;
+
for (i = 0; i < usable_segs_per_sec; i++)
mtime += get_seg_entry(sbi, start + i)->mtime;
vblocks = get_valid_blocks(sbi, segno, true);
@@ -369,10 +377,27 @@ static inline unsigned int get_gc_cost(struct 
f2fs_sb_info *sbi,
return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
 
/* alloc_mode == LFS */
-   if (p->gc_mode == GC_GREEDY)
-   return get_valid_blocks(sbi, segno, true);
-   else if (p->gc_mode == GC_CB)
+   if (p->gc_mode == GC_GREEDY) {
+   unsigned int weight = 0;
+   unsigned int no_need = sbi->need_node_clean ? 0 : 1;
+   bool is_node =
+   IS_NODESEG(get_seg_entry(sbi, segno)->type);
+
+   /*
+* If a data block that the node block points to is GCed,
+* the node block is invalidated. For this reason, we add a
+* weight to the cost of node victims to give priority to data
+* victims during the gc process. However, in a situation
+* where we run out of free sections, we remove the weight
+* because we need to clean up node blocks.
+*/
+   weight = is_node ?
+   no_need * (sbi->blocks_per_seg * sbi->segs_per_sec) : 0;
+
+   return (get_valid_blocks(sbi, segno, true) + weight);
+   } else if (p->gc_mode == GC_CB) {
return get_cb_cost(sbi, segno);
+   }
 
f2fs_bug_on(sbi, 1);
return 0;
@@ -557,6 +582,14 @@ static void atgc_lookup_victim(struct f2fs_sb_info *sbi,
if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
goto skip;

Re: [f2fs-dev] (2) [PATCH v2] f2fs: New victim selection for GC

2023-12-13 Thread Yonggil Song
> On 12/08, Yonggil Song wrote:
> > Overview
> > ========
> > 
> > This patch introduces a new way to prefer data sections when selecting
> > GC victims. Migration of data blocks causes invalidation of node blocks.
> > Therefore, in situations where GC is frequent, selecting data blocks as
> > victims can reduce unnecessary block migration by invalidating node blocks.
> > For exceptional situations where free sections are insufficient, node blocks
> > are selected as victims instead of data blocks to get extra free sections.
> > 
> > Problem
> > ===
> > 
> > If the total amount of nodes is larger than the size of one section, nodes
> > occupy multiple sections, and node victims are often selected because their
> > GC cost is lowered by data block migration during GC. Since moving data
> > sections causes frequent node victim selection, victim thrashing occurs in
> > the node sections. This results in an increase in WAF.
> > 
> > Experiment
> > ==
> > 
> > Test environment is as follows.
> > 
> > System info
> >   - 3.6GHz, 16 core CPU
> >   - 36GiB Memory
> > Device info
> >   - a conventional null_blk with 228MiB
> >   - a sequential null_blk with 4068 zones of 8MiB
> > Format
> >   - mkfs.f2fs  -c  -m -Z 8 -o 3.89
> > Mount
> >   - mount  
> > Fio script
> >   - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap 
> > --overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
> > WAF calculation
> >   - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs
> > 
> > Conclusion
> > ==
> > 
> > This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
> > the data section was selected first when selecting GC victims. This was
> > achieved by reducing the migration of the node blocks by 69.4%
> > (253,131,743 blks -> 77,463,278 blks). It is possible to achieve a low
> > WAF with this GC victim selection method in environments where the
> > section size is relatively small.
> > 
> > Signed-off-by: Yonggil Song 
> > ---
> >  fs/f2fs/f2fs.h |  1 +
> >  fs/f2fs/gc.c   | 98 ++
> >  2 files changed, 77 insertions(+), 22 deletions(-)
> > 
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index 9043cedfa12b..578d57f6022f 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -1649,6 +1649,7 @@ struct f2fs_sb_info {
> > struct f2fs_mount_info mount_opt;   /* mount options */
> >  
> > /* for cleaning operations */
> > +   bool need_node_clean;   /* need to clean dirty nodes */
> > struct f2fs_rwsem gc_lock;  /*
> >  * semaphore for GC, avoid
> >  * race between GC and GC or CP
> > diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> > index f550cdeaa663..682dcf0de59e 100644
> > --- a/fs/f2fs/gc.c
> > +++ b/fs/f2fs/gc.c
> > @@ -368,6 +368,14 @@ static inline unsigned int get_gc_cost(struct 
> > f2fs_sb_info *sbi,
> > if (p->alloc_mode == SSR)
> > return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
> >  
> > +   /*
> > +* If we don't need to clean dirty nodes,
> > +* we just skip node victims.
> > +*/
> > +   if (IS_NODESEG(get_seg_entry(sbi, segno)->type) &&
> > +   !sbi->need_node_clean)
> > +   return get_max_cost(sbi, p);
> 
> How about differentiating the gc cost between data vs. node by adding some
> weights? By default, data is preferred, while node is better in the worst 
> case?
> 

Okay, I will work on v3 with your comments.

> > +
> > /* alloc_mode == LFS */
> > if (p->gc_mode == GC_GREEDY)
> > return get_valid_blocks(sbi, segno, true);
> > @@ -557,6 +565,14 @@ static void atgc_lookup_victim(struct f2fs_sb_info 
> > *sbi,
> > if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
> > goto skip;
> >  
> > +   /*
> > +* If we don't need to clean dirty nodes,
> > +* we just skip node victims.
> > +*/
> > +   if (IS_NODESEG(get_seg_entry(sbi, ve->segno)->type) &&
> > +   !sbi->need_node_clean)
> > +   goto skip;
> > +
> > /* age = 1 * x% * 60 */
>

[f2fs-dev] [PATCH v2] f2fs: New victim selection for GC

2023-12-08 Thread Yonggil Song
Overview
========

This patch introduces a new way to prefer data sections when selecting
GC victims. Migration of data blocks causes invalidation of node blocks.
Therefore, in situations where GC is frequent, selecting data blocks as
victims can reduce unnecessary block migration by invalidating node blocks.
For exceptional situations where free sections are insufficient, node blocks
are selected as victims instead of data blocks to get extra free sections.

Problem
===

If the total amount of nodes is larger than the size of one section, nodes
occupy multiple sections, and node victims are often selected because their
GC cost is lowered by data block migration during GC. Since moving data
sections causes frequent node victim selection, victim thrashing occurs in
the node sections. This results in an increase in WAF.

Experiment
==

Test environment is as follows.

System info
  - 3.6GHz, 16 core CPU
  - 36GiB Memory
Device info
  - a conventional null_blk with 228MiB
  - a sequential null_blk with 4068 zones of 8MiB
Format
  - mkfs.f2fs  -c  -m -Z 8 -o 3.89
Mount
  - mount  
Fio script
  - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap 
--overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
WAF calculation
  - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs

Conclusion
==

This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
the data section was selected first when selecting GC victims. This was
achieved by reducing the migration of the node blocks by 69.4%
(253,131,743 blks -> 77,463,278 blks). It is possible to achieve a low
WAF with this GC victim selection method in environments where the
section size is relatively small.

Signed-off-by: Yonggil Song 
---
 fs/f2fs/f2fs.h |  1 +
 fs/f2fs/gc.c   | 98 ++
 2 files changed, 77 insertions(+), 22 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 9043cedfa12b..578d57f6022f 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1649,6 +1649,7 @@ struct f2fs_sb_info {
struct f2fs_mount_info mount_opt;   /* mount options */
 
/* for cleaning operations */
+   bool need_node_clean;   /* need to clean dirty nodes */
struct f2fs_rwsem gc_lock;  /*
 * semaphore for GC, avoid
 * race between GC and GC or CP
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index f550cdeaa663..682dcf0de59e 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -368,6 +368,14 @@ static inline unsigned int get_gc_cost(struct f2fs_sb_info 
*sbi,
if (p->alloc_mode == SSR)
return get_seg_entry(sbi, segno)->ckpt_valid_blocks;
 
+   /*
+* If we don't need to clean dirty nodes,
+* we just skip node victims.
+*/
+   if (IS_NODESEG(get_seg_entry(sbi, segno)->type) &&
+   !sbi->need_node_clean)
+   return get_max_cost(sbi, p);
+
/* alloc_mode == LFS */
if (p->gc_mode == GC_GREEDY)
return get_valid_blocks(sbi, segno, true);
@@ -557,6 +565,14 @@ static void atgc_lookup_victim(struct f2fs_sb_info *sbi,
if (ve->mtime >= max_mtime || ve->mtime < min_mtime)
goto skip;
 
+   /*
+* If we don't need to clean dirty nodes,
+* we just skip node victims.
+*/
+   if (IS_NODESEG(get_seg_entry(sbi, ve->segno)->type) &&
+   !sbi->need_node_clean)
+   goto skip;
+
/* age = 1 * x% * 60 */
age = div64_u64(accu * (max_mtime - ve->mtime), total_time) *
age_weight;
@@ -913,7 +929,21 @@ int f2fs_get_victim(struct f2fs_sb_info *sbi, unsigned int 
*result,
goto retry;
}
 
+
if (p.min_segno != NULL_SEGNO) {
+   if (sbi->need_node_clean &&
+   IS_DATASEG(get_seg_entry(sbi, p.min_segno)->type)) {
+   /*
+    * We need to clean node sections, but the data victim's
+    * cost is the lowest. If free sections are enough, stop
+    * cleaning node victims; if not, keep going by GCing data
+    * victims.
+    */
+   if (has_enough_free_secs(sbi, prefree_segments(sbi), 0)) {
+   p.min_segno = NULL_SEGNO;
+   goto out;
+   }
+   }
 got_it:
*result = (p.min_segno / p.ofs_unit) * p.ofs_unit;
 got_result:
@@ -1830,8 +1860,27 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)

Re: [f2fs-dev] (2) [PATCH v1] f2fs: New victim selection for GC

2023-11-28 Thread Yonggil Song
>Hi Yonggil,
>
>On 2023/10/26 17:18, Yonggil Song wrote:
>> Overview
>> ========
>> 
>> Introduce a new way to select the data section first when selecting a
>> victim in foreground GC. This victim selection method works when the
>> prefer_data_victim mount option is enabled. If foreground GC migrates only
>> data sections and runs out of free sections, it cleans dirty node sections
>> to get more free sections.
>
>What about introducing parameter to adjust cost calculated by get_gc_cost()?
>
>Something like:
>
>get_gc_cost()
>
>   if (p->gc_mode == GC_GREEDY) {
>   vblocks = get_valid_blocks();
>   if (seg_type is data)
>   return vblocks * data_factor;
>   return vblocks * node_factor;
>   }
>
>If we prefer to select data segments during fggc, we can configure the
>data/node factors as 1 and 512?
>
>Thoughts?
>
>Thanks,
>

Hi Chao.

I think that's a simpler way to do it.
I'll work on v2 with your idea.
Thanks for the feedback.
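
As a sketch of the factor idea above with the suggested values
(data_factor = 1, node_factor = 512) and an assumed 512-block section,
the multiplicative factors and the additive per-section weight used from
v3 onward give essentially the same data-first ordering for non-empty
node sections; toy C, not the kernel code:

#include <stdio.h>

#define BLKS_PER_SEC 512U	/* assumed section size in blocks */

/* The suggestion above: scale the greedy cost by a per-type factor. */
static unsigned int cost_factor(unsigned int vblocks, int is_node)
{
	return vblocks * (is_node ? BLKS_PER_SEC : 1U);
}

/* The additive variant: one full section of extra cost for nodes. */
static unsigned int cost_weight(unsigned int vblocks, int is_node)
{
	return vblocks + (is_node ? BLKS_PER_SEC : 0U);
}

int main(void)
{
	/* Nearly full data section vs. nearly empty node section:
	 * both shapes still pick the data section (lower cost wins). */
	printf("factor: data=%u node=%u\n",
	       cost_factor(511, 0), cost_factor(1, 1));
	printf("weight: data=%u node=%u\n",
	       cost_weight(511, 0), cost_weight(1, 1));
	return 0;
}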

>> 
>> Problem
>> ===
>> 
>> If the total amount of nodes is larger than the size of one section, nodes
>> occupy multiple sections, and node victims are often selected because their
>> GC cost is lowered by data block migration in foreground GC. Since moving
>> data sections causes frequent node victim selection, victim thrashing
>> occurs in the node sections. This results in an increase in WAF.
>> 
>> Experiment
>> ==
>> 
>> Test environment is as follows.
>> 
>>  System info
>>- 3.6GHz, 16 core CPU
>>- 36GiB Memory
>>  Device info
>>- a conventional null_blk with 228MiB
>>- a sequential null_blk with 4068 zones of 8MiB
>>  Format
>>- mkfs.f2fs  -c  -m -Z 8 -o 3.89
>>  Mount
>>- mount -o prefer_data_victim  
>>  Fio script
>>- fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap 
>> --overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
>>  WAF calculation
>>- (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs
>> 
>> Conclusion
>> ==
>> 
>> This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
>> the data section was selected first when selecting GC victims. This was
>> achieved by reducing the migration of the node blocks by 69.4%
>> (253,131,743 blks -> 77,463,278 blks). It is possible to achieve a low
>> WAF with this GC victim selection method in environments where the
>> section size is relatively small.
>> 
>> Signed-off-by: Yonggil Song 
>> ---
>>   Documentation/filesystems/f2fs.rst |   3 +
>>   fs/f2fs/f2fs.h |   2 +
>>   fs/f2fs/gc.c   | 100 +++--
>>   fs/f2fs/segment.h  |   2 +
>>   fs/f2fs/super.c|   9 +++
>>   5 files changed, 95 insertions(+), 21 deletions(-)
>> 
>> diff --git a/Documentation/filesystems/f2fs.rst 
>> b/Documentation/filesystems/f2fs.rst
>> index d32c6209685d..58e6d001d7ab 100644
>> --- a/Documentation/filesystems/f2fs.rst
>> +++ b/Documentation/filesystems/f2fs.rst
>> @@ -367,6 +367,9 @@ errors=%s Specify f2fs behavior on 
>> critical errors. This supports modes:
>>   pending node write dropkeep
>> N/A
>>   pending meta write keepkeep
>> N/A
>>   == === === 
>> 
>> +prefer_data_victim   When selecting victims in foreground GC, victims of 
>> data type
>> + are prioritized. This option minimizes GC victim 
>> thrashing
>> + in the node section to reduce WAF.
>>    
>> 
>>   
>>   Debugfs Entries
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index 6d688e42d89c..8b31fa2ea09a 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -108,6 +108,7 @@ extern const char *f2fs_fault_name[FAULT_MAX];
>>   #defineF2FS_MOUNT_GC_MERGE 0x0200
>>   #define F2FS_MOUNT_COMPRESS_CACHE  0x0400
>>   #define F2FS_MOUNT_AGE_EXTENT_CACHE0x0800
>> +#define F2FS_MOUNT_PREFER_DATA_VICTIM   0x1000
>>   
>>   #define F2FS_OPTION(sbi)   ((sbi)->mount_opt)
>>  #define clear_opt(sbi, option)  (F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option)

Re: [f2fs-dev] (2) (2) [PATCH v1] f2fs: New victim selection for GC

2023-11-28 Thread Yonggil Song
On 11/20, Yonggil Song wrote:
> > >Hi Yonggil,
> > >
> > >On 10/26, Yonggil Song wrote:
> > >> Overview
> > >> ========
> > >> 
> > >> Introduce a new way to select the data section first when selecting a
> > >> victim in foreground GC. This victim selection method works when the
> > >> prefer_data_victim mount option is enabled. If foreground GC migrates 
> > >> only
> > >> data sections and runs out of free sections, it cleans dirty node 
> > >> sections
> > >> to get more free sections.
> > >> 
> > >> Problem
> > >> ===
> > >> 
> > >> If the total amount of nodes is larger than the size of one section,
> > >> nodes occupy multiple sections, and node victims are often selected
> > >> because their GC cost is lowered by data block migration in foreground
> > >> GC. Since moving data sections causes frequent node victim selection,
> > >> victim thrashing occurs in the node sections. This results in an
> > >> increase in WAF.
> > >
> > >How does that work w/ ATGC?
> > >
> > 
> > Hi Jaegeuk.
> > 
> > I didn't consider ATGC because this feature is only supported by zoned
> > devices (LFS).
> > I didn't add ATGC exception handling because I'm only enabling this
> > feature when it's a zoned device, but should I?
> 
> I'm open to applying this to the existing flow in general. Can you take a
> look at it that way?
> 

I see. I'll consider applying this feature to the general GC routine, including 
ATGC.

Thanks.

> > 
> > >> 
> > >> Experiment
> > >> ==
> > >> 
> > >> Test environment is as follows.
> > >> 
> > >>  System info
> > >>- 3.6GHz, 16 core CPU
> > >>- 36GiB Memory
> > >>  Device info
> > >>- a conventional null_blk with 228MiB
> > >>- a sequential null_blk with 4068 zones of 8MiB
> > >>  Format
> > >>- mkfs.f2fs  -c  -m -Z 8 -o 3.89
> > >>  Mount
> > >>- mount -o prefer_data_victim  
> > >>  Fio script
> > >>- fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap 
> > >> --overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
> > >>  WAF calculation
> > >>- (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs
> > >> 
> > >> Conclusion
> > >> ==
> > >> 
> > >> This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) 
> > >> when
> > >> the data section was selected first when selecting GC victims. This was
> > >> achieved by reducing the migration of the node blocks by 69.4%
> > >> (253,131,743 blks -> 77,463,278 blks). It is possible to achieve a low
> > >> WAF with this GC victim selection method in environments where the
> > >> section size is relatively small.
> > >> 
> > >> Signed-off-by: Yonggil Song 
> > >> ---
> > >>  Documentation/filesystems/f2fs.rst |   3 +
> > >>  fs/f2fs/f2fs.h |   2 +
> > >>  fs/f2fs/gc.c   | 100 +++--
> > >>  fs/f2fs/segment.h  |   2 +
> > >>  fs/f2fs/super.c|   9 +++
> > >>  5 files changed, 95 insertions(+), 21 deletions(-)
> > >> 
> > >> diff --git a/Documentation/filesystems/f2fs.rst 
> > >> b/Documentation/filesystems/f2fs.rst
> > >> index d32c6209685d..58e6d001d7ab 100644
> > >> --- a/Documentation/filesystems/f2fs.rst
> > >> +++ b/Documentation/filesystems/f2fs.rst
> > >> @@ -367,6 +367,9 @@ errors=%s Specify f2fs behavior on 
> > >> critical errors. This supports modes:
> > >>   pending node write dropkeep
> > >> N/A
> > >>   pending meta write keepkeep
> > >> N/A
> > >>   == === 
> > >> === 
> > >> +prefer_data_victim   When selecting victims in foreground GC, 
> > >> victims of data type
> > >> + are prioritized. This option minimizes GC 
> > >> victim threshing
> &g

Re: [f2fs-dev] (2) [PATCH v1] f2fs: New victim selection for GC

2023-11-20 Thread Yonggil Song
>Hi Yonggil,
>
>On 10/26, Yonggil Song wrote:
>> Overview
>> ========
>> 
>> Introduce a new way to select the data section first when selecting a
>> victim in foreground GC. This victim selection method works when the
>> prefer_data_victim mount option is enabled. If foreground GC migrates only
>> data sections and runs out of free sections, it cleans dirty node sections
>> to get more free sections.
>> 
>> Problem
>> ===
>> 
>> If the total amount of nodes is larger than the size of one section, nodes
>> occupy multiple sections, and node victims are often selected because their
>> GC cost is lowered by data block migration in foreground GC. Since moving
>> data sections causes frequent node victim selection, victim thrashing
>> occurs in the node sections. This results in an increase in WAF.
>
>How does that work w/ ATGC?
>

Hi Jaegeuk.

I didn't consider ATGC because this feature is only supported by zoned
devices (LFS).
I didn't add ATGC exception handling because I'm only enabling this
feature when it's a zoned device, but should I?

>> 
>> Experiment
>> ==
>> 
>> Test environment is as follows.
>> 
>>  System info
>>- 3.6GHz, 16 core CPU
>>- 36GiB Memory
>>  Device info
>>- a conventional null_blk with 228MiB
>>- a sequential null_blk with 4068 zones of 8MiB
>>  Format
>>- mkfs.f2fs  -c  -m -Z 8 -o 3.89
>>  Mount
>>- mount -o prefer_data_victim  
>>  Fio script
>>- fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap 
>> --overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
>>  WAF calculation
>>- (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs
>> 
>> Conclusion
>> ==
>> 
>> This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
>> the data section was selected first when selecting GC victims. This was
>> achieved by reducing the migration of the node blocks by 69.4%
>> (253,131,743 blks -> 77,463,278 blks). It is possible to achieve a low
>> WAF with this GC victim selection method in environments where the
>> section size is relatively small.
>> 
>> Signed-off-by: Yonggil Song 
>> ---
>>  Documentation/filesystems/f2fs.rst |   3 +
>>  fs/f2fs/f2fs.h |   2 +
>>  fs/f2fs/gc.c   | 100 +++--
>>  fs/f2fs/segment.h  |   2 +
>>  fs/f2fs/super.c|   9 +++
>>  5 files changed, 95 insertions(+), 21 deletions(-)
>> 
>> diff --git a/Documentation/filesystems/f2fs.rst 
>> b/Documentation/filesystems/f2fs.rst
>> index d32c6209685d..58e6d001d7ab 100644
>> --- a/Documentation/filesystems/f2fs.rst
>> +++ b/Documentation/filesystems/f2fs.rst
>> @@ -367,6 +367,9 @@ errors=%s Specify f2fs behavior on 
>> critical errors. This supports modes:
>>   pending node write dropkeep
>> N/A
>>   pending meta write keepkeep
>> N/A
>>   == === === 
>> 
>> +prefer_data_victim   When selecting victims in foreground GC, victims of 
>> data type
>> + are prioritized. This option minimizes GC victim 
>> thrashing
>> + in the node section to reduce WAF.
>>   
>> 
>>  
>>  Debugfs Entries
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index 6d688e42d89c..8b31fa2ea09a 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -108,6 +108,7 @@ extern const char *f2fs_fault_name[FAULT_MAX];
>>  #define F2FS_MOUNT_GC_MERGE 0x0200
>>  #define F2FS_MOUNT_COMPRESS_CACHE   0x0400
>>  #define F2FS_MOUNT_AGE_EXTENT_CACHE 0x0800
>> +#define F2FS_MOUNT_PREFER_DATA_VICTIM   0x1000
>>  
>>  #define F2FS_OPTION(sbi)((sbi)->mount_opt)
>>  #define clear_opt(sbi, option)  (F2FS_OPTION(sbi).opt &= 
>> ~F2FS_MOUNT_##option)
>> @@ -1648,6 +1649,7 @@ struct f2fs_sb_info {
>>  struct f2fs_mount_info mount_opt;   /* mount options */
>>  
>>  /* for cleaning operations */
>> +bool need_node_clean;   /* only used for 
>> prefer_data_victim */
>>  struct f2fs_rwsem gc_lock;

[f2fs-dev] [PATCH v1] f2fs: New victim selection for GC

2023-10-26 Thread Yonggil Song
Overview
========

Introduce a new way to select the data section first when selecting a
victim in foreground GC. This victim selection method works when the
prefer_data_victim mount option is enabled. If foreground GC migrates only
data sections and runs out of free sections, it cleans dirty node sections
to get more free sections.

Problem
===

If the total amount of nodes is larger than the size of one section, nodes
occupy multiple sections, and node victims are often selected because their
GC cost is lowered by data block migration in foreground GC. Since moving
data sections causes frequent node victim selection, victim thrashing
occurs in the node sections. This results in an increase in WAF.

Experiment
==

Test environment is as follows.

System info
  - 3.6GHz, 16 core CPU
  - 36GiB Memory
Device info
  - a conventional null_blk with 228MiB
  - a sequential null_blk with 4068 zones of 8MiB
Format
  - mkfs.f2fs  -c  -m -Z 8 -o 3.89
Mount
  - mount -o prefer_data_victim  
Fio script
  - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap 
--overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
WAF calculation
  - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs

Conclusion
==

This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
the data section was selected first when selecting GC victims. This was
achieved by reducing the migration of the node blocks by 69.4%
(253,131,743 blks -> 77,463,278 blks). It is possible to achieve a low
WAF with this GC victim selection method in environments where the
section size is relatively small.

Signed-off-by: Yonggil Song 
---
 Documentation/filesystems/f2fs.rst |   3 +
 fs/f2fs/f2fs.h |   2 +
 fs/f2fs/gc.c   | 100 +++--
 fs/f2fs/segment.h  |   2 +
 fs/f2fs/super.c|   9 +++
 5 files changed, 95 insertions(+), 21 deletions(-)

diff --git a/Documentation/filesystems/f2fs.rst 
b/Documentation/filesystems/f2fs.rst
index d32c6209685d..58e6d001d7ab 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -367,6 +367,9 @@ errors=%sSpecify f2fs behavior on critical 
errors. This supports modes:
 pending node write dropkeep
N/A
 pending meta write keepkeep
N/A
 == === === 

+prefer_data_victim  When selecting victims in foreground GC, victims of 
data type
+are prioritized. This option minimizes GC victim 
thrashing
+in the node section to reduce WAF.
  

 
 Debugfs Entries
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 6d688e42d89c..8b31fa2ea09a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -108,6 +108,7 @@ extern const char *f2fs_fault_name[FAULT_MAX];
 #defineF2FS_MOUNT_GC_MERGE 0x0200
 #define F2FS_MOUNT_COMPRESS_CACHE  0x0400
 #define F2FS_MOUNT_AGE_EXTENT_CACHE0x0800
+#define F2FS_MOUNT_PREFER_DATA_VICTIM  0x1000
 
 #define F2FS_OPTION(sbi)   ((sbi)->mount_opt)
 #define clear_opt(sbi, option) (F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option)
@@ -1648,6 +1649,7 @@ struct f2fs_sb_info {
struct f2fs_mount_info mount_opt;   /* mount options */
 
/* for cleaning operations */
+   bool need_node_clean;   /* only used for 
prefer_data_victim */
struct f2fs_rwsem gc_lock;  /*
 * semaphore for GC, avoid
 * race between GC and GC or CP
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index f550cdeaa663..8a2da808a5fb 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -752,6 +752,8 @@ int f2fs_get_victim(struct f2fs_sb_info *sbi, unsigned int 
*result,
unsigned int last_segment;
unsigned int nsearched;
bool is_atgc;
+   bool is_prefer_data_victim =
+   test_opt(sbi, PREFER_DATA_VICTIM) && gc_type == FG_GC;
int ret = 0;
 
mutex_lock(&dirty_i->seglist_lock);
@@ -767,6 +769,11 @@ int f2fs_get_victim(struct f2fs_sb_info *sbi, unsigned int 
*result,
p.oldest_age = 0;
p.min_cost = get_max_cost(sbi, &p);
 
+   if (is_prefer_data_victim) {
+   p.node_min_cost = p.min_cost;
+   p.node_min_segno = p.min_segno;
+   }
+
is_atgc = (p.gc_mode == GC_AT || p.alloc_mode == AT_SSR);
nsearched = 0;
 
@@ -884,9 +891,25 @@ int f2fs_get_victim(struct f2fs_sb_info *sbi, unsigned int 
*result,
 
  

[f2fs-dev] [PATCH v1] f2fs: New victim selection for GC

2023-10-12 Thread Yonggil Song


Overview
========

Introduce a new way to select the data section first when selecting a
victim in foreground GC. This victim selection method works when the
prefer_data_victim mount option is enabled. If foreground GC migrates only
data sections and runs out of free sections, it cleans dirty node sections
to get more free sections.

Problem
===

If the total amount of nodes is larger than the size of one section, nodes
occupy multiple sections, and node victims are often selected because their
GC cost is lowered by data block migration in foreground GC. Since moving
data sections causes frequent node victim selection, victim thrashing
occurs in the node sections. This results in an increase in WAF.

Experiment
==

Test environment is as follows.

System info
  - 3.6GHz, 16 core CPU
  - 36GiB Memory
Device info
  - a conventional null_blk with 228MiB
  - a sequential null_blk with 4068 zones of 8MiB
Format
  - mkfs.f2fs  -c  -m -Z 8 -o 3.89
Mount
  - mount -o prefer_data_victim  
Fio script
  - fio --rw=randwrite --bs=4k --ba=4k --filesize=31187m --norandommap 
--overwrite=1 --name=job1 --filename=./mnt/sustain --io_size=128g
WAF calculation
  - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs

Conclusion
==

This experiment showed that the WAF was reduced by 29% (18.75 -> 13.3) when
the data section was selected first when selecting GC victims. This was
achieved by reducing the migration of the node blocks by 69.4%
(253,131,743 blks -> 77,463,278 blks). It is possible to achieve a low
WAF with this GC victim selection method in environments where the
section size is relatively small.

Signed-off-by: Yonggil Song 
---
 Documentation/filesystems/f2fs.rst |   3 +
 fs/f2fs/f2fs.h |   2 +
 fs/f2fs/gc.c   | 100 +++--
 fs/f2fs/segment.h  |   2 +
 fs/f2fs/super.c|   9 +++
 5 files changed, 95 insertions(+), 21 deletions(-)

diff --git a/Documentation/filesystems/f2fs.rst 
b/Documentation/filesystems/f2fs.rst
index d32c6209685d..58e6d001d7ab 100644
--- a/Documentation/filesystems/f2fs.rst
+++ b/Documentation/filesystems/f2fs.rst
@@ -367,6 +367,9 @@ errors=%sSpecify f2fs behavior on critical 
errors. This supports modes:
 pending node write dropkeep
N/A
 pending meta write keepkeep
N/A
 == === === 

+prefer_data_victim  When selecting victims in foreground GC, victims of 
data type
+are prioritized. This option minimizes GC victim 
thrashing
+in the node section to reduce WAF.
  

 
 Debugfs Entries
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 6d688e42d89c..8b31fa2ea09a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -108,6 +108,7 @@ extern const char *f2fs_fault_name[FAULT_MAX];
 #defineF2FS_MOUNT_GC_MERGE 0x0200
 #define F2FS_MOUNT_COMPRESS_CACHE  0x0400
 #define F2FS_MOUNT_AGE_EXTENT_CACHE0x0800
+#define F2FS_MOUNT_PREFER_DATA_VICTIM  0x1000
 
 #define F2FS_OPTION(sbi)   ((sbi)->mount_opt)
 #define clear_opt(sbi, option) (F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option)
@@ -1648,6 +1649,7 @@ struct f2fs_sb_info {
struct f2fs_mount_info mount_opt;   /* mount options */
 
/* for cleaning operations */
+   bool need_node_clean;   /* only used for 
prefer_data_victim */
struct f2fs_rwsem gc_lock;  /*
 * semaphore for GC, avoid
 * race between GC and GC or CP
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index f550cdeaa663..8a2da808a5fb 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -752,6 +752,8 @@ int f2fs_get_victim(struct f2fs_sb_info *sbi, unsigned int 
*result,
unsigned int last_segment;
unsigned int nsearched;
bool is_atgc;
+   bool is_prefer_data_victim =
+   test_opt(sbi, PREFER_DATA_VICTIM) && gc_type == FG_GC;
int ret = 0;
 
mutex_lock(&dirty_i->seglist_lock);
@@ -767,6 +769,11 @@ int f2fs_get_victim(struct f2fs_sb_info *sbi, unsigned int 
*result,
p.oldest_age = 0;
p.min_cost = get_max_cost(sbi, &p);
 
+   if (is_prefer_data_victim) {
+   p.node_min_cost = p.min_cost;
+   p.node_min_segno = p.min_segno;
+   }
+
is_atgc = (p.gc_mode == GC_AT || p.alloc_mode == AT_SSR);
nsearched = 0;
 
@@ -884,9 +891,25 @@ int f2fs_get_victim(struct f2fs_sb_info *sbi, unsigned int 
*result,
 
  
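The node_min_cost/node_min_segno pair kept by the v1 patch above lets the
scan remember the cheapest node victim separately while data victims are
preferred, so foreground GC can fall back to the node candidate once free
sections run out. A compressed sketch of that bookkeeping (invented names;
the real scan in f2fs_get_victim is more involved):

#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Invented, simplified stand-in for struct victim_sel_policy. */
struct victim_policy {
	unsigned int min_segno, min_cost;		/* best data victim */
	unsigned int node_min_segno, node_min_cost;	/* node fallback */
};

static void victim_init(struct victim_policy *p)
{
	p->min_cost = p->node_min_cost = UINT_MAX;
	p->min_segno = p->node_min_segno = UINT_MAX;	/* i.e. NULL_SEGNO */
}

/* Called per scanned section: data and node candidates are tracked
 * separately instead of competing on one cost scale. */
static void victim_consider(struct victim_policy *p, unsigned int segno,
			    unsigned int cost, bool is_node)
{
	unsigned int *c = is_node ? &p->node_min_cost : &p->min_cost;
	unsigned int *s = is_node ? &p->node_min_segno : &p->min_segno;

	if (cost < *c) {
		*c = cost;
		*s = segno;
	}
}

/* After the scan: prefer the data victim; take the node victim only
 * when node cleaning is needed to regain free sections. */
static unsigned int victim_pick(const struct victim_policy *p,
				bool need_node_clean)
{
	if (need_node_clean && p->node_min_segno != UINT_MAX)
		return p->node_min_segno;
	return p->min_segno;
}

int main(void)
{
	struct victim_policy p;

	victim_init(&p);
	victim_consider(&p, 7, 120, false);	/* data section, cost 120 */
	victim_consider(&p, 42, 30, true);	/* node section, cost 30 */

	printf("normal pick: %u\n", victim_pick(&p, false));	/* 7 */
	printf("low-space pick: %u\n", victim_pick(&p, true));	/* 42 */
	return 0;
}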

[f2fs-dev] [PATCH v1] f2fs: Fix over-estimating free section during FG GC

2023-05-11 Thread Yonggil Song
There was a bug where FG GC would finish unconditionally because free
sections were over-estimated after a checkpoint in FG GC: the checkpoint
already reclaims the freed sections into free_sections(), so carrying the
old sec_freed count forward counts them twice.
This patch re-initializes sec_freed after every checkpoint in FG GC.

Signed-off-by: Yonggil Song 
---
 fs/f2fs/gc.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index d455140322a8..51d7e8d29bf1 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1797,7 +1797,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 {
int gc_type = gc_control->init_gc_type;
unsigned int segno = gc_control->victim_segno;
-   int sec_freed = 0, seg_freed = 0, total_freed = 0;
+   int sec_freed = 0, seg_freed = 0, total_freed = 0, total_sec_freed = 0;
int ret = 0;
struct cp_control cpc;
struct gc_inode_list gc_list = {
@@ -1842,6 +1842,8 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 			ret = f2fs_write_checkpoint(sbi, &cpc);
if (ret)
goto stop;
+   /* Reset due to checkpoint */
+   sec_freed = 0;
}
}
 
@@ -1866,15 +1868,17 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
gc_control->should_migrate_blocks);
total_freed += seg_freed;
 
-   if (seg_freed == f2fs_usable_segs_in_sec(sbi, segno))
+   if (seg_freed == f2fs_usable_segs_in_sec(sbi, segno)) {
sec_freed++;
+   total_sec_freed++;
+   }
 
if (gc_type == FG_GC) {
sbi->cur_victim_sec = NULL_SEGNO;
 
if (has_enough_free_secs(sbi, sec_freed, 0)) {
if (!gc_control->no_bg_gc &&
-   sec_freed < gc_control->nr_free_secs)
+   total_sec_freed < gc_control->nr_free_secs)
goto go_gc_more;
goto stop;
}
@@ -1901,6 +1905,8 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
 		ret = f2fs_write_checkpoint(sbi, &cpc);
if (ret)
goto stop;
+   /* Reset due to checkpoint */
+   sec_freed = 0;
}
 go_gc_more:
segno = NULL_SEGNO;
@@ -1913,7 +1919,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
if (gc_type == FG_GC)
f2fs_unpin_all_sections(sbi, true);
 
-   trace_f2fs_gc_end(sbi->sb, ret, total_freed, sec_freed,
+   trace_f2fs_gc_end(sbi->sb, ret, total_freed, total_sec_freed,
get_pages(sbi, F2FS_DIRTY_NODES),
get_pages(sbi, F2FS_DIRTY_DENTS),
get_pages(sbi, F2FS_DIRTY_IMETA),
@@ -1927,7 +1933,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
put_gc_inode(_list);
 
if (gc_control->err_gc_skipped && !ret)
-   ret = sec_freed ? 0 : -EAGAIN;
+   ret = total_sec_freed ? 0 : -EAGAIN;
return ret;
 }
 
-- 
2.34.1




Re: [f2fs-dev] (2) [PATCH] f2fs_io: support checkpoint command

2023-04-17 Thread Yonggil Song
On 04/17, Yonggil Song wrote:
>> >Fixed a xfstests failure.
>> >
>> >From 400c722c2117660b83190c88e5442d63fbbffe6e Mon Sep 17 00:00:00 2001
>> >From: Jaegeuk Kim 
>> >Date: Mon, 10 Apr 2023 14:48:50 -0700
>> >Subject: [PATCH] f2fs: refactor f2fs_gc to call checkpoint in urgent condition
>> >
>> >The major change is to call checkpoint, if there's not enough space while having
>> >some prefree segments in FG_GC case.
>> >
>> >Signed-off-by: Jaegeuk Kim 
>> >---
>> > fs/f2fs/gc.c | 27 +--
>> > 1 file changed, 13 insertions(+), 14 deletions(-)
>> >
>> >diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>> >index c748cdfb0501..ba5775dcade6 100644
>> >--- a/fs/f2fs/gc.c
>> >+++ b/fs/f2fs/gc.c
>> >@@ -1829,7 +1829,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>> >goto stop;
>> >}
>> > 
>> >-   if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0)) {
>> >+   /* Let's run FG_GC, if we don't have enough space. */
>> >+   if (has_not_enough_free_secs(sbi, 0, 0)) {
>> >+   gc_type = FG_GC;
>> >+
>> 
>> Hi, Jaegeuk & Chao
>> 
>> Would it be possible to clarify if this patch is intended to perform checkpoint every gc round?
>
>Intention is to trigger checkpoint when there's not enough free space. So, it's
>not for every gc round.
>

Thanks for your reply.

When the file system is almost full, the victim’s valid blocks ratio is high.
Therefore, most gc rounds consume free sections to relocate victims.
So, free sections shrink and prefree remains after jumping to gc_more.
Wouldn’t this trigger a checkpoint every gc round?
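
For reference, an abridged sketch of the loop in question (an editorial
condensation of the patch above, not the full function):

	gc_more:
		if (has_not_enough_free_secs(sbi, 0, 0)) {
			gc_type = FG_GC;
			/* reclaim prefree segments left by earlier rounds */
			if (prefree_segments(sbi))
				ret = f2fs_write_checkpoint(sbi, &cpc);
		}
		...
		seg_freed = do_garbage_collect(...);
		...
		goto gc_more;

Near-full, every round leaves prefree segments behind while free sections
stay low, so both conditions would hold on each pass through gc_more.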

Thanks.

>> 
>> Thanks.
>> 
>> >/*
>> > * For example, if there are many prefree_segments below given
>> > * threshold, we can make them free by checkpoint. Then, we
>> >@@ -1840,8 +1843,6 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>> >if (ret)
>> >goto stop;
>> >}
>> >-   if (has_not_enough_free_secs(sbi, 0, 0))
>> >-   gc_type = FG_GC;
>> >}
>> > 
>> >/* f2fs_balance_fs doesn't need to do BG_GC in critical path. */
>> >@@ -1868,19 +1869,15 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>> >if (seg_freed == f2fs_usable_segs_in_sec(sbi, segno))
>> >sec_freed++;
>> > 
>> >-   if (gc_type == FG_GC)
>> >+   if (gc_type == FG_GC) {
>> >sbi->cur_victim_sec = NULL_SEGNO;
>> > 
>> >-   if (gc_control->init_gc_type == FG_GC ||
>> >-   !has_not_enough_free_secs(sbi,
>> >-   (gc_type == FG_GC) ? sec_freed : 0, 0)) {
>> >-   if (gc_type == FG_GC && sec_freed < gc_control->nr_free_secs)
>> >-   goto go_gc_more;
>> >-   goto stop;
>> >-   }
>> >-
>> >-   /* FG_GC stops GC by skip_count */
>> >-   if (gc_type == FG_GC) {
>> >+   if (!has_not_enough_free_secs(sbi, sec_freed, 0)) {
>> >+   if (!gc_control->no_bg_gc &&
>> >+   sec_freed < gc_control->nr_free_secs)
>> >+   goto go_gc_more;
>> >+   goto stop;
>> >+   }
>> >if (sbi->skipped_gc_rwsem)
>> >skipped_round++;
>> >round++;
>> >@@ -1889,6 +1886,8 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>> >	ret = f2fs_write_checkpoint(sbi, &cpc);
>> >goto stop;
>> >}
>> >+   } else if (!has_not_enough_free_secs(sbi, 0, 0)) {
>> >+   goto stop;
>> >}
>> > 
>> >	__get_secs_required(sbi, NULL, &upper_secs, NULL);
>> >-- 
>> >2.40.0.634.g4ca3ef3211-goog
>> >
>> >
>> >




Re: [f2fs-dev] [PATCH] f2fs_io: support checkpoint command

2023-04-16 Thread Yonggil Song
>Fixed a xfstests failure.
>
>From 400c722c2117660b83190c88e5442d63fbbffe6e Mon Sep 17 00:00:00 2001
>From: Jaegeuk Kim 
>Date: Mon, 10 Apr 2023 14:48:50 -0700
>Subject: [PATCH] f2fs: refactor f2fs_gc to call checkpoint in urgent condition
>
>The major change is to call checkpoint, if there's not enough space while having
>some prefree segments in FG_GC case.
>
>Signed-off-by: Jaegeuk Kim 
>---
> fs/f2fs/gc.c | 27 +--
> 1 file changed, 13 insertions(+), 14 deletions(-)
>
>diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>index c748cdfb0501..ba5775dcade6 100644
>--- a/fs/f2fs/gc.c
>+++ b/fs/f2fs/gc.c
>@@ -1829,7 +1829,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>   goto stop;
>   }
> 
>-  if (gc_type == BG_GC && has_not_enough_free_secs(sbi, 0, 0)) {
>+  /* Let's run FG_GC, if we don't have enough space. */
>+  if (has_not_enough_free_secs(sbi, 0, 0)) {
>+  gc_type = FG_GC;
>+

Hi, Jaegeuk & Chao

Would it be possible to clarify if this patch is intended to perform checkpoint every gc round?

Thanks.

>   /*
>* For example, if there are many prefree_segments below given
>* threshold, we can make them free by checkpoint. Then, we
>@@ -1840,8 +1843,6 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>   if (ret)
>   goto stop;
>   }
>-  if (has_not_enough_free_secs(sbi, 0, 0))
>-  gc_type = FG_GC;
>   }
> 
>   /* f2fs_balance_fs doesn't need to do BG_GC in critical path. */
>@@ -1868,19 +1869,15 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>   if (seg_freed == f2fs_usable_segs_in_sec(sbi, segno))
>   sec_freed++;
> 
>-  if (gc_type == FG_GC)
>+  if (gc_type == FG_GC) {
>   sbi->cur_victim_sec = NULL_SEGNO;
> 
>-  if (gc_control->init_gc_type == FG_GC ||
>-  !has_not_enough_free_secs(sbi,
>-  (gc_type == FG_GC) ? sec_freed : 0, 0)) {
>-  if (gc_type == FG_GC && sec_freed < gc_control->nr_free_secs)
>-  goto go_gc_more;
>-  goto stop;
>-  }
>-
>-  /* FG_GC stops GC by skip_count */
>-  if (gc_type == FG_GC) {
>+  if (!has_not_enough_free_secs(sbi, sec_freed, 0)) {
>+  if (!gc_control->no_bg_gc &&
>+  sec_freed < gc_control->nr_free_secs)
>+  goto go_gc_more;
>+  goto stop;
>+  }
>   if (sbi->skipped_gc_rwsem)
>   skipped_round++;
>   round++;
>@@ -1889,6 +1886,8 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>	ret = f2fs_write_checkpoint(sbi, &cpc);
>   goto stop;
>   }
>+  } else if (!has_not_enough_free_secs(sbi, 0, 0)) {
>+  goto stop;
>   }
> 
>	__get_secs_required(sbi, NULL, &upper_secs, NULL);
>-- 
>2.40.0.634.g4ca3ef3211-goog
>
>
>




[f2fs-dev] [PATCH v2] mkfs.f2fs: Introduce configurable reserved sections

2023-04-04 Thread Yonggil Song
Overview


This option allows zoned block device users to manually configure the GC
reserved and overprovision areas according to their performance demands on
sustained write latency and WAF.

Problem
===

The overprovision segments that mkfs generates are mostly occupied by the
GC reserve. This inflates WAF.

Experiment
==

The following experiment evaluated the application of configurable reserved
sections. The experimental environment is as follows.

  System info
- 4.2Ghz, 8 core CPU
- 64GiB Memory
  Device info
- a conventional null_blk with 448MiB capacity(meta area) and
- a sequential null_blk with 953 zones of 64MiB
  Format
    - as-is (find out ovp ratio): mkfs.f2fs <conv null_blk> -c <seq null_blk> -m
        Info: Overprovision ratio = 3.700%
        Info: Overprovision segments = 1152 (GC reserved = 1088)
    - config rsvd: mkfs.f2fs <conv null_blk> -c <seq null_blk> -m -Z 8 -o 2.965
        Info: Overprovision ratio = 2.965%
        Info: Overprovision segments = 1152 (GC reserved = 256)
  Mount
    - mount <dev> <mount point>
  Fio script
    - fio --rw=randwrite --bs=4k --ba=4k --filesize=58630m --norandommap --overwrite=1 --name=job1 --filename=/sustain --time_based --runtime=2h
  WAF calculation
- (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs

Conclusion
==

The experiment shows that reducing the reserved segments cuts WAF to about
a tenth (from 222 to 23), although it triggers checkpoints more frequently
during gc. With direct IO, the WAF of as-is gets much higher. In other
words, a user can configure more reserved segments for lower GC latency, or
fewer reserved segments for lower WAF, on the same number of OP segments.

Signed-off-by: Yonggil Song 
---
 include/f2fs_fs.h   | 22 --
 lib/libf2fs.c   | 18 ++
 man/mkfs.f2fs.8 | 12 
 mkfs/f2fs_format.c  | 30 --
 mkfs/f2fs_format_main.c | 11 ++-
 5 files changed, 84 insertions(+), 9 deletions(-)

diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
index 333ae07a5ebd..5d74dcc0dc11 100644
--- a/include/f2fs_fs.h
+++ b/include/f2fs_fs.h
@@ -375,6 +375,10 @@ static inline uint64_t bswap_64(uint64_t val)
 
 #define LPF "lost+found"
 
+/* one for gc buffer, the other for node */
+#define MIN_RSVD_SECS  (NR_CURSEG_TYPE + 2U)
+#define CONFIG_RSVD_DEFAULT_OP_RATIO   3.0
+
 enum f2fs_config_func {
MKFS,
FSCK,
@@ -460,6 +464,7 @@ typedef struct {
 #define ALIGN_UP(addrs, size)  ALIGN_DOWN(((addrs) + (size) - 1), (size))
 
 struct f2fs_configuration {
+   uint32_t conf_reserved_sections;
uint32_t reserved_segments;
uint32_t new_reserved_segments;
int sparse_mode;
@@ -1614,6 +1619,20 @@ extern uint32_t f2fs_get_usable_segments(struct f2fs_super_block *sb);
 #define ZONE_ALIGN(blks)   SIZE_ALIGN(blks, c.blks_per_seg * \
c.segs_per_zone)
 
+static inline double get_reserved(struct f2fs_super_block *sb, double ovp)
+{
+   double reserved;
+   uint32_t usable_main_segs = f2fs_get_usable_segments(sb);
+	uint32_t segs_per_sec = round_up(usable_main_segs, get_sb(section_count));
+
+   if (c.conf_reserved_sections)
+   reserved = c.conf_reserved_sections * segs_per_sec;
+   else
+   reserved = (100 / ovp + 1 + NR_CURSEG_TYPE) * segs_per_sec;
+
+   return reserved;
+}
+
 static inline double get_best_overprovision(struct f2fs_super_block *sb)
 {
double reserved, ovp, candidate, end, diff, space;
@@ -1631,8 +1650,7 @@ static inline double get_best_overprovision(struct f2fs_super_block *sb)
}
 
for (; candidate <= end; candidate += diff) {
-	reserved = (100 / candidate + 1 + NR_CURSEG_TYPE) *
-			round_up(usable_main_segs, get_sb(section_count));
+   reserved = get_reserved(sb, candidate);
ovp = (usable_main_segs - reserved) * candidate / 100;
if (ovp < 0)
continue;
diff --git a/lib/libf2fs.c b/lib/libf2fs.c
index f63307a42a08..8dcc33bda0b5 100644
--- a/lib/libf2fs.c
+++ b/lib/libf2fs.c
@@ -1069,6 +1069,24 @@ int get_device_info(int i)
dev->nr_rnd_zones);
MSG(0, "  %zu blocks per zone\n",
dev->zone_blocks);
+
+   if (c.conf_reserved_sections) {
+   if (c.conf_reserved_sections < MIN_RSVD_SECS) {
+				MSG(0, "  Too small sections are reserved(%u secs)\n",
+					c.conf_reserved_sections);
+				c.conf_reserved_sections = MIN_RSVD_SECS;
+				MSG(0, "  It is operated as a minimum reserved sections(%u secs)\n",
+					c.conf_reserved_sections);
+			}
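
For intuition about get_reserved() above, a worked example using this post's
own numbers (an editorial illustration; it assumes 2MiB segments, so one
64MiB zone is a section with segs_per_sec = 32, and NR_CURSEG_TYPE = 6):

	as-is:       reserved = (100 / 3.700 + 1 + 6) * 32 ≈ 34.0 * 32 ≈ 1088 segments
	config rsvd: reserved = 8 sections * 32 segs/sec = 256 segments

which matches the "GC reserved = 1088" and "GC reserved = 256" lines printed
in the Format step.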

Re: [f2fs-dev] (2) [PATCH v1] mkfs.f2fs: Introduce configurable reserved sections

2023-03-28 Thread Yonggil Song
On 03/17, Yonggil Song wrote:
>> Overview
>> 
>> 
>> This option allows zoned block device users to configure GC reserved and
>> overprovision area manually according to their demands on performance of
>> sustained write latency and WAF.
>> 
>> Problem
>> ===
>> 
>> The overprovision segments that mkfs generates are mostly occupied by GC
>> reserved. This degrades WAF performance.
>> 
>> Experiment
>> ==
>> 
>> The following experiment evaluated the application of configurable reserved.
>> The experimental environment is as follows.
>> 
>>   System info
>> - 4.2Ghz, 8 core CPU
>> - 64GiB Memory
>>   Device info
>> - a conventional null_blk with 448MiB capacity(meta area) and
>> - a sequential null_blk with 953 zones of 64MiB
>>   Format
>> - as-is (find out ovp ratio): mkfs.f2fs <conv null_blk> -c <seq null_blk> -m
>> Info: Overprovision ratio = 3.700%
>> Info: Overprovision segments = 1152 (GC reserved = 1088)
>> - config rsvd: mkfs.f2fs <conv null_blk> -c <seq null_blk> -m 8 -o 2.965
>> Info: Overprovision ratio = 2.965%
>> Info: Overprovision segments = 1152 (GC reserved = 256)
>>   Mount
>> - mount <dev> <mount point>
>>   Fio script
>> - fio --rw=randwrite --bs=4k --ba=4k --filesize=58630m --norandommap --overwrite=1 --name=job1 --filename=/sustain --time_based --runtime=2h
>>   WAF calculation
>> - (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs
>> 
>> Conclusion
>> ==
>> 
>> In the experiment, it can be shown that reducing the reserved segments
>> decreases WAF to 10% (from 222 to 23) although it triggers checkpoint more
>> frequently during gc. With direct IO, the WAF of as-is gets much higher.
>> In other words, a user can configure more reserved segments for lower GC
>> latency or allocate less reserved segments for lower WAF on the same number
>> of OP segments.
>> 
>> Signed-off-by: Yonggil Song 
>> ---
>>  include/f2fs_fs.h   | 22 --
>>  lib/libf2fs.c   | 22 ++
>>  man/mkfs.f2fs.8 |  9 +++--
>>  mkfs/f2fs_format.c  | 29 +++--
>>  mkfs/f2fs_format_main.c |  5 +++--
>>  5 files changed, 75 insertions(+), 12 deletions(-)
>> 
>> diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
>> index 333ae07a5ebd..1d41e9f8397e 100644
>> --- a/include/f2fs_fs.h
>> +++ b/include/f2fs_fs.h
>> @@ -375,6 +375,10 @@ static inline uint64_t bswap_64(uint64_t val)
>>  
>>  #define LPF "lost+found"
>>  
>> +/* one for gc buffer, the other for node */
>> +#define MIN_RSVD_SECS   (uint32_t)(NR_CURSEG_TYPE + 2)
>> +#define CONFIG_RSVD_DEFAULT_OP_RATIO3.0
>> +
>>  enum f2fs_config_func {
>>  MKFS,
>>  FSCK,
>> @@ -460,6 +464,7 @@ typedef struct {
>>  #define ALIGN_UP(addrs, size)   ALIGN_DOWN(((addrs) + (size) - 1), (size))
>>  
>>  struct f2fs_configuration {
>> +uint32_t conf_reserved_sections;
>>  uint32_t reserved_segments;
>>  uint32_t new_reserved_segments;
>>  int sparse_mode;
>> @@ -1614,6 +1619,20 @@ extern uint32_t f2fs_get_usable_segments(struct f2fs_super_block *sb);
>>  #define ZONE_ALIGN(blks)SIZE_ALIGN(blks, c.blks_per_seg * \
>>  c.segs_per_zone)
>>  
>> +static inline double get_reserved(struct f2fs_super_block *sb, double ovp)
>> +{
>> +double reserved;
>> +uint32_t usable_main_segs = f2fs_get_usable_segments(sb);
>> +	uint32_t segs_per_sec = round_up(usable_main_segs, get_sb(section_count));
>> +
>> +if (c.conf_reserved_sections)
>> +reserved = c.conf_reserved_sections * segs_per_sec;
>> +else
>> +reserved = (100 / ovp + 1 + NR_CURSEG_TYPE) * segs_per_sec;
>> +
>> +return reserved;
>> +}
>> +
>>  static inline double get_best_overprovision(struct f2fs_super_block *sb)
>>  {
>>  double reserved, ovp, candidate, end, diff, space;
>> @@ -1631,8 +1650,7 @@ static inline double get_best_overprovision(struct f2fs_super_block *sb)
>>  }
>>  
>>  for (; candidate <= end; candidate += diff) {
>> -		reserved = (100 / candidate + 1 + NR_CURSEG_TYPE) *
>> -				round_up(usable_main_segs, get_sb(section_count));
>> +		reserved = get_reserved(sb, candidate);

[f2fs-dev] [PATCH v2] f2fs: Fix system crash due to lack of free space in LFS

2023-03-20 Thread Yonggil Song
When f2fs tries to checkpoint during foreground gc in LFS mode, system
crash occurs due to lack of free space if the amount of dirty node and
dentry pages generated by data migration exceeds free space.
The reproduction sequence is as follows.

 - 20GiB capacity block device (null_blk)
 - format and mount with LFS mode
 - create a file and write 20,000MiB
 - 4k random write on full range of the file

 RIP: 0010:new_curseg+0x48a/0x510 [f2fs]
 Code: 55 e7 f5 89 c0 48 0f af c3 48 8b 5d c0 48 c1 e8 20 83 c0 01 89 43 6c 48 
83 c4 28 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b f0 41 80 4f 48 04 
45 85 f6 0f 84 ba fd ff ff e9 ef fe ff ff
 RSP: 0018:977bc397b218 EFLAGS: 00010246
 RAX: 27b9 RBX:  RCX: 27c0
 RDX:  RSI: 27b9 RDI: 8c25ab4e74f8
 RBP: 977bc397b268 R08: 27b9 R09: 8c29e4a34b40
 R10: 0001 R11: 977bc397b0d8 R12: 
 R13: 8c25b4dd81a0 R14:  R15: 8c2f667f9000
 FS: () GS:8c344ec8() knlGS:
 CS: 0010 DS:  ES:  CR0: 80050033
 CR2: 00c00055d000 CR3: 000e30810003 CR4: 003706e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: fffe0ff0 DR7: 0400
 Call Trace:
 
 allocate_segment_by_default+0x9c/0x110 [f2fs]
 f2fs_allocate_data_block+0x243/0xa30 [f2fs]
 ? __mod_lruvec_page_state+0xa0/0x150
 do_write_page+0x80/0x160 [f2fs]
 f2fs_do_write_node_page+0x32/0x50 [f2fs]
 __write_node_page+0x339/0x730 [f2fs]
 f2fs_sync_node_pages+0x5a6/0x780 [f2fs]
 block_operations+0x257/0x340 [f2fs]
 f2fs_write_checkpoint+0x102/0x1050 [f2fs]
 f2fs_gc+0x27c/0x630 [f2fs]
 ? folio_mark_dirty+0x36/0x70
 f2fs_balance_fs+0x16f/0x180 [f2fs]

This patch adds a check that free sections are sufficient before
checkpointing during gc.

Signed-off-by: Yonggil Song 
---
 fs/f2fs/gc.c  | 10 --
 fs/f2fs/gc.h  |  2 ++
 fs/f2fs/segment.h | 27 ++-
 3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 4546e01b2ee0..dd563866d3c9 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1773,6 +1773,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
};
unsigned int skipped_round = 0, round = 0;
+   unsigned int need_lower = 0, need_upper = 0;
 
trace_f2fs_gc_begin(sbi->sb, gc_type, gc_control->no_bg_gc,
gc_control->nr_free_secs,
@@ -1858,8 +1859,13 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
}
}
 
-   /* Write checkpoint to reclaim prefree segments */
-   if (free_sections(sbi) < NR_CURSEG_PERSIST_TYPE &&
+	ret = get_need_secs(sbi, &need_lower, &need_upper);
+
+	/*
+	 * Write checkpoint to reclaim prefree segments.
+	 * We need three more extra sections for the writer's data/node/dentry.
+	 */
+   if (free_sections(sbi) <= need_upper + NR_GC_CHECKPOINT_SECS &&
prefree_segments(sbi)) {
 		ret = f2fs_write_checkpoint(sbi, &cpc);
if (ret)
diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
index 19b956c2d697..e81d22bf3772 100644
--- a/fs/f2fs/gc.h
+++ b/fs/f2fs/gc.h
@@ -30,6 +30,8 @@
 /* Search max. number of dirty segments to select a victim segment */
 #define DEF_MAX_VICTIM_SEARCH 4096 /* covers 8GB */
 
+#define NR_GC_CHECKPOINT_SECS (3)  /* data/node/dentry sections */
+
 struct f2fs_gc_kthread {
struct task_struct *f2fs_gc_task;
wait_queue_head_t gc_wait_queue_head;
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index be8f2d7d007b..52a6d1ed4f24 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -605,8 +605,12 @@ static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi,
return true;
 }
 
-static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi,
-   int freed, int needed)
+/*
+ * calculate needed sections for dirty node/dentry
+ * and call has_curseg_enough_space
+ */
+static inline bool get_need_secs(struct f2fs_sb_info *sbi,
+ unsigned int *lower, unsigned int *upper)
 {
unsigned int total_node_blocks = get_pages(sbi, F2FS_DIRTY_NODES) +
get_pages(sbi, F2FS_DIRTY_DENTS) +
@@ -616,20 +620,33 @@ static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi,
unsigned int dent_secs = total_dent_blocks / CAP_BLKS_PER_SEC(sbi);
unsigned int node_blocks = total_node_blocks % CAP_BLKS_PER_SEC(sbi);
unsigned int dent_blocks = total_dent_blocks % CAP_BLKS_PER_SEC(sbi);
+
+   *lower = node_secs + dent_secs;
+	*upper = *lower + (node_blocks ? 1 : 0) + (dent_blocks ? 1 : 0);
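
The hunk breaks off above in the archive. Based on the commit message
(get_need_secs "call[s] has_curseg_enough_space") and the reviewer outline
quoted later in this digest, the remainder presumably completes along these
lines (a reconstruction, not the archived text):

	return has_curseg_enough_space(sbi, node_blocks, dent_blocks);
}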

[f2fs-dev] [PATCH v1] mkfs.f2fs: Introduce configurable reserved sections

2023-03-16 Thread Yonggil Song
Overview


This option allows zoned block device users to manually configure the GC
reserved and overprovision areas according to their performance demands on
sustained write latency and WAF.

Problem
===

The overprovision segments that mkfs generates are mostly occupied by the
GC reserve. This inflates WAF.

Experiment
==

The following experiment evaluated the application of configurable reserved
sections. The experimental environment is as follows.

  System info
- 4.2Ghz, 8 core CPU
- 64GiB Memory
  Device info
- a conventional null_blk with 448MiB capacity(meta area) and
- a sequential null_blk with 953 zones of 64MiB
  Format
    - as-is (find out ovp ratio): mkfs.f2fs <conv null_blk> -c <seq null_blk> -m
        Info: Overprovision ratio = 3.700%
        Info: Overprovision segments = 1152 (GC reserved = 1088)
    - config rsvd: mkfs.f2fs <conv null_blk> -c <seq null_blk> -m 8 -o 2.965
        Info: Overprovision ratio = 2.965%
        Info: Overprovision segments = 1152 (GC reserved = 256)
  Mount
    - mount <dev> <mount point>
  Fio script
    - fio --rw=randwrite --bs=4k --ba=4k --filesize=58630m --norandommap --overwrite=1 --name=job1 --filename=/sustain --time_based --runtime=2h
  WAF calculation
- (IOs on conv. null_blk + IOs on seq. null_blk) / random write IOs

Conclusion
==

The experiment shows that reducing the reserved segments cuts WAF to about
a tenth (from 222 to 23), although it triggers checkpoints more frequently
during gc. With direct IO, the WAF of as-is gets much higher. In other
words, a user can configure more reserved segments for lower GC latency, or
fewer reserved segments for lower WAF, on the same number of OP segments.

Signed-off-by: Yonggil Song 
---
 include/f2fs_fs.h   | 22 --
 lib/libf2fs.c   | 22 ++
 man/mkfs.f2fs.8 |  9 +++--
 mkfs/f2fs_format.c  | 29 +++--
 mkfs/f2fs_format_main.c |  5 +++--
 5 files changed, 75 insertions(+), 12 deletions(-)

diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
index 333ae07a5ebd..1d41e9f8397e 100644
--- a/include/f2fs_fs.h
+++ b/include/f2fs_fs.h
@@ -375,6 +375,10 @@ static inline uint64_t bswap_64(uint64_t val)
 
 #define LPF "lost+found"
 
+/* one for gc buffer, the other for node */
+#define MIN_RSVD_SECS  (uint32_t)(NR_CURSEG_TYPE + 2)
+#define CONFIG_RSVD_DEFAULT_OP_RATIO   3.0
+
 enum f2fs_config_func {
MKFS,
FSCK,
@@ -460,6 +464,7 @@ typedef struct {
 #define ALIGN_UP(addrs, size)  ALIGN_DOWN(((addrs) + (size) - 1), (size))
 
 struct f2fs_configuration {
+   uint32_t conf_reserved_sections;
uint32_t reserved_segments;
uint32_t new_reserved_segments;
int sparse_mode;
@@ -1614,6 +1619,20 @@ extern uint32_t f2fs_get_usable_segments(struct f2fs_super_block *sb);
 #define ZONE_ALIGN(blks)   SIZE_ALIGN(blks, c.blks_per_seg * \
c.segs_per_zone)
 
+static inline double get_reserved(struct f2fs_super_block *sb, double ovp)
+{
+   double reserved;
+   uint32_t usable_main_segs = f2fs_get_usable_segments(sb);
+	uint32_t segs_per_sec = round_up(usable_main_segs, get_sb(section_count));
+
+   if (c.conf_reserved_sections)
+   reserved = c.conf_reserved_sections * segs_per_sec;
+   else
+   reserved = (100 / ovp + 1 + NR_CURSEG_TYPE) * segs_per_sec;
+
+   return reserved;
+}
+
 static inline double get_best_overprovision(struct f2fs_super_block *sb)
 {
double reserved, ovp, candidate, end, diff, space;
@@ -1631,8 +1650,7 @@ static inline double get_best_overprovision(struct f2fs_super_block *sb)
}
 
for (; candidate <= end; candidate += diff) {
-	reserved = (100 / candidate + 1 + NR_CURSEG_TYPE) *
-			round_up(usable_main_segs, get_sb(section_count));
+   reserved = get_reserved(sb, candidate);
ovp = (usable_main_segs - reserved) * candidate / 100;
if (ovp < 0)
continue;
diff --git a/lib/libf2fs.c b/lib/libf2fs.c
index f63307a42a08..b5644ff6ebdd 100644
--- a/lib/libf2fs.c
+++ b/lib/libf2fs.c
@@ -1069,6 +1069,28 @@ int get_device_info(int i)
dev->nr_rnd_zones);
MSG(0, "  %zu blocks per zone\n",
dev->zone_blocks);
+   if (c.conf_reserved_sections) {
+   if (c.conf_reserved_sections < MIN_RSVD_SECS) {
+				MSG(0, "  Too small sections are reserved(%u secs)\n",
+					c.conf_reserved_sections);
+				c.conf_reserved_sections =
+					max(c.conf_reserved_sections, MIN_RSVD_SECS);
+				MSG(0, "  It is operated as a minimum reserved sections(%u secs)\n",
+

Re: [f2fs-dev] (2) [PATCH v1] f2fs: Fix system crash due to lack of free space in LFS

2023-03-16 Thread Yonggil Song
>On 03/14, Yonggil Song wrote:
>> When f2fs tries to checkpoint during foreground gc in LFS mode, system
>> crash occurs due to lack of free space if the amount of dirty node and
>> dentry pages generated by data migration exceeds free space.
>> The reproduction sequence is as follows.
>> 
>>  - 20GiB capacity block device (null_blk)
>>  - format and mount with LFS mode
>>  - create a file and write 20,000MiB
>>  - 4k random write on full range of the file
>> 
>>  RIP: 0010:new_curseg+0x48a/0x510 [f2fs]
>>  Code: 55 e7 f5 89 c0 48 0f af c3 48 8b 5d c0 48 c1 e8 20 83 c0 01 89 43 6c 
>> 48 83 c4 28 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b f0 41 80 4f 
>> 48 04 45 85 f6 0f 84 ba fd ff ff e9 ef fe ff ff
>>  RSP: 0018:977bc397b218 EFLAGS: 00010246
>>  RAX: 27b9 RBX:  RCX: 27c0
>>  RDX:  RSI: 27b9 RDI: 8c25ab4e74f8
>>  RBP: 977bc397b268 R08: 27b9 R09: 8c29e4a34b40
>>  R10: 0001 R11: 977bc397b0d8 R12: 
>>  R13: 8c25b4dd81a0 R14:  R15: 8c2f667f9000
>>  FS: () GS:8c344ec8() knlGS:
>>  CS: 0010 DS:  ES:  CR0: 80050033
>>  CR2: 00c00055d000 CR3: 000e30810003 CR4: 003706e0
>>  DR0:  DR1:  DR2: 
>>  DR3:  DR6: fffe0ff0 DR7: 0400
>>  Call Trace:
>>  
>>  allocate_segment_by_default+0x9c/0x110 [f2fs]
>>  f2fs_allocate_data_block+0x243/0xa30 [f2fs]
>>  ? __mod_lruvec_page_state+0xa0/0x150
>>  do_write_page+0x80/0x160 [f2fs]
>>  f2fs_do_write_node_page+0x32/0x50 [f2fs]
>>  __write_node_page+0x339/0x730 [f2fs]
>>  f2fs_sync_node_pages+0x5a6/0x780 [f2fs]
>>  block_operations+0x257/0x340 [f2fs]
>>  f2fs_write_checkpoint+0x102/0x1050 [f2fs]
>>  f2fs_gc+0x27c/0x630 [f2fs]
>>  ? folio_mark_dirty+0x36/0x70
>>  f2fs_balance_fs+0x16f/0x180 [f2fs]
>> 
>> This patch adds checking whether free sections are enough before checkpoint
>> during gc.
>> 
>> Signed-off-by: Yonggil Song 
>> ---
>>  fs/f2fs/gc.c  |  7 ++-
>>  fs/f2fs/segment.h | 26 +-
>>  2 files changed, 27 insertions(+), 6 deletions(-)
>> 
>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>> index 4546e01b2ee0..b22f49a6f128 100644
>> --- a/fs/f2fs/gc.c
>> +++ b/fs/f2fs/gc.c
>> @@ -1773,6 +1773,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>>  .iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
>>  };
>>  unsigned int skipped_round = 0, round = 0;
>> +unsigned int nr_needed_secs = 0, node_blocks = 0, dent_blocks = 0;
>>  
>>  trace_f2fs_gc_begin(sbi->sb, gc_type, gc_control->no_bg_gc,
>>  gc_control->nr_free_secs,
>> @@ -1858,8 +1859,12 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
>>  }
>>  }
>>  
>> +/* need more three extra sections for writer's data/node/dentry */
>> +	nr_needed_secs = get_min_need_secs(sbi, &node_blocks, &dent_blocks) + 3;
>
>   get_min_need_secs(, )
>   {
>   ...
>
>   *lower = node_secs + dent_secs;
>   *upper = *lower + (node_blocks ? 1 : 0) + (dent_blocks ? 1 : 0);
>   }
>
>> +nr_needed_secs += ((node_blocks ? 1 : 0) + (dent_blocks ? 1 : 0));
>> +
>>  /* Write checkpoint to reclaim prefree segments */
>> -if (free_sections(sbi) < NR_CURSEG_PERSIST_TYPE &&
>> +if (free_sections(sbi) <= nr_needed_secs &&
>
>#define NR_GC_CHECKPOINT_SECS  (3) /* data/node/dentry sections */
>
>   if (free_sections(sbi) <= upper + NR_GC_CHECKPOINT_SECS &&
>
>>  prefree_segments(sbi)) {
>>  ret = f2fs_write_checkpoint(sbi, );
>>  if (ret)
>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
>> index be8f2d7d007b..ac11c47bfe37 100644
>> --- a/fs/f2fs/segment.h
>> +++ b/fs/f2fs/segment.h
>> @@ -605,8 +605,11 @@ static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi,
>>  return true;
>>  }
>>  
>> -static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi,
>> -int freed, int needed)
>> +/*
>> + * calculate the minimum number of sections (needed) for dirty node/dentry

[f2fs-dev] [PATCH v1] f2fs: Fix system crash due to lack of free space in LFS

2023-03-14 Thread Yonggil Song
When f2fs tries to checkpoint during foreground gc in LFS mode, system
crash occurs due to lack of free space if the amount of dirty node and
dentry pages generated by data migration exceeds free space.
The reproduction sequence is as follows.

 - 20GiB capacity block device (null_blk)
 - format and mount with LFS mode
 - create a file and write 20,000MiB
 - 4k random write on full range of the file

 RIP: 0010:new_curseg+0x48a/0x510 [f2fs]
 Code: 55 e7 f5 89 c0 48 0f af c3 48 8b 5d c0 48 c1 e8 20 83 c0 01 89 43 6c 48 
83 c4 28 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc <0f> 0b f0 41 80 4f 48 04 
45 85 f6 0f 84 ba fd ff ff e9 ef fe ff ff
 RSP: 0018:977bc397b218 EFLAGS: 00010246
 RAX: 27b9 RBX:  RCX: 27c0
 RDX:  RSI: 27b9 RDI: 8c25ab4e74f8
 RBP: 977bc397b268 R08: 27b9 R09: 8c29e4a34b40
 R10: 0001 R11: 977bc397b0d8 R12: 
 R13: 8c25b4dd81a0 R14:  R15: 8c2f667f9000
 FS: () GS:8c344ec8() knlGS:
 CS: 0010 DS:  ES:  CR0: 80050033
 CR2: 00c00055d000 CR3: 000e30810003 CR4: 003706e0
 DR0:  DR1:  DR2: 
 DR3:  DR6: fffe0ff0 DR7: 0400
 Call Trace:
 
 allocate_segment_by_default+0x9c/0x110 [f2fs]
 f2fs_allocate_data_block+0x243/0xa30 [f2fs]
 ? __mod_lruvec_page_state+0xa0/0x150
 do_write_page+0x80/0x160 [f2fs]
 f2fs_do_write_node_page+0x32/0x50 [f2fs]
 __write_node_page+0x339/0x730 [f2fs]
 f2fs_sync_node_pages+0x5a6/0x780 [f2fs]
 block_operations+0x257/0x340 [f2fs]
 f2fs_write_checkpoint+0x102/0x1050 [f2fs]
 f2fs_gc+0x27c/0x630 [f2fs]
 ? folio_mark_dirty+0x36/0x70
 f2fs_balance_fs+0x16f/0x180 [f2fs]

This patch adds a check that free sections are sufficient before
checkpointing during gc.

Signed-off-by: Yonggil Song 
---
 fs/f2fs/gc.c  |  7 ++-
 fs/f2fs/segment.h | 26 +-
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 4546e01b2ee0..b22f49a6f128 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1773,6 +1773,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
.iroot = RADIX_TREE_INIT(gc_list.iroot, GFP_NOFS),
};
unsigned int skipped_round = 0, round = 0;
+   unsigned int nr_needed_secs = 0, node_blocks = 0, dent_blocks = 0;
 
trace_f2fs_gc_begin(sbi->sb, gc_type, gc_control->no_bg_gc,
gc_control->nr_free_secs,
@@ -1858,8 +1859,12 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct f2fs_gc_control *gc_control)
}
}
 
+	/* need three more extra sections for the writer's data/node/dentry */
+	nr_needed_secs = get_min_need_secs(sbi, &node_blocks, &dent_blocks) + 3;
+   nr_needed_secs += ((node_blocks ? 1 : 0) + (dent_blocks ? 1 : 0));
+
/* Write checkpoint to reclaim prefree segments */
-   if (free_sections(sbi) < NR_CURSEG_PERSIST_TYPE &&
+   if (free_sections(sbi) <= nr_needed_secs &&
prefree_segments(sbi)) {
ret = f2fs_write_checkpoint(sbi, );
if (ret)
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index be8f2d7d007b..ac11c47bfe37 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -605,8 +605,11 @@ static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi,
return true;
 }
 
-static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi,
-   int freed, int needed)
+/*
+ * calculate the minimum number of sections (needed) for dirty node/dentry
+ */
+static inline unsigned int get_min_need_secs(struct f2fs_sb_info *sbi,
+   unsigned int *node_blocks, unsigned int *dent_blocks)
 {
unsigned int total_node_blocks = get_pages(sbi, F2FS_DIRTY_NODES) +
get_pages(sbi, F2FS_DIRTY_DENTS) +
@@ -614,15 +617,28 @@ static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi,
unsigned int total_dent_blocks = get_pages(sbi, F2FS_DIRTY_DENTS);
unsigned int node_secs = total_node_blocks / CAP_BLKS_PER_SEC(sbi);
unsigned int dent_secs = total_dent_blocks / CAP_BLKS_PER_SEC(sbi);
-   unsigned int node_blocks = total_node_blocks % CAP_BLKS_PER_SEC(sbi);
-   unsigned int dent_blocks = total_dent_blocks % CAP_BLKS_PER_SEC(sbi);
+
+   f2fs_bug_on(sbi, (!node_blocks || !dent_blocks));
+
+   *node_blocks = total_node_blocks % CAP_BLKS_PER_SEC(sbi);
+   *dent_blocks = total_dent_blocks % CAP_BLKS_PER_SEC(sbi);
+
+   return (node_secs + dent_secs);
+}
+
+static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi,
+   int freed, int needed)
+{
+   unsigned int node_blocks = 0;

[f2fs-dev] [PATCH v1] f2fs: Fix discard bug on zoned block devices with 2MiB zone size

2023-03-13 Thread Yonggil Song
When using f2fs on a zoned block device with 2MiB zone size, IO errors
occur because f2fs tries to write data to a zone that has not been reset.

The cause is that f2fs tries to discard multiple zones at once. This is
caused by a condition in f2fs_clear_prefree_segments that does not check
for zoned block devices when setting the discard range. This leads to
invalid reset commands and write pointer mismatches.

This patch makes f2fs reset one zone at a time on zoned block devices with
a 2MiB zone size.
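
For illustration (hypothetical numbers, assuming 2MiB segments so one
segment maps to one zone): if segments 100..103 become prefree together,
the old code issued a single discard covering all four segments; on the
zoned device that range spans four zones, so the resulting reset command is
invalid and the zones keep stale write pointers, which later writes then
trip over.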

Signed-off-by: Yonggil Song 
---
 fs/f2fs/segment.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index acf3d3fa4363..2b6cb6df623b 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1953,7 +1953,8 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
(end - 1) <= cpc->trim_end)
continue;
 
-   if (!f2fs_lfs_mode(sbi) || !__is_large_section(sbi)) {
+   if (!f2fs_sb_has_blkzoned(sbi) &&
+   (!f2fs_lfs_mode(sbi) || !__is_large_section(sbi))) {
f2fs_issue_discard(sbi, START_BLOCK(sbi, start),
(end - start) << sbi->log_blocks_per_seg);
continue;
-- 
2.34.1





Re: [f2fs-dev] [RFC PATCH] f2fs: preserve direct write semantics when buffering is forced

2023-02-22 Thread Yonggil Song
>In some cases, e.g. for zoned block devices, direct writes are
>forced into buffered writes that will populate the page cache
>and be written out just like buffered io.
>
>Direct reads, on the other hand, are supported for the zoned
>block device case. This has the effect that applications
>built for direct io will fill up the page cache with data
>that will never be read, and that is a waste of resources.
>
>If we agree that this is a problem, how do we fix it?

I agree

thanks

>
>A) Supporting proper direct writes for zoned block devices would
>be the best, but it is currently not supported (probably for
>a good but non-obvious reason). Would it be feasible to
>implement proper direct IO?
>
>B) Avoid the cost of keeping unwanted data by syncing and throwing
>out the cached pages for buffered O_DIRECT writes before completion.
>
>This patch implements B) by reusing the code for how partial
>block writes are flushed out on the "normal" direct write path.
>
>Note that this changes the performance characteristics of f2fs
>quite a bit.
>
>Direct IO performance for zoned block devices is lower for
>small writes after this patch, but this should be expected
>with direct IO and in line with how f2fs behaves on top of
>conventional block devices.
>
>Another open question is if the flushing should be done for
>all cases where buffered writes are forced.
>
>Signed-off-by: Hans Holmberg 
Reviewed-by: Yonggil Song 
>---
> fs/f2fs/file.c | 38 ++
> 1 file changed, 30 insertions(+), 8 deletions(-)
>
>diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>index ecbc8c135b49..4e57c37bce35 100644
>--- a/fs/f2fs/file.c
>+++ b/fs/f2fs/file.c
>@@ -4513,6 +4513,19 @@ static const struct iomap_dio_ops f2fs_iomap_dio_write_ops = {
>   .end_io = f2fs_dio_write_end_io,
> };
> 
>+static void f2fs_flush_buffered_write(struct address_space *mapping,
>+loff_t start_pos, loff_t end_pos)
>+{
>+  int ret;
>+
>+  ret = filemap_write_and_wait_range(mapping, start_pos, end_pos);
>+  if (ret < 0)
>+  return;
>+  invalidate_mapping_pages(mapping,
>+   start_pos >> PAGE_SHIFT,
>+   end_pos >> PAGE_SHIFT);
>+}
>+
> static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
>  bool *may_need_sync)
> {
>@@ -4612,14 +4625,9 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
> 
>   ret += ret2;
> 
>-  ret2 = filemap_write_and_wait_range(file->f_mapping,
>-  bufio_start_pos,
>-  bufio_end_pos);
>-  if (ret2 < 0)
>-  goto out;
>-  invalidate_mapping_pages(file->f_mapping,
>-   bufio_start_pos >> PAGE_SHIFT,
>-   bufio_end_pos >> PAGE_SHIFT);
>+  f2fs_flush_buffered_write(file->f_mapping,
>+bufio_start_pos,
>+bufio_end_pos);
>   }
>   } else {
>   /* iomap_dio_rw() already handled the generic_write_sync(). */
>@@ -4717,8 +4725,22 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>   inode_unlock(inode);
> out:
>   trace_f2fs_file_write_iter(inode, orig_pos, orig_count, ret);
>+
>   if (ret > 0 && may_need_sync)
>   ret = generic_write_sync(iocb, ret);
>+
>+  /* If buffered IO was forced, flush and drop the data from
>+   * the page cache to preserve O_DIRECT semantics
>+   */
>+  if (ret > 0 && !dio && (iocb->ki_flags & IOCB_DIRECT)) {
>+  struct file *file = iocb->ki_filp;
>+  loff_t end_pos = orig_pos + ret - 1;
>+
>+  f2fs_flush_buffered_write(file->f_mapping,
>+orig_pos,
>+end_pos);
>+  }
>+
>   return ret;
> }
> 
>-- 
>2.25.1




[f2fs-dev] [PATCH v2] f2fs: fix uninitialized skipped_gc_rwsem

2023-02-15 Thread Yonggil Song
When f2fs skipped a gc round during victim migration, there was a bug that
would skip all upcoming gc rounds unconditionally because skipped_gc_rwsem
was not initialized. This patch fixes the bug by initializing
skipped_gc_rwsem inside the gc loop.

Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: sta...@vger.kernel.org
Signed-off-by: Yonggil Song 

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index b22f49a6f128..81d326abaac1 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1786,8 +1786,8 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct 
f2fs_gc_control *gc_control)
prefree_segments(sbi));
 
cpc.reason = __get_cp_reason(sbi);
-   sbi->skipped_gc_rwsem = 0;
 gc_more:
+   sbi->skipped_gc_rwsem = 0;
if (unlikely(!(sbi->sb->s_flags & SB_ACTIVE))) {
ret = -EINVAL;
goto stop;
-- 
2.34.1




Re: [f2fs-dev] (2) [PATCH v1] f2fs: fix uninitialized skipped_gc_rwsem

2023-02-15 Thread Yonggil Song
On 2023/2/15 10:48, Yonggil Song wrote:
>> When f2fs skipped a gc round during victim migration, there was a bug which
>> would skip all upcoming gc rounds unconditionally because skipped_gc_rwsem
>> was not initialized. It fixes the bug by correctly initializing the
>> skipped_gc_rwsem inside the gc loop.
>
>It makes sense to me.
>
>> 
>> Fixes: d147ea4adb96 ("f2fs: introduce f2fs_gc_control to consolidate f2fs_gc parameters")
>
>How does this commits introduce the bug?

Oh, sorry, I got the wrong hash.
I'll send the right hash in PATCH v2.

Thanks for your comment.

>
>Thanks,
>
>> Signed-off-by: Yonggil Song 
>> 
>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>> index b22f49a6f128..81d326abaac1 100644
>> --- a/fs/f2fs/gc.c
>> +++ b/fs/f2fs/gc.c
>> @@ -1786,8 +1786,8 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct 
>> f2fs_gc_control *gc_control)
>>  prefree_segments(sbi));
>>   
>>  cpc.reason = __get_cp_reason(sbi);
>> -sbi->skipped_gc_rwsem = 0;
>>   gc_more:
>> +sbi->skipped_gc_rwsem = 0;
>>  if (unlikely(!(sbi->sb->s_flags & SB_ACTIVE))) {
>>  ret = -EINVAL;
>>  goto stop;




[f2fs-dev] [PATCH v1] f2fs: fix uninitialized skipped_gc_rwsem

2023-02-14 Thread Yonggil Song
When f2fs skipped a gc round during victim migration, there was a bug that
would skip all upcoming gc rounds unconditionally because skipped_gc_rwsem
was not initialized. This patch fixes the bug by initializing
skipped_gc_rwsem inside the gc loop.

Fixes: d147ea4adb96 ("f2fs: introduce f2fs_gc_control to consolidate f2fs_gc parameters")
Signed-off-by: Yonggil Song 

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index b22f49a6f128..81d326abaac1 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1786,8 +1786,8 @@ int f2fs_gc(struct f2fs_sb_info *sbi, struct 
f2fs_gc_control *gc_control)
prefree_segments(sbi));
 
cpc.reason = __get_cp_reason(sbi);
-   sbi->skipped_gc_rwsem = 0;
 gc_more:
+   sbi->skipped_gc_rwsem = 0;
if (unlikely(!(sbi->sb->s_flags & SB_ACTIVE))) {
ret = -EINVAL;
goto stop;
-- 
2.34.1




[f2fs-dev] [RESEND][PATCH] f2fs: avoid victim selection from previous victim section

2022-11-22 Thread Yonggil Song
When f2fs chooses a GC victim in large section & LFS mode,
next_victim_seg[gc_type] is referenced first. After a segment is freed,
next_victim_seg[gc_type] holds the next segment number.
However, next_victim_seg[gc_type] still holds the last segment number
even after the last segment of the section is freed. In this case, when f2fs
chooses a victim for the next GC round, the last segment of the previous
victim section is chosen as a victim.

Initialize next_victim_seg[gc_type] to NULL_SEGNO for the last segment in
large section.
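
For illustration (hypothetical numbers, assuming 4 segments per section):
say the victim section covers segments 100..103. As do_garbage_collect()
frees them in order, next_victim_seg[gc_type] advances 101, 102, 103, but
after segment 103 (the last one) is freed it kept pointing at 103. The next
GC round then resumed from the already-freed segment 103 instead of running
victim selection, so the previous victim section was picked again. With this
fix, freeing the last segment resets next_victim_seg[gc_type] to NULL_SEGNO.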

Fixes: e3080b0120a1 ("f2fs: support subsectional garbage collection")
Signed-off-by: Yonggil Song 
---
 fs/f2fs/gc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 0f967b1e98f2..f1b68eda2235 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1749,8 +1749,9 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
get_valid_blocks(sbi, segno, false) == 0)
seg_freed++;
 
-   if (__is_large_section(sbi) && segno + 1 < end_segno)
-   sbi->next_victim_seg[gc_type] = segno + 1;
+   if (__is_large_section(sbi))
+   sbi->next_victim_seg[gc_type] =
+			(segno + 1 < end_segno) ? segno + 1 : NULL_SEGNO;
 skip:
f2fs_put_page(sum_page, 0);
}
-- 
2.34.1




Re: [f2fs-dev] (2) [PATCH v1] f2fs: avoid victim selection from previous victim section

2022-11-22 Thread Yonggil Song
Hi Chao,

Thanks for your review.
I'll fix this and resend a mail.

Thanks

>Hi Yonggil,
>
>I guess your email client forces converting tab and space characters of
>patch, please check that.
>
>On 2022/11/22 10:36, Yonggil Song wrote:
>> When f2fs chooses GC victim in large section & LFS mode,
>> next_victim_seg[gc_type] is referenced first. After segment is freed,
>> next_victim_seg[gc_type] has the next segment number.
>> However, next_victim_seg[gc_type] still has the last segment number
>> even after the last segment of section is freed. In this case, when f2fs
>> chooses a victim for the next GC round, the last segment of previous victim
>> section is chosen as a victim.
>> 
>> Initialize next_victim_seg[gc_type] to NULL_SEGNO for the last segment in
>> large section.
>> 
>> Fixes: e3080b0120a1 ("f2fs: support subsectional garbage collection")
>
>Good catch, I'm fine with this fix.
>
>Thanks,
>
>> Signed-off-by: Yonggil Song 
>> ---
>>   fs/f2fs/gc.c | 5 +++--
>>   1 file changed, 3 insertions(+), 2 deletions(-)
>> 
>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>> index 4546e01b2ee0..10677d53ef0e 100644
>> --- a/fs/f2fs/gc.c
>> +++ b/fs/f2fs/gc.c
>> @@ -1744,8 +1744,9 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
>>   get_valid_blocks(sbi, segno, false) == 0)
>>   seg_freed++;
>>   
>> -if (__is_large_section(sbi) && segno + 1 < end_segno)
>> -sbi->next_victim_seg[gc_type] = segno + 1;
>> +if (__is_large_section(sbi))
>> +sbi->next_victim_seg[gc_type] =
>> +			(segno + 1 < end_segno) ? segno + 1 : NULL_SEGNO;
>>   skip:
>>   f2fs_put_page(sum_page, 0);
>>   }
>> -- 
>> 2.34.1




[f2fs-dev] [PATCH v1] f2fs: avoid victim selection from previous victim section

2022-11-21 Thread Yonggil Song
When f2fs chooses a GC victim in large section & LFS mode,
next_victim_seg[gc_type] is referenced first. After a segment is freed,
next_victim_seg[gc_type] holds the next segment number.
However, next_victim_seg[gc_type] still holds the last segment number
even after the last segment of the section is freed. In this case, when f2fs
chooses a victim for the next GC round, the last segment of the previous
victim section is chosen as a victim.

Initialize next_victim_seg[gc_type] to NULL_SEGNO for the last segment in
large section.

Fixes: e3080b0120a1 ("f2fs: support subsectional garbage collection")
Signed-off-by: Yonggil Song 
---
 fs/f2fs/gc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 4546e01b2ee0..10677d53ef0e 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1744,8 +1744,9 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
 get_valid_blocks(sbi, segno, false) == 0)
 seg_freed++;
 
-if (__is_large_section(sbi) && segno + 1 < end_segno)
-sbi->next_victim_seg[gc_type] = segno + 1;
+if (__is_large_section(sbi))
+sbi->next_victim_seg[gc_type] =
+			(segno + 1 < end_segno) ? segno + 1 : NULL_SEGNO;
 skip:
 f2fs_put_page(sum_page, 0);
 }
-- 
2.34.1




[f2fs-dev] [PATCH v1] f2fs: avoid victim selection from previous victim section

2022-11-14 Thread Yonggil Song
When f2fs chooses a GC victim in large section & LFS mode,
next_victim_seg[gc_type] is referenced first. After a segment is freed,
next_victim_seg[gc_type] holds the next segment number.
However, next_victim_seg[gc_type] still holds the last segment number
even after the last segment of the section is freed. In this case, when f2fs
chooses a victim for the next GC round, the last segment of the previous
victim section is chosen as a victim.

Initialize next_victim_seg[gc_type] to NULL_SEGNO for the last segment in
large section.

Signed-off-by: Yonggil Song 
---
 fs/f2fs/gc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 4546e01b2ee0..10677d53ef0e 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1744,8 +1744,9 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
 get_valid_blocks(sbi, segno, false) == 0)
 seg_freed++;
 
-if (__is_large_section(sbi) && segno + 1 < end_segno)
-sbi->next_victim_seg[gc_type] = segno + 1;
+if (__is_large_section(sbi))
+sbi->next_victim_seg[gc_type] =
+			(segno + 1 < end_segno) ? segno + 1 : NULL_SEGNO;
 skip:
 f2fs_put_page(sum_page, 0);
 }
-- 
2.34.1




[f2fs-dev] [PATCH v1] f2fs: avoid victim selection from previous victim section

2022-11-02 Thread Yonggil Song
When f2fs chooses a GC victim in large section & LFS mode,
next_victim_seg[gc_type] is referenced first. After a segment is freed,
next_victim_seg[gc_type] holds the next segment number.
However, next_victim_seg[gc_type] still holds the last segment number
even after the last segment of the section is freed. In this case, when f2fs
chooses a victim for the next GC round, the last segment of the previous
victim section is chosen as a victim.

Initialize next_victim_seg[gc_type] to NULL_SEGNO for the last segment in
large section.

Signed-off-by: Yonggil Song 
---
 fs/f2fs/gc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 4546e01b2ee0..10677d53ef0e 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1744,8 +1744,9 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
get_valid_blocks(sbi, segno, false) == 0)
seg_freed++;
 
-   if (__is_large_section(sbi) && segno + 1 < end_segno)
-   sbi->next_victim_seg[gc_type] = segno + 1;
+   if (__is_large_section(sbi))
+   sbi->next_victim_seg[gc_type] =
+			(segno + 1 < end_segno) ? segno + 1 : NULL_SEGNO;
 skip:
f2fs_put_page(sum_page, 0);
}
-- 
2.34.1




[f2fs-dev] [PATCH] f2fs: fix typo

2022-09-18 Thread Yonggil Song



Fix typo in f2fs.h
Detected by Jaeyoon Choi

Signed-off-by: Yonggil Song 
---
 fs/f2fs/f2fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index eddfd35eadb6..661096be59d1 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -274,7 +274,7 @@ enum {
ORPHAN_INO, /* for orphan ino list */
APPEND_INO, /* for append ino list */
UPDATE_INO, /* for update ino list */
-   TRANS_DIR_INO,  /* for trasactions dir ino list */
+   TRANS_DIR_INO,  /* for transactions dir ino list */
FLUSH_INO,  /* for multiple device flushing */
MAX_INO_ENTRY,  /* max. list */
 };
-- 
2.34.1




[f2fs-dev] [PATCH v1] f2fs: fix typo

2022-09-01 Thread Yonggil Song


Fix typo in f2fs.h
Detected by Jaeyoon Choi

Signed-off-by: Yonggil Song 
---
 fs/f2fs/f2fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index eddfd35eadb6..661096be59d1 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -274,7 +274,7 @@ enum {
ORPHAN_INO, /* for orphan ino list */
APPEND_INO, /* for append ino list */
UPDATE_INO, /* for update ino list */
-   TRANS_DIR_INO,  /* for trasactions dir ino list */
+   TRANS_DIR_INO,  /* for transactions dir ino list */
FLUSH_INO,  /* for multiple device flushing */
MAX_INO_ENTRY,  /* max. list */
 };
-- 
2.34.1


