Re: [PATCH] f2fs: optimize gc for better performance

2013-09-04 Thread Jaegeuk Kim
Merged.
Thank you,

2013-09-05 (Thu), 12:45 +0800, Jin Xu:
> From: Jin Xu 
> 
> This patch improves gc efficiency by optimizing the victim
> selection policy. With this optimization, random re-write
> performance can increase by up to 20%.
> 
> For f2fs, when the disk runs short of free space, gc selects
> dirty segments and moves their valid blocks around to make more
> space available. The gc cost of a segment is determined by the
> number of valid blocks in the segment: the fewer the valid blocks,
> the higher the efficiency. The ideal victim segment is the one
> with the most garbage blocks.
> 
> Currently, it searches up to 20 dirty segments for a victim segment.
> The selected victim is unlikely to be the best one when there are
> many more dirty segments than that. Why not search more dirty
> segments for a better victim? The cost of searching a candidate is
> negligible compared to moving its blocks: checking one more segment
> costs little more than a bitmap test, while garbage-collecting a
> poorly chosen one costs hundreds of page reads and writes.
> 
> This patch enlarges MAX_VICTIM_SEARCH to 4096 (at 2MB per segment,
> enough to cover 8GB) to make the search more aggressive in pursuit
> of a possibly better victim. Since the limit also applies to victim
> selection for SSR, it will likely improve SSR efficiency as well.
> 
> The test case is simple. It creates files of 32KB each until the
> disk is full, then writes 100000 records of 4KB size to random
> offsets of random files in sync mode. The testing was done on a
> 2GB partition of an SDHC card. Let's see the test result of f2fs
> without and with the patch.
> 
> ---
> 2GB partition, SDHC
> create 52023 files of size 32768 bytes
> random re-write 100000 records of 4KB
> ---
>             | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
> [no patch]  |        341        |       4227       |   1174   |       174840      |
> [patched]   |        324        |       2958       |    645   |       106682      |
> 
> With the patch, f2fs finishes the test in 20+% less time than
> without it, and internally it does much less gc, with higher
> efficiency than before.
> 
> Since the performance improvement is related to gc, it might not be
> as visible in other tests that do not trigger gc as often as this
> one (f2fs selects dirty segments for SSR most of the time when free
> space is short). The well-known iozone tool was not used for
> benchmarking the patch because it does not seem to have a test case
> that performs random re-writes on a full disk.
> 
> This patch is the revised version based on the suggestion from
> Jaegeuk Kim.
> 
> Signed-off-by: Jin Xu 
> [Jaegeuk Kim: suggested simpler solution]
> Reviewed-by: Jaegeuk Kim 
> ---
>  fs/f2fs/gc.c  |8 +++-
>  fs/f2fs/gc.h  |2 +-
>  fs/f2fs/segment.h |1 +
>  3 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 35f9b1a..a78b8e3 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -138,12 +138,18 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
>   if (p->alloc_mode == SSR) {
>   p->gc_mode = GC_GREEDY;
>   p->dirty_segmap = dirty_i->dirty_segmap[type];
> + p->max_search = dirty_i->nr_dirty[type];
>   p->ofs_unit = 1;
>   } else {
>   p->gc_mode = select_gc_type(gc_type);
>   p->dirty_segmap = dirty_i->dirty_segmap[DIRTY];
> + p->max_search = dirty_i->nr_dirty[DIRTY];
>   p->ofs_unit = sbi->segs_per_sec;
>   }
> +
> + if (p->max_search > MAX_VICTIM_SEARCH)
> + p->max_search = MAX_VICTIM_SEARCH;
> +
>   p->offset = sbi->last_victim[p->gc_mode];
>  }
>  
> @@ -290,7 +296,7 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
>   if (cost == max_cost)
>   continue;
>  
> - if (nsearched++ >= MAX_VICTIM_SEARCH) {
> + if (nsearched++ >= p.max_search) {
>   sbi->last_victim[p.gc_mode] = segno;
>   break;
>   }
> diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
> index 2c6a6bd..f8ee8a8 100644
> --- a/fs/f2fs/gc.h
> +++ b/fs/f2fs/gc.h
> @@ -20,7 +20,7 @@
>  #define LIMIT_FREE_BLOCK 40 /* percentage over invalid + free space */
>  
>  /* Search max. number of dirty segments to select a victim segment */
> -#define MAX_VICTIM_SEARCH20
> +#define MAX_VICTIM_SEARCH 4096 /* covers 8GB */
>  
>  struct f2fs_gc_kthread {
>   struct task_struct *f2fs_gc_task;
> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> index 062424a..dca6379 100644
> --- a/fs/f2fs/segment.h
> +++ b/fs/f2fs/segment.h
> @@ -142,6 +142,7 @@ struct victim_sel_policy {
>   int alloc_mode; /* LFS or SSR */
>   int gc_mode;/* GC_CB or GC_GREEDY */
>   unsigned long *dirty_segmap;/* dirty segment bitmap */
> + unsigned int max_search;/* maximum # of segments to search */
>   unsigned int offset;/* last scanned bitmap offset */
>   unsigned int ofs_unit;  /* bitmap search unit */

[PATCH] f2fs: optimize gc for better performance

2013-09-04 Thread Jin Xu
From: Jin Xu 

This patch improves gc efficiency by optimizing the victim
selection policy. With this optimization, random re-write
performance can increase by up to 20%.

For f2fs, when the disk runs short of free space, gc selects
dirty segments and moves their valid blocks around to make more
space available. The gc cost of a segment is determined by the
number of valid blocks in the segment: the fewer the valid blocks,
the higher the efficiency. The ideal victim segment is the one
with the most garbage blocks.
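
For illustration, the greedy policy described above boils down to the
following minimal user-space sketch (the array names are illustrative
assumptions, not the kernel's actual data structures):

  /*
   * Greedy victim selection sketch: scan the dirty segments and pick
   * the one with the fewest valid blocks (i.e. the most garbage),
   * giving up after max_search candidates.
   */
  #include <limits.h>

  #define NO_VICTIM UINT_MAX

  static unsigned int pick_victim(const unsigned int *valid_blocks,
                                  const unsigned char *is_dirty,
                                  unsigned int nr_segs,
                                  unsigned int max_search)
  {
      unsigned int segno, victim = NO_VICTIM;
      unsigned int min_cost = UINT_MAX, nsearched = 0;

      for (segno = 0; segno < nr_segs; segno++) {
          if (!is_dirty[segno])
              continue;
          /* the cost of a segment is its valid block count */
          if (valid_blocks[segno] < min_cost) {
              min_cost = valid_blocks[segno];
              victim = segno;
          }
          if (++nsearched >= max_search)
              break;
      }
      return victim;
  }

The larger max_search is, the closer the chosen victim gets to the
global minimum-cost segment; MAX_VICTIM_SEARCH is exactly that bound.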

Currently, it searches up to 20 dirty segments for a victim segment.
The selected victim is unlikely to be the best one when there are
many more dirty segments than that. Why not search more dirty
segments for a better victim? The cost of searching a candidate is
negligible compared to moving its blocks: checking one more segment
costs little more than a bitmap test, while garbage-collecting a
poorly chosen one costs hundreds of page reads and writes.

This patch enlarges MAX_VICTIM_SEARCH to 4096 (at 2MB per segment,
enough to cover 8GB) to make the search more aggressive in pursuit
of a possibly better victim. Since the limit also applies to victim
selection for SSR, it will likely improve SSR efficiency as well.

The test case is simple. It creates files of 32KB each until the
disk is full, then writes 100000 records of 4KB size to random
offsets of random files in sync mode. The testing was done on a
2GB partition of an SDHC card. Let's see the test result of f2fs
without and with the patch.

---
2GB partition, SDHC
create 52023 files of size 32768 bytes
random re-write 100000 records of 4KB
---
            | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
[no patch]  |        341        |       4227       |   1174   |       174840      |
[patched]   |        324        |       2958       |    645   |       106682      |

With the patch, f2fs finishes the test in 20+% less time than
without it, and internally it does much less gc, with higher
efficiency than before.

Since the performance improvement is related to gc, it might not be
as visible in other tests that do not trigger gc as often as this
one (f2fs selects dirty segments for SSR most of the time when free
space is short). The well-known iozone tool was not used for
benchmarking the patch because it does not seem to have a test case
that performs random re-writes on a full disk.

This patch is the revised version based on the suggestion from
Jaegeuk Kim.

Signed-off-by: Jin Xu 
[Jaegeuk Kim: suggested simpler solution]
Reviewed-by: Jaegeuk Kim 
---
 fs/f2fs/gc.c  |8 +++-
 fs/f2fs/gc.h  |2 +-
 fs/f2fs/segment.h |1 +
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 35f9b1a..a78b8e3 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -138,12 +138,18 @@ static void select_policy(struct f2fs_sb_info *sbi, int gc_type,
if (p->alloc_mode == SSR) {
p->gc_mode = GC_GREEDY;
p->dirty_segmap = dirty_i->dirty_segmap[type];
+   p->max_search = dirty_i->nr_dirty[type];
p->ofs_unit = 1;
} else {
p->gc_mode = select_gc_type(gc_type);
p->dirty_segmap = dirty_i->dirty_segmap[DIRTY];
+   p->max_search = dirty_i->nr_dirty[DIRTY];
p->ofs_unit = sbi->segs_per_sec;
}
+
+   if (p->max_search > MAX_VICTIM_SEARCH)
+   p->max_search = MAX_VICTIM_SEARCH;
+
p->offset = sbi->last_victim[p->gc_mode];
 }
 
@@ -290,7 +296,7 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
if (cost == max_cost)
continue;
 
-   if (nsearched++ >= MAX_VICTIM_SEARCH) {
+   if (nsearched++ >= p.max_search) {
sbi->last_victim[p.gc_mode] = segno;
break;
}
diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
index 2c6a6bd..f8ee8a8 100644
--- a/fs/f2fs/gc.h
+++ b/fs/f2fs/gc.h
@@ -20,7 +20,7 @@
 #define LIMIT_FREE_BLOCK   40 /* percentage over invalid + free space */
 
 /* Search max. number of dirty segments to select a victim segment */
-#define MAX_VICTIM_SEARCH  20
+#define MAX_VICTIM_SEARCH 4096 /* covers 8GB */
 
 struct f2fs_gc_kthread {
struct task_struct *f2fs_gc_task;
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 062424a..dca6379 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -142,6 +142,7 @@ struct victim_sel_policy {
int alloc_mode; /* LFS or SSR */
int gc_mode;/* GC_CB or GC_GREEDY */
unsigned long *dirty_segmap;/* dirty segment bitmap */
+   unsigned int max_search;/* maximum # of segments to search */
unsigned int offset;/* last scanned bitmap offset */
unsigned int ofs_unit;  /* bitmap search unit */
unsigned int min_cost;  /* minimum cost */
-- 
1.7.9.5


Re: [PATCH] f2fs: optimize gc for better performance

2013-09-04 Thread Jin Xu

Yes, I will submit the patch later.

On 05/09/2013 07:50, Jaegeuk Kim wrote:

Hi Jin,

2013-09-04 (Wed), 21:17 +0800, Jin Xu:

Hi Jaegeuk,

On 04/09/2013 17:40, Jaegeuk Kim wrote:

Hi Jin,

2013-09-04 (Wed), 07:59 +0800, Jin Xu:

Hi Jaegeuk,

On 03/09/2013 08:45, Jaegeuk Kim wrote:

Hi Jin,


[...]


It seems that we can obtain the performance gain just by setting the
MAX_VICTIM_SEARCH to 4096, for example.
So, how about just adding an ending criterion like below?



I agree that we could get the performance improvement by simply
enlarging the MAX_VICTIM_SEARCH to 4096, but I am a little concerned
about the scalability, because it might end up always searching the
whole bitmap in some cases, for example when there are 4000 dirty
segments out of 409600 total segments.

[snip]

[...]


if (p->max_search > MAX_VICTIM_SEARCH)
p->max_search = MAX_VICTIM_SEARCH;



The optimization does not apply to SSR mode. There is a reason for
this. As noticed in the test, once SSR has selected the segments with
the most garbage blocks, then when gc is needed, all the remaining
dirty segments may have very few garbage blocks, so the gc overhead
is high. This might lead to performance degradation. So the patch
does not change the victim selection policy for SSR.


I think it doesn't matter.
GC is only triggered during direct node block allocation.
What this means is that we need to consider the number of GC
triggers, and GC triggers more frequently during normal data
allocation than during node block allocation.
So, I think it would not degrade performance significantly.

BTW, could you show some numbers for this?
Or could you test what I suggested?

Thanks,



I re-ran the test and got the following result:

---
2GB SDHC
create 52023 files of size 32768 bytes
random re-write 100000 records of 4KB
---

              | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
no patch      |        341        |       4227       |   1174   |       174840      |
patched       |        296        |       2995       |    634   |       109314      |
patched (KIM) |        324        |       2958       |    645   |       106682      |

In this test, it does not show the minor performance degradation caused
by applying the patch to SSR mode. Instead, the performance is a little
better with what you suggested.

I agree that the performance degradation would not be significant
even if it does degrade. I once saw minor degradation in some
workloads, but I didn't save the data.

So, I agree that we can apply the patch to SSR mode as well.

And do you still have concerns about the formula for calculating the
# of searches?


Thank you for the test. :)
What I'm concerned about is that, if it is really important to get a
victim more accurately for performance, as you described, there is no
need to calculate the number of searches IMO. Let's just use
nr_dirty. Why not?
The only thing we should consider is handling the case where
nr_dirty is too large.
For this, we can simply limit the # of searches to avoid performance
degradation.

Still, I'm not convinced of the effectiveness of your formula.
If possible, could you show it with numbers?


It's not easy to prove the effectiveness of the formula. It was just
there to address my concern about the scalability of the search.
Since it does not matter much for the performance improvement, we can
put it aside and choose the simpler method you suggested.

So, should I revise the patch based on what you suggested or will
you take care of it?


Could you make a patch with your performance description and submit
it again?
Thanks a lot,





Re: [PATCH] f2fs: optimize gc for better performance

2013-09-04 Thread Jaegeuk Kim
Hi Jin,

2013-09-04 (Wed), 21:17 +0800, Jin Xu:
> Hi Jaegeuk,
> 
> On 04/09/2013 17:40, Jaegeuk Kim wrote:
> > Hi Jin,
> >
> > 2013-09-04 (Wed), 07:59 +0800, Jin Xu:
> >> Hi Jaegeuk,
> >>
> >> On 03/09/2013 08:45, Jaegeuk Kim wrote:
> >>> Hi Jin,
> >>>
>  [...]
> >
> > It seems that we can obtain the performance gain just by setting the
> > MAX_VICTIM_SEARCH to 4096, for example.
> > So, how about just adding an ending criterion like below?
> >
> 
>  I agree that we could get the performance improvement by simply
>  enlarging the MAX_VICTIM_SEARCH to 4096, but I am a little concerned
>  about the scalability, because it might end up always searching the
>  whole bitmap in some cases, for example when there are 4000 dirty
>  segments out of 409600 total segments.
> > [snip]
>  [...]
> >
> > if (p->max_search > MAX_VICTIM_SEARCH)
> > p->max_search = MAX_VICTIM_SEARCH;
> >
> 
>  The optimization does not apply to SSR mode. There is a reason for
>  this. As noticed in the test, once SSR has selected the segments with
>  the most garbage blocks, then when gc is needed, all the remaining
>  dirty segments may have very few garbage blocks, so the gc overhead
>  is high. This might lead to performance degradation. So the patch
>  does not change the victim selection policy for SSR.
> >>>
> >>> I think it doesn't matter.
> >>> GC is only triggered during direct node block allocation.
> >>> What this means is that we need to consider the number of GC
> >>> triggers, and GC triggers more frequently during normal data
> >>> allocation than during node block allocation.
> >>> So, I think it would not degrade performance significantly.
> >>>
> >>> BTW, could you show some numbers for this?
> >>> Or could you test what I suggested?
> >>>
> >>> Thanks,
> >>>
> >>
> >> I re-ran the test and got the following result:
> >>
> >> ---
> >> 2GB SDHC
> >> create 52023 files of size 32768 bytes
> >> random re-write 100000 records of 4KB
> >> ---
> >>
> >>               | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
> >> no patch      |        341        |       4227       |   1174   |       174840      |
> >> patched       |        296        |       2995       |    634   |       109314      |
> >> patched (KIM) |        324        |       2958       |    645   |       106682      |
> >>
> >> In this test, it does not show the minor performance degradation caused
> >> by applying the patch to SSR mode. Instead, the performance is a little
> >> better with what you suggested.
> >>
> >> I agree that the performance degradation would not be significant
> >> even if it does degrade. I once saw minor degradation in some
> >> workloads, but I didn't save the data.
> >>
> >> So, I agree that we can apply the patch to SSR mode as well.
> >>
> >> And do you still have concerns about the formula for calculating the
> >> # of searches?
> >
> > Thank you for the test. :)
> > What I'm concerned about is that, if it is really important to get a
> > victim more accurately for performance, as you described, there is no
> > need to calculate the number of searches IMO. Let's just use
> > nr_dirty. Why not?
> > The only thing we should consider is handling the case where
> > nr_dirty is too large.
> > For this, we can simply limit the # of searches to avoid performance
> > degradation.
> >
> > Still, I'm not convinced of the effectiveness of your formula.
> > If possible, could you show it with numbers?
> 
> It's not easy to prove the effectiveness of the formula. It was just
> there to address my concern about the scalability of the search.
> Since it does not matter much for the performance improvement, we can
> put it aside and choose the simpler method you suggested.
> 
> So, should I revise the patch based on what you suggested or will
> you take care of it?

Could you make a patch with your performance description and submit
it again?
Thanks a lot,

-- 
Jaegeuk Kim
Samsung



Re: [PATCH] f2fs: optimize gc for better performance

2013-09-04 Thread Jin Xu

Hi Jaegeuk,

On 04/09/2013 17:40, Jaegeuk Kim wrote:

Hi Jin,

2013-09-04 (Wed), 07:59 +0800, Jin Xu:

Hi Jaegeuk,

On 03/09/2013 08:45, Jaegeuk Kim wrote:

Hi Jin,


[...]


It seems that we can obtain the performance gain just by setting the
MAX_VICTIM_SEARCH to 4096, for example.
So, how about just adding an ending criterion like below?



I agree that we could get the performance improvement by simply
enlarging the MAX_VICTIM_SEARCH to 4096, but I am a little concerned
about the scalability, because it might end up always searching the
whole bitmap in some cases, for example when there are 4000 dirty
segments out of 409600 total segments.

[snip]

[...]


if (p->max_search > MAX_VICTIM_SEARCH)
p->max_search = MAX_VICTIM_SEARCH;



The optimization does not apply to SSR mode. There is a reason for
this. As noticed in the test, once SSR has selected the segments with
the most garbage blocks, then when gc is needed, all the remaining
dirty segments may have very few garbage blocks, so the gc overhead
is high. This might lead to performance degradation. So the patch
does not change the victim selection policy for SSR.


I think it doesn't matter.
GC is only triggered during direct node block allocation.
What this means is that we need to consider the number of GC
triggers, and GC triggers more frequently during normal data
allocation than during node block allocation.
So, I think it would not degrade performance significantly.

BTW, could you show some numbers for this?
Or could you test what I suggested?

Thanks,



I re-ran the test and got the following result:

---
2GB SDHC
create 52023 files of size 32768 bytes
random re-write 100000 records of 4KB
---

              | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
no patch      |        341        |       4227       |   1174   |       174840      |
patched       |        296        |       2995       |    634   |       109314      |
patched (KIM) |        324        |       2958       |    645   |       106682      |

In this test, it does not show the minor performance degradation caused
by applying the patch to SSR mode. Instead, the performance is a little
better with what you suggested.

I agree that the performance degradation would not be significant
even if it does degrade. I once saw minor degradation in some
workloads, but I didn't save the data.

So, I agree that we can apply the patch to SSR mode as well.

And do you still have concerns about the formula for calculating the
# of searches?


Thank you for the test. :)
What I'm concerned about is that, if it is really important to get a
victim more accurately for performance, as you described, there is no
need to calculate the number of searches IMO. Let's just use
nr_dirty. Why not?
The only thing we should consider is handling the case where
nr_dirty is too large.
For this, we can simply limit the # of searches to avoid performance
degradation.

Still, I'm not convinced of the effectiveness of your formula.
If possible, could you show it with numbers?


It's not easy to prove the effectiveness of the formula. It was just
there to address my concern about the scalability of the search.
Since it does not matter much for the performance improvement, we can
put it aside and choose the simpler method you suggested.

So, should I revise the patch based on what you suggested or will
you take care of it?

--
Thanks,
Jin



Thanks,





What do you think now?


#define MAX_VICTIM_SEARCH 4096 /* covers 8GB */


p->offset = sbi->last_victim[p->gc_mode];
@@ -243,6 +245,8 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
struct victim_sel_policy p;
unsigned int secno, max_cost;
int nsearched = 0;
+   unsigned int max_search = MAX_VICTIM_SEARCH;
+   unsigned int nr_dirty;

p.alloc_mode = alloc_mode;
select_policy(sbi, gc_type, type, &p);
@@ -258,6 +262,27 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
goto got_it;
}

+   nr_dirty = dirty_i->nr_dirty[p.dirty_type];
+   if (p.gc_mode == GC_GREEDY && p.alloc_mode != SSR) {
+   if (TOTAL_SEGS(sbi) <= FULL_VICTIM_SEARCH_THRESH)
+   max_search = nr_dirty; /* search all the dirty segs */
+   else {
+   /*
+* With more dirty segments, garbage blocks are likely
+* more scattered, thus search harder for better
+* victim.
+*/
+   max_search = div_u64((nr_dirty *
+   FULL_VICTIM_SEARCH_THRESH), TOTAL_SEGS(sbi));
+   if (max_search < MIN_VICTIM_SEARCH_GREEDY)
+   max_search = MIN_VICTIM_SEARCH_GREEDY;
+   }
+   }
+
+   /* no more than the total dirty segments */
+   if (max_search > nr_dirty)
+   max_search = nr_dirty;
+
while (1) {

Re: [PATCH] f2fs: optimize gc for better performance

2013-09-04 Thread Jaegeuk Kim
Hi Jin,

2013-09-04 (Wed), 07:59 +0800, Jin Xu:
> Hi Jaegeuk,
> 
> On 03/09/2013 08:45, Jaegeuk Kim wrote:
> > Hi Jin,
> >
> >> [...]
> >>>
> >>> It seems that we can obtain the performance gain just by setting the
> >>> MAX_VICTIM_SEARCH to 4096, for example.
> >>> So, how about just adding an ending criterion like below?
> >>>
> >>
> >> I agree that we could get the performance improvement by simply
> >> enlarging the MAX_VICTIM_SEARCH to 4096, but I am a little concerned
> >> about the scalability, because it might end up always searching the
> >> whole bitmap in some cases, for example when there are 4000 dirty
> >> segments out of 409600 total segments.
> >>> [snip]
> >> [...]
> >>>
> >>>   if (p->max_search > MAX_VICTIM_SEARCH)
> >>>   p->max_search = MAX_VICTIM_SEARCH;
> >>>
> >>
> >> The optimization does not apply to SSR mode. There is a reason for
> >> this. As noticed in the test, once SSR has selected the segments with
> >> the most garbage blocks, then when gc is needed, all the remaining
> >> dirty segments may have very few garbage blocks, so the gc overhead
> >> is high. This might lead to performance degradation. So the patch
> >> does not change the victim selection policy for SSR.
> >
> > I think it doesn't matter.
> > GC is only triggered during direct node block allocation.
> > What this means is that we need to consider the number of GC
> > triggers, and GC triggers more frequently during normal data
> > allocation than during node block allocation.
> > So, I think it would not degrade performance significantly.
> >
> > BTW, could you show some numbers for this?
> > Or could you test what I suggested?
> >
> > Thanks,
> >
> 
> I re-ran the test and got the following result:
> 
> ---
> 2GB SDHC
> create 52023 files of size 32768 bytes
> random re-write 100000 records of 4KB
> ---
> 
>               | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
> no patch      |        341        |       4227       |   1174   |       174840      |
> patched       |        296        |       2995       |    634   |       109314      |
> patched (KIM) |        324        |       2958       |    645   |       106682      |
> 
> In this test, it does not show the minor performance degradation caused
> by applying the patch to SSR mode. Instead, the performance is a little 
> better with what you suggested.
> 
> I agree that the performance degradation would not be significant
> even if it does degrade. I once saw minor degradation in some
> workloads, but I didn't save the data.
> 
> So, I agree that we can apply the patch to SSR mode as well.
> 
> And do you still have concerns about the formula for calculating the
> # of searches?

Thank you for the test. :)
What I'm concerned about is that, if it is really important to get a
victim more accurately for performance, as you described, there is no
need to calculate the number of searches IMO. Let's just use
nr_dirty. Why not?
The only thing we should consider is handling the case where
nr_dirty is too large.
For this, we can simply limit the # of searches to avoid performance
degradation.
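
In code, that suggestion amounts to the two-line cap that the merged
patch at the top of this thread applies in select_policy():

  p->max_search = dirty_i->nr_dirty[DIRTY]; /* or nr_dirty[type] for SSR */
  if (p->max_search > MAX_VICTIM_SEARCH)
      p->max_search = MAX_VICTIM_SEARCH;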

Still, I'm not convinced of the effectiveness of your formula.
If possible, could you show it with numbers?

Thanks,

> 
> >>
> >> What do you think now?
> >>
> >>> #define MAX_VICTIM_SEARCH 4096 /* covers 8GB */
> >>>
>   p->offset = sbi->last_victim[p->gc_mode];
>  @@ -243,6 +245,8 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
>   struct victim_sel_policy p;
>   unsigned int secno, max_cost;
>   int nsearched = 0;
>  +unsigned int max_search = MAX_VICTIM_SEARCH;
>  +unsigned int nr_dirty;
> 
>   p.alloc_mode = alloc_mode;
>   select_policy(sbi, gc_type, type, &p);
>  @@ -258,6 +262,27 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
>   goto got_it;
>   }
> 
>  +nr_dirty = dirty_i->nr_dirty[p.dirty_type];
>  +if (p.gc_mode == GC_GREEDY && p.alloc_mode != SSR) {
>  +if (TOTAL_SEGS(sbi) <= FULL_VICTIM_SEARCH_THRESH)
>  +max_search = nr_dirty; /* search all the dirty segs */
>  +else {
>  +/*
>  + * With more dirty segments, garbage blocks are likely
>  + * more scattered, thus search harder for better
>  + * victim.
>  + */
>  +max_search = div_u64((nr_dirty *
>  +FULL_VICTIM_SEARCH_THRESH), TOTAL_SEGS(sbi));
>  +if (max_search < MIN_VICTIM_SEARCH_GREEDY)
>  +max_search = MIN_VICTIM_SEARCH_GREEDY;
>  +}
>  +}
>  +
>  +/* no more than the total dirty segments */
>  +if (max_search > nr_dirty)
>  +max_search = nr_dirty;
>  +
>   while (1) {

Re: [PATCH] f2fs: optimize gc for better performance

2013-09-03 Thread Jin Xu

Hi Jaegeuk,

On 03/09/2013 08:45, Jaegeuk Kim wrote:

Hi Jin,


[...]


It seems that we can obtain the performance gain just by setting the
MAX_VICTIM_SEARCH to 4096, for example.
So, how about just adding an ending criterion like below?



I agree that we could get the performance improvement by simply
enlarging the MAX_VICTIM_SEARCH to 4096, but I am a little concerned
about the scalability, because it might end up always searching the
whole bitmap in some cases, for example when there are 4000 dirty
segments out of 409600 total segments. (With the formula proposed
below, that case is bounded instead: max_search = 4000 * 4096 /
409600 = 40.)

[snip]

[...]


if (p->max_search > MAX_VICTIM_SEARCH)
p->max_search = MAX_VICTIM_SEARCH;



The optimization does not apply to SSR mode. There is a reason for
this. As noticed in the test, once SSR has selected the segments with
the most garbage blocks, then when gc is needed, all the remaining
dirty segments may have very few garbage blocks, so the gc overhead
is high. This might lead to performance degradation. So the patch
does not change the victim selection policy for SSR.


I think it doesn't matter.
GC is only triggered during direct node block allocation.
What this means is that we need to consider the number of GC
triggers, and GC triggers more frequently during normal data
allocation than during node block allocation.
So, I think it would not degrade performance significantly.

BTW, could you show some numbers for this?
Or could you test what I suggested?

Thanks,



I re-ran the test and got the following result:

---
2GB SDHC
create 52023 files of size 32768 bytes
random re-write 100000 records of 4KB
---

              | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
no patch      |        341        |       4227       |   1174   |       174840      |
patched       |        296        |       2995       |    634   |       109314      |
patched (KIM) |        324        |       2958       |    645   |       106682      |

In this test, it does not show the minor performance degradation caused
by applying the patch to SSR mode. Instead, the performance is a little 
better with what you suggested.


I agree that the performance degradation would not be significant
even if it does degrade. I once saw minor degradation in some
workloads, but I didn't save the data.

So, I agree that we can apply the patch to SSR mode as well.

And do you still have concerns about the formula for calculating the
# of searches?



What do you think now?


#define MAX_VICTIM_SEARCH 4096 /* covers 8GB */


p->offset = sbi->last_victim[p->gc_mode];
@@ -243,6 +245,8 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
struct victim_sel_policy p;
unsigned int secno, max_cost;
int nsearched = 0;
+   unsigned int max_search = MAX_VICTIM_SEARCH;
+   unsigned int nr_dirty;

p.alloc_mode = alloc_mode;
select_policy(sbi, gc_type, type, &p);
@@ -258,6 +262,27 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
goto got_it;
}

+   nr_dirty = dirty_i->nr_dirty[p.dirty_type];
+   if (p.gc_mode == GC_GREEDY && p.alloc_mode != SSR) {
+   if (TOTAL_SEGS(sbi) <= FULL_VICTIM_SEARCH_THRESH)
+   max_search = nr_dirty; /* search all the dirty segs */
+   else {
+   /*
+* With more dirty segments, garbage blocks are likely
+* more scattered, thus search harder for better
+* victim.
+*/
+   max_search = div_u64((nr_dirty *
+   FULL_VICTIM_SEARCH_THRESH), TOTAL_SEGS(sbi));
+   if (max_search < MIN_VICTIM_SEARCH_GREEDY)
+   max_search = MIN_VICTIM_SEARCH_GREEDY;
+   }
+   }
+
+   /* no more than the total dirty segments */
+   if (max_search > nr_dirty)
+   max_search = nr_dirty;
+
while (1) {
unsigned long cost;
unsigned int segno;
@@ -290,7 +315,7 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
if (cost == max_cost)
continue;

-   if (nsearched++ >= MAX_VICTIM_SEARCH) {
+   if (nsearched++ >= max_search) {


if (nsearched++ >= p.max_search) {


sbi->last_victim[p.gc_mode] = segno;
break;
}
diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
index 2c6a6bd..2f525aa 100644
--- a/fs/f2fs/gc.h
+++ b/fs/f2fs/gc.h
@@ -20,7 +20,9 @@
   #define LIMIT_FREE_BLOCK 40 /* percentage over invalid + free space */

   /* Search max. number of dirty segments to select a victim segment */
-#define MAX_VICTIM_SEARCH  20
+#define MAX_VICTIM_SEARCH  20
+#define MIN_VICTIM_SEARCH_GREEDY   20
+#define FULL_VICTIM_SEARCH_THRESH  4096

   struct f2fs_gc_kthread {
struct task_struct *f2fs_gc_task;
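
For comparison with the simple cap that was eventually merged, the
bound proposed above can be written as a self-contained helper (a
sketch only: plain 64-bit division stands in for the kernel's
div_u64(), and the constants are those from the hunk above):

  #include <stdint.h>

  #define MIN_VICTIM_SEARCH_GREEDY  20
  #define FULL_VICTIM_SEARCH_THRESH 4096

  /* Scale the search depth with the dirty ratio of the volume. */
  static unsigned int bound_search(unsigned int nr_dirty,
                                   unsigned int total_segs)
  {
      unsigned int max_search;

      if (total_segs <= FULL_VICTIM_SEARCH_THRESH)
          return nr_dirty; /* small volume: search all dirty segments */

      /* e.g. nr_dirty = 4000, total_segs = 409600 -> 40 searches */
      max_search = (uint64_t)nr_dirty * FULL_VICTIM_SEARCH_THRESH / total_segs;
      if (max_search < MIN_VICTIM_SEARCH_GREEDY)
          max_search = MIN_VICTIM_SEARCH_GREEDY;
      if (max_search > nr_dirty)
          max_search = nr_dirty; /* never exceed the dirty count */
      return max_search;
  }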

Re: [PATCH] f2fs: optimize gc for better performance

2013-09-02 Thread Jaegeuk Kim
Hi Jin,

> [...]
> >
> > It seems that we can obtain the performance gain just by setting the
> > MAX_VICTIM_SEARCH to 4096, for example.
> > So, how about just adding an ending criterion like below?
> >
> 
> I agree that we could get the performance improvement by simply
> enlarging MAX_VICTIM_SEARCH to 4096, but I am a little concerned
> about the scalability, because it might always end up searching the
> whole bitmap in some cases, for example, when there are 4000 dirty
> segments out of 409600 total segments.
> > [snip]
> [...]
> >
> > if (p->max_search > MAX_VICTIM_SEARCH)
> > p->max_search = MAX_VICTIM_SEARCH;
> >
> 
> The optimization does not apply to SSR mode. There is a reason.
> As noticed in the test, when SSR has already taken the segments that
> have the most garbage blocks, then when gc is needed, all the
> remaining dirty segments might have very few garbage blocks, so the
> gc overhead is high. This might lead to performance degradation. So
> the patch does not change the victim selection policy for SSR.

I don't think that matters.
GC is only triggered during direct node block allocation.
What this means is that we need to consider the number of GC triggers,
and GC triggers much more frequently during normal data allocation
than during node block allocation.
So, I think it would not degrade performance significantly.

BTW, could you show some numbers for this?
Or could you test what I suggested?

Thanks,

> 
> What do you think now?
> 
> > #define MAX_VICTIM_SEARCH 4096 /* covers 8GB */
> >
> >>p->offset = sbi->last_victim[p->gc_mode];
> >> @@ -243,6 +245,8 @@ static int get_victim_by_default(struct f2fs_sb_info 
> >> *sbi,
> >>struct victim_sel_policy p;
> >>unsigned int secno, max_cost;
> >>int nsearched = 0;
> >> +  unsigned int max_search = MAX_VICTIM_SEARCH;
> >> +  unsigned int nr_dirty;
> >>
> >>p.alloc_mode = alloc_mode;
> >>select_policy(sbi, gc_type, type, &p);
> >> @@ -258,6 +262,27 @@ static int get_victim_by_default(struct f2fs_sb_info 
> >> *sbi,
> >>goto got_it;
> >>}
> >>
> >> +  nr_dirty = dirty_i->nr_dirty[p.dirty_type];
> >> +  if (p.gc_mode == GC_GREEDY && p.alloc_mode != SSR) {
> >> +  if (TOTAL_SEGS(sbi) <= FULL_VICTIM_SEARCH_THRESH)
> >> +  max_search = nr_dirty; /* search all the dirty segs */
> >> +  else {
> >> +  /*
> >> +   * With more dirty segments, garbage blocks are likely
> >> +   * more scattered, thus search harder for better
> >> +   * victim.
> >> +   */
> >> +  max_search = div_u64 ((nr_dirty *
> >> +  FULL_VICTIM_SEARCH_THRESH), TOTAL_SEGS(sbi));
> >> +  if (max_search < MIN_VICTIM_SEARCH_GREEDY)
> >> +  max_search = MIN_VICTIM_SEARCH_GREEDY;
> >> +  }
> >> +  }
> >> +
> >> +  /* no more than the total dirty segments */
> >> +  if (max_search > nr_dirty)
> >> +  max_search = nr_dirty;
> >> +
> >>while (1) {
> >>unsigned long cost;
> >>unsigned int segno;
> >> @@ -290,7 +315,7 @@ static int get_victim_by_default(struct f2fs_sb_info 
> >> *sbi,
> >>if (cost == max_cost)
> >>continue;
> >>
> >> -  if (nsearched++ >= MAX_VICTIM_SEARCH) {
> >> +  if (nsearched++ >= max_search) {
> >
> > if (nsearched++ >= p.max_search) {
> >
> >>sbi->last_victim[p.gc_mode] = segno;
> >>break;
> >>}
> >> diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
> >> index 2c6a6bd..2f525aa 100644
> >> --- a/fs/f2fs/gc.h
> >> +++ b/fs/f2fs/gc.h
> >> @@ -20,7 +20,9 @@
> >>   #define LIMIT_FREE_BLOCK 40 /* percentage over invalid + free space */
> >>
> >>   /* Search max. number of dirty segments to select a victim segment */
> >> -#define MAX_VICTIM_SEARCH 20
> >> +#define MAX_VICTIM_SEARCH 20
> >> +#define MIN_VICTIM_SEARCH_GREEDY  20
> >> +#define FULL_VICTIM_SEARCH_THRESH 4096
> >>
> >>   struct f2fs_gc_kthread {
> >>struct task_struct *f2fs_gc_task;
> >> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
> >> index 062424a..cd33f96 100644
> >> --- a/fs/f2fs/segment.h
> >> +++ b/fs/f2fs/segment.h
> >> @@ -142,6 +142,7 @@ struct victim_sel_policy {
> >>int alloc_mode; /* LFS or SSR */
> >>int gc_mode;/* GC_CB or GC_GREEDY */
> >>unsigned long *dirty_segmap;/* dirty segment bitmap */
> >> +  int dirty_type;
> >
> > int max_search; /* maximum # of segments to search */
> >
> >>unsigned int offset;/* last scanned bitmap offset */
> >>unsigned int ofs_unit;  /* bitmap search unit */
> >>unsigned int min_cost;  /* minimum cost */
> >
> 

-- 
Jaegeuk Kim
Samsung



Re: [PATCH] f2fs: optimize gc for better performance

2013-08-29 Thread Jin Xu

Hi Jaegeuk,

On 08/29/2013 07:56 PM, Jaegeuk Kim wrote:

Hi,

2013-08-29 (목), 08:48 +0800, Jin Xu:

From: Jin Xu 

This patch improves the foreground gc efficiency by optimizing the
victim selection policy. With this optimization, the random re-write
performance could increase up to 20%.


[...]


In this patch, it does not search a constant number of dirty segments
anymore; instead it calculates the number based on the total segments,
the dirty segments and a threshold. Following is the pseudo formula.

                ,-- nr_dirty_segments, if total_segments < threshold
(# of search) = |
                `-- (nr_dirty_segments * threshold) / total_segments,
                    otherwise


Nice catch, but
I don't understand why we search the # of segments in proportion to
the # of dirty segments.
How about the case where threshold = 10 and total_segments = 10?
Or threshold = 100 and total_segments = 100?
For this, we need to define additional MIN/MAX thresholds and extra
handling code, as in your proposal.



Firstly, calculating the # of search this way constrains the search
overhead to a certain level even when the total number of segments is
very large.
Secondly, when there are more dirty segments, the same # of garbage
blocks is likely more scattered than when there are fewer dirty
segments. So we search a # of segments that is in proportion to the #
of dirty segments.

[...]


It seems that we can obtain the performance gain just by setting the
MAX_VICTIM_SEARCH to 4096, for example.
So, how about just adding an ending criterion like below?



I agree that we could get the performance improvement by simply
enlarging MAX_VICTIM_SEARCH to 4096, but I am a little concerned about
the scalability, because it might always end up searching the whole
bitmap in some cases, for example, when there are 4000 dirty segments
out of 409600 total segments.
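
To make the scalability point concrete, here is a small user-space
sketch of the proportional formula (calc_max_search is a hypothetical
helper name; the kernel's div_u64() is replaced by plain 64-bit
division, and the constants follow the patch):

#include <stdint.h>
#include <stdio.h>

#define MIN_VICTIM_SEARCH_GREEDY	20
#define FULL_VICTIM_SEARCH_THRESH	4096

static uint64_t calc_max_search(uint64_t nr_dirty, uint64_t total_segs)
{
	uint64_t max_search;

	if (total_segs <= FULL_VICTIM_SEARCH_THRESH)
		return nr_dirty;	/* search all the dirty segs */

	/* (nr_dirty * threshold) / total_segments */
	max_search = (nr_dirty * FULL_VICTIM_SEARCH_THRESH) / total_segs;
	if (max_search < MIN_VICTIM_SEARCH_GREEDY)
		max_search = MIN_VICTIM_SEARCH_GREEDY;
	if (max_search > nr_dirty)	/* no more than the dirty segs */
		max_search = nr_dirty;
	return max_search;
}

int main(void)
{
	/* the case above: 4000 dirty segments out of 409600 total */
	printf("%llu\n", (unsigned long long)calc_max_search(4000, 409600));
	return 0;
}

This prints 40 (4000 * 4096 / 409600), so only 40 candidates are
costed here, whereas a flat MAX_VICTIM_SEARCH of 4096 would let the
loop walk the entire dirty bitmap.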


[snip]

[...]


if (p->max_search > MAX_VICTIM_SEARCH)
p->max_search = MAX_VICTIM_SEARCH;



The optimization does not apply to SSR mode. There is a reason.
As noticed in the test, when SSR has already taken the segments that
have the most garbage blocks, then when gc is needed, all the
remaining dirty segments might have very few garbage blocks, so the gc
overhead is high. This might lead to performance degradation. So the
patch does not change the victim selection policy for SSR.

What do you think now?


#define MAX_VICTIM_SEARCH 4096 /* covers 8GB */


p->offset = sbi->last_victim[p->gc_mode];
@@ -243,6 +245,8 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
struct victim_sel_policy p;
unsigned int secno, max_cost;
int nsearched = 0;
+   unsigned int max_search = MAX_VICTIM_SEARCH;
+   unsigned int nr_dirty;

p.alloc_mode = alloc_mode;
select_policy(sbi, gc_type, type, &p);
@@ -258,6 +262,27 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
goto got_it;
}

+   nr_dirty = dirty_i->nr_dirty[p.dirty_type];
+   if (p.gc_mode == GC_GREEDY && p.alloc_mode != SSR) {
+   if (TOTAL_SEGS(sbi) <= FULL_VICTIM_SEARCH_THRESH)
+   max_search = nr_dirty; /* search all the dirty segs */
+   else {
+   /*
+* With more dirty segments, garbage blocks are likely
+* more scattered, thus search harder for better
+* victim.
+*/
+   max_search = div_u64 ((nr_dirty *
+   FULL_VICTIM_SEARCH_THRESH), TOTAL_SEGS(sbi));
+   if (max_search < MIN_VICTIM_SEARCH_GREEDY)
+   max_search = MIN_VICTIM_SEARCH_GREEDY;
+   }
+   }
+
+   /* no more than the total dirty segments */
+   if (max_search > nr_dirty)
+   max_search = nr_dirty;
+
while (1) {
unsigned long cost;
unsigned int segno;
@@ -290,7 +315,7 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
if (cost == max_cost)
continue;

-   if (nsearched++ >= MAX_VICTIM_SEARCH) {
+   if (nsearched++ >= max_search) {


if (nsearched++ >= p.max_search) {


sbi->last_victim[p.gc_mode] = segno;
break;
}
diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
index 2c6a6bd..2f525aa 100644
--- a/fs/f2fs/gc.h
+++ b/fs/f2fs/gc.h
@@ -20,7 +20,9 @@
  #define LIMIT_FREE_BLOCK  40 /* percentage over invalid + free space */

  /* Search max. number of dirty segments to select a victim segment */
-#define MAX_VICTIM_SEARCH  20
+#define MAX_VICTIM_SEARCH  20
+#define MIN_VICTIM_SEARCH_GREEDY   20
+#define FULL_VICTIM_SEARCH_THRESH  4096

  struct f2fs_gc_kthread {
struct task_struct *f2fs_gc_task;
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h

Re: [PATCH] f2fs: optimize gc for better performance

2013-08-29 Thread Jaegeuk Kim
Hi,

2013-08-29 (목), 08:48 +0800, Jin Xu:
> From: Jin Xu 
> 
> This patch improves the foreground gc efficiency by optimizing the
> victim selection policy. With this optimization, the random re-write
> performance could increase up to 20%.
> 
> For f2fs, foreground gc happens when the disk is short of free space;
> it selects dirty segments and moves their valid blocks around to make
> more space available. The gc cost of a segment is determined by the
> valid blocks in the segment. The fewer the valid blocks, the higher
> the efficiency. The ideal victim segment is the one that has the
> most garbage blocks.
> 
> Currently, it searches up to 20 dirty segments for a victim segment.
> The selected victim is not likely the best victim for gc when there
> are many more dirty segments. Why not search more dirty segments
> for a better victim? The cost of searching dirty segments is
> negligible in comparison to moving blocks.
> 
> In this patch, it does not search a constant number of dirty segments
> anymore; instead it calculates the number based on the total segments,
> the dirty segments and a threshold. Following is the pseudo formula.
>
>                 ,-- nr_dirty_segments, if total_segments < threshold
> (# of search) = |
>                 `-- (nr_dirty_segments * threshold) / total_segments,
>                     otherwise

Nice catch, but
I don't understand why we search the # of segments in proportion to
the # of dirty segments.
How about the case where threshold = 10 and total_segments = 10?
Or threshold = 100 and total_segments = 100?
For this, we need to define additional MIN/MAX thresholds and extra
handling code, as in your proposal.

> 
> The test case is simple. It creates files until the disk is full.
> The size of each file is 32KB. Then it writes as many as 10
> records of 4KB size to random offsets of random files in sync mode.
> The testing was done on a 2GB partition of an SDHC card. Let's see the
> test result of f2fs without and with the patch.

It seems that we can obtain the performance gain just by setting the
MAX_VICTIM_SEARCH to 4096, for example.
So, how about just adding an ending criterion like below?

[snip]

> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 35f9b1a..4e045e6 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -138,10 +138,12 @@ static void select_policy(struct f2fs_sb_info *sbi, int 
> gc_type,
>   if (p->alloc_mode == SSR) {
>   p->gc_mode = GC_GREEDY;
>   p->dirty_segmap = dirty_i->dirty_segmap[type];
> + p->dirty_type = type;

p->max_search = dirty_i->nr_dirty[type];

>   p->ofs_unit = 1;
>   } else {
>   p->gc_mode = select_gc_type(gc_type);
>   p->dirty_segmap = dirty_i->dirty_segmap[DIRTY];
> + p->dirty_type = DIRTY;

p->max_search = dirty_i->nr_dirty[DIRTY];

>   p->ofs_unit = sbi->segs_per_sec;
>   }

if (p->max_search > MAX_VICTIM_SEARCH)
p->max_search = MAX_VICTIM_SEARCH;

#define MAX_VICTIM_SEARCH 4096 /* covers 8GB */

>   p->offset = sbi->last_victim[p->gc_mode];
> @@ -243,6 +245,8 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
>   struct victim_sel_policy p;
>   unsigned int secno, max_cost;
>   int nsearched = 0;
> + unsigned int max_search = MAX_VICTIM_SEARCH;
> + unsigned int nr_dirty;
>  
>   p.alloc_mode = alloc_mode;
>   select_policy(sbi, gc_type, type, &p);
> @@ -258,6 +262,27 @@ static int get_victim_by_default(struct f2fs_sb_info 
> *sbi,
>   goto got_it;
>   }
>  
> + nr_dirty = dirty_i->nr_dirty[p.dirty_type];
> + if (p.gc_mode == GC_GREEDY && p.alloc_mode != SSR) {
> + if (TOTAL_SEGS(sbi) <= FULL_VICTIM_SEARCH_THRESH)
> + max_search = nr_dirty; /* search all the dirty segs */
> + else {
> + /*
> +  * With more dirty segments, garbage blocks are likely
> +  * more scattered, thus search harder for better
> +  * victim.
> +  */
> + max_search = div_u64 ((nr_dirty *
> + FULL_VICTIM_SEARCH_THRESH), TOTAL_SEGS(sbi));
> + if (max_search < MIN_VICTIM_SEARCH_GREEDY)
> + max_search = MIN_VICTIM_SEARCH_GREEDY;
> + }
> + }
> +
> + /* no more than the total dirty segments */
> + if (max_search > nr_dirty)
> + max_search = nr_dirty;
> +
>   while (1) {
>   unsigned long cost;
>   unsigned int segno;
> @@ -290,7 +315,7 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
>   if (cost == max_cost)
>   continue;
>  
> - if (nsearched++ >= MAX_VICTIM_SEARCH) {
> + if (nsearched++ >= max_search) {

if (nsearched++ >= 

[PATCH] f2fs: optimize gc for better performance

2013-08-28 Thread Jin Xu
From: Jin Xu 

This patch improves the foreground gc efficiency by optimizing the
victim selection policy. With this optimization, the random re-write
performance could increase up to 20%.

For f2fs, foreground gc happens when the disk is short of free space;
it selects dirty segments and moves their valid blocks around to make
more space available. The gc cost of a segment is determined by the
valid blocks in the segment. The fewer the valid blocks, the higher
the efficiency. The ideal victim segment is the one that has the
most garbage blocks.

Currently, it searches up to 20 dirty segments for a victim segment.
The selected victim is not likely the best victim for gc when there
are many more dirty segments. Why not search more dirty segments
for a better victim? The cost of searching dirty segments is
negligible in comparison to moving blocks.

In this patch, it does not search a constant number of dirty segments
anymore; instead it calculates the number based on the total segments,
the dirty segments and a threshold. Following is the pseudo formula.

                ,-- nr_dirty_segments, if total_segments < threshold
(# of search) = |
                `-- (nr_dirty_segments * threshold) / total_segments,
                    otherwise

The test case is simple. It creates files until the disk is full.
The size of each file is 32KB. Then it writes as many as 10
records of 4KB size to random offsets of random files in sync mode.
The testing was done on a 2GB partition of an SDHC card. Let's see the
test result of f2fs without and with the patch.

without the patch
created 52023 files of size 32768 bytes in 341 seconds
finished 10 loops in 4204 seconds
(internally, 722 gc were done, 115939 garbage blocks were reclaimed)

with the patch
created 52023 files of size 32768 bytes in 326 seconds
finished 10 loops in 3067 seconds
(internally, 1255 gc were done, 181101 garbage blocks were reclaimed)

It's obvious that, with the patch, f2fs finishes the test in 20+% less
time than without the patch.

Since the performance improvement is related to gc, it might not be so
obvious for other tests that do not trigger gc as often as this one
(this is because f2fs selects dirty segments for SSR use most of the
time when free space is in shortage). The well-known iozone test tool
was not used for benchmarking the patch because it does not seem to
have a test case that performs random re-writes on a full disk.
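
As an aside, the workload described above is easy to approximate. The
following is a hypothetical user-space sketch, not the program that
produced the numbers in this thread; the record count is a parameter
because the exact count is not reproduced here:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define FILE_SIZE	32768	/* 32KB per file */
#define REC_SIZE	4096	/* 4KB random re-writes */

int main(int argc, char **argv)
{
	static char buf[FILE_SIZE];	/* zero-filled payload */
	long nfiles = 0, nrecs = argc > 1 ? atol(argv[1]) : 1000;
	char name[64];
	long i;

	/* phase 1: create 32KB files until the disk is full */
	for (;;) {
		int fd;

		snprintf(name, sizeof(name), "f%ld", nfiles);
		fd = open(name, O_CREAT | O_WRONLY, 0644);
		if (fd < 0)
			break;
		if (write(fd, buf, FILE_SIZE) != FILE_SIZE) {
			close(fd);
			unlink(name);
			break;
		}
		close(fd);
		nfiles++;
	}
	if (nfiles == 0)
		return 1;

	/* phase 2: 4KB re-writes at random offsets of random files,
	 * in sync mode (O_SYNC) as described above */
	for (i = 0; i < nrecs; i++) {
		off_t off = (rand() % (FILE_SIZE / REC_SIZE)) * REC_SIZE;
		int fd;

		snprintf(name, sizeof(name), "f%ld", rand() % nfiles);
		fd = open(name, O_WRONLY | O_SYNC);
		if (fd >= 0) {
			pwrite(fd, buf, REC_SIZE, off);
			close(fd);
		}
	}
	printf("created %ld files, rewrote %ld records\n", nfiles, nrecs);
	return 0;
}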

Signed-off-by: Jin Xu 
---
 fs/f2fs/gc.c  |   27 ++-
 fs/f2fs/gc.h  |4 +++-
 fs/f2fs/segment.h |1 +
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 35f9b1a..4e045e6 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -138,10 +138,12 @@ static void select_policy(struct f2fs_sb_info *sbi, int 
gc_type,
if (p->alloc_mode == SSR) {
p->gc_mode = GC_GREEDY;
p->dirty_segmap = dirty_i->dirty_segmap[type];
+   p->dirty_type = type;
p->ofs_unit = 1;
} else {
p->gc_mode = select_gc_type(gc_type);
p->dirty_segmap = dirty_i->dirty_segmap[DIRTY];
+   p->dirty_type = DIRTY;
p->ofs_unit = sbi->segs_per_sec;
}
p->offset = sbi->last_victim[p->gc_mode];
@@ -243,6 +245,8 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
struct victim_sel_policy p;
unsigned int secno, max_cost;
int nsearched = 0;
+   unsigned int max_search = MAX_VICTIM_SEARCH;
+   unsigned int nr_dirty;
 
p.alloc_mode = alloc_mode;
select_policy(sbi, gc_type, type, &p);
@@ -258,6 +262,27 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
goto got_it;
}
 
+   nr_dirty = dirty_i->nr_dirty[p.dirty_type];
+   if (p.gc_mode == GC_GREEDY && p.alloc_mode != SSR) {
+   if (TOTAL_SEGS(sbi) <= FULL_VICTIM_SEARCH_THRESH)
+   max_search = nr_dirty; /* search all the dirty segs */
+   else {
+   /*
+* With more dirty segments, garbage blocks are likely
+* more scattered, thus search harder for better
+* victim.
+*/
+   max_search = div_u64 ((nr_dirty *
+   FULL_VICTIM_SEARCH_THRESH), TOTAL_SEGS(sbi));
+   if (max_search < MIN_VICTIM_SEARCH_GREEDY)
+   max_search = MIN_VICTIM_SEARCH_GREEDY;
+   }
+   }
+
+   /* no more than the total dirty segments */
+   if (max_search > nr_dirty)
+   max_search = nr_dirty;
+
while (1) {
unsigned long cost;
unsigned int segno;
@@ -290,7 +315,7 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
if (cost == max_cost)
  
