Re: [f2fs-dev] [PATCH] f2fs: optimize gc for better performance

2013-09-04 Thread Jaegeuk Kim
Hi Jin,

2013-09-04 (수), 07:59 +0800, Jin Xu:
 Hi Jaegeuk,
 
 On 03/09/2013 08:45, Jaegeuk Kim wrote:
  Hi Jin,
 
  [...]
 
  It seems that we can obtain the performance gain just by setting
  MAX_VICTIM_SEARCH to 4096, for example.
  So, how about just adding an ending criterion like the one below?
 
 
  I agree that we could get the performance improvement by simply
  enlarging MAX_VICTIM_SEARCH to 4096, but I am a little concerned about
  scalability, because it might always search the whole bitmap in some
  cases, for example, when there are 4000 dirty segments out of 409600
  total segments.
  [snip]
  [...]
 
if (p->max_search > MAX_VICTIM_SEARCH)
	p->max_search = MAX_VICTIM_SEARCH;
 
 
  The optimization does not apply to SSR mode, and there is a reason.
  As noticed in the test, once SSR has selected the segments that have
  the most garbage blocks, then when GC is needed, all the remaining
  dirty segments might have very few garbage blocks, so the GC overhead
  is high. This might lead to performance degradation. So the patch does
  not change the victim selection policy for SSR.
 
  I think it doesn't matter.
  GC is only triggered during direct node block allocation.
  What that means is that we need to consider the number of GC triggers,
  and GC triggers more frequently during normal data allocation than
  during node block allocation.
  So, I think it would not degrade performance significantly.
 
  BTW, could you show some numbers for this?
  Or could you test what I suggested?
 
  Thanks,
 
 
 I re-ran the test and got the following result:
 
 ---
 2GB SDHC
 create 52023 files of size 32768 bytes
 random re-write 10 records of 4KB
 ---
 
                | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
  no patch      |        341        |       4227       |   1174   |       174840      |
  patched       |        296        |       2995       |    634   |       109314      |
  patched (KIM) |        324        |       2958       |    645   |       106682      |
 
 This test does not show the minor performance degradation from applying
 the patch to SSR mode. Instead, the performance is a little better with
 what you suggested.
 
 I agree that the performance degradation would not be significant even
 if it does degrade. I once saw minor degradation in some workloads, but
 I didn't save the data.
 
 So, I agree that we can apply the patch to SSR mode as well.
 
 And do you still have concerns about the formula for calculating the
 number of searches?

Thank you for the test. :)
What I'm concerned about is that, if it is really important for
performance to select a victim more accurately, as you described, then
IMO there is no need to calculate the number of searches. Let's just
search nr_dirty segments. Why not?
The only thing we should consider is how to handle the case where
nr_dirty is too large.
For that, we can simply limit the number of searches to avoid
performance degradation.
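
A rough sketch of that simpler policy (reusing nr_dirty, max_search and
MAX_VICTIM_SEARCH from the diff quoted below; just an illustration, not a
tested patch) could look like:

	nr_dirty = dirty_i->nr_dirty[p.dirty_type];

	/* search every dirty segment, but never more than a fixed cap */
	max_search = nr_dirty;
	if (max_search > MAX_VICTIM_SEARCH)
		max_search = MAX_VICTIM_SEARCH;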

Still, I'm actually not convinced of the effectiveness of your formula.
If possible, could you show it with numbers?

Thanks,

 
 
  What do you think now?
 
  #define MAX_VICTIM_SEARCH 4096 /* covers 8GB */
 
  	p->offset = sbi->last_victim[p->gc_mode];
  @@ -243,6 +245,8 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
  	struct victim_sel_policy p;
  	unsigned int secno, max_cost;
  	int nsearched = 0;
  +	unsigned int max_search = MAX_VICTIM_SEARCH;
  +	unsigned int nr_dirty;

  	p.alloc_mode = alloc_mode;
  	select_policy(sbi, gc_type, type, p);
  @@ -258,6 +262,27 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
  		goto got_it;
  	}

  +	nr_dirty = dirty_i->nr_dirty[p.dirty_type];
  +	if (p.gc_mode == GC_GREEDY && p.alloc_mode != SSR) {
  +		if (TOTAL_SEGS(sbi) <= FULL_VICTIM_SEARCH_THRESH)
  +			max_search = nr_dirty; /* search all the dirty segs */
  +		else {
  +			/*
  +			 * With more dirty segments, garbage blocks are likely
  +			 * more scattered, thus search harder for better
  +			 * victim.
  +			 */
  +			max_search = div_u64((nr_dirty *
  +				FULL_VICTIM_SEARCH_THRESH), TOTAL_SEGS(sbi));
  +			if (max_search < MIN_VICTIM_SEARCH_GREEDY)
  +				max_search = MIN_VICTIM_SEARCH_GREEDY;
  +		}
  +	}
  +
  +	/* no more than the total dirty segments */
  +	if (max_search > nr_dirty)
  +		max_search = nr_dirty;
  +
  	while (1) {
  		unsigned long cost;
  		unsigned int segno;
  @@ -290,7 +315,7 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
  		if (cost == max_cost)
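
For reference, a stand-alone sketch of what the two approaches discussed in
this thread would compute for the example mentioned earlier (4000 dirty
segments out of 409600 total). The FULL_VICTIM_SEARCH_THRESH and
MIN_VICTIM_SEARCH_GREEDY values below are assumptions for illustration only;
MAX_VICTIM_SEARCH uses the 4096 suggested above:

#include <stdio.h>

#define FULL_VICTIM_SEARCH_THRESH	4096	/* assumed for illustration */
#define MIN_VICTIM_SEARCH_GREEDY	20	/* assumed for illustration */
#define MAX_VICTIM_SEARCH		4096	/* value suggested above */

int main(void)
{
	unsigned long long total_segs = 409600;	/* example from the thread */
	unsigned long long nr_dirty = 4000;	/* example from the thread */
	unsigned long long scaled, flat;

	/* scaling formula from the quoted diff */
	if (total_segs <= FULL_VICTIM_SEARCH_THRESH)
		scaled = nr_dirty;
	else {
		scaled = nr_dirty * FULL_VICTIM_SEARCH_THRESH / total_segs;
		if (scaled < MIN_VICTIM_SEARCH_GREEDY)
			scaled = MIN_VICTIM_SEARCH_GREEDY;
	}
	if (scaled > nr_dirty)
		scaled = nr_dirty;

	/* simpler policy: search nr_dirty, capped at MAX_VICTIM_SEARCH */
	flat = nr_dirty < MAX_VICTIM_SEARCH ? nr_dirty : MAX_VICTIM_SEARCH;

	printf("formula: %llu searches, flat cap: %llu searches\n", scaled, flat);
	return 0;
}

With these inputs the formula bounds the search at about 40 segments, while
the flat cap would allow all 4000 dirty segments to be scanned.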
   

Re: [f2fs-dev] [PATCH] f2fs: optimize gc for better performance

2013-09-04 Thread Jin Xu
Hi Jaegeuk,

On 04/09/2013 17:40, Jaegeuk Kim wrote:
 Hi Jin,

 2013-09-04 (수), 07:59 +0800, Jin Xu:
 Hi Jaegeuk,

 On 03/09/2013 08:45, Jaegeuk Kim wrote:
 Hi Jin,

 [...]

 It seems that we can obtain the performance gain just by setting
 MAX_VICTIM_SEARCH to 4096, for example.
 So, how about just adding an ending criterion like the one below?


 I agree that we could get the performance improvement by simply
 enlarging MAX_VICTIM_SEARCH to 4096, but I am a little concerned about
 scalability, because it might always search the whole bitmap in some
 cases, for example, when there are 4000 dirty segments out of 409600
 total segments.
 [snip]
 [...]

   if (p->max_search > MAX_VICTIM_SEARCH)
   	p->max_search = MAX_VICTIM_SEARCH;


 The optimization does not apply to SSR mode, and there is a reason.
 As noticed in the test, once SSR has selected the segments that have
 the most garbage blocks, then when GC is needed, all the remaining
 dirty segments might have very few garbage blocks, so the GC overhead
 is high. This might lead to performance degradation. So the patch does
 not change the victim selection policy for SSR.

 I think it doesn't matter.
 GC is only triggered during direct node block allocation.
 What that means is that we need to consider the number of GC triggers,
 and GC triggers more frequently during normal data allocation than
 during node block allocation.
 So, I think it would not degrade performance significantly.

 BTW, could you show some numbers for this?
 Or could you test what I suggested?

 Thanks,


 I re-ran the test and got the following result:

 ---
 2GB SDHC
 create 52023 files of size 32768 bytes
 random re-write 10 records of 4KB
 ---

               | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
 no patch      |        341        |       4227       |   1174   |       174840      |
 patched       |        296        |       2995       |    634   |       109314      |
 patched (KIM) |        324        |       2958       |    645   |       106682      |

 This test does not show the minor performance degradation from applying
 the patch to SSR mode. Instead, the performance is a little better with
 what you suggested.

 I agree that the performance degradation would not be significant even
 if it does degrade. I once saw minor degradation in some workloads, but
 I didn't save the data.

 So, I agree that we can apply the patch to SSR mode as well.

 And do you still have concerns about the formula for calculating the
 number of searches?

 Thank you for the test. :)
 What I'm concerned about is that, if it is really important for
 performance to select a victim more accurately, as you described, then
 IMO there is no need to calculate the number of searches. Let's just
 search nr_dirty segments. Why not?
 The only thing we should consider is how to handle the case where
 nr_dirty is too large.
 For that, we can simply limit the number of searches to avoid
 performance degradation.

 Still, I'm actually not convinced of the effectiveness of your formula.
 If possible, could you show it with numbers?

It's not easy to prove the effectiveness of the formula; it was just
meant to address my concern about the scalability of the search. Since
it does not matter much for the performance improvement, we can put it
aside and choose the simpler method, as you suggested.

So, should I revise the patch based on what you suggested or will
you take care of it?

--
Thanks,
Jin


 Thanks,



 What do you think now?

 #define MAX_VICTIM_SEARCH 4096 /* covers 8GB */

 	p->offset = sbi->last_victim[p->gc_mode];
 @@ -243,6 +245,8 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
 	struct victim_sel_policy p;
 	unsigned int secno, max_cost;
 	int nsearched = 0;
 +	unsigned int max_search = MAX_VICTIM_SEARCH;
 +	unsigned int nr_dirty;

 	p.alloc_mode = alloc_mode;
 	select_policy(sbi, gc_type, type, p);
 @@ -258,6 +262,27 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
 		goto got_it;
 	}

 +	nr_dirty = dirty_i->nr_dirty[p.dirty_type];
 +	if (p.gc_mode == GC_GREEDY && p.alloc_mode != SSR) {
 +		if (TOTAL_SEGS(sbi) <= FULL_VICTIM_SEARCH_THRESH)
 +			max_search = nr_dirty; /* search all the dirty segs */
 +		else {
 +			/*
 +			 * With more dirty segments, garbage blocks are likely
 +			 * more scattered, thus search harder for better
 +			 * victim.
 +			 */
 +			max_search = div_u64((nr_dirty *
 +				FULL_VICTIM_SEARCH_THRESH), TOTAL_SEGS(sbi));
 +			if (max_search < MIN_VICTIM_SEARCH_GREEDY)
 +				max_search = MIN_VICTIM_SEARCH_GREEDY;
 +		}
 +	}
 +
 +	/* no more than the total dirty segments */
 

Re: [f2fs-dev] [PATCH] f2fs: optimize gc for better performance

2013-09-04 Thread Jaegeuk Kim
Hi Jin,

2013-09-04 (수), 21:17 +0800, Jin Xu:
 Hi Jaegeuk,
 
 On 04/09/2013 17:40, Jaegeuk Kim wrote:
  Hi Jin,
 
  2013-09-04 (수), 07:59 +0800, Jin Xu:
  Hi Jaegeuk,
 
  On 03/09/2013 08:45, Jaegeuk Kim wrote:
  Hi Jin,
 
  [...]
 
  It seems that we can obtain the performance gain just by setting
  MAX_VICTIM_SEARCH to 4096, for example.
  So, how about just adding an ending criterion like the one below?
 
 
  I agree that we could get the performance improvement by simply
  enlarging MAX_VICTIM_SEARCH to 4096, but I am a little concerned about
  scalability, because it might always search the whole bitmap in some
  cases, for example, when there are 4000 dirty segments out of 409600
  total segments.
  [snip]
  [...]
 
  if (p->max_search > MAX_VICTIM_SEARCH)
  	p->max_search = MAX_VICTIM_SEARCH;
 
 
  The optimization does not apply to SSR mode, and there is a reason.
  As noticed in the test, once SSR has selected the segments that have
  the most garbage blocks, then when GC is needed, all the remaining
  dirty segments might have very few garbage blocks, so the GC overhead
  is high. This might lead to performance degradation. So the patch does
  not change the victim selection policy for SSR.
 
  I think it doesn't matter.
  GC is only triggered during direct node block allocation.
  What that means is that we need to consider the number of GC triggers,
  and GC triggers more frequently during normal data allocation than
  during node block allocation.
  So, I think it would not degrade performance significantly.
 
  BTW, could you show some numbers for this?
  Or could you test what I suggested?
 
  Thanks,
 
 
  I re-ran the test and got the following result:
 
  ---
  2GB SDHC
  create 52023 files of size 32768 bytes
  random re-write 10 records of 4KB
  ---
 
                | file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
  no patch      |        341        |       4227       |   1174   |       174840      |
  patched       |        296        |       2995       |    634   |       109314      |
  patched (KIM) |        324        |       2958       |    645   |       106682      |
 
  This test does not show the minor performance degradation from applying
  the patch to SSR mode. Instead, the performance is a little better with
  what you suggested.

  I agree that the performance degradation would not be significant even
  if it does degrade. I once saw minor degradation in some workloads, but
  I didn't save the data.
 
  So, I agree that we can apply the patch to SSR mode as well.
 
  And do you still have concerns about the formula for calculating the
  number of searches?
 
  Thank you for the test. :)
  What I'm concerned about is that, if it is really important for
  performance to select a victim more accurately, as you described, then
  IMO there is no need to calculate the number of searches. Let's just
  search nr_dirty segments. Why not?
  The only thing we should consider is how to handle the case where
  nr_dirty is too large.
  For that, we can simply limit the number of searches to avoid
  performance degradation.
 
  Still, I'm actually not convinced of the effectiveness of your formula.
  If possible, could you show it with numbers?
 
 It's not easy to prove the effectiveness of the formula; it was just
 meant to address my concern about the scalability of the search. Since
 it does not matter much for the performance improvement, we can put it
 aside and choose the simpler method, as you suggested.
 
 So, should I revise the patch based on what you suggested or will
 you take care of it?

Could you make a patch with your performance description and submit it
again?
Thanks a lot,

-- 
Jaegeuk Kim
Samsung

