Re: [f2fs-dev] [PATCH v3] f2fs: fix performance issue observed with multi-thread sequential read

2018-08-14 Thread Chao Yu
On 2018/8/15 10:15, Jaegeuk Kim wrote:
> On 08/15, Chao Yu wrote:
>> On 2018/8/15 1:28, Jaegeuk Kim wrote:
>>> On 08/14, Chao Yu wrote:
 On 2018/8/14 12:04, Jaegeuk Kim wrote:
> On 08/14, Chao Yu wrote:
>> On 2018/8/14 4:11, Jaegeuk Kim wrote:
>>> On 08/13, Chao Yu wrote:
 Hi Jaegeuk,

 On 2018/8/11 2:56, Jaegeuk Kim wrote:
> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
> to fix the drop in sequential read throughput.
>
> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
> device: UFS
>
> Before -
> read throughput: 185 MB/s
> total read requests: 85177 (of these ~80000 are 4KB size requests).
> total write requests: 2546 (of these ~2208 requests are written in 512KB).
>
> After -
> read throughput: 758 MB/s
> total read requests: 2417 (of these ~2042 are 512KB reads).
> total write requests: 2701 (of these ~2034 requests are written in 512KB).

 IMO, it only impacts sequential read performance on a large file, which may
 become fragmented during multi-thread writing.

 In an Android environment, large files should mostly be of cold type, such as
 apk, mp3, rmvb, jpeg..., so I think we only need to serialize writepages()
 for the cold data area writer.

 So how about adding a mount option to serialize writepages() for different
 types of log? E.g. in Android, use serialize=4; by default, use serialize=7.
 HOT_DATA   1
 WARM_DATA  2
 COLD_DATA  4
>>>
>>> Well, I don't think we need to give too many mount options for this
>>> fragmented case. How about doing this for large files only, like this?
>>
>> Thread A writes 512 pages          Thread B writes 8 pages
>>
>> - writepages()
>>  - mutex_lock(&sbi->writepages);
>>   - writepage();
>> ...
>>                                    - writepages()
>>                                     - writepage()
>>
>>   - writepage();
>> ...
>>  - mutex_unlock(&sbi->writepages);
>>
>> The above case will also cause fragmentation, since we didn't serialize all
>> concurrent IO with the lock.
>>
>> Do we need to consider such a case?
>
> We can simply allow 512 and 8 in the same segment, which would not be a big
> deal, when considering starvation of Thread B.

 Yeah, but in reality there would be more threads competing for the same log
 header, so I worry the effect of defragmenting will not be as good as we
 expect; anyway, for a benchmark, it's enough.
>>>
>>> Basically, I think this is not a benchmark issue. :) It just reveals the
>>> issue more easily. Let me think about three cases:
>>> 1) WB_SYNC_NONE & WB_SYNC_NONE
>>>  -> can simply use mutex_lock
>>>
>>> 2) WB_SYNC_ALL & WB_SYNC_NONE
>>>  -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, while WB_SYNC_NONE
>>> will skip writing blocks
>>>
>>> 3) WB_SYNC_ALL & WB_SYNC_ALL
>>>  -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, in order to avoid
>>> starvation.
>>>
>>>
>>> I've been testing the below.
>>>
>>> if (!S_ISDIR(inode->i_mode) && (wbc->sync_mode != WB_SYNC_ALL ||
>>> 		get_dirty_pages(inode) >= SM_I(sbi)->min_seq_blocks)) {
>>> 	mutex_lock(&sbi->writepages);
>>> 	locked = true;
>>
>> Just cover buffered IO? How about covering Direct IO and atomic writes as
>> well?
> 
> I'd expect direct IO to do in-place updates, and I'm not sure whether we need to

For initial writes, they are not IPU.

> add another lock contention between buffered and direct IO. Atomic writes
> would be covered by ->min_seq_blocks.

Okay. :)

Thanks,

> 
>>
>> Thanks,
>>
>>> }
>>>
>>> Thanks,
>>>

 Thanks,

>
>>
>> Thanks,
>>
>>>
>>> From 4fea0b6e4da8512a72dd52afc7a51beb35966ad9 Mon Sep 17 00:00:00 2001
>>> From: Jaegeuk Kim 
>>> Date: Thu, 9 Aug 2018 17:53:34 -0700
>>> Subject: [PATCH] f2fs: fix performance issue observed with multi-thread
>>>  sequential read
>>>
>>> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
>>> to fix the drop in sequential read throughput.
>>>
>>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
>>> device: UFS
>>>
>>> Before -
>>> read throughput: 185 MB/s
>>> total read requests: 85177 (of these ~80000 are 4KB size requests).
>>> total read requests: 2417 (of these ~2042 are 512KB reads).
>>> total write requests: 2546 (of these ~2208 requests are written in 512KB).
>>>
>>> After -
>>> read throughput: 758 MB/s
>>> total read requests: 2417 (of these ~2042 are 512KB reads).
>>> total write requests: 2701 (of these ~2034 requests are written in 512KB).

Re: [f2fs-dev] [PATCH] Revert "f2fs: use printk_ratelimited for f2fs_msg"

2018-08-14 Thread Chao Yu
On 2018/8/15 11:25, Jaegeuk Kim wrote:
> On 08/15, Chao Yu wrote:
>> On 2018/8/15 11:01, Jaegeuk Kim wrote:
>>> On 08/15, Chao Yu wrote:
 On 2018/8/15 1:09, Jaegeuk Kim wrote:
> On 08/13, Chao Yu wrote:
>> Don't limit log printing, so that we will not miss any key messages.
>
> For example, this can avoid lots of messages during roll-forward recovery.

 That's really important evidence that can prove the filesystem didn't lose
 the last fsynced data when file/data goes missing after SPO; otherwise, the
 filesystem will still be treated as a suspect... :(
>>>
>>> It's too bad that we need to rely on kernel messages for that purpose. Anyway,
>>
>> Yes, it's very hard to make sure the problem is not caused by a filesystem
>> bug when troubleshooting, unless there is an explicit kernel message.
>>
>> Also, without messages, troubleshooting becomes harder.
>>
>>> if so, we may need to use printk() for essential messages only.
>>
>> Now, during our debugging, we have to switch to printk to avoid potentially
>> missing messages...
> 
> Let me understand which messages are really important for your debugging purpose.

The most important messages are about filesystem/data consistency.

> I believe it doesn't require all of them.

Yeah, IMO, error injection prints too many messages.

Thanks,

> 
>>
>> Thanks,
>>
>>>

 Thanks,

>
>>
>> This reverts commit a36c106dffb616250117efb1cab271c19a8f94ff.
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/super.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>> index bf4c919fe418..3d89d94191e7 100644
>> --- a/fs/f2fs/super.c
>> +++ b/fs/f2fs/super.c
>> @@ -209,7 +209,7 @@ void f2fs_msg(struct super_block *sb, const char *level, const char *fmt, ...)
>>  va_start(args, fmt);
>>  vaf.fmt = fmt;
>>  vaf.va = &args;
>> -printk_ratelimited("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
>> +printk("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
>>  va_end(args);
>>  }
>>  
>> -- 
>> 2.18.0.rc1
>
> .
>
>>>
>>> .
>>>
> 
> .
> 


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate

2018-08-14 Thread Chao Yu
On 2018/8/15 11:20, Jaegeuk Kim wrote:
> On 08/15, Chao Yu wrote:
>> On 2018/8/15 10:56, Jaegeuk Kim wrote:
>>> On 08/15, Chao Yu wrote:
 On 2018/8/15 10:33, Jaegeuk Kim wrote:
> On 08/15, Chao Yu wrote:
>> On 2018/8/15 1:23, Jaegeuk Kim wrote:
>>> On 08/14, Chao Yu wrote:
 On 2018/8/14 12:19, Jaegeuk Kim wrote:
> On 08/10, Chao Yu wrote:
>> Previously, discard speed was mostly fixed; on a device with a high usage
>> rate we speed up issuing discard, but it doesn't make sense that in a
>> non-full filesystem we still issue discard at a slow speed.
>
> Could you please elaborate the problem in more detail? The speed depends
> on how many candidates?

 undiscard blocks are all 4k granularity.
 a) utility: filesystem: 20% + undiscard blocks: 20% = flash storage: 40%
 b) utility: filesystem: 40% + undiscard blocks: 25% = flash storage: 65%
 c) utility: filesystem: 60% + undiscard blocks: 30% = flash storage: 100%


 1. for case c), we need to speed up issuing discard based on the utilization
 of "filesystem + undiscard" instead of just filesystem utilization.

 -	if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
 -		dpolicy->granularity = 1;
 -		dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME;
 -	}

 2. If free space in storage reaches a certain threshold, performance becomes
 very sensitive. In low-end storage with high space usage, even a 1%
 reduction in free space can decrease performance a lot.
>>>
>>> So, we may need to distinguish low-end vs. high-end storage. In the
>>> high-end case, it'd be better to avoid IO contention, while a low-end
>>> device wants to get as many discard commands as possible. So, how about
>>> adding an option for this as a tunable point?
>>
>> Agreed, how about adding a sysfs entry discard_tuning:
>> 1: enabled, use 4k granularity, self-adapted speed based on real device free space.
>> 0: disabled, use dcc->discard_granularity, fixed speed.
>>
>> By default: enabled
>>
>> What do you think?
>
> I don't think this is proper with a sysfs entry, since we already know the

 You mean by storage capacity? <= 32GB means low-end?
>>>
>>> Yes, that's the current condition to judge it. If there is any other method, it'd be
>>
>> That would be hard-coded...
>>
>> Still, I haven't got any other method to do the judgment except capacity.
> 
> Maybe ufs or emmc?

Yeah, that may be a good way.

I remember very early versions of UFS had very poor discard performance; for
that kind of storage, won't it be inaccurate?

Let me check whether there is a flag to distinguish ufs/emmc.

Thanks,

> 
>>
>> Thanks,
>>
>>> better to change it.
>>>

 Thanks,

> device type when mounting the partition. We won't need to change the policy
> on the fly. And, I still don't see the need to change the default.
>
>>
>> Thanks,
>>
>>>

 IMO, in the above cases, we'd better issue discard at high speed for c),
 middle speed for b), and low speed for a).

 What do you think?

 Thanks,

>
> Thanks,
>
>>
>> Anyway, it turns out undiscarded blocks make FTL GC less efficient
>> and cause high lifetime overhead.
>>
>> Let's tune discard speed as below:
>>
>> a. adjust default issue interval:
>>               original   after
>> min_interval: 50ms       100ms
>> mid_interval: 500ms      1000ms
>> max_interval: 60000ms    10000ms
>>
>> b. if last time we stopped issuing discard due to IO interruption by the
>> user, let's reset all {min,mid,max}_interval to the defaults.
>>
>> c. tune {min,mid,max}_interval with below calculation method:
>>
>> base_interval = default_interval / 10;
>> total_interval = default_interval - base_interval;
>> interval = base_interval + total_interval * (100 - dev_util) / 100;
>>
>> For example:
>> min_interval (=100ms)
>> dev_util (%)  interval (ms)
>> 0             100
>> 10            91
>> 20            82
>> 30            73
>> ...
>> 80            28
>> 90            19
>> 100           10
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/f2fs.h| 11 
>>  fs/f2fs/segment.c | 64 +--
>>  fs/f2fs/segment.h |  9 +++
>>  fs/f2fs/super.c   |  2 +-
>>  4 files changed, 67 insertions(+), 19 deletions(-)

Re: [f2fs-dev] [PATCH] Revert "f2fs: use printk_ratelimited for f2fs_msg"

2018-08-14 Thread Jaegeuk Kim
On 08/15, Chao Yu wrote:
> On 2018/8/15 11:01, Jaegeuk Kim wrote:
> > On 08/15, Chao Yu wrote:
> >> On 2018/8/15 1:09, Jaegeuk Kim wrote:
> >>> On 08/13, Chao Yu wrote:
>  Don't limit log printing, so that we will not miss any key messages.
> >>>
> >>> For example, this can avoid lots of messages during roll-forward recovery.
> >>
> >> That's really important evidence that can prove the filesystem didn't lose
> >> the last fsynced data when file/data goes missing after SPO; otherwise, the
> >> filesystem will still be treated as a suspect... :(
> > 
> > It's too bad that we need to rely on kernel messages for that purpose. Anyway,
> 
> Yes, it's very hard to make sure the problem is not caused by a filesystem bug
> when troubleshooting, unless there is an explicit kernel message.
> 
> Also, without messages, troubleshooting becomes harder.
> 
> > if so, we may need to use printk() for essential messages only.
> 
> Now, during our debugging, we have to switch to printk to avoid potentially
> missing messages...

Let me understand which messages are really important for your debugging purpose.
I believe it doesn't require all of them.

> 
> Thanks,
> 
> > 
> >>
> >> Thanks,
> >>
> >>>
> 
>  This reverts commit a36c106dffb616250117efb1cab271c19a8f94ff.
> 
>  Signed-off-by: Chao Yu 
>  ---
>   fs/f2fs/super.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
>  diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>  index bf4c919fe418..3d89d94191e7 100644
>  --- a/fs/f2fs/super.c
>  +++ b/fs/f2fs/super.c
>  @@ -209,7 +209,7 @@ void f2fs_msg(struct super_block *sb, const char *level, const char *fmt, ...)
>   va_start(args, fmt);
>   vaf.fmt = fmt;
>   vaf.va = &args;
>  -printk_ratelimited("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
>  +printk("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
>   va_end(args);
>   }
>   
>  -- 
>  2.18.0.rc1
> >>>
> >>> .
> >>>
> > 
> > .
> > 



Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate

2018-08-14 Thread Jaegeuk Kim
On 08/15, Chao Yu wrote:
> On 2018/8/15 10:56, Jaegeuk Kim wrote:
> > On 08/15, Chao Yu wrote:
> >> On 2018/8/15 10:33, Jaegeuk Kim wrote:
> >>> On 08/15, Chao Yu wrote:
>  On 2018/8/15 1:23, Jaegeuk Kim wrote:
> > On 08/14, Chao Yu wrote:
> >> On 2018/8/14 12:19, Jaegeuk Kim wrote:
> >>> On 08/10, Chao Yu wrote:
>  Previously, discard speed was mostly fixed; on a device with a high usage
>  rate we speed up issuing discard, but it doesn't make sense that in a
>  non-full filesystem we still issue discard at a slow speed.
> >>>
> >>> Could you please elaborate the problem in more detail? The speed depends
> >>> on how many candidates?
> >>
> >> undiscard blocks are all 4k granularity.
> >> a) utility: filesystem: 20% + undiscard blocks: 20% = flash storage: 40%
> >> b) utility: filesystem: 40% + undiscard blocks: 25% = flash storage: 65%
> >> c) utility: filesystem: 60% + undiscard blocks: 30% = flash storage: 100%
> >>
> >>
> >> 1. for case c), we need to speed up issuing discard based on the utilization
> >> of "filesystem + undiscard" instead of just filesystem utilization.
> >>
> >> -	if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
> >> -		dpolicy->granularity = 1;
> >> -		dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME;
> >> -	}
> >>
> >> 2. If free space in storage reaches a certain threshold, performance becomes
> >> very sensitive. In low-end storage with high space usage, even a 1%
> >> reduction in free space can decrease performance a lot.
> >
> > So, we may need to distinguish low-end vs. high-end storage. In the
> > high-end case, it'd be better to avoid IO contention, while a low-end
> > device wants to get as many discard commands as possible. So, how about
> > adding an option for this as a tunable point?
> 
>  Agreed, how about adding a sysfs entry discard_tuning:
>  1: enabled, use 4k granularity, self-adapted speed based on real device free space.
>  0: disabled, use dcc->discard_granularity, fixed speed.
> 
>  By default: enabled
> 
>  What do you think?
> >>>
> >>> I don't think this is proper with a sysfs entry, since we already know the
> >>
> >> You mean by storage capacity? <= 32GB means low-end?
> > 
> > Yes, that's the current condition to judge it. If there is any other method, it'd be
> 
> That would be hard-coded...
> 
> Still, I haven't got any other method to do the judgment except capacity.

Maybe ufs or emmc?

> 
> Thanks,
> 
> > better to change it.
> > 
> >>
> >> Thanks,
> >>
> >>> device type when mounting the partition. We won't need to change the policy
> >>> on the fly. And, I still don't see the need to change the default.
> >>>
> 
>  Thanks,
> 
> >
> >>
> >> IMO, in the above cases, we'd better issue discard at high speed for c),
> >> middle speed for b), and low speed for a).
> >>
> >> What do you think?
> >>
> >> Thanks,
> >>
> >>>
> >>> Thanks,
> >>>
> 
>  Anyway, it turns out undiscarded blocks make FTL GC less efficient
>  and cause high lifetime overhead.
> 
>  Let's tune discard speed as below:
> 
>  a. adjust default issue interval:
>                original   after
>  min_interval: 50ms       100ms
>  mid_interval: 500ms      1000ms
>  max_interval: 60000ms    10000ms
> 
>  b. if last time we stopped issuing discard due to IO interruption by the
>  user, let's reset all {min,mid,max}_interval to the defaults.
> 
>  c. tune {min,mid,max}_interval with below calculation method:
> 
>  base_interval = default_interval / 10;
>  total_interval = default_interval - base_interval;
>  interval = base_interval + total_interval * (100 - dev_util) / 100;
> 
>  For example:
>  min_interval (=100ms)
>  dev_util (%)  interval (ms)
>  0             100
>  10            91
>  20            82
>  30            73
>  ...
>  80            28
>  90            19
>  100           10
> 
>  Signed-off-by: Chao Yu 
>  ---
>   fs/f2fs/f2fs.h| 11 
>   fs/f2fs/segment.c | 64 +--
>   fs/f2fs/segment.h |  9 +++
>   fs/f2fs/super.c   |  2 +-
>   4 files changed, 67 insertions(+), 19 deletions(-)
> 
>  diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>  index 273ffdaf4891..a1dd2e1c3cb9 100644

Re: [f2fs-dev] [PATCH] Revert "f2fs: use printk_ratelimited for f2fs_msg"

2018-08-14 Thread Chao Yu
On 2018/8/15 11:01, Jaegeuk Kim wrote:
> On 08/15, Chao Yu wrote:
>> On 2018/8/15 1:09, Jaegeuk Kim wrote:
>>> On 08/13, Chao Yu wrote:
 Don't limit log printing, so that we will not miss any key messages.
>>>
>>> For example, this can avoid lots of messages during roll-forward recovery.
>>
>> That's really important evidence that can prove the filesystem didn't lose
>> the last fsynced data when file/data goes missing after SPO; otherwise, the
>> filesystem will still be treated as a suspect... :(
> 
> It's too bad that we need to rely on kernel messages for that purpose. Anyway,

> Yes, it's very hard to make sure the problem is not caused by a filesystem bug
> when troubleshooting, unless there is an explicit kernel message.

> Also, without messages, troubleshooting becomes harder.

> if so, we may need to use printk() for essential messages only.

Now, during our debugging, we have to switch to printk to avoid potentially
missing messages...

Thanks,

> 
>>
>> Thanks,
>>
>>>

 This reverts commit a36c106dffb616250117efb1cab271c19a8f94ff.

 Signed-off-by: Chao Yu 
 ---
  fs/f2fs/super.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
 index bf4c919fe418..3d89d94191e7 100644
 --- a/fs/f2fs/super.c
 +++ b/fs/f2fs/super.c
 @@ -209,7 +209,7 @@ void f2fs_msg(struct super_block *sb, const char *level, const char *fmt, ...)
 	va_start(args, fmt);
 	vaf.fmt = fmt;
 	vaf.va = &args;
 -	printk_ratelimited("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
 +	printk("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
va_end(args);
  }
  
 -- 
 2.18.0.rc1
>>>
>>> .
>>>
> 
> .
> 




Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate

2018-08-14 Thread Chao Yu
On 2018/8/15 10:56, Jaegeuk Kim wrote:
> On 08/15, Chao Yu wrote:
>> On 2018/8/15 10:33, Jaegeuk Kim wrote:
>>> On 08/15, Chao Yu wrote:
 On 2018/8/15 1:23, Jaegeuk Kim wrote:
> On 08/14, Chao Yu wrote:
>> On 2018/8/14 12:19, Jaegeuk Kim wrote:
>>> On 08/10, Chao Yu wrote:
 Previously, discard speed was mostly fixed; on a device with a high usage
 rate we speed up issuing discard, but it doesn't make sense that in a
 non-full filesystem we still issue discard at a slow speed.
>>>
>>> Could you please elaborate the problem in more detail? The speed depends
>>> on how many candidates?
>>
>> undiscard blocks are all 4k granularity.
>> a) utility: filesystem: 20% + undiscard blocks: 20% = flash storage: 40%
>> b) utility: filesystem: 40% + undiscard blocks: 25% = flash storage: 65%
>> c) utility: filesystem: 60% + undiscard blocks: 30% = flash storage: 100%
>>
>>
>> 1. for case c), we need to speed up issuing discard based on the utilization
>> of "filesystem + undiscard" instead of just filesystem utilization.
>>
>> -	if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
>> -		dpolicy->granularity = 1;
>> -		dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME;
>> -	}
>>
>> 2. If free space in storage reaches a certain threshold, performance becomes
>> very sensitive. In low-end storage with high space usage, even a 1%
>> reduction in free space can decrease performance a lot.
>
> So, we may need to distinguish low-end vs. high-end storage. In the
> high-end case, it'd be better to avoid IO contention, while a low-end
> device wants to get as many discard commands as possible. So, how about
> adding an option for this as a tunable point?

 Agreed, how about adding a sysfs entry discard_tuning:
 1: enabled, use 4k granularity, self-adapted speed based on real device free space.
 0: disabled, use dcc->discard_granularity, fixed speed.

 By default: enabled

 What do you think?
>>>
>>> I don't think this is proper with a sysfs entry, since we already know the
>>
>> You mean by storage capacity? <= 32GB means low-end?
> 
> Yes, that's the current condition to judge it. If there is any other method, it'd be

That would be hard-coded...

Still, I haven't got any other method to do the judgment except capacity.

Thanks,

> better to change it.
> 
>>
>> Thanks,
>>
>>> device type when mounting the partition. We won't need to change the policy
>>> on the fly. And, I still don't see the need to change the default.
>>>

 Thanks,

>
>>
>> IMO, in the above cases, we'd better issue discard at high speed for c),
>> middle speed for b), and low speed for a).
>>
>> What do you think?
>>
>> Thanks,
>>
>>>
>>> Thanks,
>>>

 Anyway, it turns out undiscarded blocks make FTL GC less efficient
 and cause high lifetime overhead.

 Let's tune discard speed as below:

 a. adjust default issue interval:
               original   after
 min_interval: 50ms       100ms
 mid_interval: 500ms      1000ms
 max_interval: 60000ms    10000ms

 b. if last time we stopped issuing discard due to IO interruption by the user,
 let's reset all {min,mid,max}_interval to the defaults.

 c. tune {min,mid,max}_interval with below calculation method:

 base_interval = default_interval / 10;
 total_interval = default_interval - base_interval;
 interval = base_interval + total_interval * (100 - dev_util) / 100;

 For example:
 min_interval (=100ms)
 dev_util (%)  interval (ms)
 0             100
 10            91
 20            82
 30            73
 ...
 80            28
 90            19
 100           10

 Signed-off-by: Chao Yu 
 ---
  fs/f2fs/f2fs.h| 11 
  fs/f2fs/segment.c | 64 +--
  fs/f2fs/segment.h |  9 +++
  fs/f2fs/super.c   |  2 +-
  4 files changed, 67 insertions(+), 19 deletions(-)

 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
 index 273ffdaf4891..a1dd2e1c3cb9 100644
 --- a/fs/f2fs/f2fs.h
 +++ b/fs/f2fs/f2fs.h
 @@ -185,10 +185,9 @@ enum {
  
  #define MAX_DISCARD_BLOCKS(sbi)	BLKS_PER_SEC(sbi)
  #define DEF_MAX_DISCARD_REQUEST	8	/* issue 8 discards per round */
 -#define DEF_MIN_DISCARD_ISSUE_TIME	50	/* 50 ms, if exists */
 -#define DEF_MID_DISCARD_ISSUE_TIME	500	/* 500 ms, if device busy */

Re: [f2fs-dev] [PATCH] Revert "f2fs: use printk_ratelimited for f2fs_msg"

2018-08-14 Thread Jaegeuk Kim
On 08/15, Chao Yu wrote:
> On 2018/8/15 1:09, Jaegeuk Kim wrote:
> > On 08/13, Chao Yu wrote:
> >> Don't limit log printing, so that we will not miss any key messages.
> > 
> > For example, this can avoid lots of messages during roll-forward recovery.
> 
> That's really important evidence that can prove the filesystem didn't lose
> the last fsynced data when file/data goes missing after SPO; otherwise, the
> filesystem will still be treated as a suspect... :(

It's too bad that we need to rely on kernel messages for that purpose. Anyway,
if so, we may need to use printk() for essential messages only.

> 
> Thanks,
> 
> > 
> >>
> >> This reverts commit a36c106dffb616250117efb1cab271c19a8f94ff.
> >>
> >> Signed-off-by: Chao Yu 
> >> ---
> >>  fs/f2fs/super.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> >> index bf4c919fe418..3d89d94191e7 100644
> >> --- a/fs/f2fs/super.c
> >> +++ b/fs/f2fs/super.c
> >> @@ -209,7 +209,7 @@ void f2fs_msg(struct super_block *sb, const char *level, const char *fmt, ...)
> >>	va_start(args, fmt);
> >>	vaf.fmt = fmt;
> >>	vaf.va = &args;
> >> -	printk_ratelimited("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
> >> +	printk("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
> >>va_end(args);
> >>  }
> >>  
> >> -- 
> >> 2.18.0.rc1
> > 
> > .
> > 



Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate

2018-08-14 Thread Jaegeuk Kim
On 08/15, Chao Yu wrote:
> On 2018/8/15 10:33, Jaegeuk Kim wrote:
> > On 08/15, Chao Yu wrote:
> >> On 2018/8/15 1:23, Jaegeuk Kim wrote:
> >>> On 08/14, Chao Yu wrote:
>  On 2018/8/14 12:19, Jaegeuk Kim wrote:
> > On 08/10, Chao Yu wrote:
> >> Previously, discard speed was mostly fixed; on a device with a high usage
> >> rate we speed up issuing discard, but it doesn't make sense that in a
> >> non-full filesystem we still issue discard at a slow speed.
> >
> > Could you please elaborate the problem in more detail? The speed depends
> > on how many candidates?
> 
>  undiscard blocks are all 4k granularity.
>  a) utility: filesystem: 20% + undiscard blocks: 20% = flash storage: 40%
>  b) utility: filesystem: 40% + undiscard blocks: 25% = flash storage: 65%
>  c) utility: filesystem: 60% + undiscard blocks: 30% = flash storage: 100%
> 
> 
>  1. for case c), we need to speed up issuing discard based on the utilization
>  of "filesystem + undiscard" instead of just filesystem utilization.
> 
>  -	if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
>  -		dpolicy->granularity = 1;
>  -		dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME;
>  -	}
> 
>  2. If free space in storage reaches a certain threshold, performance becomes
>  very sensitive. In low-end storage with high space usage, even a 1%
>  reduction in free space can decrease performance a lot.
> >>>
> >>> So, we may need to distinguish low-end vs. high-end storage. In the
> >>> high-end case, it'd be better to avoid IO contention, while a low-end
> >>> device wants to get as many discard commands as possible. So, how about
> >>> adding an option for this as a tunable point?
> >>
> >> Agreed, how about adding a sysfs entry discard_tuning:
> >> 1: enabled, use 4k granularity, self-adapted speed based on real device free space.
> >> 0: disabled, use dcc->discard_granularity, fixed speed.
> >>
> >> By default: enabled
> >>
> >> What do you think?
> > 
> > I don't think this is proper with a sysfs entry, since we already know the
> 
> You mean by storage capacity? <= 32GB means low-end?

Yes, that's the current condition to judge it. If there is any other method, it'd be
better to change it.

> 
> Thanks,
> 
> > device type when mounting the partition. We won't need to change the policy
> > on the fly. And, I still don't see the need to change the default.
> > 
> >>
> >> Thanks,
> >>
> >>>
> 
>  IMO, in the above cases, we'd better issue discard at high speed for c),
>  middle speed for b), and low speed for a).
> 
>  What do you think?
> 
>  Thanks,
> 
> >
> > Thanks,
> >
> >>
> >> Anyway, it turns out undiscarded blocks make FTL GC less efficient
> >> and cause high lifetime overhead.
> >>
> >> Let's tune discard speed as below:
> >>
> >> a. adjust default issue interval:
> >>               original   after
> >> min_interval: 50ms       100ms
> >> mid_interval: 500ms      1000ms
> >> max_interval: 60000ms    10000ms
> >>
> >> b. if last time we stopped issuing discard due to IO interruption by the user,
> >> let's reset all {min,mid,max}_interval to the defaults.
> >>
> >> c. tune {min,mid,max}_interval with below calculation method:
> >>
> >> base_interval = default_interval / 10;
> >> total_interval = default_interval - base_interval;
> >> interval = base_interval + total_interval * (100 - dev_util) / 100;
> >>
> >> For example:
> >> min_interval (=100ms)
> >> dev_util (%)  interval (ms)
> >> 0             100
> >> 10            91
> >> 20            82
> >> 30            73
> >> ...
> >> 80            28
> >> 90            19
> >> 100           10
> >>
> >> Signed-off-by: Chao Yu 
> >> ---
> >>  fs/f2fs/f2fs.h| 11 
> >>  fs/f2fs/segment.c | 64 +--
> >>  fs/f2fs/segment.h |  9 +++
> >>  fs/f2fs/super.c   |  2 +-
> >>  4 files changed, 67 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >> index 273ffdaf4891..a1dd2e1c3cb9 100644
> >> --- a/fs/f2fs/f2fs.h
> >> +++ b/fs/f2fs/f2fs.h
> >> @@ -185,10 +185,9 @@ enum {
> >>  
> >>  #define MAX_DISCARD_BLOCKS(sbi)   BLKS_PER_SEC(sbi)
> >>  #define DEF_MAX_DISCARD_REQUEST   8   /* issue 8 
> >> discards per round */
> >> -#define DEF_MIN_DISCARD_ISSUE_TIME	50	/* 50 ms, if exists */
> >> -#define DEF_MID_DISCARD_ISSUE_TIME	500	/* 500 ms, if device busy */
> >> -#define DEF_MAX_DISCARD_ISSUE_TIME	60000	/* 60 s, if no candidates */
> 

Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate

2018-08-14 Thread Chao Yu
On 2018/8/15 10:33, Jaegeuk Kim wrote:
> On 08/15, Chao Yu wrote:
>> On 2018/8/15 1:23, Jaegeuk Kim wrote:
>>> On 08/14, Chao Yu wrote:
 On 2018/8/14 12:19, Jaegeuk Kim wrote:
> On 08/10, Chao Yu wrote:
>> Previously, discard speed was mostly fixed; on a device with a high usage
>> rate we speed up issuing discard, but it doesn't make sense that in a
>> non-full filesystem we still issue discard at a slow speed.
>
> Could you please elaborate the problem in more detail? The speed depends
> on how many candidates?

 undiscard blocks are all 4k granularity.
 a) utility: filesystem: 20% + undiscard blocks: 20% = flash storage: 40%
 b) utility: filesystem: 40% + undiscard blocks: 25% = flash storage: 65%
 c) utility: filesystem: 60% + undiscard blocks: 30% = flash storage: 100%


 1. for case c), we need to speed up issuing discard based on utilization of
 "filesystem + undiscard" instead of just utilization of filesystem.

 -  if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
 -  dpolicy->granularity = 1;
 -  dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME;
 -  }

 2. If free space in storage reaches a certain threshold, performance becomes
 very sensitive. In low-end storage with high space usage, even a 1%
 reduction in free space can decrease performance a lot.
>>>
>>> So, we may need to distinguish low-end vs. high-end storage. In the
>>> high-end case, it'd be better to avoid IO contention, while a low-end
>>> device wants to get as many discard commands as possible. So, how about
>>> adding an option for this as a tunable point?
>>
>> Agreed, how about adding a sysfs entry discard_tuning:
>> 1: enabled, use 4k granularity, self-adapted speed based on real device free space.
>> 0: disabled, use dcc->discard_granularity, fixed speed.
>>
>> By default: enabled
>>
>> What do you think?
> 
> I don't think this is proper with a sysfs entry, since we already know the

You mean by storage capacity? <= 32GB means low-end?

Thanks,

> device type when mounting the partition. We won't need to change the policy
> on the fly. And, I still don't see the need to change the default.
> 
>>
>> Thanks,
>>
>>>

 IMO, in the above cases, we'd better issue discard at high speed for c),
 middle speed for b), and low speed for a).

 What do you think?

 Thanks,

>
> Thanks,
>
>>
>> Anyway, it turns out undiscarded blocks make FTL GC less efficient
>> and cause high lifetime overhead.
>>
>> Let's tune discard speed as below:
>>
>> a. adjust default issue interval:
>>              original   after
>> min_interval:   50ms       100ms
>> mid_interval:   500ms      1000ms
>> max_interval:   60000ms    10000ms
>>
>> b. if last time we stop issuing discard due to IO interruption of user,
>> let's reset all {min,mid,max}_interval to default one.
>>
>> c. tune {min,mid,max}_interval with below calculation method:
>>
>> base_interval = default_interval / 10;
>> total_interval = default_interval - base_interval;
>> interval = base_interval + total_interval * (100 - dev_util) / 100;
>>
>> For example:
>> min_interval (:100ms)
>> dev_util (%) interval (ms)
>> 0    100
>> 10   91
>> 20   82
>> 30   73
>> ...
>> 80   28
>> 90   19
>> 100  10
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/f2fs.h| 11 
>>  fs/f2fs/segment.c | 64 +--
>>  fs/f2fs/segment.h |  9 +++
>>  fs/f2fs/super.c   |  2 +-
>>  4 files changed, 67 insertions(+), 19 deletions(-)
>>
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index 273ffdaf4891..a1dd2e1c3cb9 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -185,10 +185,9 @@ enum {
>>  
>>  #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
>>  #define DEF_MAX_DISCARD_REQUEST 8   /* issue 8 discards per round */
>> -#define DEF_MIN_DISCARD_ISSUE_TIME  50  /* 50 ms, if exists */
>> -#define DEF_MID_DISCARD_ISSUE_TIME  500 /* 500 ms, if device busy */
>> -#define DEF_MAX_DISCARD_ISSUE_TIME  60000   /* 60 s, if no candidates */
>> -#define DEF_DISCARD_URGENT_UTIL 80  /* do more discard over 80% */
>> +#define DEF_MIN_DISCARD_ISSUE_TIME  100 /* 100 ms, if exists */
>> +#define DEF_MID_DISCARD_ISSUE_TIME  1000/* 1000 ms, if device busy */
>> +#define DEF_MAX_DISCARD_ISSUE_TIME  10000   /* 10 s, if no candidates */
>>  #define DEF_CP_INTERVAL 60  /* 60 secs */

Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate

2018-08-14 Thread Jaegeuk Kim
On 08/15, Chao Yu wrote:
> On 2018/8/15 1:23, Jaegeuk Kim wrote:
> > On 08/14, Chao Yu wrote:
> >> On 2018/8/14 12:19, Jaegeuk Kim wrote:
> >>> On 08/10, Chao Yu wrote:
>  Previously, discard speed was fixed mostly, and in high usage rate
>  device, we will speed up issuing discard, but it doesn't make sense
>  that in a non-full filesystem, we still issue discard with slow speed.
> >>>
> >>> Could you please elaborate the problem in more detail? The speed depends
> >>> on how many candidates?
> >>
> >> undiscard blocks are all 4k granularity.
> >> a) utility: filesystem: 20% + undiscard blocks: 20% = flash storage: 40%
> >> b) utility: filesystem: 40% + undiscard blocks: 25% = flash storage: 65%
> >> c) utility: filesystem: 60% + undiscard blocks: 30% = flash storage: 100%
> >>
> >>
> >> 1. for case c), we need to speed up issuing discard based on utilization of
> >> "filesystem + undiscard" instead of just utilization of filesystem.
> >>
> >> -  if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
> >> -  dpolicy->granularity = 1;
> >> -  dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME;
> >> -  }
> >>
> >> 2. If free space in storage touches therein threshold, performance will be 
> >> very
> >> sensitive. In low-end storage, with high usage in space, even free space is
> >> reduced by 1%, performance will decrease a lot.
> > 
> > So, we may need to distinguish low-end vs. high-end storage. In high-end 
> > case,
> > it'd be better to avoid IO contention, while low-end device wants to get 
> > more
> > discard commands as much as possible. So, how about adding an option for 
> > this
> > as a tunable point?
> 
> Agreed, how about adding a sysfs entry discard_tunning:
> 1: enabled, use 4k granularity, self-adapted speed based on real device free 
> space.
> 0: disabled, use dcc->discard_granularity, fixed speed.
> 
> By default: enabled
> 
> How do you think?

I don't think a sysfs entry is the right fit here, since we already know the
device type when mounting the partition; we won't need to change the policy
on the fly. And I still don't see a reason to change the default.

> 
> Thanks,
> 
> > 
> >>
> >> IMO, in above cases, we'd better to issue discard with high speed for c), 
> >> middle
> >> speed for b), and low speed for a).
> >>
> >> How do you think?
> >>
> >> Thanks,
> >>
> >>>
> >>> Thanks,
> >>>
> 
>  Anyway, it comes out undiscarded block makes FTL GC be lower efficient
>  and causing high lifetime overhead.
> 
>  Let's tune discard speed as below:
> 
>  a. adjust default issue interval:
>               original   after
>  min_interval:   50ms       100ms
>  mid_interval:   500ms      1000ms
>  max_interval:   60000ms    10000ms
> 
>  b. if last time we stop issuing discard due to IO interruption of user,
>  let's reset all {min,mid,max}_interval to default one.
> 
>  c. tune {min,mid,max}_interval with below calculation method:
> 
>  base_interval = default_interval / 10;
>  total_interval = default_interval - base_interval;
>  interval = base_interval + total_interval * (100 - dev_util) / 100;
> 
>  For example:
>  min_interval (:100ms)
>  dev_util (%) interval (ms)
> 0    100
>  10   91
>  20   82
>  30   73
>  ...
>  80   28
>  90   19
>  100  10
> 
>  Signed-off-by: Chao Yu 
>  ---
>   fs/f2fs/f2fs.h| 11 
>   fs/f2fs/segment.c | 64 +--
>   fs/f2fs/segment.h |  9 +++
>   fs/f2fs/super.c   |  2 +-
>   4 files changed, 67 insertions(+), 19 deletions(-)
> 
>  diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>  index 273ffdaf4891..a1dd2e1c3cb9 100644
>  --- a/fs/f2fs/f2fs.h
>  +++ b/fs/f2fs/f2fs.h
>  @@ -185,10 +185,9 @@ enum {
>   
>   #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
>   #define DEF_MAX_DISCARD_REQUEST 8   /* issue 8 discards per round */
>  -#define DEF_MIN_DISCARD_ISSUE_TIME  50  /* 50 ms, if exists */
>  -#define DEF_MID_DISCARD_ISSUE_TIME  500 /* 500 ms, if device busy */
>  -#define DEF_MAX_DISCARD_ISSUE_TIME  60000   /* 60 s, if no candidates */
>  -#define DEF_DISCARD_URGENT_UTIL 80  /* do more discard over 80% */
>  +#define DEF_MIN_DISCARD_ISSUE_TIME  100 /* 100 ms, if exists */
>  +#define DEF_MID_DISCARD_ISSUE_TIME  1000/* 1000 ms, if device busy */
>  +#define DEF_MAX_DISCARD_ISSUE_TIME  10000   /* 10 s, if no candidates */
>   #define DEF_CP_INTERVAL 60  /* 60 secs */
>   #define DEF_IDLE_INTERVAL   5   /* 5 secs */
>   
>  @@ -248,7 +247,8 @@ struct 

Re: [f2fs-dev] [PATCH v3] f2fs: fix performance issue observed with multi-thread sequential read

2018-08-14 Thread Jaegeuk Kim
On 08/15, Chao Yu wrote:
> On 2018/8/15 1:28, Jaegeuk Kim wrote:
> > On 08/14, Chao Yu wrote:
> >> On 2018/8/14 12:04, Jaegeuk Kim wrote:
> >>> On 08/14, Chao Yu wrote:
>  On 2018/8/14 4:11, Jaegeuk Kim wrote:
> > On 08/13, Chao Yu wrote:
> >> Hi Jaegeuk,
> >>
> >> On 2018/8/11 2:56, Jaegeuk Kim wrote:
> >>> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
> >>> to fix the drop in sequential read throughput.
> >>>
> >>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
> >>> device: UFS
> >>>
> >>> Before -
> >>> read throughput: 185 MB/s
> >>> total read requests: 85177 (of these ~80000 are 4KB size requests).
> >>> total write requests: 2546 (of these ~2208 requests are written in 
> >>> 512KB).
> >>>
> >>> After -
> >>> read throughput: 758 MB/s
> >>> total read requests: 2417 (of these ~2042 are 512KB reads).
> >>> total write requests: 2701 (of these ~2034 requests are written in 
> >>> 512KB).
> >>
> >> IMO, it only impact sequential read performance in a large file which 
> >> may be
> >> fragmented during multi-thread writing.
> >>
> >> In android environment, mostly, the large file should be cold type, 
> >> such as apk,
> >> mp3, rmvb, jpeg..., so I think we only need to serialize writepages() 
> >> for cold
> >> data area writer.
> >>
> >> So how about adding a mount option to serialize writepage() for 
> >> different type
> >> of log, e.g. in android, using serialize=4; by default, using 
> >> serialize=7
> >> HOT_DATA   1
> >> WARM_DATA  2
> >> COLD_DATA  4
> >
> > Well, I don't think we need to give too many mount options for this 
> > fragmented
> > case. How about doing this for the large files only like this?
> 
>  Thread A write 512 pages Thread B write 8 pages
> 
>  - writepages()
>   - mutex_lock(&sbi->writepages);
>    - writepage();
>  ...
>   - writepages()
>    - writepage()
> 
>    - writepage();
>  ...
>   - mutex_unlock(&sbi->writepages);
> 
>  Above case will also cause fragmentation since we didn't serialize all
>  concurrent IO with the lock.
> 
>  Do we need to consider such case?
> >>>
> >>> We can simply allow 512 and 8 in the same segment, which would not a big 
> >>> deal,
> >>> when considering starvation of Thread B.
> >>
> >> Yeah, but in reality, there would be more threads competing in same log 
> >> header,
> >> so I worry that the effect of defragmenting will not so good as we expect,
> >> anyway, for benchmark, it's enough.
> > 
> > Basically, I think this is not a benchmark issue. :) It just reveals the 
> > issue
> > much easily. Let me think about three cases:
> > 1) WB_SYNC_NONE & WB_SYNC_NONE
> >  -> can simply use mutex_lock
> > 
> > 2) WB_SYNC_ALL & WB_SYNC_NONE
> >  -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, while WB_SYNC_NONE
> > will skip writing blocks
> > 
> > 3) WB_SYNC_ALL & WB_SYNC_ALL
> >  -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, in order to avoid
> > starvation.
> > 
> > 
> > I've been testing the below.
> > 
> > if (!S_ISDIR(inode->i_mode) && (wbc->sync_mode != WB_SYNC_ALL ||
> > get_dirty_pages(inode) <= SM_I(sbi)->min_seq_blocks)) {
> > mutex_lock(&sbi->writepages);
> > locked = true;
> 
> Just cover buffered IO? how about covering Direct IO and atomic write as well?

I'd expect direct IO to do in-place updates, and I'm not sure we need to
add another point of lock contention between buffered and direct IO. Atomic
writes would be covered by ->min_seq_blocks.

> 
> Thanks,
> 
> > }
> > 
> > Thanks,
> > 
> >>
> >> Thanks,
> >>
> >>>
> 
>  Thanks,
> 
> >
> > >From 4fea0b6e4da8512a72dd52afc7a51beb35966ad9 Mon Sep 17 00:00:00 2001
> > From: Jaegeuk Kim 
> > Date: Thu, 9 Aug 2018 17:53:34 -0700
> > Subject: [PATCH] f2fs: fix performance issue observed with multi-thread
> >  sequential read
> >
> > This reverts the commit - "b93f771 - f2fs: remove writepages lock"
> > to fix the drop in sequential read throughput.
> >
> > Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
> > device: UFS
> >
> > Before -
> > read throughput: 185 MB/s
> > total read requests: 85177 (of these ~80000 are 4KB size requests).
> > total write requests: 2546 (of these ~2208 requests are written in 
> > 512KB).
> >
> > After -
> > read throughput: 758 MB/s
> > total read requests: 2417 (of these ~2042 are 512KB reads).
> > total write requests: 2701 (of these ~2034 requests are written in 
> > 512KB).
> >
> > Signed-off-by: Sahitya Tummala 
> > Signed-off-by: 

Re: [f2fs-dev] [PATCH v3] f2fs: fix performance issue observed with multi-thread sequential read

2018-08-14 Thread Chao Yu
On 2018/8/15 1:28, Jaegeuk Kim wrote:
> On 08/14, Chao Yu wrote:
>> On 2018/8/14 12:04, Jaegeuk Kim wrote:
>>> On 08/14, Chao Yu wrote:
 On 2018/8/14 4:11, Jaegeuk Kim wrote:
> On 08/13, Chao Yu wrote:
>> Hi Jaegeuk,
>>
>> On 2018/8/11 2:56, Jaegeuk Kim wrote:
>>> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
>>> to fix the drop in sequential read throughput.
>>>
>>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
>>> device: UFS
>>>
>>> Before -
>>> read throughput: 185 MB/s
>>> total read requests: 85177 (of these ~80000 are 4KB size requests).
>>> total write requests: 2546 (of these ~2208 requests are written in 
>>> 512KB).
>>>
>>> After -
>>> read throughput: 758 MB/s
>>> total read requests: 2417 (of these ~2042 are 512KB reads).
>>> total write requests: 2701 (of these ~2034 requests are written in 
>>> 512KB).
>>
>> IMO, it only impact sequential read performance in a large file which 
>> may be
>> fragmented during multi-thread writing.
>>
>> In android environment, mostly, the large file should be cold type, such 
>> as apk,
>> mp3, rmvb, jpeg..., so I think we only need to serialize writepages() 
>> for cold
>> data area writer.
>>
>> So how about adding a mount option to serialize writepage() for 
>> different type
>> of log, e.g. in android, using serialize=4; by default, using serialize=7
>> HOT_DATA 1
>> WARM_DATA2
>> COLD_DATA4
>
> Well, I don't think we need to give too many mount options for this 
> fragmented
> case. How about doing this for the large files only like this?

 Thread A write 512 pages   Thread B write 8 pages

 - writepages()
 - mutex_lock(&sbi->writepages);
   - writepage();
 ...
- writepages()
 - writepage()
  
   - writepage();
 ...
 - mutex_unlock(&sbi->writepages);

 Above case will also cause fragmentation since we didn't serialize all
 concurrent IO with the lock.

 Do we need to consider such case?
>>>
>>> We can simply allow 512 and 8 in the same segment, which would not a big 
>>> deal,
>>> when considering starvation of Thread B.
>>
>> Yeah, but in reality, there would be more threads competing in same log 
>> header,
>> so I worry that the effect of defragmenting will not so good as we expect,
>> anyway, for benchmark, it's enough.
> 
> Basically, I think this is not a benchmark issue. :) It just reveals the issue
> much easily. Let me think about three cases:
> 1) WB_SYNC_NONE & WB_SYNC_NONE
>  -> can simply use mutex_lock
> 
> 2) WB_SYNC_ALL & WB_SYNC_NONE
>  -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, while WB_SYNC_NONE
> will skip writing blocks
> 
> 3) WB_SYNC_ALL & WB_SYNC_ALL
>  -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, in order to avoid
> starvation.
> 
> 
> I've been testing the below.
> 
> if (!S_ISDIR(inode->i_mode) && (wbc->sync_mode != WB_SYNC_ALL ||
> get_dirty_pages(inode) <= SM_I(sbi)->min_seq_blocks)) {
> mutex_lock(&sbi->writepages);
> locked = true;

Does this only cover buffered IO? How about covering direct IO and atomic writes as well?

Thanks,

> }
> 
> Thanks,
> 
>>
>> Thanks,
>>
>>>

 Thanks,

>
> >From 4fea0b6e4da8512a72dd52afc7a51beb35966ad9 Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim 
> Date: Thu, 9 Aug 2018 17:53:34 -0700
> Subject: [PATCH] f2fs: fix performance issue observed with multi-thread
>  sequential read
>
> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
> to fix the drop in sequential read throughput.
>
> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
> device: UFS
>
> Before -
> read throughput: 185 MB/s
> total read requests: 85177 (of these ~80000 are 4KB size requests).
> total write requests: 2546 (of these ~2208 requests are written in 512KB).
>
> After -
> read throughput: 758 MB/s
> total read requests: 2417 (of these ~2042 are 512KB reads).
> total write requests: 2701 (of these ~2034 requests are written in 512KB).
>
> Signed-off-by: Sahitya Tummala 
> Signed-off-by: Jaegeuk Kim 
> ---
>  Documentation/ABI/testing/sysfs-fs-f2fs |  8 
>  fs/f2fs/data.c  | 10 ++
>  fs/f2fs/f2fs.h  |  2 ++
>  fs/f2fs/segment.c   |  1 +
>  fs/f2fs/super.c |  1 +
>  fs/f2fs/sysfs.c |  2 ++
>  6 files changed, 24 insertions(+)
>
> diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs 
> 

Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate

2018-08-14 Thread Chao Yu
On 2018/8/15 1:23, Jaegeuk Kim wrote:
> On 08/14, Chao Yu wrote:
>> On 2018/8/14 12:19, Jaegeuk Kim wrote:
>>> On 08/10, Chao Yu wrote:
 Previously, discard speed was fixed mostly, and in high usage rate
 device, we will speed up issuing discard, but it doesn't make sense
 that in a non-full filesystem, we still issue discard with slow speed.
>>>
>>> Could you please elaborate the problem in more detail? The speed depends
>>> on how many candidates?
>>
>> undiscard blocks are all 4k granularity.
>> a) utility: filesystem: 20% + undiscard blocks: 20% = flash storage: 40%
>> b) utility: filesystem: 40% + undiscard blocks: 25% = flash storage: 65%
>> c) utility: filesystem: 60% + undiscard blocks: 30% = flash storage: 100%
>>
>>
>> 1. for case c), we need to speed up issuing discard based on utilization of
>> "filesystem + undiscard" instead of just utilization of filesystem.
>>
>> -if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
>> -dpolicy->granularity = 1;
>> -dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME;
>> -}
>>
>> 2. If free space in storage touches therein threshold, performance will be 
>> very
>> sensitive. In low-end storage, with high usage in space, even free space is
>> reduced by 1%, performance will decrease a lot.
> 
> So, we may need to distinguish low-end vs. high-end storage. In high-end case,
> it'd be better to avoid IO contention, while low-end device wants to get more
> discard commands as much as possible. So, how about adding an option for this
> as a tunable point?

Agreed, how about adding a sysfs entry discard_tunning:
1: enabled, use 4k granularity, self-adapted speed based on real device free space.
0: disabled, use dcc->discard_granularity, fixed speed.

By default: enabled

What do you think?

Thanks,

> 
>>
>> IMO, in above cases, we'd better to issue discard with high speed for c), 
>> middle
>> speed for b), and low speed for a).
>>
>> How do you think?
>>
>> Thanks,
>>
>>>
>>> Thanks,
>>>

 Anyway, it comes out undiscarded block makes FTL GC be lower efficient
 and causing high lifetime overhead.

 Let's tune discard speed as below:

 a. adjust default issue interval:
                original   after
 min_interval:     50ms       100ms
 mid_interval:     500ms      1000ms
 max_interval:     60000ms    10000ms

 b. if last time we stop issuing discard due to IO interruption of user,
 let's reset all {min,mid,max}_interval to default one.

 c. tune {min,mid,max}_interval with below calculation method:

 base_interval = default_interval / 10;
 total_interval = default_interval - base_interval;
 interval = base_interval + total_interval * (100 - dev_util) / 100;

 For example:
 min_interval (:100ms)
 dev_util (%)   interval (ms)
 0  100
 10 91
 20 82
 30 73
 ...
 80 28
 90 19
 100    10

 Signed-off-by: Chao Yu 
 ---
  fs/f2fs/f2fs.h| 11 
  fs/f2fs/segment.c | 64 +--
  fs/f2fs/segment.h |  9 +++
  fs/f2fs/super.c   |  2 +-
  4 files changed, 67 insertions(+), 19 deletions(-)

 diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
 index 273ffdaf4891..a1dd2e1c3cb9 100644
 --- a/fs/f2fs/f2fs.h
 +++ b/fs/f2fs/f2fs.h
 @@ -185,10 +185,9 @@ enum {
  
  #define MAX_DISCARD_BLOCKS(sbi)   BLKS_PER_SEC(sbi)
 #define DEF_MAX_DISCARD_REQUEST   8   /* issue 8 discards per round */
 -#define DEF_MIN_DISCARD_ISSUE_TIME   50  /* 50 ms, if exists */
 -#define DEF_MID_DISCARD_ISSUE_TIME   500 /* 500 ms, if device busy */
 -#define DEF_MAX_DISCARD_ISSUE_TIME   60000   /* 60 s, if no candidates */
 -#define DEF_DISCARD_URGENT_UTIL   80  /* do more discard over 80% */
 +#define DEF_MIN_DISCARD_ISSUE_TIME   100 /* 100 ms, if exists */
 +#define DEF_MID_DISCARD_ISSUE_TIME   1000/* 1000 ms, if device busy */
 +#define DEF_MAX_DISCARD_ISSUE_TIME   10000   /* 10 s, if no candidates */
  #define DEF_CP_INTERVAL   60  /* 60 secs */
  #define DEF_IDLE_INTERVAL 5   /* 5 secs */
  
 @@ -248,7 +247,8 @@ struct discard_entry {
  };
  
  /* default discard granularity of inner discard thread, unit: block count 
 */
 -#define DEFAULT_DISCARD_GRANULARITY   1
 +#define MID_DISCARD_GRANULARITY   16
 +#define MIN_DISCARD_GRANULARITY   1
  
  /* max discard pend list number */
  #define MAX_PLIST_NUM 512
 @@ -330,6 +330,7 @@ struct discard_cmd_control {
atomic_t discard_cmd_cnt;   /* # of cached cmd 

Re: [f2fs-dev] [PATCH] Revert "f2fs: use printk_ratelimited for f2fs_msg"

2018-08-14 Thread Chao Yu
On 2018/8/15 1:09, Jaegeuk Kim wrote:
> On 08/13, Chao Yu wrote:
>> Don't limit printing log, so that we will not miss any key messages.
> 
> For example, this can avoid lots of messages during roll-forward recovery.

Those logs are really important evidence that can prove the filesystem didn't
lose the last fsynced data when a file or its data goes missing after a sudden
power-off; otherwise, the filesystem will still be treated as the suspect... :(

Thanks,

> 
>>
>> This reverts commit a36c106dffb616250117efb1cab271c19a8f94ff.
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/super.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>> index bf4c919fe418..3d89d94191e7 100644
>> --- a/fs/f2fs/super.c
>> +++ b/fs/f2fs/super.c
>> @@ -209,7 +209,7 @@ void f2fs_msg(struct super_block *sb, const char *level, 
>> const char *fmt, ...)
>>  va_start(args, fmt);
>>  vaf.fmt = fmt;
>>  vaf.va = &args;
>> -printk_ratelimited("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
>> +printk("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
>>  va_end(args);
>>  }
>>  
>> -- 
>> 2.18.0.rc1
> 
> .
> 


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel


Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate

2018-08-14 Thread Jaegeuk Kim
On 08/14, Chao Yu wrote:
> On 2018/8/14 12:19, Jaegeuk Kim wrote:
> > On 08/10, Chao Yu wrote:
> >> Previously, discard speed was fixed mostly, and in high usage rate
> >> device, we will speed up issuing discard, but it doesn't make sense
> >> that in a non-full filesystem, we still issue discard with slow speed.
> > 
> > Could you please elaborate the problem in more detail? The speed depends
> > on how many candidates?
> 
> undiscard blocks are all 4k granularity.
> a) utility: filesystem: 20% + undiscard blocks: 20% = flash storage: 40%
> b) utility: filesystem: 40% + undiscard blocks: 25% = flash storage: 65%
> c) utility: filesystem: 60% + undiscard blocks: 30% = flash storage: 100%
> 
> 
> 1. for case c), we need to speed up issuing discard based on utilization of
> "filesystem + undiscard" instead of just utilization of filesystem.
> 
> - if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
> - dpolicy->granularity = 1;
> - dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME;
> - }
> 
> 2. If free space in storage touches therein threshold, performance will be 
> very
> sensitive. In low-end storage, with high usage in space, even free space is
> reduced by 1%, performance will decrease a lot.

So, we may need to distinguish low-end vs. high-end storage. In the high-end
case, it'd be better to avoid IO contention, while a low-end device wants to
get as many discard commands as possible. So, how about adding an option for
this as a tunable point?

> 
> IMO, in above cases, we'd better to issue discard with high speed for c), 
> middle
> speed for b), and low speed for a).
> 
> How do you think?
> 
> Thanks,
> 
> > 
> > Thanks,
> > 
> >>
> >> Anyway, it comes out undiscarded block makes FTL GC be lower efficient
> >> and causing high lifetime overhead.
> >>
> >> Let's tune discard speed as below:
> >>
> >> a. adjust default issue interval:
> >>               original   after
> >> min_interval:    50ms       100ms
> >> mid_interval:    500ms      1000ms
> >> max_interval:    60000ms    10000ms
> >>
> >> b. if last time we stop issuing discard due to IO interruption of user,
> >> let's reset all {min,mid,max}_interval to default one.
> >>
> >> c. tune {min,mid,max}_interval with below calculation method:
> >>
> >> base_interval = default_interval / 10;
> >> total_interval = default_interval - base_interval;
> >> interval = base_interval + total_interval * (100 - dev_util) / 100;
> >>
> >> For example:
> >> min_interval (:100ms)
> >> dev_util (%)   interval (ms)
> >> 0  100
> >> 10 91
> >> 20 82
> >> 30 73
> >> ...
> >> 80 28
> >> 90 19
> >> 100    10
> >>
> >> Signed-off-by: Chao Yu 
> >> ---
> >>  fs/f2fs/f2fs.h| 11 
> >>  fs/f2fs/segment.c | 64 +--
> >>  fs/f2fs/segment.h |  9 +++
> >>  fs/f2fs/super.c   |  2 +-
> >>  4 files changed, 67 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> >> index 273ffdaf4891..a1dd2e1c3cb9 100644
> >> --- a/fs/f2fs/f2fs.h
> >> +++ b/fs/f2fs/f2fs.h
> >> @@ -185,10 +185,9 @@ enum {
> >>  
> >>  #define MAX_DISCARD_BLOCKS(sbi)   BLKS_PER_SEC(sbi)
> >>  #define DEF_MAX_DISCARD_REQUEST   8   /* issue 8 discards per round */
> >> -#define DEF_MIN_DISCARD_ISSUE_TIME   50  /* 50 ms, if exists */
> >> -#define DEF_MID_DISCARD_ISSUE_TIME   500 /* 500 ms, if device busy */
> >> -#define DEF_MAX_DISCARD_ISSUE_TIME   60000   /* 60 s, if no candidates */
> >> -#define DEF_DISCARD_URGENT_UTIL   80  /* do more discard over 80% */
> >> +#define DEF_MIN_DISCARD_ISSUE_TIME   100 /* 100 ms, if exists */
> >> +#define DEF_MID_DISCARD_ISSUE_TIME   1000/* 1000 ms, if device busy */
> >> +#define DEF_MAX_DISCARD_ISSUE_TIME   10000   /* 10 s, if no candidates */
> >>  #define DEF_CP_INTERVAL   60  /* 60 secs */
> >>  #define DEF_IDLE_INTERVAL 5   /* 5 secs */
> >>  
> >> @@ -248,7 +247,8 @@ struct discard_entry {
> >>  };
> >>  
> >>  /* default discard granularity of inner discard thread, unit: block count 
> >> */
> >> -#define DEFAULT_DISCARD_GRANULARITY   1
> >> +#define MID_DISCARD_GRANULARITY   16
> >> +#define MIN_DISCARD_GRANULARITY   1
> >>  
> >>  /* max discard pend list number */
> >>  #define MAX_PLIST_NUM 512
> >> @@ -330,6 +330,7 @@ struct discard_cmd_control {
> >>atomic_t discard_cmd_cnt;   /* # of cached cmd count */
> >>struct rb_root root;/* root of discard rb-tree */
> >>bool rbtree_check;  /* config for consistence check 
> >> */
> >> +  bool io_interrupted;/* last state of io interrupted 
> >> */
> >>  };
> >>  
> >>  /* for the list of fsync 

Re: [f2fs-dev] [PATCH] Revert "f2fs: use printk_ratelimited for f2fs_msg"

2018-08-14 Thread Jaegeuk Kim
On 08/13, Chao Yu wrote:
> Don't limit printing log, so that we will not miss any key messages.

For example, this can avoid lots of messages during roll-forward recovery.

> 
> This reverts commit a36c106dffb616250117efb1cab271c19a8f94ff.
> 
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/super.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index bf4c919fe418..3d89d94191e7 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -209,7 +209,7 @@ void f2fs_msg(struct super_block *sb, const char *level, 
> const char *fmt, ...)
>   va_start(args, fmt);
>   vaf.fmt = fmt;
>   vaf.va = &args;
> - printk_ratelimited("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
> + printk("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
>   va_end(args);
>  }
>  
> -- 
> 2.18.0.rc1



[f2fs-dev] [PATCH] f2fs: readahead encrypted block during GC

2018-08-14 Thread Chao Yu
From: Chao Yu 

During GC, for each encrypted block, we read the block synchronously
into a meta page, and then submit it into the current cold data log area.

This 4k-granularity block read model performs poorly; as with
migrating non-encrypted blocks, let's readahead encrypted blocks
as well to improve migration performance.

To implement this, we choose the meta page whose index is the old block
address of the encrypted block, and readahead the ciphertext into that
page; later, if the readahead page is still up-to-date, we load its
data into the target meta page and submit the write IO.

Note that for OPU, truncation and deletion, we need to invalidate the
meta page after we invalidate the old block address, to make sure we
won't load stale data from the target meta page during encrypted block
migration.

for ((i = 0; i < 1000; i++))
do {
xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
} done

for ((i = 0; i < 1000; i+=2))
do {
rm /mnt/f2fs/dir/$i;
} done

ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);

Before:
  gc-6549  [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 
() 786400 + 64 [gc]
  gc-6549  [001] d..1 214682.212802: block_unplug: [gc] 1
  gc-6549  [001]  214682.213892: block_bio_queue: 8,32 R 
67494144 + 8 [gc]
  gc-6549  [001]  214682.213899: block_getrq: 8,32 R 67494144 + 
8 [gc]
  gc-6549  [001]  214682.213902: block_plug: [gc]
  gc-6549  [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 
() 67494144 + 8 [gc]
  gc-6549  [001] d..1 214682.213908: block_unplug: [gc] 1
  gc-6549  [001]  214682.226405: block_bio_queue: 8,32 R 
67494152 + 8 [gc]
  gc-6549  [001]  214682.226412: block_getrq: 8,32 R 67494152 + 
8 [gc]
  gc-6549  [001]  214682.226414: block_plug: [gc]
  gc-6549  [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 
() 67494152 + 8 [gc]
  gc-6549  [001] d..1 214682.226420: block_unplug: [gc] 1
  gc-6549  [001]  214682.226904: block_bio_queue: 8,32 R 
67494160 + 8 [gc]
  gc-6549  [001]  214682.226910: block_getrq: 8,32 R 67494160 + 
8 [gc]
  gc-6549  [001]  214682.226911: block_plug: [gc]
  gc-6549  [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 
() 67494160 + 8 [gc]
  gc-6549  [001] d..1 214682.226916: block_unplug: [gc] 1

After:
  gc-5678  [003]  214327.025906: block_bio_queue: 8,32 R 
67493824 + 8 [gc]
  gc-5678  [003]  214327.025908: block_bio_backmerge: 8,32 R 
67493824 + 8 [gc]
  gc-5678  [003]  214327.025915: block_bio_queue: 8,32 R 
67493832 + 8 [gc]
  gc-5678  [003]  214327.025917: block_bio_backmerge: 8,32 R 
67493832 + 8 [gc]
  gc-5678  [003]  214327.025923: block_bio_queue: 8,32 R 
67493840 + 8 [gc]
  gc-5678  [003]  214327.025925: block_bio_backmerge: 8,32 R 
67493840 + 8 [gc]
  gc-5678  [003]  214327.025932: block_bio_queue: 8,32 R 
67493848 + 8 [gc]
  gc-5678  [003]  214327.025934: block_bio_backmerge: 8,32 R 
67493848 + 8 [gc]
  gc-5678  [003]  214327.025941: block_bio_queue: 8,32 R 
67493856 + 8 [gc]
  gc-5678  [003]  214327.025943: block_bio_backmerge: 8,32 R 
67493856 + 8 [gc]
  gc-5678  [003]  214327.025953: block_bio_queue: 8,32 R 
67493864 + 8 [gc]
  gc-5678  [003]  214327.025955: block_bio_backmerge: 8,32 R 
67493864 + 8 [gc]
  gc-5678  [003]  214327.025962: block_bio_queue: 8,32 R 
67493872 + 8 [gc]
  gc-5678  [003]  214327.025964: block_bio_backmerge: 8,32 R 
67493872 + 8 [gc]
  gc-5678  [003]  214327.025970: block_bio_queue: 8,32 R 
67493880 + 8 [gc]
  gc-5678  [003]  214327.025972: block_bio_backmerge: 8,32 R 
67493880 + 8 [gc]
  gc-5678  [003]  214327.026000: block_bio_queue: 8,32 WS 
34123776 + 2048 [gc]
  gc-5678  [003]  214327.026019: block_getrq: 8,32 WS 34123776 
+ 2048 [gc]
  gc-5678  [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 
() 67493632 + 256 [gc]
  gc-5678  [003] d..1 214327.026023: block_unplug: [gc] 1
  gc-5678  [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 
() 67493632 + 256 [gc]
  gc-5678  [003]  214327.026046: block_plug: [gc]

Signed-off-by: Chao Yu 
---
 fs/f2fs/data.c|  35 ++-
 fs/f2fs/gc.c  | 111 +-
 fs/f2fs/segment.c |  10 -
 3 files changed, 134 insertions(+), 22 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 339b26838f9b..fbed8d59d10a 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -877,6 +877,7 @@ static int __allocate_data_block(struct dnode_of_data *dn, 
int seg_type)
struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);

[f2fs-dev] [PATCH 5/5] orangefs: cache NULL when both default_acl and acl are NULL

2018-08-14 Thread Chengguang Xu
The default_acl and acl of a newly created inode are initialized to
ACL_NOT_CACHED in the vfs function inode_init_always() and later
updated by calling xxx_init_acl() in the specific filesystem.
However, when default_acl and acl are NULL, they keep the value
ACL_NOT_CACHED; this patch caches NULL for acl/default_acl in
that case.

Signed-off-by: Chengguang Xu 
---
 fs/orangefs/acl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/orangefs/acl.c b/fs/orangefs/acl.c
index 10587413b20e..e3b043a263bc 100644
--- a/fs/orangefs/acl.c
+++ b/fs/orangefs/acl.c
@@ -175,6 +175,9 @@ int orangefs_init_acl(struct inode *inode, struct inode 
*dir)
posix_acl_release(acl);
}
 
+   if (!default_acl && !acl)
+   cache_no_acl(inode);
+
/* If mode of the inode was changed, then do a forcible ->setattr */
if (mode != inode->i_mode) {
 memset(&iattr, 0, sizeof iattr);
-- 
2.17.1




[f2fs-dev] [PATCH 1/5] ext2: cache NULL when both default_acl and acl are NULL

2018-08-14 Thread Chengguang Xu
default_acl and acl of newly created inode will be initiated
as ACL_NOT_CACHED in vfs function inode_init_always() and later
will be updated by calling xxx_init_acl() in specific filesystems.
Howerver, when default_acl and acl are NULL then they keep the value
of ACL_NOT_CACHED, this patch tries to cache NULL for acl/default_acl
in this case.

Signed-off-by: Chengguang Xu 
---
 fs/ext2/acl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/ext2/acl.c b/fs/ext2/acl.c
index 224c04abb2e5..74411e8ea507 100644
--- a/fs/ext2/acl.c
+++ b/fs/ext2/acl.c
@@ -262,5 +262,8 @@ ext2_init_acl(struct inode *inode, struct inode *dir)
error = __ext2_set_acl(inode, acl, ACL_TYPE_ACCESS);
posix_acl_release(acl);
}
+   if (!default_acl && !acl)
+   cache_no_acl(inode);
+
return error;
 }
-- 
2.17.1




[f2fs-dev] [PATCH 4/5] jfs: cache NULL when both default_acl and acl are NULL

2018-08-14 Thread Chengguang Xu
default_acl and acl of a newly created inode are initialized as
ACL_NOT_CACHED in the VFS function inode_init_always() and are later
updated by calling xxx_init_acl() in the specific filesystem.
However, when default_acl and acl are NULL, they keep the value
ACL_NOT_CACHED; this patch caches NULL for acl/default_acl
in that case.

Signed-off-by: Chengguang Xu 
---
 fs/jfs/acl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/jfs/acl.c b/fs/jfs/acl.c
index 2e71b6e7e646..15b9a9b74d72 100644
--- a/fs/jfs/acl.c
+++ b/fs/jfs/acl.c
@@ -154,6 +154,9 @@ int jfs_init_acl(tid_t tid, struct inode *inode, struct 
inode *dir)
posix_acl_release(acl);
}
 
+   if (!default_acl && !acl)
+   cache_no_acl(inode);
+
JFS_IP(inode)->mode2 = (JFS_IP(inode)->mode2 & 0x) |
   inode->i_mode;
 
-- 
2.17.1




[f2fs-dev] [PATCH 2/5] ext4: cache NULL when both default_acl and acl are NULL

2018-08-14 Thread Chengguang Xu
default_acl and acl of a newly created inode are initialized as
ACL_NOT_CACHED in the VFS function inode_init_always() and are later
updated by calling xxx_init_acl() in the specific filesystem.
However, when default_acl and acl are NULL, they keep the value
ACL_NOT_CACHED; this patch caches NULL for acl/default_acl
in that case.

Signed-off-by: Chengguang Xu 
---
 fs/ext4/acl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index fb50f9aa6ead..69a01d28356d 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -291,5 +291,8 @@ ext4_init_acl(handle_t *handle, struct inode *inode, struct 
inode *dir)
   acl, XATTR_CREATE);
posix_acl_release(acl);
}
+   if (!default_acl && !acl)
+   cache_no_acl(inode);
+
return error;
 }
-- 
2.17.1




[f2fs-dev] [PATCH 3/5] f2fs: cache NULL when both default_acl and acl are NULL

2018-08-14 Thread Chengguang Xu
default_acl and acl of a newly created inode are initialized as
ACL_NOT_CACHED in the VFS function inode_init_always() and are later
updated by calling xxx_init_acl() in the specific filesystem.
However, when default_acl and acl are NULL, they keep the value
ACL_NOT_CACHED; this patch caches NULL for acl/default_acl
in that case.

Signed-off-by: Chengguang Xu 
---
 fs/f2fs/acl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
index 111824199a88..97ed555316be 100644
--- a/fs/f2fs/acl.c
+++ b/fs/f2fs/acl.c
@@ -401,6 +401,8 @@ int f2fs_init_acl(struct inode *inode, struct inode *dir, 
struct page *ipage,
   ipage);
posix_acl_release(acl);
}
+   if (!default_acl && !acl)
+   cache_no_acl(inode);
 
return error;
 }
-- 
2.17.1




Re: [f2fs-dev] [PATCH] Revert "f2fs: use printk_ratelimited for f2fs_msg"

2018-08-14 Thread Chao Yu
Hi Joe, thanks for pointing this out. I don't see any cases that need to be
rate-limited now.

Thanks,

On 2018/8/13 17:50, Joe Perches wrote:
> On Mon, 2018-08-13 at 14:28 +0800, Chao Yu wrote:
>> Don't rate-limit log printing, so that we will not miss any key messages.
>>
>> This reverts commit a36c106dffb616250117efb1cab271c19a8f94ff.
> []
>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> []
>> @@ -209,7 +209,7 @@ void f2fs_msg(struct super_block *sb, const char *level, 
>> const char *fmt, ...)
>>  va_start(args, fmt);
>>  vaf.fmt = fmt;
>>  vaf.va = &args;
>> -printk_ratelimited("%sF2FS-fs (%s): %pV\n", level, sb->s_id, );
>> +printk("%sF2FS-fs (%s): %pV\n", level, sb->s_id, );
>>  va_end(args);
>>  }
> 
> If there was value in ratelimiting these messages at all,
> perhaps there is value in using a specific ratelimit_state
> other than
> 
>   static DEFINE_RATELIMIT_STATE(_rs,  \
> DEFAULT_RATELIMIT_INTERVAL,   \
> DEFAULT_RATELIMIT_BURST); \
> 




Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate

2018-08-14 Thread Chao Yu
On 2018/8/14 12:19, Jaegeuk Kim wrote:
> On 08/10, Chao Yu wrote:
>> Previously, discard speed was mostly fixed; on a device with a high usage
>> rate we speed up issuing discard, but it doesn't make sense that in a
>> non-full filesystem we still issue discard at a slow speed.
> 
> Could you please elaborate the problem in more detail? The speed depends
> on how many candidates?

Undiscarded blocks are all of 4KB granularity.
a) utilization: filesystem 20% + undiscarded blocks 20% = flash storage 40%
b) utilization: filesystem 40% + undiscarded blocks 25% = flash storage 65%
c) utilization: filesystem 60% + undiscarded blocks 30% = flash storage 100%


1. For case c), we need to speed up issuing discard based on the utilization of
"filesystem + undiscarded blocks" instead of just the utilization of the filesystem.

-   if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
-   dpolicy->granularity = 1;
-   dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME;
-   }

2. If free space in storage drops to its threshold, performance becomes very
sensitive. In low-end storage with high space usage, even a 1% reduction in
free space can decrease performance a lot.

IMO, in the above cases, we'd better issue discard at high speed for c), middle
speed for b), and low speed for a).

What do you think?

Thanks,

> 
> Thanks,
> 
>>
>> Anyway, it turns out that undiscarded blocks make FTL GC less efficient
>> and cause high lifetime overhead.
>>
>> Let's tune discard speed as below:
>>
>> a. adjust default issue intervals:
>>                  original     after
>> min_interval:    50ms         100ms
>> mid_interval:    500ms        1000ms
>> max_interval:    60000ms      10000ms
>>
>> b. if we stopped issuing discard last time due to user IO interruption,
>> reset all {min,mid,max}_interval to their defaults.
>>
>> c. tune {min,mid,max}_interval with the calculation below:
>>
>> base_interval = default_interval / 10;
>> total_interval = default_interval - base_interval;
>> interval = base_interval + total_interval * (100 - dev_util) / 100;
>>
>> For example:
>> min_interval (= 100ms)
>> dev_util (%)     interval (ms)
>> 0                100
>> 10               91
>> 20               82
>> 30               73
>> ...
>> 80               28
>> 90               19
>> 100              10
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/f2fs.h| 11 
>>  fs/f2fs/segment.c | 64 +--
>>  fs/f2fs/segment.h |  9 +++
>>  fs/f2fs/super.c   |  2 +-
>>  4 files changed, 67 insertions(+), 19 deletions(-)
>>
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index 273ffdaf4891..a1dd2e1c3cb9 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -185,10 +185,9 @@ enum {
>>  
>>  #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
>>  #define DEF_MAX_DISCARD_REQUEST 8   /* issue 8 discards per 
>> round */
>> -#define DEF_MIN_DISCARD_ISSUE_TIME  50  /* 50 ms, if exists */
>> -#define DEF_MID_DISCARD_ISSUE_TIME  500 /* 500 ms, if device busy */
>> -#define DEF_MAX_DISCARD_ISSUE_TIME  60000   /* 60 s, if no candidates */
>> -#define DEF_DISCARD_URGENT_UTIL 80  /* do more discard over 
>> 80% */
>> +#define DEF_MIN_DISCARD_ISSUE_TIME  100 /* 100 ms, if exists */
>> +#define DEF_MID_DISCARD_ISSUE_TIME  1000/* 1000 ms, if device busy */
>> +#define DEF_MAX_DISCARD_ISSUE_TIME  10000   /* 10 s, if no candidates */
>>  #define DEF_CP_INTERVAL 60  /* 60 secs */
>>  #define DEF_IDLE_INTERVAL   5   /* 5 secs */
>>  
>> @@ -248,7 +247,8 @@ struct discard_entry {
>>  };
>>  
>>  /* default discard granularity of inner discard thread, unit: block count */
>> -#define DEFAULT_DISCARD_GRANULARITY 1
>> +#define MID_DISCARD_GRANULARITY 16
>> +#define MIN_DISCARD_GRANULARITY 1
>>  
>>  /* max discard pend list number */
>>  #define MAX_PLIST_NUM   512
>> @@ -330,6 +330,7 @@ struct discard_cmd_control {
>>  atomic_t discard_cmd_cnt;   /* # of cached cmd count */
>>  struct rb_root root;/* root of discard rb-tree */
>>  bool rbtree_check;  /* config for consistence check 
>> */
>> +bool io_interrupted;/* last state of io interrupted 
>> */
>>  };
>>  
>>  /* for the list of fsync inodes, used only during recovery */
>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> index 8b52e8dfb12f..9564aaf1f27b 100644
>> --- a/fs/f2fs/segment.c
>> +++ b/fs/f2fs/segment.c
>> @@ -968,6 +968,44 @@ static void __check_sit_bitmap(struct f2fs_sb_info *sbi,
>>  #endif
>>  }
>>  
>> +static void __adjust_discard_speed(unsigned int *interval,
>> +unsigned int def_interval, int dev_util)
>> +{
>> +unsigned int base_interval, total_interval;
>> +
>> +base_interval = def_interval / 10;
>> +total_interval = 

Re: [f2fs-dev] [PATCH 1/2] f2fs: set 4KB discard granularity by default

2018-08-14 Thread Chao Yu
On 2018/8/14 12:13, Jaegeuk Kim wrote:
> On 08/10, Chao Yu wrote:
>> Small-granularity (size < 64KB) fragmentation will cause f2fs to suspend
>> all pending discards, resulting in performance regression, so let's set
>> 4KB discard granularity by default.
>>
>> Then, even without fstrim, we have the ability to avoid any performance
>> regression caused by non-aligned mapping between fs and flash device.
> 
> This is why we added a sysfs entry. Why do we need to change the default
> value every time?

Of course, I didn't forget it. But most of our users don't know the filesystem
very well, including each configuration's meaning, mechanism, or usage
scenario; most of the time, they test f2fs with all default options and then
draw their conclusions.

Currently, with the default 64KB granularity, if we simulate a filesystem
fragmentation scenario, e.g. by a) writing 4KB files and b) deleting the
even-indexed files, then no 4KB discards will be issued, resulting in
exhaustion of the flash storage's free space and performance degradation.

So I think we'd better consider and set the default configuration values more
carefully to avoid obvious weaknesses.

Thoughts?

Thanks,

> 
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/f2fs.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index 58431b9bfd8f..273ffdaf4891 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -248,7 +248,7 @@ struct discard_entry {
>>  };
>>  
>>  /* default discard granularity of inner discard thread, unit: block count */
>> -#define DEFAULT_DISCARD_GRANULARITY 16
>> +#define DEFAULT_DISCARD_GRANULARITY 1
>>  
>>  /* max discard pend list number */
>>  #define MAX_PLIST_NUM   512
>> -- 
>> 2.18.0.rc1




[f2fs-dev] [RFC PATCH RESEND 1/4] f2fs: support superblock checksum

2018-08-14 Thread Junling Zheng
Now we support a crc32 checksum for the superblock.

Reviewed-by: Chao Yu 
Signed-off-by: Junling Zheng 
---
 fs/f2fs/f2fs.h  |  2 ++
 fs/f2fs/super.c | 29 +
 fs/f2fs/sysfs.c |  7 +++
 include/linux/f2fs_fs.h |  3 ++-
 4 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4525f4f82af0..d50d6efda96b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -147,6 +147,7 @@ struct f2fs_mount_info {
 #define F2FS_FEATURE_INODE_CRTIME  0x0100
 #define F2FS_FEATURE_LOST_FOUND0x0200
 #define F2FS_FEATURE_VERITY0x0400  /* reserved */
+#define F2FS_FEATURE_SB_CHKSUM 0x0800
 
 #define F2FS_HAS_FEATURE(sb, mask) \
((F2FS_SB(sb)->raw_super->feature & cpu_to_le32(mask)) != 0)
@@ -3376,6 +3377,7 @@ F2FS_FEATURE_FUNCS(flexible_inline_xattr, 
FLEXIBLE_INLINE_XATTR);
 F2FS_FEATURE_FUNCS(quota_ino, QUOTA_INO);
 F2FS_FEATURE_FUNCS(inode_crtime, INODE_CRTIME);
 F2FS_FEATURE_FUNCS(lost_found, LOST_FOUND);
+F2FS_FEATURE_FUNCS(sb_chksum, SB_CHKSUM);
 
 #ifdef CONFIG_BLK_DEV_ZONED
 static inline int get_blkz_type(struct f2fs_sb_info *sbi,
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index bd57be470e23..3ffc336caea8 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -2149,6 +2149,26 @@ static int sanity_check_raw_super(struct f2fs_sb_info 
*sbi,
(bh->b_data + F2FS_SUPER_OFFSET);
struct super_block *sb = sbi->sb;
unsigned int blocksize;
+   size_t crc_offset = 0;
+   __u32 crc = 0;
+
+   /* Check checksum_offset and crc in superblock */
+   if (le32_to_cpu(raw_super->feature) & F2FS_FEATURE_SB_CHKSUM) {
+   crc_offset = le32_to_cpu(raw_super->checksum_offset);
+   if (crc_offset !=
+   offsetof(struct f2fs_super_block, crc)) {
+   f2fs_msg(sb, KERN_INFO,
+   "Invalid SB checksum offset: %zu",
+   crc_offset);
+   return 1;
+   }
+   crc = le32_to_cpu(raw_super->crc);
+   if (!f2fs_crc_valid(sbi, crc, raw_super, crc_offset)) {
+   f2fs_msg(sb, KERN_INFO,
+   "Invalid SB checksum value: %u", crc);
+   return 1;
+   }
+   }
 
if (F2FS_SUPER_MAGIC != le32_to_cpu(raw_super->magic)) {
f2fs_msg(sb, KERN_INFO,
@@ -2568,6 +2588,7 @@ static int read_raw_super_block(struct f2fs_sb_info *sbi,
 int f2fs_commit_super(struct f2fs_sb_info *sbi, bool recover)
 {
struct buffer_head *bh;
+   __u32 crc = 0;
int err;
 
if ((recover && f2fs_readonly(sbi->sb)) ||
@@ -2576,6 +2597,13 @@ int f2fs_commit_super(struct f2fs_sb_info *sbi, bool 
recover)
return -EROFS;
}
 
+   /* we should update superblock crc here */
+   if (!recover && f2fs_sb_has_sb_chksum(sbi->sb)) {
+   crc = f2fs_crc32(sbi, F2FS_RAW_SUPER(sbi),
+   offsetof(struct f2fs_super_block, crc));
+   F2FS_RAW_SUPER(sbi)->crc = cpu_to_le32(crc);
+   }
+
/* write back-up superblock first */
bh = sb_bread(sbi->sb, sbi->valid_super_block ? 0 : 1);
if (!bh)
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index cd2e030e47b8..c86d91be6c48 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -120,6 +120,9 @@ static ssize_t features_show(struct f2fs_attr *a,
if (f2fs_sb_has_lost_found(sb))
len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
len ? ", " : "", "lost_found");
+   if (f2fs_sb_has_sb_chksum(sb))
+   len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
+   len ? ", " : "", "sb_checksum");
len += snprintf(buf + len, PAGE_SIZE - len, "\n");
return len;
 }
@@ -337,6 +340,7 @@ enum feat_id {
FEAT_QUOTA_INO,
FEAT_INODE_CRTIME,
FEAT_LOST_FOUND,
+   FEAT_SB_CHECKSUM,
 };
 
 static ssize_t f2fs_feature_show(struct f2fs_attr *a,
@@ -353,6 +357,7 @@ static ssize_t f2fs_feature_show(struct f2fs_attr *a,
case FEAT_QUOTA_INO:
case FEAT_INODE_CRTIME:
case FEAT_LOST_FOUND:
+   case FEAT_SB_CHECKSUM:
return snprintf(buf, PAGE_SIZE, "supported\n");
}
return 0;
@@ -433,6 +438,7 @@ F2FS_FEATURE_RO_ATTR(flexible_inline_xattr, 
FEAT_FLEXIBLE_INLINE_XATTR);
 F2FS_FEATURE_RO_ATTR(quota_ino, FEAT_QUOTA_INO);
 F2FS_FEATURE_RO_ATTR(inode_crtime, FEAT_INODE_CRTIME);
 F2FS_FEATURE_RO_ATTR(lost_found, FEAT_LOST_FOUND);
+F2FS_FEATURE_RO_ATTR(sb_checksum, FEAT_SB_CHECKSUM);
 
 #define ATTR_LIST(name) (_attr_##name.attr)
 static struct attribute *f2fs_attrs[] = {
@@ -489,6 +495,7 @@ static struct attribute *f2fs_feat_attrs[] = {
ATTR_LIST(quota_ino),

[f2fs-dev] [RFC PATCH RESEND 2/4] f2fs-tools: rename CHECKSUM_OFFSET to CP_CHKSUM_OFFSET

2018-08-14 Thread Junling Zheng
This patch renames CHECKSUM_OFFSET to CP_CHKSUM_OFFSET.

Reviewed-by: Chao Yu 
Signed-off-by: Junling Zheng 
---
 fsck/fsck.c|  4 ++--
 fsck/mount.c   |  4 ++--
 fsck/resize.c  |  8 
 include/f2fs_fs.h  |  6 +++---
 mkfs/f2fs_format.c | 14 +++---
 5 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/fsck/fsck.c b/fsck/fsck.c
index d550403..f080d3c 100644
--- a/fsck/fsck.c
+++ b/fsck/fsck.c
@@ -2010,8 +2010,8 @@ static void fix_checkpoint(struct f2fs_sb_info *sbi)
set_cp(valid_node_count, fsck->chk.valid_node_cnt);
set_cp(valid_inode_count, fsck->chk.valid_inode_cnt);
 
-   crc = f2fs_cal_crc32(F2FS_SUPER_MAGIC, cp, CHECKSUM_OFFSET);
-   *((__le32 *)((unsigned char *)cp + CHECKSUM_OFFSET)) = cpu_to_le32(crc);
+   crc = f2fs_cal_crc32(F2FS_SUPER_MAGIC, cp, CP_CHKSUM_OFFSET);
+   *((__le32 *)((unsigned char *)cp + CP_CHKSUM_OFFSET)) = 
cpu_to_le32(crc);
 
cp_blk_no = get_sb(cp_blkaddr);
if (sbi->cur_cp == 2)
diff --git a/fsck/mount.c b/fsck/mount.c
index 8fb4d59..58ef3e6 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -2187,8 +2187,8 @@ void write_checkpoint(struct f2fs_sb_info *sbi)
flags = update_nat_bits_flags(sb, cp, flags);
set_cp(ckpt_flags, flags);
 
-   crc = f2fs_cal_crc32(F2FS_SUPER_MAGIC, cp, CHECKSUM_OFFSET);
-   *((__le32 *)((unsigned char *)cp + CHECKSUM_OFFSET)) = cpu_to_le32(crc);
+   crc = f2fs_cal_crc32(F2FS_SUPER_MAGIC, cp, CP_CHKSUM_OFFSET);
+   *((__le32 *)((unsigned char *)cp + CP_CHKSUM_OFFSET)) = 
cpu_to_le32(crc);
 
cp_blk_no = get_sb(cp_blkaddr);
if (sbi->cur_cp == 2)
diff --git a/fsck/resize.c b/fsck/resize.c
index fe8a61a..e9612b3 100644
--- a/fsck/resize.c
+++ b/fsck/resize.c
@@ -90,10 +90,10 @@ static int get_new_sb(struct f2fs_super_block *sb)
 * It requires more pages for cp.
 */
if (max_sit_bitmap_size > MAX_SIT_BITMAP_SIZE_IN_CKPT) {
-   max_nat_bitmap_size = CHECKSUM_OFFSET - sizeof(struct 
f2fs_checkpoint) + 1;
+   max_nat_bitmap_size = CP_CHKSUM_OFFSET - sizeof(struct 
f2fs_checkpoint) + 1;
set_sb(cp_payload, F2FS_BLK_ALIGN(max_sit_bitmap_size));
} else {
-   max_nat_bitmap_size = CHECKSUM_OFFSET - sizeof(struct 
f2fs_checkpoint) + 1
+   max_nat_bitmap_size = CP_CHKSUM_OFFSET - sizeof(struct 
f2fs_checkpoint) + 1
- max_sit_bitmap_size;
set_sb(cp_payload, 0);
}
@@ -520,8 +520,8 @@ static void rebuild_checkpoint(struct f2fs_sb_info *sbi,
(unsigned char *)cp);
new_cp->checkpoint_ver = cpu_to_le64(cp_ver + 1);
 
-   crc = f2fs_cal_crc32(F2FS_SUPER_MAGIC, new_cp, CHECKSUM_OFFSET);
-   *((__le32 *)((unsigned char *)new_cp + CHECKSUM_OFFSET)) =
+   crc = f2fs_cal_crc32(F2FS_SUPER_MAGIC, new_cp, CP_CHKSUM_OFFSET);
+   *((__le32 *)((unsigned char *)new_cp + CP_CHKSUM_OFFSET)) =
cpu_to_le32(crc);
 
/* Write a new checkpoint in the other set */
diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
index 53fa002..e279b9f 100644
--- a/include/f2fs_fs.h
+++ b/include/f2fs_fs.h
@@ -278,7 +278,7 @@ static inline uint64_t bswap_64(uint64_t val)
 #define PAGE_CACHE_SIZE4096
 #define BITS_PER_BYTE  8
 #define F2FS_SUPER_MAGIC   0xF2F52010  /* F2FS Magic Number */
-#define CHECKSUM_OFFSET4092
+#define CP_CHKSUM_OFFSET   4092
 #define MAX_PATH_LEN   64
 #define MAX_DEVICES8
 
@@ -682,9 +682,9 @@ struct f2fs_checkpoint {
 } __attribute__((packed));
 
 #define MAX_SIT_BITMAP_SIZE_IN_CKPT\
-   (CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1 - 64)
+   (CP_CHKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1 - 64)
 #define MAX_BITMAP_SIZE_IN_CKPT\
-   (CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1)
+   (CP_CHKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1)
 
 /*
  * For orphan inode management
diff --git a/mkfs/f2fs_format.c b/mkfs/f2fs_format.c
index 4b88d93..621126c 100644
--- a/mkfs/f2fs_format.c
+++ b/mkfs/f2fs_format.c
@@ -342,12 +342,12 @@ static int f2fs_prepare_super_block(void)
 * It requires more pages for cp.
 */
if (max_sit_bitmap_size > MAX_SIT_BITMAP_SIZE_IN_CKPT) {
-   max_nat_bitmap_size = CHECKSUM_OFFSET -
+   max_nat_bitmap_size = CP_CHKSUM_OFFSET -
sizeof(struct f2fs_checkpoint) + 1;
set_sb(cp_payload, F2FS_BLK_ALIGN(max_sit_bitmap_size));
} else {
max_nat_bitmap_size =
-   CHECKSUM_OFFSET - sizeof(struct 
f2fs_checkpoint) + 1
+   

[f2fs-dev] [RFC PATCH v2 3/4] f2fs-tools: unify the writeback of superblock

2018-08-14 Thread Junling Zheng
Introduce __write_superblock() to support updating one specified superblock
or both, so we can wrap it in update_superblock() and f2fs_write_super_block()
to unify all places where the sb needs to be updated.

Signed-off-by: Junling Zheng 
---
v1 -> v2:
 - if dev_write_block fails, add some notes and free buf to avoid a memory leak.
 fsck/fsck.h|  2 +-
 fsck/mount.c   | 74 +++---
 fsck/resize.c  | 20 ++---
 include/f2fs_fs.h  | 35 ++
 mkfs/f2fs_format.c | 19 +---
 5 files changed, 56 insertions(+), 94 deletions(-)

diff --git a/fsck/fsck.h b/fsck/fsck.h
index 6042e68..e3490e6 100644
--- a/fsck/fsck.h
+++ b/fsck/fsck.h
@@ -178,7 +178,7 @@ extern void move_curseg_info(struct f2fs_sb_info *, u64, 
int);
 extern void write_curseg_info(struct f2fs_sb_info *);
 extern int find_next_free_block(struct f2fs_sb_info *, u64 *, int, int);
 extern void write_checkpoint(struct f2fs_sb_info *);
-extern void write_superblock(struct f2fs_super_block *);
+extern void update_superblock(struct f2fs_super_block *, int);
 extern void update_data_blkaddr(struct f2fs_sb_info *, nid_t, u16, block_t);
 extern void update_nat_blkaddr(struct f2fs_sb_info *, nid_t, nid_t, block_t);
 
diff --git a/fsck/mount.c b/fsck/mount.c
index 58ef3e6..e7ceb8d 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -476,7 +476,7 @@ void print_sb_state(struct f2fs_super_block *sb)
 }
 
 static inline int sanity_check_area_boundary(struct f2fs_super_block *sb,
-   u64 offset)
+   enum SB_ADDR sb_addr)
 {
u32 segment0_blkaddr = get_sb(segment0_blkaddr);
u32 cp_blkaddr = get_sb(cp_blkaddr);
@@ -542,14 +542,11 @@ static inline int sanity_check_area_boundary(struct 
f2fs_super_block *sb,
segment_count_main << log_blocks_per_seg);
return -1;
} else if (main_end_blkaddr < seg_end_blkaddr) {
-   int err;
-
set_sb(segment_count, (main_end_blkaddr -
segment0_blkaddr) >> log_blocks_per_seg);
 
-   err = dev_write(sb, offset, sizeof(struct f2fs_super_block));
-   MSG(0, "Info: Fix alignment: %s, start(%u) end(%u) block(%u)\n",
-   err ? "failed": "done",
+   update_superblock(sb, 1 << sb_addr);
+   MSG(0, "Info: Fix alignment: start(%u) end(%u) block(%u)\n",
main_blkaddr,
segment0_blkaddr +
(segment_count << log_blocks_per_seg),
@@ -558,7 +555,7 @@ static inline int sanity_check_area_boundary(struct 
f2fs_super_block *sb,
return 0;
 }
 
-int sanity_check_raw_super(struct f2fs_super_block *sb, u64 offset)
+int sanity_check_raw_super(struct f2fs_super_block *sb, enum SB_ADDR sb_addr)
 {
unsigned int blocksize;
 
@@ -600,30 +597,24 @@ int sanity_check_raw_super(struct f2fs_super_block *sb, 
u64 offset)
if (get_sb(segment_count) > F2FS_MAX_SEGMENT)
return -1;
 
-   if (sanity_check_area_boundary(sb, offset))
+   if (sanity_check_area_boundary(sb, sb_addr))
return -1;
return 0;
 }
 
-int validate_super_block(struct f2fs_sb_info *sbi, int block)
+int validate_super_block(struct f2fs_sb_info *sbi, enum SB_ADDR sb_addr)
 {
-   u64 offset;
char buf[F2FS_BLKSIZE];
 
sbi->raw_super = malloc(sizeof(struct f2fs_super_block));
 
-   if (block == 0)
-   offset = F2FS_SUPER_OFFSET;
-   else
-   offset = F2FS_BLKSIZE + F2FS_SUPER_OFFSET;
-
-   if (dev_read_block(buf, block))
+   if (dev_read_block(buf, sb_addr))
return -1;
 
memcpy(sbi->raw_super, buf + F2FS_SUPER_OFFSET,
sizeof(struct f2fs_super_block));
 
-   if (!sanity_check_raw_super(sbi->raw_super, offset)) {
+   if (!sanity_check_raw_super(sbi->raw_super, sb_addr)) {
/* get kernel version */
if (c.kd >= 0) {
dev_read_version(c.version, 0, VERSION_LEN);
@@ -642,13 +633,9 @@ int validate_super_block(struct f2fs_sb_info *sbi, int 
block)
MSG(0, "Info: FSCK version\n  from \"%s\"\nto \"%s\"\n",
c.sb_version, c.version);
if (memcmp(c.sb_version, c.version, VERSION_LEN)) {
-   int ret;
-
memcpy(sbi->raw_super->version,
c.version, VERSION_LEN);
-   ret = dev_write(sbi->raw_super, offset,
-   sizeof(struct f2fs_super_block));
-   ASSERT(ret >= 0);
+   update_superblock(sbi->raw_super, 1 << sb_addr);
 
c.auto_fix = 0;

[f2fs-dev] [RFC PATCH RESEND 4/4] f2fs-tools: introduce sb checksum

2018-08-14 Thread Junling Zheng
This patch introduces a crc for the superblock.

Reviewed-by: Chao Yu 
Signed-off-by: Junling Zheng 
---
 fsck/mount.c   | 23 +++
 fsck/resize.c  | 12 ++--
 include/f2fs_fs.h  | 16 +++-
 mkfs/f2fs_format.c |  3 +++
 4 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/fsck/mount.c b/fsck/mount.c
index e7ceb8d..af9b219 100644
--- a/fsck/mount.c
+++ b/fsck/mount.c
@@ -340,6 +340,7 @@ void print_raw_sb_info(struct f2fs_super_block *sb)
DISP_u32(sb, node_ino);
DISP_u32(sb, meta_ino);
DISP_u32(sb, cp_payload);
+   DISP_u32(sb, crc);
DISP("%-.256s", sb, version);
printf("\n");
 }
@@ -467,6 +468,9 @@ void print_sb_state(struct f2fs_super_block *sb)
if (f & cpu_to_le32(F2FS_FEATURE_LOST_FOUND)) {
MSG(0, "%s", " lost_found");
}
+   if (f & cpu_to_le32(F2FS_FEATURE_SB_CHKSUM)) {
+   MSG(0, "%s", " sb_checksum");
+   }
MSG(0, "\n");
MSG(0, "Info: superblock encrypt level = %d, salt = ",
sb->encryption_level);
@@ -555,10 +559,29 @@ static inline int sanity_check_area_boundary(struct 
f2fs_super_block *sb,
return 0;
 }
 
+static int verify_sb_chksum(struct f2fs_super_block *sb)
+{
+   if (SB_CHKSUM_OFFSET != get_sb(checksum_offset)) {
+   MSG(0, "\tInvalid SB CRC offset: %u\n",
+   get_sb(checksum_offset));
+   return -1;
+   }
+   if (f2fs_crc_valid(get_sb(crc), sb,
+   get_sb(checksum_offset))) {
+   MSG(0, "\tInvalid SB CRC: 0x%x\n", get_sb(crc));
+   return -1;
+   }
+   return 0;
+}
+
 int sanity_check_raw_super(struct f2fs_super_block *sb, enum SB_ADDR sb_addr)
 {
unsigned int blocksize;
 
+   if ((get_sb(feature) & F2FS_FEATURE_SB_CHKSUM) &&
+   verify_sb_chksum(sb))
+   return -1;
+
if (F2FS_SUPER_MAGIC != get_sb(magic))
return -1;
 
diff --git a/fsck/resize.c b/fsck/resize.c
index 5161a1f..3462165 100644
--- a/fsck/resize.c
+++ b/fsck/resize.c
@@ -603,9 +603,6 @@ static int f2fs_resize_grow(struct f2fs_sb_info *sbi)
}
}
 
-   print_raw_sb_info(sb);
-   print_raw_sb_info(new_sb);
-
old_main_blkaddr = get_sb(main_blkaddr);
new_main_blkaddr = get_newsb(main_blkaddr);
offset = new_main_blkaddr - old_main_blkaddr;
@@ -629,6 +626,9 @@ static int f2fs_resize_grow(struct f2fs_sb_info *sbi)
migrate_sit(sbi, new_sb, offset_seg);
rebuild_checkpoint(sbi, new_sb, offset_seg);
update_superblock(new_sb, SB_ALL);
+   print_raw_sb_info(sb);
+   print_raw_sb_info(new_sb);
+
return 0;
 }
 
@@ -658,9 +658,6 @@ static int f2fs_resize_shrink(struct f2fs_sb_info *sbi)
}
}
 
-   print_raw_sb_info(sb);
-   print_raw_sb_info(new_sb);
-
old_main_blkaddr = get_sb(main_blkaddr);
new_main_blkaddr = get_newsb(main_blkaddr);
offset = old_main_blkaddr - new_main_blkaddr;
@@ -690,6 +687,9 @@ static int f2fs_resize_shrink(struct f2fs_sb_info *sbi)
/* move whole data region */
//if (err)
//  migrate_main(sbi, offset);
+   print_raw_sb_info(sb);
+   print_raw_sb_info(new_sb);
+
return 0;
 }
 
diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h
index 791e64f..7446baa 100644
--- a/include/f2fs_fs.h
+++ b/include/f2fs_fs.h
@@ -279,6 +279,7 @@ static inline uint64_t bswap_64(uint64_t val)
 #define BITS_PER_BYTE  8
 #define F2FS_SUPER_MAGIC   0xF2F52010  /* F2FS Magic Number */
 #define CP_CHKSUM_OFFSET   4092
+#define SB_CHKSUM_OFFSET   3068
 #define MAX_PATH_LEN   64
 #define MAX_DEVICES8
 
@@ -579,6 +580,7 @@ enum {
 #define F2FS_FEATURE_INODE_CRTIME  0x0100
 #define F2FS_FEATURE_LOST_FOUND0x0200
 #define F2FS_FEATURE_VERITY0x0400  /* reserved */
+#define F2FS_FEATURE_SB_CHKSUM 0x0800
 
 #define MAX_VOLUME_NAME512
 
@@ -632,7 +634,8 @@ struct f2fs_super_block {
struct f2fs_device devs[MAX_DEVICES];   /* device list */
__le32 qf_ino[F2FS_MAX_QUOTAS]; /* quota inode numbers */
__u8 hot_ext_count; /* # of hot file extension */
-   __u8 reserved[314]; /* valid reserved region */
+   __u8 reserved[310]; /* valid reserved region */
+   __le32 crc; /* checksum of superblock */
 } __attribute__((packed));
 
 /*
@@ -1337,6 +1340,7 @@ struct feature feature_table[] = {
\
{ "inode_crtime",   F2FS_FEATURE_INODE_CRTIME },\
{ "lost_found", F2FS_FEATURE_LOST_FOUND },  \
{ "verity", F2FS_FEATURE_VERITY },  /* reserved */ \
+   { "sb_checksum",   

Re: [f2fs-dev] [PATCH] f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc

2018-08-14 Thread Chao Yu
On 2018/8/14 4:12, Jaegeuk Kim wrote:
> On 08/12, Chao Yu wrote:
>> On 2018/8/4 10:31, Chao Yu wrote:
>>> How about keep lock order as:
>>>
>>> - inode_lock
>>>  - i_mmap_sem
>>>   - lock_all()
>>>   - unlock_all()
>>>   - i_gc_rwsem[WRITE]
>>>- lock_op()
>>
>> I got the warning below when testing the latest dev-test:
>>
>> - f2fs_direct_IO current lock dependency
>>  - i_gc_rwsem[WRITE]
>>  - i_mmap_sem
>>   - do_blockdev_direct_IO
>>- i_mmap_sem
>>   - i_gc_rwsem[WRITE]
>>
> 
> Yeah, it seems it's true.
> How about this?

It looks good to me, anyway, let me check this patch with fstests again.

Thanks,

> 
> ---
>  fs/f2fs/data.c |  4 ++--
>  fs/f2fs/file.c | 43 +++
>  2 files changed, 25 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index f09231b1cc74..021923dc666b 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -2208,14 +2208,14 @@ static void f2fs_write_failed(struct address_space 
> *mapping, loff_t to)
>   loff_t i_size = i_size_read(inode);
>  
>   if (to > i_size) {
> - down_write(&F2FS_I(inode)->i_mmap_sem);
>   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>  
>   truncate_pagecache(inode, i_size);
>   f2fs_truncate_blocks(inode, i_size, true);
>  
> - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>   up_write(&F2FS_I(inode)->i_mmap_sem);
> + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>   }
>  }
>  
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 560751adba01..8b13afb23734 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -798,8 +798,8 @@ int f2fs_setattr(struct dentry *dentry, struct iattr 
> *attr)
>   if (attr->ia_valid & ATTR_SIZE) {
>   bool to_smaller = (attr->ia_size <= i_size_read(inode));
>  
> - down_write(&F2FS_I(inode)->i_mmap_sem);
>   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>  
>   truncate_setsize(inode, attr->ia_size);
>  
> @@ -809,8 +809,8 @@ int f2fs_setattr(struct dentry *dentry, struct iattr 
> *attr)
>* do not trim all blocks after i_size if target size is
>* larger than i_size.
>*/
> - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>   up_write(&F2FS_I(inode)->i_mmap_sem);
> + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>  
>   if (err)
>   return err;
> @@ -963,8 +963,8 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len)
>   blk_start = (loff_t)pg_start << PAGE_SHIFT;
>   blk_end = (loff_t)pg_end << PAGE_SHIFT;
>  
> - down_write(&F2FS_I(inode)->i_mmap_sem);
>   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>  
>   truncate_inode_pages_range(mapping, blk_start,
>   blk_end - 1);
> @@ -973,8 +973,8 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len)
>   ret = f2fs_truncate_hole(inode, pg_start, pg_end);
>   f2fs_unlock_op(sbi);
>  
> - up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>   up_write(&F2FS_I(inode)->i_mmap_sem);
> + up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>   }
>   }
>  
> @@ -1201,6 +1201,7 @@ static int f2fs_do_collapse(struct inode *inode, loff_t offset, loff_t len)
>  
>   /* avoid gc operation during block exchange */
>   down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>  
>   f2fs_lock_op(sbi);
>   f2fs_drop_extent_tree(inode);
> @@ -1208,6 +1209,7 @@ static int f2fs_do_collapse(struct inode *inode, loff_t offset, loff_t len)
>   ret = __exchange_data_block(inode, inode, end, start, nrpages - end, true);
>   f2fs_unlock_op(sbi);
>  
> + up_write(&F2FS_I(inode)->i_mmap_sem);
>   up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>   return ret;
>  }
> @@ -1228,17 +1230,17 @@ static int f2fs_collapse_range(struct inode *inode, loff_t offset, loff_t len)
>   if (ret)
>   return ret;
>  
> - down_write(&F2FS_I(inode)->i_mmap_sem);
>   /* write out all dirty pages from offset */
>   ret = filemap_write_and_wait_range(inode->i_mapping, offset, LLONG_MAX);
>   if (ret)
> - goto out_unlock;
> + return ret;
>  
>   ret = f2fs_do_collapse(inode, offset, len);
>   if (ret)
> - goto out_unlock;
> + return ret;
>  
>   /* write out all moved pages, if possible */
> + down_write(&F2FS_I(inode)->i_mmap_sem);
>   filemap_write_and_wait_range(inode->i_mapping, offset, 

Re: [f2fs-dev] [PATCH v3] f2fs: fix performance issue observed with multi-thread sequential read

2018-08-14 Thread Chao Yu
On 2018/8/14 12:04, Jaegeuk Kim wrote:
> On 08/14, Chao Yu wrote:
>> On 2018/8/14 4:11, Jaegeuk Kim wrote:
>>> On 08/13, Chao Yu wrote:
 Hi Jaegeuk,

 On 2018/8/11 2:56, Jaegeuk Kim wrote:
> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
> to fix the drop in sequential read throughput.
>
> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
> device: UFS
>
> Before -
> read throughput: 185 MB/s
> total read requests: 85177 (of these ~8 are 4KB size requests).
> total write requests: 2546 (of these ~2208 requests are written in 512KB).
>
> After -
> read throughput: 758 MB/s
> total read requests: 2417 (of these ~2042 are 512KB reads).
> total write requests: 2701 (of these ~2034 requests are written in 512KB).

 IMO, it only impacts sequential read performance on a large file which may
 be fragmented during multi-threaded writing.

 In the Android environment, large files are mostly of cold type, such as apk,
 mp3, rmvb, jpeg..., so I think we only need to serialize writepages() for the
 cold data area writer.

 So how about adding a mount option to serialize writepages() for different
 types of logs, e.g. in Android, using serialize=4; by default, using serialize=7:
 HOT_DATA   1
 WARM_DATA  2
 COLD_DATA  4
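A hedged sketch of what such a bitmask check could look like (hypothetical: this option was never part of any merged patch, and all names below are illustrative):

```c
#include <assert.h>

/* Hypothetical sketch of the serialize= mount-option proposal above;
 * all names here are illustrative. The mask bit values follow the
 * mail: HOT_DATA=1, WARM_DATA=2, COLD_DATA=4, so bit N selects the
 * data temperature with index N. */
enum temp_type { HOT = 0, WARM = 1, COLD = 2 };

static unsigned int serialize_mask = 7;	/* default: serialize all types */

/* Would gate taking the per-sb writepages mutex for a given
 * data temperature. */
static int should_serialize(enum temp_type temp)
{
	return (serialize_mask >> temp) & 1;
}
```

With serialize=4 only cold-data writers would contend on the writepages mutex, leaving hot/warm small-file writers unserialized.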
>>>
>>> Well, I don't think we need to give too many mount options for this
>>> fragmented case. How about doing this for the large files only like this?
>>
>> Thread A write 512 pages Thread B write 8 pages
>>
>> - writepages()
>>  - mutex_lock(&sbi->writepages);
>>   - writepage();
>> ...
>>  - writepages()
>>   - writepage()
>>
>>   - writepage();
>> ...
>>  - mutex_unlock(&sbi->writepages);
>>
>> The above case will also cause fragmentation, since we didn't serialize all
>> concurrent IO with the lock.

>> Do we need to consider such a case?
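To make the fragmentation concern concrete, a toy extent counter (illustrative userspace sketch, not f2fs code) shows how interleaved allocation inflates a file's extent count:

```c
#include <stddef.h>

/* Illustrative sketch, not f2fs code: given the sorted list of block
 * addresses allocated to one file, count its contiguous extents.
 * When two writers interleave in the same log, each file's blocks
 * alternate and the extent count explodes; when the batches are
 * serialized, each file's batch stays a single extent. */
static int count_extents(const unsigned int *blocks, size_t n)
{
	if (n == 0)
		return 0;
	int extents = 1;
	for (size_t i = 1; i < n; i++)
		if (blocks[i] != blocks[i - 1] + 1)
			extents++;
	return extents;
}
```

For example, blocks {10, 11, 12, 13} form one extent, while {10, 12, 14, 16} (alternating with another writer) form four, i.e. four read requests instead of one for the same sequential read.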
> 
> We can simply allow 512 and 8 in the same segment, which would not be a big deal
> when considering starvation of Thread B.

Yeah, but in reality there would be more threads competing for the same log header,
so I worry that the effect of defragmenting will not be as good as we expect;
anyway, for the benchmark, it's enough.

Thanks,

> 
>>
>> Thanks,
>>
>>>
>>> From 4fea0b6e4da8512a72dd52afc7a51beb35966ad9 Mon Sep 17 00:00:00 2001
>>> From: Jaegeuk Kim 
>>> Date: Thu, 9 Aug 2018 17:53:34 -0700
>>> Subject: [PATCH] f2fs: fix performance issue observed with multi-thread
>>>  sequential read
>>>
>>> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
>>> to fix the drop in sequential read throughput.
>>>
>>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
>>> device: UFS
>>>
>>> Before -
>>> read throughput: 185 MB/s
>>> total read requests: 85177 (of these ~8 are 4KB size requests).
>>> total write requests: 2546 (of these ~2208 requests are written in 512KB).
>>>
>>> After -
>>> read throughput: 758 MB/s
>>> total read requests: 2417 (of these ~2042 are 512KB reads).
>>> total write requests: 2701 (of these ~2034 requests are written in 512KB).
>>>
>>> Signed-off-by: Sahitya Tummala 
>>> Signed-off-by: Jaegeuk Kim 
>>> ---
>>>  Documentation/ABI/testing/sysfs-fs-f2fs |  8 
>>>  fs/f2fs/data.c  | 10 ++
>>>  fs/f2fs/f2fs.h  |  2 ++
>>>  fs/f2fs/segment.c   |  1 +
>>>  fs/f2fs/super.c |  1 +
>>>  fs/f2fs/sysfs.c |  2 ++
>>>  6 files changed, 24 insertions(+)
>>>
>>> diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
>>> index 9b0123388f18..94a24aedcdb2 100644
>>> --- a/Documentation/ABI/testing/sysfs-fs-f2fs
>>> +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
>>> @@ -51,6 +51,14 @@ Description:
>>>  Controls the dirty page count condition for the in-place-update
>>>  policies.
>>>  
>>> +What:  /sys/fs/f2fs//min_seq_blocks
>>> +Date:  August 2018
>>> +Contact:   "Jaegeuk Kim" 
>>> +Description:
>>> +Controls the dirty page count condition for batched sequential
>>> +writes in ->writepages.
>>> +
>>> +
>>>  What:  /sys/fs/f2fs//min_hot_blocks
>>>  Date:  March 2017
>>>  Contact:   "Jaegeuk Kim" 
>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>> index 45f043ee48bd..f09231b1cc74 100644
>>> --- a/fs/f2fs/data.c
>>> +++ b/fs/f2fs/data.c
>>> @@ -2132,6 +2132,7 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
>>> struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>>> struct blk_plug plug;
>>> int ret;
>>> +   bool locked = false;
>>>  
>>> /* deal with chardevs and other special file */
>>> if 
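The gist of the patch above (truncated here) can be paraphrased as a threshold check; the sketch below is a hedged userspace paraphrase, not the exact kernel code, and `should_serialize_io` is an illustrative name. A writer with a large dirty range takes the per-superblock writepages mutex so its batch is allocated contiguously, while small writers skip the lock and avoid the serialization cost.

```c
#include <stdbool.h>

/* Hedged userspace paraphrase of the patch's idea, not the exact
 * kernel code; should_serialize_io is an illustrative name, and the
 * default value below is only an assumption. min_seq_blocks is the
 * new sysfs tunable: the dirty page count above which a writepages()
 * batch is serialized under the per-sb mutex. */
static unsigned int min_seq_blocks = 512;

static bool should_serialize_io(unsigned int dirty_pages)
{
	/* Large sequential batches take the lock; small writers
	 * (e.g. Thread B's 8 pages in the earlier example) do not. */
	return dirty_pages >= min_seq_blocks;
}
```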

Re: [f2fs-dev] [PATCH] f2fs: rework fault injection handling to avoid a warning

2018-08-14 Thread Chao Yu
On 2018/8/14 5:38, Arnd Bergmann wrote:
> When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an
> unused label:
> 
> fs/f2fs/segment.c: In function '__submit_discard_cmd':
> fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label]
> 
> This could be fixed by adding another #ifdef around it, but the more
> reliable way of doing this seems to be to remove the other #ifdefs
> where that is easily possible.
> 
> By defining time_to_inject() as a trivial stub, most of the checks for
> CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer
> formatting of the code.
> 
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Chao Yu 

Thanks,


Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel