Re: [f2fs-dev] [PATCH v3] f2fs: fix performance issue observed with multi-thread sequential read
On 2018/8/15 10:15, Jaegeuk Kim wrote:
> On 08/15, Chao Yu wrote:
>> On 2018/8/15 1:28, Jaegeuk Kim wrote:
>>> On 08/14, Chao Yu wrote:
On 2018/8/14 12:04, Jaegeuk Kim wrote:
> On 08/14, Chao Yu wrote:
>> On 2018/8/14 4:11, Jaegeuk Kim wrote:
>>> On 08/13, Chao Yu wrote:
Hi Jaegeuk,

On 2018/8/11 2:56, Jaegeuk Kim wrote:
> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
> to fix the drop in sequential read throughput.
>
> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
> device: UFS
>
> Before -
> read throughput: 185 MB/s
> total read requests: 85177 (of these ~80000 are 4KB size requests).
> total write requests: 2546 (of these ~2208 requests are written in 512KB).
>
> After -
> read throughput: 758 MB/s
> total read requests: 2417 (of these ~2042 are 512KB reads).
> total write requests: 2701 (of these ~2034 requests are written in 512KB).

IMO, it only impacts sequential read performance on a large file, which may be
fragmented during multi-thread writing.

In an android environment, mostly, the large file should be cold type, such as
apk, mp3, rmvb, jpeg..., so I think we only need to serialize writepages() for
the cold data area writer.

So how about adding a mount option to serialize writepage() for different
types of log, e.g. in android, using serialize=4; by default, using serialize=7

HOT_DATA	1
WARM_DATA	2
COLD_DATA	4
>>>
>>> Well, I don't think we need to give too many mount options for this
>>> fragmented case. How about doing this for the large files only like this?
>>
>> Thread A write 512 pages		Thread B write 8 pages
>>
>> - writepages()
>>  - mutex_lock(&sbi->writepages);
>>   - writepage();
>>   ...
>>					- writepages()
>>					 - writepage()
>>
>>   - writepage();
>>   ...
>>  - mutex_unlock(&sbi->writepages);
>>
>> The above case will also cause fragmentation, since we didn't serialize all
>> concurrent IO with the lock.
>>
>> Do we need to consider such a case?
>
> We can simply allow 512 and 8 in the same segment, which would not be a big
> deal, when considering starvation of Thread B.

Yeah, but in reality, there would be more threads competing for the same log
header, so I worry that the effect of defragmenting will not be as good as we
expect; anyway, for a benchmark, it's enough.
>>>
>>> Basically, I think this is not a benchmark issue. :) It just reveals the
>>> issue much more easily. Let me think about three cases:
>>>
>>> 1) WB_SYNC_NONE & WB_SYNC_NONE
>>>  -> can simply use mutex_lock
>>>
>>> 2) WB_SYNC_ALL & WB_SYNC_NONE
>>>  -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, while
>>>  WB_SYNC_NONE will skip writing blocks
>>>
>>> 3) WB_SYNC_ALL & WB_SYNC_ALL
>>>  -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, in order to
>>>  avoid starvation.
>>>
>>> I've been testing the below.
>>>
>>> 	if (!S_ISDIR(inode->i_mode) && (wbc->sync_mode != WB_SYNC_ALL ||
>>> 			get_dirty_pages(inode) <= SM_I(sbi)->min_seq_blocks)) {
>>> 		mutex_lock(&sbi->writepages);
>>> 		locked = true;
>>
>> Just cover buffered IO? How about covering Direct IO and atomic write as
>> well?
>
> I'd expect direct IO does in-place-updates, and not sure whether we need to

For initial writes, they are not IPU.

> add another lock contention between buffered or direct IO. Atomic writes
> would be covered by ->min_seq_blocks.

Okay. :)

Thanks,

>>> 	}
>>>
>>> Thanks,

>>> From 4fea0b6e4da8512a72dd52afc7a51beb35966ad9 Mon Sep 17 00:00:00 2001
>>> From: Jaegeuk Kim
>>> Date: Thu, 9 Aug 2018 17:53:34 -0700
>>> Subject: [PATCH] f2fs: fix performance issue observed with multi-thread
>>>  sequential read
>>>
>>> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
>>> to fix the drop in sequential read throughput.
>>>
>>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
>>> device: UFS
>>>
>>> Before -
>>> read throughput: 185 MB/s
>>> total read requests: 85177 (of these ~80000 are 4KB size requests).
>>> total write requests: 2546 (of these ~2208 requests are written in 512KB).
>>>
>>> After -
>>> read throughput: 758 MB/s
>>> total read requests: 2417 (of these ~2042 are 512KB reads).
>>> total write requests: 2701 (of these ~2034
Re: [f2fs-dev] [PATCH] Revert "f2fs: use printk_ratelimited for f2fs_msg"
On 2018/8/15 11:25, Jaegeuk Kim wrote:
> On 08/15, Chao Yu wrote:
>> On 2018/8/15 11:01, Jaegeuk Kim wrote:
>>> On 08/15, Chao Yu wrote:
On 2018/8/15 1:09, Jaegeuk Kim wrote:
> On 08/13, Chao Yu wrote:
>> Don't limit printing log, so that we will not miss any key messages.
>
> For example, this can avoid lots of messages during roll-forward recovery.

That's really important evidence that can prove the filesystem didn't lose the
last fsynced data when a file/data is missing after SPO; otherwise, the
filesystem will still be treated as a suspect... :(
>>>
>>> It's too bad that we need to rely on kernel messages for that purpose.
>>> Anyway,
>>
>> Yes, it's very hard to make sure the problem is not caused by a
>> filesystem bug when troubleshooting, unless there is an explicit kernel
>> message.
>>
>> Also, without the message, troubleshooting becomes harder.
>>
>>> if so, we may need to use printk() for essential messages only.
>>
>> Now, during our debugging, we have to change to printk to avoid
>> potentially missing messages...
>
> Let me understand what messages are really important for your debugging
> purpose.

The most important message is about filesystem/data consistency.

> I believe it'd not require all of them.

Yeah, IMO, error injection prints too many messages.

Thanks,

>> This reverts commit a36c106dffb616250117efb1cab271c19a8f94ff.
>>
>> Signed-off-by: Chao Yu
>> ---
>>  fs/f2fs/super.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>> index bf4c919fe418..3d89d94191e7 100644
>> --- a/fs/f2fs/super.c
>> +++ b/fs/f2fs/super.c
>> @@ -209,7 +209,7 @@ void f2fs_msg(struct super_block *sb, const char *level, const char *fmt, ...)
>>  	va_start(args, fmt);
>>  	vaf.fmt = fmt;
>>  	vaf.va = &args;
>> -	printk_ratelimited("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
>> +	printk("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
>>  	va_end(args);
>>  }
>>
>> --
>> 2.18.0.rc1
> -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate
On 2018/8/15 11:20, Jaegeuk Kim wrote:
> On 08/15, Chao Yu wrote:
>> On 2018/8/15 10:56, Jaegeuk Kim wrote:
>>> On 08/15, Chao Yu wrote:
On 2018/8/15 10:33, Jaegeuk Kim wrote:
> On 08/15, Chao Yu wrote:
>> On 2018/8/15 1:23, Jaegeuk Kim wrote:
>>> On 08/14, Chao Yu wrote:
On 2018/8/14 12:19, Jaegeuk Kim wrote:
> On 08/10, Chao Yu wrote:
>> Previously, discard speed was mostly fixed, and on a device with a high
>> usage rate we will speed up issuing discard, but it doesn't make sense
>> that in a non-full filesystem we still issue discard at a slow speed.
>
> Could you please elaborate the problem in more detail? The speed depends
> on how many candidates?

undiscard blocks are all 4k granularity.
a) utility: filesystem: 20% + undiscard blocks: 20% = flash storage: 40%
b) utility: filesystem: 40% + undiscard blocks: 25% = flash storage: 65%
c) utility: filesystem: 60% + undiscard blocks: 30% = flash storage: 100%

1. for case c), we need to speed up issuing discard based on the utilization
of "filesystem + undiscard" instead of just the utilization of the filesystem.

-	if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
-		dpolicy->granularity = 1;
-		dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME;
-	}

2. If free space in storage touches its threshold, performance will be very
sensitive. In low-end storage, with high usage of space, even if free space is
reduced by 1%, performance will decrease a lot.
>>>
>>> So, we may need to distinguish low-end vs. high-end storage. In the
>>> high-end case, it'd be better to avoid IO contention, while a low-end
>>> device wants to get as many discard commands as possible. So, how about
>>> adding an option for this as a tunable point?
>>
>> Agreed, how about adding a sysfs entry discard_tunning:
>> 1: enabled, use 4k granularity, self-adapted speed based on real device
>> free space.
>> 0: disabled, use dcc->discard_granularity, fixed speed.
>>
>> By default: enabled
>>
>> How do you think?
>
> I don't think this is proper with a sysfs entry, since we already know the

You mean by storage capacity? <= 32GB means low-end?
>>>
>>> Yes, that's the current condition to judge it. If there is any other
>>> method, it'd be
>>
>> That would be hard code...
>>
>> Still I have not got any other method to do the judgment except capacity.
>
> Maybe ufs or emmc?

Yeah, that may be a good way. I remember a very early version of UFS had very
poor discard performance; for that kind of storage, will it be inaccurate?
Let me check whether there is a flag to distinguish ufs/emmc.

Thanks,

>>> better to change it.
> device type when mounting the partition. We won't require to change the
> policy on the fly. And, I still don't get to change the default.

>> IMO, in the above cases, we'd better issue discard at high speed for c),
>> middle speed for b), and low speed for a).
>>
>> How do you think?
>>
>> Thanks,
>
> Thanks,
>
>> Anyway, it turns out that undiscarded blocks make FTL GC less efficient
>> and cause high lifetime overhead.
>>
>> Let's tune discard speed as below:
>>
>> a. adjust default issue interval:
>>               original    after
>> min_interval: 50ms        100ms
>> mid_interval: 500ms       1000ms
>> max_interval: 60000ms     10000ms
>>
>> b. if last time we stopped issuing discard due to IO interruption of the
>> user, let's reset all {min,mid,max}_interval to the default ones.
>>
>> c. tune {min,mid,max}_interval with the below calculation method:
>>
>> base_interval = default_interval / 10;
>> total_interval = default_interval - base_interval;
>> interval = base_interval + total_interval * (100 - dev_util) / 100;
>>
>> For example:
>> min_interval (:100ms)
>> dev_util (%)   interval (ms)
>> 0              100
>> 10             91
>> 20             82
>> 30             73
>> ...
>> 80             28
>> 90             19
>> 100            10
>>
>> Signed-off-by: Chao Yu
>> ---
>>  fs/f2fs/f2fs.h    | 11
>>  fs/f2fs/segment.c | 64
>>
Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate
On 2018/8/15 10:33, Jaegeuk Kim wrote:
> On 08/15, Chao Yu wrote:
>> On 2018/8/15 1:23, Jaegeuk Kim wrote:
>>> On 08/14, Chao Yu wrote:
On 2018/8/14 12:19, Jaegeuk Kim wrote:
> On 08/10, Chao Yu wrote:
>> Previously, discard speed was mostly fixed, and on a device with a high
>> usage rate we will speed up issuing discard, but it doesn't make sense
>> that in a non-full filesystem we still issue discard at a slow speed.
>
> Could you please elaborate the problem in more detail? The speed depends
> on how many candidates?

undiscard blocks are all 4k granularity.
a) utility: filesystem: 20% + undiscard blocks: 20% = flash storage: 40%
b) utility: filesystem: 40% + undiscard blocks: 25% = flash storage: 65%
c) utility: filesystem: 60% + undiscard blocks: 30% = flash storage: 100%

1. for case c), we need to speed up issuing discard based on the utilization
of "filesystem + undiscard" instead of just the utilization of the filesystem.

-	if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
-		dpolicy->granularity = 1;
-		dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME;
-	}

2. If free space in storage touches its threshold, performance will be very
sensitive. In low-end storage, with high usage of space, even if free space is
reduced by 1%, performance will decrease a lot.
>>>
>>> So, we may need to distinguish low-end vs. high-end storage. In the
>>> high-end case, it'd be better to avoid IO contention, while a low-end
>>> device wants to get as many discard commands as possible. So, how about
>>> adding an option for this as a tunable point?
>>
>> Agreed, how about adding a sysfs entry discard_tunning:
>> 1: enabled, use 4k granularity, self-adapted speed based on real device
>> free space.
>> 0: disabled, use dcc->discard_granularity, fixed speed.
>>
>> By default: enabled
>>
>> How do you think?
>
> I don't think this is proper with a sysfs entry, since we already know the

You mean by storage capacity? <= 32GB means low-end?

Thanks,

> device type when mounting the partition. We won't require to change the
> policy on the fly. And, I still don't get to change the default.

>> IMO, in the above cases, we'd better issue discard at high speed for c),
>> middle speed for b), and low speed for a).
>>
>> How do you think?
>>
>> Thanks,
>
> Thanks,
>
>> Anyway, it turns out that undiscarded blocks make FTL GC less efficient
>> and cause high lifetime overhead.
>>
>> Let's tune discard speed as below:
>>
>> a. adjust default issue interval:
>>               original    after
>> min_interval: 50ms        100ms
>> mid_interval: 500ms       1000ms
>> max_interval: 60000ms     10000ms
>>
>> b. if last time we stopped issuing discard due to IO interruption of the
>> user, let's reset all {min,mid,max}_interval to the default ones.
>>
>> c. tune {min,mid,max}_interval with the below calculation method:
>>
>> base_interval = default_interval / 10;
>> total_interval = default_interval - base_interval;
>> interval = base_interval + total_interval * (100 - dev_util) / 100;
>>
>> For example:
>> min_interval (:100ms)
>> dev_util (%)   interval (ms)
>> 0              100
>> 10             91
>> 20             82
>> 30             73
>> ...
>> 80             28
>> 90             19
>> 100            10
>>
>> Signed-off-by: Chao Yu
>> ---
>>  fs/f2fs/f2fs.h    | 11
>>  fs/f2fs/segment.c | 64 +--
>>  fs/f2fs/segment.h | 9 +++
>>  fs/f2fs/super.c   | 2 +-
>>  4 files changed, 67 insertions(+), 19 deletions(-)
>>
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index 273ffdaf4891..a1dd2e1c3cb9 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -185,10 +185,9 @@ enum {
>>
>>  #define MAX_DISCARD_BLOCKS(sbi)		BLKS_PER_SEC(sbi)
>>  #define DEF_MAX_DISCARD_REQUEST		8	/* issue 8 discards per round */
>> -#define DEF_MIN_DISCARD_ISSUE_TIME	50	/* 50 ms, if exists */
>> -#define DEF_MID_DISCARD_ISSUE_TIME	500	/* 500 ms, if device busy */
>> -#define DEF_MAX_DISCARD_ISSUE_TIME	60000	/* 60 s, if no candidates */
>> -#define DEF_DISCARD_URGENT_UTIL		80	/* do more discard over 80% */
>> +#define DEF_MIN_DISCARD_ISSUE_TIME	100	/* 100 ms, if exists */
>> +#define DEF_MID_DISCARD_ISSUE_TIME	1000	/* 1000 ms, if device busy */
>> +#define DEF_MAX_DISCARD_ISSUE_TIME	10000	/* 10000 ms, if no candidates */
>>  #define DEF_CP_INTERVAL			60	/* 60
Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate
On 08/15, Chao Yu wrote: > On 2018/8/15 1:23, Jaegeuk Kim wrote: > > On 08/14, Chao Yu wrote: > >> On 2018/8/14 12:19, Jaegeuk Kim wrote: > >>> On 08/10, Chao Yu wrote: > Previously, discard speed was fixed mostly, and in high usage rate > device, we will speed up issuing discard, but it doesn't make sense > that in a non-full filesystem, we still issue discard with slow speed. > >>> > >>> Could you please elaborate the problem in more detail? The speed depends > >>> on how many candidates? > >> > >> undiscard blocks are all 4k granularity. > >> a) utility: filesystem: 20% + undiscard blocks: 20% = flash storage: 40% > >> b) utility: filesystem: 40% + undiscard blocks: 25% = flash storage: 65% > >> c) utility: filesystem: 60% + undiscard blocks: 30% = flash storage: 100% > >> > >> > >> 1. for case c), we need to speed up issuing discard based on utilization of > >> "filesystem + undiscard" instead of just utilization of filesystem. > >> > >> - if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) { > >> - dpolicy->granularity = 1; > >> - dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME; > >> - } > >> > >> 2. If free space in storage touches therein threshold, performance will be > >> very > >> sensitive. In low-end storage, with high usage in space, even free space is > >> reduced by 1%, performance will decrease a lot. > > > > So, we may need to distinguish low-end vs. high-end storage. In high-end > > case, > > it'd be better to avoid IO contention, while low-end device wants to get > > more > > discard commands as much as possible. So, how about adding an option for > > this > > as a tunable point? > > Agreed, how about adding a sysfs entry discard_tunning: > 1: enabled, use 4k granularity, self-adapted speed based on real device free > space. > 0: disabled, use dcc->discard_granularity, fixed speed. > > By default: enabled > > How do you think? I don't think this is proper with a sysfs entry, since we already know the device type when mounting the partition. 
We won't require to change the policy on the fly. And, I still don't get to change the default. > > Thanks, > > > > >> > >> IMO, in above cases, we'd better to issue discard with high speed for c), > >> middle > >> speed for b), and low speed for a). > >> > >> How do you think? > >> > >> Thanks, > >> > >>> > >>> Thanks, > >>> > > Anyway, it comes out undiscarded block makes FTL GC be lower efficient > and causing high lifetime overhead. > > Let's tune discard speed as below: > > a. adjust default issue interval: > original after > min_interval: 50ms 100ms > mid_interval: 500ms 1000ms > max_interval: 60000ms 10000ms > > b. if last time we stop issuing discard due to IO interruption of user, > let's reset all {min,mid,max}_interval to default one. > > c. tune {min,mid,max}_interval with below calculation method: > > base_interval = default_interval / 10; > total_interval = default_interval - base_interval; > interval = base_interval + total_interval * (100 - dev_util) / 100; > > For example: > min_interval (:100ms) > dev_util (%) interval (ms) > 0 100 > 10 91 > 20 82 > 30 73 > ...
> 80 28 > 90 19 > 100 10 > > Signed-off-by: Chao Yu > --- > fs/f2fs/f2fs.h | 11 > fs/f2fs/segment.c | 64 +-- > fs/f2fs/segment.h | 9 +++ > fs/f2fs/super.c | 2 +- > 4 files changed, 67 insertions(+), 19 deletions(-) > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > index 273ffdaf4891..a1dd2e1c3cb9 100644 > --- a/fs/f2fs/f2fs.h > +++ b/fs/f2fs/f2fs.h > @@ -185,10 +185,9 @@ enum { > > #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi) > #define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */ > -#define DEF_MIN_DISCARD_ISSUE_TIME 50 /* 50 ms, if exists */ > -#define DEF_MID_DISCARD_ISSUE_TIME 500 /* 500 ms, if device busy */ > -#define DEF_MAX_DISCARD_ISSUE_TIME 60000 /* 60 s, if no candidates */ > -#define DEF_DISCARD_URGENT_UTIL 80 /* do more discard over 80% */ > +#define DEF_MIN_DISCARD_ISSUE_TIME 100 /* 100 ms, if exists */ > +#define DEF_MID_DISCARD_ISSUE_TIME 1000 /* 1000 ms, if device busy */ > +#define DEF_MAX_DISCARD_ISSUE_TIME 10000 /* 10 s, if no candidates */ > #define DEF_CP_INTERVAL 60 /* 60
Re: [f2fs-dev] [PATCH v3] f2fs: fix performance issue observed with multi-thread sequential read
On 08/15, Chao Yu wrote: > On 2018/8/15 1:28, Jaegeuk Kim wrote: > > On 08/14, Chao Yu wrote: > >> On 2018/8/14 12:04, Jaegeuk Kim wrote: > >>> On 08/14, Chao Yu wrote: > On 2018/8/14 4:11, Jaegeuk Kim wrote: > > On 08/13, Chao Yu wrote: > >> Hi Jaegeuk, > >> > >> On 2018/8/11 2:56, Jaegeuk Kim wrote: > >>> This reverts the commit - "b93f771 - f2fs: remove writepages lock" > >>> to fix the drop in sequential read throughput. > >>> > >>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L > >>> device: UFS > >>> > >>> Before - > >>> read throughput: 185 MB/s > >>> total read requests: 85177 (of these ~80000 are 4KB size requests). > >>> total write requests: 2546 (of these ~2208 requests are written in > >>> 512KB). > >>> > >>> After - > >>> read throughput: 758 MB/s > >>> total read requests: 2417 (of these ~2042 are 512KB reads). > >>> total write requests: 2701 (of these ~2034 requests are written in > >>> 512KB). > >> > >> IMO, it only impacts sequential read performance in a large file which > >> may be > >> fragmented during multi-thread writing. > >> > >> In android environment, mostly, the large file should be cold type, > >> such as apk, > >> mp3, rmvb, jpeg..., so I think we only need to serialize writepages() > >> for cold > >> data area writer. > >> > >> So how about adding a mount option to serialize writepage() for > >> different type > >> of log, e.g. in android, using serialize=4; by default, using > >> serialize=7 > >> HOT_DATA 1 > >> WARM_DATA 2 > >> COLD_DATA 4 > > > > Well, I don't think we need to give too many mount options for this > > fragmented > > case. How about doing this for the large files only like this? > > Thread A write 512 pages Thread B write 8 pages > > - writepages() > - mutex_lock(&sbi->writepages); > - writepage(); > ... > - writepages() > - writepage() > > - writepage(); > ... > - mutex_unlock(&sbi->writepages); > > Above case will also cause fragmentation since we didn't serialize all > concurrent IO with the lock.
> > Do we need to consider such case? > >>> > >>> We can simply allow 512 and 8 in the same segment, which would not be a big > >>> deal, > >>> when considering starvation of Thread B. > >> > >> Yeah, but in reality, there would be more threads competing in same log > >> header, > >> so I worry that the effect of defragmenting will not be so good as we expect, > >> anyway, for benchmark, it's enough. > > > > Basically, I think this is not a benchmark issue. :) It just reveals the > > issue > > much easily. Let me think about three cases: > > 1) WB_SYNC_NONE & WB_SYNC_NONE > > -> can simply use mutex_lock > > > > 2) WB_SYNC_ALL & WB_SYNC_NONE > > -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, while WB_SYNC_NONE > > will skip writing blocks > > > > 3) WB_SYNC_ALL & WB_SYNC_ALL > > -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, in order to avoid > > starvation. > > > > > > I've been testing the below. > > > > if (!S_ISDIR(inode->i_mode) && (wbc->sync_mode != WB_SYNC_ALL || > > get_dirty_pages(inode) <= SM_I(sbi)->min_seq_blocks)) { > > mutex_lock(&sbi->writepages); > > locked = true; > > Just cover buffered IO? how about covering Direct IO and atomic write as well? I'd expect direct IO does in-place-updates, and not sure whether we need to add another lock contention between buffered or direct IO. Atomic writes would be covered by ->min_seq_blocks. > > Thanks, > > > } > > > > Thanks, > > > >> > >> Thanks, > >> > >>> > > Thanks, > > > > >From 4fea0b6e4da8512a72dd52afc7a51beb35966ad9 Mon Sep 17 00:00:00 2001 > > From: Jaegeuk Kim > > Date: Thu, 9 Aug 2018 17:53:34 -0700 > > Subject: [PATCH] f2fs: fix performance issue observed with multi-thread > > sequential read > > > > This reverts the commit - "b93f771 - f2fs: remove writepages lock" > > to fix the drop in sequential read throughput.
> > > > Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L > > device: UFS > > > > Before - > > read throughput: 185 MB/s > > total read requests: 85177 (of these ~80000 are 4KB size requests). > > total write requests: 2546 (of these ~2208 requests are written in > > 512KB). > > > > After - > > read throughput: 758 MB/s > > total read requests: 2417 (of these ~2042 are 512KB reads). > > total write requests: 2701 (of these ~2034 requests are written in > > 512KB). > > > > Signed-off-by: Sahitya Tummala > > Signed-off-by:
Re: [f2fs-dev] [PATCH v3] f2fs: fix performance issue observed with multi-thread sequential read
On 2018/8/15 1:28, Jaegeuk Kim wrote: > On 08/14, Chao Yu wrote: >> On 2018/8/14 12:04, Jaegeuk Kim wrote: >>> On 08/14, Chao Yu wrote: On 2018/8/14 4:11, Jaegeuk Kim wrote: > On 08/13, Chao Yu wrote: >> Hi Jaegeuk, >> >> On 2018/8/11 2:56, Jaegeuk Kim wrote: >>> This reverts the commit - "b93f771 - f2fs: remove writepages lock" >>> to fix the drop in sequential read throughput. >>> >>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L >>> device: UFS >>> >>> Before - >>> read throughput: 185 MB/s >>> total read requests: 85177 (of these ~80000 are 4KB size requests). >>> total write requests: 2546 (of these ~2208 requests are written in >>> 512KB). >>> >>> After - >>> read throughput: 758 MB/s >>> total read requests: 2417 (of these ~2042 are 512KB reads). >>> total write requests: 2701 (of these ~2034 requests are written in >>> 512KB). >> >> IMO, it only impacts sequential read performance in a large file which >> may be >> fragmented during multi-thread writing. >> >> In android environment, mostly, the large file should be cold type, such >> as apk, >> mp3, rmvb, jpeg..., so I think we only need to serialize writepages() >> for cold >> data area writer. >> >> So how about adding a mount option to serialize writepage() for >> different type >> of log, e.g. in android, using serialize=4; by default, using serialize=7 >> HOT_DATA 1 >> WARM_DATA 2 >> COLD_DATA 4 > > Well, I don't think we need to give too many mount options for this > fragmented > case. How about doing this for the large files only like this? Thread A write 512 pages Thread B write 8 pages - writepages() - mutex_lock(&sbi->writepages); - writepage(); ... - writepages() - writepage() - writepage(); ... - mutex_unlock(&sbi->writepages); Above case will also cause fragmentation since we didn't serialize all concurrent IO with the lock. Do we need to consider such case?
>>> >>> We can simply allow 512 and 8 in the same segment, which would not be a big >>> deal, >>> when considering starvation of Thread B. >> >> Yeah, but in reality, there would be more threads competing in same log >> header, >> so I worry that the effect of defragmenting will not be so good as we expect, >> anyway, for benchmark, it's enough. > > Basically, I think this is not a benchmark issue. :) It just reveals the issue > much easily. Let me think about three cases: > 1) WB_SYNC_NONE & WB_SYNC_NONE > -> can simply use mutex_lock > > 2) WB_SYNC_ALL & WB_SYNC_NONE > -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, while WB_SYNC_NONE > will skip writing blocks > > 3) WB_SYNC_ALL & WB_SYNC_ALL > -> can use mutex_lock on WB_SYNC_ALL having >512 blocks, in order to avoid > starvation. > > > I've been testing the below. > > if (!S_ISDIR(inode->i_mode) && (wbc->sync_mode != WB_SYNC_ALL || > get_dirty_pages(inode) <= SM_I(sbi)->min_seq_blocks)) { > mutex_lock(&sbi->writepages); > locked = true; Just cover buffered IO? how about covering Direct IO and atomic write as well? Thanks, > } > > Thanks, > >> >> Thanks, >> >>> Thanks, > > >From 4fea0b6e4da8512a72dd52afc7a51beb35966ad9 Mon Sep 17 00:00:00 2001 > From: Jaegeuk Kim > Date: Thu, 9 Aug 2018 17:53:34 -0700 > Subject: [PATCH] f2fs: fix performance issue observed with multi-thread > sequential read > > This reverts the commit - "b93f771 - f2fs: remove writepages lock" > to fix the drop in sequential read throughput. > > Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L > device: UFS > > Before - > read throughput: 185 MB/s > total read requests: 85177 (of these ~80000 are 4KB size requests). > total write requests: 2546 (of these ~2208 requests are written in 512KB). > > After - > read throughput: 758 MB/s > total read requests: 2417 (of these ~2042 are 512KB reads). > total write requests: 2701 (of these ~2034 requests are written in 512KB).
> > Signed-off-by: Sahitya Tummala > Signed-off-by: Jaegeuk Kim > --- > Documentation/ABI/testing/sysfs-fs-f2fs | 8 > fs/f2fs/data.c | 10 ++ > fs/f2fs/f2fs.h | 2 ++ > fs/f2fs/segment.c | 1 + > fs/f2fs/super.c | 1 + > fs/f2fs/sysfs.c | 2 ++ > 6 files changed, 24 insertions(+) > > diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs >
Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate
On 2018/8/15 1:23, Jaegeuk Kim wrote: > On 08/14, Chao Yu wrote: >> On 2018/8/14 12:19, Jaegeuk Kim wrote: >>> On 08/10, Chao Yu wrote: Previously, discard speed was fixed mostly, and in high usage rate device, we will speed up issuing discard, but it doesn't make sense that in a non-full filesystem, we still issue discard with slow speed. >>> >>> Could you please elaborate the problem in more detail? The speed depends >>> on how many candidates? >> >> undiscard blocks are all 4k granularity. >> a) utility: filesystem: 20% + undiscard blocks: 20% = flash storage: 40% >> b) utility: filesystem: 40% + undiscard blocks: 25% = flash storage: 65% >> c) utility: filesystem: 60% + undiscard blocks: 30% = flash storage: 100% >> >> >> 1. for case c), we need to speed up issuing discard based on utilization of >> "filesystem + undiscard" instead of just utilization of filesystem. >> >> -if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) { >> -dpolicy->granularity = 1; >> -dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME; >> -} >> >> 2. If free space in storage touches therein threshold, performance will be >> very >> sensitive. In low-end storage, with high usage in space, even free space is >> reduced by 1%, performance will decrease a lot. > > So, we may need to distinguish low-end vs. high-end storage. In high-end case, > it'd be better to avoid IO contention, while low-end device wants to get more > discard commands as much as possible. So, how about adding an option for this > as a tunable point? Agreed, how about adding a sysfs entry discard_tunning: 1: enabled, use 4k granularity, self-adapted speed based on real device free space. 0: disabled, use dcc->discard_granularity, fixed speed. By default: enabled How do you think? Thanks, > >> >> IMO, in above cases, we'd better to issue discard with high speed for c), >> middle >> speed for b), and low speed for a). >> >> How do you think? 
>> >> Thanks, >> >>> >>> Thanks, >>> Anyway, it comes out undiscarded block makes FTL GC be lower efficient and causing high lifetime overhead. Let's tune discard speed as below: a. adjust default issue interval: original after min_interval: 50ms 100ms mid_interval: 500ms 1000ms max_interval: 60000ms 10000ms b. if last time we stop issuing discard due to IO interruption of user, let's reset all {min,mid,max}_interval to default one. c. tune {min,mid,max}_interval with below calculation method: base_interval = default_interval / 10; total_interval = default_interval - base_interval; interval = base_interval + total_interval * (100 - dev_util) / 100; For example: min_interval (:100ms) dev_util (%) interval (ms) 0 100 10 91 20 82 30 73 ... 80 28 90 19 100 10 Signed-off-by: Chao Yu --- fs/f2fs/f2fs.h | 11 fs/f2fs/segment.c | 64 +-- fs/f2fs/segment.h | 9 +++ fs/f2fs/super.c | 2 +- 4 files changed, 67 insertions(+), 19 deletions(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 273ffdaf4891..a1dd2e1c3cb9 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -185,10 +185,9 @@ enum { #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi) #define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */ -#define DEF_MIN_DISCARD_ISSUE_TIME 50 /* 50 ms, if exists */ -#define DEF_MID_DISCARD_ISSUE_TIME 500 /* 500 ms, if device busy */ -#define DEF_MAX_DISCARD_ISSUE_TIME 60000 /* 60 s, if no candidates */ -#define DEF_DISCARD_URGENT_UTIL 80 /* do more discard over 80% */ +#define DEF_MIN_DISCARD_ISSUE_TIME 100 /* 100 ms, if exists */ +#define DEF_MID_DISCARD_ISSUE_TIME 1000 /* 1000 ms, if device busy */ +#define DEF_MAX_DISCARD_ISSUE_TIME 10000 /* 10 s, if no candidates */ #define DEF_CP_INTERVAL 60 /* 60 secs */ #define DEF_IDLE_INTERVAL 5 /* 5 secs */ @@ -248,7 +247,8 @@ struct discard_entry { }; /* default discard granularity of inner discard thread, unit: block count */ -#define DEFAULT_DISCARD_GRANULARITY 1 +#define MID_DISCARD_GRANULARITY 16 +#define MIN_DISCARD_GRANULARITY 1 /* max discard pend list
number */ #define MAX_PLIST_NUM 512 @@ -330,6 +330,7 @@ struct discard_cmd_control { atomic_t discard_cmd_cnt; /* # of cached cmd
Re: [f2fs-dev] [PATCH] Revert "f2fs: use printk_ratelimited for f2fs_msg"
On 2018/8/15 1:09, Jaegeuk Kim wrote: > On 08/13, Chao Yu wrote: >> Don't limit printing log, so that we will not miss any key messages. > > For example, this can avoid lots of messages during roll-forward recovery. That's really important evidence that can prove the filesystem didn't lose the last fsynced data when file/data is missing after SPO; otherwise, the filesystem will still be treated as suspect... :( Thanks, > >> >> This reverts commit a36c106dffb616250117efb1cab271c19a8f94ff. >> >> Signed-off-by: Chao Yu >> --- >> fs/f2fs/super.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c >> index bf4c919fe418..3d89d94191e7 100644 >> --- a/fs/f2fs/super.c >> +++ b/fs/f2fs/super.c >> @@ -209,7 +209,7 @@ void f2fs_msg(struct super_block *sb, const char *level, >> const char *fmt, ...) >> va_start(args, fmt); >> vaf.fmt = fmt; >> vaf.va = &args; >> -printk_ratelimited("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf); >> +printk("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf); >> va_end(args); >> } >> >> -- >> 2.18.0.rc1 > > . > -- Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot ___ Linux-f2fs-devel mailing list Linux-f2fs-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate
On 08/14, Chao Yu wrote: > On 2018/8/14 12:19, Jaegeuk Kim wrote: > > On 08/10, Chao Yu wrote: > >> Previously, discard speed was fixed mostly, and in high usage rate > >> device, we will speed up issuing discard, but it doesn't make sense > >> that in a non-full filesystem, we still issue discard with slow speed. > > > > Could you please elaborate the problem in more detail? The speed depends > > on how many candidates? > > undiscard blocks are all 4k granularity. > a) utility: filesystem: 20% + undiscard blocks: 20% = flash storage: 40% > b) utility: filesystem: 40% + undiscard blocks: 25% = flash storage: 65% > c) utility: filesystem: 60% + undiscard blocks: 30% = flash storage: 100% > > > 1. for case c), we need to speed up issuing discard based on utilization of > "filesystem + undiscard" instead of just utilization of filesystem. > > - if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) { > - dpolicy->granularity = 1; > - dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME; > - } > > 2. If free space in storage touches therein threshold, performance will be > very > sensitive. In low-end storage, with high usage in space, even free space is > reduced by 1%, performance will decrease a lot. So, we may need to distinguish low-end vs. high-end storage. In high-end case, it'd be better to avoid IO contention, while low-end device wants to get more discard commands as much as possible. So, how about adding an option for this as a tunable point? > > IMO, in above cases, we'd better to issue discard with high speed for c), > middle > speed for b), and low speed for a). > > How do you think? > > Thanks, > > > > Thanks, > > > >> > >> Anyway, it comes out undiscarded block makes FTL GC be lower efficient > >> and causing high lifetime overhead. > >> > >> Let's tune discard speed as below: > >> > >> a. adjust default issue interval: > >> original after > >> min_interval: 50ms 100ms > >> mid_interval: 500ms 1000ms > >> max_interval: 60000ms 10000ms > >> > >> b.
if last time we stop issuing discard due to IO interruption of user, > >> let's reset all {min,mid,max}_interval to default one. > >> > >> c. tune {min,mid,max}_interval with below calculation method: > >> > >> base_interval = default_interval / 10; > >> total_interval = default_interval - base_interval; > >> interval = base_interval + total_interval * (100 - dev_util) / 100; > >> > >> For example: > >> min_interval (:100ms) > >> dev_util (%) interval (ms) > >> 0 100 > >> 10 91 > >> 20 82 > >> 30 73 > >> ... > >> 80 28 > >> 90 19 > >> 100 10 > >> > >> Signed-off-by: Chao Yu > >> --- > >> fs/f2fs/f2fs.h | 11 > >> fs/f2fs/segment.c | 64 +-- > >> fs/f2fs/segment.h | 9 +++ > >> fs/f2fs/super.c | 2 +- > >> 4 files changed, 67 insertions(+), 19 deletions(-) > >> > >> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h > >> index 273ffdaf4891..a1dd2e1c3cb9 100644 > >> --- a/fs/f2fs/f2fs.h > >> +++ b/fs/f2fs/f2fs.h > >> @@ -185,10 +185,9 @@ enum { > >> > >> #define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi) > >> #define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per > >> round */ > >> -#define DEF_MIN_DISCARD_ISSUE_TIME 50 /* 50 ms, if exists */ > >> -#define DEF_MID_DISCARD_ISSUE_TIME 500 /* 500 ms, if device > >> busy */ > >> -#define DEF_MAX_DISCARD_ISSUE_TIME 60000 /* 60 s, if no > >> candidates */ > >> -#define DEF_DISCARD_URGENT_UTIL 80 /* do more discard over > >> 80% */ > >> +#define DEF_MIN_DISCARD_ISSUE_TIME 100 /* 100 ms, if exists */ > >> +#define DEF_MID_DISCARD_ISSUE_TIME 1000 /* 1000 ms, if device > >> busy */ > >> +#define DEF_MAX_DISCARD_ISSUE_TIME 10000 /* 10 s, if no > >> candidates */ > >> #define DEF_CP_INTERVAL 60 /* 60 secs */ > >> #define DEF_IDLE_INTERVAL 5 /* 5 secs */ > >> > >> @@ -248,7 +247,8 @@ struct discard_entry { > >> }; > >> > >> /* default discard granularity of inner discard thread, unit: block count > >> */ > >> -#define DEFAULT_DISCARD_GRANULARITY 1 > >> +#define MID_DISCARD_GRANULARITY 16 > >> +#define MIN_DISCARD_GRANULARITY 1 > >> > >> /* max discard
pend list number */ > >> #define MAX_PLIST_NUM 512 > >> @@ -330,6 +330,7 @@ struct discard_cmd_control { > >>atomic_t discard_cmd_cnt; /* # of cached cmd count */ > >>struct rb_root root;/* root of discard rb-tree */ > >>bool rbtree_check; /* config for consistence check > >> */ > >> + bool io_interrupted;/* last state of io interrupted > >> */ > >> }; > >> > >> /* for the list of fsync
Re: [f2fs-dev] [PATCH] Revert "f2fs: use printk_ratelimited for f2fs_msg"
On 08/13, Chao Yu wrote: > Don't limit printing log, so that we will not miss any key messages. For example, this can avoid lots of messages during roll-forward recovery. > > This reverts commit a36c106dffb616250117efb1cab271c19a8f94ff. > > Signed-off-by: Chao Yu > --- > fs/f2fs/super.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c > index bf4c919fe418..3d89d94191e7 100644 > --- a/fs/f2fs/super.c > +++ b/fs/f2fs/super.c > @@ -209,7 +209,7 @@ void f2fs_msg(struct super_block *sb, const char *level, > const char *fmt, ...) > va_start(args, fmt); > vaf.fmt = fmt; > vaf.va = &args; > - printk_ratelimited("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf); > + printk("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf); > va_end(args); > } > > -- > 2.18.0.rc1
[f2fs-dev] [PATCH] f2fs: readahead encrypted block during GC
From: Chao Yu During GC, for each encrypted block, we will read block synchronously into meta page, and then submit it into current cold data log area. So this block read model with 4k granularity can make poor performance, like migrating non-encrypted block, let's readahead encrypted block as well to improve migration performance. To implement this, we choose meta page that its index is old block address of the encrypted block, and readahead ciphertext into this page, later, if readaheaded page is still updated, we will load its data into target meta page, and submit the write IO. Note that for OPU, truncation, deletion, we need to invalid meta page after we invalid old block address, to make sure we won't load invalid data from target meta page during encrypted block migration. for ((i = 0; i < 1000; i++)) do { xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync"; } done for ((i = 0; i < 1000; i+=2)) do { rm /mnt/f2fs/dir/$i; } done ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0); Before: gc-6549 [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc] gc-6549 [001] d..1 214682.212802: block_unplug: [gc] 1 gc-6549 [001] 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc] gc-6549 [001] 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc] gc-6549 [001] 214682.213902: block_plug: [gc] gc-6549 [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc] gc-6549 [001] d..1 214682.213908: block_unplug: [gc] 1 gc-6549 [001] 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc] gc-6549 [001] 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc] gc-6549 [001] 214682.226414: block_plug: [gc] gc-6549 [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc] gc-6549 [001] d..1 214682.226420: block_unplug: [gc] 1 gc-6549 [001] 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc] gc-6549 [001] 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc] gc-6549 [001] 214682.226911: block_plug: [gc] gc-6549 [001] d..1 
214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc] gc-6549 [001] d..1 214682.226916: block_unplug: [gc] 1 After: gc-5678 [003] 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc] gc-5678 [003] 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc] gc-5678 [003] 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc] gc-5678 [003] 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc] gc-5678 [003] 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc] gc-5678 [003] 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc] gc-5678 [003] 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc] gc-5678 [003] 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc] gc-5678 [003] 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc] gc-5678 [003] 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc] gc-5678 [003] 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc] gc-5678 [003] 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc] gc-5678 [003] 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc] gc-5678 [003] 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc] gc-5678 [003] 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc] gc-5678 [003] 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc] gc-5678 [003] 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc] gc-5678 [003] 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc] gc-5678 [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc] gc-5678 [003] d..1 214327.026023: block_unplug: [gc] 1 gc-5678 [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc] gc-5678 [003] 214327.026046: block_plug: [gc] Signed-off-by: Chao Yu --- fs/f2fs/data.c| 35 ++- fs/f2fs/gc.c | 111 +- fs/f2fs/segment.c | 10 - 3 files changed, 134 insertions(+), 22 deletions(-) diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 339b26838f9b..fbed8d59d10a 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ 
-877,6 +877,7 @@ static int __allocate_data_block(struct dnode_of_data *dn, int seg_type) struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
[f2fs-dev] [PATCH 5/5] orangefs: cache NULL when both default_acl and acl are NULL
default_acl and acl of a newly created inode will be initiated as ACL_NOT_CACHED in the vfs function inode_init_always() and later will be updated by calling xxx_init_acl() in specific filesystems. However, when default_acl and acl are NULL they keep the value of ACL_NOT_CACHED; this patch tries to cache NULL for acl/default_acl in this case. Signed-off-by: Chengguang Xu --- fs/orangefs/acl.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/orangefs/acl.c b/fs/orangefs/acl.c index 10587413b20e..e3b043a263bc 100644 --- a/fs/orangefs/acl.c +++ b/fs/orangefs/acl.c @@ -175,6 +175,9 @@ int orangefs_init_acl(struct inode *inode, struct inode *dir) posix_acl_release(acl); } + if (!default_acl && !acl) + cache_no_acl(inode); + /* If mode of the inode was changed, then do a forcible ->setattr */ if (mode != inode->i_mode) { memset(&iattr, 0, sizeof iattr); -- 2.17.1
[f2fs-dev] [PATCH 1/5] ext2: cache NULL when both default_acl and acl are NULL
default_acl and acl of a newly created inode will be initiated as ACL_NOT_CACHED in the vfs function inode_init_always() and later will be updated by calling xxx_init_acl() in specific filesystems. However, when default_acl and acl are NULL they keep the value of ACL_NOT_CACHED; this patch tries to cache NULL for acl/default_acl in this case. Signed-off-by: Chengguang Xu --- fs/ext2/acl.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/ext2/acl.c b/fs/ext2/acl.c index 224c04abb2e5..74411e8ea507 100644 --- a/fs/ext2/acl.c +++ b/fs/ext2/acl.c @@ -262,5 +262,8 @@ ext2_init_acl(struct inode *inode, struct inode *dir) error = __ext2_set_acl(inode, acl, ACL_TYPE_ACCESS); posix_acl_release(acl); } + if (!default_acl && !acl) + cache_no_acl(inode); + return error; } -- 2.17.1
[f2fs-dev] [PATCH 4/5] jfs: cache NULL when both default_acl and acl are NULL
default_acl and acl of a newly created inode will be initiated as ACL_NOT_CACHED in the vfs function inode_init_always() and later will be updated by calling xxx_init_acl() in specific filesystems. However, when default_acl and acl are NULL they keep the value of ACL_NOT_CACHED; this patch tries to cache NULL for acl/default_acl in this case. Signed-off-by: Chengguang Xu --- fs/jfs/acl.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/jfs/acl.c b/fs/jfs/acl.c index 2e71b6e7e646..15b9a9b74d72 100644 --- a/fs/jfs/acl.c +++ b/fs/jfs/acl.c @@ -154,6 +154,9 @@ int jfs_init_acl(tid_t tid, struct inode *inode, struct inode *dir) posix_acl_release(acl); } + if (!default_acl && !acl) + cache_no_acl(inode); + JFS_IP(inode)->mode2 = (JFS_IP(inode)->mode2 & 0xffff0000) | inode->i_mode; -- 2.17.1
[f2fs-dev] [PATCH 2/5] ext4: cache NULL when both default_acl and acl are NULL
default_acl and acl of a newly created inode will be initiated as ACL_NOT_CACHED in the vfs function inode_init_always() and later will be updated by calling xxx_init_acl() in specific filesystems. However, when default_acl and acl are NULL they keep the value of ACL_NOT_CACHED; this patch tries to cache NULL for acl/default_acl in this case. Signed-off-by: Chengguang Xu --- fs/ext4/acl.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c index fb50f9aa6ead..69a01d28356d 100644 --- a/fs/ext4/acl.c +++ b/fs/ext4/acl.c @@ -291,5 +291,8 @@ ext4_init_acl(handle_t *handle, struct inode *inode, struct inode *dir) acl, XATTR_CREATE); posix_acl_release(acl); } + if (!default_acl && !acl) + cache_no_acl(inode); + return error; } -- 2.17.1
[f2fs-dev] [PATCH 3/5] f2fs: cache NULL when both default_acl and acl are NULL
default_acl and acl of a newly created inode are initialized as ACL_NOT_CACHED in the VFS function inode_init_always() and later updated by calling xxx_init_acl() in the specific filesystem. However, when default_acl and acl are both NULL, they keep the value ACL_NOT_CACHED; this patch caches NULL for acl/default_acl in that case.

Signed-off-by: Chengguang Xu
---
 fs/f2fs/acl.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
index 111824199a88..97ed555316be 100644
--- a/fs/f2fs/acl.c
+++ b/fs/f2fs/acl.c
@@ -401,6 +401,8 @@ int f2fs_init_acl(struct inode *inode, struct inode *dir, struct page *ipage,
 		ipage);
 		posix_acl_release(acl);
 	}
+	if (!default_acl && !acl)
+		cache_no_acl(inode);
 	return error;
 }
--
2.17.1
Re: [f2fs-dev] [PATCH] Revert "f2fs: use printk_ratelimited for f2fs_msg"
Hi Joe, thanks for pointing this out; I don't see any cases that need to be rate-limited now.

Thanks,

On 2018/8/13 17:50, Joe Perches wrote:
> On Mon, 2018-08-13 at 14:28 +0800, Chao Yu wrote:
>> Don't limit printing log, so that we will not miss any key messages.
>>
>> This reverts commit a36c106dffb616250117efb1cab271c19a8f94ff.
> []
>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> []
>> @@ -209,7 +209,7 @@ void f2fs_msg(struct super_block *sb, const char *level, const char *fmt, ...)
>> 	va_start(args, fmt);
>> 	vaf.fmt = fmt;
>> 	vaf.va = &args;
>> -	printk_ratelimited("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
>> +	printk("%sF2FS-fs (%s): %pV\n", level, sb->s_id, &vaf);
>> 	va_end(args);
>> }
>
> If there was value in ratelimiting these messages at all,
> perhaps there is value in using a specific ratelimit_state
> other than
>
> 	static DEFINE_RATELIMIT_STATE(_rs, \
> 		DEFAULT_RATELIMIT_INTERVAL, \
> 		DEFAULT_RATELIMIT_BURST); \
Re: [f2fs-dev] [PATCH 2/2] f2fs: tune discard speed with storage usage rate
On 2018/8/14 12:19, Jaegeuk Kim wrote:
> On 08/10, Chao Yu wrote:
>> Previously, discard speed was fixed mostly, and in high usage rate
>> device, we will speed up issuing discard, but it doesn't make sense
>> that in a non-full filesystem, we still issue discard with slow speed.
>
> Could you please elaborate the problem in more detail? The speed depends
> on how many candidates?

Undiscarded blocks are all of 4k granularity.

a) utility: filesystem: 20% + undiscard blocks: 20% = flash storage: 40%
b) utility: filesystem: 40% + undiscard blocks: 25% = flash storage: 65%
c) utility: filesystem: 60% + undiscard blocks: 30% = flash storage: 100%

1. For case c), we need to speed up issuing discard based on the utilization of "filesystem + undiscard" instead of just the utilization of the filesystem.

- if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
-	dpolicy->granularity = 1;
-	dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME;
- }

2. If free space in storage touches a certain threshold, performance becomes very sensitive. In low-end storage with high space usage, even if free space is reduced by only 1%, performance will decrease a lot.

IMO, in the above cases, we'd better issue discard at high speed for c), middle speed for b), and low speed for a). What do you think?

Thanks,

>
> Thanks,
>
>>
>> Anyway, it turns out undiscarded blocks make FTL GC less efficient and
>> cause high lifetime overhead.
>>
>> Let's tune discard speed as below:
>>
>> a. adjust default issue interval:
>> 			original	after
>> min_interval:	50ms		100ms
>> mid_interval:	500ms		1000ms
>> max_interval:	60000ms		10000ms
>>
>> b. if last time we stopped issuing discard due to IO interruption by the
>> user, let's reset all {min,mid,max}_interval to the default one.
>>
>> c. 
tune {min,mid,max}_interval with below calculation method:
>>
>> base_interval = default_interval / 10;
>> total_interval = default_interval - base_interval;
>> interval = base_interval + total_interval * (100 - dev_util) / 100;
>>
>> For example:
>> min_interval (default: 100ms)
>> dev_util (%)	interval (ms)
>> 0		100
>> 10		91
>> 20		82
>> 30		73
>> ...
>> 80		28
>> 90		19
>> 100		10
>>
>> Signed-off-by: Chao Yu
>> ---
>> fs/f2fs/f2fs.h    | 11
>> fs/f2fs/segment.c | 64 +--
>> fs/f2fs/segment.h | 9 +++
>> fs/f2fs/super.c   | 2 +-
>> 4 files changed, 67 insertions(+), 19 deletions(-)
>>
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index 273ffdaf4891..a1dd2e1c3cb9 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -185,10 +185,9 @@ enum {
>>
>> #define MAX_DISCARD_BLOCKS(sbi)		BLKS_PER_SEC(sbi)
>> #define DEF_MAX_DISCARD_REQUEST		8	/* issue 8 discards per round */
>> -#define DEF_MIN_DISCARD_ISSUE_TIME	50	/* 50 ms, if exists */
>> -#define DEF_MID_DISCARD_ISSUE_TIME	500	/* 500 ms, if device busy */
>> -#define DEF_MAX_DISCARD_ISSUE_TIME	60000	/* 60 s, if no candidates */
>> -#define DEF_DISCARD_URGENT_UTIL		80	/* do more discard over 80% */
>> +#define DEF_MIN_DISCARD_ISSUE_TIME	100	/* 100 ms, if exists */
>> +#define DEF_MID_DISCARD_ISSUE_TIME	1000	/* 1000 ms, if device busy */
>> +#define DEF_MAX_DISCARD_ISSUE_TIME	10000	/* 10000 ms, if no candidates */
>> #define DEF_CP_INTERVAL			60	/* 60 secs */
>> #define DEF_IDLE_INTERVAL		5	/* 5 secs */
>>
>> @@ -248,7 +247,8 @@ struct discard_entry {
>> };
>>
>> /* default discard granularity of inner discard thread, unit: block count */
>> -#define DEFAULT_DISCARD_GRANULARITY	1
>> +#define MID_DISCARD_GRANULARITY		16
>> +#define MIN_DISCARD_GRANULARITY		1
>>
>> /* max discard pend list number */
>> #define MAX_PLIST_NUM		512
>>
>> @@ -330,6 +330,7 @@ struct discard_cmd_control {
>> 	atomic_t discard_cmd_cnt;	/* # of cached cmd count */
>> 	struct rb_root root;		/* root of discard rb-tree */
>> 	bool rbtree_check;		/* config for consistence check */
>> +	bool io_interrupted;	/* last state of io interrupted */
>> };
>>
>> /* for the list of fsync inodes, used only during recovery */
>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> index 8b52e8dfb12f..9564aaf1f27b 100644
>> --- a/fs/f2fs/segment.c
>> +++ b/fs/f2fs/segment.c
>> @@ -968,6 +968,44 @@ static void __check_sit_bitmap(struct f2fs_sb_info *sbi,
>> #endif
>> }
>>
>> +static void __adjust_discard_speed(unsigned int *interval,
>> +				unsigned int def_interval, int dev_util)
>> +{
>> +	unsigned int base_interval, total_interval;
>> +
>> +	base_interval = def_interval / 10;
>> +	total_interval =
Re: [f2fs-dev] [PATCH 1/2] f2fs: set 4KB discard granularity by default
On 2018/8/14 12:13, Jaegeuk Kim wrote:
> On 08/10, Chao Yu wrote:
>> Small-granularity (size < 64K) fragmentation will cause f2fs to suspend
>> all pending discards, resulting in a performance regression, so let's set
>> 4KB discard granularity by default.
>>
>> So that without fstrim, we also have the ability to avoid any performance
>> regression caused by non-aligned mapping between fs and flash device.
>
> This is why we added a sysfs entry. Why do we need to change the default
> value every time?

Of course, I didn't forget it. But, mostly, our users don't know our filesystem very well, including each configuration's meaning, mechanism, or usage scenario; most of the time, they will test f2fs with all default options, and then draw their conclusion.

Currently, with the default 64k granularity, if we simulate a fragmentation scenario in the filesystem, for example by a) writing 4k files and b) deleting the even-indexed ones, then all 4k discards won't be issued, resulting in exhaustion of the flash storage's free space, and performance degradation.

So I think we'd better consider and set default configuration values more carefully to avoid obvious weaknesses. Thoughts?

Thanks,

>
>> Signed-off-by: Chao Yu
>> ---
>> fs/f2fs/f2fs.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index 58431b9bfd8f..273ffdaf4891 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -248,7 +248,7 @@ struct discard_entry {
>> };
>>
>> /* default discard granularity of inner discard thread, unit: block count */
>> -#define DEFAULT_DISCARD_GRANULARITY	16
>> +#define DEFAULT_DISCARD_GRANULARITY	1
>>
>> /* max discard pend list number */
>> #define MAX_PLIST_NUM		512
>> --
>> 2.18.0.rc1
>
> .
>
[f2fs-dev] [RFC PATCH RESEND 1/4] f2fs: support superblock checksum
Now we support crc32 checksum for superblock. Reviewed-by: Chao Yu Signed-off-by: Junling Zheng --- fs/f2fs/f2fs.h | 2 ++ fs/f2fs/super.c | 29 + fs/f2fs/sysfs.c | 7 +++ include/linux/f2fs_fs.h | 3 ++- 4 files changed, 40 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 4525f4f82af0..d50d6efda96b 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -147,6 +147,7 @@ struct f2fs_mount_info { #define F2FS_FEATURE_INODE_CRTIME 0x0100 #define F2FS_FEATURE_LOST_FOUND0x0200 #define F2FS_FEATURE_VERITY0x0400 /* reserved */ +#define F2FS_FEATURE_SB_CHKSUM 0x0800 #define F2FS_HAS_FEATURE(sb, mask) \ ((F2FS_SB(sb)->raw_super->feature & cpu_to_le32(mask)) != 0) @@ -3376,6 +3377,7 @@ F2FS_FEATURE_FUNCS(flexible_inline_xattr, FLEXIBLE_INLINE_XATTR); F2FS_FEATURE_FUNCS(quota_ino, QUOTA_INO); F2FS_FEATURE_FUNCS(inode_crtime, INODE_CRTIME); F2FS_FEATURE_FUNCS(lost_found, LOST_FOUND); +F2FS_FEATURE_FUNCS(sb_chksum, SB_CHKSUM); #ifdef CONFIG_BLK_DEV_ZONED static inline int get_blkz_type(struct f2fs_sb_info *sbi, diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index bd57be470e23..3ffc336caea8 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -2149,6 +2149,26 @@ static int sanity_check_raw_super(struct f2fs_sb_info *sbi, (bh->b_data + F2FS_SUPER_OFFSET); struct super_block *sb = sbi->sb; unsigned int blocksize; + size_t crc_offset = 0; + __u32 crc = 0; + + /* Check checksum_offset and crc in superblock */ + if (le32_to_cpu(raw_super->feature) & F2FS_FEATURE_SB_CHKSUM) { + crc_offset = le32_to_cpu(raw_super->checksum_offset); + if (crc_offset != + offsetof(struct f2fs_super_block, crc)) { + f2fs_msg(sb, KERN_INFO, + "Invalid SB checksum offset: %zu", + crc_offset); + return 1; + } + crc = le32_to_cpu(raw_super->crc); + if (!f2fs_crc_valid(sbi, crc, raw_super, crc_offset)) { + f2fs_msg(sb, KERN_INFO, + "Invalid SB checksum value: %u", crc); + return 1; + } + } if (F2FS_SUPER_MAGIC != le32_to_cpu(raw_super->magic)) { f2fs_msg(sb, KERN_INFO, @@ -2568,6 
+2588,7 @@ static int read_raw_super_block(struct f2fs_sb_info *sbi, int f2fs_commit_super(struct f2fs_sb_info *sbi, bool recover) { struct buffer_head *bh; + __u32 crc = 0; int err; if ((recover && f2fs_readonly(sbi->sb)) || @@ -2576,6 +2597,13 @@ int f2fs_commit_super(struct f2fs_sb_info *sbi, bool recover) return -EROFS; } + /* we should update superblock crc here */ + if (!recover && f2fs_sb_has_sb_chksum(sbi->sb)) { + crc = f2fs_crc32(sbi, F2FS_RAW_SUPER(sbi), + offsetof(struct f2fs_super_block, crc)); + F2FS_RAW_SUPER(sbi)->crc = cpu_to_le32(crc); + } + /* write back-up superblock first */ bh = sb_bread(sbi->sb, sbi->valid_super_block ? 0 : 1); if (!bh) diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c index cd2e030e47b8..c86d91be6c48 100644 --- a/fs/f2fs/sysfs.c +++ b/fs/f2fs/sysfs.c @@ -120,6 +120,9 @@ static ssize_t features_show(struct f2fs_attr *a, if (f2fs_sb_has_lost_found(sb)) len += snprintf(buf + len, PAGE_SIZE - len, "%s%s", len ? ", " : "", "lost_found"); + if (f2fs_sb_has_sb_chksum(sb)) + len += snprintf(buf + len, PAGE_SIZE - len, "%s%s", + len ? 
", " : "", "sb_checksum"); len += snprintf(buf + len, PAGE_SIZE - len, "\n"); return len; } @@ -337,6 +340,7 @@ enum feat_id { FEAT_QUOTA_INO, FEAT_INODE_CRTIME, FEAT_LOST_FOUND, + FEAT_SB_CHECKSUM, }; static ssize_t f2fs_feature_show(struct f2fs_attr *a, @@ -353,6 +357,7 @@ static ssize_t f2fs_feature_show(struct f2fs_attr *a, case FEAT_QUOTA_INO: case FEAT_INODE_CRTIME: case FEAT_LOST_FOUND: + case FEAT_SB_CHECKSUM: return snprintf(buf, PAGE_SIZE, "supported\n"); } return 0; @@ -433,6 +438,7 @@ F2FS_FEATURE_RO_ATTR(flexible_inline_xattr, FEAT_FLEXIBLE_INLINE_XATTR); F2FS_FEATURE_RO_ATTR(quota_ino, FEAT_QUOTA_INO); F2FS_FEATURE_RO_ATTR(inode_crtime, FEAT_INODE_CRTIME); F2FS_FEATURE_RO_ATTR(lost_found, FEAT_LOST_FOUND); +F2FS_FEATURE_RO_ATTR(sb_checksum, FEAT_SB_CHECKSUM); #define ATTR_LIST(name) (_attr_##name.attr) static struct attribute *f2fs_attrs[] = { @@ -489,6 +495,7 @@ static struct attribute *f2fs_feat_attrs[] = { ATTR_LIST(quota_ino),
[f2fs-dev] [RFC PATCH RESEND 2/4] f2fs-tools: rename CHECKSUM_OFFSET to CP_CHKSUM_OFFSET
This patch renamed CHECKSUM_OFFSET to CP_CHKSUM_OFFSET. Reviewed-by: Chao Yu Signed-off-by: Junling Zheng --- fsck/fsck.c| 4 ++-- fsck/mount.c | 4 ++-- fsck/resize.c | 8 include/f2fs_fs.h | 6 +++--- mkfs/f2fs_format.c | 14 +++--- 5 files changed, 18 insertions(+), 18 deletions(-) diff --git a/fsck/fsck.c b/fsck/fsck.c index d550403..f080d3c 100644 --- a/fsck/fsck.c +++ b/fsck/fsck.c @@ -2010,8 +2010,8 @@ static void fix_checkpoint(struct f2fs_sb_info *sbi) set_cp(valid_node_count, fsck->chk.valid_node_cnt); set_cp(valid_inode_count, fsck->chk.valid_inode_cnt); - crc = f2fs_cal_crc32(F2FS_SUPER_MAGIC, cp, CHECKSUM_OFFSET); - *((__le32 *)((unsigned char *)cp + CHECKSUM_OFFSET)) = cpu_to_le32(crc); + crc = f2fs_cal_crc32(F2FS_SUPER_MAGIC, cp, CP_CHKSUM_OFFSET); + *((__le32 *)((unsigned char *)cp + CP_CHKSUM_OFFSET)) = cpu_to_le32(crc); cp_blk_no = get_sb(cp_blkaddr); if (sbi->cur_cp == 2) diff --git a/fsck/mount.c b/fsck/mount.c index 8fb4d59..58ef3e6 100644 --- a/fsck/mount.c +++ b/fsck/mount.c @@ -2187,8 +2187,8 @@ void write_checkpoint(struct f2fs_sb_info *sbi) flags = update_nat_bits_flags(sb, cp, flags); set_cp(ckpt_flags, flags); - crc = f2fs_cal_crc32(F2FS_SUPER_MAGIC, cp, CHECKSUM_OFFSET); - *((__le32 *)((unsigned char *)cp + CHECKSUM_OFFSET)) = cpu_to_le32(crc); + crc = f2fs_cal_crc32(F2FS_SUPER_MAGIC, cp, CP_CHKSUM_OFFSET); + *((__le32 *)((unsigned char *)cp + CP_CHKSUM_OFFSET)) = cpu_to_le32(crc); cp_blk_no = get_sb(cp_blkaddr); if (sbi->cur_cp == 2) diff --git a/fsck/resize.c b/fsck/resize.c index fe8a61a..e9612b3 100644 --- a/fsck/resize.c +++ b/fsck/resize.c @@ -90,10 +90,10 @@ static int get_new_sb(struct f2fs_super_block *sb) * It requires more pages for cp. 
*/ if (max_sit_bitmap_size > MAX_SIT_BITMAP_SIZE_IN_CKPT) { - max_nat_bitmap_size = CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1; + max_nat_bitmap_size = CP_CHKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1; set_sb(cp_payload, F2FS_BLK_ALIGN(max_sit_bitmap_size)); } else { - max_nat_bitmap_size = CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1 + max_nat_bitmap_size = CP_CHKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1 - max_sit_bitmap_size; set_sb(cp_payload, 0); } @@ -520,8 +520,8 @@ static void rebuild_checkpoint(struct f2fs_sb_info *sbi, (unsigned char *)cp); new_cp->checkpoint_ver = cpu_to_le64(cp_ver + 1); - crc = f2fs_cal_crc32(F2FS_SUPER_MAGIC, new_cp, CHECKSUM_OFFSET); - *((__le32 *)((unsigned char *)new_cp + CHECKSUM_OFFSET)) = + crc = f2fs_cal_crc32(F2FS_SUPER_MAGIC, new_cp, CP_CHKSUM_OFFSET); + *((__le32 *)((unsigned char *)new_cp + CP_CHKSUM_OFFSET)) = cpu_to_le32(crc); /* Write a new checkpoint in the other set */ diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h index 53fa002..e279b9f 100644 --- a/include/f2fs_fs.h +++ b/include/f2fs_fs.h @@ -278,7 +278,7 @@ static inline uint64_t bswap_64(uint64_t val) #define PAGE_CACHE_SIZE4096 #define BITS_PER_BYTE 8 #define F2FS_SUPER_MAGIC 0xF2F52010 /* F2FS Magic Number */ -#define CHECKSUM_OFFSET4092 +#define CP_CHKSUM_OFFSET 4092 #define MAX_PATH_LEN 64 #define MAX_DEVICES8 @@ -682,9 +682,9 @@ struct f2fs_checkpoint { } __attribute__((packed)); #define MAX_SIT_BITMAP_SIZE_IN_CKPT\ - (CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1 - 64) + (CP_CHKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1 - 64) #define MAX_BITMAP_SIZE_IN_CKPT\ - (CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1) + (CP_CHKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1) /* * For orphan inode management diff --git a/mkfs/f2fs_format.c b/mkfs/f2fs_format.c index 4b88d93..621126c 100644 --- a/mkfs/f2fs_format.c +++ b/mkfs/f2fs_format.c @@ -342,12 +342,12 @@ static int f2fs_prepare_super_block(void) * It requires more 
pages for cp. */ if (max_sit_bitmap_size > MAX_SIT_BITMAP_SIZE_IN_CKPT) { - max_nat_bitmap_size = CHECKSUM_OFFSET - + max_nat_bitmap_size = CP_CHKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1; set_sb(cp_payload, F2FS_BLK_ALIGN(max_sit_bitmap_size)); } else { max_nat_bitmap_size = - CHECKSUM_OFFSET - sizeof(struct f2fs_checkpoint) + 1 +
[f2fs-dev] [RFC PATCH v2 3/4] f2fs-tools: unify the writeback of superblock
Introduce __write_superblock() to support updating specified one superblock or both, thus we can wrapper it in update_superblock() and f2fs_write_super_block to unify all places where sb needs to be updated. Signed-off-by: Junling Zheng --- v1 -> v2: - if dev_write_block failed, add some notes and free buf to avoid memory leak. fsck/fsck.h| 2 +- fsck/mount.c | 74 +++--- fsck/resize.c | 20 ++--- include/f2fs_fs.h | 35 ++ mkfs/f2fs_format.c | 19 +--- 5 files changed, 56 insertions(+), 94 deletions(-) diff --git a/fsck/fsck.h b/fsck/fsck.h index 6042e68..e3490e6 100644 --- a/fsck/fsck.h +++ b/fsck/fsck.h @@ -178,7 +178,7 @@ extern void move_curseg_info(struct f2fs_sb_info *, u64, int); extern void write_curseg_info(struct f2fs_sb_info *); extern int find_next_free_block(struct f2fs_sb_info *, u64 *, int, int); extern void write_checkpoint(struct f2fs_sb_info *); -extern void write_superblock(struct f2fs_super_block *); +extern void update_superblock(struct f2fs_super_block *, int); extern void update_data_blkaddr(struct f2fs_sb_info *, nid_t, u16, block_t); extern void update_nat_blkaddr(struct f2fs_sb_info *, nid_t, nid_t, block_t); diff --git a/fsck/mount.c b/fsck/mount.c index 58ef3e6..e7ceb8d 100644 --- a/fsck/mount.c +++ b/fsck/mount.c @@ -476,7 +476,7 @@ void print_sb_state(struct f2fs_super_block *sb) } static inline int sanity_check_area_boundary(struct f2fs_super_block *sb, - u64 offset) + enum SB_ADDR sb_addr) { u32 segment0_blkaddr = get_sb(segment0_blkaddr); u32 cp_blkaddr = get_sb(cp_blkaddr); @@ -542,14 +542,11 @@ static inline int sanity_check_area_boundary(struct f2fs_super_block *sb, segment_count_main << log_blocks_per_seg); return -1; } else if (main_end_blkaddr < seg_end_blkaddr) { - int err; - set_sb(segment_count, (main_end_blkaddr - segment0_blkaddr) >> log_blocks_per_seg); - err = dev_write(sb, offset, sizeof(struct f2fs_super_block)); - MSG(0, "Info: Fix alignment: %s, start(%u) end(%u) block(%u)\n", - err ? 
"failed": "done", + update_superblock(sb, 1 << sb_addr); + MSG(0, "Info: Fix alignment: start(%u) end(%u) block(%u)\n", main_blkaddr, segment0_blkaddr + (segment_count << log_blocks_per_seg), @@ -558,7 +555,7 @@ static inline int sanity_check_area_boundary(struct f2fs_super_block *sb, return 0; } -int sanity_check_raw_super(struct f2fs_super_block *sb, u64 offset) +int sanity_check_raw_super(struct f2fs_super_block *sb, enum SB_ADDR sb_addr) { unsigned int blocksize; @@ -600,30 +597,24 @@ int sanity_check_raw_super(struct f2fs_super_block *sb, u64 offset) if (get_sb(segment_count) > F2FS_MAX_SEGMENT) return -1; - if (sanity_check_area_boundary(sb, offset)) + if (sanity_check_area_boundary(sb, sb_addr)) return -1; return 0; } -int validate_super_block(struct f2fs_sb_info *sbi, int block) +int validate_super_block(struct f2fs_sb_info *sbi, enum SB_ADDR sb_addr) { - u64 offset; char buf[F2FS_BLKSIZE]; sbi->raw_super = malloc(sizeof(struct f2fs_super_block)); - if (block == 0) - offset = F2FS_SUPER_OFFSET; - else - offset = F2FS_BLKSIZE + F2FS_SUPER_OFFSET; - - if (dev_read_block(buf, block)) + if (dev_read_block(buf, sb_addr)) return -1; memcpy(sbi->raw_super, buf + F2FS_SUPER_OFFSET, sizeof(struct f2fs_super_block)); - if (!sanity_check_raw_super(sbi->raw_super, offset)) { + if (!sanity_check_raw_super(sbi->raw_super, sb_addr)) { /* get kernel version */ if (c.kd >= 0) { dev_read_version(c.version, 0, VERSION_LEN); @@ -642,13 +633,9 @@ int validate_super_block(struct f2fs_sb_info *sbi, int block) MSG(0, "Info: FSCK version\n from \"%s\"\nto \"%s\"\n", c.sb_version, c.version); if (memcmp(c.sb_version, c.version, VERSION_LEN)) { - int ret; - memcpy(sbi->raw_super->version, c.version, VERSION_LEN); - ret = dev_write(sbi->raw_super, offset, - sizeof(struct f2fs_super_block)); - ASSERT(ret >= 0); + update_superblock(sbi->raw_super, 1 << sb_addr); c.auto_fix = 0;
[f2fs-dev] [RFC PATCH RESEND 4/4] f2fs-tools: introduce sb checksum
This patch introduced crc for superblock. Reviewed-by: Chao Yu Signed-off-by: Junling Zheng --- fsck/mount.c | 23 +++ fsck/resize.c | 12 ++-- include/f2fs_fs.h | 16 +++- mkfs/f2fs_format.c | 3 +++ 4 files changed, 47 insertions(+), 7 deletions(-) diff --git a/fsck/mount.c b/fsck/mount.c index e7ceb8d..af9b219 100644 --- a/fsck/mount.c +++ b/fsck/mount.c @@ -340,6 +340,7 @@ void print_raw_sb_info(struct f2fs_super_block *sb) DISP_u32(sb, node_ino); DISP_u32(sb, meta_ino); DISP_u32(sb, cp_payload); + DISP_u32(sb, crc); DISP("%-.256s", sb, version); printf("\n"); } @@ -467,6 +468,9 @@ void print_sb_state(struct f2fs_super_block *sb) if (f & cpu_to_le32(F2FS_FEATURE_LOST_FOUND)) { MSG(0, "%s", " lost_found"); } + if (f & cpu_to_le32(F2FS_FEATURE_SB_CHKSUM)) { + MSG(0, "%s", " sb_checksum"); + } MSG(0, "\n"); MSG(0, "Info: superblock encrypt level = %d, salt = ", sb->encryption_level); @@ -555,10 +559,29 @@ static inline int sanity_check_area_boundary(struct f2fs_super_block *sb, return 0; } +static int verify_sb_chksum(struct f2fs_super_block *sb) +{ + if (SB_CHKSUM_OFFSET != get_sb(checksum_offset)) { + MSG(0, "\tInvalid SB CRC offset: %u\n", + get_sb(checksum_offset)); + return -1; + } + if (f2fs_crc_valid(get_sb(crc), sb, + get_sb(checksum_offset))) { + MSG(0, "\tInvalid SB CRC: 0x%x\n", get_sb(crc)); + return -1; + } + return 0; +} + int sanity_check_raw_super(struct f2fs_super_block *sb, enum SB_ADDR sb_addr) { unsigned int blocksize; + if ((get_sb(feature) & F2FS_FEATURE_SB_CHKSUM) && + verify_sb_chksum(sb)) + return -1; + if (F2FS_SUPER_MAGIC != get_sb(magic)) return -1; diff --git a/fsck/resize.c b/fsck/resize.c index 5161a1f..3462165 100644 --- a/fsck/resize.c +++ b/fsck/resize.c @@ -603,9 +603,6 @@ static int f2fs_resize_grow(struct f2fs_sb_info *sbi) } } - print_raw_sb_info(sb); - print_raw_sb_info(new_sb); - old_main_blkaddr = get_sb(main_blkaddr); new_main_blkaddr = get_newsb(main_blkaddr); offset = new_main_blkaddr - old_main_blkaddr; @@ -629,6 +626,9 @@ 
static int f2fs_resize_grow(struct f2fs_sb_info *sbi) migrate_sit(sbi, new_sb, offset_seg); rebuild_checkpoint(sbi, new_sb, offset_seg); update_superblock(new_sb, SB_ALL); + print_raw_sb_info(sb); + print_raw_sb_info(new_sb); + return 0; } @@ -658,9 +658,6 @@ static int f2fs_resize_shrink(struct f2fs_sb_info *sbi) } } - print_raw_sb_info(sb); - print_raw_sb_info(new_sb); - old_main_blkaddr = get_sb(main_blkaddr); new_main_blkaddr = get_newsb(main_blkaddr); offset = old_main_blkaddr - new_main_blkaddr; @@ -690,6 +687,9 @@ static int f2fs_resize_shrink(struct f2fs_sb_info *sbi) /* move whole data region */ //if (err) // migrate_main(sbi, offset); + print_raw_sb_info(sb); + print_raw_sb_info(new_sb); + return 0; } diff --git a/include/f2fs_fs.h b/include/f2fs_fs.h index 791e64f..7446baa 100644 --- a/include/f2fs_fs.h +++ b/include/f2fs_fs.h @@ -279,6 +279,7 @@ static inline uint64_t bswap_64(uint64_t val) #define BITS_PER_BYTE 8 #define F2FS_SUPER_MAGIC 0xF2F52010 /* F2FS Magic Number */ #define CP_CHKSUM_OFFSET 4092 +#define SB_CHKSUM_OFFSET 3068 #define MAX_PATH_LEN 64 #define MAX_DEVICES8 @@ -579,6 +580,7 @@ enum { #define F2FS_FEATURE_INODE_CRTIME 0x0100 #define F2FS_FEATURE_LOST_FOUND0x0200 #define F2FS_FEATURE_VERITY0x0400 /* reserved */ +#define F2FS_FEATURE_SB_CHKSUM 0x0800 #define MAX_VOLUME_NAME512 @@ -632,7 +634,8 @@ struct f2fs_super_block { struct f2fs_device devs[MAX_DEVICES]; /* device list */ __le32 qf_ino[F2FS_MAX_QUOTAS]; /* quota inode numbers */ __u8 hot_ext_count; /* # of hot file extension */ - __u8 reserved[314]; /* valid reserved region */ + __u8 reserved[310]; /* valid reserved region */ + __le32 crc; /* checksum of superblock */ } __attribute__((packed)); /* @@ -1337,6 +1340,7 @@ struct feature feature_table[] = { \ { "inode_crtime", F2FS_FEATURE_INODE_CRTIME },\ { "lost_found", F2FS_FEATURE_LOST_FOUND }, \ { "verity", F2FS_FEATURE_VERITY }, /* reserved */ \ + { "sb_checksum",
Re: [f2fs-dev] [PATCH] f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc
On 2018/8/14 4:12, Jaegeuk Kim wrote:
> On 08/12, Chao Yu wrote:
>> On 2018/8/4 10:31, Chao Yu wrote:
>>> How about keep lock order as:
>>>
>>> - inode_lock
>>>  - i_mmap_sem
>>>   - lock_all()
>>>   - unlock_all()
>>>  - i_gc_rwsem[WRITE]
>>>   - lock_op()
>>
>> I got below warning when testing last dev-test:
>>
>> - f2fs_direct_IO current lock dependency
>>  - i_gc_rwsem[WRITE]
>>   - i_mmap_sem
>> - do_blockdev_direct_IO
>>  - i_mmap_sem
>>   - i_gc_rwsem[WRITE]
>>
>
> Yeah, it seems it's true.
> How about this?

It looks good to me; anyway, let me check this patch with fstests again.

Thanks,

>
> ---
> fs/f2fs/data.c | 4 ++--
> fs/f2fs/file.c | 43 +++
> 2 files changed, 25 insertions(+), 22 deletions(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index f09231b1cc74..021923dc666b 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -2208,14 +2208,14 @@ static void f2fs_write_failed(struct address_space *mapping, loff_t to)
> 	loff_t i_size = i_size_read(inode);
>
> 	if (to > i_size) {
> -		down_write(&F2FS_I(inode)->i_mmap_sem);
> 		down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> +		down_write(&F2FS_I(inode)->i_mmap_sem);
>
> 		truncate_pagecache(inode, i_size);
> 		f2fs_truncate_blocks(inode, i_size, true);
>
> -		up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> 		up_write(&F2FS_I(inode)->i_mmap_sem);
> +		up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> 	}
> }
>
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 560751adba01..8b13afb23734 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -798,8 +798,8 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
> 	if (attr->ia_valid & ATTR_SIZE) {
> 		bool to_smaller = (attr->ia_size <= i_size_read(inode));
>
> -		down_write(&F2FS_I(inode)->i_mmap_sem);
> 		down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> +		down_write(&F2FS_I(inode)->i_mmap_sem);
>
> 		truncate_setsize(inode, attr->ia_size);
>
> @@ -809,8 +809,8 @@ int f2fs_setattr(struct dentry *dentry, struct iattr *attr)
> 		 * do not trim all blocks after i_size if target size is
> 		 * larger than i_size.
> 		 */
> -		up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> 		up_write(&F2FS_I(inode)->i_mmap_sem);
> +		up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
>
> 		if (err)
> 			return err;
> @@ -963,8 +963,8 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len)
> 		blk_start = (loff_t)pg_start << PAGE_SHIFT;
> 		blk_end = (loff_t)pg_end << PAGE_SHIFT;
>
> -		down_write(&F2FS_I(inode)->i_mmap_sem);
> 		down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> +		down_write(&F2FS_I(inode)->i_mmap_sem);
>
> 		truncate_inode_pages_range(mapping, blk_start,
> 				blk_end - 1);
> @@ -973,8 +973,8 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len)
> 		ret = f2fs_truncate_hole(inode, pg_start, pg_end);
> 		f2fs_unlock_op(sbi);
>
> -		up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> 		up_write(&F2FS_I(inode)->i_mmap_sem);
> +		up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> 	}
> }
>
> @@ -1201,6 +1201,7 @@ static int f2fs_do_collapse(struct inode *inode, loff_t offset, loff_t len)
>
> 	/* avoid gc operation during block exchange */
> 	down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> +	down_write(&F2FS_I(inode)->i_mmap_sem);
>
> 	f2fs_lock_op(sbi);
> 	f2fs_drop_extent_tree(inode);
> @@ -1208,6 +1209,7 @@ static int f2fs_do_collapse(struct inode *inode, loff_t offset, loff_t len)
> 	ret = __exchange_data_block(inode, inode, end, start, nrpages - end, true);
> 	f2fs_unlock_op(sbi);
>
> +	up_write(&F2FS_I(inode)->i_mmap_sem);
> 	up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
> 	return ret;
> }
> @@ -1228,17 +1230,17 @@ static int f2fs_collapse_range(struct inode *inode, loff_t offset, loff_t len)
> 	if (ret)
> 		return ret;
>
> -	down_write(&F2FS_I(inode)->i_mmap_sem);
> 	/* write out all dirty pages from offset */
> 	ret = filemap_write_and_wait_range(inode->i_mapping, offset, LLONG_MAX);
> 	if (ret)
> -		goto out_unlock;
> +		return ret;
>
> 	ret = f2fs_do_collapse(inode, offset, len);
> 	if (ret)
> -		goto out_unlock;
> +		return ret;
>
> 	/* write out all moved pages, if possible */
> +	down_write(&F2FS_I(inode)->i_mmap_sem);
> 	filemap_write_and_wait_range(inode->i_mapping, offset,
Re: [f2fs-dev] [PATCH v3] f2fs: fix performance issue observed with multi-thread sequential read
On 2018/8/14 12:04, Jaegeuk Kim wrote: > On 08/14, Chao Yu wrote: >> On 2018/8/14 4:11, Jaegeuk Kim wrote: >>> On 08/13, Chao Yu wrote: Hi Jaegeuk, On 2018/8/11 2:56, Jaegeuk Kim wrote: > This reverts the commit - "b93f771 - f2fs: remove writepages lock" > to fix the drop in sequential read throughput. > > Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L > device: UFS > > Before - > read throughput: 185 MB/s > total read requests: 85177 (of these ~8 are 4KB size requests). > total write requests: 2546 (of these ~2208 requests are written in 512KB). > > After - > read throughput: 758 MB/s > total read requests: 2417 (of these ~2042 are 512KB reads). > total write requests: 2701 (of these ~2034 requests are written in 512KB). IMO, it only impact sequential read performance in a large file which may be fragmented during multi-thread writing. In android environment, mostly, the large file should be cold type, such as apk, mp3, rmvb, jpeg..., so I think we only need to serialize writepages() for cold data area writer. So how about adding a mount option to serialize writepage() for different type of log, e.g. in android, using serialize=4; by default, using serialize=7 HOT_DATA 1 WARM_DATA 2 COLD_DATA 4 >>> >>> Well, I don't think we need to give too many mount options for this >>> fragmented >>> case. How about doing this for the large files only like this? >> >> Thread A write 512 pages Thread B write 8 pages >> >> - writepages() >> - mutex_lock(>writepages); >> - writepage(); >> ... >> - writepages() >> - writepage() >> >> - writepage(); >> ... >> - mutex_unlock(>writepages); >> >> Above case will also cause fragmentation since we didn't serialize all >> concurrent IO with the lock. >> >> Do we need to consider such case? > > We can simply allow 512 and 8 in the same segment, which would not a big deal, > when considering starvation of Thread B. 
Yeah, but in reality there would be more threads competing for the same log header, so I worry that the effect of defragmenting will not be as good as we expect; anyway, for a benchmark, it's enough.

Thanks,

>
>>
>> Thanks,
>>
>>>
>>> >From 4fea0b6e4da8512a72dd52afc7a51beb35966ad9 Mon Sep 17 00:00:00 2001
>>> From: Jaegeuk Kim
>>> Date: Thu, 9 Aug 2018 17:53:34 -0700
>>> Subject: [PATCH] f2fs: fix performance issue observed with multi-thread
>>> sequential read
>>>
>>> This reverts the commit - "b93f771 - f2fs: remove writepages lock"
>>> to fix the drop in sequential read throughput.
>>>
>>> Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
>>> device: UFS
>>>
>>> Before -
>>> read throughput: 185 MB/s
>>> total read requests: 85177 (of these ~8 are 4KB size requests).
>>> total write requests: 2546 (of these ~2208 requests are written in 512KB).
>>>
>>> After -
>>> read throughput: 758 MB/s
>>> total read requests: 2417 (of these ~2042 are 512KB reads).
>>> total write requests: 2701 (of these ~2034 requests are written in 512KB).
>>>
>>> Signed-off-by: Sahitya Tummala
>>> Signed-off-by: Jaegeuk Kim
>>> ---
>>> Documentation/ABI/testing/sysfs-fs-f2fs | 8
>>> fs/f2fs/data.c | 10 ++
>>> fs/f2fs/f2fs.h | 2 ++
>>> fs/f2fs/segment.c | 1 +
>>> fs/f2fs/super.c | 1 +
>>> fs/f2fs/sysfs.c | 2 ++
>>> 6 files changed, 24 insertions(+)
>>>
>>> diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs
>>> b/Documentation/ABI/testing/sysfs-fs-f2fs
>>> index 9b0123388f18..94a24aedcdb2 100644
>>> --- a/Documentation/ABI/testing/sysfs-fs-f2fs
>>> +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
>>> @@ -51,6 +51,14 @@ Description:
>>> 	Controls the dirty page count condition for the in-place-update
>>> 	policies.
>>>
>>> +What:		/sys/fs/f2fs/<disk>/min_seq_blocks
>>> +Date:		August 2018
>>> +Contact:	"Jaegeuk Kim"
>>> +Description:
>>> +	Controls the dirty page count condition for batched sequential
>>> +	writes in ->writepages.
>>> +
>>> +
>>> What:		/sys/fs/f2fs/<disk>/min_hot_blocks
>>> Date:		March 2017
>>> Contact:	"Jaegeuk Kim"
>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>> index 45f043ee48bd..f09231b1cc74 100644
>>> --- a/fs/f2fs/data.c
>>> +++ b/fs/f2fs/data.c
>>> @@ -2132,6 +2132,7 @@ static int __f2fs_write_data_pages(struct address_space *mapping,
>>> 	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>>> 	struct blk_plug plug;
>>> 	int ret;
>>> +	bool locked = false;
>>>
>>> 	/* deal with chardevs and other special file */
>>> 	if
Re: [f2fs-dev] [PATCH] f2fs: rework fault injection handling to avoid a warning
On 2018/8/14 5:38, Arnd Bergmann wrote:
> When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an
> unused label:
>
> fs/f2fs/segment.c: In function '__submit_discard_cmd':
> fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used
> [-Werror=unused-label]
>
> This could be fixed by adding another #ifdef around it, but the more
> reliable way of doing this seems to be to remove the other #ifdefs
> where that is easily possible.
>
> By defining time_to_inject() as a trivial stub, most of the checks for
> CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer
> formatting of the code.
>
> Signed-off-by: Arnd Bergmann

Reviewed-by: Chao Yu

Thanks,