Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously
On 2017/1/13 6:44, Jaegeuk Kim wrote:
> This patch adds a kernel thread to issue discard commands.
> It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify current
> bio status.
>
> Signed-off-by: Jaegeuk Kim

Reviewed-by: Chao Yu
Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously
On 02/08, Christoph Hellwig wrote:
> On Mon, Feb 06, 2017 at 07:44:03PM -0800, Jaegeuk Kim wrote:
> > Sorry for the late response due to the travel.
> >
> > When doing fstrim with a fresh f2fs image formatted on an Intel NVMe SSD
> > whose model name is SSDPE2MW012T4, I got the following trace.
> >
> > So, I investigated in more detail why block_rq_complete() happened like
> > this.
> >
> > The root-caused call path looks like:
> >  - submit_bio
> >   - generic_make_request
> >    - q->make_request_fn
> >     - blk_mq_make_request
> >      - blk_mq_map_request
> >       - blk_mq_alloc_request
> >        - blk_mq_get_tag
> >         - __blk_mq_get_tag
> >          - bt_get
> >           - blk_mq_run_hw_queue
> >           - finish_wait
> >             --> this waits for the 8 pending discard bios!
>
> You're blocking on tag allocation. How many tags per queue does
> your device have? E.g. do a
>
>   cat /sys/block/nvme0n1/mq/0/nr_tags

It shows 1023.

> > It seems the problem comes from the storage processing discard commands
> > too slowly compared to normal read/write IOs.
> >
> > Any thoughts?
>
> Deallocate is always going to be an exception path compared to normal
> read/write... but just how much slower is going to be device
> dependent.
>
> One option would be to reduce the number of discards; for that, can you
> try the series here to support vectored discards:
>
> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/vectored-discard-for-axboe

I tried this, but couldn't see any difference.

Thanks,
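For reproducing the check above on a multi-queue device, a small userspace
helper along these lines (hypothetical, not part of this thread) prints
nr_tags for every hardware queue instead of only queue 0; the sysfs path and
device name are taken from the messages above:

/*
 * Print nr_tags for each blk-mq hardware queue of nvme0n1,
 * generalizing the single "cat" suggested above.
 */
#include <stdio.h>
#include <glob.h>

int main(void)
{
	glob_t g;
	char buf[32];
	size_t i;

	/* device name taken from the trace in this thread */
	if (glob("/sys/block/nvme0n1/mq/*/nr_tags", 0, NULL, &g) != 0)
		return 1;

	for (i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "r");

		if (f == NULL)
			continue;
		if (fgets(buf, sizeof(buf), f) != NULL)
			printf("%s: %s", g.gl_pathv[i], buf);
		fclose(f);
	}
	globfree(&g);
	return 0;
}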
Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously
On Mon, Feb 06, 2017 at 07:44:03PM -0800, Jaegeuk Kim wrote:
> Sorry for the late response due to the travel.
>
> When doing fstrim with a fresh f2fs image formatted on an Intel NVMe SSD
> whose model name is SSDPE2MW012T4, I got the following trace.
>
> So, I investigated in more detail why block_rq_complete() happened like
> this.
>
> The root-caused call path looks like:
>  - submit_bio
>   - generic_make_request
>    - q->make_request_fn
>     - blk_mq_make_request
>      - blk_mq_map_request
>       - blk_mq_alloc_request
>        - blk_mq_get_tag
>         - __blk_mq_get_tag
>          - bt_get
>           - blk_mq_run_hw_queue
>           - finish_wait
>             --> this waits for the 8 pending discard bios!

You're blocking on tag allocation. How many tags per queue does
your device have? E.g. do a

  cat /sys/block/nvme0n1/mq/0/nr_tags

> It seems the problem comes from the storage processing discard commands too
> slowly compared to normal read/write IOs.
>
> Any thoughts?

Deallocate is always going to be an exception path compared to normal
read/write... but just how much slower is going to be device
dependent.

One option would be to reduce the number of discards; for that, can you
try the series here to support vectored discards:

http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/vectored-discard-for-axboe
Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously
On 02/05, Christoph Hellwig wrote:
> On Mon, Jan 16, 2017 at 09:32:20AM -0800, Christoph Hellwig wrote:
> > On Fri, Jan 13, 2017 at 11:12:11AM -0800, Jaegeuk Kim wrote:
> > > Previously, I tried issuing discard bios asynchronously. But the
> > > problem I ran into was that it was not enough. When testing an nvme
> > > SSD with the noop IO scheduler, submit_bio() was blocked at every 8
> > > async discard bios, resulting in a very slow checkpoint process which
> > > blocks most other FS operations.
> >
> > Where does it block? Are you running out of requests? What driver is
> > this on top of?
>
> Ping? I'm currently spending a lot of effort on fs and block discard
> code, and I'd like to make sure we get common infrastructure instead
> of local hacks.

Sorry for the late response due to the travel.

When doing fstrim with a fresh f2fs image formatted on an Intel NVMe SSD
whose model name is SSDPE2MW012T4, I got the following trace.

...
fstrim-12620 [000]  334572.907534: f2fs_issue_discard: dev = (259,1), blkstart = 0x902900, blklen = 0x400
fstrim-12620 [000]  334572.907535: block_bio_remap: 259,0 D 75583488 + 8192 <- (259,1) 75581440
fstrim-12620 [000]  334572.907535: block_bio_queue: 259,0 D 75583488 + 8192 [fstrim]
fstrim-12620 [000]  334572.907535: block_getrq: 259,0 D 75583488 + 8192 [fstrim]
fstrim-12620 [000]  334572.907536: block_unplug: [fstrim] 1
fstrim-12620 [000]  334572.907536: block_rq_insert: 259,0 D 0 () 75583488 + 8192 [fstrim]
fstrim-12620 [000]  334572.907536: block_rq_issue: 259,0 D 0 () 75583488 + 8192 [fstrim]

< repeat 6 times >

fstrim-12620 [000]  334572.907620: f2fs_issue_discard: dev = (259,1), blkstart = 0x904500, blklen = 0x400
fstrim-12620 [000]  334572.907620: block_bio_remap: 259,0 D 75640832 + 8192 <- (259,1) 75638784
fstrim-12620 [000]  334572.907620: block_bio_queue: 259,0 D 75640832 + 8192 [fstrim]
fstrim-12620 [000]  334572.907621: block_getrq: 259,0 D 75640832 + 8192 [fstrim]
<idle>-0     [000] d.h.  334572.907723: block_rq_complete: 259,0 D () 67260416 + 8192 [0]
<idle>-0     [000] d.h.  334572.907942: block_rq_complete: 259,0 D () 67268608 + 8192 [0]
<idle>-0     [000] d.h.  334572.908155: block_rq_complete: 259,0 D () 67276800 + 8192 [0]
<idle>-0     [000] d.h.  334572.908374: block_rq_complete: 259,0 D () 67284992 + 8192 [0]
<idle>-0     [000] d.h.  334572.908597: block_rq_complete: 259,0 D () 67293184 + 8192 [0]
<idle>-0     [000] d.h.  334572.908823: block_rq_complete: 259,0 D () 67301376 + 8192 [0]
<idle>-0     [000] d.h.  334572.909033: block_rq_complete: 259,0 D () 67309568 + 8192 [0]
<idle>-0     [000] d.h.  334572.909216: block_rq_complete: 259,0 D () 67317760 + 8192 [0]
fstrim-12620 [000]  334572.909222: block_unplug: [fstrim] 1
fstrim-12620 [000]  334572.909223: block_rq_insert: 259,0 D 0 () 75640832 + 8192 [fstrim]
fstrim-12620 [000]  334572.909224: block_rq_issue: 259,0 D 0 () 75640832 + 8192 [fstrim]
fstrim-12620 [000]  334572.909240: f2fs_issue_discard: dev = (259,1), blkstart = 0x904900, blklen = 0x400
fstrim-12620 [000]  334572.909241: block_bio_remap: 259,0 D 75649024 + 8192 <- (259,1) 75646976
fstrim-12620 [000]  334572.909241: block_bio_queue: 259,0 D 75649024 + 8192 [fstrim]
fstrim-12620 [000]  334572.909241: block_getrq: 259,0 D 75649024 + 8192 [fstrim]
fstrim-12620 [000]  334572.909242: block_unplug: [fstrim] 1
fstrim-12620 [000]  334572.909242: block_rq_insert: 259,0 D 0 () 75649024 + 8192 [fstrim]
fstrim-12620 [000]  334572.909242: block_rq_issue: 259,0 D 0 () 75649024 + 8192 [fstrim]

< repeat >

So, I investigated in more detail why block_rq_complete() happened like this.
The root-caused call path looks like:
 - submit_bio
  - generic_make_request
   - q->make_request_fn
    - blk_mq_make_request
     - blk_mq_map_request
      - blk_mq_alloc_request
       - blk_mq_get_tag
        - __blk_mq_get_tag
         - bt_get
          - blk_mq_run_hw_queue
          - finish_wait
            --> this waits for the 8 pending discard bios!

It seems the problem comes from the storage processing discard commands too
slowly compared to normal read/write IOs.

Any thoughts?

Thanks,
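For readers unfamiliar with the tail of that call path, the following is a
simplified, hedged paraphrase of the bt_get slow path, assuming the
sbitmap-based tag allocator of kernels from this era; it is an illustration
of the blocking behaviour, not verbatim kernel source:

#include <linux/blk-mq.h>
#include <linux/sbitmap.h>
#include <linux/sched.h>
#include <linux/wait.h>

/*
 * Sketch of the tag-allocation slow path: when no tag can be
 * allocated, the submitter kicks the hardware queue and sleeps until
 * a completion frees one.  In the trace above, this is where the
 * fstrim thread sits while the pending discards are in flight.
 */
static int bt_get_sketch(struct blk_mq_hw_ctx *hctx,
			 struct sbitmap_queue *bt)
{
	DEFINE_WAIT(wait);
	int tag;

	tag = __sbitmap_queue_get(bt);		/* fast path */
	while (tag < 0) {
		prepare_to_wait(&bt->ws[0].wait, &wait,
				TASK_UNINTERRUPTIBLE);
		/* let in-flight requests make progress */
		blk_mq_run_hw_queue(hctx, false);
		tag = __sbitmap_queue_get(bt);
		if (tag < 0)
			io_schedule();	/* submit_bio() blocks here */
	}
	finish_wait(&bt->ws[0].wait, &wait);
	return tag;
}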
Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously
On Mon, Jan 16, 2017 at 09:32:20AM -0800, Christoph Hellwig wrote:
> On Fri, Jan 13, 2017 at 11:12:11AM -0800, Jaegeuk Kim wrote:
> > Previously, I tried issuing discard bios asynchronously. But the problem
> > I ran into was that it was not enough. When testing an nvme SSD with the
> > noop IO scheduler, submit_bio() was blocked at every 8 async discard
> > bios, resulting in a very slow checkpoint process which blocks most
> > other FS operations.
>
> Where does it block? Are you running out of requests? What driver is
> this on top of?

Ping? I'm currently spending a lot of effort on fs and block discard
code, and I'd like to make sure we get common infrastructure instead
of local hacks.
Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously
On Fri, Jan 13, 2017 at 11:12:11AM -0800, Jaegeuk Kim wrote:
> Previously, I tried issuing discard bios asynchronously. But the problem
> I ran into was that it was not enough. When testing an nvme SSD with the
> noop IO scheduler, submit_bio() was blocked at every 8 async discard
> bios, resulting in a very slow checkpoint process which blocks most
> other FS operations.

Where does it block? Are you running out of requests? What driver is
this on top of?
Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously
On 01/13, Christoph Hellwig wrote:
> On Thu, Jan 12, 2017 at 02:44:06PM -0800, Jaegeuk Kim wrote:
> > This patch adds a kernel thread to issue discard commands.
> > It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify
> > current bio status.
>
> Why? Instead of creating the complexity of a thread you should look
> into issuing the discard bios asynchronously, which should help to
> simplify a lot of the discard logic in f2fs.

Previously, I tried issuing discard bios asynchronously. But the problem
I ran into was that it was not enough. When testing an nvme SSD with the
noop IO scheduler, submit_bio() was blocked at every 8 async discard
bios, resulting in a very slow checkpoint process which blocks most
other FS operations.

Thanks,
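For context, a minimal sketch of the asynchronous approach being described,
hypothetical code rather than the actual f2fs implementation: build the
discard bio with __blkdev_issue_discard() and hand completion to a callback,
so the caller does not wait for the device.  Even in this shape, submit_bio()
itself can still sleep inside the block layer (e.g. on tag allocation), which
is the blocking reported above.

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/completion.h>

static void discard_end_io_sketch(struct bio *bio)
{
	struct completion *wait = bio->bi_private;

	complete(wait);		/* checkpoint waits on this later */
	bio_put(bio);
}

static int issue_discard_async_sketch(struct block_device *bdev,
				      sector_t sector, sector_t nr_sects,
				      struct completion *wait)
{
	struct bio *bio = NULL;
	int err;

	/* builds a REQ_OP_DISCARD bio without submitting it */
	err = __blkdev_issue_discard(bdev, sector, nr_sects,
				     GFP_NOFS, 0, &bio);
	if (err || !bio)
		return err;

	bio->bi_end_io = discard_end_io_sketch;
	bio->bi_private = wait;
	submit_bio(bio);	/* may still block on tag allocation */
	return 0;
}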
Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously
On Thu, Jan 12, 2017 at 02:44:06PM -0800, Jaegeuk Kim wrote:
> This patch adds a kernel thread to issue discard commands.
> It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify
> current bio status.

Why? Instead of creating the complexity of a thread you should look
into issuing the discard bios asynchronously, which should help to
simplify a lot of the discard logic in f2fs.
[PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously
This patch adds a kernel thread to issue discard commands.
It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify current
bio status.

Signed-off-by: Jaegeuk Kim
---
 fs/f2fs/f2fs.h    |  11 +
 fs/f2fs/segment.c | 128 +-
 2 files changed, 108 insertions(+), 31 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e7b403cbd70f..ef5e3709c161 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -129,6 +129,7 @@ enum {
 };
 
 #define MAX_DISCARD_BLOCKS(sbi)	(1 << (sbi)->log_blocks_per_seg)
+#define DISCARD_ISSUE_RATE	8
 #define DEF_CP_INTERVAL		60	/* 60 secs */
 #define DEF_IDLE_INTERVAL	5	/* 5 secs */
 
@@ -177,18 +178,28 @@ struct discard_entry {
 	int len;	/* # of consecutive blocks of the discard */
 };
 
+enum {
+	D_PREP,
+	D_SUBMIT,
+	D_DONE,
+};
+
 struct discard_cmd {
 	struct list_head list;	/* command list */
 	struct completion wait;	/* completion */
 	block_t lstart;		/* logical start address */
 	block_t len;		/* length */
 	struct bio *bio;	/* bio */
+	int state;		/* state */
 };
 
 struct discard_cmd_control {
+	struct task_struct *f2fs_issue_discard;	/* discard thread */
 	struct list_head discard_entry_list;	/* 4KB discard entry list */
 	int nr_discards;			/* # of discards in the list */
 	struct list_head discard_cmd_list;	/* discard cmd list */
+	wait_queue_head_t discard_wait_queue;	/* waiting queue for wake-up */
+	struct mutex cmd_lock;
 	int max_discards;			/* max. discards to be issued */
 };
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index e3bec31de961..74364006bfa6 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -628,7 +628,7 @@ static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno)
 	mutex_unlock(&dirty_i->seglist_lock);
 }
 
-static struct discard_cmd *__add_discard_cmd(struct f2fs_sb_info *sbi,
+static void __add_discard_cmd(struct f2fs_sb_info *sbi,
 			struct bio *bio, block_t lstart, block_t len)
 {
 	struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
@@ -638,12 +638,30 @@ static struct discard_cmd *__add_discard_cmd(struct f2fs_sb_info *sbi,
 	dc = f2fs_kmem_cache_alloc(discard_cmd_slab, GFP_NOFS);
 	INIT_LIST_HEAD(&dc->list);
 	dc->bio = bio;
+	bio->bi_private = dc;
 	dc->lstart = lstart;
 	dc->len = len;
+	dc->state = D_PREP;
 	init_completion(&dc->wait);
+
+	mutex_lock(&dcc->cmd_lock);
 	list_add_tail(&dc->list, cmd_list);
+	mutex_unlock(&dcc->cmd_lock);
+}
+
+static void __remove_discard_cmd(struct f2fs_sb_info *sbi, struct discard_cmd *dc)
+{
+	int err = dc->bio->bi_error;
 
-	return dc;
+	if (err == -EOPNOTSUPP)
+		err = 0;
+
+	if (err)
+		f2fs_msg(sbi->sb, KERN_INFO,
+			"Issue discard failed, ret: %d", err);
+	bio_put(dc->bio);
+	list_del(&dc->list);
+	kmem_cache_free(discard_cmd_slab, dc);
 }
 
 /* This should be covered by global mutex, &sit_i->sentry_lock */
@@ -653,31 +671,28 @@ void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, block_t blkaddr)
 	struct list_head *wait_list = &(dcc->discard_cmd_list);
 	struct discard_cmd *dc, *tmp;
 
+	mutex_lock(&dcc->cmd_lock);
 	list_for_each_entry_safe(dc, tmp, wait_list, list) {
-		struct bio *bio = dc->bio;
-		int err;
-
-		if (!completion_done(&dc->wait)) {
-			if ((dc->lstart <= blkaddr &&
-					blkaddr < dc->lstart + dc->len) ||
-					blkaddr == NULL_ADDR)
+		if (blkaddr == NULL_ADDR) {
+			if (dc->state == D_PREP) {
+				dc->state = D_SUBMIT;
+				submit_bio(dc->bio);
+			}
+			wait_for_completion_io(&dc->wait);
+
+			__remove_discard_cmd(sbi, dc);
+			continue;
+		}
+
+		if (dc->lstart <= blkaddr && blkaddr < dc->lstart + dc->len) {
+			if (dc->state == D_SUBMIT)
 				wait_for_completion_io(&dc->wait);
 			else
-				continue;
+				__remove_discard_cmd(sbi, dc);
 		}
-
-		err = bio->bi_error;
-		if (err == -EOPNOTSUPP)
-			err = 0;
-
-		if (err)
-			f2fs_msg(sbi->sb, KERN_INFO,
-				"Issue discard failed, ret: %d", err);
-
-
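The patch text above is cut off before the thread function itself appears.
Purely as an illustration of how the pieces are meant to fit together, here
is a sketch of a discard-issuing kthread driven by these states, hypothetical
code built only from the structures shown in this patch, not the author's
actual hunk.  It drains up to DISCARD_ISSUE_RATE prepared commands per
wakeup, so checkpoint never has to push bios itself.

/* would live in fs/f2fs/segment.c next to the code above */
#include <linux/kthread.h>
#include <linux/bio.h>

static int issue_discard_thread_sketch(void *data)
{
	struct f2fs_sb_info *sbi = data;
	struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
	struct discard_cmd *dc, *tmp;
	int issued;

	while (!kthread_should_stop()) {
		issued = 0;

		mutex_lock(&dcc->cmd_lock);
		list_for_each_entry_safe(dc, tmp,
					 &dcc->discard_cmd_list, list) {
			if (dc->state != D_PREP)
				continue;
			dc->state = D_SUBMIT;
			submit_bio(dc->bio);
			if (++issued >= DISCARD_ISSUE_RATE)
				break;
		}
		mutex_unlock(&dcc->cmd_lock);

		/*
		 * Sleep until new commands are queued or we are stopped;
		 * a real implementation would also track whether any
		 * D_PREP commands remain, to avoid spurious wakeups.
		 */
		wait_event_interruptible(dcc->discard_wait_queue,
				kthread_should_stop() ||
				!list_empty(&dcc->discard_cmd_list));
	}
	return 0;
}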