Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously

2017-02-22 Thread Chao Yu
On 2017/1/13 6:44, Jaegeuk Kim wrote:
> This patch adds a kernel thread to issue discard commands.
> It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify current
> bio status.
> 
> Signed-off-by: Jaegeuk Kim 

Reviewed-by: Chao Yu 



Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously

2017-02-08 Thread Jaegeuk Kim
On 02/08, Christoph Hellwig wrote:
> On Mon, Feb 06, 2017 at 07:44:03PM -0800, Jaegeuk Kim wrote:
> > Sorry for the late response due to travel.
> > 
> > When doing fstrim on a fresh f2fs image formatted on an Intel NVMe SSD whose
> > model name is SSDPE2MW012T4, I've got the following trace.
> 
> 
> 
> > So, I investigated why block_rq_complete() happened in more detail.
> > 
> > The root-caused call path looks like:
> >  - submit_bio
> >   - generic_make_request
> >    - q->make_request_fn
> >     - blk_mq_make_request
> >      - blk_mq_map_request
> >       - blk_mq_alloc_request
> >        - blk_mq_get_tag
> >         - __blk_mq_get_tag
> >          - bt_get
> >           - blk_mq_run_hw_queue
> >           - finish_wait
> >           --> this waits for the 8 pending discard bios!
> 
> You're blocking on tag allocation.  How many tags per queue does
> your device have?  E.g., do a
> 
> cat /sys/block/nvme0n1/mq/0/nr_tags

It shows 1023.

> > It seems the problem comes from the storage processing discard commands too
> > slowly compared to normal read/write IOs.
> > 
> > Any thoughts?
> 
> Deallocate is always going to be an exception path compared to normal
> read/write, but just how much slower it is going to be is device
> dependent.
> 
> One option would be to reduce the number of discards; for that, can you
> try the series here to support vectored discards:
> 
> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/vectored-discard-for-axboe

I tried this, but couldn't see any difference.

Thanks,


Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously

2017-02-08 Thread Christoph Hellwig
On Mon, Feb 06, 2017 at 07:44:03PM -0800, Jaegeuk Kim wrote:
> Sorry for the late response due to travel.
> 
> When doing fstrim on a fresh f2fs image formatted on an Intel NVMe SSD whose
> model name is SSDPE2MW012T4, I've got the following trace.



> So, I investigated why block_rq_complete() happened in more detail.
> 
> The root-caused call path looks like:
>  - submit_bio
>   - generic_make_request
>    - q->make_request_fn
>     - blk_mq_make_request
>      - blk_mq_map_request
>       - blk_mq_alloc_request
>        - blk_mq_get_tag
>         - __blk_mq_get_tag
>          - bt_get
>           - blk_mq_run_hw_queue
>           - finish_wait
>           --> this waits for the 8 pending discard bios!

You're blocking on tag allocation.  How many tags per queue does
your device have?  E.g., do a

cat /sys/block/nvme0n1/mq/0/nr_tags

> It seems the problem comes from the storage processing discard commands too
> slowly compared to normal read/write IOs.
> 
> Any thoughts?

Deallocate is always going to be an exception path compared to normal
read/write, but just how much slower it is going to be is device
dependent.

One option would be to reduce the number of discards; for that, can you
try the series here to support vectored discards:

http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/vectored-discard-for-axboe


Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously

2017-02-06 Thread Jaegeuk Kim
On 02/05, Christoph Hellwig wrote:
> On Mon, Jan 16, 2017 at 09:32:20AM -0800, Christoph Hellwig wrote:
> > On Fri, Jan 13, 2017 at 11:12:11AM -0800, Jaegeuk Kim wrote:
> > > Previously, I tried issuing discard bios asynchronously, but the problem
> > > I ran into was that it was not enough. When testing an nvme SSD with the
> > > noop IO scheduler, submit_bio() was blocked at every 8 async discard
> > > bios, resulting in a very slow checkpoint process which blocks most
> > > other FS operations.
> > 
> > Where does it block?  Are you running out of requests?  What driver is
> > this on top of?
> 
> Ping?  I'm currently spending a lot of effort on fs and block discard
> code, and I'd like to make sure we get common infrastructure instead
> of local hacks.

Sorry for the late response due to travel.

When doing fstrim on a fresh f2fs image formatted on an Intel NVMe SSD whose
model name is SSDPE2MW012T4, I've got the following trace.

...
fstrim-12620 [000]  334572.907534: f2fs_issue_discard: dev = (259,1), blkstart = 0x902900, blklen = 0x400
fstrim-12620 [000]  334572.907535: block_bio_remap: 259,0 D 75583488 + 8192 <- (259,1) 75581440
fstrim-12620 [000]  334572.907535: block_bio_queue: 259,0 D 75583488 + 8192 [fstrim]
fstrim-12620 [000]  334572.907535: block_getrq: 259,0 D 75583488 + 8192 [fstrim]
fstrim-12620 [000]  334572.907536: block_unplug: [fstrim] 1
fstrim-12620 [000]  334572.907536: block_rq_insert: 259,0 D 0 () 75583488 + 8192 [fstrim]
fstrim-12620 [000]  334572.907536: block_rq_issue: 259,0 D 0 () 75583488 + 8192 [fstrim]
 < repeat 6 times >
fstrim-12620 [000]  334572.907620: f2fs_issue_discard: dev = (259,1), blkstart = 0x904500, blklen = 0x400
fstrim-12620 [000]  334572.907620: block_bio_remap: 259,0 D 75640832 + 8192 <- (259,1) 75638784
fstrim-12620 [000]  334572.907620: block_bio_queue: 259,0 D 75640832 + 8192 [fstrim]
fstrim-12620 [000]  334572.907621: block_getrq: 259,0 D 75640832 + 8192 [fstrim]
<idle>-0 [000] d.h. 334572.907723: block_rq_complete: 259,0 D () 67260416 + 8192 [0]
<idle>-0 [000] d.h. 334572.907942: block_rq_complete: 259,0 D () 67268608 + 8192 [0]
<idle>-0 [000] d.h. 334572.908155: block_rq_complete: 259,0 D () 67276800 + 8192 [0]
<idle>-0 [000] d.h. 334572.908374: block_rq_complete: 259,0 D () 67284992 + 8192 [0]
<idle>-0 [000] d.h. 334572.908597: block_rq_complete: 259,0 D () 67293184 + 8192 [0]
<idle>-0 [000] d.h. 334572.908823: block_rq_complete: 259,0 D () 67301376 + 8192 [0]
<idle>-0 [000] d.h. 334572.909033: block_rq_complete: 259,0 D () 67309568 + 8192 [0]
<idle>-0 [000] d.h. 334572.909216: block_rq_complete: 259,0 D () 67317760 + 8192 [0]
fstrim-12620 [000]  334572.909222: block_unplug: [fstrim] 1
fstrim-12620 [000]  334572.909223: block_rq_insert: 259,0 D 0 () 75640832 + 8192 [fstrim]
fstrim-12620 [000]  334572.909224: block_rq_issue: 259,0 D 0 () 75640832 + 8192 [fstrim]
fstrim-12620 [000]  334572.909240: f2fs_issue_discard: dev = (259,1), blkstart = 0x904900, blklen = 0x400
fstrim-12620 [000]  334572.909241: block_bio_remap: 259,0 D 75649024 + 8192 <- (259,1) 75646976
fstrim-12620 [000]  334572.909241: block_bio_queue: 259,0 D 75649024 + 8192 [fstrim]
fstrim-12620 [000]  334572.909241: block_getrq: 259,0 D 75649024 + 8192 [fstrim]
fstrim-12620 [000]  334572.909242: block_unplug: [fstrim] 1
fstrim-12620 [000]  334572.909242: block_rq_insert: 259,0 D 0 () 75649024 + 8192 [fstrim]
fstrim-12620 [000]  334572.909242: block_rq_issue: 259,0 D 0 () 75649024 + 8192 [fstrim]
 < repeat >

So, I investigated why block_rq_complete() happened in more detail.

The root-caused call path looks like:
 - submit_bio
  - generic_make_request
   - q->make_request_fn
    - blk_mq_make_request
     - blk_mq_map_request
      - blk_mq_alloc_request
       - blk_mq_get_tag
        - __blk_mq_get_tag
         - bt_get
          - blk_mq_run_hw_queue
          - finish_wait
          --> this waits for the 8 pending discard bios!

It seems the problem comes from the storage processing discard commands too
slowly compared to normal read/write IOs.
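
Reading the timestamps above: the eight block_rq_complete events span
334572.907723 to 334572.909216, i.e. about 1.5 ms for eight 8192-sector
(4 MiB) discards, roughly 190 us apiece, and submit_bio() sits in
finish_wait() for that entire window.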

Any thoughts?

Thanks,


Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously

2017-02-05 Thread Christoph Hellwig
On Mon, Jan 16, 2017 at 09:32:20AM -0800, Christoph Hellwig wrote:
> On Fri, Jan 13, 2017 at 11:12:11AM -0800, Jaegeuk Kim wrote:
> > Previously, I tried issuing discard bios asynchronously, but the problem
> > I ran into was that it was not enough. When testing an nvme SSD with the
> > noop IO scheduler, submit_bio() was blocked at every 8 async discard bios,
> > resulting in a very slow checkpoint process which blocks most other FS
> > operations.
> 
> Where does it block?  Are you running out of requests?  What driver is
> this on top of?

Ping?  I'm currently spending a lot of effort on fs and block discard
code, and I'd like to make sure we get common infrastructure instead
of local hacks.


Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously

2017-01-16 Thread Christoph Hellwig
On Fri, Jan 13, 2017 at 11:12:11AM -0800, Jaegeuk Kim wrote:
> Previously, I tried issuing discard bios asynchronously, but the problem
> I ran into was that it was not enough. When testing an nvme SSD with the
> noop IO scheduler, submit_bio() was blocked at every 8 async discard bios,
> resulting in a very slow checkpoint process which blocks most other FS
> operations.

Where does it block?  Are you running out of requests?  What driver is
this on top of?


Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously

2017-01-13 Thread Jaegeuk Kim
On 01/13, Christoph Hellwig wrote:
> On Thu, Jan 12, 2017 at 02:44:06PM -0800, Jaegeuk Kim wrote:
> > This patch adds a kernel thread to issue discard commands.
> > It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify current
> > bio status.
> 
> Why?  Instead of creating the complexity of a thread you should look
> into issuing the discard bios asynchronously, which should help to
> simplify a lot of the discard logic in f2fs.

Previously, I tried issuing discard bios asynchronously, but the problem I
ran into was that it was not enough. When testing an nvme SSD with the noop
IO scheduler, submit_bio() was blocked at every 8 async discard bios,
resulting in a very slow checkpoint process which blocks most other FS
operations.

Thanks,


Re: [PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously

2017-01-13 Thread Christoph Hellwig
On Thu, Jan 12, 2017 at 02:44:06PM -0800, Jaegeuk Kim wrote:
> This patch adds a kernel thread to issue discard commands.
> It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify current
> bio status.

Why?  Instead of creating the complexity of a thread you should look
into issuing the discard bios asynchronously, which should help to
simplify a lot of the discard logic in f2fs.
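
For reference, a minimal sketch of what issuing one discard bio
asynchronously looks like against the block API of this era (bi_bdev,
bi_error, bio_set_op_attrs); the helper and its wiring below are
illustrative assumptions, not code from f2fs or from this patch:

static void discard_end_io(struct bio *bio)
{
	struct completion *wait = bio->bi_private;

	/* Propagate or record bio->bi_error here if the caller cares. */
	complete(wait);
	bio_put(bio);
}

/*
 * Build and fire a single discard bio without waiting for it.  Note that
 * submit_bio() itself can still block in the blk-mq tag allocator, which
 * is exactly the stall reported elsewhere in this thread.
 */
static void issue_discard_async(struct block_device *bdev, sector_t sector,
				sector_t nr_sects, struct completion *wait)
{
	struct bio *bio = bio_alloc(GFP_NOFS, 0);

	bio->bi_iter.bi_sector = sector;
	bio->bi_bdev = bdev;
	/* Discard bios carry no data pages, only a size. */
	bio->bi_iter.bi_size = nr_sects << 9;
	bio->bi_private = wait;
	bio->bi_end_io = discard_end_io;
	bio_set_op_attrs(bio, REQ_OP_DISCARD, 0);

	submit_bio(bio);
}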


[PATCH 5/6] f2fs: add a kernel thread to issue discard commands asynchronously

2017-01-12 Thread Jaegeuk Kim
This patch adds a kernel thread to issue discard commands.
It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify current
bio status.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h    |  11 +
 fs/f2fs/segment.c | 128 +-
 2 files changed, 108 insertions(+), 31 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e7b403cbd70f..ef5e3709c161 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -129,6 +129,7 @@ enum {
 };
 
 #define MAX_DISCARD_BLOCKS(sbi) (1 << (sbi)->log_blocks_per_seg)
+#define DISCARD_ISSUE_RATE		8
 #define DEF_CP_INTERVAL			60	/* 60 secs */
 #define DEF_IDLE_INTERVAL		5	/* 5 secs */
 
@@ -177,18 +178,28 @@ struct discard_entry {
 	int len;		/* # of consecutive blocks of the discard */
 };
 
+enum {
+	D_PREP,
+	D_SUBMIT,
+	D_DONE,
+};
+
 struct discard_cmd {
 	struct list_head list;		/* command list */
 	struct completion wait;		/* completion */
 	block_t lstart;			/* logical start address */
 	block_t len;			/* length */
 	struct bio *bio;		/* bio */
+	int state;			/* state */
 };
 
 struct discard_cmd_control {
+	struct task_struct *f2fs_issue_discard;	/* discard thread */
 	struct list_head discard_entry_list;	/* 4KB discard entry list */
 	int nr_discards;			/* # of discards in the list */
 	struct list_head discard_cmd_list;	/* discard cmd list */
+	wait_queue_head_t discard_wait_queue;	/* waiting queue for wake-up */
+	struct mutex cmd_lock;
 	int max_discards;			/* max. discards to be issued */
 };
 
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index e3bec31de961..74364006bfa6 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -628,7 +628,7 @@ static void locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno)
 	mutex_unlock(&dirty_i->seglist_lock);
 }
 
-static struct discard_cmd *__add_discard_cmd(struct f2fs_sb_info *sbi,
+static void __add_discard_cmd(struct f2fs_sb_info *sbi,
 		struct bio *bio, block_t lstart, block_t len)
 {
 	struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
@@ -638,12 +638,30 @@ static struct discard_cmd *__add_discard_cmd(struct f2fs_sb_info *sbi,
 	dc = f2fs_kmem_cache_alloc(discard_cmd_slab, GFP_NOFS);
 	INIT_LIST_HEAD(&dc->list);
 	dc->bio = bio;
+	bio->bi_private = dc;
 	dc->lstart = lstart;
 	dc->len = len;
+	dc->state = D_PREP;
 	init_completion(&dc->wait);
+
+	mutex_lock(&dcc->cmd_lock);
 	list_add_tail(&dc->list, cmd_list);
+	mutex_unlock(&dcc->cmd_lock);
+}
+
+static void __remove_discard_cmd(struct f2fs_sb_info *sbi, struct discard_cmd *dc)
+{
+	int err = dc->bio->bi_error;
 
-	return dc;
+	if (err == -EOPNOTSUPP)
+		err = 0;
+
+	if (err)
+		f2fs_msg(sbi->sb, KERN_INFO,
+			"Issue discard failed, ret: %d", err);
+	bio_put(dc->bio);
+	list_del(&dc->list);
+	kmem_cache_free(discard_cmd_slab, dc);
 }
 
 /* This should be covered by global mutex, &sit_i->sentry_lock */
@@ -653,31 +671,28 @@ void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, block_t blkaddr)
 	struct list_head *wait_list = &(dcc->discard_cmd_list);
 	struct discard_cmd *dc, *tmp;
 
+	mutex_lock(&dcc->cmd_lock);
 	list_for_each_entry_safe(dc, tmp, wait_list, list) {
-		struct bio *bio = dc->bio;
-		int err;
 
-		if (!completion_done(&dc->wait)) {
-			if ((dc->lstart <= blkaddr &&
-					blkaddr < dc->lstart + dc->len) ||
-					blkaddr == NULL_ADDR)
+		if (blkaddr == NULL_ADDR) {
+			if (dc->state == D_PREP) {
+				dc->state = D_SUBMIT;
+				submit_bio(dc->bio);
+			}
+			wait_for_completion_io(&dc->wait);
+
+			__remove_discard_cmd(sbi, dc);
+			continue;
+		}
+
+		if (dc->lstart <= blkaddr && blkaddr < dc->lstart + dc->len) {
+			if (dc->state == D_SUBMIT)
 				wait_for_completion_io(&dc->wait);
 			else
-				continue;
+				__remove_discard_cmd(sbi, dc);
 		}
-
-		err = bio->bi_error;
-		if (err == -EOPNOTSUPP)
-			err = 0;
-
-		if (err)
-			f2fs_msg(sbi->sb, KERN_INFO,
-				"Issue discard failed, ret: %d", err);
-
-
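
The archive truncates the remainder of the hunk here; the removed
wait/error handling moves into __remove_discard_cmd() above, and the rest
of the patch (not shown) adds the issue thread itself. As a rough,
hypothetical sketch of the loop the commit message describes, reusing the
names visible in the hunks above and assuming an end_io callback that
completes dc->wait and moves the command to D_DONE:

static int issue_discard_thread(void *data)
{
	struct f2fs_sb_info *sbi = data;
	struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
	struct discard_cmd *dc, *tmp;
	int issued;

	while (!kthread_should_stop()) {
		issued = 0;

		mutex_lock(&dcc->cmd_lock);
		list_for_each_entry_safe(dc, tmp, &dcc->discard_cmd_list, list) {
			if (dc->state == D_PREP && issued < DISCARD_ISSUE_RATE) {
				/* Hand a prepared bio to the block layer. */
				dc->state = D_SUBMIT;
				submit_bio(dc->bio);
				issued++;
			} else if (dc->state == D_DONE) {
				/* Reap commands whose end_io already ran. */
				__remove_discard_cmd(sbi, dc);
			}
		}
		mutex_unlock(&dcc->cmd_lock);

		/* Throttle re-scans while commands are still in flight. */
		congestion_wait(BLK_RW_SYNC, HZ/50);

		/* Sleep until new commands are queued or we must stop. */
		wait_event_interruptible(dcc->discard_wait_queue,
				kthread_should_stop() ||
				!list_empty(&dcc->discard_cmd_list));
	}
	return 0;
}

The point of the thread is that submit_bio() may block in the tag
allocator, so the stall is taken here, off the checkpoint path, a bounded
number (DISCARD_ISSUE_RATE) of bios at a time.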