Re: [PATCH 1/5] bcache: don't write back data if reading it failed

2017-09-27 Thread Michael Lyle
Ah-- re #1 -- I was investigating earlier why not as much was combined
as I thought should be when idle.  This is surely a factor.  Thanks
for the catch-- KEY_OFFSET is correct.  I will fix and retest.

(Under heavy load, the correct thing still happens, but not under
light or intermediate load.)

About #2-- I wanted to attain a bounded amount of "combining" of
operations.  If we have 5 4k extents in a row to dispatch, it seems
really wasteful to issue them as 5 IOs 60ms apart, which the existing
code would be willing to do-- I'd rather do a 20k write IO (basically
the same cost as a 4k write IO) and then sleep 300ms.  It is dependent
on the elevator/IO scheduler merging the requests.  At the same time,
I'd rather not combine a really large request.

It would be really neat to blk_plug the backing device during the
write issuance, but that requires further work.

Thanks

Mike

On Tue, Sep 26, 2017 at 11:51 PM,   wrote:
> From: Tang Junhui 
>
> Hello Lyle:
>
> Two questions:
> 1) In keys_contiguous(), you judge whether I/Os are contiguous on the cache
> device, but not on the backing device. I think you should judge it by the
> backing device (remove PTR_CACHE() and use KEY_OFFSET() instead of
> PTR_OFFSET()?).
>
> 2) I did not see you combine small contiguous I/Os into a big I/O, so I
> think it is useful when the writeback rate is low, by avoiding single small
> I/O writes, but it makes no difference at a high writeback rate, since
> previously the I/Os were also written asynchronously.
>
> ---
> Tang Junhui
>
>> Previously, there was some logic that attempted to immediately issue
>> writeback of backing-contiguous blocks when the writeback rate was
>> fast.
>>
>> The previous logic did not have any limits on the aggregate size it
>> would issue, nor the number of keys it would combine at once.  It
>> would also discard the chance to do a contiguous write when the
>> writeback rate was low-- e.g. at "background" writeback of target
>> rate = 8, it would not combine two adjacent 4k writes and would
>> instead seek the disk twice.
>>
>> This patch imposes limits and explicitly understands the size of
>> contiguous I/O during issue.  It also will combine contiguous I/O
>> in all circumstances, not just when writeback is requested to be
>> relatively fast.
>>
>> It is a win on its own, but also lays the groundwork for skip writes to
>> short keys to make the I/O more sequential/contiguous.
>>
>> Signed-off-by: Michael Lyle 
>> ---
>>  drivers/md/bcache/bcache.h|   6 --
>>  drivers/md/bcache/writeback.c | 131 ++
>>  2 files changed, 93 insertions(+), 44 deletions(-)
>>
>> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
>> index eb83be693d60..da803a3b1981 100644
>> --- a/drivers/md/bcache/bcache.h
>> +++ b/drivers/md/bcache/bcache.h
>> @@ -321,12 +321,6 @@ struct cached_dev {
>> 	struct bch_ratelimit	writeback_rate;
>> 	struct delayed_work	writeback_rate_update;
>>
>> -	/*
>> -	 * Internal to the writeback code, so read_dirty() can keep track of
>> -	 * where it's at.
>> -	 */
>> -	sector_t		last_read;
>> -
>> 	/* Limit number of writeback bios in flight */
>> 	struct semaphore	in_flight;
>> 	struct task_struct	*writeback_thread;
>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>> index 0b7c89813635..cf29c44605b9 100644
>> --- a/drivers/md/bcache/writeback.c
>> +++ b/drivers/md/bcache/writeback.c
>> @@ -229,10 +229,26 @@ static void read_dirty_submit(struct closure *cl)
>>continue_at(cl, write_dirty, io->dc->writeback_write_wq);
>>  }
>>
>> +static inline bool keys_contiguous(struct cached_dev *dc,
>> +				   struct keybuf_key *first,
>> +				   struct keybuf_key *second)
>> +{
>> +	if (PTR_CACHE(dc->disk.c, &first->key, 0)->bdev !=
>> +	    PTR_CACHE(dc->disk.c, &second->key, 0)->bdev)
>> +		return false;
>> +
>> +	if (PTR_OFFSET(&second->key, 0) !=
>> +	    (PTR_OFFSET(&first->key, 0) + KEY_SIZE(&first->key)))
>> +		return false;
>> +
>> +	return true;
>> +}
>> +
>>  static void read_dirty(struct cached_dev *dc)
>>  {
>>unsigned delay = 0;
>> -  struct keybuf_key *w;
>> +  struct keybuf_key *next, *keys[5], *w;
>> +  size_t size;
>> +  int nk, i;
>>struct dirty_io *io;
>>struct closure cl;
>>
>> @@ -243,45 +259,84 @@ static void read_dirty(struct cached_dev *dc)
>> * mempools.
>> */
>>
>> -  while (!kthread_should_stop()) {
>> -
>> -  w = 

Re: [PATCH 1/5] bcache: don't write back data if reading it failed

2017-09-27 Thread tang . junhui
From: Tang Junhui 

Hello Lyle:

Two questions:
1) In keys_contiguous(), you judge whether I/Os are contiguous on the cache
device, but not on the backing device. I think you should judge it by the
backing device (remove PTR_CACHE() and use KEY_OFFSET() instead of
PTR_OFFSET()?).

2) I did not see you combine small contiguous I/Os into a big I/O, so I
think it is useful when the writeback rate is low, by avoiding single small
I/O writes, but it makes no difference at a high writeback rate, since
previously the I/Os were also written asynchronously.

---
Tang Junhui

> Previously, there was some logic that attempted to immediately issue
> writeback of backing-contiguous blocks when the writeback rate was
> fast.
> 
> The previous logic did not have any limits on the aggregate size it
> would issue, nor the number of keys it would combine at once.  It
> would also discard the chance to do a contiguous write when the
> writeback rate was low-- e.g. at "background" writeback of target
> rate = 8, it would not combine two adjacent 4k writes and would
> instead seek the disk twice.
> 
> This patch imposes limits and explicitly understands the size of
> contiguous I/O during issue.  It also will combine contiguous I/O
> in all circumstances, not just when writeback is requested to be
> relatively fast.
> 
> It is a win on its own, but also lays the groundwork for skip writes to
> short keys to make the I/O more sequential/contiguous.
> 
> Signed-off-by: Michael Lyle 
> ---
>  drivers/md/bcache/bcache.h|   6 --
>  drivers/md/bcache/writeback.c | 131 ++
>  2 files changed, 93 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index eb83be693d60..da803a3b1981 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -321,12 +321,6 @@ struct cached_dev {
> 	struct bch_ratelimit	writeback_rate;
> 	struct delayed_work	writeback_rate_update;
>
> -	/*
> -	 * Internal to the writeback code, so read_dirty() can keep track of
> -	 * where it's at.
> -	 */
> -	sector_t		last_read;
> -
> 	/* Limit number of writeback bios in flight */
> 	struct semaphore	in_flight;
> 	struct task_struct	*writeback_thread;
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 0b7c89813635..cf29c44605b9 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -229,10 +229,26 @@ static void read_dirty_submit(struct closure *cl)
>continue_at(cl, write_dirty, io->dc->writeback_write_wq);
>  }
>  
> +static inline bool keys_contiguous(struct cached_dev *dc,
> +				   struct keybuf_key *first,
> +				   struct keybuf_key *second)
> +{
> +	if (PTR_CACHE(dc->disk.c, &first->key, 0)->bdev !=
> +	    PTR_CACHE(dc->disk.c, &second->key, 0)->bdev)
> +		return false;
> +
> +	if (PTR_OFFSET(&second->key, 0) !=
> +	    (PTR_OFFSET(&first->key, 0) + KEY_SIZE(&first->key)))
> +		return false;
> +
> +	return true;
> +}
> +
>  static void read_dirty(struct cached_dev *dc)
>  {
>unsigned delay = 0;
> -  struct keybuf_key *w;
> +  struct keybuf_key *next, *keys[5], *w;
> +  size_t size;
> +  int nk, i;
>struct dirty_io *io;
>struct closure cl;
>  
> @@ -243,45 +259,84 @@ static void read_dirty(struct cached_dev *dc)
> * mempools.
> */
>  
> -	while (!kthread_should_stop()) {
> -
> -		w = bch_keybuf_next(&dc->writeback_keys);
> -		if (!w)
> -			break;
> -
> -		BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> -
> -		if (KEY_START(&w->key) != dc->last_read ||
> -		    jiffies_to_msecs(delay) > 50)
> -			while (!kthread_should_stop() && delay)
> -				delay = schedule_timeout_interruptible(delay);
> -
> -		dc->last_read	= KEY_OFFSET(&w->key);
> -
> -		io = kzalloc(sizeof(struct dirty_io) + sizeof(struct bio_vec)
> -			     * DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS),
> -			     GFP_KERNEL);
> -		if (!io)
> -			goto err;
> -
> -

Re: [PATCH 1/5] bcache: don't write back data if reading it failed

2017-09-26 Thread tang . junhui
From: Tang Junhui 

It looks good to me,
I have noted this issue before.
Thanks.

---
Tang Junhui

> If an IO operation fails, and we didn't successfully read data from the
> cache, don't writeback invalid/partial data to the backing disk.
> 
> Signed-off-by: Michael Lyle 
> ---
>  drivers/md/bcache/writeback.c | 19 +--
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index e663ca082183..eea49bf36401 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -179,13 +179,20 @@ static void write_dirty(struct closure *cl)
>struct dirty_io *io = container_of(cl, struct dirty_io, cl);
>struct keybuf_key *w = io->bio.bi_private;
>  
> -	dirty_init(w);
> -	bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> -	io->bio.bi_iter.bi_sector = KEY_START(&w->key);
> -	bio_set_dev(&io->bio, io->dc->bdev);
> -	io->bio.bi_end_io	= dirty_endio;
> +	/* IO errors are signalled using the dirty bit on the key.
> +	 * If we failed to read, we should not attempt to write to the
> +	 * backing device.  Instead, immediately go to write_dirty_finish
> +	 * to clean up.
> +	 */
> +	if (KEY_DIRTY(&w->key)) {
> +		dirty_init(w);
> +		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> +		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
> +		bio_set_dev(&io->bio, io->dc->bdev);
> +		io->bio.bi_end_io	= dirty_endio;
>  
> -	closure_bio_submit(&io->bio, cl);
> +		closure_bio_submit(&io->bio, cl);
> +	}
>  
> 	continue_at(cl, write_dirty_finish, io->dc->writeback_write_wq);
>  }
> --