Re: [PATCH 1/5] bcache: don't write back data if reading it failed
Ah-- re #1 -- I was investigating earlier why not as much was combined as I thought should be when idle. This is surely a factor. Thanks for the catch-- KEY_OFFSET is correct. I will fix and retest. (Under heavy load, the correct thing still happens, but not under light or intermediate load0. About #2-- I wanted to attain a bounded amount of "combining" of operations. If we have 5 4k extents in a row to dispatch, it seems really wasteful to issue them as 5 IOs 60ms apart, which the existing code would be willing to do-- I'd rather do a 20k write IO (basically the same cost as a 4k write IO) and then sleep 300ms. It is dependent on the elevator/IO scheduler merging the requests. At the same time, I'd rather not combine a really large request. It would be really neat to blk_plug the backing device during the write issuance, but that requires further work. Thanks Mike On Tue, Sep 26, 2017 at 11:51 PM,wrote: > From: Tang Junhui > > Hello Lyle: > > Two questions: > 1) In keys_contiguous(), you judge I/O contiguous in cache device, but not > in backing device. I think you should judge it by backing device (remove > PTR_CACHE() and use KEY_OFFSET() instead of PTR_OFFSET()?). > > 2) I did not see you combine samll contiguous I/Os to big I/O, so I think > it is useful when writeback rate was low by avoiding single I/O write, but > have no sense in high writeback rate, since previously it is also write > I/Os asynchronously. > > --- > Tang Junhui > >> Previously, there was some logic that attempted to immediately issue >> writeback of backing-contiguous blocks when the writeback rate was >> fast. >> >> The previous logic did not have any limits on the aggregate size it >> would issue, nor the number of keys it would combine at once. It >> would also discard the chance to do a contiguous write when the >> writeback rate was low-- e.g. at "background" writeback of target >> rate = 8, it would not combine two adjacent 4k writes and would >> instead seek the disk twice. >> >> This patch imposes limits and explicitly understands the size of >> contiguous I/O during issue. It also will combine contiguous I/O >> in all circumstances, not just when writeback is requested to be >> relatively fast. >> >> It is a win on its own, but also lays the groundwork for skip writes to >> short keys to make the I/O more sequential/contiguous. >> >> Signed-off-by: Michael Lyle >> --- >> drivers/md/bcache/bcache.h| 6 -- >> drivers/md/bcache/writeback.c | 131 >> ++ >> 2 files changed, 93 insertions(+), 44 deletions(-) >> >> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h >> index eb83be693d60..da803a3b1981 100644 >> --- a/drivers/md/bcache/bcache.h >> +++ b/drivers/md/bcache/bcache.h >> @@ -321,12 +321,6 @@ struct cached_dev { >>struct bch_ratelimitwriteback_rate; >>struct delayed_work writeback_rate_update; >> >> - /* >> - * Internal to the writeback code, so read_dirty() can keep >> track of >> - * where it's at. >> - */ >> - sector_tlast_read; >> - >>/* Limit number of writeback bios in flight */ >>struct semaphorein_flight; >>struct task_struct *writeback_thread; >> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c >> index 0b7c89813635..cf29c44605b9 100644 >> --- a/drivers/md/bcache/writeback.c >> +++ b/drivers/md/bcache/writeback.c >> @@ -229,10 +229,26 @@ static void read_dirty_submit(struct closure *cl) >>continue_at(cl, write_dirty, io->dc->writeback_write_wq); >> } >> >> +static inline bool keys_contiguous(struct cached_dev *dc, >> + struct keybuf_key *first, struct keybuf_key >> *second) >> +{ >> + if (PTR_CACHE(dc->disk.c, >key, 0)->bdev != >> + PTR_CACHE(dc->disk.c, >> >key, 0)->bdev) >> + return false; >> + >> + if (PTR_OFFSET(>key, 0) != >> + (PTR_OFFSET(>key, 0) + >> KEY_SIZE(>key))) >> + return false; >> + >> + return true; >> +} >> + >> static void read_dirty(struct cached_dev *dc) >> { >>unsigned delay = 0; >> - struct keybuf_key *w; >> + struct keybuf_key *next, *keys[5], *w; >> + size_t size; >> + int nk, i; >>struct dirty_io *io; >>struct closure cl; >> >> @@ -243,45 +259,84 @@ static void read_dirty(struct cached_dev *dc) >> * mempools. >> */ >> >> - while (!kthread_should_stop()) { >> - >> - w =
Re: [PATCH 1/5] bcache: don't write back data if reading it failed
From: Tang JunhuiHello Lyle: Two questions: 1) In keys_contiguous(), you judge I/O contiguous in cache device, but not in backing device. I think you should judge it by backing device (remove PTR_CACHE() and use KEY_OFFSET() instead of PTR_OFFSET()?). 2) I did not see you combine samll contiguous I/Os to big I/O, so I think it is useful when writeback rate was low by avoiding single I/O write, but have no sense in high writeback rate, since previously it is also write I/Os asynchronously. --- Tang Junhui > Previously, there was some logic that attempted to immediately issue > writeback of backing-contiguous blocks when the writeback rate was > fast. > > The previous logic did not have any limits on the aggregate size it > would issue, nor the number of keys it would combine at once. It > would also discard the chance to do a contiguous write when the > writeback rate was low-- e.g. at "background" writeback of target > rate = 8, it would not combine two adjacent 4k writes and would > instead seek the disk twice. > > This patch imposes limits and explicitly understands the size of > contiguous I/O during issue. It also will combine contiguous I/O > in all circumstances, not just when writeback is requested to be > relatively fast. > > It is a win on its own, but also lays the groundwork for skip writes to > short keys to make the I/O more sequential/contiguous. > > Signed-off-by: Michael Lyle > --- > drivers/md/bcache/bcache.h| 6 -- > drivers/md/bcache/writeback.c | 131 > ++ > 2 files changed, 93 insertions(+), 44 deletions(-) > > diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h > index eb83be693d60..da803a3b1981 100644 > --- a/drivers/md/bcache/bcache.h > +++ b/drivers/md/bcache/bcache.h > @@ -321,12 +321,6 @@ struct cached_dev { >struct bch_ratelimitwriteback_rate; >struct delayed_work writeback_rate_update; > > - /* > - * Internal to the writeback code, so read_dirty() can keep > track of > - * where it's at. > - */ > - sector_tlast_read; > - >/* Limit number of writeback bios in flight */ >struct semaphorein_flight; >struct task_struct *writeback_thread; > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c > index 0b7c89813635..cf29c44605b9 100644 > --- a/drivers/md/bcache/writeback.c > +++ b/drivers/md/bcache/writeback.c > @@ -229,10 +229,26 @@ static void read_dirty_submit(struct closure *cl) >continue_at(cl, write_dirty, io->dc->writeback_write_wq); > } > > +static inline bool keys_contiguous(struct cached_dev *dc, > + struct keybuf_key *first, struct keybuf_key > *second) > +{ > + if (PTR_CACHE(dc->disk.c, >key, 0)->bdev != > + PTR_CACHE(dc->disk.c, > >key, 0)->bdev) > + return false; > + > + if (PTR_OFFSET(>key, 0) != > + (PTR_OFFSET(>key, 0) + > KEY_SIZE(>key))) > + return false; > + > + return true; > +} > + > static void read_dirty(struct cached_dev *dc) > { >unsigned delay = 0; > - struct keybuf_key *w; > + struct keybuf_key *next, *keys[5], *w; > + size_t size; > + int nk, i; >struct dirty_io *io; >struct closure cl; > > @@ -243,45 +259,84 @@ static void read_dirty(struct cached_dev *dc) > * mempools. > */ > > - while (!kthread_should_stop()) { > - > - w = bch_keybuf_next(>writeback_keys); > - if (!w) > - break; > - > - BUG_ON(ptr_stale(dc->disk.c, >key, 0)); > - > - if (KEY_START(>key) != dc->last_read || > - jiffies_to_msecs(delay) > 50) > - while (!kthread_should_stop() > && delay) > - delay = > schedule_timeout_interruptible(delay); > - > - dc->last_read = KEY_OFFSET(>key); > - > - io = kzalloc(sizeof(struct dirty_io) + > sizeof(struct bio_vec) > - * > DIV_ROUND_UP(KEY_SIZE(>key), PAGE_SECTORS), > - GFP_KERNEL); > - if (!io) > - goto err; > - > -
Re: [PATCH 1/5] bcache: don't write back data if reading it failed
From: Tang JunhuiIt looks good to me, I have noted this issue before. Thanks. --- Tang Junhui > If an IO operation fails, and we didn't successfully read data from the > cache, don't writeback invalid/partial data to the backing disk. > > Signed-off-by: Michael Lyle > --- > drivers/md/bcache/writeback.c | 19 +-- > 1 file changed, 13 insertions(+), 6 deletions(-) > > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c > index e663ca082183..eea49bf36401 100644 > --- a/drivers/md/bcache/writeback.c > +++ b/drivers/md/bcache/writeback.c > @@ -179,13 +179,20 @@ static void write_dirty(struct closure *cl) >struct dirty_io *io = container_of(cl, struct dirty_io, cl); >struct keybuf_key *w = io->bio.bi_private; > > - dirty_init(w); > - bio_set_op_attrs(>bio, REQ_OP_WRITE, 0); > - io->bio.bi_iter.bi_sector = KEY_START(>key); > - bio_set_dev(>bio, io->dc->bdev); > - io->bio.bi_end_io = dirty_endio; > + /* IO errors are signalled using the dirty bit on the key. > + * If we failed to read, we should not attempt to write to the > + * backing device. Instead, immediately go to > write_dirty_finish > + * to clean up. > + */ > + if (KEY_DIRTY(>key)) { > + dirty_init(w); > + bio_set_op_attrs(>bio, REQ_OP_WRITE, 0); > + io->bio.bi_iter.bi_sector = KEY_START(>key); > + bio_set_dev(>bio, io->dc->bdev); > + io->bio.bi_end_io = dirty_endio; > > - closure_bio_submit(>bio, cl); > + closure_bio_submit(>bio, cl); > + } > >continue_at(cl, write_dirty_finish, > io->dc->writeback_write_wq); > } > --