Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-13 Thread Ming Lei
On Sat, Jan 13, 2018 at 10:45:14PM +0800, Ming Lei wrote: > On Fri, Jan 12, 2018 at 04:55:34PM -0500, Laurence Oberman wrote: > > On Fri, 2018-01-12 at 20:57 +, Bart Van Assche wrote: > > > On Tue, 2018-01-09 at 08:29 -0800, Tejun Heo wrote: > > > > Currently, blk-mq timeout path synchronizes

[PATCH] blk-mq: don't clear RQF_MQ_INFLIGHT in blk_mq_rq_ctx_init()

2018-01-13 Thread Ming Lei
When no I/O scheduler is used, RQF_MQ_INFLIGHT is set in blk_mq_rq_ctx_init(), but commit 7c3fb70f0341 mistakenly clears it, so fix that. This patch fixes a systemd-udevd hang when starting the multipathd service: [ 914.409660] systemd-journald[213]: Successfully sent stream file descriptor to service manager.
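The bug class described here is a flag that is set during request setup and then wiped by a later plain assignment. The following is a generic userspace sketch of that pattern and of a fix. The struct, the flag values and both init functions are hypothetical and are not the kernel's blk_mq_rq_ctx_init().

```c
/* Hypothetical flag values, for illustration only. */
#define RQF_MQ_INFLIGHT_SKETCH (1u << 0)

struct request_sketch {
    unsigned int rq_flags;
};

/* Buggy ordering: the in-flight flag is set, then a later reset clobbers it. */
static void ctx_init_buggy(struct request_sketch *rq, int no_scheduler)
{
    if (no_scheduler)
        rq->rq_flags |= RQF_MQ_INFLIGHT_SKETCH;
    rq->rq_flags = 0; /* wipes the flag set just above */
}

/* Fixed ordering: reset first, then set the flag so it survives init. */
static void ctx_init_fixed(struct request_sketch *rq, int no_scheduler)
{
    rq->rq_flags = 0;
    if (no_scheduler)
        rq->rq_flags |= RQF_MQ_INFLIGHT_SKETCH;
}
```

If the flag is lost, the completion path never sees that the request was counted as in flight. Any accounting keyed on that flag can then get out of balance, which is consistent with the hang the patch reports.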

[PATCH v2 12/12] bcache: stop bcache device when backing device is offline

2018-01-13 Thread Coly Li
Currently bcache does not handle backing device failure: if the backing device is offline and disconnected from the system, its bcache device is still accessible. If the bcache device is in writeback mode, I/O requests can even succeed when they hit on the cache device. That is to say, when and how

[PATCH v2 10/12] bcache: add backing_request_endio() for bi_end_io of attached backing device I/O

2018-01-13 Thread Coly Li
To catch I/O errors on the backing device, a separate bi_end_io callback is required. A per-backing-device counter can then record the number of I/O errors and retire the backing device once the counter reaches a per-backing-device I/O error limit. This patch adds backing_request_endio() to bcache
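The idea of a per-backing-device error counter with a retirement limit can be sketched in userspace C11 atomics. The struct, its fields and backing_endio_sketch() are hypothetical stand-ins for the completion callback the patch adds. They are not bcache's actual backing_request_endio().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-backing-device state; field names are illustrative. */
struct backing_dev_sketch {
    atomic_uint io_errors;    /* I/O errors seen so far */
    unsigned int error_limit; /* retire the device once this many errors occur */
    atomic_bool retired;
};

/* Completion callback sketch: count failed I/Os, retire at the limit. */
static void backing_endio_sketch(struct backing_dev_sketch *dc, int status)
{
    if (status == 0)
        return; /* successful I/O: nothing to record */

    /* fetch_add returns the old value, so +1 gives the new error count */
    unsigned int errors = atomic_fetch_add(&dc->io_errors, 1) + 1;
    if (errors >= dc->error_limit)
        atomic_store(&dc->retired, true);
}
```

An atomic counter is used because completions can run concurrently on several CPUs. Only failed completions advance the count.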

[PATCH v2 06/12] bcache: set error_limit correctly

2018-01-13 Thread Coly Li
Struct cache uses io_errors for two purposes: - Error decay: when the cache set's error_decay is set, io_errors is used to generate a small delay when an I/O error happens. - I/O error counter: in order to generate a large enough value for error decay, the I/O error counter value is stored by left
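The snippet suggests the counter is kept left-shifted so that decay retains fractional precision. Under that assumption, the limit must be compared against the count after shifting it back down. The following is a minimal sketch. The shift width, the decay formula and all names are hypothetical and are not bcache's definitions.

```c
#include <stdint.h>

/* Hypothetical shift width; the real constant may differ. */
#define IO_ERROR_SHIFT_SKETCH 20

/* Record one error: the counter holds errors scaled by 2^shift. */
static unsigned int count_io_error(uint32_t *io_errors)
{
    *io_errors += 1u << IO_ERROR_SHIFT_SKETCH;
    return *io_errors >> IO_ERROR_SHIFT_SKETCH; /* whole errors */
}

/* Decay sketch: keep (decay - 1)/decay of the scaled value; the low
 * bits retain fractional errors. decay must be non-zero. */
static void decay_io_errors(uint32_t *io_errors, uint32_t decay)
{
    *io_errors = (uint32_t)(((uint64_t)*io_errors * (decay - 1)) / decay);
}

/* Compare whole errors against an unshifted limit, i.e. in the same unit. */
static int over_limit(unsigned int errors, unsigned int error_limit)
{
    return errors >= error_limit;
}
```

Setting the limit in scaled units while comparing it with the unscaled count would make the limit effectively 2^shift times too large. That mismatch is the kind of error a patch titled "set error_limit correctly" would address.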

[PATCH v2 07/12] bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags

2018-01-13 Thread Coly Li
When too many I/Os fail on the cache device, bch_cache_set_error() is called in the error handling code path to retire the whole problematic cache set. If new I/O requests keep coming in and taking the refcount dc->count, the cache set won't be retired immediately, which is a problem. Furthermore, there
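A disable flag lets the submission path fail new requests immediately, so they stop pinning a cache set that is being retired. The following userspace sketch shows this. The bit number, struct and helpers are hypothetical and only echo the CACHE_SET_IO_DISABLE name from the subject.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical bit number; illustrative only. */
#define CACHE_SET_IO_DISABLE_BIT_SKETCH 0

struct cache_set_sketch {
    atomic_ulong flags;
    atomic_int inflight; /* stands in for a per-device refcount */
};

/* Error path: mark the set so no further I/O is accepted. */
static void cache_set_disable_io(struct cache_set_sketch *c)
{
    atomic_fetch_or(&c->flags, 1UL << CACHE_SET_IO_DISABLE_BIT_SKETCH);
}

/* Submission sketch: fail fast once the set is being retired, so new
 * requests do not take a reference. This ignores the race between the
 * check and the increment, which a real implementation must handle. */
static int submit_io_sketch(struct cache_set_sketch *c)
{
    if (atomic_load(&c->flags) & (1UL << CACHE_SET_IO_DISABLE_BIT_SKETCH))
        return -EIO;
    atomic_fetch_add(&c->inflight, 1);
    return 0;
}
```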

[PATCH v2 08/12] bcache: stop all attached bcache devices for a retired cache set

2018-01-13 Thread Coly Li
When there are too many I/O errors on the cache device, the current bcache code retires the whole cache set and detaches all bcache devices. But the detached bcache devices are not stopped, which is problematic when bcache is in writeback mode. If the retired cache set has dirty data of backing

[PATCH v2 09/12] bcache: fix inaccurate io state for detached bcache devices

2018-01-13 Thread Coly Li
From: Tang Junhui
When we run I/O on a detached device and run iostat to show the I/O status, it normally shows something like the following (some fields omitted):
Device: ... avgrq-sz avgqu-sz await r_await w_await svctm %util
sdd     ...    15.89     0.53  1.82    0.20    2.23

[PATCH v2 05/12] bcache: stop dc->writeback_rate_update properly

2018-01-13 Thread Coly Li
struct delayed_work writeback_rate_update in struct cached_dev is a delayed worker that periodically calls update_writeback_rate() (the interval is defined by dc->writeback_rate_update_seconds). When a metadata I/O error happens on the cache device, the bcache error handling routine
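A worker that re-arms itself at the end of each run can undo a stop request unless it checks a "running" state before re-arming. The following sketch models that with plain fields instead of real timers. The struct and both functions are hypothetical and do not represent the bcache patch's code.

```c
#include <stdbool.h>

/* Hypothetical model of a self-re-arming periodic worker. */
struct periodic_work_sketch {
    bool running;      /* cleared by the stop path */
    bool queued;       /* whether the next run is scheduled */
    unsigned int runs; /* how many times the body actually ran */
};

/* One invocation of the worker. */
static void rate_update_fn(struct periodic_work_sketch *w)
{
    w->queued = false;
    if (!w->running)  /* stop requested: do not re-arm */
        return;
    w->runs++;        /* ... recompute the writeback rate here ... */
    w->queued = true; /* re-arm for the next period */
}

/* Stop path: the next invocation sees this and will not re-arm. */
static void stop_rate_update(struct periodic_work_sketch *w)
{
    w->running = false;
}
```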

[PATCH v2 04/12] bcache: fix cached_dev->count usage for bch_cache_set_error()

2018-01-13 Thread Coly Li
When bcache metadata I/O fails, bcache calls bch_cache_set_error() to retire the whole cache set. The expected behavior when retiring a cache set is to unregister the cache set and all backing devices attached to it, then remove the sysfs entries of the cache set and all

[PATCH v2 03/12] bcache: set task properly in allocator_wait()

2018-01-13 Thread Coly Li
Kernel thread routine bch_allocator_thread() uses the macro allocator_wait() to wait for a condition, or to quit via do_exit() when kthread_should_stop() is true. Here is the code block: 284 while (1) { \ 285

[PATCH v2 02/12] bcache: properly set task state in bch_writeback_thread()

2018-01-13 Thread Coly Li
Kernel thread routine bch_writeback_thread() has the following code block: 447 down_write(&dc->writeback_lock); 448~450 if (check conditions) { 451 up_write(&dc->writeback_lock); 452 set_current_state(TASK_INTERRUPTIBLE); 453 454 if
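The wait loops in bch_allocator_thread() and bch_writeback_thread() follow the usual "sleep until there is work or we are told to stop" pattern. The following sketch is a userspace analog using a pthread mutex and condition variable. It illustrates the general pattern, not the patch's fix, and the struct and helper names are hypothetical. Here the condition is checked and the thread sleeps under the same lock. The kernel gets the same guarantee by calling set_current_state() before checking the condition: a wakeup that arrives between the check and the sleep is not lost.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical worker state for the userspace analog. */
struct wb_thread_sketch {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    bool has_work;
    bool should_stop;
};

/* Returns true if there is work to do, false if asked to stop. */
static bool wait_for_work_or_stop(struct wb_thread_sketch *t)
{
    bool work;

    pthread_mutex_lock(&t->lock);
    while (!t->has_work && !t->should_stop)
        pthread_cond_wait(&t->wake, &t->lock); /* atomically unlock and sleep */
    work = t->has_work && !t->should_stop;     /* stop takes priority */
    pthread_mutex_unlock(&t->lock);
    return work;
}

/* Producer side: queue work, or request a stop, and wake the thread. */
static void kick(struct wb_thread_sketch *t, bool stop)
{
    pthread_mutex_lock(&t->lock);
    if (stop)
        t->should_stop = true;
    else
        t->has_work = true;
    pthread_cond_signal(&t->wake);
    pthread_mutex_unlock(&t->lock);
}
```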

[PATCH v2 00/12] bcache: device failure handling improvement

2018-01-13 Thread Coly Li
Hi maintainers and folks, This patch set tries to improve bcache device failure handling, including cache device and backing device failures. The basic idea for handling a failed cache device is: - Unregister cache set - Detach all backing devices attached to this cache set - Stop all bcache devices

[PATCH v2 01/12] bcache: set writeback_rate_update_seconds in range [1, 60] seconds

2018-01-13 Thread Coly Li
dc->writeback_rate_update_seconds can be set via sysfs to any value in [1, ULONG_MAX]. It does not make sense to allow such a large value; 60 seconds is long enough, considering that the default of 5 seconds has worked well for a long time. Because dc->writeback_rate_update is a special delayed
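Restricting a sysfs-style numeric attribute to [1, 60] amounts to parsing the written string and clamping the result. The following is a userspace sketch of that. The helper name and constants are hypothetical and are not bcache's sysfs macros. This version clamps out-of-range input instead of rejecting it, and it rejects non-numeric input with -EINVAL.

```c
#include <errno.h>
#include <stdlib.h>

/* Range from the patch subject; the constant names are hypothetical. */
#define WB_RATE_UPDATE_SECS_MIN_SKETCH 1UL
#define WB_RATE_UPDATE_SECS_MAX_SKETCH 60UL

/* Parse a decimal string such as "5\n" and clamp it to [1, 60]. */
static int store_rate_update_seconds(const char *buf, unsigned long *out)
{
    char *end;

    errno = 0;
    unsigned long v = strtoul(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -EINVAL; /* overflow or no digits at all */

    if (v < WB_RATE_UPDATE_SECS_MIN_SKETCH)
        v = WB_RATE_UPDATE_SECS_MIN_SKETCH;
    if (v > WB_RATE_UPDATE_SECS_MAX_SKETCH)
        v = WB_RATE_UPDATE_SECS_MAX_SKETCH;
    *out = v;
    return 0;
}
```

For example, writing "0" yields 1, "1000" yields 60, and "abc" is rejected. The trailing newline that `echo` appends is tolerated because the parser stops at the first non-digit.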

Re: [PATCH v2 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device

2018-01-13 Thread Ming Lei
On Fri, Jan 12, 2018 at 10:45:57PM +, Bart Van Assche wrote: > On Thu, 2018-01-11 at 10:23 +0800, Ming Lei wrote: > > > not sufficient to prevent .queuecommand() calls from scsi_send_eh_cmnd(). > > > > Given it is error handling, do we need to prevent the .queuecommand() call > > in

Re: [PATCH v2 4/4] IB/srp: Fix a sleep-in-invalid-context bug

2018-01-13 Thread Ming Lei
On Wed, Jan 10, 2018 at 10:18:17AM -0800, Bart Van Assche wrote: > The previous two patches guarantee that srp_queuecommand() does not get > invoked while reconnecting occurs. Hence remove the code from > srp_queuecommand() that prevents command queueing while reconnecting. > This patch avoids

Re: [PATCH V3 0/5] dm-rq: improve sequential I/O performance

2018-01-13 Thread Mike Snitzer
On Fri, Jan 12 2018 at 8:37pm -0500, Mike Snitzer wrote: > On Fri, Jan 12 2018 at 8:00pm -0500, > Bart Van Assche wrote: > > > On Fri, 2018-01-12 at 19:52 -0500, Mike Snitzer wrote: > > > It was 50 ms before it was 100 ms. No real explanation for

Re: [PATCH V3 0/5] dm-rq: improve sequential I/O performance

2018-01-13 Thread Mike Snitzer
On Sat, Jan 13 2018 at 10:04am -0500, Ming Lei wrote: > On Fri, Jan 12, 2018 at 05:31:17PM -0500, Mike Snitzer wrote: > > > > Ming or Jens: might you be able to shed some light on how dm-mpath > > would/could set BLK_MQ_S_SCHED_RESTART? A new function added that can > >

Re: [PATCH V3 0/5] dm-rq: improve sequential I/O performance

2018-01-13 Thread Ming Lei
On Fri, Jan 12, 2018 at 05:31:17PM -0500, Mike Snitzer wrote: > On Fri, Jan 12 2018 at 1:54pm -0500, > Bart Van Assche wrote: > > > On Fri, 2018-01-12 at 13:06 -0500, Mike Snitzer wrote: > > > OK, you have the stage: please give me a pointer to your best > > >

Re: [PATCHSET v5] blk-mq: reimplement timeout handling

2018-01-13 Thread Ming Lei
On Fri, Jan 12, 2018 at 04:55:34PM -0500, Laurence Oberman wrote: > On Fri, 2018-01-12 at 20:57 +, Bart Van Assche wrote: > > On Tue, 2018-01-09 at 08:29 -0800, Tejun Heo wrote: > > > Currently, blk-mq timeout path synchronizes against the usual > > > issue/completion path using a complex

Re: [PATCH V3 0/5] dm-rq: improve sequential I/O performance

2018-01-13 Thread Ming Lei
On Fri, Jan 12, 2018 at 06:54:49PM +, Bart Van Assche wrote: > On Fri, 2018-01-12 at 13:06 -0500, Mike Snitzer wrote: > > OK, you have the stage: please give me a pointer to your best > > explanation of the several. > > Since the previous discussion about this topic occurred more than a

Re: [PATCH BUGFIX/IMPROVEMENT 0/2] block, bfq: two pending patches

2018-01-13 Thread Oleksandr Natalenko
Hi. 13.01.2018 12:05, Paolo Valente wrote: Hi Jens, here are again the two pending patches you asked me to resend [1]. One of them, fixing read-starvation problems, was accompanied by a cover letter. I'm pasting the content of that cover letter below. The patch addresses (serious) starvation

Re: [for-4.16 PATCH v6 2/4] block: properly protect the 'queue' kobj in blk_unregister_queue

2018-01-13 Thread Ming Lei
On Fri, Jan 12, 2018 at 11:03:52AM -0500, Mike Snitzer wrote: > The original commit e9a823fb34a8b (block: fix warning when I/O elevator > is changed as request_queue is being removed) is pretty conflated. > "conflated" because the resource being protected by q->sysfs_lock isn't > the queue_flags

[PATCH BUGFIX/IMPROVEMENT 1/2] block, bfq: limit tags for writes and async I/O

2018-01-13 Thread Paolo Valente
Asynchronous I/O can easily starve synchronous I/O (both sync reads and sync writes) by consuming all request tags. Similarly, storms of synchronous writes, such as those that sync(2) may trigger, can starve synchronous reads. In their turn, these two problems may also cause BFQ to lose control
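Capping how many tags each I/O class may hold leaves tags free for synchronous reads even during async or sync-write storms. The following sketch shows one way to express per-class caps. The fractions used (one half for async, three quarters for sync writes, everything for sync reads) are assumptions chosen for illustration, and the enum and helpers are hypothetical. They are not BFQ's code.

```c
/* Hypothetical I/O classes for the sketch. */
enum io_class_sketch { IO_SYNC_READ, IO_SYNC_WRITE, IO_ASYNC };

/* Maximum tags a class may hold; never less than 1 so it can progress. */
static unsigned int tag_depth_limit(unsigned int nr_tags, enum io_class_sketch cls)
{
    unsigned int limit;

    switch (cls) {
    case IO_ASYNC:
        limit = nr_tags / 2;     /* assumed: async gets half the tags */
        break;
    case IO_SYNC_WRITE:
        limit = nr_tags * 3 / 4; /* assumed: sync writes get three quarters */
        break;
    default:
        return nr_tags;          /* sync reads may use every tag */
    }
    return limit ? limit : 1;
}

/* May a request of this class take another tag right now? */
static int may_allocate_tag(unsigned int in_use_by_class, unsigned int nr_tags,
                            enum io_class_sketch cls)
{
    return in_use_by_class < tag_depth_limit(nr_tags, cls);
}
```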

[PATCH BUGFIX/IMPROVEMENT 2/2] block, bfq: limit sectors served with interactive weight raising

2018-01-13 Thread Paolo Valente
To maximise responsiveness, BFQ raises the weight, and performs device idling, for bfq_queues associated with processes deemed as interactive. In particular, weight raising has a maximum duration, equal to the time needed to start a large application. If a weight-raised process goes on doing I/O
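Limiting both the duration and the amount of service of weight raising can be sketched as a check that ends raising when either budget is exhausted. The struct, field names and thresholds below are hypothetical and do not represent BFQ's implementation. The sketch also assumes now_ms never goes backwards relative to wr_start_ms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue weight-raising state. */
struct wr_queue_sketch {
    uint64_t wr_start_ms;    /* when weight raising began */
    uint64_t wr_max_ms;      /* maximum weight-raising duration */
    uint64_t sectors_served; /* sectors served while raised */
    uint64_t wr_max_sectors; /* hypothetical cap on sectors served */
    bool weight_raised;
};

/* Account served sectors; stop raising on timeout or when the sector cap is hit. */
static void wr_account_and_check(struct wr_queue_sketch *q, uint64_t now_ms,
                                 uint64_t sectors)
{
    if (!q->weight_raised)
        return;
    q->sectors_served += sectors;
    if (now_ms - q->wr_start_ms >= q->wr_max_ms ||
        q->sectors_served >= q->wr_max_sectors)
        q->weight_raised = false;
}
```

The sector cap ends raising early for a process that keeps doing heavy I/O while still inside the time window.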

[PATCH BUGFIX/IMPROVEMENT 0/2] block, bfq: two pending patches

2018-01-13 Thread Paolo Valente
Hi Jens, here are again the two pending patches you asked me to resend [1]. One of them, fixing read-starvation problems, was accompanied by a cover letter. I'm pasting the content of that cover letter below. The patch addresses (serious) starvation problems caused by request-tag exhaustion, as

Re: [PATCH v2] delayacct: Account blkio completion on the correct task

2018-01-13 Thread Balbir Singh
On Mon, Dec 18, 2017 at 9:45 PM, Josh Snyder wrote: > Before commit e33a9bba85a8 ("sched/core: move IO scheduling accounting from > io_schedule_timeout() into scheduler"), delayacct_blkio_end was called after > context-switching into the task which completed I/O. This resulted
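The subject describes charging a block-I/O delay to the task that actually waited, not to whichever task is running when the I/O completes. The following sketch shows that idea. The struct and both helpers are hypothetical and are not the delayacct API. The end function takes the waiting task as an explicit argument, so the delay cannot be charged to an unrelated task.

```c
#include <stdint.h>

/* Hypothetical per-task delay-accounting fields. */
struct task_sketch {
    uint64_t blkio_start_ns;
    uint64_t blkio_delay_total_ns;
};

/* Called when the task starts waiting for block I/O. */
static void blkio_start(struct task_sketch *t, uint64_t now_ns)
{
    t->blkio_start_ns = now_ns;
}

/* Charge the elapsed wait to the task passed in, i.e. the one that waited. */
static void blkio_end(struct task_sketch *t, uint64_t now_ns)
{
    t->blkio_delay_total_ns += now_ns - t->blkio_start_ns;
}
```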