Re: [PATCH 2/8] blk-mq: protect completion path with RCU

2018-01-08 Thread Holger Hoffstätte
On 01/09/18 00:27, Holger Hoffstätte wrote: > On 01/08/18 23:55, Jens Axboe wrote: >> the good old >> >> int srcu_idx = srcu_idx; >> >> should get the job done. > > (Narrator: It didn't.) Narrator: we retract our previous statement and apologize for the confusion. It works fine when you

Re: [RFC PATCH] blk-throttle: dispatch more sync writes in block throttle layer

2018-01-08 Thread xuejiufei
Hi Tejun, Thanks for your reply. On 2018/1/8 8:07 PM, Tejun Heo wrote: > Hello, > > On Fri, Jan 05, 2018 at 01:16:26PM +0800, xuejiufei wrote: >> From: Jiufei Xue >> >> Cgroup writeback is supported since v4.2. But there exists a problem >> in the following case. >>

Re: [PATCH 1/4] blk-mq: Rename request_queue.mq_freeze_wq into mq_wq

2018-01-08 Thread Hannes Reinecke
On 01/08/2018 07:50 PM, Bart Van Assche wrote: > Rename a waitqueue in struct request_queue since the next patch will > add code that uses this waitqueue outside the request queue freezing > implementation. > > Signed-off-by: Bart Van Assche > Cc: Christoph Hellwig

Re: [PATCH 2/4] block: Introduce blk_wait_if_quiesced() and blk_finish_wait_if_quiesced()

2018-01-08 Thread Hannes Reinecke
On 01/08/2018 07:50 PM, Bart Van Assche wrote: > Introduce functions that allow block drivers to wait while a request > queue is in the quiesced state (blk-mq) or in the stopped state (legacy > block layer). The next patch will add calls to these functions in the > SCSI core. > > Signed-off-by:

Re: [PATCH 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device

2018-01-08 Thread Hannes Reinecke
On 01/08/2018 07:50 PM, Bart Van Assche wrote: > Several SCSI transport and LLD drivers surround code that does not > tolerate concurrent calls of .queuecommand() with scsi_target_block() / > scsi_target_unblock(). These last two functions use > blk_mq_quiesce_queue() / blk_mq_unquiesce_queue()

Re: [PATCH 4/4] IB/srp: Fix a sleep-in-invalid-context bug

2018-01-08 Thread Hannes Reinecke
On 01/08/2018 07:50 PM, Bart Van Assche wrote: > The previous two patches guarantee that srp_queuecommand() does not get > invoked while reconnecting occurs. Hence remove the code from > srp_queuecommand() that prevents command queueing while reconnecting. > This patch avoids that the following

Re: [PATCH 2/8] blk-mq: protect completion path with RCU

2018-01-08 Thread Hannes Reinecke
On 01/08/2018 08:15 PM, Tejun Heo wrote: > Currently, blk-mq protects only the issue path with RCU. This patch > puts the completion path under the same RCU protection. This will be > used to synchronize issue/completion against timeout by later patches, > which will also add the comments. > >

Re: [PATCH] block: Fix kernel-doc warnings reported when building with W=1

2018-01-08 Thread Johannes Thumshirn
Looks good, Reviewed-by: Johannes Thumshirn

Re: [PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2018-01-08 Thread Bart Van Assche
On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote: > @@ -230,6 +232,27 @@ struct request { > > unsigned short write_hint; > > + /* > + * On blk-mq, the lower bits of ->gstate carry the MQ_RQ_* state > + * value and the upper bits the generation number which is > + *

Re: [PATCH 1/5] nvme: Add more command status translation

2018-01-08 Thread Hannes Reinecke
On 01/04/2018 11:46 PM, Keith Busch wrote: > This adds more NVMe status code translations to blk_status_t values, > and captures all the current status codes NVMe multipath uses. > > Signed-off-by: Keith Busch > --- > drivers/nvme/host/core.c | 6 ++ > 1 file changed,

Re: [PATCH 0/5] Failover criteria unification

2018-01-08 Thread Christoph Hellwig
On Thu, Jan 04, 2018 at 04:47:27PM -0700, Keith Busch wrote: > It looks like you can also touch up dm to allow it to multipath nvme > even if CONFIG_NVME_MULTIPATH is set. It may be useful since native NVMe > doesn't multipath namespaces across subsystems, and some crack smoking > people want to

Re: [PATCH 2/5] nvme/multipath: Consult blk_status_t for failover

2018-01-08 Thread Christoph Hellwig
> - if (unlikely(nvme_req(req)->status && nvme_req_needs_retry(req))) { > - if (nvme_req_needs_failover(req)) { > + blk_status_t status = nvme_error_status(req); > + > + if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) { > + if

Re: [PATCH 1/5] nvme: Add more command status translation

2018-01-08 Thread Hannes Reinecke
On 01/08/2018 10:55 AM, Christoph Hellwig wrote: > On Thu, Jan 04, 2018 at 03:46:19PM -0700, Keith Busch wrote: >> This adds more NVMe status code translations to blk_status_t values, >> and captures all the current status codes NVMe multipath uses. >> >> Signed-off-by: Keith Busch

Re: [PATCH 5/5] dm mpath: Use blk_retryable

2018-01-08 Thread Hannes Reinecke
On 01/04/2018 11:46 PM, Keith Busch wrote: > Uses common code for determining if an error should be retried on > alternate path. > > Signed-off-by: Keith Busch > --- > drivers/md/dm-mpath.c | 19 ++- > 1 file changed, 2 insertions(+), 17 deletions(-) >

Re: [PATCH 4/5] nvme/multipath: Use blk_retryable

2018-01-08 Thread Hannes Reinecke
On 01/04/2018 11:46 PM, Keith Busch wrote: > Uses common code for determining if an error should be retried on > alternate path. > > Signed-off-by: Keith Busch > --- > drivers/nvme/host/multipath.c | 14 +- > 1 file changed, 1 insertion(+), 13 deletions(-) >

Re: [PATCH 3/5] block: Provide blk_status_t decoding for retryable errors

2018-01-08 Thread Hannes Reinecke
On 01/04/2018 11:46 PM, Keith Busch wrote: > This patch provides a common decoder for block status that may be retried > so various entities wishing to consult this do not have to duplicate > this decision. > > Signed-off-by: Keith Busch > --- > include/linux/blk_types.h

Re: [PATCH 2/5] nvme/multipath: Consult blk_status_t for failover

2018-01-08 Thread Hannes Reinecke
On 01/04/2018 11:46 PM, Keith Busch wrote: > This removes nvme multipath's specific status decoding to see if failover > is needed, using the generic blk_status_t that was translated earlier. This > abstraction from the raw NVMe status means nvme status decoding exists > in just one place. > >

Re: [PATCH 1/5] nvme: Add more command status translation

2018-01-08 Thread Christoph Hellwig
On Mon, Jan 08, 2018 at 11:09:03AM +0100, Hannes Reinecke wrote: > >>case NVME_SC_SUCCESS: > >>return BLK_STS_OK; > >>case NVME_SC_CAP_EXCEEDED: > >> + case NVME_SC_LBA_RANGE: > >>return BLK_STS_NOSPC; > > > > lba range isn't really enospc. It is returned when

Re: [PATCH 08/10] block: move bio_alloc_pages() to bcache

2018-01-08 Thread Michael Lyle
On 12/08/2017 05:14 AM, Ming Lei wrote: > bcache is the only user of bio_alloc_pages(), and all users should use > bio_add_page() instead, so move this function into bcache, and avoid > it misused in future. Can things like this -please- be sent to the bcache list and bcache maintainers? I'm

Re: [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()

2018-01-08 Thread Jason Gunthorpe
On Mon, Jan 08, 2018 at 11:17:38AM -0700, Logan Gunthorpe wrote: > >>If at all it should be in the dma_map* wrappers, but for that we'd need > >>a good identifier. And it still would not solve the whole fake dma > >>ops issue. > > > >Very long term the IOMMUs under the ops will need to care

Re: [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()

2018-01-08 Thread Christoph Hellwig
On Mon, Jan 08, 2018 at 11:09:17AM -0700, Jason Gunthorpe wrote: > > As usual we implement what actually has a consumer. On top of that the > > R/W API is the only core RDMA API that actually does DMA mapping for the > > ULP at the moment. > > Well again the same can be said for dma_map_page vs

Re: [PATCH v3] bcache: fix writeback target calc on large devices

2018-01-08 Thread Michael Lyle
Tang Junhui-- Thanks for your feedback, help, and flexibility. I will try to make this better overall in the long term. On Sun, Jan 7, 2018 at 6:46 PM, wrote: > OK, please replace 16384 with macro, and replace 16384 * > bdev_sectors(dc->bdev) > with bit shift

Re: [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()

2018-01-08 Thread Christoph Hellwig
On Thu, Jan 04, 2018 at 09:50:31PM -0700, Jason Gunthorpe wrote: > Well that argument applies equally to the RDMA RW API wrappers around > the DMA API. I think it is fine if sgl are defined to only have P2P or > not, and that debugging support seemed reasonable to me.. > > > It's also very

Re: [PATCH 1/5] nvme: Add more command status translation

2018-01-08 Thread Mike Snitzer
On Mon, Jan 08 2018 at 5:19am -0500, Christoph Hellwig wrote: > On Mon, Jan 08, 2018 at 11:09:03AM +0100, Hannes Reinecke wrote: > > >> case NVME_SC_SUCCESS: > > >> return BLK_STS_OK; > > >> case NVME_SC_CAP_EXCEEDED: > > >> +case

[PATCH 3/4] scsi: Avoid that .queuecommand() gets called for a quiesced SCSI device

2018-01-08 Thread Bart Van Assche
Several SCSI transport and LLD drivers surround code that does not tolerate concurrent calls of .queuecommand() with scsi_target_block() / scsi_target_unblock(). These last two functions use blk_mq_quiesce_queue() / blk_mq_unquiesce_queue() for scsi-mq request queues to prevent concurrent

[PATCH 0/4] Make SCSI transport recovery more robust

2018-01-08 Thread Bart Van Assche
Hello Jens, A longstanding issue with the SCSI core is that several SCSI transport drivers use scsi_target_block() and scsi_target_unblock() to avoid concurrent .queuecommand() calls during e.g. transport recovery but that this is not sufficient to protect from such calls. Hence this patch

Re: [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()

2018-01-08 Thread Jason Gunthorpe
On Mon, Jan 08, 2018 at 07:34:34PM +0100, Christoph Hellwig wrote: > > > > And on that topic, does this scheme work with HFI? > > > > > > No, and I guess we need an opt-out. HFI generally seems to be > > > extremely weird. > > > > This series needs some kind of fix so HFI, QIB, rxe, etc don't

[PATCH 4/4] IB/srp: Fix a sleep-in-invalid-context bug

2018-01-08 Thread Bart Van Assche
The previous two patches guarantee that srp_queuecommand() does not get invoked while reconnecting occurs. Hence remove the code from srp_queuecommand() that prevents command queueing while reconnecting. This patch avoids that the following can appear in the kernel log: BUG: sleeping function

[PATCH 1/4] blk-mq: Rename request_queue.mq_freeze_wq into mq_wq

2018-01-08 Thread Bart Van Assche
Rename a waitqueue in struct request_queue since the next patch will add code that uses this waitqueue outside the request queue freezing implementation. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Johannes

Re: [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()

2018-01-08 Thread Christoph Hellwig
On Mon, Jan 08, 2018 at 11:44:41AM -0700, Logan Gunthorpe wrote: >> Think about what the dma mapping routines do: >> >> (a) translate from host address to bus addresses >> >> and >> >> (b) flush caches (in non-coherent architectures) >> >> Both are obviously not needed for P2P transfers, as

[PATCH 8/8] blk-mq: rename blk_mq_hw_ctx->queue_rq_srcu to ->srcu

2018-01-08 Thread Tejun Heo
The RCU protection has been expanded to cover both queueing and completion paths making ->queue_rq_srcu a misnomer. Rename it to ->srcu as suggested by Bart. Signed-off-by: Tejun Heo Cc: Bart Van Assche --- block/blk-mq.c | 14 +++---

[PATCH 5/8] blk-mq: make blk_abort_request() trigger timeout path

2018-01-08 Thread Tejun Heo
With issue/complete and timeout paths now using the generation number and state based synchronization, blk_abort_request() is the only one which depends on REQ_ATOM_COMPLETE for arbitrating completion. There's no reason for blk_abort_request() to be a completely separate path. This patch makes

Re: [PATCH 1/8] blk-mq: move hctx lock/unlock into a helper

2018-01-08 Thread Bart Van Assche
On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote: > +static void hctx_unlock(struct blk_mq_hw_ctx *hctx, int srcu_idx) > +{ > + if (!(hctx->flags & BLK_MQ_F_BLOCKING)) > + rcu_read_unlock(); > + else > + srcu_read_unlock(hctx->queue_rq_srcu, srcu_idx); > +} > + >

[PATCH 6/8] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq

2018-01-08 Thread Tejun Heo
After the recent updates to use generation number and state based synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except to avoid firing the same timeout multiple times. Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same

Re: [GIT PULL] nvme fixes for Linux 4.16

2018-01-08 Thread Jens Axboe
On 1/8/18 11:52 AM, Christoph Hellwig wrote: > Hi Jens, > > below are the pending nvme updates for Linux 4.16. Just fixes and > cleanups from various contributors this time around. > > The following changes since commit fb350e0ad99359768e1e80b4784692031ec340e4: > > blk-mq: fix race between

Re: [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()

2018-01-08 Thread Jason Gunthorpe
On Mon, Jan 8, 2018 at 11:57 AM, Christoph Hellwig wrote: >> (c) setup/manage any security permissions on mappings >> Which P2P may at some point be concerned with. > > Maybe once root complexes with iommus actually support P2P. But until > then we have a lot more more important

Re: [PATCH 2/8] blk-mq: protect completion path with RCU

2018-01-08 Thread Holger Hoffstätte
On 01/08/18 20:15, Tejun Heo wrote: > Currently, blk-mq protects only the issue path with RCU. This patch > puts the completion path under the same RCU protection. This will be > used to synchronize issue/completion against timeout by later patches, > which will also add the comments. > >

[GIT PULL] nvme fixes for Linux 4.16

2018-01-08 Thread Christoph Hellwig
Hi Jens, below are the pending nvme updates for Linux 4.16. Just fixes and cleanups from various contributors this time around. The following changes since commit fb350e0ad99359768e1e80b4784692031ec340e4: blk-mq: fix race between updating nr_hw_queues and switching io sched (2018-01-06

[PATCH 7/8] blk-mq: remove REQ_ATOM_STARTED

2018-01-08 Thread Tejun Heo
After the recent updates to use generation number and state based synchronization, we can easily replace REQ_ATOM_STARTED usages by adding an extra state to distinguish completed but not yet freed state. Add MQ_RQ_COMPLETE and replace REQ_ATOM_STARTED usages with blk_mq_rq_state() tests.

[PATCH 4/8] blk-mq: use blk_mq_rq_state() instead of testing REQ_ATOM_COMPLETE

2018-01-08 Thread Tejun Heo
blk_mq_check_inflight() and blk_mq_poll_hybrid_sleep() test REQ_ATOM_COMPLETE to determine the request state. Both uses are speculative and we can test REQ_ATOM_STARTED and blk_mq_rq_state() for equivalent results. Replace the tests. This will allow removing REQ_ATOM_COMPLETE usages from

[PATCH 2/8] blk-mq: protect completion path with RCU

2018-01-08 Thread Tejun Heo
Currently, blk-mq protects only the issue path with RCU. This patch puts the completion path under the same RCU protection. This will be used to synchronize issue/completion against timeout by later patches, which will also add the comments. Signed-off-by: Tejun Heo ---

[PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2018-01-08 Thread Tejun Heo
Currently, blk-mq timeout path synchronizes against the usual issue/completion path using a complex scheme involving atomic bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence rules. Unfortunately, it contains quite a few holes. There's a complex dancing around REQ_ATOM_STARTED and

[PATCHSET v4] blk-mq: reimplement timeout handling

2018-01-08 Thread Tejun Heo
Hello, Changes from [v3] - Rebased on top of for-4.16/block. - Integrated Jens's hctx_[un]lock() factoring patch and refreshed the patches accordingly. - Added comment explaining the use of hctx_lock() instead of rcu_read_lock() in completion path. Changes from [v2] - Possible extended

Re: [416 PATCH 00/13] Bcache changes for 4.16

2018-01-08 Thread Jens Axboe
On 1/8/18 1:21 PM, Michael Lyle wrote: > Jens, > > Please pick up the following reviewed changes for 4.16. There's some > small cleanliness changes, a few minor bug fixes (some in preparation > for larger work), and ongoing work on writeback performance: > > 11 files changed, 303

Re: [PATCH 2/8] blk-mq: protect completion path with RCU

2018-01-08 Thread Jens Axboe
On 1/8/18 12:57 PM, Holger Hoffstätte wrote: > On 01/08/18 20:15, Tejun Heo wrote: >> Currently, blk-mq protects only the issue path with RCU. This patch >> puts the completion path under the same RCU protection. This will be >> used to synchronize issue/completion against timeout by later

[416 PATCH 00/13] Bcache changes for 4.16

2018-01-08 Thread Michael Lyle
Jens, Please pick up the following reviewed changes for 4.16. There's some small cleanliness changes, a few minor bug fixes (some in preparation for larger work), and ongoing work on writeback performance: 11 files changed, 303 insertions(+), 133 deletions(-) Coly Li (2): bcache: reduce

[416 PATCH 03/13] bcache: Use PTR_ERR_OR_ZERO()

2018-01-08 Thread Michael Lyle
From: Vasyl Gomonovych Fix ptr_ret.cocci warnings: drivers/md/bcache/btree.c:1800:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Signed-off-by: Vasyl Gomonovych

[416 PATCH 06/13] bcache: writeback: properly order backing device IO

2018-01-08 Thread Michael Lyle
Writeback keys are presently iterated and dispatched for writeback in order of the logical block address on the backing device. Multiple may be, in parallel, read from the cache device and then written back (especially when there are contiguous I/O). However-- there was no guarantee with the

[416 PATCH 01/13] bcache: ret IOERR when read meets metadata error

2018-01-08 Thread Michael Lyle
From: Rui Hua The read request might meet error when searching the btree, but the error was not handled in cache_lookup(), and this kind of metadata failure will not go into cached_dev_read_error(), finally, the upper layer will receive bi_status=0. In this patch we judge

[416 PATCH 08/13] bcache: Fix, improve efficiency of closure_sync()

2018-01-08 Thread Michael Lyle
From: Kent Overstreet Eliminates cases where sync can race and fail to complete / get stuck. Removes many status flags and simplifies entering-and-exiting closure sleeping behaviors. [mlyle: fixed conflicts due to changed return behavior in mainline. extended commit

[416 PATCH 05/13] bcache: fix wrong return value in bch_debug_init()

2018-01-08 Thread Michael Lyle
From: Tang Junhui In bch_debug_init(), ret is always 0, so the return value is useless. Change it to return 0 on success after calling debugfs_create_dir(), else return a non-zero value. Signed-off-by: Tang Junhui Reviewed-by: Michael Lyle

[416 PATCH 13/13] bcache: fix writeback target calc on large devices

2018-01-08 Thread Michael Lyle
Bcache needs to scale the dirty data in the cache over the multiple backing disks in order to calculate writeback rates for each. The previous code did this by multiplying the target number of dirty sectors by the backing device size, and expected it to fit into a uint64_t; this blows up on

[416 PATCH 10/13] bcache: fix unmatched generic_end_io_acct() & generic_start_io_acct()

2018-01-08 Thread Michael Lyle
From: Zhai Zhaoxuan The function cached_dev_make_request() and flash_dev_make_request() call generic_start_io_acct() with (struct bcache_device)->disk when they start a closure. Then the function bio_complete() calls generic_end_io_acct() with (struct

[416 PATCH 02/13] bcache: stop writeback thread after detaching

2018-01-08 Thread Michael Lyle
From: Tang Junhui Currently, when a cached device is detached from the cache, the writeback thread is not stopped, and the writeback_rate_update work is not canceled. For example, after the following command: echo 1 >/sys/block/sdb/bcache/detach you can still see the writeback thread.

[416 PATCH 04/13] bcache: segregate flash only volume write streams

2018-01-08 Thread Michael Lyle
From: Tang Junhui In such scenario that there are some flash only volumes, and some cached devices, when many tasks request these devices in writeback mode, the write IOs may fall to the same bucket as below: | cached data | flash data | cached data | cached data| flash

[416 PATCH 07/13] bcache: allow quick writeback when backing idle

2018-01-08 Thread Michael Lyle
If the control system would wait for at least half a second, and there's been no reqs hitting the backing disk for awhile: use an alternate mode where we have at most one contiguous set of writebacks in flight at a time. (But don't otherwise delay). If front-end IO appears, it will still be

[416 PATCH 11/13] bcache: reduce cache_set devices iteration by devices_max_used

2018-01-08 Thread Michael Lyle
From: Coly Li Member devices of struct cache_set is used to reference all attached bcache devices to this cache set. If it is treated as array of pointers, size of devices[] is indicated by member nr_uuids of struct cache_set. nr_uuids is calculated in

[416 PATCH 09/13] bcache: mark closure_sync() __sched

2018-01-08 Thread Michael Lyle
From: Kent Overstreet [edit by mlyle: include sched/debug.h to get __sched] Signed-off-by: Kent Overstreet Signed-off-by: Michael Lyle Reviewed-by: Michael Lyle --- drivers/md/bcache/closure.c | 3 ++- 1

[416 PATCH 12/13] bcache: fix misleading error message in bch_count_io_errors()

2018-01-08 Thread Michael Lyle
From: Coly Li Bcache only does recoverable I/O for read operations by calling cached_dev_read_error(). For write operations there is no I/O recovery for failed requests. But in bch_count_io_errors() no matter read or write I/Os, before errors counter reaches io error limit,

Re: [PATCH V4 13/45] block: blk-merge: try to make front segments in full size

2018-01-08 Thread Dmitry Osipenko
> @@ -393,6 +433,10 @@ __blk_segment_map_sg(struct request_queue *q, struct > bio_vec *bvec, > > sg_set_page(*sg, bvec->bv_page, nbytes, bvec->bv_offset); > (*nsegs)++; > + > + /* for making iterator happy */ > + b

Re: [PATCH 3/8] blk-mq: replace timeout synchronization with a RCU and generation based scheme

2018-01-08 Thread Bart Van Assche
On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote: > +static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate) > +{ > + unsigned long flags; > + > + local_irq_save(flags); > + u64_stats_update_begin(&rq->aborted_gstate_sync); > + rq->aborted_gstate = gstate; > +

Re: [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()

2018-01-08 Thread Jason Gunthorpe
On Mon, Jan 08, 2018 at 03:59:01PM +0100, Christoph Hellwig wrote: > On Thu, Jan 04, 2018 at 09:50:31PM -0700, Jason Gunthorpe wrote: > > Well that argument applies equally to the RDMA RW API wrappers around > > the DMA API. I think it is fine if sgl are defined to only have P2P or > > not, and

Re: [PATCH 06/12] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]()

2018-01-08 Thread Logan Gunthorpe
On 08/01/18 11:09 AM, Jason Gunthorpe wrote: It could, if we had a DMA op for p2p then the drivers that provide their own ops can implement it appropriately or not at all. I was thinking of doing something like this. I'll probably rough out a patch and send it along today or tomorrow. If

Re: [PATCH 4/8] blk-mq: use blk_mq_rq_state() instead of testing REQ_ATOM_COMPLETE

2018-01-08 Thread Bart Van Assche
On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote: > blk_mq_check_inflight() and blk_mq_poll_hybrid_sleep() test > REQ_ATOM_COMPLETE to determine the request state. Both uses are > speculative and we can test REQ_ATOM_STARTED and blk_mq_rq_state() for > equivalent results. Replace the tests.

Re: [PATCH 2/8] blk-mq: protect completion path with RCU

2018-01-08 Thread Jens Axboe
On 1/8/18 1:15 PM, Jens Axboe wrote: > On 1/8/18 12:57 PM, Holger Hoffstätte wrote: >> On 01/08/18 20:15, Tejun Heo wrote: >>> Currently, blk-mq protects only the issue path with RCU. This patch >>> puts the completion path under the same RCU protection. This will be >>> used to synchronize

Re: [PATCH 2/8] blk-mq: protect completion path with RCU

2018-01-08 Thread Holger Hoffstätte
On 01/08/18 23:55, Jens Axboe wrote: > On 1/8/18 1:15 PM, Jens Axboe wrote: >> On 1/8/18 12:57 PM, Holger Hoffstätte wrote: >>> On 01/08/18 20:15, Tejun Heo wrote: Currently, blk-mq protects only the issue path with RCU. This patch puts the completion path under the same RCU protection.

Re: [PATCH 5/8] blk-mq: make blk_abort_request() trigger timeout path

2018-01-08 Thread Bart Van Assche
On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote: > @@ -156,12 +156,12 @@ void blk_timeout_work(struct work_struct *work) > */ > void blk_abort_request(struct request *req) > { > - if (blk_mark_rq_complete(req)) > - return; > - > if (req->q->mq_ops) { > -

Re: [PATCH] bcache: fix inaccurate io state for detached bcache devices

2018-01-08 Thread Coly Li
On 09/01/2018 10:27 AM, tang.jun...@zte.com.cn wrote: > From: Tang Junhui > > When we run IO in a detached device, and run iostat to show IO status, > normally it will show like below (Omitted some fields): > Device: ... avgrq-sz avgqu-sz await r_await w_await svctm

Re: [PATCH 6/8] blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq

2018-01-08 Thread Bart Van Assche
On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote: > After the recent updates to use generation number and state based > synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except > to avoid firing the same timeout multiple times. > > Remove all REQ_ATOM_COMPLETE usages and use a new

Re: [PATCH V4 13/45] block: blk-merge: try to make front segments in full size

2018-01-08 Thread Ming Lei
advance = queue_max_segment_size(q) - (*sg)->length; > > + if (advance) { > > + (*sg)->length += advance; > > + bvec->bv_offset += advance; > > + bvec->bv_le

Re: [PATCH v2 3/3] scsi-mq-debugfs: Show more information

2018-01-08 Thread Martin K. Petersen
Bart, > Show the request result, request timeout and SCSI command flags. > This information is very helpful when trying to figure out why a > queue got stuck. An example of the information that is exported > through debugfs: Applied to 4.16/scsi-fixes, thanks. -- Martin K. Petersen

Re: [PATCH 7/8] blk-mq: remove REQ_ATOM_STARTED

2018-01-08 Thread Bart Van Assche
On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote: > After the recent updates to use generation number and state based > synchronization, we can easily replace REQ_ATOM_STARTED usages by > adding an extra state to distinguish completed but not yet freed > state. > > Add MQ_RQ_COMPLETE and

Re: [PATCH 8/8] blk-mq: rename blk_mq_hw_ctx->queue_rq_srcu to ->srcu

2018-01-08 Thread Bart Van Assche
On Mon, 2018-01-08 at 11:15 -0800, Tejun Heo wrote: > The RCU protection has been expanded to cover both queueing and > completion paths making ->queue_rq_srcu a misnomer. Rename it to > ->srcu as suggested by Bart. Reviewed-by: Bart Van Assche

[PATCH] bcache: fix inaccurate io state for detached bcache devices

2018-01-08 Thread tang . junhui
From: Tang Junhui When we run IO in a detached device, and run iostat to show IO status, normally it will show like below (Omitted some fields): Device: ... avgrq-sz avgqu-sz await r_await w_await svctm %util sdd ... 15.89 0.53 1.82 0.20 2.23

Re: [PATCH V9 0/7] blk-mq support for ZBC disks

2018-01-08 Thread Martin K. Petersen
Jens, > Completely up to you - I already have 1-5, I can add 6/7 as well, or > just can do it in your tree. Let me know what you prefer. Started my 4.16/scsi-fixes branch early based on your tree. I queued these two up.

Re: [GIT PULL 24/25] lightnvm: pblk: add iostat support

2018-01-08 Thread Matias Bjørling
On Mon, Jan 8, 2018 at 1:53 PM, Javier González wrote: >> On 8 Jan 2018, at 12.54, Christoph Hellwig wrote: >> >> On Fri, Jan 05, 2018 at 07:33:36PM +0100, Matias Bjørling wrote: >>> On 01/05/2018 04:42 PM, Jens Axboe wrote: On Fri, Jan 05 2018, Matias

Re: [PATCH 0/4] mylex: Replace DAC960 block driver

2018-01-08 Thread Christoph Hellwig
Btw, did you manage to get any further with these new drivers?

Re: [PATCH v1 01/10] bcache: exit bch_writeback_thread() with proper task state

2018-01-08 Thread Coly Li
On 08/01/2018 3:09 PM, Hannes Reinecke wrote: > On 01/03/2018 03:03 PM, Coly Li wrote: >> Kernel thread routine bch_writeback_thread() has the following code block, >> >> 452 set_current_state(TASK_INTERRUPTIBLE); >> 453 >> 454 if

Re: [PATCH v3 RESEND 1/2] blk-throttle: track read and write request individually

2018-01-08 Thread Joseph Qi
A polite ping for the two pending patches... Thanks, Joseph On 17/11/24 13:13, Jens Axboe wrote: > On 11/23/2017 06:31 PM, Joseph Qi wrote: >> Hi Jens, >> Could you please give your advice for the two patches or pick them up if >> you think they are good? > > It looks OK to me, but my

is a bug in blk-throttle?

2018-01-08 Thread yuyufen
Hi, all I test blk throttle in linux.4.15-rc6, and find that, system can't throttle IOs accurately, when io.max is set a small value. I make a directory 'mytest' in cgroup2 and set /dev/vdc io.max to 1M/s. Device /dev/vdc(253:32) is mounted on /ext4 with type ext4. $ echo "253:32 wbps=1048576"

Re: [GIT PULL 24/25] lightnvm: pblk: add iostat support

2018-01-08 Thread Christoph Hellwig
On Fri, Jan 05, 2018 at 07:33:36PM +0100, Matias Bjørling wrote: > On 01/05/2018 04:42 PM, Jens Axboe wrote: > > On Fri, Jan 05 2018, Matias Bjørling wrote: > > > From: Javier González > > > > > > Since pblk registers its own block device, the iostat accounting is > > > not

Re: [RFC PATCH] blk-throttle: dispatch more sync writes in block throttle layer

2018-01-08 Thread Tejun Heo
Hello, On Fri, Jan 05, 2018 at 01:16:26PM +0800, xuejiufei wrote: > From: Jiufei Xue > > Cgroup writeback is supported since v4.2. But there exists a problem > in the following case. > > A cgroup may send both buffer and direct/sync IOs. The foreground > thread will

Re: [GIT PULL 24/25] lightnvm: pblk: add iostat support

2018-01-08 Thread Javier González
> On 8 Jan 2018, at 12.54, Christoph Hellwig wrote: > > On Fri, Jan 05, 2018 at 07:33:36PM +0100, Matias Bjørling wrote: >> On 01/05/2018 04:42 PM, Jens Axboe wrote: >>> On Fri, Jan 05 2018, Matias Bjørling wrote: From: Javier González

Re: [PATCH 1/5] nvme: Add more command status translation

2018-01-08 Thread Christoph Hellwig
On Mon, Jan 08, 2018 at 10:29:33AM -0500, Mike Snitzer wrote: > No argument needed. Definitely needs fixing. Too many upper layers > consider BLK_STS_NOSPC retryable (XFS, ext4, dm-thinp, etc). Which > NVME_SC_LBA_RANGE absolutely isn't. > > When I backfilled NVME_SC_LBA_RANGE handling I

Re: [PATCH V9 0/7] blk-mq support for ZBC disks

2018-01-08 Thread Martin K. Petersen
Jens, > This looks OK for me for 4.16. I can grab all of them, or I can leave > the last two for Martin to apply if he prefers that, though that will > add a block tree dependency for SCSI. I already have a block dependency for 4.16. But it doesn't matter much.

Re: [PATCH v1 05/10] bcache: stop dc->writeback_rate_update if cache set is stopping

2018-01-08 Thread Coly Li
On 08/01/2018 3:22 PM, Hannes Reinecke wrote: > On 01/03/2018 03:03 PM, Coly Li wrote: >> struct delayed_work writeback_rate_update in struct cache_dev is a delayed >> worker to call function update_writeback_rate() in period (the interval is >> defined by dc->writeback_rate_update_seconds). >> >>

Re: [PATCH 1/5] nvme: Add more command status translation

2018-01-08 Thread Keith Busch
On Mon, Jan 08, 2018 at 04:34:36PM +0100, Christoph Hellwig wrote: > It's basically a kernel bug as it tries to access lbas that do not > exist. BLK_STS_TARGET should be fine. Okay, I'll fix this and address your other comments, and resend. Thanks for the feedback.

Re: [PATCH V9 0/7] blk-mq support for ZBC disks

2018-01-08 Thread Jens Axboe
On 1/8/18 8:52 AM, Martin K. Petersen wrote: > > Jens, > >> This looks OK for me for 4.16. I can grab all of them, or I can leave >> the last two for Martin to apply if he prefers that, though that will >> add a block tree dependency for SCSI. > > I already have a block dependency for 4.16. But

[PATCH] block: Fix kernel-doc warnings reported when building with W=1

2018-01-08 Thread Bart Van Assche
Commit 3a025e1d1c2e ("Add optional check for bad kernel-doc comments") causes the kernel-doc script to be run for W=1 builds and thereby causes several new warnings to appear when building the kernel with W=1. Fix the block layer kernel-doc headers such that the block layer again builds cleanly with W=1.