Re: [PATCH 1/4] block: move req_set_nomerge to blk.h

2017-02-15 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH 4/4] nvme: support ranged discard requests

2017-02-15 Thread Sagi Grimberg
On 07/02/17 21:34, Christoph Hellwig wrote: On Tue, Feb 07, 2017 at 01:34:54PM -0500, Keith Busch wrote: On Tue, Feb 07, 2017 at 05:46:58PM +0100, Christoph Hellwig wrote: @@ -1233,6 +1243,8 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl, if (ctrl->vwc &

Re: [PATCH 2/4] block: enumify ELEVATOR_*_MERGE

2017-02-15 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH 3/4] block: optionally merge discontiguous discard bios into a single request

2017-02-15 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: support for multi-range discard requests V3

2017-02-15 Thread Sagi Grimberg
Hi all, this series adds support for merging discontiguous discard bios into a single request if the driver supports it. This reduces the number of discards sent to the device by about a factor of 5-6 for typical workloads on NVMe, and for slower devices that use I/O scheduling the number
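For reference, on NVMe a merged multi-range discard ends up as a single Dataset Management (deallocate) command whose payload is an array of ranges, one per bio in the merged request. A minimal sketch of that mapping follows; the helper name is made up and the range array is assumed to be pre-sized to the number of bios, so this is illustrative rather than the verbatim patch 4/4:

/* Sketch: build one nvme_dsm_range per bio of a merged discard request. */
static void nvme_setup_discard_sketch(struct nvme_ns *ns, struct request *req,
                                      struct nvme_dsm_range *range,
                                      struct nvme_command *cmnd)
{
        struct bio *bio;
        int n = 0;

        __rq_for_each_bio(bio, req) {
                u64 slba = nvme_block_nr(ns, bio->bi_iter.bi_sector);
                u32 nlb = bio->bi_iter.bi_size >> ns->lba_shift;

                range[n].cattr = cpu_to_le32(0);
                range[n].nlb = cpu_to_le32(nlb);
                range[n].slba = cpu_to_le64(slba);
                n++;
        }

        memset(cmnd, 0, sizeof(*cmnd));
        cmnd->dsm.opcode = nvme_cmd_dsm;
        cmnd->dsm.nsid = cpu_to_le32(ns->ns_id);
        cmnd->dsm.nr = cpu_to_le32(n - 1);              /* zero-based range count */
        cmnd->dsm.attributes = cpu_to_le32(NVME_DSMGMT_AD);
}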

Re: [PATCH 4/4] nvme: support ranged discard requests

2017-02-15 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH 3/4] nvme: use blk_rq_payload_bytes

2017-01-18 Thread Sagi Grimberg
@@ -1014,9 +1013,9 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue, } Christoph, a little above here we still look at blk_rq_bytes(), shouldn't that look at blk_rq_payload_bytes() too? The check is ok for now as it's just zero vs non-zero. It's somewhat broken for

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-19 Thread Sagi Grimberg
Christoph suggested to me once that we could take a hybrid approach where we consume a small amount of completions (say 4) right away from the interrupt handler, and if we have more we schedule irq-poll to reap the rest. But back then it didn't work better, which is not aligned with my observations
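A rough sketch of that hybrid scheme against the lib/irq_poll.c API; the budget value, the iop member in struct nvme_queue and nvme_process_cq_budget() are assumptions made for illustration, not the actual driver code:

#define NVME_IRQ_BUDGET 4                       /* assumed inline budget */

static irqreturn_t nvme_hybrid_irq(int irq, void *data)
{
        struct nvme_queue *nvmeq = data;
        int done;

        /* Reap a handful of completions directly in hard-irq context. */
        done = nvme_process_cq_budget(nvmeq, NVME_IRQ_BUDGET); /* hypothetical helper */

        /* If the CQ may still have entries, defer the rest to irq-poll (softirq). */
        if (done == NVME_IRQ_BUDGET)
                irq_poll_sched(&nvmeq->iop);

        return done ? IRQ_HANDLED : IRQ_NONE;
}

static int nvme_irqpoll_handler(struct irq_poll *iop, int budget)
{
        struct nvme_queue *nvmeq = container_of(iop, struct nvme_queue, iop);
        int done = nvme_process_cq_budget(nvmeq, budget);       /* hypothetical helper */

        if (done < budget)
                irq_poll_complete(iop);         /* CQ drained, back to irq mode */
        return done;
}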

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-17 Thread Sagi Grimberg
-- [1] queue = b'nvme0q1', latency histogram (usecs : count): 0 -> 1 : 7310, 2 -> 3 : 11, 4 -> 7 : 10, 8 -> 15 : 20 (distribution bars omitted)

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-17 Thread Sagi Grimberg
So it looks like we are really not efficient, because most of the time we catch 1 completion per interrupt and the whole point is that we need to find more! This fio is single threaded with QD=32 so I'd expect us to be somewhere in 8-31 almost all the time... I also

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-17 Thread Sagi Grimberg
Hey, so I did some initial analysis of what's going on with irq-poll. First, I sampled how much time it takes before we get the interrupt in nvme_irq and the initial visit to nvme_irqpoll_handler. I ran a single threaded fio with QD=32 of 4K reads. This is two displays of a histogram of the

Re: [PATCH 3/4] nvme: use blk_rq_payload_bytes

2017-01-17 Thread Sagi Grimberg
@@ -1014,9 +1013,9 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue, } Christoph, a little above here we still look at blk_rq_bytes(), shouldn't that look at blk_rq_payload_bytes() too? if (count == 1) { - if (rq_data_dir(rq) == WRITE && -

Re: [PATCH] nbd: use an idr to keep track of nbd devices

2017-01-16 Thread Sagi Grimberg
Hey Josef, I'm going to use it the same way loop does, there will be a /dev/nbd-control where you can say ADD, REMOVE, and GET_NEXT. I need the search functionality to see if we are adding something that already exists, and to see what is the next unused device that can be used for a
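For context, the lookup and next-free-slot pieces mentioned here are what idr provides over a plain ida; a hedged sketch of the pattern (names are illustrative, not the actual nbd patch):

static DEFINE_IDR(nbd_index_idr);
static DEFINE_MUTEX(nbd_index_mutex);

/* Add a device at the requested index, or at the first free one. */
static int nbd_index_add(struct nbd_device *nbd, int index)
{
        int ret;

        mutex_lock(&nbd_index_mutex);
        if (index >= 0)
                ret = idr_alloc(&nbd_index_idr, nbd, index, index + 1, GFP_KERNEL);
        else
                ret = idr_alloc(&nbd_index_idr, nbd, 0, 0, GFP_KERNEL);
        mutex_unlock(&nbd_index_mutex);
        return ret;     /* allocated index, or -ENOSPC/-ENOMEM */
}

/* Lookup used by ADD/REMOVE/GET_NEXT-style control operations. */
static struct nbd_device *nbd_index_find(int index)
{
        struct nbd_device *nbd;

        mutex_lock(&nbd_index_mutex);
        nbd = idr_find(&nbd_index_idr, index);
        mutex_unlock(&nbd_index_mutex);
        return nbd;
}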

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Sagi Grimberg
So what you are saying is that you saw consumed == 1 [1] most of the time? [1] from http://git.infradead.org/nvme.git/commitdiff/eed5a9d925c59e43980047059fde29e3aa0b7836 Exactly. By processing 1 completion per interrupt it makes perfect sense why this performs poorly, it's not worth paying the

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Sagi Grimberg
Hannes just spotted this: static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx, const struct blk_mq_queue_data *bd) { [...] __nvme_submit_cmd(nvmeq, &cmnd); nvme_process_cq(nvmeq); spin_unlock_irq(&nvmeq->q_lock); return BLK_MQ_RQ_QUEUE_OK;

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-18 Thread Sagi Grimberg
Your report provided these stats with one-completion dominance for the single-threaded case. Does it also hold if you run multiple fio threads per core? It's useless to run more threads on that core, it's already fully utilized. That single thread is already posting a fair amount of

Re: [PATCHv2 2/2] nvme: Complete all stuck requests

2017-02-27 Thread Sagi Grimberg
On 24/02/17 02:36, Keith Busch wrote: If the block layer has entered requests and gets a CPU hot plug event prior to the resume event, it will wait for those requests to exit. If the nvme driver is shutting down, it will not start the queues back up, preventing forward progress. To fix that,

nvmf regression with mq-deadline

2017-02-27 Thread Sagi Grimberg
Hey Jens, I'm getting a regression in nvme-rdma/nvme-loop with for-linus [1] with a small script to trigger it. The reason seems to be that the sched_tags does not take into account the tag_set reserved tags. This solves it for me, any objections on this? -- diff --git a/block/blk-mq-sched.c

Re: [PATCHv2 1/2] blk-mq: Export blk_mq_freeze_queue_wait

2017-02-27 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCHv2 2/2] nvme: Complete all stuck requests

2017-02-28 Thread Sagi Grimberg
OK, I think we can get it for fabrics too, need to figure out how to handle it there too. Do you have a reproducer? To repro, I have to run a buffered writer workload then put the system into S3. This fio job seems to reproduce for me: fio --name=global --filename=/dev/nvme0n1

Re: [PATCH 1/2] blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset

2017-02-27 Thread Sagi Grimberg
Hm, this may fix the crash, but I'm not sure it'll work as intended. When we allocate the request, we'll get a reserved scheduler tag, but then when we go to dispatch the request and call blk_mq_get_driver_tag(), we'll be competing with all of the normal requests for a regular driver tag. So

Re: nvmf regression with mq-deadline

2017-02-27 Thread Sagi Grimberg
Now I'm getting a NULL deref with nvme-rdma [1]. For some reason blk_mq_tag_to_rq() is returning NULL on tag 0x0 which is io queue connect. I'll try to see where this is coming from. This does not happen with loop though... That's because the loop driver does not rely on the cqe.command_id

[PATCH 1/2] blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset

2017-02-27 Thread Sagi Grimberg
Signed-off-by: Sagi Grimberg <s...@grimberg.me> --- block/blk-mq-sched.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 98c7b061781e..46ca965fff5c 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -454,7
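The diff is truncated above; the change it describes amounts to sizing the per-hctx scheduler tag map with the tag set's reserved tag count instead of hard-coding zero, so reserved-tag allocations (e.g. fabrics connect commands) still work when an I/O scheduler is attached. A hedged reconstruction of the shape of it, not the verbatim commit:

/* block/blk-mq-sched.c, scheduler tag setup (hedged reconstruction) */
queue_for_each_hw_ctx(q, hctx, i) {
        /* was: blk_mq_alloc_rq_map(set, i, q->nr_requests, 0) */
        hctx->sched_tags = blk_mq_alloc_rq_map(set, i, q->nr_requests,
                                               set->reserved_tags);
        if (!hctx->sched_tags) {
                ret = -ENOMEM;
                break;
        }
}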

[PATCH 2/2] blk-mq: make sure to back-assign the request to rq_map in blk_mq_alloc_request_hctx

2017-02-27 Thread Sagi Grimberg
Otherwise we won't be able to retrieve the request from the tag. Signed-off-by: Sagi Grimberg <s...@grimberg.me> --- block/blk-mq.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/blk-mq.c b/block/blk-mq.c index d84c66fb37b7..9611cd9920e9 100644 --- a/block/blk-mq.c +++ b/blo

Re: nvmf regression with mq-deadline

2017-02-27 Thread Sagi Grimberg
Hey Jens, I'm getting a regression in nvme-rdma/nvme-loop with for-linus [1] with a small script to trigger it. The reason seems to be that the sched_tags does not take into account the tag_set reserved tags. This solves it for me, any objections on this? -- diff --git a/block/blk-mq-sched.c

Re: [PATCH 1/2] blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset

2017-02-27 Thread Sagi Grimberg
Can't we just not go through the scheduler for reserved tags? Obviously there is no point in scheduling them... Right, that would be possible. But I'd rather not treat any requests differently, it's a huge pain in the ass that flush requests currently insert with a driver tag already

Re: [PATCH 1/2] lightnvm: add generic ocssd detection

2017-02-27 Thread Sagi Grimberg
[adding linux-nvme to Cc as the patch changes the nvme driver, despite the subject line] On Sat, Feb 25, 2017 at 08:16:04PM +0100, Matias Bjørling wrote: On 02/25/2017 07:21 PM, Christoph Hellwig wrote: On Fri, Feb 24, 2017 at 06:16:48PM +0100, Matias Bjørling wrote: More implementations of

Re: [PATCH 2/2] blk-mq: make sure to back-assign the request to rq_map in blk_mq_alloc_request_hctx

2017-02-27 Thread Sagi Grimberg
Thanks. Sagi, I updated your first patch as follows: http://git.kernel.dk/cgit/linux-block/commit/?h=for-linus&id=d06f713e5d200959cdb445a0104e71d9e6070c51 and this is now head of for-linus. thanks.

Re: [PATCHv2 2/2] nvme: Complete all stuck requests

2017-02-27 Thread Sagi Grimberg
If the block layer has entered requests and gets a CPU hot plug event prior to the resume event, it will wait for those requests to exit. If the nvme driver is shutting down, it will not start the queues back up, preventing forward progress. To fix that, this patch freezes the request queues
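The mechanism under discussion pairs queue freezing on the suspend/shutdown side with unfreezing on resume, built on the blk-mq freeze API (blk_mq_freeze_queue_wait is what patch 1/2 exports). A condensed sketch of that shape; locking of the namespaces list is omitted and the helper names follow the patch description rather than the verbatim code:

void nvme_start_freeze(struct nvme_ctrl *ctrl)
{
        struct nvme_ns *ns;

        list_for_each_entry(ns, &ctrl->namespaces, list)
                blk_mq_freeze_queue_start(ns->queue);   /* stop new submissions */
}

void nvme_wait_freeze(struct nvme_ctrl *ctrl)
{
        struct nvme_ns *ns;

        list_for_each_entry(ns, &ctrl->namespaces, list)
                blk_mq_freeze_queue_wait(ns->queue);    /* drain entered requests */
}

void nvme_unfreeze(struct nvme_ctrl *ctrl)
{
        struct nvme_ns *ns;

        list_for_each_entry(ns, &ctrl->namespaces, list)
                blk_mq_unfreeze_queue(ns->queue);       /* resume: let I/O flow again */
}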

Re: [PATCH 1/3] blk-mq: make blk_mq_alloc_request_hctx() allocate a scheduler request

2017-02-27 Thread Sagi Grimberg
n't apply on top of the reserved tag patch. Yup, I had those based on Sagi's original patches for some reason. I fat-fingered send-email, sent as a reply to the original patch 1 instead of this email. I got it, applied all 3, thanks Omar! FWIW, you can add my: Tested-by: Sagi Grimberg

Re: [PATCH 1/3] blk-mq: make blk_mq_alloc_request_hctx() allocate a scheduler request

2017-02-27 Thread Sagi Grimberg
blk_mq_alloc_request_hctx() allocates a driver request directly, unlike its blk_mq_alloc_request() counterpart. It also crashes because it doesn't update the tags->rqs map. Fix it by making it allocate a scheduler request. Reported-by: Sagi Grimberg <s...@grimberg.me> Signed-off

Re: [PATCH 1/2] blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()

2016-09-23 Thread Sagi Grimberg
e when scheduling internally, no need to duplicate it. Looks good too, Reviewed-by: Sagi Grimberg <s...@grimberg.me> Question (while we're on the subject): Do consumers have a way to restrict blk-mq to block on lack of tags? I'm thinking in the context of nvme-target that can do more useful

Re: [rfc] weirdness in bio_map_user_iov()

2016-09-23 Thread Sagi Grimberg
Hey Al, What happens if we feed it a 3-element iovec array, one page in each? AFAICS, bio_add_pc_page() is called for each of those pages, even if the previous calls have failed - break is only out of the inner loop. Sure, failure due to exceeded request size means that everything after that
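The control-flow issue being raised is generic: a failure in the inner per-page loop only breaks that inner loop, so the outer per-iovec loop keeps trying to add pages for later segments. A stand-alone illustration of the pattern (deliberately not the bio_map_user_iov() code itself):

#include <stdio.h>

/* Stand-in for bio_add_pc_page(): stops adding once the "request" is full. */
static int add_page(int *filled, int capacity)
{
        if (*filled >= capacity)
                return 0;       /* 0 bytes added == failure */
        (*filled)++;
        return 1;
}

int main(void)
{
        int filled = 0, capacity = 2;

        for (int iov = 0; iov < 3; iov++) {             /* outer: iovec segments */
                for (int page = 0; page < 1; page++) {  /* inner: pages of a segment */
                        if (!add_page(&filled, capacity))
                                break;  /* only exits the inner loop... */
                }
                /* ...so later iovecs still call add_page(), and the caller is
                 * left with a bio that silently covers a subset of the pages. */
        }
        printf("added %d of 3 pages\n", filled);
        return 0;
}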

Re: [PATCH 1/2] blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()

2016-09-23 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH 11/13] nvme: switch to use pci_alloc_irq_vectors

2016-09-23 Thread Sagi Grimberg
On 14/09/16 07:18, Christoph Hellwig wrote: Use the new helper to automatically select the right interrupt type, as well as to use the automatic interrupt affinity assignment. Patch title and the change description are a little short IMO to describe what is going on here (need the blk-mq side

Re: [PATCH v2 4/8] blk-mq: Cleanup a loop exit condition

2016-10-05 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH v2 3/8] blk-mq: Fix hardware context data node selection

2016-10-05 Thread Sagi Grimberg
Looks fine, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH v2 5/8] blk-mq: Cleanup blk_mq_hw_ctx::cpumask (de-)allocation

2016-10-05 Thread Sagi Grimberg
Looks fine, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH v2 7/8] blk-mq: Pair blk_mq_hctx_kobj_init() with blk_mq_hctx_kobj_put()

2016-10-05 Thread Sagi Grimberg
Looks fine, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH v2 8/8] blk-mq: Cleanup (de-)allocation of blk_mq_hw_ctx::ctxs

2016-10-05 Thread Sagi Grimberg
Looks fine, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH v2 6/8] blk-mq: Rework blk_mq_realloc_hw_ctxs()

2016-10-05 Thread Sagi Grimberg
@@ -1908,33 +1909,36 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set, if (node == NUMA_NO_NODE) node = set->numa_node; - hctxs[i] = kzalloc_node(sizeof(struct blk_mq_hw_ctx), -

Re: [PATCH v2 6/7] SRP transport: Port srp_wait_for_queuecommand() to scsi-mq

2016-10-05 Thread Sagi Grimberg
+static void srp_mq_wait_for_queuecommand(struct Scsi_Host *shost) +{ + struct scsi_device *sdev; + struct request_queue *q; + + shost_for_each_device(sdev, shost) { + q = sdev->request_queue; + + blk_mq_quiesce_queue(q); +

Re: [PATCH v2 3/7] [RFC] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code

2016-10-05 Thread Sagi Grimberg
Make nvme_requeue_req() check BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED. Remove the QUEUE_FLAG_STOPPED manipulations that became superfluous because of this change. This patch fixes a race condition: using queue_flag_clear_unlocked() is not safe if any other function that manipulates the

Re: [PATCH v2 1/7] blk-mq: Introduce blk_mq_queue_stopped()

2016-10-05 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH v2 2/8] blk-mq: Remove a redundant assignment

2016-10-05 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH v2 1/8] block: Get rid of unused request_queue::nr_queues member

2016-10-05 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

[PATCH] softirq: Display IRQ_POLL for irq-poll statistics

2016-10-10 Thread Sagi Grimberg
This library was moved to the generic area and was renamed to irq-poll. Hence, update proc/softirqs output accordingly. Signed-off-by: Sagi Grimberg <s...@grimberg.me> --- kernel/softirq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/softirq.c b/kernel/sof

Re: [PATCH 0/3] iopmem : A block device for PCIe memory

2016-10-27 Thread Sagi Grimberg
You do realise that local filesystems can silently change the location of file data at any point in time, so there is no such thing as a "stable mapping" of file data to block device addresses in userspace? If you want remote access to the blocks owned and controlled by a filesystem, then you

Re: [PATCH 5/5] nvmet: add support for the Write Zeroes command

2016-11-16 Thread Sagi Grimberg
+static void nvmet_execute_write_zeroes(struct nvmet_req *req) +{ + struct nvme_write_zeroes_cmd *write_zeroes = &req->cmd->write_zeroes; + struct bio *bio = NULL; + u16 status = NVME_SC_SUCCESS; + sector_t sector; + sector_t nr_sector; + + sector =

Re: [PATCH 4/5] nvme: add support for the Write Zeroes command

2016-11-16 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH 3/5] nvme.h: add Write Zeroes definitions

2016-11-16 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg <s...@grimberg.me>

[PATCH 3/3] irq-poll: Reduce local_irq_save/restore operations in irq_poll_softirq

2016-11-12 Thread Sagi Grimberg
splice to a local list (and splice back when done) so we won't need to enable/disable local_irq in each iteration. Signed-off-by: Sagi Grimberg <s...@grimberg.me> --- lib/irq_poll.c | 31 --- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git
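The splice-to-a-local-list pattern being described, as a hedged sketch; variable names, the budget handling and the rearm logic are illustrative, not the verbatim patch:

static void irq_poll_softirq_sketch(struct list_head *iop_list)
{
        LIST_HEAD(list);
        int budget = 64;                        /* assumed per-pass budget */

        local_irq_disable();
        list_splice_init(iop_list, &list);      /* drain once, not per entry */
        local_irq_enable();

        while (budget-- && !list_empty(&list)) {
                struct irq_poll *iop =
                        list_first_entry(&list, struct irq_poll, list);

                /* ... run iop->poll(iop, iop->weight), handle completion ... */
                list_del_init(&iop->list);
        }

        local_irq_disable();
        /* Entries queued by interrupts meanwhile go behind any leftovers,
         * then everything is handed back to the per-cpu list. */
        list_splice_tail_init(iop_list, &list);
        list_splice(&list, iop_list);
        local_irq_enable();
}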

[PATCH 0/3] micro-optimize irq-poll

2016-11-12 Thread Sagi Grimberg
Some useful patches I came up with when working on nvme irq-poll conversion (which still needs some work). Sagi Grimberg (3): irq-poll: Remove redundant include irq-poll: micro optimize some conditions irq-poll: Reduce local_irq_save/restore operations in irq_poll_softirq lib/irq_poll.c

Re: [PATCH 2/3] irq-poll: micro optimize some branch predictions

2016-11-13 Thread Sagi Grimberg
Are they really that unlikely? I don't like these annotations unless it's clearly an error path or they have a high, demonstrable benefit. IRQ_POLL_F_DISABLE is set when disabling the iop (in the end of the world). IRQ_POLL_F_SCHED is set on irq_poll_sched() itself so this cond would match

Re: [PATCH 3/3] irq-poll: Reduce local_irq_save/restore operations in irq_poll_softirq

2016-11-13 Thread Sagi Grimberg
+ while (!list_empty()) { Maybe do a list_first_entry_or_null here if you're touching the list iteration anyway? I can do that. + local_irq_disable(); + list_splice_tail_init(iop_list, ); + list_splice(, iop_list); + if (rearm)

Re: [RFC PATCH 5/6] nvme: Add unlock_from_suspend

2016-11-01 Thread Sagi Grimberg
+struct sed_cb_data { + sec_cb *cb; + void*cb_data; + struct nvme_command cmd; +}; + +static void sec_submit_endio(struct request *req, int error) +{ + struct sed_cb_data *sed_data = req->end_io_data; + + if (sed_data->cb) + sed_data->cb(error,

Re: [PATCH v5 05/14] blk-mq: Avoid that requeueing starts stopped queues

2016-11-01 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH v5 06/14] blk-mq: Remove blk_mq_cancel_requeue_work()

2016-11-01 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH 2/3] scsi: allow LLDDs to expose the queue mapping to blk-mq

2016-11-01 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH v5 08/14] blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request()

2016-11-01 Thread Sagi Grimberg
Looks useful, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH v5 11/14] SRP transport: Move queuecommand() wait code to SCSI core

2016-11-01 Thread Sagi Grimberg
Again, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH v5 12/14] SRP transport, scsi-mq: Wait for .queue_rq() if necessary

2016-11-01 Thread Sagi Grimberg
and again, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH v5 13/14] nvme: Fix a race condition related to stopping queues

2016-11-01 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH 11/12] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code

2016-10-27 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: A question regarding "multiple SGL"

2016-10-27 Thread Sagi Grimberg
Hi Robert, Hey Robert, Christoph, please explain your use cases that isn't handled. The one and only reason to set MSDBD to 1 is to make the code a lot simpler given that there is no real use case for supporting more. RDMA uses memory registrations to register large and possibly

Re: [PATCH 02/12] blk-mq: Introduce blk_mq_hctx_stopped()

2016-10-27 Thread Sagi Grimberg
Looks fine, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH 10/12] SRP transport, scsi-mq: Wait for .queue_rq() if necessary

2016-10-27 Thread Sagi Grimberg
Thanks for moving it, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH 04/12] blk-mq: Move more code into blk_mq_direct_issue_request()

2016-10-27 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH 09/12] SRP transport: Move queuecommand() wait code to SCSI core

2016-10-27 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH][V3] nbd: add multi-connection support

2016-10-11 Thread Sagi Grimberg
NBD can become contended on its single connection. We have to serialize all writes and we can only process one read response at a time. Fix this by allowing userspace to provide multiple connections to a single nbd device. This coupled with block-mq drastically increases performance in

Re: [PATCH 5/6] blk-mq: Fix queue freeze deadlock

2017-01-13 Thread Sagi Grimberg
If hardware queues are stopped for some event, like the device has been suspended by power management, requests allocated on that hardware queue are indefinitely stuck, causing a queue freeze to wait forever. I have a problem with this patch. IMO, this is a general issue, so why do we tie a

Re: [PATCH 3/4] nvme: use blk_rq_payload_bytes

2017-01-13 Thread Sagi Grimberg
This looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH 1/6] irq/affinity: Assign all online CPUs to vectors

2017-01-13 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH 2/6] irq/affinity: Assign offline CPUs a vector

2017-01-13 Thread Sagi Grimberg
The offline CPUs need to be assigned to something in case they come online later, otherwise anyone using the mapping for things other than affinity will have blank entries for that online CPU. I don't really like the idea behind it. Back when we came up with this code I had some discussion with

Re: [PATCH 3/6] nvme/pci: Start queues after tagset is updated

2017-01-13 Thread Sagi Grimberg
We need to leave the block queues stopped if we're changing the tagset's number of queues. Umm, don't we need to fail these requests? What am I missing? Won't these requests block until timeout expiration and then trigger error recovery again?

Re: [PATCH 6/6] blk-mq: Remove unused variable

2017-01-13 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH 1/4] block: add blk_rq_payload_bytes

2017-01-13 Thread Sagi Grimberg
Add a helper to calculate the actual data transfer size for special payload requests. Signed-off-by: Christoph Hellwig --- include/linux/blkdev.h | 13 + 1 file changed, 13 insertions(+) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index
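The diff body is cut off above; the helper as described boils down to preferring the driver-built payload length (discard/write-same set up via special_vec) over blk_rq_bytes(). A sketch of it:

static inline unsigned int blk_rq_payload_bytes(struct request *rq)
{
        /* Requests carrying a special payload report that payload's size,
         * not the logical size of the request. */
        if (rq->rq_flags & RQF_SPECIAL_PAYLOAD)
                return rq->special_vec.bv_len;
        return blk_rq_bytes(rq);
}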

Re: [PATCH 2/4] scsi: use blk_rq_payload_bytes

2017-01-13 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH 4/6] blk-mq: Update queue map when changing queue count

2017-01-13 Thread Sagi Grimberg
); set->nr_hw_queues = nr_hw_queues; + if (set->ops->map_queues) + set->ops->map_queues(set); + else + blk_mq_map_queues(set); + Makes sense, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH] nbd: create a recv workqueue per nbd device

2017-01-13 Thread Sagi Grimberg
Hey Josef, Since we are in the memory reclaim path we need our recv work to be on a workqueue that has WQ_MEM_RECLAIM set so we can avoid deadlocks. Also set WQ_HIGHPRI since we are in the completion path for IO. Really a workqueue per device?? Did this really give performance advantage? Can

Re: [PATCH] nbd: use an idr to keep track of nbd devices

2017-01-13 Thread Sagi Grimberg
Hey Josef, To prepare for dynamically adding new nbd devices to the system switch from using an array for the nbd devices and instead use an idr. This copies what loop does for keeping track of its devices. I think ida_simple_* is simpler and sufficient here isn't it?

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-12 Thread Sagi Grimberg
I'd like to attend LSF/MM and would like to discuss polling for block drivers. Currently there is blk-iopoll, but it is not as widely used as NAPI in the networking field, and according to Sagi's findings in [1] performance with polling is not on par with IRQ usage. At LSF/MM I'd like to

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-12 Thread Sagi Grimberg
Hi all, I'd like to attend LSF/MM and would like to discuss polling for block drivers. Currently there is blk-iopoll, but it is not as widely used as NAPI in the networking field, and according to Sagi's findings in [1] performance with polling is not on par with IRQ usage. At LSF/MM I'd

Re: [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-12 Thread Sagi Grimberg
I agree with Jens that we'll need some analysis if we want the discussion to be effective, and I can spend some time on this if I can find volunteers with high-end nvme devices (I only have access to client nvme devices. I have a P3700 but somehow burned the FW. Let me see if I can bring it back

Re: [Lsf-pc] [LFS/MM TOPIC][LFS/MM ATTEND]: - Storage Stack and Driver Testing methodology.

2017-01-12 Thread Sagi Grimberg
Hi Folks, I would like to propose a general discussion on storage stack and device driver testing. I think it's very useful and needed. Purpose: The main objective of this discussion is to address the need for a Unified Test Automation Framework which can be used by

Re: [Lsf-pc] [LSF/MM TOPIC] [LSF/MM ATTEND] md raid general discussion

2017-01-12 Thread Sagi Grimberg
Hey Coly, Also I receive reports from users that raid1 performance is desired when it is built on NVMe SSDs as a cache (maybe bcache or dm-cache). I am working on some raid1 performance improvement (e.g. new raid1 I/O barrier and lockless raid1 I/O submit), and have some more ideas to discuss.

Re: [Lsf-pc] [LSF/MM TOPIC][LSF/MM ATTEND] NAPI polling for block drivers

2017-01-12 Thread Sagi Grimberg
**Note: when I ran multiple threads on more cpus the performance degradation phenomenon disappeared, but I tested on a VM with qemu emulation backed by null_blk so I figured I had some other bottleneck somewhere (that's why I asked for some more testing). That could be because of the vmexits

Re: [PATCH] nbd: use an idr to keep track of nbd devices

2017-01-14 Thread Sagi Grimberg
Hey Josef, To prepare for dynamically adding new nbd devices to the system switch from using an array for the nbd devices and instead use an idr. This copies what loop does for keeping track of its devices. I think ida_simple_* is simpler and sufficient here isn't it? I use more of the

Re: [PATCH] nbd: create a recv workqueue per nbd device

2017-01-14 Thread Sagi Grimberg
Hey Josef, Since we are in the memory reclaim path we need our recv work to be on a workqueue that has WQ_MEM_RECLAIM set so we can avoid deadlocks. Also set WQ_HIGHPRI since we are in the completion path for IO. Really a workqueue per device?? Did this really give performance advantage?

Re: [PATCH 2/2] blk-stat: add a poll_size value to the request_queue struct

2017-03-28 Thread Sagi Grimberg
On 26/03/17 05:18, sba...@raithlin.com wrote: From: Stephen Bates In order to bucket IO for the polling algorithm we use a sysfs entry to set the filter value. It is signed and we will use that as follows: 0 : No filtering. All IO are considered in stat generation >

Re: [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD)

2017-03-26 Thread Sagi Grimberg
This series introduces IBNBD/IBTRS kernel modules. IBNBD (InfiniBand network block device) allows for an RDMA transfer of block IO over InfiniBand network. The driver presents itself as a block device on client side and transmits the block requests in a zero-copy fashion to the server-side via

Re: [PATCH] block: do not put mq context in blk_mq_alloc_request_hctx

2017-03-30 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: kmemleak complaints on request queue stats (virtio)

2017-03-29 Thread Sagi Grimberg
You don't mention what you are running? But I'm assuming it was my 4.12 branch. Ehh, details... If so, this is fixed in a later revision of it. If you pull an update, it should go away. Will try, thanks Jens.

[PATCH] blk-mq-pci: Fix two spelling mistakes

2017-03-29 Thread Sagi Grimberg
Signed-off-by: Sagi Grimberg <s...@grimberg.me> --- block/blk-mq-pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-mq-pci.c b/block/blk-mq-pci.c index 966c2169762e..0c3354cf3552 100644 --- a/block/blk-mq-pci.c +++ b/block/blk-mq-pci.c @@ -23,7 +23,7 @@ *

Re: [PATCH] block-mq: don't re-queue if we get a queue error

2017-03-29 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

Re: [PATCH rfc 10/10] target: Use non-selective polling

2017-03-21 Thread Sagi Grimberg
Hey Sagi, Hey Nic Let's make 'batch' into a backend specific attribute so it can be changed on-the-fly per device, instead of a hard-coded value. Here's a quick patch to that end. Feel free to fold it into your series. I will, thanks!

Re: [PATCH 3/3] scsi: Ensure that scsi_run_queue() runs all hardware queues

2017-04-02 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg <s...@grimberg.me>

[PATCH rfc 2/6] mlx5: move affinity hints assignments to generic code

2017-04-02 Thread Sagi Grimberg
). The affinity assignments should match what mlx5 tried to do earlier but now we do not set affinity to async, cmd and pages dedicated vectors. Signed-off-by: Sagi Grimberg <s...@grimberg.me> --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +- drivers/net/ethernet/mellanox/mlx5/core/

[PATCH rfc 1/6] mlx5: convert to generic pci_alloc_irq_vectors

2017-04-02 Thread Sagi Grimberg
Now that we have generic code to allocate an array of irq vectors, correctly spread their affinity, correctly handle cpu hotplug events and more, we're much better off using it. Signed-off-by: Sagi Grimberg <s...@grimberg.me> --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c
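For reference, the generic facility being adopted is the pci_alloc_irq_vectors()/pci_irq_get_affinity() family; a hedged usage sketch (function name and flags chosen for illustration, not taken from the mlx5 patch):

/* Let the PCI/IRQ core spread completion vectors across CPUs instead of
 * open-coding affinity hints in the driver. */
static int alloc_comp_vectors(struct pci_dev *pdev, int max_vecs)
{
        int i, nvec;

        nvec = pci_alloc_irq_vectors(pdev, 1, max_vecs,
                                     PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
        if (nvec < 0)
                return nvec;

        for (i = 0; i < nvec; i++) {
                int irq = pci_irq_vector(pdev, i);      /* Linux irq number */
                const struct cpumask *mask = pci_irq_get_affinity(pdev, i);

                /* request_irq(irq, ...) and use 'mask' for queue placement */
                (void)irq;
                (void)mask;
        }
        return nvec;
}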

[PATCH rfc 4/6] mlx5: support ->get_vector_affinity

2017-04-02 Thread Sagi Grimberg
Simply refer to the generic affinity mask helper. Signed-off-by: Sagi Grimberg <s...@grimberg.me> --- drivers/infiniband/hw/mlx5/main.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 4dc0a8
