Re: [dm-devel] [RFC PATCH] dm: fix excessive dm-mq context switching

2016-02-08 Thread Sagi Grimberg
The perf report is very similar to the one that started this effort... I'm afraid we'll need to resolve the per-target m->lock in order to scale with NUMA... Could be. Just for testing, you can try the 2 topmost commits I've put here (once applied, both __multipath_map and multipath_busy

Re: [dm-devel] [RFC PATCH] dm: fix excessive dm-mq context switching

2016-02-07 Thread Sagi Grimberg
Hello Sagi, Hey Bart, Did you run your test on a NUMA system? I did. If so, can you check with e.g. perf record -ags -e LLC-load-misses sleep 10 && perf report whether this workload perhaps triggers lock contention? What you need to look for in the perf output is whether any functions

Re: [dm-devel] [RFC PATCH] dm: fix excessive dm-mq context switching

2016-02-07 Thread Sagi Grimberg
If so, can you check with e.g. perf record -ags -e LLC-load-misses sleep 10 && perf report whether this workload perhaps triggers lock contention? What you need to look for in the perf output is whether any functions occupy more than 10% CPU time. I will, thanks for the tip! The perf

Re: [dm-devel] [RFC PATCH] dm: fix excessive dm-mq context switching

2016-02-07 Thread Sagi Grimberg
Hi Mike, So I gave your patches a go (dm-4.6) but I still don't see the improvement you reported (though I do see a minor improvement). null_blk queue_mode=2 submit_queues=24; dm_mod blk_mq_nr_hw_queues=24 blk_mq_queue_depth=4096 use_blk_mq=Y. I see 620K IOPS on dm_mq vs. 1750K IOPS on raw

Re: [dm-devel] NVMeoF multi-path setup

2016-07-13 Thread Sagi Grimberg
On 01/07/16 01:52, Mike Snitzer wrote: On Thu, Jun 30 2016 at 5:57pm -0400, Ming Lin wrote: On Thu, 2016-06-30 at 14:08 -0700, Ming Lin wrote: Hi Mike, I'm trying to test NVMeoF multi-path. root@host:~# lsmod |grep dm_multipath dm_multipath 24576 0

Re: [dm-devel] hch's native NVMe multipathing [was: Re: [PATCH 1/2] Don't blacklist nvme]

2017-02-16 Thread Sagi Grimberg
I'm fine with the path selectors getting moved out; maybe it'll encourage new path selectors to be developed. But there will need to be some userspace interface stood up to support your native NVMe multipathing (you may not think it's needed, but in time there will be a need to configure

Re: [dm-devel] [for-4.16 PATCH 4/5] dm mpath: use NVMe error handling to know when an error is retryable

2017-12-20 Thread Sagi Grimberg
But interestingly, with my "mptest" link failure test (test_01_nvme_offline) I'm not actually seeing NVMe trigger a failure that needs a multipath layer (be it NVMe multipath or DM multipath) to fail a path and retry the IO. The pattern is that the link goes down, and nvme waits for it to come

Re: [dm-devel] [PATCH V14 00/18] block: support multi-page bvec

2019-01-21 Thread Sagi Grimberg
V14: - drop the patch (patch 4 in V13) for renaming bvec helpers, as suggested by Jens - use mp_bvec_* as the multi-page bvec helper name - fix one build issue, caused by a missing conversion of bio_for_each_segment_all in fs/gfs2 - fix one 32-bit ARCH
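For context, a minimal sketch of what the multi-page bvec helpers change for a driver, using the current mainline iterator names (the exact mp_bvec_* naming in V14 may differ slightly); this is an illustration, not part of the series:

#include <linux/bio.h>
#include <linux/printk.h>

/* Count full (possibly multi-page) bvecs vs. single-page segments of a bio. */
static void count_bio_vecs(struct bio *bio)
{
    struct bio_vec bv;
    struct bvec_iter iter;
    unsigned int bvecs = 0, segs = 0;

    bio_for_each_bvec(bv, bio, iter)        /* walks whole multi-page bvecs */
        bvecs++;
    bio_for_each_segment(bv, bio, iter)     /* still yields single-page segments */
        segs++;

    pr_debug("bio has %u bvecs, %u single-page segments\n", bvecs, segs);
}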

Re: [dm-devel] [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-20 Thread Sagi Grimberg
The only user in your final tree seems to be the loop driver, and even that one only uses the helper for read/write bios. I think something like this would be much simpler in the end: The recently submitted nvme-tcp host driver should also be a user of this. Does it make sense to keep it as

Re: [dm-devel] [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-20 Thread Sagi Grimberg
Yeah, that is the most common example, given merging is enabled in most cases. If the driver or device doesn't care about merging, you can disable it and always get single-bio requests; then the bio's bvec table can be reused for send(). Does bvec_iter span bvecs with your patches? I didn't see that
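For reference, a bvec_iter only records a position, so whether one iteration step spans a whole multi-page bvec depends on the helper used, not on the iterator itself. The mainline definition is roughly (shown for reference only):

struct bvec_iter {
    sector_t        bi_sector;      /* device address, in 512-byte sectors */
    unsigned int    bi_size;        /* residual I/O size */
    unsigned int    bi_idx;         /* current index into bi_io_vec */
    unsigned int    bi_bvec_done;   /* bytes completed in the current bvec */
};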

Re: [dm-devel] [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-20 Thread Sagi Grimberg
I would like to avoid growing bvec tables and keep everything preallocated. Plus, a bvec_iter operates on a bvec, which means we'll need a table there as well... Not liking it so far... In the case of multiple bios in one request, we can't know how many bvecs there are without calling rq_bvecs(), so it

Re: [dm-devel] [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-20 Thread Sagi Grimberg
Wait, I see that the bvec is still a single array per bio. When you said a table I thought you meant a 2-dimensional array... I mean a new 1-D table A has to be created for multiple bios in one rq, and built in the following way: rq_for_each_bvec(tmp, rq, rq_iter)
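A minimal sketch of the table-building approach described above, assuming a caller that has preallocated the destination table. rq_for_each_bvec() and struct req_iterator are mainline helpers (today in linux/blk-mq.h); the surrounding function and its name are hypothetical:

#include <linux/blk-mq.h>

/*
 * Hypothetical helper: flatten all bvecs of a request into one preallocated
 * 1-D table. Returns the number of entries, or -ENOSPC if the table is too small.
 */
static int rq_fill_bvec_table(struct request *rq, struct bio_vec *table,
                              unsigned int max_entries)
{
    struct req_iterator rq_iter;
    struct bio_vec tmp;
    unsigned int n = 0;

    rq_for_each_bvec(tmp, rq, rq_iter) {
        if (n == max_entries)
            return -ENOSPC;
        table[n++] = tmp;
    }
    return n;
}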

Re: [dm-devel] [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-20 Thread Sagi Grimberg
Not sure I understand the 'blocking' problem in this case. We can build a bvec table from this req, and send them all in send(), I would like to avoid growing bvec tables and keep everything preallocated. Plus, a bvec_iter operates on a bvec which means we'll need a table there as well...

Re: [dm-devel] [PATCH V10 09/19] block: introduce bio_bvecs()

2018-11-19 Thread Sagi Grimberg
The only user in your final tree seems to be the loop driver, and even that one only uses the helper for read/write bios. I think something like this would be much simpler in the end: The recently submitted nvme-tcp host driver should also be a user of this. Does it make sense to keep it as

Re: [dm-devel] [PATCH 3/3] nvme: don't call revalidate_disk from nvme_set_queue_dying

2020-08-24 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [dm-devel] [PATCH 1/3] block: replace bd_set_size with bd_set_nr_sectors

2020-08-24 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [dm-devel] [PATCH 2/3] block: fix locking for struct block_device size updates

2020-08-24 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [dm-devel] nvme: restore use of blk_path_error() in nvme_complete_rq()

2020-08-06 Thread Sagi Grimberg
Hey Mike, The point is: blk_path_error() has nothing to do with NVMe errors. This is dm-multipath logic stuck in the middle of the NVMe error handling code. No, it is a means to have multiple subsystems (to this point both SCSI and NVMe) doing the correct thing when translating subsystem
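For reference, blk_path_error() is a small block-layer predicate over generic blk_status_t codes, not an NVMe-specific helper; its mainline definition at the time of this thread looks roughly like this:

static inline bool blk_path_error(blk_status_t error)
{
    switch (error) {
    case BLK_STS_NOTSUPP:
    case BLK_STS_NOSPC:
    case BLK_STS_TARGET:
    case BLK_STS_NEXUS:
    case BLK_STS_MEDIUM:
    case BLK_STS_PROTECTION:
        return false;
    }

    /* Anything else could be a path failure, so should be retried */
    return true;
}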

Re: [dm-devel] nvme: restore use of blk_path_error() in nvme_complete_rq()

2020-08-07 Thread Sagi Grimberg
Hey Mike, The point is: blk_path_error() has nothing to do with NVMe errors. This is dm-multipath logic stuck in the middle of the NVMe error handling code. No, it is a means to have multiple subsystems (to this point both SCSI and NVMe) doing the correct thing when translating subsystem

Re: [dm-devel] [RESEND PATCH] nvme: explicitly use normal NVMe error handling when appropriate

2020-08-14 Thread Sagi Grimberg
+   switch (nvme_req_disposition(req)) {
+   case COMPLETE:
+       nvme_complete_req(req);

nvme_complete_rq calling nvme_complete_req... Maybe call it __nvme_complete_rq instead?

That's what I had first, but it felt so strangely out of place next to the other nvme_*_req
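For context, the dispatch around the hunk quoted above, reconstructed as a sketch; the helper names follow the patch under discussion and the shape that later landed in mainline, so exact names may differ:

switch (nvme_req_disposition(req)) {
case COMPLETE:
    nvme_complete_req(req);     /* the naming debated above */
    return;
case RETRY:
    nvme_retry_req(req);
    return;
case FAILOVER:
    nvme_failover_req(req);
    return;
}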

Re: [dm-devel] [RESEND PATCH] nvme: explicitly use normal NVMe error handling when appropriate

2020-08-14 Thread Sagi Grimberg
+static inline enum nvme_disposition nvme_req_disposition(struct request *req)
+{
+   if (likely(nvme_req(req)->status == 0))
+       return COMPLETE;
+
+   if (blk_noretry_request(req) ||
+       (nvme_req(req)->status & NVME_SC_DNR) ||
+       nvme_req(req)->retries
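A hedged reconstruction of the rest of the helper quoted above, based on the shape that eventually landed in mainline as nvme_decide_disposition(); constant and helper names may differ slightly from the RESEND patch:

static inline enum nvme_disposition nvme_req_disposition(struct request *req)
{
    if (likely(nvme_req(req)->status == 0))
        return COMPLETE;

    if (blk_noretry_request(req) ||
        (nvme_req(req)->status & NVME_SC_DNR) ||
        nvme_req(req)->retries >= nvme_max_retries)
        return COMPLETE;

    if (req->cmd_flags & REQ_NVME_MPATH) {
        if (nvme_is_path_error(nvme_req(req)->status))
            return FAILOVER;
    } else {
        if (blk_queue_dying(req->q))
            return COMPLETE;
    }

    return RETRY;
}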

Re: [dm-devel] [PATCH 03/24] nvme: let set_capacity_revalidate_and_notify update the bdev size

2020-11-09 Thread Sagi Grimberg
[ .. ] Originally nvme multipath would update/change the size of the multipath device according to the underlying path devices. With this patch the size of the multipath device will _not_ change if there is a change on the underlying devices. Yes, it will.  Take a close look at

Re: [dm-devel] [RFC PATCH V2 09/13] block: use per-task poll context to implement bio based io poll

2021-03-22 Thread Sagi Grimberg
+static void blk_bio_poll_post_submit(struct bio *bio, blk_qc_t cookie)
+{
+   bio->bi_iter.bi_private_data = cookie;
+}
+

Hey Ming, thinking about nvme-mpath, I'm thinking that this should be an exported function for failover. nvme-mpath updates bio.bi_dev when re-submitting I/Os to an
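A sketch of the suggestion, assuming the RFC's helper would simply be exported so a bio-based stacking driver (e.g. nvme-mpath failover) can record the new cookie after re-submitting the bio; names follow the RFC, nothing here is merged code:

/* Store the submission cookie for later bio-based polling (RFC field). */
void blk_bio_poll_post_submit(struct bio *bio, blk_qc_t cookie)
{
    bio->bi_iter.bi_private_data = cookie;
}
EXPORT_SYMBOL_GPL(blk_bio_poll_post_submit);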

Re: [dm-devel] [RFC PATCH V2 09/13] block: use per-task poll context to implement bio based io poll

2021-03-24 Thread Sagi Grimberg
Well, when it fails over, it will probably be directed to the poll queues. Maybe I'm missing something... In this patchset, because it isn't submitted directly from the FS, there isn't a polling context associated with this bio, so its HIPRI flag will be cleared and it falls back to IRQ mode.

Re: [dm-devel] [RFC PATCH V2 09/13] block: use per-task poll context to implement bio based io poll

2021-03-23 Thread Sagi Grimberg
+static void blk_bio_poll_post_submit(struct bio *bio, blk_qc_t cookie)
+{
+   bio->bi_iter.bi_private_data = cookie;
+}
+

Hey Ming, thinking about nvme-mpath, I'm thinking that this should be an exported function for failover. nvme-mpath updates bio.bi_dev when re-submitting I/Os to an

Re: [dm-devel] [PATCH 09/11] nvme: remove a spurious clear of discard_alignment

2022-04-26 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH for-6.2/block V3 1/2] block: Data type conversion for IO accounting

2022-12-25 Thread Sagi Grimberg
On 12/21/22 06:05, Gulam Mohamed wrote: Change the data type of the start and end time I/O accounting variables in the block layer from "unsigned long" to "u64". This is to enable nanosecond granularity, in the next commit, for devices whose latency is less than a millisecond. Changes from V2 to
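A minimal illustration of what nanosecond-granularity accounting implies, assuming ktime_get_ns() as the clock source; the function and variable names here are hypothetical and not from the patch:

#include <linux/ktime.h>

/*
 * With u64 nanosecond timestamps the latency of a sub-millisecond device
 * is no longer rounded away by coarser unsigned long bookkeeping, per the
 * commit message quoted above.
 */
static u64 sample_io_latency_ns(void (*do_io)(void))
{
    u64 start_ns = ktime_get_ns();

    do_io();    /* submit and wait for the I/O */

    return ktime_get_ns() - start_ns;
}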