Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-04-01 Thread Sagi Grimberg
request tag can't be zero? I forget... Of course it can. But the reserved tags are before the normal tags, so 0 would be a reserved tag for nvme. Right.

Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-31 Thread Sagi Grimberg
It is, but in this situation, the controller is sending a second completion that results in a use-after-free, which makes the transport irrelevant. Unless there is some other flow (which is unclear to me) that causes this which is a bug that needs to be fixed rather than hidden with a

Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-31 Thread Sagi Grimberg
What we can do, though, is checking the 'state' field in the tcp request, and only allow completions for commands which are in a state allowing for completions. Let's see if I can whip up a patch. That would be great. BTW in the crash dump I am looking at now, it looks like pdu->command_id

Re: [PATCH] nvme: Export fast_io_fail_tmo to sysfs

2021-03-31 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-30 Thread Sagi Grimberg
It is, but in this situation, the controller is sending a second completion that results in a use-after-free, which makes the transport irrelevant. Unless there is some other flow (which is unclear to me) that causes this which is a bug that needs to be fixed rather than hidden with a

Re: [PATCH v2] nvme: disallow passthru cmd from targeting a nsid != nsid of the block dev

2021-03-26 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH v2] infiniband: Fix a use after free in isert_connect_request

2021-03-22 Thread Sagi Grimberg
Acked-by: Sagi Grimberg

Re: [PATCH] nvme/rdma: Fix a use after free in nvmet_rdma_write_data_done

2021-03-15 Thread Sagi Grimberg
In nvmet_rdma_write_data_done, rsp is recovered from wc->wr_cqe and freed by nvmet_rdma_release_rsp(). But after that, pr_info() still used a member of the freed chunk and could leak the freed chunk's address through wc->wr_cqe by computing the offset. Signed-off-by: Lv Yunlong ---
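The defect and the obvious way to avoid it look roughly like this (a sketch only; the merged fix should be consulted for the exact form, and the log message here is illustrative):

    struct nvmet_rdma_queue *queue = rsp->queue;    /* capture before release */

    nvmet_rdma_release_rsp(rsp);

    if (unlikely(wc->status != IB_WC_SUCCESS &&
                 wc->status != IB_WC_WR_FLUSH_ERR)) {
        /* Log without dereferencing rsp or printing wc->wr_cqe, both of
         * which would expose the freed chunk. */
        pr_info("RDMA WRITE failed with status %s (%d).\n",
                ib_wc_status_msg(wc->status), wc->status);
        nvmet_rdma_error_comp(queue);
    }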

Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-15 Thread Sagi Grimberg
Hi Sagi, On Fri, Mar 05, 2021 at 11:57:30AM -0800, Sagi Grimberg wrote: Daniel, again, there is nothing specific about this to nvme-tcp, this is a safeguard against a funky controller (or a different bug that is hidden by this). As far I can tell, the main difference between nvme-tcp

Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

2021-03-05 Thread Sagi Grimberg
blk_mq_tag_to_rq() always returns a request if the tag id is in a valid range [0...max_tags). If the target replies with a tag whose request has not actually been started, the host will likely corrupt data or simply crash. Add an additional check that the request has been started
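The check under discussion boils down to something like the following in the host's completion path (a minimal sketch, not the posted patch; nvme_tcp_tagset(), nvme_tcp_queue_id() and blk_mq_request_started() are existing helpers, and the error handling is illustrative):

    rq = blk_mq_tag_to_rq(nvme_tcp_tagset(queue), cqe->command_id);
    if (!rq || !blk_mq_request_started(rq)) {
        /* Tag is in range but the request was never started: bogus completion. */
        dev_err(queue->ctrl->ctrl.device,
            "got bad completion for tag %#x on queue %d\n",
            cqe->command_id, nvme_tcp_queue_id(queue));
        return -EINVAL;
    }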

Re: [PATCH] nvme-tcp: Check if request has started before processing it

2021-02-15 Thread Sagi Grimberg
blk_mq_tag_to_rq() will always return a request if the command_id is in the valid range. Check if the request has been started. If we blindly process the request we might double complete a request which can be fatal. How did you get to this one? did the controller send a completion for a

Re: [PATCH] nvme-tcp: Check if request has started before processing it

2021-02-15 Thread Sagi Grimberg
blk_mq_tag_to_rq() will always return a request if the command_id is in the valid range. Check if the request has been started. If we blindly process the request we might double complete a request which can be fatal. How did you get to this one? did the controller send a completion for a

Re: [PATCH] nvme-tcp: Check if request has started before processing it

2021-02-12 Thread Sagi Grimberg
blk_mq_tag_to_rq() will always return a request if the command_id is in the valid range. Check if the request has been started. If we blindly process the request we might double complete a request which can be fatal. How did you get to this one? did the controller send a completion for a

Re: [PATCH] nvme-tcp: Check if request has started before processing it

2021-02-12 Thread Sagi Grimberg
blk_mq_tag_to_rq() will always return a request if the command_id is in the valid range. Check if the request has been started. If we blindly process the request we might double complete a request which can be fatal. How did you get to this one? did the controller send a completion for a

Re: [PATCH] nvme: convert sysfs sprintf/snprintf family to sysfs_emit

2021-02-03 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH v2] nvme-multipath: Early exit if no path is available

2021-01-28 Thread Sagi Grimberg
You can't see exactly where it dies but I followed the assembly to nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, old) that returns NULL, but nvme_next_ns() eventually returning NULL (list_next_or_null_rcu()). So there is some other bug causing nvme_next_ns to misbehave. I

Re: [PATCH v2] nvme-multipath: Early exit if no path is available

2021-01-28 Thread Sagi Grimberg
You can't see exactly where it dies but I followed the assembly to nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head, old) that returns NULL, but nvme_next_ns() eventually returning NULL (list_next_or_null_rcu()). So there is some other bug causing nvme_next_ns to misbehave. I

Re: [PATCH] nvme: reject the ns when the block size is smaller than a sector

2021-01-14 Thread Sagi Grimberg
But this only catches a physical block size < 512 for NVMe, not any other block device. Please fix it for the general case in blk_queue_physical_block_size(). We actually call that later and would probably be better to check here.. We had a series for that a short while ago that got

Re: [PATCH] nvme: Constify static attribute_group structs

2021-01-13 Thread Sagi Grimberg
FWIW, Reviewed-by: Sagi Grimberg

Re: [PATCH] nvme: reject the ns when the block size is smaller than a sector

2021-01-13 Thread Sagi Grimberg
The nvme spec(1.4a, figure 248) says: "A value smaller than 9 (i.e., 512 bytes) is not supported." Signed-off-by: Li Feng --- drivers/nvme/host/core.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index
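A minimal sketch of the kind of check being discussed, placed where the host parses the Identify Namespace LBA format (placement and names are illustrative, not the posted patch):

    /* LBADS < 9 means a logical block smaller than 512 bytes, which the
     * spec does not support; reject the namespace. */
    if (id->lbaf[lbaf].ds < 9) {
        dev_warn(ctrl->device,
             "namespace has unsupported block size (2^%u bytes)\n",
             id->lbaf[lbaf].ds);
        return -EINVAL;
    }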

Re: [PATCH] iosched: Add i10 I/O Scheduler

2020-11-30 Thread Sagi Grimberg
Dear all: Thanks, again, for the very constructive decisions. I am writing back with quite a few updates: 1. We have now included a detailed comparison of i10 scheduler with Kyber with NVMe-over-TCP (https://github.com/i10-kernel/upstream-linux/blob/master/i10-evaluation.pdf). In a

Re: [PATCH] iosched: Add i10 I/O Scheduler

2020-11-13 Thread Sagi Grimberg
But if you think this has a better home, I'm assuming that the guys will be open to that. Also see the reply from Ming. It's a balancing act - don't want to add extra overhead to the core, but also don't want to carry an extra scheduler if the main change is really just variable dispatch

Re: [PATCH] iosched: Add i10 I/O Scheduler

2020-11-13 Thread Sagi Grimberg
But if you think this has a better home, I'm assuming that the guys will be open to that. Also see the reply from Ming. It's a balancing act - don't want to add extra overhead to the core, but also don't want to carry an extra scheduler if the main change is really just variable dispatch

Re: [PATCH] iosched: Add i10 I/O Scheduler

2020-11-13 Thread Sagi Grimberg
I haven't taken a close look at the code yet so far, but one quick note that patches like this should be against the branches for 5.11. In fact, this one doesn't even compile against current -git, as blk_mq_bio_list_merge is now called blk_bio_list_merge. Ugh, I guess that Jaehyun had this

Re: [PATCH] iosched: Add i10 I/O Scheduler

2020-11-13 Thread Sagi Grimberg
blk-mq actually has built-in batching(or sort of) mechanism, which is enabled if the hw queue is busy(hctx->dispatch_busy is > 0). We use EWMA to compute hctx->dispatch_busy, and it is adaptive, even though the implementation is quite coarse. But there should be much space to improve, IMO.
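For reference, the built-in mechanism referred to here is an exponentially weighted moving average kept per hardware queue; a simplified sketch of the update (blk_mq_update_dispatch_busy() in block/blk-mq.c is the authoritative version, and WEIGHT/FACTOR stand in for its constants):

    unsigned int ewma = hctx->dispatch_busy;

    if (!ewma && !busy)
        return;

    ewma *= WEIGHT - 1;
    if (busy)
        ewma += 1 << FACTOR;    /* pull the average towards "busy" */
    ewma /= WEIGHT;
    hctx->dispatch_busy = ewma;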

Re: [PATCH] iosched: Add i10 I/O Scheduler

2020-11-13 Thread Sagi Grimberg
I haven't taken a close look at the code yet so far, but one quick note that patches like this should be against the branches for 5.11. In fact, this one doesn't even compile against current -git, as blk_mq_bio_list_merge is now called blk_bio_list_merge. Ugh, I guess that Jaehyun had this

Re: [PATCH 3/3] blk-mq: Use llist_head for blk_cpu_done

2020-11-04 Thread Sagi Grimberg
There really aren't any rules for this, and it's perfectly legit to complete from process context. Maybe you're a kthread driven driver and that's how you handle completions. The block completion path has always been hard IRQ safe, but possible to call from anywhere. I'm not trying to put

Re: [PATCH -next] IB/isert: Fix warning Comparison to bool

2020-11-03 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH 3/3] blk-mq: Use llist_head for blk_cpu_done

2020-10-29 Thread Sagi Grimberg
Well, usb-storage obviously seems to do it, and the block layer does not prohibit it. Also loop, nvme-tcp and then I stopped looking. Any objections about adding local_bh_disable() around it? To me it seems like the whole IPI plus potentially softirq dance is a little pointless when
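What "adding local_bh_disable() around it" would look like at a call site that completes from process context (sketch only; where exactly it belongs was still under discussion):

    local_bh_disable();
    blk_mq_complete_request(rq);    /* completion may punt to softirq */
    local_bh_enable();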

Re: [PATCH 3/3] blk-mq: Use llist_head for blk_cpu_done

2020-10-29 Thread Sagi Grimberg
Well, usb-storage obviously seems to do it, and the block layer does not prohibit it. Also loop, nvme-tcp and then I stopped looking. Any objections about adding local_bh_disable() around it? To me it seems like the whole IPI plus potentially softirq dance is a little pointless when

Re: [PATCH 1/2] nvmet: introduce transport layer keep-alive

2020-10-28 Thread Sagi Grimberg
On 10/27/20 5:15 AM, zhenwei pi wrote: In the zero KATO scenario, if initiator crashes without transport layer disconnection, target side would never reclaim resources. A target could start transport layer keep-alive to detect dead connection for the admin queue. Not sure why we should

Re: [PATCH v3] nvme-rdma: handle nvme completion data length

2020-10-26 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH v2] nvme-rdma: handle nvme completion data length

2020-10-23 Thread Sagi Grimberg
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index 9e378d0a0c01..2ecadd309f4a 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -1767,6 +1767,21 @@ static void nvme_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc) return; }
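The hunk is truncated above, but the change under review is essentially a sanity check on the received completion length before parsing it, roughly as follows (len being sizeof(struct nvme_completion) in the surrounding function; sketch, not the exact hunk):

    if (unlikely(wc->byte_len < len)) {
        /* Too short to contain a full NVMe completion entry. */
        dev_err(queue->ctrl->ctrl.device,
            "Unexpected nvme completion length %d\n", wc->byte_len);
        nvme_rdma_error_recovery(queue->ctrl);
        return;
    }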

Re: [PATCH v2] nvmet: fix uninitialized work for zero kato

2020-10-14 Thread Sagi Grimberg
Fixes: Don't run keep alive work with zero kato. "Fixes" tags need to have a git commit id followed by the commit subject. I can't find any commit with that subject, though. Fixes: 0d3b6a8d213a ("nvmet: Disable keep-alive timer when kato is cleared to 0h")

Re: [PATCH] nvmet: fix uninitialized work for zero kato

2020-10-13 Thread Sagi Grimberg
Hit a warning: WARNING: CPU: 1 PID: 241 at kernel/workqueue.c:1627 __queue_delayed_work+0x6d/0x90 with trace: mod_delayed_work_on+0x59/0x90 nvmet_update_cc+0xee/0x100 [nvmet] nvmet_execute_prop_set+0x72/0x80 [nvmet] nvmet_tcp_try_recv_pdu+0x2f7/0x770 [nvmet_tcp]

Re: [PATCH blk-next 1/2] blk-mq-rdma: Delete not-used multi-queue RDMA map queue code

2020-10-02 Thread Sagi Grimberg
Yes, basically usage of managed affinity caused people to report regressions not being able to change irq affinity from procfs. Well, why would they change it? The whole point of the infrastructure is that there is a single sane affinity setting for a given setup. Now that setting needed

Re: [PATCH blk-next 1/2] blk-mq-rdma: Delete not-used multi-queue RDMA map queue code

2020-09-29 Thread Sagi Grimberg
..@ziepe.ca/ commit 759ace7832802eaefbca821b2b43a44ab896b449 Author: Sagi Grimberg Date: Thu Nov 1 13:08:07 2018 -0700 i40iw: remove support for ib_get_vector_affinity commit 9afc97c29b032af9a4112c2f4a02d5313b4dc71f Author: Sagi Grimberg Date: Thu Nov 1 09:13:12 2018 -0700

Re: [PATCH] [v2] nvme: replace meaningless judgement by checking whether req is null

2020-09-29 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH AUTOSEL 4.19 127/206] nvme: Fix controller creation races with teardown flow

2020-09-18 Thread Sagi Grimberg
This causes a regression and was reverted upstream, just FYI. On 9/17/20 7:06 PM, Sasha Levin wrote: From: Israel Rukshin [ Upstream commit ce1518139e6976cf19c133b555083354fdb629b8 ] Calling nvme_sysfs_delete() when the controller is in the middle of creation may cause several bugs. If the

Re: [PATCH] nvme: tcp: fix kconfig dependency warning when !CRYPTO

2020-09-14 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH v2] nvme-pci: cancel nvme device request before disabling

2020-08-28 Thread Sagi Grimberg
Added to nvme-5.9-rc

Re: [PATCH] nvmet-fc: Fix a missed _irqsave version of spin_lock in 'nvmet_fc_fod_op_done()'

2020-08-24 Thread Sagi Grimberg
The way 'spin_lock()' and 'spin_lock_irqsave()' are used is not consistent in this function. Use 'spin_lock_irqsave()' also here, as there is no guarantee that interruptions are disabled at that point, according to surrounding code. Fixes: a97ec51b37ef ("nvmet_fc: Rework target side abort
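The fix amounts to using the IRQ-saving variant consistently in that function; the pattern is (sketch, critical section elided):

    unsigned long flags;

    spin_lock_irqsave(&fod->flock, flags);
    /* ... same critical section as before ... */
    spin_unlock_irqrestore(&fod->flock, flags);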

Re: [PATCH 2/3] block: fix locking for struct block_device size updates

2020-08-24 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH 3/3] nvme: don't call revalidate_disk from nvme_set_queue_dying

2020-08-24 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH 1/3] block: replace bd_set_size with bd_set_nr_sectors

2020-08-24 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH] nvme-pci: cancel nvme device request before disabling

2020-08-14 Thread Sagi Grimberg
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index ba725ae47305..c4f1ce0ee1e3 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -1249,8 +1249,8 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)

Re: [PATCH] nvme-pci: Use u32 for nvme_dev.q_depth and nvme_queue.q_depth

2020-08-14 Thread Sagi Grimberg
queued to nvme-5.9-rc

Re: [PATCH v2] nvme: Use spin_lock_irq() when taking the ctrl->lock

2020-08-14 Thread Sagi Grimberg
There's an unrelated whitespace change in nvme_init_identify(). Otherwise, looks fine. Oops, sorry. can this be fixed up when it's merged? Fixed and queued.

Re: [PATCH] nvmet: fix oops in nvmet_execute_passthru_cmd()

2020-08-05 Thread Sagi Grimberg
This patch adds a check in nvmet_execute_passthru_cmd() to prevent the following oops :- Why LKML and not linux-nvme? My bad (+linux-nvme). It doesn't have the patch. can you resend?

Re: [PATCH] nvmet: fix oops in nvmet_execute_passthru_cmd()

2020-08-05 Thread Sagi Grimberg
This patch adds a check in nvmet_execute_passthru_cmd() to prevent the following oops :- Why LKML and not linux-nvme?

Re: [PATCH] nvmet-passthru: Reject commands with non-sgl flags set

2020-08-03 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [IB/srpt] c804af2c1d: last_state.test.blktests.exit_code.143

2020-08-03 Thread Sagi Grimberg
Greeting, FYI, we noticed the following commit (built with gcc-9): commit: c804af2c1d3152c0cf877eeb50d60c2d49ac0cf0 ("IB/srpt: use new shared CQ mechanism") https://git.kernel.org/cgit/linux/kernel/git/rdma/rdma.git for-next in testcase: blktests with following parameters: test:

Re: [PATCH v15 7/9] nvmet-passthru: Add passthru code to process commands

2020-07-22 Thread Sagi Grimberg
Thanks for the review Christoph. I think I should be able to make all the requested changes in the next week or two. On 2020-07-20 1:35 p.m., Sagi Grimberg wrote: I'm still not so happy about having to look up the namespace and still wonder if we should generalize the connect_q

Re: [PATCH v15 7/9] nvmet-passthru: Add passthru code to process commands

2020-07-20 Thread Sagi Grimberg
On Mon, Jul 20, 2020 at 05:01:19PM -0600, Logan Gunthorpe wrote: On 2020-07-20 4:35 p.m., Sagi Grimberg wrote: passthru commands are in essence REQ_OP_DRV_IN/REQ_OP_DRV_OUT, which means that the driver shouldn't need the ns at all. So if you have a dedicated request queue (mapped to the I/O

Re: [PATCH v15 7/9] nvmet-passthru: Add passthru code to process commands

2020-07-20 Thread Sagi Grimberg
On 7/20/20 4:17 PM, Keith Busch wrote: On Mon, Jul 20, 2020 at 05:01:19PM -0600, Logan Gunthorpe wrote: On 2020-07-20 4:35 p.m., Sagi Grimberg wrote: passthru commands are in essence REQ_OP_DRV_IN/REQ_OP_DRV_OUT, which means that the driver shouldn't need the ns at all. So if you have

Re: [PATCH v15 7/9] nvmet-passthru: Add passthru code to process commands

2020-07-20 Thread Sagi Grimberg
Thanks for the review Christoph. I think I should be able to make all the requested changes in the next week or two. On 2020-07-20 1:35 p.m., Sagi Grimberg wrote: I'm still not so happy about having to look up the namespace and still wonder if we should generalize the connect_q

Re: [PATCH v15 7/9] nvmet-passthru: Add passthru code to process commands

2020-07-20 Thread Sagi Grimberg
Thanks for the review Christoph. I think I should be able to make all the requested changes in the next week or two. On 2020-07-20 1:35 p.m., Sagi Grimberg wrote: I'm still not so happy about having to look up the namespace and still wonder if we should generalize the connect_q

Re: [PATCH v15 7/9] nvmet-passthru: Add passthru code to process commands

2020-07-20 Thread Sagi Grimberg
Add passthru command handling capability for the NVMeOF target and export passthru APIs which are used to integrate passthru code with nvmet-core. The new file passthru.c handles passthru cmd parsing and execution. In the passthru mode, we create a block layer request from the nvmet request

Re: [PATCH 5/5] nvme-pci: Use standard block status macro

2020-07-05 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH 4/5] nvme-pci: Use the consistent return type of nvme_pci_iod_alloc_size()

2020-07-05 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH 2/5] nvme-pci: Add a blank line after declarations

2020-07-05 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH 1/5] nvme-pci: Fix some comments issues

2020-07-05 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH 4.19 082/131] nvme: fix possible deadlock when I/O is blocked

2020-07-02 Thread Sagi Grimberg
Hi! From: Sagi Grimberg [ Upstream commit 3b4b19721ec652ad2c4fe51dfbe5124212b5f581 ] Revert fab7772bfbcf ("nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns") When adding a new namespace to the head disk (via nvme_mpath_set_live) we will see partition

Re: [PATCH 3/3] nvme: Use USEC_PER_SEC instead of magic numbers

2020-06-23 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH 1/3] nvme: Add Arbitration Burst support

2020-06-23 Thread Sagi Grimberg
From the NVMe spec, "In order to make efficient use of the non-volatile memory, it is often advantageous to execute multiple commands from a Submission Queue in parallel. For Submission Queues that are using weighted round robin with urgent priority class or round robin arbitration, host
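On the host side, programming the arbitration burst comes down to a Set Features command for the Arbitration feature (FID 0x01), whose dword11 carries the burst value in bits 2:0; a hedged sketch using the existing core helper (ctrl->rab standing in for the RAB value read from Identify Controller is an assumption):

    /* 0b111 means "no limit"; otherwise the burst is 2^AB commands. */
    ret = nvme_set_features(ctrl, NVME_FEAT_ARBITRATION,
                ctrl->rab & 0x7, NULL, 0, NULL);
    if (ret)
        dev_warn(ctrl->device, "failed to set arbitration burst\n");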

Re: [PATCH 2/2] nvme-pci: remove empty line following nvme_should_reset()

2020-06-08 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH 1/2] nvmet-loop: remove unused 'target_ctrl' in nvme_loop_ctrl

2020-06-08 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH] nvme: do not call del_gendisk() on a disk that was never added

2020-06-07 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH 1/1] nvme-fcloop: verify wwnn and wwpn format

2020-06-04 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH] nvme-tcp: constify static struct blk_mq_ops

2020-05-28 Thread Sagi Grimberg
Acked-by: Sagi Grimberg

Re: remove kernel_setsockopt and kernel_getsockopt

2020-05-13 Thread Sagi Grimberg
looks quite promising: 42 files changed, 721 insertions(+), 799 deletions(-) For the nvme-tcp bits, Acked-by: Sagi Grimberg

Re: [RFC PATCH v2 1/7] block: Extend commit_rqs() to do batch processing

2020-05-12 Thread Sagi Grimberg
devices will benefit from the batching so maybe the flag needs to be inverted? BLK_MQ_F_DONT_BATCHING_SUBMISSION? Actually I'd rather to not add any flag, and we may use some algorithm (maybe EWMA or other intelligent stuff, or use hctx->dispatch_busy directly) to figure out one dynamic

Re: [RFC PATCH v2 1/7] block: Extend commit_rqs() to do batch processing

2020-05-11 Thread Sagi Grimberg
Basically, my idea is to dequeue request one by one, and for each dequeued request: - we try to get a budget and driver tag, if both succeed, add the request to one per-task list which can be stored in stack variable, then continue to dequeue more request - if either budget or driver tag
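The dispatch flow described here could be sketched as follows (pseudo-kernel C; dequeue_next_request(), get_budget_and_driver_tag(), requeue_request() and dispatch_list_to_driver() are hypothetical helpers standing in for the real blk-mq paths):

    LIST_HEAD(batch);

    while ((rq = dequeue_next_request(hctx))) {
        if (!get_budget_and_driver_tag(hctx, rq)) {
            /* Out of budget or tags: put it back and flush the batch. */
            requeue_request(hctx, rq);
            break;
        }
        list_add_tail(&rq->queuelist, &batch);
    }
    /* Issue the whole list; only the final request is marked bd.last. */
    dispatch_list_to_driver(hctx, &batch);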

Re: [RFC PATCH v2 1/7] block: Extend commit_rqs() to do batch processing

2020-05-10 Thread Sagi Grimberg
You're mostly correct. This is exactly why an I/O scheduler may be applicable here IMO. Mostly because I/O schedulers tend to optimize for something specific and always present tradeoffs. Users need to understand what they are optimizing for. Hence I'd say this functionality can definitely be

Re: [RFC PATCH v2 1/7] block: Extend commit_rqs() to do batch processing

2020-05-08 Thread Sagi Grimberg
Hey Ming, Would it make sense to elevate this flag to a request_queue flag (QUEUE_FLAG_ALWAYS_COMMIT)? A request_queue flag is usually writable; however, this case just needs one read-only flag, so I think it may be better to make it a tagset/hctx flag. I actually intended it to be writable.

Re: [RFC PATCH v2 1/7] block: Extend commit_rqs() to do batch processing

2020-05-08 Thread Sagi Grimberg
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index f389d7c724bd..6a20f8e8eb85 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -391,6 +391,7 @@ struct blk_mq_ops { enum { BLK_MQ_F_SHOULD_MERGE = 1 << 0, BLK_MQ_F_TAG_SHARED = 1 << 1, +

Re: [PATCH] nvme: fix uninitialized return of ret when sysfs_create_link fails

2019-10-04 Thread Sagi Grimberg
This was already fixed and merged (by Dan) On 10/2/19 5:43 AM, Colin King wrote: From: Colin Ian King Currently when the call to sysfs_create_link fails the error exit path returns an uninitialized value in variable ret. Fix this by returning the error code returned from the failed call to

Re: [PATCH v3] nvme: allow 64-bit results in passthru commands

2019-09-24 Thread Sagi Grimberg
Looks fine to me, Reviewed-by: Sagi Grimberg Keith, Christoph?

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-20 Thread Sagi Grimberg
Sagi, Sorry it took a while to bring my system back online. With the patch, the IOPS is about the same drop with the 1st patch. I think the excessive context switches are causing the drop in IOPS. The following are captured by "perf sched record" for 30 seconds during tests. "perf

Re: [PATCH v2] nvme-pci: Save PCI state before putting drive into deepest state

2019-09-20 Thread Sagi Grimberg
Keith, can we get a review from you for this?

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-20 Thread Sagi Grimberg
Hey Ming, Ok, so the real problem is per-cpu bounded tasks. I share Thomas opinion about a NAPI like approach. We already have that, its irq_poll, but it seems that for this use-case, we get lower performance for some reason. I'm not entirely sure why that is, maybe its because we need to

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-20 Thread Sagi Grimberg
It seems like we're attempting to stay in irq context for as long as we can instead of scheduling to softirq/thread context if we have more than a minimal amount of work to do. Without at least understanding why softirq/thread degrades us so much this code seems like the wrong approach to me.

Re: [PATCH] Added QUIRKs for ADATA XPG SX8200 Pro 512GB

2019-09-11 Thread Sagi Grimberg
This does not apply on nvme-5.4, can you please respin a patch that cleanly applies?

Re: [PATCH] Added QUIRKs for ADATA XPG SX8200 Pro 512GB

2019-09-11 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-09 Thread Sagi Grimberg
Hey Ming, Ok, so the real problem is per-cpu bounded tasks. I share Thomas opinion about a NAPI like approach. We already have that, its irq_poll, but it seems that for this use-case, we get lower performance for some reason. I'm not entirely sure why that is, maybe its because we need to

Re: [PATCH 1/4] softirq: implement IRQ flood detection mechanism

2019-09-06 Thread Sagi Grimberg
Ok, so the real problem is per-cpu bounded tasks. I share Thomas opinion about a NAPI like approach. We already have that, its irq_poll, but it seems that for this use-case, we get lower performance for some reason. I'm not entirely sure why that is, maybe its because we need to mask

Re: [PATCH] nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()

2019-09-06 Thread Sagi Grimberg
Reviewed-by: Sagi Grimberg applied to nvme-5.4

Re: [PATCHv2] nvme: Assign subsys instance from first ctrl

2019-09-05 Thread Sagi Grimberg
Looks good, Reviewed-by: Sagi Grimberg Applied to nvme-5.4

Re: [PATCH] nvme: tcp: remove redundant assignment to variable ret

2019-09-05 Thread Sagi Grimberg
Applied to nvme-5.4

Re: [PATCH v2 3/3] nvme: fire discovery log page change events to userspace

2019-09-03 Thread Sagi Grimberg
Still don't understand how this is ok... I have /dev/nvme0 represents a network endpoint that I would discover from, it is raising me an event to do a discovery operation (namely to issue an ioctl to it) so my udev code calls a systemd script. By the time I actually get to do that,

Re: [PATCH v2 3/3] nvme: fire discovery log page change events to userspace

2019-08-30 Thread Sagi Grimberg
Yes we do, userspace should use it to order events.  Does udev not handle that properly today? The problem is not ordering of events, its really about the fact that the chardev can be removed and reallocated for a different controller (could be a completely different discovery controller) by

Re: [PATCH v2 3/3] nvme: fire discovery log page change events to userspace

2019-08-30 Thread Sagi Grimberg
You are correct that this information can be derived from sysfs, but the main reason why we add these here, is because in udev rule we can't just go ahead and start looking these up and parsing these.. We could send the discovery aen with NVME_CTRL_NAME and have then have systemd run

Re: [PATCH v2 3/3] nvme: fire discovery log page change events to userspace

2019-08-30 Thread Sagi Grimberg
Yes we do, userspace should use it to order events. Does udev not handle that properly today? The problem is not ordering of events, its really about the fact that the chardev can be removed and reallocated for a different controller (could be a completely different discovery controller) by

Re: [PATCH v2 3/3] nvme: fire discovery log page change events to userspace

2019-08-29 Thread Sagi Grimberg
You are correct that this information can be derived from sysfs, but the main reason why we add these here, is because in udev rule we can't just go ahead and start looking these up and parsing these.. We could send the discovery aen with NVME_CTRL_NAME and have then have systemd run

Re: [PATCH v4 2/4] nvme-pci: Add support for variable IO SQ element size

2019-08-22 Thread Sagi Grimberg
I'll fix it. Note that I'm going to take it out of the tree soon because it will have conflicts with Jens for-5.4/block, so we will send it to Jens after the initial merge window, after he rebases off of Linus. Conflicts too hard to fixup at merge time ? Otherwise I could just rebase on top

Re: [PATCH v4 2/4] nvme-pci: Add support for variable IO SQ element size

2019-08-22 Thread Sagi Grimberg
wrote: +#define NVME_NVM_ADMSQES 6 #define NVME_NVM_IOSQES 6 #define NVME_NVM_IOCQES 4 The NVM in the two defines here stands for the NVM command set, so this should just be named NVME_ADM_SQES or so. But except for this the patch looks good:

Re: [PATCH v7 08/14] nvmet-core: allow one host per passthru-ctrl

2019-08-22 Thread Sagi Grimberg
I don't understand why we don't limit a regular ctrl to single access and we do it for the PT ctrl. I guess the block layer helps to sync between multiple access in parallel but we can do it as well. Also, let's say you limit the access to this subsystem to 1 user, the bdev is still

Re: [PATCH 3/3] nvme: complete request in work queue on CPU with flooded interrupts

2019-08-21 Thread Sagi Grimberg
Sagi, Here are the test results. Benchmark command: fio --bs=4k --ioengine=libaio --iodepth=64 --filename=/dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/dev/nvme4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1:/dev/nvme8n1:/dev/nvme9n1 --direct=1 --runtime=90 --numjobs=80 --rw=randread

Re: [PATCH 3/3] nvme: complete request in work queue on CPU with flooded interrupts

2019-08-20 Thread Sagi Grimberg
From: Long Li When a NVMe hardware queue is mapped to several CPU queues, it is possible that the CPU this hardware queue is bound to is flooded by returning I/O for other CPUs. For example, consider the following scenario: 1. CPU 0, 1, 2 and 3 share the same hardware queue 2. the hardware

Re: [PATCH v3] nvme: Add quirk for LiteON CL1 devices running FW 22301111

2019-08-19 Thread Sagi Grimberg
Jens, can you please rebase for-linus so we have the needed dependency: 4eaefe8c621c6195c91044396ed8060c179f7aae I just did as part of adding a new patch, being pushed out shortly. Thanks
