On Thu, Apr 12, 2018 at 08:39:54AM -0600, Keith Busch wrote:
> On Thu, Apr 12, 2018 at 10:34:37AM -0400, Sinan Kaya wrote:
> > On 4/12/2018 10:06 AM, Bjorn Helgaas wrote:
> > >
> > > I think the scenario you are describing is two systems that are
> > > identical except that in the first, the endpoint is below a hotplug
> > > bridge, while in the second, it's below a non-hotplug bridge.
On Thu, Apr 12, 2018 at 10:34:37AM -0400, Sinan Kaya wrote:
> On 4/12/2018 10:06 AM, Bjorn Helgaas wrote:
> >
> > I think the scenario you are describing is two systems that are
> > identical except that in the first, the endpoint is below a hotplug
> > bridge, while in the second, it's below a non-hotplug bridge.
On Mon, Apr 09, 2018 at 10:41:52AM -0400, Oza Pawandeep wrote:
> +static int find_dpc_dev_iter(struct device *device, void *data)
> +{
> + struct pcie_port_service_driver *service_driver;
> + struct device **dev;
> +
> + dev = (struct device **) data;
> +
> + if (device->bus == &pci
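The preview cuts the iterator off mid-condition. A plausible completion, assuming the usual port-service matching pattern (to_service_driver() and PCIE_PORT_SERVICE_DPC from portdrv.h); this is a sketch, not necessarily the exact patch text:

	if (device->bus == &pcie_port_bus_type && device->driver) {
		service_driver = to_service_driver(device->driver);
		if (service_driver->service == PCIE_PORT_SERVICE_DPC) {
			*dev = device;
			return 1;	/* non-zero stops the device iteration */
		}
	}
	return 0;	/* keep walking the port's children */
}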
On Mon, Apr 09, 2018 at 10:41:53AM -0400, Oza Pawandeep wrote:
> +/**
> + * pcie_wait_for_link - Wait for link till it's active/inactive
> + * @pdev: Bridge device
> + * @active: waiting for active or inactive ?
> + *
> + * Use this to wait till link becomes active or inactive.
> + */
> +bool pcie_
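The body is truncated above; a helper like this typically polls the Data Link Layer Link Active bit until it matches the requested state. A minimal sketch, with the timeout budget and polling interval as assumptions:

	bool pcie_wait_for_link(struct pci_dev *pdev, bool active)
	{
		int timeout = 1000;	/* assumed budget, in ms */
		u16 lnksta;

		for (;;) {
			pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
			if (!!(lnksta & PCI_EXP_LNKSTA_DLLLA) == active)
				return true;	/* reached the requested state */
			if (timeout <= 0)
				return false;	/* gave up waiting */
			msleep(10);
			timeout -= 10;
		}
	}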
On Mon, Apr 09, 2018 at 10:41:51AM -0400, Oza Pawandeep wrote:
> This patch implements generic pcie_port_find_service() routine.
>
> Signed-off-by: Oza Pawandeep
Looks good.
Reviewed-by: Keith Busch
> Signed-off-by: Oza Pawandeep
Looks fine.
Reviewed-by: Keith Busch
On Mon, Apr 09, 2018 at 10:41:49AM -0400, Oza Pawandeep wrote:
> This patch renames error recovery to generic name with pcie prefix
>
> Signed-off-by: Oza Pawandeep
Looks fine.
Reviewed-by: Keith Busch
On Fri, Mar 30, 2018 at 09:04:46AM +0000, Eric H. Chang wrote:
> We internally call the PCIe retimer an HBA. It's not a real Host Bus Adapter that
> translates the interface from PCIe to SATA or SAS. Sorry for the confusion.
Please don't call a PCIe retimer an "HBA"! :)
While your experiment is setu
Thanks, I've applied the patch with a simpler changelog explaining
the bug.
On Mon, Apr 02, 2018 at 11:49:41AM -0300, Rodrigo R. Galvao wrote:
> When trying to issue write_zeroes command against TARGET with a 4K block
> size, it ends up hitting the following condition at __blkdev_issue_zeroout:
>
> if ((sector | nr_sects) & bs_mask)
> return -EINVAL;
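For context, __blkdev_issue_zeroout computes bs_mask from the logical block size, so with 4K blocks any 512-byte sector value that is not a multiple of 8 fails the check:

	/* bs_mask for a 4K logical block: (4096 >> 9) - 1 = 7 */
	sector_t bs_mask = (bdev_logical_block_size(bdev) >> 9) - 1;

	/* e.g. nr_sects = 9: 9 & 7 = 1, so the call returns -EINVAL */
	if ((sector | nr_sects) & bs_mask)
		return -EINVAL;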
On Sat, Mar 31, 2018 at 05:34:26PM -0500, Bjorn Helgaas wrote:
> From: Bjorn Helgaas
>
> Rename from pcie-dpc.c to dpc.c. The path "drivers/pci/pcie/pcie-dpc.c"
> has more occurrences of "pci" than necessary.
>
> Signed-off-by: Bjorn Helgaas
Looks good.
Acked-by: Keith Busch
On Mon, Apr 02, 2018 at 10:47:10AM -0300, Rodrigo Rosatti Galvao wrote:
> One thing that I just forgot to explain previously, but I think it's
> relevant:
>
> 1. The command is failing with 4k logical block size, but works with 512B
>
> 2. With the patch, the command is working for both 512B and 4K
On Fri, Mar 30, 2018 at 06:18:50PM -0300, Rodrigo R. Galvao wrote:
> sector = le64_to_cpu(write_zeroes->slba) <<
> (req->ns->blksize_shift - 9);
> nr_sector = (((sector_t)le16_to_cpu(write_zeroes->length)) <<
> - (req->ns->blksize_shift - 9)) + 1;
> +
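The hunk is cut off, but since the NVMe length field is zero-based, the fix presumably moves the + 1 inside the shift so the result stays a whole number of logical blocks; a sketch:

	sector = le64_to_cpu(write_zeroes->slba) <<
			(req->ns->blksize_shift - 9);
	nr_sector = (((sector_t)le16_to_cpu(write_zeroes->length) + 1) <<
			(req->ns->blksize_shift - 9));

With that, a single 4K block (length = 0) gives nr_sector = (0 + 1) << 3 = 8 instead of (0 << 3) + 1 = 1, which failed the alignment check above.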
On Fri, Mar 30, 2018 at 06:18:50PM -0300, Rodrigo R. Galvao wrote:
> When trying to issue write_zeroes command against TARGET the nr_sector is
> being incremented by 1, which ends up hitting the following condition at
> __blkdev_issue_zeroout:
>
> if ((sector | nr_sects) & bs_mask)
>
On Wed, Mar 28, 2018 at 10:06:46AM +0200, Christoph Hellwig wrote:
> For PCIe devices the right policy is not round-robin but to use
> the PCIe device closer to the node. I did a prototype for that
> long ago and the concept can work. Can you look into that and
> also make that policy used automatically?
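A rough sketch of the node-aware selection Christoph describes, picking the path whose device has the smallest NUMA distance from the submitting node; the helper name and exact list/field names are assumptions, not the eventual in-tree API:

	/* hypothetical: choose the NUMA-closest path for @node */
	static struct nvme_ns *nvme_numa_path(struct nvme_ns_head *head, int node)
	{
		struct nvme_ns *ns, *best = NULL;
		int dist, best_dist = INT_MAX;

		list_for_each_entry_rcu(ns, &head->list, siblings) {
			dist = node_distance(node, dev_to_node(ns->ctrl->dev));
			if (dist < best_dist) {	/* smaller distance = closer */
				best_dist = dist;
				best = ns;
			}
		}
		return best;
	}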
Thanks, applied.
Thanks, applied.
Thanks, applied.
On Wed, Mar 28, 2018 at 03:57:47PM +0200, Arnd Bergmann wrote:
> @@ -2233,8 +2233,8 @@ int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct
> nvme_ns *ns,
> c.get_log_page.lid = log_page;
> c.get_log_page.numdl = cpu_to_le16(dwlen & ((1 << 16) - 1));
> c.get_log_page.numdu = cpu_t
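The hunk truncates at the upper half; it presumably mirrors the low word, splitting the zero-based dword count across the spec's NUMDL/NUMDU fields:

	c.get_log_page.numdl = cpu_to_le16(dwlen & ((1 << 16) - 1));	/* low 16 bits */
	c.get_log_page.numdu = cpu_to_le16(dwlen >> 16);		/* high 16 bits */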
On Tue, Mar 27, 2018 at 08:00:33PM +0200, Matias Bjørling wrote:
> Compiling on a 32-bit system produces a warning about the shift width
> when shifting a 32-bit integer with a 64-bit integer.
>
> Make sure that offset always is 64bit, and use macros for retrieving
> lower and upper bits of the offset.
Thanks, applied.
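The macros the changelog refers to are presumably the kernel's lower_32_bits()/upper_32_bits(); keeping the offset in a u64 and splitting it only when filling the command avoids the 32-bit shift warning:

	/* offset is a u64 log-page offset; split it into the command dwords */
	c.get_log_page.lpol = cpu_to_le32(lower_32_bits(offset));
	c.get_log_page.lpou = cpu_to_le32(upper_32_bits(offset));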
On Wed, Mar 21, 2018 at 08:27:07PM +0100, Matias Bjørling wrote:
> Enable the lightnvm integration to use the nvme_get_log_ext()
> function.
>
> Signed-off-by: Matias Bjørling
Thanks, applied to nvme-4.17.
On Wed, Mar 21, 2018 at 11:48:09PM +0800, Ming Lei wrote:
> On Wed, Mar 21, 2018 at 01:10:31PM +0100, Marta Rybczynska wrote:
> > > On Wed, Mar 21, 2018 at 12:00:49PM +0100, Marta Rybczynska wrote:
> > >> NVMe driver uses threads for the work at device reset, including enabling
> > >> the PCIe devi
On Wed, Mar 21, 2018 at 03:06:05AM -0700, Matias Bjørling wrote:
> > outside of nvme core so that we can use it form lightnvm.
> >
> > Signed-off-by: Javier González
> > ---
> > drivers/lightnvm/core.c | 11 +++
> > drivers/nvme/host/core.c | 6 ++--
> > drivers/nvme/host/lightn
On Wed, Mar 14, 2018 at 02:52:30PM -0600, Keith Busch wrote:
>
> Reviewed-by: Keith Busch
y INT interrupt.
>
> With current code we do not acknowledge the interrupt back in dpc_irq()
> and we get dpc interrupt storm.
>
> This patch acknowledges the interrupt in interrupt handler.
>
> Signed-off-by: Oza Pawandeep
Thanks, this looks good to me.
Reviewed-by: Keith Busch
On Mon, Mar 12, 2018 at 11:47:12PM -0400, Sinan Kaya wrote:
>
> The spec is recommending code to use "Hotplug Surprise" to differentiate
> these two cases we are looking for.
>
> The use case Keith is looking for is for hotplug support.
> The case I and Oza are more interested in is error handling.
Thanks, applied for 4.17.
On Tue, Mar 13, 2018 at 06:45:00PM +0800, Ming Lei wrote:
> On Tue, Mar 13, 2018 at 05:58:08PM +0800, Jianchao Wang wrote:
> > Currently, adminq and ioq1 share the same irq vector which is set
> > affinity to cpu0. If a system allows cpu0 to be offlined, the adminq
> > will not be able to work any more.
On Mon, Mar 12, 2018 at 02:47:30PM -0500, Bjorn Helgaas wrote:
> [+cc Alex]
>
> On Mon, Mar 12, 2018 at 08:25:51AM -0600, Keith Busch wrote:
> > On Sun, Mar 11, 2018 at 11:03:58PM -0400, Sinan Kaya wrote:
> > > On 3/11/2018 6:03 PM, Bjorn Helgaas wrote:
> > > >
Hi Jianchao,
The patch tests fine on all hardware I had. I'd like to queue this up
for the next 4.16-rc. Could you send a v3 with the cleanup changes Andy
suggested and a changelog aligned with Ming's insights?
Thanks,
Keith
On Mon, Mar 12, 2018 at 11:09:34AM -0700, Alexander Duyck wrote:
> On Mon, Mar 12, 2018 at 10:40 AM, Keith Busch wrote:
> > On Mon, Mar 12, 2018 at 10:21:29AM -0700, Alexander Duyck wrote:
> >> diff --git a/include/linux/pci.h b/include/linux/pci.h
> >> index 024a1b
On Mon, Mar 12, 2018 at 01:41:07PM -0400, Sinan Kaya wrote:
> I was just writing a reply to you. You acted first :)
>
> On 3/12/2018 1:33 PM, Keith Busch wrote:
> >>> After releasing a slot from DPC, the link is allowed to retrain. If
> >>> there
> >>
On Mon, Mar 12, 2018 at 10:21:29AM -0700, Alexander Duyck wrote:
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 024a1beda008..9cab9d0d51dc 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1953,6 +1953,7 @@ static inline void pci_mmcfg_late_init(void) { }
> int
On Mon, Mar 12, 2018 at 09:04:47PM +0530, p...@codeaurora.org wrote:
> On 2018-03-12 20:28, Keith Busch wrote:
> > I'm not sure I understand. The link is disabled while DPC is triggered,
> > so if anything, you'd want to un-enumerate everything below the
> > containe
On Mon, Mar 12, 2018 at 08:16:38PM +0530, p...@codeaurora.org wrote:
> On 2018-03-12 19:55, Keith Busch wrote:
> > On Sun, Mar 11, 2018 at 11:03:58PM -0400, Sinan Kaya wrote:
> > > On 3/11/2018 6:03 PM, Bjorn Helgaas wrote:
> > > > On Wed, Feb 28, 2018 at 10:34:1
On Sun, Mar 11, 2018 at 11:03:58PM -0400, Sinan Kaya wrote:
> On 3/11/2018 6:03 PM, Bjorn Helgaas wrote:
> > On Wed, Feb 28, 2018 at 10:34:11PM +0530, Oza Pawandeep wrote:
>
> > That difference has been there since the beginning of DPC, so it has
> > nothing to do with *this* series EXCEPT for the
On Thu, Mar 08, 2018 at 08:42:20AM +0100, Christoph Hellwig wrote:
>
> So I suspect we'll need to go with a patch like this, just with a way
> better changelog.
I have to agree this is required for that use case. I'll run some
quick tests and propose an alternate changelog.
Longer term, the curr
On Mon, Mar 05, 2018 at 01:10:53PM -0700, Jason Gunthorpe wrote:
> So when reading the above mlx code, we see the first wmb() being used
> to ensure that CPU stores to cachable memory are visible to the DMA
> triggered by the doorbell ring.
IIUC, we don't need a similar barrier for NVMe to ensure
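For reference, the submission path leans on the ordering writel() already provides: prior stores to the SQ entry are visible to the device before the doorbell MMIO write, so no explicit wmb() is needed. Roughly, eliding the CMB (sq_cmds_io) branch that the quoted diff below removes:

	memcpy(&nvmeq->sq_cmds[tail], cmd, sizeof(*cmd));	/* build SQE */
	if (++tail == nvmeq->q_depth)
		tail = 0;
	writel(tail, nvmeq->q_db);	/* writel orders the SQE store first */
	nvmeq->sq_tail = tail;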
On Mon, Mar 05, 2018 at 12:33:29PM +1100, Oliver wrote:
> On Thu, Mar 1, 2018 at 10:40 AM, Logan Gunthorpe wrote:
> > @@ -429,10 +429,7 @@ static void __nvme_submit_cmd(struct nvme_queue *nvmeq,
> > {
> > u16 tail = nvmeq->sq_tail;
>
> > - if (nvmeq->sq_cmds_io)
> > -
Thanks, applied for 4.17.
A side note for those interested, some bug fixing commits introduced in
the nvme-4.17 branch were applied to 4.16-rc. I've rebased these on top
of linux-block/for-next so we don't have the duplicate commits for the
4.17 merge window. The nvme branch currently may be viewe
On Thu, Mar 01, 2018 at 11:00:51PM +, Stephen Bates wrote:
>
> P2P is about offloading the memory and PCI subsystem of the host CPU
> and this is achieved no matter which p2p_dev is used.
Even within a device, memory attributes for its various regions may not be
the same. There's a meaningfu
On Thu, Mar 01, 2018 at 01:52:20AM +0100, Christoph Hellwig wrote:
> Looks fine,
>
> and we should pick this up for 4.16 independent of the rest, which
> I might need a little more review time for.
>
> Reviewed-by: Christoph Hellwig
Thanks, queued up for 4.16.
On Thu, Mar 01, 2018 at 11:12:08AM -0600, Wen Xiong wrote:
> Hi Keith,
>
> It is perfect! I'll go with it.
Thanks, queued up for 4.16.
On Thu, Mar 01, 2018 at 11:03:30PM +0800, Ming Lei wrote:
> If all CPUs for the 1st IRQ vector of admin queue are offline, then I
> guess NVMe can't work any more.
Yikes, with respect to admin commands, it appears you're right if your
system allows offlining CPU0.
> So it looks like a good idea to
On Thu, Mar 01, 2018 at 06:05:53PM +0800, jianchao.wang wrote:
> When the adminq is freed, the ioq0 irq completion path has to invoke
> nvme_irq twice: once for itself, once for the adminq completion irq action.
Let's be a little more careful with the terminology when referring to
spec-defined features: the
On Wed, Feb 28, 2018 at 04:31:37PM -0600, wenxiong wrote:
> On 2018-02-15 14:05, wenxi...@linux.vnet.ibm.com wrote:
> > From: Wen Xiong
> >
> > With b2a0eb1a0ac72869c910a79d935a0b049ec78ad9(nvme-pci: Remove watchdog
> > timer), EEH recovery stops working on ppc.
> >
> > After removing watchdog timer
Thanks, applied.
On Wed, Feb 28, 2018 at 11:46:20PM +0800, jianchao.wang wrote:
>
> the irqbalance may migrate the adminq irq away from cpu0.
No, irqbalance can't touch managed IRQs. See irq_can_set_affinity_usr().
On Wed, Feb 28, 2018 at 10:53:31AM +0800, jianchao.wang wrote:
> On 02/27/2018 11:13 PM, Keith Busch wrote:
> > On Tue, Feb 27, 2018 at 04:46:17PM +0800, Jianchao Wang wrote:
> >> Currently, adminq and ioq0 share the same irq vector. This is
> >> unfair for both adminq and ioq0.
On Tue, Feb 27, 2018 at 04:46:17PM +0800, Jianchao Wang wrote:
> Currently, adminq and ioq0 share the same irq vector. This is
> unfair for both adminq and ioq0.
> - For adminq, its completion irq has to be bound on cpu0.
> - For ioq0, when the irq fires for io completion, the adminq irq
>act
On Mon, Feb 26, 2018 at 05:51:23PM +0900, baeg...@gmail.com wrote:
> From: Baegjae Sung
>
> If multipathing is enabled, each NVMe subsystem creates a head
> namespace (e.g., nvme0n1) and multiple private namespaces
> (e.g., nvme0c0n1 and nvme0c1n1) in sysfs. When creating links for
> private name
On Thu, Feb 15, 2018 at 07:13:41PM +0800, Jianchao Wang wrote:
> nvme cq irq is freed based on queue_count. When the sq/cq creation
> fails, irq will not be setup. free_irq will warn 'Try to free
> already-free irq'.
>
> To fix it, set the nvmeq->cq_vector to -1, then nvme_suspend_queue
> will ign
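i.e. with the sentinel set, nvme_suspend_queue can tell whether the IRQ was ever requested; a minimal sketch of the guard, assuming the 4.15-era locking:

	spin_lock_irq(&nvmeq->q_lock);
	if (nvmeq->cq_vector == -1) {
		/* sq/cq creation failed, or queue already suspended */
		spin_unlock_irq(&nvmeq->q_lock);
		return 1;	/* nothing to free, avoids double free_irq */
	}
	vector = nvmeq->cq_vector;
	nvmeq->cq_vector = -1;	/* mark suspended */
	spin_unlock_irq(&nvmeq->q_lock);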
On Tue, Feb 20, 2018 at 02:21:37PM +0100, Peter Zijlstra wrote:
> Also, set_current_state(TASK_RUNNING) is dodgy (similarly in
> __blk_mq_poll), why do you need that memory barrier?
You're right. The subsequent revision that was committed removed the
barrier. The commit is here:
https://git.kerne
On Thu, Feb 15, 2018 at 02:49:56PM +0100, Julien Durillon wrote:
> I opened an issue here:
> https://github.com/dracutdevs/dracut/issues/373 for dracut. You can
> read there how dracut enters an infinite loop.
>
> TL;DR: in linux-4.14, trying to find the last "slave" of /dev/dm-0
> ends with a ma
On Mon, Feb 12, 2018 at 09:05:13PM +0800, Jianchao Wang wrote:
> @@ -1315,9 +1315,6 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
> nvmeq->cq_vector = -1;
> spin_unlock_irq(&nvmeq->q_lock);
>
> - if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
> - blk_mq_quiesce_queue(nvmeq->dev->ctrl.admin_q);
Hi Sagi,
This one is fixing a deadlock in namespace detach. It is still not a
widely supported operation, but becoming more common. While the other two
patches in this series look good for 4.17, I would really recommend this
one for 4.16-rc, and add a Cc to linux-stable for 4.15 too. Sound okay?
On Mon, Feb 12, 2018 at 08:47:47PM +0200, Sagi Grimberg wrote:
> This looks fine to me, but I really want Keith and/or Christoph to have
> a look as well.
This looks fine to me as well.
Reviewed-by: Keith Busch
Looks good.
Reviewed-by: Keith Busch
This looks good.
Reviewed-by: Keith Busch
On Mon, Feb 12, 2018 at 08:43:58PM +0200, Sagi Grimberg wrote:
>
> > Currently, we will unquiesce the queues after the controller is
> > shutdown to avoid residual requests to be stuck. In fact, we can
> > do it more cleanly, just wait freeze and drain the queue in
> > nvme_dev_disable and finally
On Fri, Feb 09, 2018 at 09:50:58AM +0800, jianchao.wang wrote:
>
> if we set NVME_REQ_CANCELLED and return BLK_EH_HANDLED as in the RESETTING case,
> nvme_reset_work will hang forever, because no one could complete the entered
> requests.
Except it's no longer in the "RESETTING" case since you adde
On Thu, Feb 08, 2018 at 05:56:49PM +0200, Sagi Grimberg wrote:
> Given the discussion on this set, you plan to respin again
> for 4.16?
With the exception of maybe patch 1, this needs more consideration than
I'd feel okay with for the 4.16 release.
On Sun, May 30, 2083 at 09:51:06AM +0530, Nitesh Shetty wrote:
> This removes the dependency on interrupts to wake up task. Set task
> state as TASK_RUNNING, if need_resched() returns true,
> while polling for IO completion.
> Earlier, polling task used to sleep, relying on interrupt to wake it up.
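A sketch of the idea, modeled loosely on the blk_mq_poll() loop (rq_completed() here is a stand-in, not a real helper):

	set_current_state(TASK_INTERRUPTIBLE);
	while (!rq_completed(rq)) {
		if (q->mq_ops->poll(hctx, rq->tag) > 0)
			break;			/* reaped completions inline */
		if (need_resched()) {
			/* stay runnable: yield briefly instead of sleeping
			 * and waiting for the completion interrupt */
			__set_current_state(TASK_RUNNING);
			cond_resched();
			set_current_state(TASK_INTERRUPTIBLE);
		}
	}
	__set_current_state(TASK_RUNNING);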
On Thu, Feb 08, 2018 at 10:17:00PM +0800, jianchao.wang wrote:
> There is a dangerous scenario which is caused by nvme_wait_freeze in
> nvme_reset_work.
> Please consider it.
>
> nvme_reset_work
> -> nvme_start_queues
> -> nvme_wait_freeze
>
> if the controller gives no response, we have to rely on the timeout path
On Wed, Feb 07, 2018 at 02:09:38PM -0600, wenxi...@linux.vnet.ibm.com wrote:
> @@ -1189,6 +1183,12 @@ static enum blk_eh_timer_return nvme_timeout(struct
> request *req, bool reserved)
> struct nvme_command cmd;
> u32 csts = readl(dev->bar + NVME_REG_CSTS);
>
> + /* If PCI error
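The hunk is cut short; the guard it adds is presumably along these lines, deferring the timeout so a driver reset doesn't race the platform's error recovery:

	/* If PCI error recovery owns the device, resetting now would
	 * collide with it; give recovery more time instead. */
	if (pci_channel_offline(to_pci_dev(dev->dev)))
		return BLK_EH_RESET_TIMER;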
On Wed, Feb 07, 2018 at 10:13:51AM +0800, jianchao.wang wrote:
> What's the difference? Can you please point it out?
> I have shared my understanding below,
> but I actually don't see what difference you mean.
It sounds like you have all the pieces. Just keep this in mind: we don't
On Tue, Feb 06, 2018 at 09:46:36AM +0800, jianchao.wang wrote:
> Hi Keith
>
> Thanks for your kindly response.
>
> On 02/05/2018 11:13 PM, Keith Busch wrote:
> > but how many requests are you letting enter to their demise by
> > freezing on the wrong side of the res
On Mon, Feb 05, 2018 at 03:32:23PM -0700, sba...@raithlin.com wrote:
>
> - if (dev->cmb && (dev->cmbsz & NVME_CMBSZ_SQS)) {
> + if (dev->cmb && use_cmb_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) {
Is this a prep patch for something coming later? dev->cmb is already
NULL if use_cmb_sqes is false.
On Mon, Feb 05, 2018 at 10:26:03AM +0800, jianchao.wang wrote:
> > Freezing is not just for shutdown. It's also used so
> > blk_mq_update_nr_hw_queues will work if the queue count changes across
> > resets.
> blk_mq_update_nr_hw_queues will freeze the queue itself. Please refer to:
> static void __
to something like:
This patch quiesces new IO prior to disabling device HMB access.
A controller using HMB may be relying on it to efficiently complete
IO commands.
Reviewed-by: Keith Busch
> ---
> drivers/nvme/host/pci.c | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
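The ordering that changelog describes, sketched with the 4.15-era helpers:

	nvme_stop_queues(&dev->ctrl);	/* quiesce new I/O first */

	/* only then tell a live controller to stop using the HMB;
	 * in-flight commands may still depend on it */
	if (dev->host_mem_descs)
		nvme_set_host_mem(dev, 0);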
On Fri, Feb 02, 2018 at 03:00:47PM +0800, Jianchao Wang wrote:
> Currently, the complicated relationship between nvme_dev_disable
> and nvme_timeout has become a devil that will introduce many
> circular patterns which may trigger deadlocks or IO hangs. Let's
> enumerate the tangles between them:
> -
On Fri, Feb 02, 2018 at 03:00:45PM +0800, Jianchao Wang wrote:
> Currently, request queue will be frozen and quiesced for both reset
> and shutdown case. This will trigger ioq requests in RECONNECTING
> state which should be avoided to prepare for following patch.
> Just freeze request queue for shutdown case
> With current code we do not acknowledge the
> interrupt and we get dpc interrupt storm.
> This patch acknowledges the interrupt in interrupt handler.
>
> Signed-off-by: Oza Pawandeep
Thanks, looks good to me.
Reviewed-by: Keith Busch
On Tue, Jan 30, 2018 at 11:41:07AM +0800, jianchao.wang wrote:
> Another point that confuses me is whether nvme_set_host_mem is necessary
> in nvme_dev_disable?
> As the comment:
>
> /*
>* If the controller is still alive tell it to stop using the
>
On Mon, Jan 29, 2018 at 09:55:41PM +0200, Sagi Grimberg wrote:
> > Thanks for the fix. It looks like we still have a problem, though.
> > Commands submitted with the "shutdown_lock" held need to be able to make
> > forward progress without relying on a completion, but this one could
> > block indefinitely.
On Mon, Jan 29, 2018 at 11:07:35AM +0800, Jianchao Wang wrote:
> nvme_set_host_mem will invoke nvme_alloc_request without NOWAIT
> flag, it is unsafe for nvme_dev_disable. The adminq driver tags
> may have been used up when the previous outstanding adminq requests
> cannot be completed due to some
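i.e. the disable path should allocate its admin command with BLK_MQ_REQ_NOWAIT so it fails fast instead of sleeping on a tag that may never be freed; sketched against the 4.15-era helper:

	req = nvme_alloc_request(dev->ctrl.admin_q, &c,
				 BLK_MQ_REQ_NOWAIT, NVME_QID_ANY);
	if (IS_ERR(req))
		return PTR_ERR(req);	/* don't block nvme_dev_disable */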
>
> Suggested-by: James Smart
> Reviewed-by: James Smart
> Signed-off-by: Jianchao Wang
This looks fine. Thank you for your patience.
Reviewed-by: Keith Busch
On Wed, Jan 24, 2018 at 11:29:12PM +0100, Paul Menzel wrote:
> Am 22.01.2018 um 22:30 schrieb Keith Busch:
> > The nvme spec guides toward longer times than that. I don't see the
> > point of warning users about things operating within spec.
>
> I quickly glanced ove
On Mon, Jan 22, 2018 at 09:14:23PM +0100, Christoph Hellwig wrote:
> > Link: https://lkml.org/lkml/2018/1/19/68
> > Suggested-by: Keith Busch
> > Signed-off-by: Keith Busch
> > Signed-off-by: Jianchao Wang
>
> Why does this have a signoff from Keith?
Right, I
On Mon, Jan 22, 2018 at 10:02:12PM +0100, Paul Menzel wrote:
> Dear Linux folks,
>
>
> Benchmarking the ACPI S3 suspend and resume times with `sleepgraph.py
> -config config/suspend-callgraph.cfg` [1], shows that the NVMe disk SAMSUNG
> MZVKW512HMJP-0 in the TUXEDO Book BU1406 takes between 0
On Fri, Jan 19, 2018 at 09:56:48PM +0800, jianchao.wang wrote:
> In nvme_dev_disable, the outstanding requests will be requeued finally.
> I'm afraid the requests requeued on the q->requeue_list will be blocked until
> another requeue
> occurs, if we cancel the requeue work before it gets scheduled
On Fri, Jan 19, 2018 at 05:02:06PM +0800, jianchao.wang wrote:
> We should not use blk_sync_queue here; the requeue_work and run_work will be
> canceled.
> Just flush_work(&q->timeout_work) should be ok.
I agree flushing timeout_work is sufficient. All the other work had
already better not be run
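For contrast, blk_sync_queue() also cancels the requeue and run work, which is the concern above; the narrower synchronization is just:

	/* settle expired-command handling only; leave requeue/run work alone */
	flush_work(&q->timeout_work);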
On Fri, Jan 19, 2018 at 04:14:02PM +0800, jianchao.wang wrote:
> On 01/19/2018 04:01 PM, Keith Busch wrote:
> > The nvme_dev_disable routine makes forward progress without depending on
> > timeout handling to complete expired commands. Once controller disabling
> > completes,
On Thu, Jan 18, 2018 at 06:10:00PM +0800, Jianchao Wang wrote:
> Hello
>
> Please consider the following scenario.
> nvme_reset_ctrl
> -> set state to RESETTING
> -> queue reset_work
> (scheduling)
> nvme_reset_work
> -> nvme_dev_disable
> -> quiesce queues
> -> nvme_cancel_request
On Fri, Jan 19, 2018 at 01:55:29PM +0800, jianchao.wang wrote:
> On 01/19/2018 12:59 PM, Keith Busch wrote:
> > On Thu, Jan 18, 2018 at 06:10:02PM +0800, Jianchao Wang wrote:
> >> + * - When the ctrl.state is NVME_CTRL_RESETTING, the expired
> >> + * request sh
On Thu, Jan 18, 2018 at 06:10:02PM +0800, Jianchao Wang wrote:
> + * - When the ctrl.state is NVME_CTRL_RESETTING, the expired
> + * request should come from the previous work and we handle
> + * it as nvme_cancel_request.
> + * - When the ctrl.state is NVME_CTRL_RECONNECTING
On Thu, Jan 18, 2018 at 11:35:59AM -0500, Sinan Kaya wrote:
> On 1/18/2018 12:32 AM, p...@codeaurora.org wrote:
> > On 2018-01-18 08:26, Keith Busch wrote:
> >> On Wed, Jan 17, 2018 at 08:27:39AM -0800, Sinan Kaya wrote:
> >>> On 1/17/2018 5:37 AM, Oza Pawandeep wrote:
On Thu, Jan 18, 2018 at 09:10:43AM +0100, Thomas Gleixner wrote:
> Can you please provide the output of
>
> # cat /sys/kernel/debug/irq/irqs/$ONE_I40_IRQ
# cat /sys/kernel/debug/irq/irqs/48
handler: handle_edge_irq
device: 0000:1a:00.0
status: 0x
istate: 0x
ddepth: 0
wdep
Looks good.
Reviewed-by: Keith Busch
Looks good.
Reviewed-by: Keith Busch
On Wed, Jan 17, 2018 at 08:27:39AM -0800, Sinan Kaya wrote:
> On 1/17/2018 5:37 AM, Oza Pawandeep wrote:
> > +static bool dpc_wait_link_active(struct pci_dev *pdev)
> > +{
>
> I think you can also make this function common instead of making another copy
> here.
> Of course, this would be another
over 200 iterations that used to
fail within only a few. I'd say the problem is cured. Thanks!
Tested-by: Keith Busch
On Wed, Jan 17, 2018 at 08:34:22AM +0100, Thomas Gleixner wrote:
> Can you trace the matrix allocations from the very beginning or tell me how
> to reproduce. I'd like to figure out why this is happening.
Sure, I'll get the irq_matrix events.
I reproduce this on a machine with 112 CPUs and 3 NVMe
x86_vector_free_irqs(domain, virq, i);
> return err;
> }
>
The patch does indeed fix all the warnings and allows device binding to
succeed, albeit in a degraded performance mode. Despite that, this is
a good fix, and looks applicable to 4.4-stable, so:
Tested-by: Keith Busch
On Tue, Jan 16, 2018 at 03:28:19PM +0100, Johannes Thumshirn wrote:
> Add tracepoints for nvme command submission and completion. The tracepoints
> are modeled after SCSI's trace_scsi_dispatch_cmd_start() and
> trace_scsi_dispatch_cmd_done() tracepoints and fulfil a similar purpose,
> namely a fast
On Tue, Jan 16, 2018 at 12:20:18PM +0100, Thomas Gleixner wrote:
> What we want is s/i + 1/i/
>
> That's correct because x86_vector_free_irqs() does:
>
>for (i = 0; i < nr; i++)
>
>
> So if we fail at the first irq, then the loop will do nothing. Failing on
> the se
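The pattern being fixed, sketched (assign_vector() is a stand-in for the allocation step): on failure at index i, only entries 0..i-1 were assigned, so the cleanup must free i entries, not i + 1:

	for (i = 0; i < nr_irqs; i++) {
		err = assign_vector(domain, virq + i);
		if (err) {
			/* entries [0, i) succeeded; passing i + 1 would also
			 * "free" the vector that was never assigned */
			x86_vector_free_irqs(domain, virq, i);
			return err;
		}
	}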
This is all way over my head, but the part that obviously shows
something's gone wrong:
kworker/u674:3-1421 [028] d... 335.307051: irq_matrix_reserve_managed: bit=56 cpu=0 online=1 avl=86 alloc=116 managed=3 online_maps=112 global_avl=22084, global_rsvd=157, total_alloc=570
kworker/u674:3
I hoped to have a better report before the weekend, but I've run out of
time and without my machine till next week, so sending what I have and
praying someone more in the know will have a better clue.
I've a few NVMe drives and occasionally the IRQ teardown and bring-up
is failing. Resetting the c
On Mon, Jan 15, 2018 at 10:02:04AM +0800, jianchao.wang wrote:
> Hi keith
>
> Thanks for your kindly review and response.
I agree with Sagi's feedback, but I can't take credit for it. :)
On Thu, Jan 11, 2018 at 06:50:40PM +0100, Maik Broemme wrote:
> I've re-run the test with 4.15rc7.r111.g5f615b97cdea and the following
> patches from Keith:
>
> [PATCH 1/4] PCI/AER: Return appropriate value when AER is not supported
> [PATCH 2/4] PCI/AER: Provide API for getting AER information
>