Re: [net-next 5/5] PCI: disable FLR for 82579 device

2016-09-28 Thread Bjorn Helgaas
On Wed, Sep 28, 2016 at 03:33:52PM +, Neftin, Sasha wrote:
> 
> Since I worked with Sasha on this I will provide a bit of information from 
> what I understand of this bug as well.
> 
> On Tue, Sep 27, 2016 at 12:13 PM, Alex Williamson 
>  wrote:
> > On Tue, 27 Sep 2016 13:17:02 -0500
> > Bjorn Helgaas  wrote:
> >
> >> On Sun, Sep 25, 2016 at 10:02:43AM +0300, Neftin, Sasha wrote:
> >> > On 9/24/2016 12:05 AM, Jeff Kirsher wrote:
> >> > >On Fri, 2016-09-23 at 09:01 -0500, Bjorn Helgaas wrote:
> >> > >>On Thu, Sep 22, 2016 at 11:39:01PM -0700, Jeff Kirsher wrote:
> >> > >>>From: Sasha Neftin 
> >> > >>>
> >> > >>>82579 has a problem reattaching itself after the device is detached.
> >> > >>>The bug was reported by Redhat. The suggested fix is to disable 
> >> > >>>FLR capability in PCIe configuration space.
> >> > >>>
> >> > >>>Reproduction:
> >> > >>>Attach the device to a VM, then detach and try to attach again.
> >> > >>>
> >> > >>>Fix:
> >> > >>>Disable FLR capability to prevent the 82579 from hanging.
> >> > >>Is there a bugzilla or other reference URL to include here?  
> >> > >>Should this be marked for stable?
> >> > >So the author is in Israel, meaning it is their weekend now.  I do 
> >> > >not believe Sasha monitors email over the weekend, so a response 
> >> > >to your questions won't happen for a few days.
> >> > >
> >> > >I tried searching my archives for more information, but had no 
> >> > >luck finding any additional information.
> >> > >
> 
> I agree that we do probably need to update the patch description since it 
> isn't exactly clear what this is fixing or what was actually broken.
> 
> >> > >>>Signed-off-by: Sasha Neftin 
> >> > >>>Tested-by: Aaron Brown 
> >> > >>>Signed-off-by: Jeff Kirsher 
> >> > >>>---
> >> > >>>  drivers/pci/quirks.c | 21 +
> >> > >>>  1 file changed, 21 insertions(+)
> >> > >>>
> >> > >>>diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 
> >> > >>>44e0ff3..59fba6e 100644
> >> > >>>--- a/drivers/pci/quirks.c
> >> > >>>+++ b/drivers/pci/quirks.c
> >> > >>>@@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct 
> >> > >>>pci_dev *pdev)
> >> > >>> }
> >> > >>>  }
> >> > >>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, 
> >> > >>>quirk_intel_qat_vf_cap);
> >> > >>>+/*
> >> > >>>+ * Workaround FLR issues for 82579
> >> > >>>+ * This code disables the FLR (Function Level Reset) via PCIe, in order
> >> > >>>+ * to workaround a bug found while using device passthrough, where the
> >> > >>>+ * interface would become non-responsive.
> >> > >>>+ * NOTE: the FLR bit is Read/Write Once (RWO) in config space, so if
> >> > >>>+ * the BIOS or kernel writes this register * then this workaround will
> >> > >>>+ * not work.
> >> > >>This doesn't sound like a root cause.  Is the issue a hardware 
> >> > >>erratum?  Linux PCI core bug?  VFIO bug?  Device firmware bug?
> >> > >>
> >> > >>The changelog suggests that the problem only affects passthrough, 
> >> > >>which suggests some sort of kernel bug related to how passthrough 
> >> > >>is implemented.
> >>
> >> If this bug affects all scenarios, not just passthrough, the 
> >> changelog should not mention passthrough.
> >>
> >> > >>>+ */
> >> > >>>+static void quirk_intel_flr_cap_dis(struct pci_dev *dev) {
> >> > >>>+int pos = pci_find_capability(dev, PCI_CAP_ID_AF);
> >> > >>>+if (pos) {
> >> > >>>+u8 cap;
> >> > >>>+pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
> >> > >>>+cap = cap & (~PCI_AF_CAP_FLR);
> >> > >>>+pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
> >> > >>>+}
> >> > >>>+}
> >> > >>>+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502,
> >> > >>>quirk_intel_flr_cap_dis);
> >> > >>>+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
> >> > >>>quirk_intel_flr_cap_dis);
> >> > >>>--
> >> > >>>2.7.4
> >> > >>>
> >> > >>>--
> >> > >>>To unsubscribe from this list: send the line "unsubscribe 
> >> > >>>linux-pci" in the body of a message to majord...@vger.kernel.org 
> >> > >>>More majordomo info at  
> >> > >>>http://vger.kernel.org/majordomo-info.html
> >> >
> >> > Hello,
> >> >
> >> > The original bugzilla thread can be found here:
> >> > https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=966840
> >>
> >> That bugzilla is private and I can't read it.
> >
> > Hmm, I can, but I don't see anything in it that supports this.  Is 
> > that really the right bz?  It's the right hardware, but has all sorts 
> > of FUD about the version of various other components in the stack.
> 
> It looks like we had a local copy of the bugzilla saved here, though it only 
> goes up to comment 13 which is where I think we started working this on our 
> side.  I believe what this 

RE: [net-next 5/5] PCI: disable FLR for 82579 device

2016-09-28 Thread Neftin, Sasha

Since I worked with Sasha on this I will provide a bit of information from what 
I understand of this bug as well.

On Tue, Sep 27, 2016 at 12:13 PM, Alex Williamson  
wrote:
> On Tue, 27 Sep 2016 13:17:02 -0500
> Bjorn Helgaas  wrote:
>
>> On Sun, Sep 25, 2016 at 10:02:43AM +0300, Neftin, Sasha wrote:
>> > On 9/24/2016 12:05 AM, Jeff Kirsher wrote:
>> > >On Fri, 2016-09-23 at 09:01 -0500, Bjorn Helgaas wrote:
>> > >>On Thu, Sep 22, 2016 at 11:39:01PM -0700, Jeff Kirsher wrote:
>> > >>>From: Sasha Neftin 
>> > >>>
>> > >>>82579 has a problem reattaching itself after the device is detached.
>> > >>>The bug was reported by Redhat. The suggested fix is to disable 
>> > >>>FLR capability in PCIe configuration space.
>> > >>>
>> > >>>Reproduction:
>> > >>>Attach the device to a VM, then detach and try to attach again.
>> > >>>
>> > >>>Fix:
>> > >>>Disable FLR capability to prevent the 82579 from hanging.
>> > >>Is there a bugzilla or other reference URL to include here?  
>> > >>Should this be marked for stable?
>> > >So the author is in Israel, meaning it is their weekend now.  I do 
>> > >not believe Sasha monitors email over the weekend, so a response 
>> > >to your questions won't happen for a few days.
>> > >
>> > >I tried searching my archives for more information, but had no 
>> > >luck finding any additional information.
>> > >

I agree that we do probably need to update the patch description since it isn't 
exactly clear what this is fixing or what was actually broken.

>> > >>>Signed-off-by: Sasha Neftin 
>> > >>>Tested-by: Aaron Brown 
>> > >>>Signed-off-by: Jeff Kirsher 
>> > >>>---
>> > >>>  drivers/pci/quirks.c | 21 +
>> > >>>  1 file changed, 21 insertions(+)
>> > >>>
>> > >>>diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 
>> > >>>44e0ff3..59fba6e 100644
>> > >>>--- a/drivers/pci/quirks.c
>> > >>>+++ b/drivers/pci/quirks.c
>> > >>>@@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct 
>> > >>>pci_dev *pdev)
>> > >>> }
>> > >>>  }
>> > >>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, 
>> > >>>quirk_intel_qat_vf_cap);
>> > >>>+/*
>> > >>>+ * Workaround FLR issues for 82579
>> > >>>+ * This code disables the FLR (Function Level Reset) via PCIe, in order
>> > >>>+ * to workaround a bug found while using device passthrough, where the
>> > >>>+ * interface would become non-responsive.
>> > >>>+ * NOTE: the FLR bit is Read/Write Once (RWO) in config space, so if
>> > >>>+ * the BIOS or kernel writes this register * then this workaround will
>> > >>>+ * not work.
>> > >>This doesn't sound like a root cause.  Is the issue a hardware 
>> > >>erratum?  Linux PCI core bug?  VFIO bug?  Device firmware bug?
>> > >>
>> > >>The changelog suggests that the problem only affects passthrough, 
>> > >>which suggests some sort of kernel bug related to how passthrough 
>> > >>is implemented.
>>
>> If this bug affects all scenarios, not just passthrough, the 
>> changelog should not mention passthrough.
>>
>> > >>>+ */
>> > >>>+static void quirk_intel_flr_cap_dis(struct pci_dev *dev) {
>> > >>>+int pos = pci_find_capability(dev, PCI_CAP_ID_AF);
>> > >>>+if (pos) {
>> > >>>+u8 cap;
>> > >>>+pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
>> > >>>+cap = cap & (~PCI_AF_CAP_FLR);
>> > >>>+pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
>> > >>>+}
>> > >>>+}
>> > >>>+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502,
>> > >>>quirk_intel_flr_cap_dis);
>> > >>>+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
>> > >>>quirk_intel_flr_cap_dis);
>> > >>>--
>> > >>>2.7.4
>> > >>>
>> >
>> > Hello,
>> >
>> > The original bugzilla thread can be found here:
>> > https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=966840
>>
>> That bugzilla is private and I can't read it.
>
> Hmm, I can, but I don't see anything in it that supports this.  Is 
> that really the right bz?  It's the right hardware, but has all sorts 
> of FUD about the version of various other components in the stack.

It looks like we had a local copy of the bugzilla saved here, though it only 
goes up to comment 13 which is where I think we started working this on our 
side.  I believe what this patch is attempting to resolve is related to comment 
8 where the driver returned "probe of
:00:19.0 failed with error -2" instead of correctly probing the interface.

So the bug as reported was that e1000e had a problem reattaching itself to the 
PHY after it was attached to a VM.  

Re: [net-next 5/5] PCI: disable FLR for 82579 device

2016-09-27 Thread Alexander Duyck
Since I worked with Sasha on this I will provide a bit of information
from what I understand of this bug as well.

On Tue, Sep 27, 2016 at 12:13 PM, Alex Williamson
 wrote:
> On Tue, 27 Sep 2016 13:17:02 -0500
> Bjorn Helgaas  wrote:
>
>> On Sun, Sep 25, 2016 at 10:02:43AM +0300, Neftin, Sasha wrote:
>> > On 9/24/2016 12:05 AM, Jeff Kirsher wrote:
>> > >On Fri, 2016-09-23 at 09:01 -0500, Bjorn Helgaas wrote:
>> > >>On Thu, Sep 22, 2016 at 11:39:01PM -0700, Jeff Kirsher wrote:
>> > >>>From: Sasha Neftin 
>> > >>>
>> > >>>82579 has a problem reattaching itself after the device is detached.
>> > >>>The bug was reported by Redhat. The suggested fix is to disable
>> > >>>FLR capability in PCIe configuration space.
>> > >>>
>> > >>>Reproduction:
>> > >>>Attach the device to a VM, then detach and try to attach again.
>> > >>>
>> > >>>Fix:
>> > >>>Disable FLR capability to prevent the 82579 from hanging.
>> > >>Is there a bugzilla or other reference URL to include here?  Should
>> > >>this be marked for stable?
>> > >So the author is in Israel, meaning it is their weekend now.  I do not
>> > >believe Sasha monitors email over the weekend, so a response to your
>> > >questions won't happen for a few days.
>> > >
>> > >I tried searching my archives for more information, but had no luck 
>> > >finding
>> > >any additional information.
>> > >

I agree that we do probably need to update the patch description since
it isn't exactly clear what this is fixing or what was actually
broken.

>> > >>>Signed-off-by: Sasha Neftin 
>> > >>>Tested-by: Aaron Brown 
>> > >>>Signed-off-by: Jeff Kirsher 
>> > >>>---
>> > >>>  drivers/pci/quirks.c | 21 +
>> > >>>  1 file changed, 21 insertions(+)
>> > >>>
>> > >>>diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> > >>>index 44e0ff3..59fba6e 100644
>> > >>>--- a/drivers/pci/quirks.c
>> > >>>+++ b/drivers/pci/quirks.c
>> > >>>@@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct
>> > >>>pci_dev *pdev)
>> > >>> }
>> > >>>  }
>> > >>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443,
>> > >>>quirk_intel_qat_vf_cap);
>> > >>>+/*
>> > >>>+ * Workaround FLR issues for 82579
>> > >>>+ * This code disables the FLR (Function Level Reset) via PCIe, in
>> > >>>order
>> > >>>+ * to workaround a bug found while using device passthrough, where the
>> > >>>+ * interface would become non-responsive.
>> > >>>+ * NOTE: the FLR bit is Read/Write Once (RWO) in config space, so if
>> > >>>+ * the BIOS or kernel writes this register * then this workaround will
>> > >>>+ * not work.
>> > >>This doesn't sound like a root cause.  Is the issue a hardware
>> > >>erratum?  Linux PCI core bug?  VFIO bug?  Device firmware bug?
>> > >>
>> > >>The changelog suggests that the problem only affects passthrough,
>> > >>which suggests some sort of kernel bug related to how passthrough is
>> > >>implemented.
>>
>> If this bug affects all scenarios, not just passthrough, the changelog
>> should not mention passthrough.
>>
>> > >>>+ */
>> > >>>+static void quirk_intel_flr_cap_dis(struct pci_dev *dev)
>> > >>>+{
>> > >>>+int pos = pci_find_capability(dev, PCI_CAP_ID_AF);
>> > >>>+if (pos) {
>> > >>>+u8 cap;
>> > >>>+pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
>> > >>>+cap = cap & (~PCI_AF_CAP_FLR);
>> > >>>+pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
>> > >>>+}
>> > >>>+}
>> > >>>+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502,
>> > >>>quirk_intel_flr_cap_dis);
>> > >>>+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
>> > >>>quirk_intel_flr_cap_dis);
>> > >>>--
>> > >>>2.7.4
>> > >>>
>> >
>> > Hello,
>> >
>> > The original bugzilla thread can be found here:
>> > https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=966840
>>
>> That bugzilla is private and I can't read it.
>
> Hmm, I can, but I don't see anything in it that supports this.  Is that
> really the right bz?  It's the right hardware, but has all sorts of FUD
> about the version of various other components in the stack.

It looks like we had a local copy of the bugzilla saved here, though
it only goes up to comment 13 which is where I think we started
working this on our side.  I believe what this patch is attempting to
resolve is related to comment 8 where the driver returned "probe of
:00:19.0 failed with error -2" instead of correctly probing the
interface.

So the bug as reported was that e1000e had a problem reattaching
itself to the PHY after it was attached to a VM.  Sasha, please feel
free to correct me if I have this bit 

Re: [net-next 5/5] PCI: disable FLR for 82579 device

2016-09-27 Thread Alex Williamson
On Tue, 27 Sep 2016 13:17:02 -0500
Bjorn Helgaas  wrote:

> On Sun, Sep 25, 2016 at 10:02:43AM +0300, Neftin, Sasha wrote:
> > On 9/24/2016 12:05 AM, Jeff Kirsher wrote:  
> > >On Fri, 2016-09-23 at 09:01 -0500, Bjorn Helgaas wrote:  
> > >>On Thu, Sep 22, 2016 at 11:39:01PM -0700, Jeff Kirsher wrote:  
> > >>>From: Sasha Neftin 
> > >>>
> > >>>82579 has a problem reattaching itself after the device is detached.
> > >>>The bug was reported by Redhat. The suggested fix is to disable
> > >>>FLR capability in PCIe configuration space.
> > >>>
> > >>>Reproduction:
> > >>>Attach the device to a VM, then detach and try to attach again.
> > >>>
> > >>>Fix:
> > >>>Disable FLR capability to prevent the 82579 from hanging.  
> > >>Is there a bugzilla or other reference URL to include here?  Should
> > >>this be marked for stable?  
> > >So the author is in Israel, meaning it is their weekend now.  I do not
> > >believe Sasha monitors email over the weekend, so a response to your
> > >questions won't happen for a few days.
> > >
> > >I tried searching my archives for more information, but had no luck finding
> > >any additional information.
> > >  
> > >>>Signed-off-by: Sasha Neftin 
> > >>>Tested-by: Aaron Brown 
> > >>>Signed-off-by: Jeff Kirsher 
> > >>>---
> > >>>  drivers/pci/quirks.c | 21 +
> > >>>  1 file changed, 21 insertions(+)
> > >>>
> > >>>diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > >>>index 44e0ff3..59fba6e 100644
> > >>>--- a/drivers/pci/quirks.c
> > >>>+++ b/drivers/pci/quirks.c
> > >>>@@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct
> > >>>pci_dev *pdev)
> > >>> }
> > >>>  }
> > >>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443,
> > >>>quirk_intel_qat_vf_cap);
> > >>>+/*
> > >>>+ * Workaround FLR issues for 82579
> > >>>+ * This code disables the FLR (Function Level Reset) via PCIe, in
> > >>>order
> > >>>+ * to workaround a bug found while using device passthrough, where the
> > >>>+ * interface would become non-responsive.
> > >>>+ * NOTE: the FLR bit is Read/Write Once (RWO) in config space, so if
> > >>>+ * the BIOS or kernel writes this register * then this workaround will
> > >>>+ * not work.  
> > >>This doesn't sound like a root cause.  Is the issue a hardware
> > >>erratum?  Linux PCI core bug?  VFIO bug?  Device firmware bug?
> > >>
> > >>The changelog suggests that the problem only affects passthrough,
> > >>which suggests some sort of kernel bug related to how passthrough is
> > >>implemented.  
> 
> If this bug affects all scenarios, not just passthrough, the changelog
> should not mention passthrough.
> 
> > >>>+ */
> > >>>+static void quirk_intel_flr_cap_dis(struct pci_dev *dev)
> > >>>+{
> > >>>+int pos = pci_find_capability(dev, PCI_CAP_ID_AF);
> > >>>+if (pos) {
> > >>>+u8 cap;
> > >>>+pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
> > >>>+cap = cap & (~PCI_AF_CAP_FLR);
> > >>>+pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
> > >>>+}
> > >>>+}
> > >>>+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502,
> > >>>quirk_intel_flr_cap_dis);
> > >>>+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
> > >>>quirk_intel_flr_cap_dis);
> > >>>-- 
> > >>>2.7.4
> > >>>
> > 
> > Hello,
> > 
> > The original bugzilla thread can be found here:
> > https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=966840
> 
> That bugzilla is private and I can't read it.

Hmm, I can, but I don't see anything in it that supports this.  Is that
really the right bz?  It's the right hardware, but has all sorts of FUD
about the version of various other components in the stack.
 
> > This is a hardware bug that exists only in 82579 devices. Newer
> > devices do not have this problem. We have found the root cause and
> > suggested this solution.
> 
> Is there an erratum you can reference?
> 
> > This solution should work in 95% of cases, so I do not think it
> > is fragile. For the remaining cases, a possible solution is to bring
> > up a working system and manually disable FLR before the VM starts
> > using our adapter.
> 
> I don't think a 95% solution is sufficient.  Can you use the
> pci_dev_specific_reset() framework to make a 100% solution?

Right, plus when this does work I suspect it removes the one mechanism
we have to reset the device, which depending on how obscure the failure
scenario is, isn't a clear cut improvement for device assignment.
Thanks,

Alex


Re: [net-next 5/5] PCI: disable FLR for 82579 device

2016-09-27 Thread Bjorn Helgaas
On Sun, Sep 25, 2016 at 10:02:43AM +0300, Neftin, Sasha wrote:
> On 9/24/2016 12:05 AM, Jeff Kirsher wrote:
> >On Fri, 2016-09-23 at 09:01 -0500, Bjorn Helgaas wrote:
> >>On Thu, Sep 22, 2016 at 11:39:01PM -0700, Jeff Kirsher wrote:
> >>>From: Sasha Neftin 
> >>>
> >>>82579 has a problem reattaching itself after the device is detached.
> >>>The bug was reported by Redhat. The suggested fix is to disable
> >>>FLR capability in PCIe configuration space.
> >>>
> >>>Reproduction:
> >>>Attach the device to a VM, then detach and try to attach again.
> >>>
> >>>Fix:
> >>>Disable FLR capability to prevent the 82579 from hanging.
> >>Is there a bugzilla or other reference URL to include here?  Should
> >>this be marked for stable?
> >So the author is in Israel, meaning it is their weekend now.  I do not
> >believe Sasha monitors email over the weekend, so a response to your
> >questions won't happen for a few days.
> >
> >I tried searching my archives for more information, but had no luck finding
> >any additional information.
> >
> >>>Signed-off-by: Sasha Neftin 
> >>>Tested-by: Aaron Brown 
> >>>Signed-off-by: Jeff Kirsher 
> >>>---
> >>>  drivers/pci/quirks.c | 21 +
> >>>  1 file changed, 21 insertions(+)
> >>>
> >>>diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> >>>index 44e0ff3..59fba6e 100644
> >>>--- a/drivers/pci/quirks.c
> >>>+++ b/drivers/pci/quirks.c
> >>>@@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct
> >>>pci_dev *pdev)
> >>>   }
> >>>  }
> >>>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443,
> >>>quirk_intel_qat_vf_cap);
> >>>+/*
> >>>+ * Workaround FLR issues for 82579
> >>>+ * This code disables the FLR (Function Level Reset) via PCIe, in
> >>>order
> >>>+ * to workaround a bug found while using device passthrough, where the
> >>>+ * interface would become non-responsive.
> >>>+ * NOTE: the FLR bit is Read/Write Once (RWO) in config space, so if
> >>>+ * the BIOS or kernel writes this register * then this workaround will
> >>>+ * not work.
> >>This doesn't sound like a root cause.  Is the issue a hardware
> >>erratum?  Linux PCI core bug?  VFIO bug?  Device firmware bug?
> >>
> >>The changelog suggests that the problem only affects passthrough,
> >>which suggests some sort of kernel bug related to how passthrough is
> >>implemented.

If this bug affects all scenarios, not just passthrough, the changelog
should not mention passthrough.

> >>>+ */
> >>>+static void quirk_intel_flr_cap_dis(struct pci_dev *dev)
> >>>+{
> >>>+  int pos = pci_find_capability(dev, PCI_CAP_ID_AF);
> >>>+  if (pos) {
> >>>+  u8 cap;
> >>>+  pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
> >>>+  cap = cap & (~PCI_AF_CAP_FLR);
> >>>+  pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
> >>>+  }
> >>>+}
> >>>+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502,
> >>>quirk_intel_flr_cap_dis);
> >>>+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
> >>>quirk_intel_flr_cap_dis);
> >>>-- 
> >>>2.7.4
> >>>
> 
> Hello,
> 
> The original bugzilla thread can be found here:
> https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=966840

That bugzilla is private and I can't read it.

> This is a hardware bug that exists only in 82579 devices. Newer
> devices do not have this problem. We have found the root cause and
> suggested this solution.

Is there an erratum you can reference?

> This solution should work in 95% of cases, so I do not think it
> is fragile. For the remaining cases, a possible solution is to bring
> up a working system and manually disable FLR before the VM starts
> using our adapter.

I don't think a 95% solution is sufficient.  Can you use the
pci_dev_specific_reset() framework to make a 100% solution?

Bjorn
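
For reference, the table-driven idea behind a device-specific reset can be
sketched in userspace as below. This is only a rough model, loosely following
how pci_dev_specific_reset() matches a device against a table of reset
methods and returns -ENOTTY when nothing matches. The names fake_pci_dev,
reset_82579_sketch and dev_specific_reset_sketch are hypothetical, and the
body of the reset is a placeholder, since the thread does not describe what a
working 82579 reset sequence would be.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_INTEL 0x8086

/* Hypothetical stand-in for struct pci_dev; not a kernel type. */
struct fake_pci_dev {
	uint16_t vendor;
	uint16_t device;
	int resets_done;	/* counts how often the placeholder reset ran */
};

struct reset_method {
	uint16_t vendor;
	uint16_t device;
	int (*reset)(struct fake_pci_dev *dev, int probe);
};

/*
 * Placeholder for an 82579-specific reset. With probe set, it only reports
 * that a method exists; otherwise it records that the reset was requested.
 * The real hardware sequence is NOT known from this thread.
 */
int reset_82579_sketch(struct fake_pci_dev *dev, int probe)
{
	if (probe)
		return 0;
	dev->resets_done++;
	return 0;
}

/* Device IDs 0x1502 and 0x1503 are the ones used by the posted quirk. */
static const struct reset_method reset_methods[] = {
	{ PCI_VENDOR_ID_INTEL, 0x1502, reset_82579_sketch },
	{ PCI_VENDOR_ID_INTEL, 0x1503, reset_82579_sketch },
	{ 0, 0, NULL }
};

/* Walk the table; -ENOTTY means "no device-specific method for this device". */
int dev_specific_reset_sketch(struct fake_pci_dev *dev, int probe)
{
	const struct reset_method *m;

	for (m = reset_methods; m->reset; m++) {
		if (m->vendor == dev->vendor && m->device == dev->device)
			return m->reset(dev, probe);
	}
	return -ENOTTY;
}
```

The point of this shape is that the reset stays available to device
assignment for the affected IDs, instead of depending on whether a write-once
bit could still be cleared at boot.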


Re: [net-next 5/5] PCI: disable FLR for 82579 device

2016-09-25 Thread Neftin, Sasha

On 9/24/2016 12:05 AM, Jeff Kirsher wrote:

On Fri, 2016-09-23 at 09:01 -0500, Bjorn Helgaas wrote:

On Thu, Sep 22, 2016 at 11:39:01PM -0700, Jeff Kirsher wrote:

From: Sasha Neftin 

82579 has a problem reattaching itself after the device is detached.
The bug was reported by Redhat. The suggested fix is to disable
FLR capability in PCIe configuration space.

Reproduction:
Attach the device to a VM, then detach and try to attach again.

Fix:
Disable FLR capability to prevent the 82579 from hanging.

Is there a bugzilla or other reference URL to include here?  Should
this be marked for stable?

So the author is in Israel, meaning it is their weekend now.  I do not
believe Sasha monitors email over the weekend, so a response to your
questions won't happen for a few days.

I tried searching my archives for more information, but had no luck finding
any additional information.


Signed-off-by: Sasha Neftin 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
  drivers/pci/quirks.c | 21 +
  1 file changed, 21 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44e0ff3..59fba6e 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct
pci_dev *pdev)
}
  }
  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443,
quirk_intel_qat_vf_cap);
+/*
+ * Workaround FLR issues for 82579
+ * This code disables the FLR (Function Level Reset) via PCIe, in
order
+ * to workaround a bug found while using device passthrough, where the
+ * interface would become non-responsive.
+ * NOTE: the FLR bit is Read/Write Once (RWO) in config space, so if
+ * the BIOS or kernel writes this register * then this workaround will
+ * not work.

This doesn't sound like a root cause.  Is the issue a hardware
erratum?  Linux PCI core bug?  VFIO bug?  Device firmware bug?

The changelog suggests that the problem only affects passthrough,
which suggests some sort of kernel bug related to how passthrough is
implemented.


+ */
+static void quirk_intel_flr_cap_dis(struct pci_dev *dev)
+{
+   int pos = pci_find_capability(dev, PCI_CAP_ID_AF);
+   if (pos) {
+   u8 cap;
+   pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
+   cap = cap & (~PCI_AF_CAP_FLR);
+   pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
+   }
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502,
quirk_intel_flr_cap_dis);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
quirk_intel_flr_cap_dis);
--
2.7.4



Hello,

The original bugzilla thread can be found here:
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=966840


This is a hardware bug that exists only in 82579 devices. Newer devices
do not have this problem. We have found the root cause and suggested this
solution. This solution should work in 95% of cases, so I do not think it
is fragile. For the remaining cases, a possible solution is to bring up a
working system and manually disable FLR before the VM starts using our adapter.


Thanks,

Sasha



Re: [net-next 5/5] PCI: disable FLR for 82579 device

2016-09-23 Thread Jeff Kirsher
On Fri, 2016-09-23 at 09:01 -0500, Bjorn Helgaas wrote:
> On Thu, Sep 22, 2016 at 11:39:01PM -0700, Jeff Kirsher wrote:
> > 
> > From: Sasha Neftin 
> > 
> > 82579 has a problem reattaching itself after the device is detached.
> > The bug was reported by Redhat. The suggested fix is to disable
> > FLR capability in PCIe configuration space.
> > 
> > Reproduction:
> > Attach the device to a VM, then detach and try to attach again.
> > 
> > Fix:
> > Disable FLR capability to prevent the 82579 from hanging.
> 
> Is there a bugzilla or other reference URL to include here?  Should
> this be marked for stable?

So the author is in Israel, meaning it is their weekend now.  I do not
believe Sasha monitors email over the weekend, so a response to your
questions won't happen for a few days.

I tried searching my archives for more information, but had no luck finding
any additional information.

> > Signed-off-by: Sasha Neftin 
> > Tested-by: Aaron Brown 
> > Signed-off-by: Jeff Kirsher 
> > ---
> >  drivers/pci/quirks.c | 21 +
> >  1 file changed, 21 insertions(+)
> > 
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 44e0ff3..59fba6e 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct
> > pci_dev *pdev)
> >     }
> >  }
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443,
> > quirk_intel_qat_vf_cap);
> > +/*
> > + * Workaround FLR issues for 82579
> > + * This code disables the FLR (Function Level Reset) via PCIe, in
> > order
> > + * to workaround a bug found while using device passthrough, where the
> > + * interface would become non-responsive.
> > + * NOTE: the FLR bit is Read/Write Once (RWO) in config space, so if
> > + * the BIOS or kernel writes this register * then this workaround will
> > + * not work.
> 
> This doesn't sound like a root cause.  Is the issue a hardware
> erratum?  Linux PCI core bug?  VFIO bug?  Device firmware bug?
> 
> The changelog suggests that the problem only affects passthrough,
> which suggests some sort of kernel bug related to how passthrough is
> implemented.
> 
> > 
> > + */
> > +static void quirk_intel_flr_cap_dis(struct pci_dev *dev)
> > +{
> > +   int pos = pci_find_capability(dev, PCI_CAP_ID_AF);
> > +   if (pos) {
> > +   u8 cap;
> > +   pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
> > +   cap = cap & (~PCI_AF_CAP_FLR);
> > +   pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
> > +   }
> > +}
> > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502,
> > quirk_intel_flr_cap_dis);
> > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503,
> > quirk_intel_flr_cap_dis);
> > -- 
> > 2.7.4
> > 




Re: [net-next 5/5] PCI: disable FLR for 82579 device

2016-09-23 Thread Bjorn Helgaas
On Thu, Sep 22, 2016 at 11:39:01PM -0700, Jeff Kirsher wrote:
> From: Sasha Neftin 
> 
> 82579 has a problem reattaching itself after the device is detached.
> The bug was reported by Redhat. The suggested fix is to disable
> FLR capability in PCIe configuration space.
> 
> Reproduction:
> Attach the device to a VM, then detach and try to attach again.
> 
> Fix:
> Disable FLR capability to prevent the 82579 from hanging.

Is there a bugzilla or other reference URL to include here?  Should
this be marked for stable?

> Signed-off-by: Sasha Neftin 
> Tested-by: Aaron Brown 
> Signed-off-by: Jeff Kirsher 
> ---
>  drivers/pci/quirks.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 44e0ff3..59fba6e 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct pci_dev 
> *pdev)
>   }
>  }
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
> +/*
> + * Workaround FLR issues for 82579
> + * This code disables the FLR (Function Level Reset) via PCIe, in order
> + * to workaround a bug found while using device passthrough, where the
> + * interface would become non-responsive.
> + * NOTE: the FLR bit is Read/Write Once (RWO) in config space, so if
> + * the BIOS or kernel writes this register * then this workaround will
> + * not work.

This doesn't sound like a root cause.  Is the issue a hardware
erratum?  Linux PCI core bug?  VFIO bug?  Device firmware bug?

The changelog suggests that the problem only affects passthrough,
which suggests some sort of kernel bug related to how passthrough is
implemented.

> + */
> +static void quirk_intel_flr_cap_dis(struct pci_dev *dev)
> +{
> + int pos = pci_find_capability(dev, PCI_CAP_ID_AF);
> + if (pos) {
> + u8 cap;
> > + pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
> + cap = cap & (~PCI_AF_CAP_FLR);
> + pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
> + }
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, 
> quirk_intel_flr_cap_dis);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, 
> quirk_intel_flr_cap_dis);
> -- 
> 2.7.4
> 


Re: [net-next 5/5] PCI: disable FLR for 82579 device

2016-09-23 Thread Alex Williamson
On Thu, 22 Sep 2016 23:39:01 -0700
Jeff Kirsher  wrote:

> From: Sasha Neftin 
> 
> 82579 has a problem reattaching itself after the device is detached.
> The bug was reported by Redhat. The suggested fix is to disable
> FLR capability in PCIe configuration space.
> 
> Reproduction:
> Attach the device to a VM, then detach and try to attach again.
> 
> Fix:
> Disable FLR capability to prevent the 82579 from hanging.
> 
> Signed-off-by: Sasha Neftin 
> Tested-by: Aaron Brown 
> Signed-off-by: Jeff Kirsher 
> ---
>  drivers/pci/quirks.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 44e0ff3..59fba6e 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct pci_dev *pdev)
>   }
>  }
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
> +/*
> + * Workaround FLR issues for 82579
> + * This code disables the FLR (Function Level Reset) via PCIe, in order
> + * to workaround a bug found while using device passthrough, where the
> + * interface would become non-responsive.
> + * NOTE: the FLR bit is Read/Write Once (RWO) in config space, so if
> + * the BIOS or kernel writes this register * then this workaround will
> + * not work.
> + */
> +static void quirk_intel_flr_cap_dis(struct pci_dev *dev)
> +{
> +	int pos = pci_find_capability(dev, PCI_CAP_ID_AF);
> +	if (pos) {
> +		u8 cap;
> +		pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
> +		cap = cap & (~PCI_AF_CAP_FLR);
> +		pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
> +	}
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_intel_flr_cap_dis);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_intel_flr_cap_dis);

This seems like a pretty fragile quirk since we're just hoping that the
BIOS hasn't already written this byte.  Should we at least re-read and
warn if the write didn't take?  What about using dev_flags or a device
specific reset to make this less fragile?  A device specific reset
could pick the best reset mechanism for the device, ignoring AF FLR.
Thanks,

Alex
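
The re-read-and-warn idea raised in this reply can be illustrated outside the kernel. What follows is only a userspace sketch under stated assumptions, not the patch that was posted or merged: `struct mock_dev`, the `mock_*_config_byte()` helpers, the `no_af_flr` flag and `quirk_flr_cap_dis_checked()` are all hypothetical names made up for this illustration. The write-once behaviour is modelled in software to match the NOTE in the quoted comment, and the two `PCI_AF_*` values are the ones from the kernel's `pci_regs.h`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Offsets and bits as defined in include/uapi/linux/pci_regs.h. */
#define PCI_AF_CAP	3	/* AF capabilities byte */
#define PCI_AF_CAP_FLR	0x02	/* FLR supported */

/* Hypothetical stand-in for struct pci_dev, for illustration only. */
struct mock_dev {
	uint8_t cfg[256];	/* simulated config space */
	int af_pos;		/* offset of the AF capability, 0 if absent */
	bool af_cap_locked;	/* models RWO: true once the byte was written */
	bool no_af_flr;		/* hypothetical software flag (dev_flags-like) */
};

static uint8_t mock_read_config_byte(const struct mock_dev *dev, int where)
{
	assert(where >= 0 && where < 256);
	return dev->cfg[where];
}

static void mock_write_config_byte(struct mock_dev *dev, int where, uint8_t val)
{
	assert(where >= 0 && where < 256);
	if (dev->af_pos && where == dev->af_pos + PCI_AF_CAP) {
		if (dev->af_cap_locked)
			return;	/* a later write is silently dropped */
		dev->af_cap_locked = true;
	}
	dev->cfg[where] = val;
}

/*
 * Try to clear the FLR bit, then re-read to check that the write took.
 * Returns true only if the hardware bit reads back clear; returns false
 * if there is no AF capability or the bit is still set. When the
 * capability exists, the software flag is set either way, so that a
 * reset path could skip AF FLR without relying on the register write.
 */
static bool quirk_flr_cap_dis_checked(struct mock_dev *dev)
{
	int reg;
	uint8_t cap;

	if (!dev->af_pos)
		return false;

	reg = dev->af_pos + PCI_AF_CAP;
	cap = mock_read_config_byte(dev, reg);
	mock_write_config_byte(dev, reg, cap & (uint8_t)~PCI_AF_CAP_FLR);

	dev->no_af_flr = true;

	if (mock_read_config_byte(dev, reg) & PCI_AF_CAP_FLR) {
		fprintf(stderr, "AF FLR bit still set; register was already written\n");
		return false;
	}
	return true;
}
```

In this sketch the software flag carries the decision whether or not the hardware write succeeds, which is roughly the less fragile direction suggested above; how a real reset path would consume such a flag is left open here.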


Re: [net-next 5/5] PCI: disable FLR for 82579 device

2016-09-23 Thread Sergei Shtylyov

Hello.

On 9/23/2016 9:39 AM, Jeff Kirsher wrote:


> From: Sasha Neftin
>
> 82579 has a problem reattaching itself after the device is detached.
> The bug was reported by Redhat. The suggested fix is to disable
> FLR capability in PCIe configuration space.
>
> Reproduction:
> Attach the device to a VM, then detach and try to attach again.
>
> Fix:
> Disable FLR capability to prevent the 82579 from hanging.
>
> Signed-off-by: Sasha Neftin
> Tested-by: Aaron Brown
> Signed-off-by: Jeff Kirsher
> ---
>  drivers/pci/quirks.c | 21 +
>  1 file changed, 21 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 44e0ff3..59fba6e 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct pci_dev *pdev)
>  	}
>  }
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
> +/*
> + * Workaround FLR issues for 82579
> + * This code disables the FLR (Function Level Reset) via PCIe, in order
> + * to workaround a bug found while using device passthrough, where the
> + * interface would become non-responsive.
> + * NOTE: the FLR bit is Read/Write Once (RWO) in config space, so if
> + * the BIOS or kernel writes this register * then this workaround will
                                              ^
   That asterisk shouldn't be there.

> + * not work.
> + */
> +static void quirk_intel_flr_cap_dis(struct pci_dev *dev)
> +{
> +	int pos = pci_find_capability(dev, PCI_CAP_ID_AF);

   Should be an empty line here...

> +	if (pos) {
> +		u8 cap;

   And here...

> +		pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
> +		cap = cap & (~PCI_AF_CAP_FLR);

   () not needed.

> +		pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
> +	}
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_intel_flr_cap_dis);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_intel_flr_cap_dis);


MBR, Sergei



[net-next 5/5] PCI: disable FLR for 82579 device

2016-09-23 Thread Jeff Kirsher
From: Sasha Neftin 

82579 has a problem reattaching itself after the device is detached.
The bug was reported by Redhat. The suggested fix is to disable
FLR capability in PCIe configuration space.

Reproduction:
Attach the device to a VM, then detach and try to attach again.

Fix:
Disable FLR capability to prevent the 82579 from hanging.

Signed-off-by: Sasha Neftin 
Tested-by: Aaron Brown 
Signed-off-by: Jeff Kirsher 
---
 drivers/pci/quirks.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44e0ff3..59fba6e 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4431,3 +4431,24 @@ static void quirk_intel_qat_vf_cap(struct pci_dev *pdev)
 	}
 }
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, quirk_intel_qat_vf_cap);
+/*
+ * Workaround FLR issues for 82579
+ * This code disables the FLR (Function Level Reset) via PCIe, in order
+ * to workaround a bug found while using device passthrough, where the
+ * interface would become non-responsive.
+ * NOTE: the FLR bit is Read/Write Once (RWO) in config space, so if
+ * the BIOS or kernel writes this register * then this workaround will
+ * not work.
+ */
+static void quirk_intel_flr_cap_dis(struct pci_dev *dev)
+{
+	int pos = pci_find_capability(dev, PCI_CAP_ID_AF);
+	if (pos) {
+		u8 cap;
+		pci_read_config_byte(dev, pos + PCI_AF_CAP, &cap);
+		cap = cap & (~PCI_AF_CAP_FLR);
+		pci_write_config_byte(dev, pos + PCI_AF_CAP, cap);
+	}
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_intel_flr_cap_dis);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_intel_flr_cap_dis);
-- 
2.7.4