Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported

2017-08-09 Thread Ding Tianhong
On 2017/8/9 11:25, Bjorn Helgaas wrote:
> On Tue, Aug 08, 2017 at 09:22:39PM -0500, Bjorn Helgaas wrote:
>> On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
>>> When bit4 is set in the PCIe Device Control register, it indicates

> After looking at the driver, I wonder if it would be simpler like
> this:
> 
>   int pcie_relaxed_ordering_enabled(struct pci_dev *dev)
>   {
> u16 ctl;
> 
> pcie_capability_read_word(dev, PCI_EXP_DEVCTL, );
> return ctl & PCI_EXP_DEVCTL_RELAX_EN;
>   }
>   EXPORT_SYMBOL(pcie_relaxed_ordering_enabled);
> 
>   static void pci_configure_relaxed_ordering(struct pci_dev *dev)
>   {
> struct pci_dev *root;
> 
> if (dev->is_virtfn)
>   return;  /* PCI_EXP_DEVCTL_RELAX_EN is RsvdP in VFs */
> 
> if (!pcie_relaxed_ordering_enabled(dev))
>   return;
> 
> /*
>  * For now, we only deal with Relaxed Ordering issues with Root
>  * Ports.  Peer-to-peer DMA is another can of worms.
>  */
> root = pci_find_pcie_root_port(dev);
> if (!root)
>   return;
> 
> if (root->relaxed_ordering_broken)
>   pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
>  PCI_EXP_DEVCTL_RELAX_EN);
>   }
> 
> This doesn't check every intervening switch, but I don't think we know
> about any issues except with root ports.
> 

Yes

> And the driver could do:
> 
>   if (!pcie_relaxed_ordering_enabled(pdev))
> adapter->flags |= ROOT_NO_RELAXED_ORDERING;
> 
> The driver code wouldn't show anything about coherent memory vs.
> peer-to-peer, but we really don't have a clue about how to handle that
> yet anyway.
> 
> I guess this is back to exactly what you proposed, except that I
> changed the name of pcie_relaxed_ordering_supported() to
> pcie_relaxed_ordering_enabled(), which I think is slightly more
> specific from the device's point of view.
> 

OK, looks like we reach a consensus finally, I will follow your new opinion and 
resend, thanks.

Ding

> Bjorn
> 
> .
> 



Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported

2017-08-09 Thread Ding Tianhong
On 2017/8/9 11:25, Bjorn Helgaas wrote:
> On Tue, Aug 08, 2017 at 09:22:39PM -0500, Bjorn Helgaas wrote:
>> On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
>>> When bit4 is set in the PCIe Device Control register, it indicates

> After looking at the driver, I wonder if it would be simpler like
> this:
> 
>   int pcie_relaxed_ordering_enabled(struct pci_dev *dev)
>   {
> u16 ctl;
> 
> pcie_capability_read_word(dev, PCI_EXP_DEVCTL, );
> return ctl & PCI_EXP_DEVCTL_RELAX_EN;
>   }
>   EXPORT_SYMBOL(pcie_relaxed_ordering_enabled);
> 
>   static void pci_configure_relaxed_ordering(struct pci_dev *dev)
>   {
> struct pci_dev *root;
> 
> if (dev->is_virtfn)
>   return;  /* PCI_EXP_DEVCTL_RELAX_EN is RsvdP in VFs */
> 
> if (!pcie_relaxed_ordering_enabled(dev))
>   return;
> 
> /*
>  * For now, we only deal with Relaxed Ordering issues with Root
>  * Ports.  Peer-to-peer DMA is another can of worms.
>  */
> root = pci_find_pcie_root_port(dev);
> if (!root)
>   return;
> 
> if (root->relaxed_ordering_broken)
>   pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
>  PCI_EXP_DEVCTL_RELAX_EN);
>   }
> 
> This doesn't check every intervening switch, but I don't think we know
> about any issues except with root ports.
> 

Yes

> And the driver could do:
> 
>   if (!pcie_relaxed_ordering_enabled(pdev))
> adapter->flags |= ROOT_NO_RELAXED_ORDERING;
> 
> The driver code wouldn't show anything about coherent memory vs.
> peer-to-peer, but we really don't have a clue about how to handle that
> yet anyway.
> 
> I guess this is back to exactly what you proposed, except that I
> changed the name of pcie_relaxed_ordering_supported() to
> pcie_relaxed_ordering_enabled(), which I think is slightly more
> specific from the device's point of view.
> 

OK, looks like we reach a consensus finally, I will follow your new opinion and 
resend, thanks.

Ding

> Bjorn
> 
> .
> 



Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported

2017-08-09 Thread Ding Tianhong
Hi Bjorn:

On 2017/8/9 10:22, Bjorn Helgaas wrote:
> On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
>> When bit4 is set in the PCIe Device Control register, it indicates
>> whether the device is permitted to use relaxed ordering.
>> On some platforms using relaxed ordering can have performance issues or
>> due to erratum can cause data-corruption. In such cases devices must avoid
>> using relaxed ordering.
>>
>> This patch checks if there is any node in the hierarchy that indicates that
>> using relaxed ordering is not safe. 
> 
> I think you only check the devices between the root port and the
> target device.  For example, you don't check siblings or cousins of
> the target device.
> 

OK, update the description.

>> In such cases the patch turns off the
>> relaxed ordering by clearing the eapability for this device.
> 
> s/eapability/capability/
> 
>> And if the
>> device is probably running in a guest machine, we should do nothing.
> 
> I don't know what this sentence means.  "Probably running in a guest
> machine" doesn't really make sense, and there's nothing in your patch
> that explicitly checks for being in a guest machine.
> 

Alex noticed that we should do nothing if in the virtual machine because
the Root Complex is NULL at that time, so I think this word should be more
clearly here.

>> Signed-off-by: Ding Tianhong 
>> Acked-by: Alexander Duyck 
>> Acked-by: Ashok Raj 
>> ---
>>  drivers/pci/pci.c   | 29 +
>>  drivers/pci/probe.c | 37 +
>>  include/linux/pci.h |  2 ++
>>  3 files changed, 68 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index af0cc34..4f9d7c1 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>>  EXPORT_SYMBOL(pcie_set_mps);
>>  
>>  /**
>> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
>> + * @dev: PCI device to query
>> + *
>> + * If possible clear relaxed ordering
> 
> Why "If possible"?  The bit is required to be RW or hardwired to zero,
> so PCI_EXP_DEVCTL_RELAX_EN should *always* be zero when this returns.
> 

OK

>> + */
>> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
>> +{
...
>>
>> ___
>> linux-arm-kernel mailing list
>> linux-arm-ker...@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> .
> 



Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported

2017-08-09 Thread Ding Tianhong
Hi Bjorn:

On 2017/8/9 10:22, Bjorn Helgaas wrote:
> On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
>> When bit4 is set in the PCIe Device Control register, it indicates
>> whether the device is permitted to use relaxed ordering.
>> On some platforms using relaxed ordering can have performance issues or
>> due to erratum can cause data-corruption. In such cases devices must avoid
>> using relaxed ordering.
>>
>> This patch checks if there is any node in the hierarchy that indicates that
>> using relaxed ordering is not safe. 
> 
> I think you only check the devices between the root port and the
> target device.  For example, you don't check siblings or cousins of
> the target device.
> 

OK, update the description.

>> In such cases the patch turns off the
>> relaxed ordering by clearing the eapability for this device.
> 
> s/eapability/capability/
> 
>> And if the
>> device is probably running in a guest machine, we should do nothing.
> 
> I don't know what this sentence means.  "Probably running in a guest
> machine" doesn't really make sense, and there's nothing in your patch
> that explicitly checks for being in a guest machine.
> 

Alex noticed that we should do nothing if in the virtual machine because
the Root Complex is NULL at that time, so I think this word should be more
clearly here.

>> Signed-off-by: Ding Tianhong 
>> Acked-by: Alexander Duyck 
>> Acked-by: Ashok Raj 
>> ---
>>  drivers/pci/pci.c   | 29 +
>>  drivers/pci/probe.c | 37 +
>>  include/linux/pci.h |  2 ++
>>  3 files changed, 68 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index af0cc34..4f9d7c1 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>>  EXPORT_SYMBOL(pcie_set_mps);
>>  
>>  /**
>> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
>> + * @dev: PCI device to query
>> + *
>> + * If possible clear relaxed ordering
> 
> Why "If possible"?  The bit is required to be RW or hardwired to zero,
> so PCI_EXP_DEVCTL_RELAX_EN should *always* be zero when this returns.
> 

OK

>> + */
>> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
>> +{
...
>>
>> ___
>> linux-arm-kernel mailing list
>> linux-arm-ker...@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
> .
> 



Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported

2017-08-09 Thread Casey Leedom
| From: Bjorn Helgaas 
| Sent: Tuesday, August 8, 2017 7:22 PM
| ...
| and the caller should do something like this:
| 
| if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
| adapter->flags |= ROOT_NO_RELAXED_ORDERING;
| 
| That way it's obvious where the issue is, and it's obvious that the
| answer might be different for peer-to-peer transactions than it is for
| transactions to the root port, i.e., to coherent memory.
| ...

Which is back to something very close to what I initially suggested in my
first patch submission.  Because you're right, this isn't about broken
Source Devices, it's about broken Completing Devices.

Unfortunately, as Alexander Duyck noted, in a Virtual Machine we won't be
able to see our Root Port in order to make this determination.  (And in some
Hypervisor implementations I've seen, there's not even a synthetic Root Port
available to the VM at all, let alone read-only access to the real one.)

So the current scheme was developed of having the Hypervisor kernel traverse
down the PCIe Fabric when it finds a broken Root Port implementation (the
issue that we've mostly been primarily focused upon), and turning off the
PCIe Capability Device Control[Relaxed Ordering Enable].  This was to serve
two purposes:

 1. Turn that off in order to prevent sending any Transaction Layer
Packets with the Relaxed Ordering Attribute to any Completer.
Which unfortunately would also prevent Peer-to-Peer use of the
Relaxed Ordering Attribute.

 2. Act as a message to Device Drivers for those downstream devices
that the they were dealing with a broken Root Port implementation.
And this would work even for a driver in a VM with an attached
device since it would be able to see the PCIe Configuration Space
for the attached device.

I haven't been excited about any of this because:

 A. While so far all of the examples we've talked about are broken
Root Port Completers, it's perfectly possible that other devices
could be broken -- say an NVMe Device which is not "Coherent
Memory".  How would this fit into the framework APIs being
described?

 B. I have yet to see a n example of how the currently proposed
API framework would be used in a hybrid environment where
TLPs to the Root Port would not use Relaxed Ordering, but TLPs
to a Peer would use Relaxed Ordering.  So far its all been about
using a "big hammer" to completely disable the use of Relaxed
Ordering.

But the VM problem keeps cropping up over and over.  A driver in a VM
doesn't have access to the Root Port to determine if its on a "Black List"
and our only way of communicating with the driver in the VM is to leave the
device in a particular state (i.e. initialize the PCIe Capability Device
Control[Relaxed Ordering Enable] to "off").

Oh, and also, on the current patch submission's focus on broken Root Port
implementations: one could suggest that even if we're stuck with the "Device
attached to a VM Conundrum", that what we should really be thinking about is
if ANY device within a PCIe Fabric has broken Relaxed Ordering completion
problems, and, if so, "poisoning" the entire containing PCIe Fabric by
turning off Relaxed Ordering Enable for every device, up, down sideways --
including the Root Port itself.

| ...
| This associates the message with the Requester that may potentially
| use relaxed ordering. But there's nothing wrong or unusual about the
| Requester; the issue is with the *Completer*, so I think the message
| should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING.
| Maybe it should be both places; I dunno.
| 
| This implementation assumes the device only initiates transactions to
| coherent memory, i.e., it assumes the device never does peer-to-peer
| DMA.  I guess we'll have to wait and see if we trip over any
| peer-to-peer issues, then figure out how to handle them.
| ...

Yes, as soon as we want to implement the hybrid use of Relaxed Ordering I
mentioned above.  And that the Intel document mentions itself.

Casey


Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported

2017-08-09 Thread Casey Leedom
| From: Bjorn Helgaas 
| Sent: Tuesday, August 8, 2017 7:22 PM
| ...
| and the caller should do something like this:
| 
| if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
| adapter->flags |= ROOT_NO_RELAXED_ORDERING;
| 
| That way it's obvious where the issue is, and it's obvious that the
| answer might be different for peer-to-peer transactions than it is for
| transactions to the root port, i.e., to coherent memory.
| ...

Which is back to something very close to what I initially suggested in my
first patch submission.  Because you're right, this isn't about broken
Source Devices, it's about broken Completing Devices.

Unfortunately, as Alexander Duyck noted, in a Virtual Machine we won't be
able to see our Root Port in order to make this determination.  (And in some
Hypervisor implementations I've seen, there's not even a synthetic Root Port
available to the VM at all, let alone read-only access to the real one.)

So the current scheme was developed of having the Hypervisor kernel traverse
down the PCIe Fabric when it finds a broken Root Port implementation (the
issue that we've mostly been primarily focused upon), and turning off the
PCIe Capability Device Control[Relaxed Ordering Enable].  This was to serve
two purposes:

 1. Turn that off in order to prevent sending any Transaction Layer
Packets with the Relaxed Ordering Attribute to any Completer.
Which unfortunately would also prevent Peer-to-Peer use of the
Relaxed Ordering Attribute.

 2. Act as a message to Device Drivers for those downstream devices
that the they were dealing with a broken Root Port implementation.
And this would work even for a driver in a VM with an attached
device since it would be able to see the PCIe Configuration Space
for the attached device.

I haven't been excited about any of this because:

 A. While so far all of the examples we've talked about are broken
Root Port Completers, it's perfectly possible that other devices
could be broken -- say an NVMe Device which is not "Coherent
Memory".  How would this fit into the framework APIs being
described?

 B. I have yet to see a n example of how the currently proposed
API framework would be used in a hybrid environment where
TLPs to the Root Port would not use Relaxed Ordering, but TLPs
to a Peer would use Relaxed Ordering.  So far its all been about
using a "big hammer" to completely disable the use of Relaxed
Ordering.

But the VM problem keeps cropping up over and over.  A driver in a VM
doesn't have access to the Root Port to determine if its on a "Black List"
and our only way of communicating with the driver in the VM is to leave the
device in a particular state (i.e. initialize the PCIe Capability Device
Control[Relaxed Ordering Enable] to "off").

Oh, and also, on the current patch submission's focus on broken Root Port
implementations: one could suggest that even if we're stuck with the "Device
attached to a VM Conundrum", that what we should really be thinking about is
if ANY device within a PCIe Fabric has broken Relaxed Ordering completion
problems, and, if so, "poisoning" the entire containing PCIe Fabric by
turning off Relaxed Ordering Enable for every device, up, down sideways --
including the Root Port itself.

| ...
| This associates the message with the Requester that may potentially
| use relaxed ordering. But there's nothing wrong or unusual about the
| Requester; the issue is with the *Completer*, so I think the message
| should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING.
| Maybe it should be both places; I dunno.
| 
| This implementation assumes the device only initiates transactions to
| coherent memory, i.e., it assumes the device never does peer-to-peer
| DMA.  I guess we'll have to wait and see if we trip over any
| peer-to-peer issues, then figure out how to handle them.
| ...

Yes, as soon as we want to implement the hybrid use of Relaxed Ordering I
mentioned above.  And that the Intel document mentions itself.

Casey


Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported

2017-08-08 Thread Bjorn Helgaas
On Tue, Aug 08, 2017 at 09:22:39PM -0500, Bjorn Helgaas wrote:
> On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
> > When bit4 is set in the PCIe Device Control register, it indicates
> > whether the device is permitted to use relaxed ordering.
> > On some platforms using relaxed ordering can have performance issues or
> > due to erratum can cause data-corruption. In such cases devices must avoid
> > using relaxed ordering.
> > 
> > This patch checks if there is any node in the hierarchy that indicates that
> > using relaxed ordering is not safe. 
...

> > +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
> 
> This is misnamed.  This doesn't tell us anything about whether the
> device *supports* relaxed ordering.  It only tells us whether the
> device is *permitted* to use it.
> 
> When a device initiates a transaction, the hardware should set the RO
> bit in the TLP with logic something like this:
> 
>   RO =  &&
> &&
>
> 
> The issue you're fixing is that some Completers don't handle RO
> correctly.  The determining factor is not the Requester, but the
> Completer (for this series, a Root Port).  So I think this should be
> something like:
> 
>   int pcie_relaxed_ordering_broken(struct pci_dev *completer)
>   {
> if (!completer)
>   return 0;
> 
> return completer->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
>   }
> 
> and the caller should do something like this:
> 
>  if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
>adapter->flags |= ROOT_NO_RELAXED_ORDERING;
> 
> That way it's obvious where the issue is, and it's obvious that the
> answer might be different for peer-to-peer transactions than it is for
> transactions to the root port, i.e., to coherent memory.

After looking at the driver, I wonder if it would be simpler like
this:

  int pcie_relaxed_ordering_enabled(struct pci_dev *dev)
  {
u16 ctl;

pcie_capability_read_word(dev, PCI_EXP_DEVCTL, );
return ctl & PCI_EXP_DEVCTL_RELAX_EN;
  }
  EXPORT_SYMBOL(pcie_relaxed_ordering_enabled);

  static void pci_configure_relaxed_ordering(struct pci_dev *dev)
  {
struct pci_dev *root;

if (dev->is_virtfn)
  return;  /* PCI_EXP_DEVCTL_RELAX_EN is RsvdP in VFs */

if (!pcie_relaxed_ordering_enabled(dev))
  return;

/*
 * For now, we only deal with Relaxed Ordering issues with Root
 * Ports.  Peer-to-peer DMA is another can of worms.
 */
root = pci_find_pcie_root_port(dev);
if (!root)
  return;

if (root->relaxed_ordering_broken)
  pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
 PCI_EXP_DEVCTL_RELAX_EN);
  }

This doesn't check every intervening switch, but I don't think we know
about any issues except with root ports.

And the driver could do:

  if (!pcie_relaxed_ordering_enabled(pdev))
adapter->flags |= ROOT_NO_RELAXED_ORDERING;

The driver code wouldn't show anything about coherent memory vs.
peer-to-peer, but we really don't have a clue about how to handle that
yet anyway.

I guess this is back to exactly what you proposed, except that I
changed the name of pcie_relaxed_ordering_supported() to
pcie_relaxed_ordering_enabled(), which I think is slightly more
specific from the device's point of view.

Bjorn


Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported

2017-08-08 Thread Bjorn Helgaas
On Tue, Aug 08, 2017 at 09:22:39PM -0500, Bjorn Helgaas wrote:
> On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
> > When bit4 is set in the PCIe Device Control register, it indicates
> > whether the device is permitted to use relaxed ordering.
> > On some platforms using relaxed ordering can have performance issues or
> > due to erratum can cause data-corruption. In such cases devices must avoid
> > using relaxed ordering.
> > 
> > This patch checks if there is any node in the hierarchy that indicates that
> > using relaxed ordering is not safe. 
...

> > +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
> 
> This is misnamed.  This doesn't tell us anything about whether the
> device *supports* relaxed ordering.  It only tells us whether the
> device is *permitted* to use it.
> 
> When a device initiates a transaction, the hardware should set the RO
> bit in the TLP with logic something like this:
> 
>   RO =  &&
> &&
>
> 
> The issue you're fixing is that some Completers don't handle RO
> correctly.  The determining factor is not the Requester, but the
> Completer (for this series, a Root Port).  So I think this should be
> something like:
> 
>   int pcie_relaxed_ordering_broken(struct pci_dev *completer)
>   {
> if (!completer)
>   return 0;
> 
> return completer->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
>   }
> 
> and the caller should do something like this:
> 
>  if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
>adapter->flags |= ROOT_NO_RELAXED_ORDERING;
> 
> That way it's obvious where the issue is, and it's obvious that the
> answer might be different for peer-to-peer transactions than it is for
> transactions to the root port, i.e., to coherent memory.

After looking at the driver, I wonder if it would be simpler like
this:

  int pcie_relaxed_ordering_enabled(struct pci_dev *dev)
  {
u16 ctl;

pcie_capability_read_word(dev, PCI_EXP_DEVCTL, );
return ctl & PCI_EXP_DEVCTL_RELAX_EN;
  }
  EXPORT_SYMBOL(pcie_relaxed_ordering_enabled);

  static void pci_configure_relaxed_ordering(struct pci_dev *dev)
  {
struct pci_dev *root;

if (dev->is_virtfn)
  return;  /* PCI_EXP_DEVCTL_RELAX_EN is RsvdP in VFs */

if (!pcie_relaxed_ordering_enabled(dev))
  return;

/*
 * For now, we only deal with Relaxed Ordering issues with Root
 * Ports.  Peer-to-peer DMA is another can of worms.
 */
root = pci_find_pcie_root_port(dev);
if (!root)
  return;

if (root->relaxed_ordering_broken)
  pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
 PCI_EXP_DEVCTL_RELAX_EN);
  }

This doesn't check every intervening switch, but I don't think we know
about any issues except with root ports.

And the driver could do:

  if (!pcie_relaxed_ordering_enabled(pdev))
adapter->flags |= ROOT_NO_RELAXED_ORDERING;

The driver code wouldn't show anything about coherent memory vs.
peer-to-peer, but we really don't have a clue about how to handle that
yet anyway.

I guess this is back to exactly what you proposed, except that I
changed the name of pcie_relaxed_ordering_supported() to
pcie_relaxed_ordering_enabled(), which I think is slightly more
specific from the device's point of view.

Bjorn


Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported

2017-08-08 Thread Bjorn Helgaas
On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
> When bit4 is set in the PCIe Device Control register, it indicates
> whether the device is permitted to use relaxed ordering.
> On some platforms using relaxed ordering can have performance issues or
> due to erratum can cause data-corruption. In such cases devices must avoid
> using relaxed ordering.
> 
> This patch checks if there is any node in the hierarchy that indicates that
> using relaxed ordering is not safe. 

I think you only check the devices between the root port and the
target device.  For example, you don't check siblings or cousins of
the target device.

> In such cases the patch turns off the
> relaxed ordering by clearing the eapability for this device.

s/eapability/capability/

> And if the
> device is probably running in a guest machine, we should do nothing.

I don't know what this sentence means.  "Probably running in a guest
machine" doesn't really make sense, and there's nothing in your patch
that explicitly checks for being in a guest machine.

> Signed-off-by: Ding Tianhong 
> Acked-by: Alexander Duyck 
> Acked-by: Ashok Raj 
> ---
>  drivers/pci/pci.c   | 29 +
>  drivers/pci/probe.c | 37 +
>  include/linux/pci.h |  2 ++
>  3 files changed, 68 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index af0cc34..4f9d7c1 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>  EXPORT_SYMBOL(pcie_set_mps);
>  
>  /**
> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
> + * @dev: PCI device to query
> + *
> + * If possible clear relaxed ordering

Why "If possible"?  The bit is required to be RW or hardwired to zero,
so PCI_EXP_DEVCTL_RELAX_EN should *always* be zero when this returns.

> + */
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
> +{
> + return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
> +   PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);

The current series doesn't add any callers of this except
pci_configure_relaxed_ordering(), so it doesn't need to be exported to
modules.

I think I would put both of these functions in drivers/pci/probe.c.
Then this one could be static and you'd only have to add
pcie_relaxed_ordering_supported() to include/linux/pci.h.

> +
> +/**
> + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support

s/relexed/relaxed/

> + * @dev: PCI device to query
> + *
> + * Returns true if the device support relaxed ordering attribute.
> + */
> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
> +{
> + u16 v;
> +
> + pcie_capability_read_word(dev, PCI_EXP_DEVCTL, );
> +
> + return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);

This is misnamed.  This doesn't tell us anything about whether the
device *supports* relaxed ordering.  It only tells us whether the
device is *permitted* to use it.

When a device initiates a transaction, the hardware should set the RO
bit in the TLP with logic something like this:

  RO =  &&
&&
   

The issue you're fixing is that some Completers don't handle RO
correctly.  The determining factor is not the Requester, but the
Completer (for this series, a Root Port).  So I think this should be
something like:

  int pcie_relaxed_ordering_broken(struct pci_dev *completer)
  {
if (!completer)
  return 0;

return completer->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
  }

and the caller should do something like this:

 if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
   adapter->flags |= ROOT_NO_RELAXED_ORDERING;

That way it's obvious where the issue is, and it's obvious that the
answer might be different for peer-to-peer transactions than it is for
transactions to the root port, i.e., to coherent memory.

> +
> +/**
>   * pcie_get_minimum_link - determine minimum link settings of a PCI device
>   * @dev: PCI device to query
>   * @speed: storage for minimum speed
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index c31310d..48df012 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev 
> *dev)
>PCI_EXP_DEVCTL_EXT_TAG);
>  }
>  
> +/**
> + * pci_dev_should_disable_relaxed_ordering - check if the PCI device
> + * should disable the relaxed ordering attribute.
> + * @dev: PCI device
> + *
> + * Return true if any of the PCI devices above us do not support
> + * relaxed ordering.
> + */
> +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
> +{
> + while (dev) {
> + if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
> + 

Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported

2017-08-08 Thread Bjorn Helgaas
On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote:
> When bit4 is set in the PCIe Device Control register, it indicates
> whether the device is permitted to use relaxed ordering.
> On some platforms using relaxed ordering can have performance issues or
> due to erratum can cause data-corruption. In such cases devices must avoid
> using relaxed ordering.
> 
> This patch checks if there is any node in the hierarchy that indicates that
> using relaxed ordering is not safe. 

I think you only check the devices between the root port and the
target device.  For example, you don't check siblings or cousins of
the target device.

> In such cases the patch turns off the
> relaxed ordering by clearing the eapability for this device.

s/eapability/capability/

> And if the
> device is probably running in a guest machine, we should do nothing.

I don't know what this sentence means.  "Probably running in a guest
machine" doesn't really make sense, and there's nothing in your patch
that explicitly checks for being in a guest machine.

> Signed-off-by: Ding Tianhong 
> Acked-by: Alexander Duyck 
> Acked-by: Ashok Raj 
> ---
>  drivers/pci/pci.c   | 29 +
>  drivers/pci/probe.c | 37 +
>  include/linux/pci.h |  2 ++
>  3 files changed, 68 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index af0cc34..4f9d7c1 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>  EXPORT_SYMBOL(pcie_set_mps);
>  
>  /**
> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
> + * @dev: PCI device to query
> + *
> + * If possible clear relaxed ordering

Why "If possible"?  The bit is required to be RW or hardwired to zero,
so PCI_EXP_DEVCTL_RELAX_EN should *always* be zero when this returns.

> + */
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
> +{
> + return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
> +   PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);

The current series doesn't add any callers of this except
pci_configure_relaxed_ordering(), so it doesn't need to be exported to
modules.

I think I would put both of these functions in drivers/pci/probe.c.
Then this one could be static and you'd only have to add
pcie_relaxed_ordering_supported() to include/linux/pci.h.

> +
> +/**
> + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support

s/relexed/relaxed/

> + * @dev: PCI device to query
> + *
> + * Returns true if the device support relaxed ordering attribute.
> + */
> +bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
> +{
> + u16 v;
> +
> + pcie_capability_read_word(dev, PCI_EXP_DEVCTL, );
> +
> + return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_relaxed_ordering_supported);

This is misnamed.  This doesn't tell us anything about whether the
device *supports* relaxed ordering.  It only tells us whether the
device is *permitted* to use it.

When a device initiates a transaction, the hardware should set the RO
bit in the TLP with logic something like this:

  RO =  &&
&&
   

The issue you're fixing is that some Completers don't handle RO
correctly.  The determining factor is not the Requester, but the
Completer (for this series, a Root Port).  So I think this should be
something like:

  int pcie_relaxed_ordering_broken(struct pci_dev *completer)
  {
if (!completer)
  return 0;

return completer->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
  }

and the caller should do something like this:

 if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
   adapter->flags |= ROOT_NO_RELAXED_ORDERING;

That way it's obvious where the issue is, and it's obvious that the
answer might be different for peer-to-peer transactions than it is for
transactions to the root port, i.e., to coherent memory.

> +
> +/**
>   * pcie_get_minimum_link - determine minimum link settings of a PCI device
>   * @dev: PCI device to query
>   * @speed: storage for minimum speed
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index c31310d..48df012 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev 
> *dev)
>PCI_EXP_DEVCTL_EXT_TAG);
>  }
>  
> +/**
> + * pci_dev_should_disable_relaxed_ordering - check if the PCI device
> + * should disable the relaxed ordering attribute.
> + * @dev: PCI device
> + *
> + * Return true if any of the PCI devices above us do not support
> + * relaxed ordering.
> + */
> +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
> +{
> + while (dev) {
> + if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
> + return true;
> +
> + dev = dev->bus->self;
> + }
> 

[PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported

2017-08-05 Thread Ding Tianhong
When bit4 is set in the PCIe Device Control register, it indicates
whether the device is permitted to use relaxed ordering.
On some platforms using relaxed ordering can have performance issues or
due to erratum can cause data-corruption. In such cases devices must avoid
using relaxed ordering.

This patch checks if there is any node in the hierarchy that indicates that
using relaxed ordering is not safe. In such cases the patch turns off the
relaxed ordering by clearing the eapability for this device. And if the
device is probably running in a guest machine, we should do nothing.

Signed-off-by: Ding Tianhong 
Acked-by: Alexander Duyck 
Acked-by: Ashok Raj 
---
 drivers/pci/pci.c   | 29 +
 drivers/pci/probe.c | 37 +
 include/linux/pci.h |  2 ++
 3 files changed, 68 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af0cc34..4f9d7c1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
 EXPORT_SYMBOL(pcie_set_mps);
 
 /**
+ * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
+ * @dev: PCI device to query
+ *
+ * If possible clear relaxed ordering
+ */
+int pcie_clear_relaxed_ordering(struct pci_dev *dev)
+{
+   return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
+ PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
+
+/**
+ * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support
+ * @dev: PCI device to query
+ *
+ * Returns true if the device support relaxed ordering attribute.
+ */
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
+{
+   u16 v;
+
+   pcie_capability_read_word(dev, PCI_EXP_DEVCTL, );
+
+   return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
+
+/**
  * pcie_get_minimum_link - determine minimum link settings of a PCI device
  * @dev: PCI device to query
  * @speed: storage for minimum speed
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c31310d..48df012 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev 
*dev)
 PCI_EXP_DEVCTL_EXT_TAG);
 }
 
+/**
+ * pci_dev_should_disable_relaxed_ordering - check if the PCI device
+ * should disable the relaxed ordering attribute.
+ * @dev: PCI device
+ *
+ * Return true if any of the PCI devices above us do not support
+ * relaxed ordering.
+ */
+static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
+{
+   while (dev) {
+   if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
+   return true;
+
+   dev = dev->bus->self;
+   }
+
+   return false;
+}
+
+static void pci_configure_relaxed_ordering(struct pci_dev *dev)
+{
+   /* We should not alter the relaxed ordering bit for the VF */
+   if (dev->is_virtfn)
+   return;
+
+   /* If the releaxed ordering enable bit is not set, do nothing. */
+   if (!pcie_relaxed_ordering_supported(dev))
+   return;
+
+   if (pci_dev_should_disable_relaxed_ordering(dev)) {
+   pcie_clear_relaxed_ordering(dev);
+   dev_info(>dev, "Disable Relaxed Ordering\n");
+   }
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
struct hotplug_params hpp;
@@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev)
 
pci_configure_mps(dev);
pci_configure_extended_tags(dev);
+   pci_configure_relaxed_ordering(dev);
 
memset(, 0, sizeof(hpp));
ret = pci_get_hp_params(dev, );
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 412ec1c..3aa23a2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
 void pci_pme_wakeup_bus(struct pci_bus *bus);
 void pci_d3cold_enable(struct pci_dev *dev);
 void pci_d3cold_disable(struct pci_dev *dev);
+int pcie_clear_relaxed_ordering(struct pci_dev *dev);
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev);
 
 /* PCI Virtual Channel */
 int pci_save_vc_state(struct pci_dev *dev);
-- 
1.8.3.1




[PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported

2017-08-05 Thread Ding Tianhong
When bit4 is set in the PCIe Device Control register, it indicates
whether the device is permitted to use relaxed ordering.
On some platforms using relaxed ordering can have performance issues or
due to erratum can cause data-corruption. In such cases devices must avoid
using relaxed ordering.

This patch checks if there is any node in the hierarchy that indicates that
using relaxed ordering is not safe. In such cases the patch turns off the
relaxed ordering by clearing the eapability for this device. And if the
device is probably running in a guest machine, we should do nothing.

Signed-off-by: Ding Tianhong 
Acked-by: Alexander Duyck 
Acked-by: Ashok Raj 
---
 drivers/pci/pci.c   | 29 +
 drivers/pci/probe.c | 37 +
 include/linux/pci.h |  2 ++
 3 files changed, 68 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af0cc34..4f9d7c1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
 EXPORT_SYMBOL(pcie_set_mps);
 
 /**
+ * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit
+ * @dev: PCI device to query
+ *
+ * If possible clear relaxed ordering
+ */
+int pcie_clear_relaxed_ordering(struct pci_dev *dev)
+{
+   return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
+ PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
+
+/**
+ * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support
+ * @dev: PCI device to query
+ *
+ * Returns true if the device support relaxed ordering attribute.
+ */
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev)
+{
+   u16 v;
+
+   pcie_capability_read_word(dev, PCI_EXP_DEVCTL, );
+
+   return !!(v & PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_relaxed_ordering_supported);
+
+/**
  * pcie_get_minimum_link - determine minimum link settings of a PCI device
  * @dev: PCI device to query
  * @speed: storage for minimum speed
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c31310d..48df012 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev 
*dev)
 PCI_EXP_DEVCTL_EXT_TAG);
 }
 
+/**
+ * pci_dev_should_disable_relaxed_ordering - check if the PCI device
+ * should disable the relaxed ordering attribute.
+ * @dev: PCI device
+ *
+ * Return true if any of the PCI devices above us do not support
+ * relaxed ordering.
+ */
+static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev)
+{
+   while (dev) {
+   if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING)
+   return true;
+
+   dev = dev->bus->self;
+   }
+
+   return false;
+}
+
+static void pci_configure_relaxed_ordering(struct pci_dev *dev)
+{
+   /* We should not alter the relaxed ordering bit for the VF */
+   if (dev->is_virtfn)
+   return;
+
+   /* If the releaxed ordering enable bit is not set, do nothing. */
+   if (!pcie_relaxed_ordering_supported(dev))
+   return;
+
+   if (pci_dev_should_disable_relaxed_ordering(dev)) {
+   pcie_clear_relaxed_ordering(dev);
+   dev_info(>dev, "Disable Relaxed Ordering\n");
+   }
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
struct hotplug_params hpp;
@@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev)
 
pci_configure_mps(dev);
pci_configure_extended_tags(dev);
+   pci_configure_relaxed_ordering(dev);
 
memset(, 0, sizeof(hpp));
ret = pci_get_hp_params(dev, );
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 412ec1c..3aa23a2 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev,
 void pci_pme_wakeup_bus(struct pci_bus *bus);
 void pci_d3cold_enable(struct pci_dev *dev);
 void pci_d3cold_disable(struct pci_dev *dev);
+int pcie_clear_relaxed_ordering(struct pci_dev *dev);
+bool pcie_relaxed_ordering_supported(struct pci_dev *dev);
 
 /* PCI Virtual Channel */
 int pci_save_vc_state(struct pci_dev *dev);
-- 
1.8.3.1