Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-27 Thread Ding Tianhong


On 2017/5/26 3:49, Alexander Duyck wrote:
> On Thu, May 25, 2017 at 6:35 AM, Ding Tianhong  
> wrote:
>>
>> On 2017/5/9 8:48, Casey Leedom wrote:
>>>
>>> | From: Alexander Duyck 
>>> | Date: Saturday, May 6, 2017 11:07 AM
>>> |
>>> | | From: Ding Tianhong 
>>> | | Date: Fri, May 5, 2017 at 8:08 PM
>>> | |
>>> | | According the suggestion, I could only think of this code:
>>> | | ..
>>> |
>>> | This is a bit simplistic but it is a start.
>>>
>>>   Yes, something tells me that this is going to be more complicated than any
>>> of us like ...
>>>
>>> | The other bit I was getting at is that we need to update the core PCIe
>>> | code so that when we configure devices and the root complex reports no
>>> | support for relaxed ordering it should be clearing the relaxed
>>> | ordering bits in the PCIe configuration registers on the upstream
>>> | facing devices.
>>>
>>>   Of course, this can be written to by the Driver at any time ... and is in
>>> the case of the cxgb4 Driver ...
>>>
>>>   After a lot of rummaging around, it does look like KVM prohibits writes to
>>> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
>>> conf_space_capability.c and conf_space.c simply because writes aren't
>>> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
>>> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
>>>
>>> | The last bit we need in all this is a way to allow for setups where
>>> | peer-to-peer wants to perform relaxed ordering but for writes to the
>>> | host we have to not use relaxed ordering. For that we need to enable a
>>> | special case and that isn't handled right now in any of the solutions
>>> | we have coded up so far.
>>>
>>>   Yes, we do need this.
>>>
>>>
>>> | From: Alexander Duyck 
>>> | Date: Saturday, May 8, 2017 08:22 AM
>>> |
>>> | The problem is we need to have something that can be communicated
>>> | through a VM. Your change doesn't work in that regard. That was why I
>>> | suggested just updating the code so that we when we initialized PCIe
>>> | devices what we do is either set or clear the relaxed ordering bit in
>>> | the PCIe device control register. That way when we direct assign an
>>> | interface it could know just based on the bits int the PCIe
>>> | configuration if it could use relaxed ordering or not.
>>> |
>>> | At that point the driver code itself becomes very simple since you
>>> | could just enable the relaxed ordering by default in the igb/ixgbe
>>> | driver and if the bit is set or cleared in the PCIe configuration then
>>> | we are either sending with relaxed ordering requests or not and don't
>>> | have to try and locate the root complex.
>>> |
>>> | So from the sound of it Casey has a special use case where he doesn't
>>> | want to send relaxed ordering frames to the root complex, but instead
>>> | would like to send them to another PCIe device. To do that he needs to
>>> | have a way to enable the relaxed ordering bit in the PCIe
>>> | configuration but then not send any to the root complex. Odds are that
>>> | is something he might be able to just implement in the driver, but is
>>> | something that may become a more general case in the future. I don't
>>> | see our change here impacting it as long as we keep the solution
>>> | generic and mostly confined to when we instantiate the devices as the
>>> | driver could likely make the decision to change the behavior later.
>>>
>>>   It's not just me.  Intel has said that while RO directed at the Root
>>> Complex Host Coherent Memory has a performance bug (not Data Corruption),
>>> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
>>> interested in hearing what the bug is if we get that much detail.  The very
>>> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
>>> good performance.  So this is essentially a bug in the hardware that was
>>> ~trying~ to implement a performance win.)
>>>
>>>   Meanwhile, I currently only know of a single PCIe End Point which causes
>>> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
>>> clear that product is even alive anymore since I haven't been able to get
>>> any responses from them for several months.
>>>
>>>   What I'm saying is: let's try to architect a solution which doesn't throw
>>> the baby out with the bath water ...
>>>
>>>   I think that if a Device's Root Complex Port has problems with Relaxed
>>> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
>>> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
>>> Machine since the Device Driver can no longer query the Relaxed Ordering
>>> Support of the Root Complex Port.  The only down side of this would be if we
>>> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
>>> transfers.  But that seems like a hard application to 

Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-25 Thread Alexander Duyck
On Thu, May 25, 2017 at 6:35 AM, Ding Tianhong  wrote:
>
> On 2017/5/9 8:48, Casey Leedom wrote:
>>
>> | From: Alexander Duyck 
>> | Date: Saturday, May 6, 2017 11:07 AM
>> |
>> | | From: Ding Tianhong 
>> | | Date: Fri, May 5, 2017 at 8:08 PM
>> | |
>> | | According the suggestion, I could only think of this code:
>> | | ..
>> |
>> | This is a bit simplistic but it is a start.
>>
>>   Yes, something tells me that this is going to be more complicated than any
>> of us like ...
>>
>> | The other bit I was getting at is that we need to update the core PCIe
>> | code so that when we configure devices and the root complex reports no
>> | support for relaxed ordering it should be clearing the relaxed
>> | ordering bits in the PCIe configuration registers on the upstream
>> | facing devices.
>>
>>   Of course, this can be written to by the Driver at any time ... and is in
>> the case of the cxgb4 Driver ...
>>
>>   After a lot of rummaging around, it does look like KVM prohibits writes to
>> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
>> conf_space_capability.c and conf_space.c simply because writes aren't
>> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
>> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
>>
>> | The last bit we need in all this is a way to allow for setups where
>> | peer-to-peer wants to perform relaxed ordering but for writes to the
>> | host we have to not use relaxed ordering. For that we need to enable a
>> | special case and that isn't handled right now in any of the solutions
>> | we have coded up so far.
>>
>>   Yes, we do need this.
>>
>>
>> | From: Alexander Duyck 
>> | Date: Saturday, May 8, 2017 08:22 AM
>> |
>> | The problem is we need to have something that can be communicated
>> | through a VM. Your change doesn't work in that regard. That was why I
>> | suggested just updating the code so that we when we initialized PCIe
>> | devices what we do is either set or clear the relaxed ordering bit in
>> | the PCIe device control register. That way when we direct assign an
>> | interface it could know just based on the bits int the PCIe
>> | configuration if it could use relaxed ordering or not.
>> |
>> | At that point the driver code itself becomes very simple since you
>> | could just enable the relaxed ordering by default in the igb/ixgbe
>> | driver and if the bit is set or cleared in the PCIe configuration then
>> | we are either sending with relaxed ordering requests or not and don't
>> | have to try and locate the root complex.
>> |
>> | So from the sound of it Casey has a special use case where he doesn't
>> | want to send relaxed ordering frames to the root complex, but instead
>> | would like to send them to another PCIe device. To do that he needs to
>> | have a way to enable the relaxed ordering bit in the PCIe
>> | configuration but then not send any to the root complex. Odds are that
>> | is something he might be able to just implement in the driver, but is
>> | something that may become a more general case in the future. I don't
>> | see our change here impacting it as long as we keep the solution
>> | generic and mostly confined to when we instantiate the devices as the
>> | driver could likely make the decision to change the behavior later.
>>
>>   It's not just me.  Intel has said that while RO directed at the Root
>> Complex Host Coherent Memory has a performance bug (not Data Corruption),
>> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
>> interested in hearing what the bug is if we get that much detail.  The very
>> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
>> good performance.  So this is essentially a bug in the hardware that was
>> ~trying~ to implement a performance win.)
>>
>>   Meanwhile, I currently only know of a single PCIe End Point which causes
>> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
>> clear that product is even alive anymore since I haven't been able to get
>> any responses from them for several months.
>>
>>   What I'm saying is: let's try to architect a solution which doesn't throw
>> the baby out with the bath water ...
>>
>>   I think that if a Device's Root Complex Port has problems with Relaxed
>> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
>> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
>> Machine since the Device Driver can no longer query the Relaxed Ordering
>> Support of the Root Complex Port.  The only down side of this would be if we
>> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
>> transfers.  But that seems like a hard application to support in any case
>> since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
>> match the actual values.
>>
>>   For 

Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-25 Thread Ding Tianhong

On 2017/5/9 8:48, Casey Leedom wrote:
> 
> | From: Alexander Duyck 
> | Date: Saturday, May 6, 2017 11:07 AM
> | 
> | | From: Ding Tianhong 
> | | Date: Fri, May 5, 2017 at 8:08 PM
> | | 
> | | According the suggestion, I could only think of this code:
> | | ..
> | 
> | This is a bit simplistic but it is a start.
> 
>   Yes, something tells me that this is going to be more complicated than any
> of us like ...
> 
> | The other bit I was getting at is that we need to update the core PCIe
> | code so that when we configure devices and the root complex reports no
> | support for relaxed ordering it should be clearing the relaxed
> | ordering bits in the PCIe configuration registers on the upstream
> | facing devices.
> 
>   Of course, this can be written to by the Driver at any time ... and is in
> the case of the cxgb4 Driver ...
> 
>   After a lot of rummaging around, it does look like KVM prohibits writes to
> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
> conf_space_capability.c and conf_space.c simply because writes aren't
> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
> 
> | The last bit we need in all this is a way to allow for setups where
> | peer-to-peer wants to perform relaxed ordering but for writes to the
> | host we have to not use relaxed ordering. For that we need to enable a
> | special case and that isn't handled right now in any of the solutions
> | we have coded up so far.
> 
>   Yes, we do need this.
> 
> 
> | From: Alexander Duyck 
> | Date: Saturday, May 8, 2017 08:22 AM
> |
> | The problem is we need to have something that can be communicated
> | through a VM. Your change doesn't work in that regard. That was why I
> | suggested just updating the code so that we when we initialized PCIe
> | devices what we do is either set or clear the relaxed ordering bit in
> | the PCIe device control register. That way when we direct assign an
> | interface it could know just based on the bits int the PCIe
> | configuration if it could use relaxed ordering or not.
> | 
> | At that point the driver code itself becomes very simple since you
> | could just enable the relaxed ordering by default in the igb/ixgbe
> | driver and if the bit is set or cleared in the PCIe configuration then
> | we are either sending with relaxed ordering requests or not and don't
> | have to try and locate the root complex.
> | 
> | So from the sound of it Casey has a special use case where he doesn't
> | want to send relaxed ordering frames to the root complex, but instead
> | would like to send them to another PCIe device. To do that he needs to
> | have a way to enable the relaxed ordering bit in the PCIe
> | configuration but then not send any to the root complex. Odds are that
> | is something he might be able to just implement in the driver, but is
> | something that may become a more general case in the future. I don't
> | see our change here impacting it as long as we keep the solution
> | generic and mostly confined to when we instantiate the devices as the
> | driver could likely make the decision to change the behavior later.
> 
>   It's not just me.  Intel has said that while RO directed at the Root
> Complex Host Coherent Memory has a performance bug (not Data Corruption),
> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
> interested in hearing what the bug is if we get that much detail.  The very
> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
> good performance.  So this is essentially a bug in the hardware that was
> ~trying~ to implement a performance win.)
> 
>   Meanwhile, I currently only know of a single PCIe End Point which causes
> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
> clear that product is even alive anymore since I haven't been able to get
> any responses from them for several months.
> 
>   What I'm saying is: let's try to architect a solution which doesn't throw
> the baby out with the bath water ...
> 
>   I think that if a Device's Root Complex Port has problems with Relaxed
> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
> Machine since the Device Driver can no longer query the Relaxed Ordering
> Support of the Root Complex Port.  The only down side of this would be if we
> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
> transfers.  But that seems like a hard application to support in any case
> since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
> match the actual values.
> 
>   For Devices running in the base OS/Hypervisor, their Drivers can query the
> Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
> simple 

Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-17 Thread Alexander Duyck
On Tue, May 16, 2017 at 11:38 AM, Casey Leedom  wrote:
> | From: Ding Tianhong 
> | Sent: Wednesday, May 10, 2017 6:15 PM
> |
> | Hi Casey:
> |
> | Will you continue to work on this solution or send a new version patches?
>
> I won't be able to work on this any time soon given several other urgent
> issues.  If you've got a desire to pick this up, I'd be happy to help code
> review your efforts.
>
> Casey

Thanks for the heads up. I'll see if I can find somebody in my team
that might be able to take on the task though I can't make any
promises either as we have quite a bit going on.

- Alex


Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-16 Thread Casey Leedom
| From: Ding Tianhong 
| Sent: Wednesday, May 10, 2017 6:15 PM
|
| Hi Casey:
|
| Will you continue to work on this solution or send a new version patches?

I won't be able to work on this any time soon given several other urgent
issues.  If you've got a desire to pick this up, I'd be happy to help code
review your efforts.

Casey


Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-10 Thread Ding Tianhong


On 2017/5/9 8:48, Casey Leedom wrote:
> 
> | From: Alexander Duyck 
> | Date: Saturday, May 6, 2017 11:07 AM
> | 
> | | From: Ding Tianhong 
> | | Date: Fri, May 5, 2017 at 8:08 PM
> | | 
> | | According the suggestion, I could only think of this code:
> | | ..
> | 
> | This is a bit simplistic but it is a start.
> 
>   Yes, something tells me that this is going to be more complicated than any
> of us like ...
> 
> | The other bit I was getting at is that we need to update the core PCIe
> | code so that when we configure devices and the root complex reports no
> | support for relaxed ordering it should be clearing the relaxed
> | ordering bits in the PCIe configuration registers on the upstream
> | facing devices.
> 
>   Of course, this can be written to by the Driver at any time ... and is in
> the case of the cxgb4 Driver ...
> 
>   After a lot of rummaging around, it does look like KVM prohibits writes to
> the PCIe Capability Device Control register in drivers/xen/xen-pciback/
> conf_space_capability.c and conf_space.c simply because writes aren't
> allowed unless "permissive" is set.  So it ~looks~ like a driver running in
> a Virtual Machine can't turn Enable Relaxed Ordering back on ...
> 
> | The last bit we need in all this is a way to allow for setups where
> | peer-to-peer wants to perform relaxed ordering but for writes to the
> | host we have to not use relaxed ordering. For that we need to enable a
> | special case and that isn't handled right now in any of the solutions
> | we have coded up so far.
> 
>   Yes, we do need this.
> 
> 
> | From: Alexander Duyck 
> | Date: Saturday, May 8, 2017 08:22 AM
> |
> | The problem is we need to have something that can be communicated
> | through a VM. Your change doesn't work in that regard. That was why I
> | suggested just updating the code so that we when we initialized PCIe
> | devices what we do is either set or clear the relaxed ordering bit in
> | the PCIe device control register. That way when we direct assign an
> | interface it could know just based on the bits int the PCIe
> | configuration if it could use relaxed ordering or not.
> | 
> | At that point the driver code itself becomes very simple since you
> | could just enable the relaxed ordering by default in the igb/ixgbe
> | driver and if the bit is set or cleared in the PCIe configuration then
> | we are either sending with relaxed ordering requests or not and don't
> | have to try and locate the root complex.
> | 
> | So from the sound of it Casey has a special use case where he doesn't
> | want to send relaxed ordering frames to the root complex, but instead
> | would like to send them to another PCIe device. To do that he needs to
> | have a way to enable the relaxed ordering bit in the PCIe
> | configuration but then not send any to the root complex. Odds are that
> | is something he might be able to just implement in the driver, but is
> | something that may become a more general case in the future. I don't
> | see our change here impacting it as long as we keep the solution
> | generic and mostly confined to when we instantiate the devices as the
> | driver could likely make the decision to change the behavior later.
> 
>   It's not just me.  Intel has said that while RO directed at the Root
> Complex Host Coherent Memory has a performance bug (not Data Corruption),
> it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
> interested in hearing what the bug is if we get that much detail.  The very
> same TLPs directed to the Root Complex Port without Relaxed Ordering set get
> good performance.  So this is essentially a bug in the hardware that was
> ~trying~ to implement a performance win.)
> 
>   Meanwhile, I currently only know of a single PCIe End Point which causes
> catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
> clear that product is even alive anymore since I haven't been able to get
> any responses from them for several months.
> 
>   What I'm saying is: let's try to architect a solution which doesn't throw
> the baby out with the bath water ...
> 
>   I think that if a Device's Root Complex Port has problems with Relaxed
> Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
> Control[Enable Relaxed Ordering] when we assign a device to a Virtual
> Machine since the Device Driver can no longer query the Relaxed Ordering
> Support of the Root Complex Port.  The only down side of this would be if we
> assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
> transfers.  But that seems like a hard application to support in any case
> since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
> match the actual values.
> 
>   For Devices running in the base OS/Hypervisor, their Drivers can query the
> Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
> simple 

Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-08 Thread Casey Leedom

| From: Alexander Duyck 
| Date: Saturday, May 6, 2017 11:07 AM
| 
| | From: Ding Tianhong 
| | Date: Fri, May 5, 2017 at 8:08 PM
| | 
| | According the suggestion, I could only think of this code:
| | ..
| 
| This is a bit simplistic but it is a start.

  Yes, something tells me that this is going to be more complicated than any
of us like ...

| The other bit I was getting at is that we need to update the core PCIe
| code so that when we configure devices and the root complex reports no
| support for relaxed ordering it should be clearing the relaxed
| ordering bits in the PCIe configuration registers on the upstream
| facing devices.

  Of course, this can be written to by the Driver at any time ... and is in
the case of the cxgb4 Driver ...

  After a lot of rummaging around, it does look like KVM prohibits writes to
the PCIe Capability Device Control register in drivers/xen/xen-pciback/
conf_space_capability.c and conf_space.c simply because writes aren't
allowed unless "permissive" is set.  So it ~looks~ like a driver running in
a Virtual Machine can't turn Enable Relaxed Ordering back on ...

| The last bit we need in all this is a way to allow for setups where
| peer-to-peer wants to perform relaxed ordering but for writes to the
| host we have to not use relaxed ordering. For that we need to enable a
| special case and that isn't handled right now in any of the solutions
| we have coded up so far.

  Yes, we do need this.


| From: Alexander Duyck 
| Date: Saturday, May 8, 2017 08:22 AM
|
| The problem is we need to have something that can be communicated
| through a VM. Your change doesn't work in that regard. That was why I
| suggested just updating the code so that we when we initialized PCIe
| devices what we do is either set or clear the relaxed ordering bit in
| the PCIe device control register. That way when we direct assign an
| interface it could know just based on the bits int the PCIe
| configuration if it could use relaxed ordering or not.
| 
| At that point the driver code itself becomes very simple since you
| could just enable the relaxed ordering by default in the igb/ixgbe
| driver and if the bit is set or cleared in the PCIe configuration then
| we are either sending with relaxed ordering requests or not and don't
| have to try and locate the root complex.
| 
| So from the sound of it Casey has a special use case where he doesn't
| want to send relaxed ordering frames to the root complex, but instead
| would like to send them to another PCIe device. To do that he needs to
| have a way to enable the relaxed ordering bit in the PCIe
| configuration but then not send any to the root complex. Odds are that
| is something he might be able to just implement in the driver, but is
| something that may become a more general case in the future. I don't
| see our change here impacting it as long as we keep the solution
| generic and mostly confined to when we instantiate the devices as the
| driver could likely make the decision to change the behavior later.

  It's not just me.  Intel has said that while RO directed at the Root
Complex Host Coherent Memory has a performance bug (not Data Corruption),
it's a performance win for Peer-to-Peer writes to MMIO Space.  (I'll be very
interested in hearing what the bug is if we get that much detail.  The very
same TLPs directed to the Root Complex Port without Relaxed Ordering set get
good performance.  So this is essentially a bug in the hardware that was
~trying~ to implement a performance win.)

  Meanwhile, I currently only know of a single PCIe End Point which causes
catastrophic results: the AMD A1100 ARM SoC ("SEATTLE").  And it's not even
clear that product is even alive anymore since I haven't been able to get
any responses from them for several months.

  What I'm saying is: let's try to architect a solution which doesn't throw
the baby out with the bath water ...

  I think that if a Device's Root Complex Port has problems with Relaxed
Ordering, it ~probably~ makes sense to turn off the PCIe Capability Device
Control[Enable Relaxed Ordering] when we assign a device to a Virtual
Machine since the Device Driver can no longer query the Relaxed Ordering
Support of the Root Complex Port.  The only down side of this would be if we
assigned two Peers to a VM in an application which wanted to do Peer-to-Peer
transfers.  But that seems like a hard application to support in any case
since the PCI Bus:Slot.Function IDs for assigned Devices within a VM don't
match the actual values.

  For Devices running in the base OS/Hypervisor, their Drivers can query the
Relaxed Ordering Support for the Root Complex Port or a Peer Device.  So a
simple flag within the (struct pci_dev *)->dev_flags would serve for that
along with a per-Architecture/Platform mechanism for setting it ...

Casey


Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-08 Thread Alexander Duyck
On Mon, May 8, 2017 at 7:33 AM, Ding Tianhong  wrote:
>
>
> On 2017/5/7 2:07, Alexander Duyck wrote:
>> On Fri, May 5, 2017 at 8:08 PM, Ding Tianhong  
>> wrote:
>>>
>>>
>>> On 2017/5/5 22:04, Alexander Duyck wrote:
 On Thu, May 4, 2017 at 2:01 PM, Casey Leedom  wrote:
> | From: Alexander Duyck 
> | Sent: Wednesday, May 3, 2017 9:02 AM
> | ...
> | It sounds like we are more or less in agreement. My only concern is
> | really what we default this to. On x86 I would say we could probably
> | default this to disabled for existing platforms since my understanding
> | is that relaxed ordering doesn't provide much benefit on what is out
> | there right now when performing DMA through the root complex. As far
> | as peer-to-peer I would say we should probably look at enabling the
> | ability to have Relaxed Ordering enabled for some channels but not
> | others. In those cases the hardware needs to be smart enough to allow
> | for you to indicate you want it disabled by default for most of your
> | DMA channels, and then enabled for the select channels that are
> | handling the peer-to-peer traffic.
>
>   Yes, I think that we are mostly in agreement.  I had just wanted to make
> sure that whatever scheme was developed would allow for simultaneously
> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
> Ordering for others within the same system.  I.e. not simply
> enabling/disabling/etc.  based solely on System Platform Architecture.
>
>   By the way, I've started our QA folks off looking at what things look 
> like
> in Linux Virtual Machines under different Hypervisors to see what
> information they may provide to the VM in the way of what Root Complex 
> Port
> is being used, etc.  So far they've got Windows HyperV done and there
> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
> have to see what pci_find_pcie_root_port() returns in that environment.
> Maybe NULL?

 I believe NULL is one of the options. It all depends on what qemu is
 emulating. Most likely you won't find a pcie root port on KVM because
 the default is to emulate an older system that only supports PCI.

>   With your reservations (which I also share), I think that it probably
> makes sense to have a per-architecture definition of the "Can I Use 
> Relaxed
> Ordering With TLPs Directed At This End Point" predicate, with the default
> being "No" for any architecture which doesn't implement the predicate.  
> And
> if the specified (struct pci_dev *) End Node is NULL, it ought to return
> False for that as well.  I can't see any reason to pass in the Source End
> Node but I may be missing something.
>
>   At this point, this is pretty far outside my level of expertise.  I'm
> happy to give it a go, but I'd be even happier if someone with a lot more
> experience in the PCIe Infrastructure were to want to carry the ball
> forward.  I'm not super familiar with the Linux Kernel "Rules Of
> Engagement", so let me know what my next step should be.  Thanks.
>
> Casey

 For now we can probably keep this on the linux-pci mailing list. Going
 that route is the most straight forward for now since step one is
 probably just making sure we are setting the relaxed ordering bit in
 the setups that make sense. I would say we could probably keep it
 simple. We just need to enable relaxed ordering by default for SPARC
 architectures, on most others we can probably default it to off.

>>>
>>> Casey, Alexander:
>>>
>>> Thanks for the wonderful discussion, it is more clearly that what to do 
>>> next,
>>> I agree that enable relaxed ordering by default only for SPARC and ARM64
>>> is more safe for all the other platform, as no one want to break anything.
>>>
 I believe this all had started as Ding Tianhong was hoping to enable
 this for the ARM architecture. That is the only one I can think of
 where it might be difficult to figure out which way to default as we
 were attempting to follow the same code that was enabled for SPARC and
 that is what started this tug-of-war about how this should be done.
 What we might do is take care of this in two phases. The first one
 enables the infrastructure generically but leaves it defaulted to off
 for everyone but SPARC. Then we can go through and start enabling it
 for other platforms such as some of those on ARM in the platforms that
 Ding Tianhong was working with.

>>>
>>> According the suggestion, I could only think of this code:
>>>
>>> @@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>>>  DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
>>> 

Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-08 Thread Ding Tianhong


On 2017/5/7 2:07, Alexander Duyck wrote:
> On Fri, May 5, 2017 at 8:08 PM, Ding Tianhong  wrote:
>>
>>
>> On 2017/5/5 22:04, Alexander Duyck wrote:
>>> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom  wrote:
 | From: Alexander Duyck 
 | Sent: Wednesday, May 3, 2017 9:02 AM
 | ...
 | It sounds like we are more or less in agreement. My only concern is
 | really what we default this to. On x86 I would say we could probably
 | default this to disabled for existing platforms since my understanding
 | is that relaxed ordering doesn't provide much benefit on what is out
 | there right now when performing DMA through the root complex. As far
 | as peer-to-peer I would say we should probably look at enabling the
 | ability to have Relaxed Ordering enabled for some channels but not
 | others. In those cases the hardware needs to be smart enough to allow
 | for you to indicate you want it disabled by default for most of your
 | DMA channels, and then enabled for the select channels that are
 | handling the peer-to-peer traffic.

   Yes, I think that we are mostly in agreement.  I had just wanted to make
 sure that whatever scheme was developed would allow for simultaneously
 supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
 Ordering for others within the same system.  I.e. not simply
 enabling/disabling/etc.  based solely on System Platform Architecture.

   By the way, I've started our QA folks off looking at what things look 
 like
 in Linux Virtual Machines under different Hypervisors to see what
 information they may provide to the VM in the way of what Root Complex Port
 is being used, etc.  So far they've got Windows HyperV done and there
 there's no PCIe Fabric exposed in any way: just the attached device.  I'll
 have to see what pci_find_pcie_root_port() returns in that environment.
 Maybe NULL?
>>>
>>> I believe NULL is one of the options. It all depends on what qemu is
>>> emulating. Most likely you won't find a pcie root port on KVM because
>>> the default is to emulate an older system that only supports PCI.
>>>
   With your reservations (which I also share), I think that it probably
 makes sense to have a per-architecture definition of the "Can I Use Relaxed
 Ordering With TLPs Directed At This End Point" predicate, with the default
 being "No" for any architecture which doesn't implement the predicate.  And
 if the specified (struct pci_dev *) End Node is NULL, it ought to return
 False for that as well.  I can't see any reason to pass in the Source End
 Node but I may be missing something.

   At this point, this is pretty far outside my level of expertise.  I'm
 happy to give it a go, but I'd be even happier if someone with a lot more
 experience in the PCIe Infrastructure were to want to carry the ball
 forward.  I'm not super familiar with the Linux Kernel "Rules Of
 Engagement", so let me know what my next step should be.  Thanks.

 Casey
>>>
>>> For now we can probably keep this on the linux-pci mailing list. Going
>>> that route is the most straight forward for now since step one is
>>> probably just making sure we are setting the relaxed ordering bit in
>>> the setups that make sense. I would say we could probably keep it
>>> simple. We just need to enable relaxed ordering by default for SPARC
>>> architectures, on most others we can probably default it to off.
>>>
>>
>> Casey, Alexander:
>>
>> Thanks for the wonderful discussion, it is more clearly that what to do next,
>> I agree that enable relaxed ordering by default only for SPARC and ARM64
>> is more safe for all the other platform, as no one want to break anything.
>>
>>> I believe this all had started as Ding Tianhong was hoping to enable
>>> this for the ARM architecture. That is the only one I can think of
>>> where it might be difficult to figure out which way to default as we
>>> were attempting to follow the same code that was enabled for SPARC and
>>> that is what started this tug-of-war about how this should be done.
>>> What we might do is take care of this in two phases. The first one
>>> enables the infrastructure generically but leaves it defaulted to off
>>> for everyone but SPARC. Then we can go through and start enabling it
>>> for other platforms such as some of those on ARM in the platforms that
>>> Ding Tianhong was working with.
>>>
>>
>> According the suggestion, I could only think of this code:
>>
>> @@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>>  DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
>>   quirk_tw686x_class);
>>
>> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
>> +{
>> + if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
>> + dev->vendor != 

Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-06 Thread Alexander Duyck
On Fri, May 5, 2017 at 8:08 PM, Ding Tianhong  wrote:
>
>
> On 2017/5/5 22:04, Alexander Duyck wrote:
>> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom  wrote:
>>> | From: Alexander Duyck 
>>> | Sent: Wednesday, May 3, 2017 9:02 AM
>>> | ...
>>> | It sounds like we are more or less in agreement. My only concern is
>>> | really what we default this to. On x86 I would say we could probably
>>> | default this to disabled for existing platforms since my understanding
>>> | is that relaxed ordering doesn't provide much benefit on what is out
>>> | there right now when performing DMA through the root complex. As far
>>> | as peer-to-peer I would say we should probably look at enabling the
>>> | ability to have Relaxed Ordering enabled for some channels but not
>>> | others. In those cases the hardware needs to be smart enough to allow
>>> | for you to indicate you want it disabled by default for most of your
>>> | DMA channels, and then enabled for the select channels that are
>>> | handling the peer-to-peer traffic.
>>>
>>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>>> sure that whatever scheme was developed would allow for simultaneously
>>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>>> Ordering for others within the same system.  I.e. not simply
>>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>>
>>>   By the way, I've started our QA folks off looking at what things look like
>>> in Linux Virtual Machines under different Hypervisors to see what
>>> information they may provide to the VM in the way of what Root Complex Port
>>> is being used, etc.  So far they've got Windows HyperV done and there
>>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>>> have to see what pci_find_pcie_root_port() returns in that environment.
>>> Maybe NULL?
>>
>> I believe NULL is one of the options. It all depends on what qemu is
>> emulating. Most likely you won't find a pcie root port on KVM because
>> the default is to emulate an older system that only supports PCI.
>>
>>>   With your reservations (which I also share), I think that it probably
>>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>>> Ordering With TLPs Directed At This End Point" predicate, with the default
>>> being "No" for any architecture which doesn't implement the predicate.  And
>>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>>> False for that as well.  I can't see any reason to pass in the Source End
>>> Node but I may be missing something.
>>>
>>>   At this point, this is pretty far outside my level of expertise.  I'm
>>> happy to give it a go, but I'd be even happier if someone with a lot more
>>> experience in the PCIe Infrastructure were to want to carry the ball
>>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>>> Engagement", so let me know what my next step should be.  Thanks.
>>>
>>> Casey
>>
>> For now we can probably keep this on the linux-pci mailing list. Going
>> that route is the most straight forward for now since step one is
>> probably just making sure we are setting the relaxed ordering bit in
>> the setups that make sense. I would say we could probably keep it
>> simple. We just need to enable relaxed ordering by default for SPARC
>> architectures, on most others we can probably default it to off.
>>
>
> Casey, Alexander:
>
> Thanks for the wonderful discussion, it is more clearly that what to do next,
> I agree that enable relaxed ordering by default only for SPARC and ARM64
> is more safe for all the other platform, as no one want to break anything.
>
>> I believe this all had started as Ding Tianhong was hoping to enable
>> this for the ARM architecture. That is the only one I can think of
>> where it might be difficult to figure out which way to default as we
>> were attempting to follow the same code that was enabled for SPARC and
>> that is what started this tug-of-war about how this should be done.
>> What we might do is take care of this in two phases. The first one
>> enables the infrastructure generically but leaves it defaulted to off
>> for everyone but SPARC. Then we can go through and start enabling it
>> for other platforms such as some of those on ARM in the platforms that
>> Ding Tianhong was working with.
>>
>
> According the suggestion, I could only think of this code:
>
> @@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
>   quirk_tw686x_class);
>
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> + if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
> + dev->vendor != PCI_VENDOR_ID_SUN)
> + dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, 

Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-05 Thread Ding Tianhong


On 2017/5/5 22:04, Alexander Duyck wrote:
> On Thu, May 4, 2017 at 2:01 PM, Casey Leedom  wrote:
>> | From: Alexander Duyck 
>> | Sent: Wednesday, May 3, 2017 9:02 AM
>> | ...
>> | It sounds like we are more or less in agreement. My only concern is
>> | really what we default this to. On x86 I would say we could probably
>> | default this to disabled for existing platforms since my understanding
>> | is that relaxed ordering doesn't provide much benefit on what is out
>> | there right now when performing DMA through the root complex. As far
>> | as peer-to-peer I would say we should probably look at enabling the
>> | ability to have Relaxed Ordering enabled for some channels but not
>> | others. In those cases the hardware needs to be smart enough to allow
>> | for you to indicate you want it disabled by default for most of your
>> | DMA channels, and then enabled for the select channels that are
>> | handling the peer-to-peer traffic.
>>
>>   Yes, I think that we are mostly in agreement.  I had just wanted to make
>> sure that whatever scheme was developed would allow for simultaneously
>> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
>> Ordering for others within the same system.  I.e. not simply
>> enabling/disabling/etc.  based solely on System Platform Architecture.
>>
>>   By the way, I've started our QA folks off looking at what things look like
>> in Linux Virtual Machines under different Hypervisors to see what
>> information they may provide to the VM in the way of what Root Complex Port
>> is being used, etc.  So far they've got Windows HyperV done and there
>> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
>> have to see what pci_find_pcie_root_port() returns in that environment.
>> Maybe NULL?
> 
> I believe NULL is one of the options. It all depends on what qemu is
> emulating. Most likely you won't find a pcie root port on KVM because
> the default is to emulate an older system that only supports PCI.
> 
>>   With your reservations (which I also share), I think that it probably
>> makes sense to have a per-architecture definition of the "Can I Use Relaxed
>> Ordering With TLPs Directed At This End Point" predicate, with the default
>> being "No" for any architecture which doesn't implement the predicate.  And
>> if the specified (struct pci_dev *) End Node is NULL, it ought to return
>> False for that as well.  I can't see any reason to pass in the Source End
>> Node but I may be missing something.
>>
>>   At this point, this is pretty far outside my level of expertise.  I'm
>> happy to give it a go, but I'd be even happier if someone with a lot more
>> experience in the PCIe Infrastructure were to want to carry the ball
>> forward.  I'm not super familiar with the Linux Kernel "Rules Of
>> Engagement", so let me know what my next step should be.  Thanks.
>>
>> Casey
> 
> For now we can probably keep this on the linux-pci mailing list. Going
> that route is the most straight forward for now since step one is
> probably just making sure we are setting the relaxed ordering bit in
> the setups that make sense. I would say we could probably keep it
> simple. We just need to enable relaxed ordering by default for SPARC
> architectures, on most others we can probably default it to off.
> 

Casey, Alexander:

Thanks for the wonderful discussion, it is more clearly that what to do next,
I agree that enable relaxed ordering by default only for SPARC and ARM64
is more safe for all the other platform, as no one want to break anything.

> I believe this all had started as Ding Tianhong was hoping to enable
> this for the ARM architecture. That is the only one I can think of
> where it might be difficult to figure out which way to default as we
> were attempting to follow the same code that was enabled for SPARC and
> that is what started this tug-of-war about how this should be done.
> What we might do is take care of this in two phases. The first one
> enables the infrastructure generically but leaves it defaulted to off
> for everyone but SPARC. Then we can go through and start enabling it
> for other platforms such as some of those on ARM in the platforms that
> Ding Tianhong was working with.
> 

According the suggestion, I could only think of this code:

@@ -3979,6 +3979,15 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_CLASS_EARLY(0x1797, 0x6869, PCI_CLASS_NOT_DEFINED, 8,
  quirk_tw686x_class);

+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+ if (dev->vendor != PCI_VENDOR_ID_HUAWEI &&
+ dev->vendor != PCI_VENDOR_ID_SUN)
+ dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_INTEL_ID, PCI_ANY_ID, PCI_CLASS_NOT_DEFINED, 
8,
+   quirk_relaxedordering_disable);
+
 /*
  * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
  * values for the 

Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-05 Thread Alexander Duyck
On Thu, May 4, 2017 at 2:01 PM, Casey Leedom  wrote:
> | From: Alexander Duyck 
> | Sent: Wednesday, May 3, 2017 9:02 AM
> | ...
> | It sounds like we are more or less in agreement. My only concern is
> | really what we default this to. On x86 I would say we could probably
> | default this to disabled for existing platforms since my understanding
> | is that relaxed ordering doesn't provide much benefit on what is out
> | there right now when performing DMA through the root complex. As far
> | as peer-to-peer I would say we should probably look at enabling the
> | ability to have Relaxed Ordering enabled for some channels but not
> | others. In those cases the hardware needs to be smart enough to allow
> | for you to indicate you want it disabled by default for most of your
> | DMA channels, and then enabled for the select channels that are
> | handling the peer-to-peer traffic.
>
>   Yes, I think that we are mostly in agreement.  I had just wanted to make
> sure that whatever scheme was developed would allow for simultaneously
> supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
> Ordering for others within the same system.  I.e. not simply
> enabling/disabling/etc.  based solely on System Platform Architecture.
>
>   By the way, I've started our QA folks off looking at what things look like
> in Linux Virtual Machines under different Hypervisors to see what
> information they may provide to the VM in the way of what Root Complex Port
> is being used, etc.  So far they've got Windows HyperV done and there
> there's no PCIe Fabric exposed in any way: just the attached device.  I'll
> have to see what pci_find_pcie_root_port() returns in that environment.
> Maybe NULL?

I believe NULL is one of the options. It all depends on what qemu is
emulating. Most likely you won't find a pcie root port on KVM because
the default is to emulate an older system that only supports PCI.

>   With your reservations (which I also share), I think that it probably
> makes sense to have a per-architecture definition of the "Can I Use Relaxed
> Ordering With TLPs Directed At This End Point" predicate, with the default
> being "No" for any architecture which doesn't implement the predicate.  And
> if the specified (struct pci_dev *) End Node is NULL, it ought to return
> False for that as well.  I can't see any reason to pass in the Source End
> Node but I may be missing something.
>
>   At this point, this is pretty far outside my level of expertise.  I'm
> happy to give it a go, but I'd be even happier if someone with a lot more
> experience in the PCIe Infrastructure were to want to carry the ball
> forward.  I'm not super familiar with the Linux Kernel "Rules Of
> Engagement", so let me know what my next step should be.  Thanks.
>
> Casey

For now we can probably keep this on the linux-pci mailing list. Going
that route is the most straight forward for now since step one is
probably just making sure we are setting the relaxed ordering bit in
the setups that make sense. I would say we could probably keep it
simple. We just need to enable relaxed ordering by default for SPARC
architectures, on most others we can probably default it to off.

I believe this all had started as Ding Tianhong was hoping to enable
this for the ARM architecture. That is the only one I can think of
where it might be difficult to figure out which way to default as we
were attempting to follow the same code that was enabled for SPARC and
that is what started this tug-of-war about how this should be done.
What we might do is take care of this in two phases. The first one
enables the infrastructure generically but leaves it defaulted to off
for everyone but SPARC. Then we can go through and start enabling it
for other platforms such as some of those on ARM in the platforms that
Ding Tianhong was working with.

- Alex


Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-04 Thread Casey Leedom
| From: Alexander Duyck 
| Sent: Wednesday, May 3, 2017 9:02 AM
| ...
| It sounds like we are more or less in agreement. My only concern is
| really what we default this to. On x86 I would say we could probably
| default this to disabled for existing platforms since my understanding
| is that relaxed ordering doesn't provide much benefit on what is out
| there right now when performing DMA through the root complex. As far
| as peer-to-peer I would say we should probably look at enabling the
| ability to have Relaxed Ordering enabled for some channels but not
| others. In those cases the hardware needs to be smart enough to allow
| for you to indicate you want it disabled by default for most of your
| DMA channels, and then enabled for the select channels that are
| handling the peer-to-peer traffic.

  Yes, I think that we are mostly in agreement.  I had just wanted to make
sure that whatever scheme was developed would allow for simultaneously
supporting non-Relaxed Ordering for some PCIe End Points and Relaxed
Ordering for others within the same system.  I.e. not simply
enabling/disabling/etc.  based solely on System Platform Architecture.

  By the way, I've started our QA folks off looking at what things look like
in Linux Virtual Machines under different Hypervisors to see what
information they may provide to the VM in the way of what Root Complex Port
is being used, etc.  So far they've got Windows HyperV done and there
there's no PCIe Fabric exposed in any way: just the attached device.  I'll
have to see what pci_find_pcie_root_port() returns in that environment.
Maybe NULL?

  With your reservations (which I also share), I think that it probably
makes sense to have a per-architecture definition of the "Can I Use Relaxed
Ordering With TLPs Directed At This End Point" predicate, with the default
being "No" for any architecture which doesn't implement the predicate.  And
if the specified (struct pci_dev *) End Node is NULL, it ought to return
False for that as well.  I can't see any reason to pass in the Source End
Node but I may be missing something.

  At this point, this is pretty far outside my level of expertise.  I'm
happy to give it a go, but I'd be even happier if someone with a lot more
experience in the PCIe Infrastructure were to want to carry the ball
forward.  I'm not super familiar with the Linux Kernel "Rules Of
Engagement", so let me know what my next step should be.  Thanks.

Casey


Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-03 Thread Alexander Duyck
On Tue, May 2, 2017 at 9:30 PM, Casey Leedom  wrote:
> | From: Alexander Duyck 
> | Date: Tuesday, May 2, 2017 11:10 AM
> | ...
> | So for example, in the case of x86 it seems like there are multiple
> | root complexes that have issues, and the gains for enabling it with
> | standard DMA to host memory are small. As such we may want to default
> | it to off via the architecture specific PCIe code and then look at
> | having "white-list" cases where we enable it for things like
> | peer-to-peer accesses. In the case of SPARC we could look at
> | defaulting it to on, and only "black-list" any cases where there might
> | be issues since SPARC relies on this in a significant way for
> | performance. In the case of ARM and other architectures it is a bit of
> | a toss-up. I would say we could just default it to on for now and
> | "black-list" anything that doesn't work, or we could go the other way
> | like I suggested for x86. It all depends on what the ARM community
> | would want to agree on for this. I would say unless it makes a
> | significant difference like it does in the case of SPARC we are
> | probably better off just defaulting it to off.
>
>   Sorry, I forgot to respond to this earlier when someone was rushing me out
> to a customer dinner.
>
>   I'm unaware of any other x86 Root Complex Port that has a problem with
> Relaxed Ordering other than the performance issue with the current Intel
> implementation.  Ashok tells me that Intel is in the final stages of putting
> together a technical notice on this issue but I don't know when that will
> come out.  Hopefully that will shed much more light on the cause and actual
> use of Relaxed Ordering when directed to Coherent Memory on current and past
> Intel platforms.  (Note that the performance bug seems to limit us to
> ~75-85Gb/s DMA Write speed to Coherent Host Memory.)

So my concern isn't so much about existing issues as it is about where
is the advantage in enabling it. We have had support in the Intel
hardware for enabling relaxed ordering for about 10 years. In all that
time I have yet to see an x86 platform that sees any real benefit from
enabling it for standard DMA. That is why my preference would be to
leave it disabled by default on x86 and we white list it in at some
point when hardware shows that there is a benefit to be had for
enabling it.

>   The only other Device that I currently know of which has issues with
> Relaxed Ordering TLPs directed at it, is also a Root Complex Port on the new
> AMD A1100 ARM SoC ("SEATTLE").  There we have an actual bug which could lead
> to Data Corruption.
>
>   But I don't know anything about other x86 Root Complex Ports having issues
> with this -- we've been using it ever since commit ef306b50b from December
> 2010.

So the question I would have for you then, is what benefits have you
seen from enabling it on x86? In our case we haven't seen any for
transactions that go through the root complex. If you are seeing
benefits would I be correct in assuming it is for your peer-to-peer
case or were there some x86 platforms that showed gains?

>   Also, I'm not aware of any performance data which has been gathered on the
> use of Relaxed Ordering when directed at Host Memory.  From your note, it
> sounds like it's important on SPARC architectures.  But it could conceivably
> be important on any architecture or Root Complex/Memory Controller
> implementation.  We use it to direct Ingress Packet Data to Free List
> Buffers, followed by a TLP without Relaxed Ordering directed at a Host
> Memory Message Queue.  The idea here is to give the Root Complex options on
> which DMA Memory Write TLPs to process in order to optimize data placement
> in memory.  And with the next Ethernet speed bump to 200Gb/s this may be
> even more important.

The operative term here is "may be". That is my concern. We are
leaving this enabled by default and there are known hardware that have
issues that can be pretty serious. I'm not saying we have to disable
it and keep it disabled, but I would like to see us intelligently
enable this feature so that it is enabled on the platforms that show
benefit and disabled on the ones that don't.

My biggest concern with all this is introducing regressions as drivers
like igb and ixgbe are used on a wide range of platforms beyond even
what is covered by x86 and I would prefer not to suddenly have a
deluge of bugs to sort out triggered by us enabling relaxed ordering
on platforms that historically have not had it enabled on them.

>   Basically, I'd hate to come up with a solution where we write off the use
> of Relaxed Ordering for DMA Writes to Host Memory.  I don't think you're
> suggesting that, but there are a number of x86 Root Complex implementations
> out there -- and some like the new AMD Ryzen have yet to be tested -- as
> well as other architectures.

I'm not saying that we have to write off the use of Relaxed Ordering,
what I am saying 

Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-02 Thread Casey Leedom
| From: Alexander Duyck 
| Date: Tuesday, May 2, 2017 11:10 AM
| ...
| So for example, in the case of x86 it seems like there are multiple
| root complexes that have issues, and the gains for enabling it with
| standard DMA to host memory are small. As such we may want to default
| it to off via the architecture specific PCIe code and then look at
| having "white-list" cases where we enable it for things like
| peer-to-peer accesses. In the case of SPARC we could look at
| defaulting it to on, and only "black-list" any cases where there might
| be issues since SPARC relies on this in a significant way for
| performance. In the case of ARM and other architectures it is a bit of
| a toss-up. I would say we could just default it to on for now and
| "black-list" anything that doesn't work, or we could go the other way
| like I suggested for x86. It all depends on what the ARM community
| would want to agree on for this. I would say unless it makes a
| significant difference like it does in the case of SPARC we are
| probably better off just defaulting it to off.

  Sorry, I forgot to respond to this earlier when someone was rushing me out
to a customer dinner.

  I'm unaware of any other x86 Root Complex Port that has a problem with
Relaxed Ordering other than the performance issue with the current Intel
implementation.  Ashok tells me that Intel is in the final stages of putting
together a technical notice on this issue but I don't know when that will
come out.  Hopefully that will shed much more light on the cause and actual
use of Relaxed Ordering when directed to Coherent Memory on current and past
Intel platforms.  (Note that the performance bug seems to limit us to
~75-85Gb/s DMA Write speed to Coherent Host Memory.)

  The only other Device that I currently know of which has issues with
Relaxed Ordering TLPs directed at it, is also a Root Complex Port on the new
AMD A1100 ARM SoC ("SEATTLE").  There we have an actual bug which could lead
to Data Corruption.

  But I don't know anything about other x86 Root Complex Ports having issues
with this -- we've been using it ever since commit ef306b50b from December
2010.

  Also, I'm not aware of any performance data which has been gathered on the
use of Relaxed Ordering when directed at Host Memory.  From your note, it
sounds like it's important on SPARC architectures.  But it could conceivably
be important on any architecture or Root Complex/Memory Controller
implementation.  We use it to direct Ingress Packet Data to Free List
Buffers, followed by a TLP without Relaxed Ordering directed at a Host
Memory Message Queue.  The idea here is to give the Root Complex options on
which DMA Memory Write TLPs to process in order to optimize data placement
in memory.  And with the next Ethernet speed bump to 200Gb/s this may be
even more important.

  Basically, I'd hate to come up with a solution where we write off the use
of Relaxed Ordering for DMA Writes to Host Memory.  I don't think you're
suggesting that, but there are a number of x86 Root Complex implementations
out there -- and some like the new AMD Ryzen have yet to be tested -- as
well as other architectures.

  And, as Ashok and I have both nothed, any solution we come up with needs
to cope with a heterogeneous situation where, on the same PCIe Fabric, it
may be necessesary/desireable to support Relaxed Ordering TLPs directed at
some End Nodes but not others.

Casey


Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-02 Thread Alexander Duyck
On Tue, May 2, 2017 at 12:34 PM, Raj, Ashok  wrote:
> On Tue, May 02, 2017 at 11:10:22AM -0700, Alexander Duyck wrote:
>> On Tue, May 2, 2017 at 9:53 AM, Raj, Ashok  wrote:
>> > On Tue, May 02, 2017 at 09:39:34AM -0700, Alexander Duyck wrote:
>> >> On Mon, May 1, 2017 at 4:13 PM, Casey Leedom  wrote:
>> >> > The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the 
>> >> > Relaxed
>> >> > Ordering Attribute should not be used on Transaction Layer Packets 
>> >> > destined
>> >> > for the PCIe End Node so flagged.  Initially flagged this way are Intel
>> >> > E5-26xx Root Complex Ports which suffer from a Flow Control Credit
>> >> > Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports 
>> >> > which
>> >> > don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
>> >>
>> >> So this is a good first step though might I suggest one other change.
>> >>
>> >> We may want to add logic to the core PCIe code that clears the "Enable
>> >> Relaxed Ordering" bit in the device control register for all devices
>> >> hanging off of this root complex. Assuming the devices conform to the
>> >> PCIe spec doing that should disable relaxed ordering in a device
>> >> agnostic way that then enables us at a driver level to just enable the
>> >> feature always without having to perform any checks for your flag. We
>> >> could probably do that as a part of probing the PCIe interfaces
>> >> hanging off of these devices.
>> >
>> > I suppose you don't want to turn off RO completely on the device. When
>> > traffic is targetted to mmio for peer to peer reasons RO has performance
>> > upside. The specific issue with these root ports indicate limitation using
>> > RO for traffic targetting coherent memory.
>>
>> Actually my main concern here is virtualization. If I take the PCIe
>> function and direct assign it I have no way of seeing the root complex
>> flag as it is now virtualized away. In the meantime the guest now has
>> the ability to enable the function and sees nothing that says you
>> can't enable relaxed ordering which in turn ends up potentially
>> causing data corruption on the system. I want relaxed ordering
>> disabled before I even consider assigning it to the guest on the
>> systems where this would be an issue.
>>
>> I prefer to err on the side of caution with this. Enabling Relaxed
>> Ordering is technically a performance enhancement, so we function but
>> not as well as we would like, while having it enabled when there are
>> issues can lead to data corruption. I would weigh the risk of data
>> corruption the thing to be avoided and of much higher priority than
>> enabling improved performance. As such I say we should default the
>> relaxed ordering attribute to off in general and look at
>> "white-listing" it in for various architectures and/or chipsets that
>> support/need it rather than having it enabled by default and trying to
>> switch it off after the fact when we find some new issue.
>
> I agree, after thinking about it a bit more.. even for transactions going to
> p2p, i'm just reading the pcie spec and some sections aren't super clear
> about completion redirect and ACS rules for p2p.
>
> Also it appears the device control default value is 1 for enabling
> Relaxed Ordering. This means we should probably save these states across
> resets/FLR for e.g. To ensure perf isn't affected after a FLR.

Right. That should happen automatically with the PCIe configuration
being saved/restored.

- Alex


Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-02 Thread Raj, Ashok
On Tue, May 02, 2017 at 11:10:22AM -0700, Alexander Duyck wrote:
> On Tue, May 2, 2017 at 9:53 AM, Raj, Ashok  wrote:
> > On Tue, May 02, 2017 at 09:39:34AM -0700, Alexander Duyck wrote:
> >> On Mon, May 1, 2017 at 4:13 PM, Casey Leedom  wrote:
> >> > The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> >> > Ordering Attribute should not be used on Transaction Layer Packets 
> >> > destined
> >> > for the PCIe End Node so flagged.  Initially flagged this way are Intel
> >> > E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> >> > Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports 
> >> > which
> >> > don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> >>
> >> So this is a good first step though might I suggest one other change.
> >>
> >> We may want to add logic to the core PCIe code that clears the "Enable
> >> Relaxed Ordering" bit in the device control register for all devices
> >> hanging off of this root complex. Assuming the devices conform to the
> >> PCIe spec doing that should disable relaxed ordering in a device
> >> agnostic way that then enables us at a driver level to just enable the
> >> feature always without having to perform any checks for your flag. We
> >> could probably do that as a part of probing the PCIe interfaces
> >> hanging off of these devices.
> >
> > I suppose you don't want to turn off RO completely on the device. When
> > traffic is targetted to mmio for peer to peer reasons RO has performance
> > upside. The specific issue with these root ports indicate limitation using
> > RO for traffic targetting coherent memory.
> 
> Actually my main concern here is virtualization. If I take the PCIe
> function and direct assign it I have no way of seeing the root complex
> flag as it is now virtualized away. In the meantime the guest now has
> the ability to enable the function and sees nothing that says you
> can't enable relaxed ordering which in turn ends up potentially
> causing data corruption on the system. I want relaxed ordering
> disabled before I even consider assigning it to the guest on the
> systems where this would be an issue.
> 
> I prefer to err on the side of caution with this. Enabling Relaxed
> Ordering is technically a performance enhancement, so we function but
> not as well as we would like, while having it enabled when there are
> issues can lead to data corruption. I would weigh the risk of data
> corruption the thing to be avoided and of much higher priority than
> enabling improved performance. As such I say we should default the
> relaxed ordering attribute to off in general and look at
> "white-listing" it in for various architectures and/or chipsets that
> support/need it rather than having it enabled by default and trying to
> switch it off after the fact when we find some new issue.

I agree, after thinking about it a bit more.. even for transactions going to
p2p, i'm just reading the pcie spec and some sections aren't super clear
about completion redirect and ACS rules for p2p.

Also it appears the device control default value is 1 for enabling
Relaxed Ordering. This means we should probably save these states across
resets/FLR for e.g. To ensure perf isn't affected after a FLR.
> 
> So for example, in the case of x86 it seems like there are multiple
> root complexes that have issues, and the gains for enabling it with
> standard DMA to host memory are small. As such we may want to default
> it to off via the architecture specific PCIe code and then look at
> having "white-list" cases where we enable it for things like
> peer-to-peer accesses. In the case of SPARC we could look at
> defaulting it to on, and only "black-list" any cases where there might
> be issues since SPARC relies on this in a significant way for
> performance. In the case of ARM and other architectures it is a bit of
> a toss-up. I would say we could just default it to on for now and
> "black-list" anything that doesn't work, or we could go the other way
> like I suggested for x86. It all depends on what the ARM community
> would want to agree on for this. I would say unless it makes a
> significant difference like it does in the case of SPARC we are
> probably better off just defaulting it to off.
> 
> - Alex


Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-02 Thread Alexander Duyck
On Tue, May 2, 2017 at 9:53 AM, Raj, Ashok  wrote:
> On Tue, May 02, 2017 at 09:39:34AM -0700, Alexander Duyck wrote:
>> On Mon, May 1, 2017 at 4:13 PM, Casey Leedom  wrote:
>> > The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
>> > Ordering Attribute should not be used on Transaction Layer Packets destined
>> > for the PCIe End Node so flagged.  Initially flagged this way are Intel
>> > E5-26xx Root Complex Ports which suffer from a Flow Control Credit
>> > Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
>> > don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
>>
>> So this is a good first step though might I suggest one other change.
>>
>> We may want to add logic to the core PCIe code that clears the "Enable
>> Relaxed Ordering" bit in the device control register for all devices
>> hanging off of this root complex. Assuming the devices conform to the
>> PCIe spec doing that should disable relaxed ordering in a device
>> agnostic way that then enables us at a driver level to just enable the
>> feature always without having to perform any checks for your flag. We
>> could probably do that as a part of probing the PCIe interfaces
>> hanging off of these devices.
>
> I suppose you don't want to turn off RO completely on the device. When
> traffic is targetted to mmio for peer to peer reasons RO has performance
> upside. The specific issue with these root ports indicate limitation using
> RO for traffic targetting coherent memory.

Actually my main concern here is virtualization. If I take the PCIe
function and direct assign it I have no way of seeing the root complex
flag as it is now virtualized away. In the meantime the guest now has
the ability to enable the function and sees nothing that says you
can't enable relaxed ordering which in turn ends up potentially
causing data corruption on the system. I want relaxed ordering
disabled before I even consider assigning it to the guest on the
systems where this would be an issue.

I prefer to err on the side of caution with this. Enabling Relaxed
Ordering is technically a performance enhancement, so we function but
not as well as we would like, while having it enabled when there are
issues can lead to data corruption. I would weigh the risk of data
corruption the thing to be avoided and of much higher priority than
enabling improved performance. As such I say we should default the
relaxed ordering attribute to off in general and look at
"white-listing" it in for various architectures and/or chipsets that
support/need it rather than having it enabled by default and trying to
switch it off after the fact when we find some new issue.

So for example, in the case of x86 it seems like there are multiple
root complexes that have issues, and the gains for enabling it with
standard DMA to host memory are small. As such we may want to default
it to off via the architecture specific PCIe code and then look at
having "white-list" cases where we enable it for things like
peer-to-peer accesses. In the case of SPARC we could look at
defaulting it to on, and only "black-list" any cases where there might
be issues since SPARC relies on this in a significant way for
performance. In the case of ARM and other architectures it is a bit of
a toss-up. I would say we could just default it to on for now and
"black-list" anything that doesn't work, or we could go the other way
like I suggested for x86. It all depends on what the ARM community
would want to agree on for this. I would say unless it makes a
significant difference like it does in the case of SPARC we are
probably better off just defaulting it to off.

- Alex


Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-02 Thread Raj, Ashok
On Tue, May 02, 2017 at 09:39:34AM -0700, Alexander Duyck wrote:
> On Mon, May 1, 2017 at 4:13 PM, Casey Leedom  wrote:
> > The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> > Ordering Attribute should not be used on Transaction Layer Packets destined
> > for the PCIe End Node so flagged.  Initially flagged this way are Intel
> > E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> > Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> > don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> 
> So this is a good first step though might I suggest one other change.
> 
> We may want to add logic to the core PCIe code that clears the "Enable
> Relaxed Ordering" bit in the device control register for all devices
> hanging off of this root complex. Assuming the devices conform to the
> PCIe spec doing that should disable relaxed ordering in a device
> agnostic way that then enables us at a driver level to just enable the
> feature always without having to perform any checks for your flag. We
> could probably do that as a part of probing the PCIe interfaces
> hanging off of these devices.

I suppose you don't want to turn off RO completely on the device. When
traffic is targetted to mmio for peer to peer reasons RO has performance
upside. The specific issue with these root ports indicate limitation using 
RO for traffic targetting coherent memory.

> 
> > ---
> >  drivers/pci/quirks.c | 38 ++
> >  include/linux/pci.h  |  2 ++
> >  2 files changed, 40 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index f754453..4ae78b3 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
> >   quirk_tw686x_class);
> >
> >  /*
> > + * Some devices have problems with Transaction Layer Packets with the 
> > Relaxed
> > + * Ordering Attribute set.  Such devices should mark themselves and other
> > + * Device Drivers should check before sending TLPs with RO set.
> > + */
> > +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> > +{
> > +   dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> > +}
> > +
> > +/*
> > + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> > + * cause performance problems with Upstream Transaction Layer Packets with
> > + * Relaxed Ordering set.
> > + */
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> > + quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> > + quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> > + quirk_relaxedordering_disable);
> > +
> > +/*
> > + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
> > + * where Upstream Transaction Layer Packets with the Relaxed Ordering
> > + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
> > + * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
> > + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 
> > 3.0
> > + * November 10, 2010).  As a result, on this platform we can't use Relaxed
> > + * Ordering for Upstream TLPs.
> > + */
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
> > + quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
> > + quirk_relaxedordering_disable);
> > +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
> > + quirk_relaxedordering_disable);
> > +
> > +/*
> >   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
> >   * values for the Attribute as were supplied in the header of the
> >   * corresponding Request, except as explicitly allowed when IDO is used."
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index eb3da1a..6764f66 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -178,6 +178,8 @@ enum pci_dev_flags {
> > PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
> > /* Get VPD from function 0 VPD */
> > PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> > +   /* Don't use Relaxed Ordering for TLPs directed at this device */
> > +   PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 
> > 9),
> >  };
> >
> >  enum pci_irq_reroute_variant {
> > --
> > 1.9.1
> >


Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-02 Thread Raj, Ashok
Hi Casey


On Mon, May 01, 2017 at 04:13:50PM -0700, Casey Leedom wrote:
> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> Ordering Attribute should not be used on Transaction Layer Packets destined
> for the PCIe End Node so flagged.  Initially flagged this way are Intel
> E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> ---
>  drivers/pci/quirks.c | 38 ++
>  include/linux/pci.h  |  2 ++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f754453..4ae78b3 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
> quirk_tw686x_class);
>  
>  /*
> + * Some devices have problems with Transaction Layer Packets with the Relaxed
> + * Ordering Attribute set.  Such devices should mark themselves and other
> + * Device Drivers should check before sending TLPs with RO set.
> + */
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> + dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +
> +/*
> + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> + * cause performance problems with Upstream Transaction Layer Packets with
> + * Relaxed Ordering set.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> +   quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> +   quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> +   quirk_relaxedordering_disable);
> +

You might want to add the RP ID's for both HSX/BDX. Tne entire range 
is 2F01H-2F0EH & 6F01H-6F0EH.

Cheers,
Ashok


Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-02 Thread Alexander Duyck
On Mon, May 1, 2017 at 4:13 PM, Casey Leedom  wrote:
> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> Ordering Attribute should not be used on Transaction Layer Packets destined
> for the PCIe End Node so flagged.  Initially flagged this way are Intel
> E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.

So this is a good first step though might I suggest one other change.

We may want to add logic to the core PCIe code that clears the "Enable
Relaxed Ordering" bit in the device control register for all devices
hanging off of this root complex. Assuming the devices conform to the
PCIe spec doing that should disable relaxed ordering in a device
agnostic way that then enables us at a driver level to just enable the
feature always without having to perform any checks for your flag. We
could probably do that as a part of probing the PCIe interfaces
hanging off of these devices.

> ---
>  drivers/pci/quirks.c | 38 ++
>  include/linux/pci.h  |  2 ++
>  2 files changed, 40 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f754453..4ae78b3 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
>   quirk_tw686x_class);
>
>  /*
> + * Some devices have problems with Transaction Layer Packets with the Relaxed
> + * Ordering Attribute set.  Such devices should mark themselves and other
> + * Device Drivers should check before sending TLPs with RO set.
> + */
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> +   dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +
> +/*
> + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> + * cause performance problems with Upstream Transaction Layer Packets with
> + * Relaxed Ordering set.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> + quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> + quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> + quirk_relaxedordering_disable);
> +
> +/*
> + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
> + * where Upstream Transaction Layer Packets with the Relaxed Ordering
> + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
> + * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
> + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
> + * November 10, 2010).  As a result, on this platform we can't use Relaxed
> + * Ordering for Upstream TLPs.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
> + quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
> + quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
> + quirk_relaxedordering_disable);
> +
> +/*
>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>   * values for the Attribute as were supplied in the header of the
>   * corresponding Request, except as explicitly allowed when IDO is used."
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index eb3da1a..6764f66 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -178,6 +178,8 @@ enum pci_dev_flags {
> PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
> /* Get VPD from function 0 VPD */
> PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> +   /* Don't use Relaxed Ordering for TLPs directed at this device */
> +   PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 
> 9),
>  };
>
>  enum pci_irq_reroute_variant {
> --
> 1.9.1
>


Re: [PATCH 1/2] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

2017-05-02 Thread Ding Tianhong
hi, Casey:

On 2017/5/2 7:13, Casey Leedom wrote:
> The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
> Ordering Attribute should not be used on Transaction Layer Packets destined
> for the PCIe End Node so flagged.  Initially flagged this way are Intel
> E5-26xx Root Complex Ports which suffer from a Flow Control Credit
> Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
> don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.
> ---
>  drivers/pci/quirks.c | 38 ++
>  include/linux/pci.h  |  2 ++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f754453..4ae78b3 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3979,6 +3979,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
> quirk_tw686x_class);
>  
>  /*
> + * Some devices have problems with Transaction Layer Packets with the Relaxed
> + * Ordering Attribute set.  Such devices should mark themselves and other
> + * Device Drivers should check before sending TLPs with RO set.
> + */
> +static void quirk_relaxedordering_disable(struct pci_dev *dev)
> +{
> + dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
> +}
> +
> +/*
> + * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
> + * cause performance problems with Upstream Transaction Layer Packets with
> + * Relaxed Ordering set.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
> +   quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
> +   quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x8086, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
> +   quirk_relaxedordering_disable);
> +
> +/*
> + * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
> + * where Upstream Transaction Layer Packets with the Relaxed Ordering
> + * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
> + * set.  This is a violation of the PCIe 3.0 Transaction Ordering Rules
> + * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
> + * November 10, 2010).  As a result, on this platform we can't use Relaxed
> + * Ordering for Upstream TLPs.
> + */
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
> +   quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
> +   quirk_relaxedordering_disable);
> +DECLARE_PCI_FIXUP_CLASS_EARLY(0x1022, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
> +   quirk_relaxedordering_disable);
> +
> +/*
>   * Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
>   * values for the Attribute as were supplied in the header of the
>   * corresponding Request, except as explicitly allowed when IDO is used."
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index eb3da1a..6764f66 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -178,6 +178,8 @@ enum pci_dev_flags {
>   PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
>   /* Get VPD from function 0 VPD */
>   PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> + /* Don't use Relaxed Ordering for TLPs directed at this device */
> + PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 9),
>  };

What about add a new general func to check the RO for several drivers to use 
them ?

just like:

#define pci_dev_support_relaxed_ordering(struct pci_dev *root) \
(!(root->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING))

Thanks
Ding

>  
>  enum pci_irq_reroute_variant {
>