Re: [PATCH v11 0/5] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-15 Thread Ding Tianhong


On 2017/8/15 22:03, Eric Dumazet wrote:
> On Tue, 2017-08-15 at 06:58 -0700, Eric Dumazet wrote:
>> On Mon, 2017-08-14 at 22:15 -0700, David Miller wrote:
>>> From: Ding Tianhong 
>>> Date: Tue, 15 Aug 2017 11:23:22 +0800
>>>
 Some devices have problems with Transaction Layer Packets with the Relaxed
 Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
 PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
 devices with Relaxed Ordering issues, and a use of this new flag by the
 cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
 Ports.
>>>  ...
>>>
>>> Series applied, thanks.
>>
>> I got a NULL deref in pci_find_pcie_root_port()
>>
> 
> This was :
> 
> [4.241029] BUG: unable to handle kernel NULL pointer dereference at 
> 0050
> [4.247001] IP: pci_find_pcie_root_port+0x62/0x80
> [4.253011] PGD 0 
> [4.253011] P4D 0 
> [4.253011] 
> [4.258013] Oops:  [#1] SMP DEBUG_PAGEALLOC
> [4.262015] Modules linked in:
> [4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316
> [4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
> [4.279002] task: a2ee38cfa040 task.stack: a51ec0004000
> [4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
> [4.290012] RSP: :a51ec0007ab8 EFLAGS: 00010246
> [4.295003] RAX:  RBX: a2ee36bae000 RCX: 
> 0006
> [4.303002] RDX: 081c RSI: a2ee38cfa8c8 RDI: 
> a2ee36bae000
> [4.310013] RBP: a51ec0007b58 R08: 0001 R09: 
> 
> [4.317001] R10:  R11:  R12: 
> a51ec0007ad0
> [4.324005] R13: a2ee36bae098 R14: 0002 R15: 
> a2ee37204818
> [4.331002] FS:  () GS:a2ee3fcc() 
> knlGS:
> [4.339002] CS:  0010 DS:  ES:  CR0: 80050033
> [4.345001] CR2: 0050 CR3: 00401000f000 CR4: 
> 001406e0
> [4.351002] Call Trace:
> [4.354012]  ? pci_configure_device+0x19f/0x570
> [4.359002]  ? pci_conf1_read+0xb8/0xf0
> [4.363002]  ? raw_pci_read+0x23/0x40
> [4.366011]  ? pci_read+0x2c/0x30
> [4.370014]  ? pci_read_config_word+0x67/0x70
> [4.374012]  pci_device_add+0x28/0x230
> [4.378012]  ? pci_vpd_f0_read+0x50/0x80
> [4.382014]  pci_scan_single_device+0x96/0xc0
> [4.386012]  pci_scan_slot+0x79/0xf0
> [4.389001]  pci_scan_child_bus+0x31/0x180
> [4.394014]  acpi_pci_root_create+0x1c6/0x240
> [4.398013]  pci_acpi_scan_root+0x15f/0x1b0
> [4.402012]  acpi_pci_root_add+0x2e6/0x400
> [4.406012]  ? acpi_evaluate_integer+0x37/0x60
> [4.411002]  acpi_bus_attach+0xdf/0x200
> [4.415002]  acpi_bus_attach+0x6a/0x200
> [4.418014]  acpi_bus_attach+0x6a/0x200
> [4.422013]  acpi_bus_scan+0x38/0x70
> [4.426011]  acpi_scan_init+0x10c/0x271
> [4.429001]  acpi_init+0x2fa/0x348
> [4.433004]  ? acpi_sleep_proc_init+0x2d/0x2d
> [4.437001]  do_one_initcall+0x43/0x169
> [4.441001]  kernel_init_freeable+0x1d0/0x258
> [4.445003]  ? rest_init+0xe0/0xe0
> [4.449001]  kernel_init+0xe/0x150
> [4.451002]  ret_from_fork+0x27/0x40
> [4.457004] Code: 85 d2 74 27 80 7a 4a 00 74 21 48 89 d0 48 89 c2 f6 80 1b 
> 09 00 00 10 74 07 48 8b 90 a0 0a 00 00 48 8b 52 10 48 83 7a 10 00 75 d0 <0f> 
> b7 50 50 5d 81 e2 f0 00 00 00 83 fa 40 ba 00 00 00 00 48 0f 
> [4.474012] RIP: pci_find_pcie_root_port+0x62/0x80 RSP: a51ec0007ab8
> [4.481004] CR2: 0050
> [4.484001] ---[ end trace 6f9be6a057581199 ]---
> [4.488001] Kernel panic - not syncing: Fatal exception
> [4.494013] Rebooting in 10 seconds..
> [4.494013] ACPI MEMORY or I/O RESET_REG.
> 
>>
>> This local hack seems to fix the issue.
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 
>> af0cc3456dc1b48b1325c06c5edd2ca8cc22a640..cfd8eb5a3d0ba8347d44952ffab28d9c761044d3
>>  100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -522,7 +522,7 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev 
>> *dev)
>> bridge = pci_upstream_bridge(bridge);
>> }
>>  
>> -   if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
>> +   if (highest_pcie_bridge && pci_pcie_type(highest_pcie_bridge) != 
>> PCI_EXP_TYPE_ROOT_PORT)
>> return NULL;
>>  
>> return highest_pcie_bridge;
> 

It is very strange that I could not reproduce this problem on my server which 
is Xeon 2690v3,
but it is really a obviously issue when the dev could not find a upstream 
bridge in the
pci_find_pcie_root_port(), so the better way is just like your did in this 
patch. Thanks.

Regards
Tianhong

> 
> 
> .
> 



Re: [PATCH v11 0/5] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-15 Thread Ding Tianhong


On 2017/8/15 22:03, Eric Dumazet wrote:
> On Tue, 2017-08-15 at 06:58 -0700, Eric Dumazet wrote:
>> On Mon, 2017-08-14 at 22:15 -0700, David Miller wrote:
>>> From: Ding Tianhong 
>>> Date: Tue, 15 Aug 2017 11:23:22 +0800
>>>
 Some devices have problems with Transaction Layer Packets with the Relaxed
 Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
 PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
 devices with Relaxed Ordering issues, and a use of this new flag by the
 cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
 Ports.
>>>  ...
>>>
>>> Series applied, thanks.
>>
>> I got a NULL deref in pci_find_pcie_root_port()
>>
> 
> This was :
> 
> [4.241029] BUG: unable to handle kernel NULL pointer dereference at 
> 0050
> [4.247001] IP: pci_find_pcie_root_port+0x62/0x80
> [4.253011] PGD 0 
> [4.253011] P4D 0 
> [4.253011] 
> [4.258013] Oops:  [#1] SMP DEBUG_PAGEALLOC
> [4.262015] Modules linked in:
> [4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316
> [4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
> [4.279002] task: a2ee38cfa040 task.stack: a51ec0004000
> [4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
> [4.290012] RSP: :a51ec0007ab8 EFLAGS: 00010246
> [4.295003] RAX:  RBX: a2ee36bae000 RCX: 
> 0006
> [4.303002] RDX: 081c RSI: a2ee38cfa8c8 RDI: 
> a2ee36bae000
> [4.310013] RBP: a51ec0007b58 R08: 0001 R09: 
> 
> [4.317001] R10:  R11:  R12: 
> a51ec0007ad0
> [4.324005] R13: a2ee36bae098 R14: 0002 R15: 
> a2ee37204818
> [4.331002] FS:  () GS:a2ee3fcc() 
> knlGS:
> [4.339002] CS:  0010 DS:  ES:  CR0: 80050033
> [4.345001] CR2: 0050 CR3: 00401000f000 CR4: 
> 001406e0
> [4.351002] Call Trace:
> [4.354012]  ? pci_configure_device+0x19f/0x570
> [4.359002]  ? pci_conf1_read+0xb8/0xf0
> [4.363002]  ? raw_pci_read+0x23/0x40
> [4.366011]  ? pci_read+0x2c/0x30
> [4.370014]  ? pci_read_config_word+0x67/0x70
> [4.374012]  pci_device_add+0x28/0x230
> [4.378012]  ? pci_vpd_f0_read+0x50/0x80
> [4.382014]  pci_scan_single_device+0x96/0xc0
> [4.386012]  pci_scan_slot+0x79/0xf0
> [4.389001]  pci_scan_child_bus+0x31/0x180
> [4.394014]  acpi_pci_root_create+0x1c6/0x240
> [4.398013]  pci_acpi_scan_root+0x15f/0x1b0
> [4.402012]  acpi_pci_root_add+0x2e6/0x400
> [4.406012]  ? acpi_evaluate_integer+0x37/0x60
> [4.411002]  acpi_bus_attach+0xdf/0x200
> [4.415002]  acpi_bus_attach+0x6a/0x200
> [4.418014]  acpi_bus_attach+0x6a/0x200
> [4.422013]  acpi_bus_scan+0x38/0x70
> [4.426011]  acpi_scan_init+0x10c/0x271
> [4.429001]  acpi_init+0x2fa/0x348
> [4.433004]  ? acpi_sleep_proc_init+0x2d/0x2d
> [4.437001]  do_one_initcall+0x43/0x169
> [4.441001]  kernel_init_freeable+0x1d0/0x258
> [4.445003]  ? rest_init+0xe0/0xe0
> [4.449001]  kernel_init+0xe/0x150
> [4.451002]  ret_from_fork+0x27/0x40
> [4.457004] Code: 85 d2 74 27 80 7a 4a 00 74 21 48 89 d0 48 89 c2 f6 80 1b 
> 09 00 00 10 74 07 48 8b 90 a0 0a 00 00 48 8b 52 10 48 83 7a 10 00 75 d0 <0f> 
> b7 50 50 5d 81 e2 f0 00 00 00 83 fa 40 ba 00 00 00 00 48 0f 
> [4.474012] RIP: pci_find_pcie_root_port+0x62/0x80 RSP: a51ec0007ab8
> [4.481004] CR2: 0050
> [4.484001] ---[ end trace 6f9be6a057581199 ]---
> [4.488001] Kernel panic - not syncing: Fatal exception
> [4.494013] Rebooting in 10 seconds..
> [4.494013] ACPI MEMORY or I/O RESET_REG.
> 
>>
>> This local hack seems to fix the issue.
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 
>> af0cc3456dc1b48b1325c06c5edd2ca8cc22a640..cfd8eb5a3d0ba8347d44952ffab28d9c761044d3
>>  100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -522,7 +522,7 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev 
>> *dev)
>> bridge = pci_upstream_bridge(bridge);
>> }
>>  
>> -   if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
>> +   if (highest_pcie_bridge && pci_pcie_type(highest_pcie_bridge) != 
>> PCI_EXP_TYPE_ROOT_PORT)
>> return NULL;
>>  
>> return highest_pcie_bridge;
> 

It is very strange that I could not reproduce this problem on my server which 
is Xeon 2690v3,
but it is really a obviously issue when the dev could not find a upstream 
bridge in the
pci_find_pcie_root_port(), so the better way is just like your did in this 
patch. Thanks.

Regards
Tianhong

> 
> 
> .
> 



Re: [PATCH v11 0/5] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-15 Thread Eric Dumazet
On Tue, 2017-08-15 at 06:58 -0700, Eric Dumazet wrote:
> On Mon, 2017-08-14 at 22:15 -0700, David Miller wrote:
> > From: Ding Tianhong 
> > Date: Tue, 15 Aug 2017 11:23:22 +0800
> > 
> > > Some devices have problems with Transaction Layer Packets with the Relaxed
> > > Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
> > > PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
> > > devices with Relaxed Ordering issues, and a use of this new flag by the
> > > cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
> > > Ports.
> >  ...
> > 
> > Series applied, thanks.
> 
> I got a NULL deref in pci_find_pcie_root_port()
> 

This was :

[4.241029] BUG: unable to handle kernel NULL pointer dereference at 
0050
[4.247001] IP: pci_find_pcie_root_port+0x62/0x80
[4.253011] PGD 0 
[4.253011] P4D 0 
[4.253011] 
[4.258013] Oops:  [#1] SMP DEBUG_PAGEALLOC
[4.262015] Modules linked in:
[4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316
[4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
[4.279002] task: a2ee38cfa040 task.stack: a51ec0004000
[4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
[4.290012] RSP: :a51ec0007ab8 EFLAGS: 00010246
[4.295003] RAX:  RBX: a2ee36bae000 RCX: 0006
[4.303002] RDX: 081c RSI: a2ee38cfa8c8 RDI: a2ee36bae000
[4.310013] RBP: a51ec0007b58 R08: 0001 R09: 
[4.317001] R10:  R11:  R12: a51ec0007ad0
[4.324005] R13: a2ee36bae098 R14: 0002 R15: a2ee37204818
[4.331002] FS:  () GS:a2ee3fcc() 
knlGS:
[4.339002] CS:  0010 DS:  ES:  CR0: 80050033
[4.345001] CR2: 0050 CR3: 00401000f000 CR4: 001406e0
[4.351002] Call Trace:
[4.354012]  ? pci_configure_device+0x19f/0x570
[4.359002]  ? pci_conf1_read+0xb8/0xf0
[4.363002]  ? raw_pci_read+0x23/0x40
[4.366011]  ? pci_read+0x2c/0x30
[4.370014]  ? pci_read_config_word+0x67/0x70
[4.374012]  pci_device_add+0x28/0x230
[4.378012]  ? pci_vpd_f0_read+0x50/0x80
[4.382014]  pci_scan_single_device+0x96/0xc0
[4.386012]  pci_scan_slot+0x79/0xf0
[4.389001]  pci_scan_child_bus+0x31/0x180
[4.394014]  acpi_pci_root_create+0x1c6/0x240
[4.398013]  pci_acpi_scan_root+0x15f/0x1b0
[4.402012]  acpi_pci_root_add+0x2e6/0x400
[4.406012]  ? acpi_evaluate_integer+0x37/0x60
[4.411002]  acpi_bus_attach+0xdf/0x200
[4.415002]  acpi_bus_attach+0x6a/0x200
[4.418014]  acpi_bus_attach+0x6a/0x200
[4.422013]  acpi_bus_scan+0x38/0x70
[4.426011]  acpi_scan_init+0x10c/0x271
[4.429001]  acpi_init+0x2fa/0x348
[4.433004]  ? acpi_sleep_proc_init+0x2d/0x2d
[4.437001]  do_one_initcall+0x43/0x169
[4.441001]  kernel_init_freeable+0x1d0/0x258
[4.445003]  ? rest_init+0xe0/0xe0
[4.449001]  kernel_init+0xe/0x150
[4.451002]  ret_from_fork+0x27/0x40
[4.457004] Code: 85 d2 74 27 80 7a 4a 00 74 21 48 89 d0 48 89 c2 f6 80 1b 
09 00 00 10 74 07 48 8b 90 a0 0a 00 00 48 8b 52 10 48 83 7a 10 00 75 d0 <0f> b7 
50 50 5d 81 e2 f0 00 00 00 83 fa 40 ba 00 00 00 00 48 0f 
[4.474012] RIP: pci_find_pcie_root_port+0x62/0x80 RSP: a51ec0007ab8
[4.481004] CR2: 0050
[4.484001] ---[ end trace 6f9be6a057581199 ]---
[4.488001] Kernel panic - not syncing: Fatal exception
[4.494013] Rebooting in 10 seconds..
[4.494013] ACPI MEMORY or I/O RESET_REG.

> 
> This local hack seems to fix the issue.
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 
> af0cc3456dc1b48b1325c06c5edd2ca8cc22a640..cfd8eb5a3d0ba8347d44952ffab28d9c761044d3
>  100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -522,7 +522,7 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev 
> *dev)
> bridge = pci_upstream_bridge(bridge);
> }
>  
> -   if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> +   if (highest_pcie_bridge && pci_pcie_type(highest_pcie_bridge) != 
> PCI_EXP_TYPE_ROOT_PORT)
> return NULL;
>  
> return highest_pcie_bridge;




Re: [PATCH v11 0/5] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-15 Thread Eric Dumazet
On Tue, 2017-08-15 at 06:58 -0700, Eric Dumazet wrote:
> On Mon, 2017-08-14 at 22:15 -0700, David Miller wrote:
> > From: Ding Tianhong 
> > Date: Tue, 15 Aug 2017 11:23:22 +0800
> > 
> > > Some devices have problems with Transaction Layer Packets with the Relaxed
> > > Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
> > > PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
> > > devices with Relaxed Ordering issues, and a use of this new flag by the
> > > cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
> > > Ports.
> >  ...
> > 
> > Series applied, thanks.
> 
> I got a NULL deref in pci_find_pcie_root_port()
> 

This was :

[4.241029] BUG: unable to handle kernel NULL pointer dereference at 
0050
[4.247001] IP: pci_find_pcie_root_port+0x62/0x80
[4.253011] PGD 0 
[4.253011] P4D 0 
[4.253011] 
[4.258013] Oops:  [#1] SMP DEBUG_PAGEALLOC
[4.262015] Modules linked in:
[4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316
[4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
[4.279002] task: a2ee38cfa040 task.stack: a51ec0004000
[4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
[4.290012] RSP: :a51ec0007ab8 EFLAGS: 00010246
[4.295003] RAX:  RBX: a2ee36bae000 RCX: 0006
[4.303002] RDX: 081c RSI: a2ee38cfa8c8 RDI: a2ee36bae000
[4.310013] RBP: a51ec0007b58 R08: 0001 R09: 
[4.317001] R10:  R11:  R12: a51ec0007ad0
[4.324005] R13: a2ee36bae098 R14: 0002 R15: a2ee37204818
[4.331002] FS:  () GS:a2ee3fcc() 
knlGS:
[4.339002] CS:  0010 DS:  ES:  CR0: 80050033
[4.345001] CR2: 0050 CR3: 00401000f000 CR4: 001406e0
[4.351002] Call Trace:
[4.354012]  ? pci_configure_device+0x19f/0x570
[4.359002]  ? pci_conf1_read+0xb8/0xf0
[4.363002]  ? raw_pci_read+0x23/0x40
[4.366011]  ? pci_read+0x2c/0x30
[4.370014]  ? pci_read_config_word+0x67/0x70
[4.374012]  pci_device_add+0x28/0x230
[4.378012]  ? pci_vpd_f0_read+0x50/0x80
[4.382014]  pci_scan_single_device+0x96/0xc0
[4.386012]  pci_scan_slot+0x79/0xf0
[4.389001]  pci_scan_child_bus+0x31/0x180
[4.394014]  acpi_pci_root_create+0x1c6/0x240
[4.398013]  pci_acpi_scan_root+0x15f/0x1b0
[4.402012]  acpi_pci_root_add+0x2e6/0x400
[4.406012]  ? acpi_evaluate_integer+0x37/0x60
[4.411002]  acpi_bus_attach+0xdf/0x200
[4.415002]  acpi_bus_attach+0x6a/0x200
[4.418014]  acpi_bus_attach+0x6a/0x200
[4.422013]  acpi_bus_scan+0x38/0x70
[4.426011]  acpi_scan_init+0x10c/0x271
[4.429001]  acpi_init+0x2fa/0x348
[4.433004]  ? acpi_sleep_proc_init+0x2d/0x2d
[4.437001]  do_one_initcall+0x43/0x169
[4.441001]  kernel_init_freeable+0x1d0/0x258
[4.445003]  ? rest_init+0xe0/0xe0
[4.449001]  kernel_init+0xe/0x150
[4.451002]  ret_from_fork+0x27/0x40
[4.457004] Code: 85 d2 74 27 80 7a 4a 00 74 21 48 89 d0 48 89 c2 f6 80 1b 
09 00 00 10 74 07 48 8b 90 a0 0a 00 00 48 8b 52 10 48 83 7a 10 00 75 d0 <0f> b7 
50 50 5d 81 e2 f0 00 00 00 83 fa 40 ba 00 00 00 00 48 0f 
[4.474012] RIP: pci_find_pcie_root_port+0x62/0x80 RSP: a51ec0007ab8
[4.481004] CR2: 0050
[4.484001] ---[ end trace 6f9be6a057581199 ]---
[4.488001] Kernel panic - not syncing: Fatal exception
[4.494013] Rebooting in 10 seconds..
[4.494013] ACPI MEMORY or I/O RESET_REG.

> 
> This local hack seems to fix the issue.
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 
> af0cc3456dc1b48b1325c06c5edd2ca8cc22a640..cfd8eb5a3d0ba8347d44952ffab28d9c761044d3
>  100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -522,7 +522,7 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev 
> *dev)
> bridge = pci_upstream_bridge(bridge);
> }
>  
> -   if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> +   if (highest_pcie_bridge && pci_pcie_type(highest_pcie_bridge) != 
> PCI_EXP_TYPE_ROOT_PORT)
> return NULL;
>  
> return highest_pcie_bridge;




Re: [PATCH v11 0/5] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-15 Thread Eric Dumazet
On Mon, 2017-08-14 at 22:15 -0700, David Miller wrote:
> From: Ding Tianhong 
> Date: Tue, 15 Aug 2017 11:23:22 +0800
> 
> > Some devices have problems with Transaction Layer Packets with the Relaxed
> > Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
> > PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
> > devices with Relaxed Ordering issues, and a use of this new flag by the
> > cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
> > Ports.
>  ...
> 
> Series applied, thanks.

I got a NULL deref in pci_find_pcie_root_port()

Was it expected ?

This local hack seems to fix the issue.

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 
af0cc3456dc1b48b1325c06c5edd2ca8cc22a640..cfd8eb5a3d0ba8347d44952ffab28d9c761044d3
 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -522,7 +522,7 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
bridge = pci_upstream_bridge(bridge);
}
 
-   if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
+   if (highest_pcie_bridge && pci_pcie_type(highest_pcie_bridge) != 
PCI_EXP_TYPE_ROOT_PORT)
return NULL;
 
return highest_pcie_bridge;



Re: [PATCH v11 0/5] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-15 Thread Eric Dumazet
On Mon, 2017-08-14 at 22:15 -0700, David Miller wrote:
> From: Ding Tianhong 
> Date: Tue, 15 Aug 2017 11:23:22 +0800
> 
> > Some devices have problems with Transaction Layer Packets with the Relaxed
> > Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
> > PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
> > devices with Relaxed Ordering issues, and a use of this new flag by the
> > cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
> > Ports.
>  ...
> 
> Series applied, thanks.

I got a NULL deref in pci_find_pcie_root_port()

Was it expected ?

This local hack seems to fix the issue.

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 
af0cc3456dc1b48b1325c06c5edd2ca8cc22a640..cfd8eb5a3d0ba8347d44952ffab28d9c761044d3
 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -522,7 +522,7 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
bridge = pci_upstream_bridge(bridge);
}
 
-   if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
+   if (highest_pcie_bridge && pci_pcie_type(highest_pcie_bridge) != 
PCI_EXP_TYPE_ROOT_PORT)
return NULL;
 
return highest_pcie_bridge;



Re: [PATCH v11 0/5] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-14 Thread David Miller
From: Ding Tianhong 
Date: Tue, 15 Aug 2017 11:23:22 +0800

> Some devices have problems with Transaction Layer Packets with the Relaxed
> Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
> PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
> devices with Relaxed Ordering issues, and a use of this new flag by the
> cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
> Ports.
 ...

Series applied, thanks.


Re: [PATCH v11 0/5] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-14 Thread David Miller
From: Ding Tianhong 
Date: Tue, 15 Aug 2017 11:23:22 +0800

> Some devices have problems with Transaction Layer Packets with the Relaxed
> Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
> PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
> devices with Relaxed Ordering issues, and a use of this new flag by the
> cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
> Ports.
 ...

Series applied, thanks.


[PATCH v11 0/5] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-14 Thread Ding Tianhong
Some devices have problems with Transaction Layer Packets with the Relaxed
Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
devices with Relaxed Ordering issues, and a use of this new flag by the
cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
Ports.

It's been years since I've submitted kernel.org patches, I appolgise for the
almost certain submission errors.

v2: Alexander point out that the v1 was only a part of the whole solution,
some platform which has some issues could use the new flag to indicate
that it is not safe to enable relaxed ordering attribute, then we need
to clear the relaxed ordering enable bits in the PCI configuration when
initializing the device. So add a new second patch to modify the PCI
initialization code to clear the relaxed ordering enable bit in the
event that the root complex doesn't want relaxed ordering enabled.

The third patch was base on the v1's second patch and only be changed
to query the relaxed ordering enable bit in the PCI configuration space
to allow the Chelsio NIC to send TLPs with the relaxed ordering attributes
set.

This version didn't plan to drop the defines for Intel Drivers to use the
new checking way to enable relaxed ordering because it is not the hardest
part of the moment, we could fix it in next patchset when this patches
reach the goal.

v3: Redesigned the logic for pci_configure_relaxed_ordering when configuration,
If a PCIe device didn't enable the relaxed ordering attribute default,
we should not do anything in the PCIe configuration, otherwise we
should check if any of the devices above us do not support relaxed
ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
the result if we get a return that indicate that the relaxed ordering
is not supported we should update our device to disable relaxed ordering
in configuration space. If the device above us doesn't exist or isn't
the PCIe device, we shouldn't do anything and skip updating relaxed ordering
because we are probably running in a guest.

v4: Rename the functions pcie_get_relaxed_ordering and 
pcie_disable_relaxed_ordering
according John's suggestion, and modify the description, use the true/false
as the return value.

We shouldn't enable relaxed ordering attribute by the setting in the root
complex configuration space for PCIe device, so fix it for cxgb4.

Fix some format issues.

v5: Removed the unnecessary code for some function which only return the bool
value, and add the check for VF device.

Make this patch set base on 4.12-rc5.

v6: Fix the logic error in the need to enable the relaxed ordering attribute 
for cxgb4.

v7: The cxgb4 drivers will enable the PCIe Capability Device Control[Relaxed
Ordering Enable] in PCI Probe() routine, this will break our current
solution for some platform which has problematic when enable the relaxed
ordering attribute. According to the latest recommendations, remove the
enable_pcie_relaxed_ordering(), although it could not cover the Peer-to-Peer
scene, but we agree to leave this problem until we really trigger it.

Make this patch set base on 4.12 release version.

v8: Change the second patch title and description to make it more reasonable,
add the acked-by from Alex and Ashok.

Add a new patch to enable the Relaxed Ordering Attribute for cxgb4vf driver.

Make this patch set base on 4.13-rc2.

v9: The document (https://software.intel.com/sites/default/files/managed/9e/
bc/64-ia-32-architectures-optimization-manual.pdf) indicate that the Xeon
processors based on Broadwell/Haswell microarchitecture has the problem
with Relaxed Ordering Attribute enabled, so add the whole list Device ID
from Intel to the patch.

v10: Significant rework based on Bjorn's feedback, reorganize the first 2 
patches,
 now the Intel and AMD erratum soc has been divided to the different 
patches,
 rename the pcie_relaxed_ordering_supported() to 
pcie_relaxed_ordering_enabled(),
 and no need to check every intervening switch except the root ports, update
 some commits.

v11: We shouldn't let the Intel engineer to acked the AMD's erratum patch, fix 
the
 funny mistake.

Casey Leedom (2):
  net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

Ding Tianhong (3):
  PCI: Disable PCIe Relaxed Ordering if unsupported
  PCI: Disable Relaxed Ordering for some Intel processors
  PCI: Disable Relaxed Ordering Attributes for AMD A1100

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c| 23 --
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |  5 +-
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h |  1 +
 

[PATCH v11 0/5] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

2017-08-14 Thread Ding Tianhong
Some devices have problems with Transaction Layer Packets with the Relaxed
Ordering Attribute set.  This patch set adds a new PCIe Device Flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
devices with Relaxed Ordering issues, and a use of this new flag by the
cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
Ports.

It's been years since I've submitted kernel.org patches, I appolgise for the
almost certain submission errors.

v2: Alexander point out that the v1 was only a part of the whole solution,
some platform which has some issues could use the new flag to indicate
that it is not safe to enable relaxed ordering attribute, then we need
to clear the relaxed ordering enable bits in the PCI configuration when
initializing the device. So add a new second patch to modify the PCI
initialization code to clear the relaxed ordering enable bit in the
event that the root complex doesn't want relaxed ordering enabled.

The third patch was base on the v1's second patch and only be changed
to query the relaxed ordering enable bit in the PCI configuration space
to allow the Chelsio NIC to send TLPs with the relaxed ordering attributes
set.

This version didn't plan to drop the defines for Intel Drivers to use the
new checking way to enable relaxed ordering because it is not the hardest
part of the moment, we could fix it in next patchset when this patches
reach the goal.

v3: Redesigned the logic for pci_configure_relaxed_ordering when configuration,
If a PCIe device didn't enable the relaxed ordering attribute default,
we should not do anything in the PCIe configuration, otherwise we
should check if any of the devices above us do not support relaxed
ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
the result if we get a return that indicate that the relaxed ordering
is not supported we should update our device to disable relaxed ordering
in configuration space. If the device above us doesn't exist or isn't
the PCIe device, we shouldn't do anything and skip updating relaxed ordering
because we are probably running in a guest.

v4: Rename the functions pcie_get_relaxed_ordering and 
pcie_disable_relaxed_ordering
according John's suggestion, and modify the description, use the true/false
as the return value.

We shouldn't enable relaxed ordering attribute by the setting in the root
complex configuration space for PCIe device, so fix it for cxgb4.

Fix some format issues.

v5: Removed the unnecessary code for some function which only return the bool
value, and add the check for VF device.

Make this patch set base on 4.12-rc5.

v6: Fix the logic error in the need to enable the relaxed ordering attribute 
for cxgb4.

v7: The cxgb4 drivers will enable the PCIe Capability Device Control[Relaxed
Ordering Enable] in PCI Probe() routine, this will break our current
solution for some platform which has problematic when enable the relaxed
ordering attribute. According to the latest recommendations, remove the
enable_pcie_relaxed_ordering(), although it could not cover the Peer-to-Peer
scene, but we agree to leave this problem until we really trigger it.

Make this patch set base on 4.12 release version.

v8: Change the second patch title and description to make it more reasonable,
add the acked-by from Alex and Ashok.

Add a new patch to enable the Relaxed Ordering Attribute for cxgb4vf driver.

Make this patch set base on 4.13-rc2.

v9: The document (https://software.intel.com/sites/default/files/managed/9e/
bc/64-ia-32-architectures-optimization-manual.pdf) indicate that the Xeon
processors based on Broadwell/Haswell microarchitecture has the problem
with Relaxed Ordering Attribute enabled, so add the whole list Device ID
from Intel to the patch.

v10: Significant rework based on Bjorn's feedback, reorganize the first 2 
patches,
 now the Intel and AMD erratum soc has been divided to the different 
patches,
 rename the pcie_relaxed_ordering_supported() to 
pcie_relaxed_ordering_enabled(),
 and no need to check every intervening switch except the root ports, update
 some commits.

v11: We shouldn't let the Intel engineer to acked the AMD's erratum patch, fix 
the
 funny mistake.

Casey Leedom (2):
  net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
  net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

Ding Tianhong (3):
  PCI: Disable PCIe Relaxed Ordering if unsupported
  PCI: Disable Relaxed Ordering for some Intel processors
  PCI: Disable Relaxed Ordering Attributes for AMD A1100

 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |  1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c| 23 --
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |  5 +-
 drivers/net/ethernet/chelsio/cxgb4vf/adapter.h |  1 +