[PATCH kernel v3] PCI: Enable access to custom VPD for Chelsio devices (cxgb3)

2016-10-24 Thread Alexey Kardashevskiy
There is at least one Chelsio 10Gb card which uses VPD area to store
some custom blocks (example below). However pci_vpd_size() returns
the length of the first block only assuming that there can be only
one VPD "End Tag" and VFIO blocks access beyond that offset
(since 4e1a63555) which leads to the situation when the guest "cxgb3"
driver fails to probe the device. The host system does not have this
problem as the drives accesses the config space directly without
pci_read_vpd()/...

This adds a quirk to override the VPD size to a bigger value.
The maximum size is taken from EEPROMSIZE in
drivers/net/ethernet/chelsio/cxgb3/common.h. We do not read the tag
as the cxgb3 driver does as the driver supports writing to EEPROM/VPD
and when it writes, it only checks for 8192 bytes boundary. The quirk
is registerted for all devices supported by the cxgb3 driver.

This adds a quirk to the PCI layer (not to the cxgb3 driver) as
the cxgb3 driver itself accesses VPD directly and the problem only exists
with the vfio-pci driver (when cxgb3 is not running on the host and
may not be even loaded) which blocks accesses beyond the first block
of VPD data. However vfio-pci itself does not have quirks mechanism so
we add it to PCI.

This is the controller:
Ethernet controller [0200]: Chelsio Communications Inc T310 10GbE Single Port 
Adapter [1425:0030]

This is what I parsed from its vpd:
===
b'\x82*\x0010 Gigabit Ethernet-SR PCI Express Adapter\x90J\x00EC\x07D76809 
FN\x0746K'
  Large item 42 bytes; name 0x2 Identifier String
b'10 Gigabit Ethernet-SR PCI Express Adapter'
 002d Large item 74 bytes; name 0x10
#00 [EC] len=7: b'D76809 '
#0a [FN] len=7: b'46K7897'
#14 [PN] len=7: b'46K7897'
#1e [MN] len=4: b'1037'
#25 [FC] len=4: b'5769'
#2c [SN] len=12: b'YL102035603V'
#3b [NA] len=12: b'00145E992ED1'
 007a Small item 1 bytes; name 0xf End Tag

 0c00 Large item 16 bytes; name 0x2 Identifier String
b'S310E-SR-X  '
 0c13 Large item 234 bytes; name 0x10
#00 [PN] len=16: b'TBD '
#13 [EC] len=16: b'110107730D2 '
#26 [SN] len=16: b'97YL102035603V  '
#39 [NA] len=12: b'00145E992ED1'
#48 [V0] len=6: b'175000'
#51 [V1] len=6: b'26'
#5a [V2] len=6: b'26'
#63 [V3] len=6: b'2000  '
#6c [V4] len=2: b'1 '
#71 [V5] len=6: b'c2'
#7a [V6] len=6: b'0 '
#83 [V7] len=2: b'1 '
#88 [V8] len=2: b'0 '
#8d [V9] len=2: b'0 '
#92 [VA] len=2: b'0 '
#97 [RV] len=80: 
b's\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
 0d00 Large item 252 bytes; name 0x11
#00 [VC] len=16: b'122310_1222 dp  '
#13 [VD] len=16: b'610-0001-00 H1\x00\x00'
#26 [VE] len=16: b'122310_1353 fp  '
#39 [VF] len=16: b'610-0001-00 H1\x00\x00'
#4c [RW] len=173: 
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
 0dff Small item 0 bytes; name 0xf End Tag

10f3 Large item 13315 bytes; name 0x62
!!! unknown item name 98: 
b'\xd0\x03\x00@`\x0c\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00'
===

Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
---
Changes:
v3:
* unconditionally set VPD size to 8192

v2:
* used pci_set_vpd_size() helper
* added explicit list of IDs from cxgb3 driver
* added a note in the commit log why the quirk is not in cxgb3
---
 drivers/pci/quirks.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index c232729..bc7c541 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3255,6 +3255,25 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PORT_RIDGE,
quirk_thunderbolt_hotplug_msi);
 
+static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
+{
+   pci_set_vpd_size(dev, 8192);
+}
+
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x20, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x21, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x22, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x23, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x24, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x25, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x26, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x30, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x31, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x32, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x35, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x36, quirk_chelsio_extend_vpd)

Re: [PATCH kernel v2] PCI: Enable access to custom VPD for Chelsio devices (cxgb3)

2016-10-10 Thread Alexey Kardashevskiy
On 11/10/16 02:23, Alexander Duyck wrote:
> On Wed, Sep 28, 2016 at 10:21 PM, Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
>> There is at least one Chelsio 10Gb card which uses VPD area to store
>> some custom blocks (example below). However pci_vpd_size() returns
>> the length of the first block only assuming that there can be only
>> one VPD "End Tag" and VFIO blocks access beyond that offset
>> (since 4e1a63555) which leads to the situation when the guest "cxgb3"
>> driver fails to probe the device. The host system does not have this
>> problem as the drives accesses the config space directly without
>> pci_read_vpd()/...
>>
>> This adds a quirk to override the VPD size to a bigger value.
>> The maximum size is taken from EEPROMSIZE in
>> drivers/net/ethernet/chelsio/cxgb3/common.h. We do not read the tag
>> as the cxgb3 driver does as the driver supports writing to EEPROM/VPD
>> and when it writes, it only checks for 8192 bytes boundary. The quirk
>> is registerted for all devices supported by the cxgb3 driver.
>>
>> This adds a quirk to the PCI layer (not to the cxgb3 driver) as
>> the cxgb3 driver itself accesses VPD directly and the problem only exists
>> with the vfio-pci driver (when cxgb3 is not running on the host and
>> may not be even loaded) which blocks accesses beyond the first block
>> of VPD data. However vfio-pci itself does not have quirks mechanism so
>> we add it to PCI.
>>
>> Tested on:
>> Ethernet controller [0200]: Chelsio Communications Inc T310 10GbE Single 
>> Port Adapter [1425:0030]
>>
>> This is its VPD:
>>  Large item 42 bytes; name 0x2 Identifier String
>> b'10 Gigabit Ethernet-SR PCI Express Adapter'
>> #00 [EC] len=7: b'D76809 '
>> #0a [FN] len=7: b'46K7897'
>> #14 [PN] len=7: b'46K7897'
>> #1e [MN] len=4: b'1037'
>> #25 [FC] len=4: b'5769'
>> #2c [SN] len=12: b'YL102035603V'
>> #3b [NA] len=12: b'00145E992ED1'
>>
>> 0c00 Large item 16 bytes; name 0x2 Identifier String
>> b'S310E-SR-X  '
>> 0c13 Large item 234 bytes; name 0x10
>> #00 [PN] len=16: b'TBD '
>> #13 [EC] len=16: b'110107730D2 '
>> #26 [SN] len=16: b'97YL102035603V  '
>> #39 [NA] len=12: b'00145E992ED1'
>> #48 [V0] len=6: b'175000'
>> #51 [V1] len=6: b'26'
>> #5a [V2] len=6: b'26'
>> #63 [V3] len=6: b'2000  '
>> #6c [V4] len=2: b'1 '
>> #71 [V5] len=6: b'c2'
>> #7a [V6] len=6: b'0 '
>> #83 [V7] len=2: b'1 '
>> #88 [V8] len=2: b'0 '
>> #8d [V9] len=2: b'0 '
>> #92 [VA] len=2: b'0 '
>> #97 [RV] len=80: 
>> b's\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
>> 0d00 Large item 252 bytes; name 0x11
>>     #00 [VC] len=16: b'122310_1222 dp  '
>> #13 [VD] len=16: b'610-0001-00 H1\x00\x00'
>> #26 [VE] len=16: b'122310_1353 fp  '
>> #39 [VF] len=16: b'610-0001-00 H1\x00\x00'
>> #4c [RW] len=173: 
>> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
>> 0dff Small item 0 bytes; name 0xf End Tag
>>
>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>> ---
>> Changes:
>> v2:
>> * used pci_set_vpd_size() helper
>> * added explicit list of IDs from cxgb3 driver
>> * added a note in the commit log why the quirk is not in cxgb3
>> ---
>>  drivers/pci/quirks.c | 22 ++
>>  1 file changed, 22 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 44e0ff3..b22fce5 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -3243,6 +3243,28 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
>> PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C
>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PORT_RIDGE,
>> quirk_thunderbolt_hotplug_msi);
>>
>> +static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
>> +{
>> +   if (!dev->vpd)
>> +   return;
>> +
>> +   pci_set_vpd_size(dev, max_t(unsigned int, dev->vpd->len, 8192));
> 
> What is the point of the max_t?  From what I can tell you aren't
> writing 8192, you will always end up writing 32K since that is the
> starting value for dev->vpd->len assuming there have yet to be any
> reads.

At this stage dev->vpd->len is always 32k? I thought here VP

Re: [PATCH kernel v2] PCI: Enable access to custom VPD for Chelsio devices (cxgb3)

2016-10-09 Thread Alexey Kardashevskiy
Anyone, ping?

On 29/09/16 15:21, Alexey Kardashevskiy wrote:
> There is at least one Chelsio 10Gb card which uses VPD area to store
> some custom blocks (example below). However pci_vpd_size() returns
> the length of the first block only assuming that there can be only
> one VPD "End Tag" and VFIO blocks access beyond that offset
> (since 4e1a63555) which leads to the situation when the guest "cxgb3"
> driver fails to probe the device. The host system does not have this
> problem as the drives accesses the config space directly without
> pci_read_vpd()/...
> 
> This adds a quirk to override the VPD size to a bigger value.
> The maximum size is taken from EEPROMSIZE in
> drivers/net/ethernet/chelsio/cxgb3/common.h. We do not read the tag
> as the cxgb3 driver does as the driver supports writing to EEPROM/VPD
> and when it writes, it only checks for 8192 bytes boundary. The quirk
> is registerted for all devices supported by the cxgb3 driver.
> 
> This adds a quirk to the PCI layer (not to the cxgb3 driver) as
> the cxgb3 driver itself accesses VPD directly and the problem only exists
> with the vfio-pci driver (when cxgb3 is not running on the host and
> may not be even loaded) which blocks accesses beyond the first block
> of VPD data. However vfio-pci itself does not have quirks mechanism so
> we add it to PCI.
> 
> Tested on:
> Ethernet controller [0200]: Chelsio Communications Inc T310 10GbE Single Port 
> Adapter [1425:0030]
> 
> This is its VPD:
>  Large item 42 bytes; name 0x2 Identifier String
>   b'10 Gigabit Ethernet-SR PCI Express Adapter'
>   #00 [EC] len=7: b'D76809 '
>   #0a [FN] len=7: b'46K7897'
>   #14 [PN] len=7: b'46K7897'
>   #1e [MN] len=4: b'1037'
>   #25 [FC] len=4: b'5769'
>   #2c [SN] len=12: b'YL102035603V'
>   #3b [NA] len=12: b'00145E992ED1'
> 
> 0c00 Large item 16 bytes; name 0x2 Identifier String
> b'S310E-SR-X  '
> 0c13 Large item 234 bytes; name 0x10
> #00 [PN] len=16: b'TBD '
> #13 [EC] len=16: b'110107730D2 '
> #26 [SN] len=16: b'97YL102035603V  '
> #39 [NA] len=12: b'00145E992ED1'
> #48 [V0] len=6: b'175000'
> #51 [V1] len=6: b'26'
> #5a [V2] len=6: b'26'
> #63 [V3] len=6: b'2000  '
> #6c [V4] len=2: b'1 '
> #71 [V5] len=6: b'c2'
> #7a [V6] len=6: b'0 '
> #83 [V7] len=2: b'1 '
> #88 [V8] len=2: b'0 '
> #8d [V9] len=2: b'0 '
> #92 [VA] len=2: b'0 '
> #97 [RV] len=80: 
> b's\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
> 0d00 Large item 252 bytes; name 0x11
> #00 [VC] len=16: b'122310_1222 dp  '
> #13 [VD] len=16: b'610-0001-00 H1\x00\x00'
> #26 [VE] len=16: b'122310_1353 fp  '
> #39 [VF] len=16: b'610-0001-00 H1\x00\x00'
> #4c [RW] len=173: 
> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
> 0dff Small item 0 bytes; name 0xf End Tag
> 
> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
> ---
> Changes:
> v2:
> * used pci_set_vpd_size() helper
> * added explicit list of IDs from cxgb3 driver
> * added a note in the commit log why the quirk is not in cxgb3
> ---
>  drivers/pci/quirks.c | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 44e0ff3..b22fce5 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3243,6 +3243,28 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
> PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PORT_RIDGE,
>   quirk_thunderbolt_hotplug_msi);
>  
> +static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
> +{
> + if (!dev->vpd)
> + return;
> +
> + pci_set_vpd_size(dev, max_t(unsigned int, dev->vpd->len, 8192));
> +}
> +
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x20, 
> quirk_chelsio_extend_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x21, 
> quirk_chelsio_extend_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x22, 
> quirk_chelsio_extend_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x23, 
> quirk_chelsio_extend_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x24, 
> quirk_chelsio_extend_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x25, 
> quirk_chelsio_extend_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x26, 
> quirk_chelsio_extend_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x30, 
> quirk_chelsio_extend_vpd);
&g

[PATCH kernel v2] PCI: Enable access to custom VPD for Chelsio devices (cxgb3)

2016-09-28 Thread Alexey Kardashevskiy
There is at least one Chelsio 10Gb card which uses VPD area to store
some custom blocks (example below). However pci_vpd_size() returns
the length of the first block only assuming that there can be only
one VPD "End Tag" and VFIO blocks access beyond that offset
(since 4e1a63555) which leads to the situation when the guest "cxgb3"
driver fails to probe the device. The host system does not have this
problem as the drives accesses the config space directly without
pci_read_vpd()/...

This adds a quirk to override the VPD size to a bigger value.
The maximum size is taken from EEPROMSIZE in
drivers/net/ethernet/chelsio/cxgb3/common.h. We do not read the tag
as the cxgb3 driver does as the driver supports writing to EEPROM/VPD
and when it writes, it only checks for 8192 bytes boundary. The quirk
is registerted for all devices supported by the cxgb3 driver.

This adds a quirk to the PCI layer (not to the cxgb3 driver) as
the cxgb3 driver itself accesses VPD directly and the problem only exists
with the vfio-pci driver (when cxgb3 is not running on the host and
may not be even loaded) which blocks accesses beyond the first block
of VPD data. However vfio-pci itself does not have quirks mechanism so
we add it to PCI.

Tested on:
Ethernet controller [0200]: Chelsio Communications Inc T310 10GbE Single Port 
Adapter [1425:0030]

This is its VPD:
 Large item 42 bytes; name 0x2 Identifier String
b'10 Gigabit Ethernet-SR PCI Express Adapter'
#00 [EC] len=7: b'D76809 '
#0a [FN] len=7: b'46K7897'
#14 [PN] len=7: b'46K7897'
#1e [MN] len=4: b'1037'
#25 [FC] len=4: b'5769'
#2c [SN] len=12: b'YL102035603V'
#3b [NA] len=12: b'00145E992ED1'

0c00 Large item 16 bytes; name 0x2 Identifier String
b'S310E-SR-X  '
0c13 Large item 234 bytes; name 0x10
#00 [PN] len=16: b'TBD '
#13 [EC] len=16: b'110107730D2 '
#26 [SN] len=16: b'97YL102035603V  '
#39 [NA] len=12: b'00145E992ED1'
#48 [V0] len=6: b'175000'
#51 [V1] len=6: b'26'
#5a [V2] len=6: b'26'
#63 [V3] len=6: b'2000  '
#6c [V4] len=2: b'1 '
#71 [V5] len=6: b'c2'
#7a [V6] len=6: b'0 '
#83 [V7] len=2: b'1 '
#88 [V8] len=2: b'0 '
#8d [V9] len=2: b'0 '
#92 [VA] len=2: b'0 '
#97 [RV] len=80: 
b's\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
0d00 Large item 252 bytes; name 0x11
#00 [VC] len=16: b'122310_1222 dp  '
#13 [VD] len=16: b'610-0001-00 H1\x00\x00'
#26 [VE] len=16: b'122310_1353 fp  '
#39 [VF] len=16: b'610-0001-00 H1\x00\x00'
#4c [RW] len=173: 
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
0dff Small item 0 bytes; name 0xf End Tag

Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
---
Changes:
v2:
* used pci_set_vpd_size() helper
* added explicit list of IDs from cxgb3 driver
* added a note in the commit log why the quirk is not in cxgb3
---
 drivers/pci/quirks.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44e0ff3..b22fce5 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3243,6 +3243,28 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PORT_RIDGE,
quirk_thunderbolt_hotplug_msi);
 
+static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
+{
+   if (!dev->vpd)
+   return;
+
+   pci_set_vpd_size(dev, max_t(unsigned int, dev->vpd->len, 8192));
+}
+
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x20, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x21, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x22, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x23, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x24, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x25, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x26, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x30, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x31, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x32, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x35, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x36, quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, 0x37, quirk_chelsio_extend_vpd);
+
 #ifdef CONFIG_ACPI
 /*
  * Apple: Shutdown Cactus Ridge Thunderbolt controller.
-- 
2.5.0.rc3



Re: [RFC PATCH kernel] PCI: Enable access to custom VPD for Chelsio devices (cxgb3)

2016-09-21 Thread Alexey Kardashevskiy
On 07/09/16 04:30, Alexander Duyck wrote:
> On Tue, Sep 6, 2016 at 8:48 AM, Bjorn Helgaas <helg...@kernel.org> wrote:
>> Hi Alexey,
>>
>> On Thu, Aug 11, 2016 at 08:03:29PM +1000, Alexey Kardashevskiy wrote:
>>> There is at least one Chelsio 10Gb card which uses VPD area to store
>>> some custom blocks (example below). However pci_vpd_size() returns
>>> the length of the first block only assuming that there can be only
>>> one VPD "End Tag" and VFIO blocks access beyond that offset
>>> (since 4e1a63555) which leads to the situation when the guest "cxgb3"
>>> driver fails to probe the device. The host system does not have this
>>> problem as the drives accesses the config space directly without
>>> pci_read_vpd()/...
>>>
>>> This adds a quirk to override the VPD size to a bigger value.
>>>
>>> This is the controller:
>>> Ethernet controller [0200]: Chelsio Communications Inc T310 10GbE Single 
>>> Port Adapter [1425:0030]
>>>
>>> This is its VPD:
>>> # Large item 42 bytes; name 0x2 Identifier String
>>>   b'10 Gigabit Ethernet-SR PCI Express Adapter'
>>> #002d Large item 74 bytes; name 0x10
>>>   #00 [EC] len=7: b'D76809 '
>>>   #0a [FN] len=7: b'46K7897'
>>>   #14 [PN] len=7: b'46K7897'
>>>   #1e [MN] len=4: b'1037'
>>>   #25 [FC] len=4: b'5769'
>>>   #2c [SN] len=12: b'YL102035603V'
>>>   #3b [NA] len=12: b'00145E992ED1'
>>> #007a Small item 1 bytes; name 0xf End Tag
>>> ---
>>> #0c00 Large item 16 bytes; name 0x2 Identifier String
>>>   b'S310E-SR-X  '
>>> #0c13 Large item 234 bytes; name 0x10
>>>   #00 [PN] len=16: b'TBD '
>>>   #13 [EC] len=16: b'110107730D2 '
>>>   #26 [SN] len=16: b'97YL102035603V  '
>>>   #39 [NA] len=12: b'00145E992ED1'
>>>   #48 [V0] len=6: b'175000'
>>>   #51 [V1] len=6: b'26'
>>>   #5a [V2] len=6: b'26'
>>>   #63 [V3] len=6: b'2000  '
>>>   #6c [V4] len=2: b'1 '
>>>   #71 [V5] len=6: b'c2'
>>>   #7a [V6] len=6: b'0 '
>>>   #83 [V7] len=2: b'1 '
>>>   #88 [V8] len=2: b'0 '
>>>   #8d [V9] len=2: b'0 '
>>>   #92 [VA] len=2: b'0 '
>>>   #97 [RV] len=80: 
>>> b's\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
>>> #0d00 Large item 252 bytes; name 0x11
>>>   #00 [VC] len=16: b'122310_1222 dp  '
>>>   #13 [VD] len=16: b'610-0001-00 H1\x00\x00'
>>>   #26 [VE] len=16: b'122310_1353 fp  '
>>>   #39 [VF] len=16: b'610-0001-00 H1\x00\x00'
>>>   #4c [RW] len=173: 
>>> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
>>> #0dff Small item 0 bytes; name 0xf End Tag
>>>
>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>>> ---
>>>  drivers/pci/quirks.c | 12 
>>>  1 file changed, 12 insertions(+)
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index ee72ebe1..94d3fb5 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -3241,6 +3241,18 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
>>> PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C
>>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
>>> PCI_DEVICE_ID_INTEL_PORT_RIDGE,
>>>   quirk_thunderbolt_hotplug_msi);
>>>
>>> +static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
>>> +{
>>> + if (!dev->vpd || !dev->vpd->ops || !dev->vpd->ops->set_size)
>>> + return;
>>> +
>>> + dev->vpd->ops->set_size(dev, max_t(unsigned int, dev->vpd->len, 
>>> 0xe00));
>>> +}
>>> +
> 
> So one thing you might want to look at doing is actually validating
> there is something there before increasing the size.  If you look at
> the get_vpd_params function from the cxgb3 driver you will see what
> they do is verify the first tag located at 0xC00 is 0x82 before they
> do any further reads.  You might do something similar just to verify
> there is something there before you open it up to access by anyone.
> 
> One option would be to modify pci_vpd_size so that you can use it
> outside of access.c and can pass it an offset.  Then you could update
> your quirk so that you call pci_vpd_size and pass it the offset of
> 0xC00.  It should then be able to

Re: [PATCHv2 3/4] pci: Determine actual VPD size on first access

2016-08-15 Thread Alexey Kardashevskiy
On 12/08/16 04:52, Alexander Duyck wrote:
> On Wed, Aug 10, 2016 at 4:54 PM, Benjamin Herrenschmidt
>  wrote:
>> On Wed, 2016-08-10 at 08:47 -0700, Alexander Duyck wrote:
>>>
>>> The problem is if we don't do this it becomes possible for a guest to
>>> essentially cripple a device on the host by just accessing VPD
>>> regions that aren't actually viable on many devices.
>>
>> And ? We already can cripple the device in so many different ways
>> simpy because we have pretty much full BAR access to it...
>>
>>>  We are much better off
>>> in terms of security and stability if we restrict access to what
>>> should be accessible.
>>
>> Bollox. I've heard that argument over and over again, it never stood
>> and still doesn't.
>>
>> We have full BAR access for god sake. We can already destroy the device
>> in many cases (think: reflashing microcode, internal debug bus access
>> with a route to the config space, voltage/freq control ).
>>
>> We aren't protecting anything more here, we are just adding layers of
>> bloat, complication and bugs.
> 
> To some extent I can agree with you.  I don't know if we should be
> restricting the VFIO based interface the same way we restrict systemd
> from accessing this region.  In the case of VFIO maybe we need to look
> at a different approach for accessing this.  Perhaps we need a
> privileged version of the VPD accessors that could be used by things
> like VFIO and the cxgb3 driver since they are assumed to be a bit
> smarter than those interfaces that were just trying to slurp up
> something like 4K of VPD data.
> 
>>>  In this case what has happened is that the
>>> vendor threw in an extra out-of-spec block and just expected it to
>>> work.
>>
>> Like vendors do all the time in all sort of places
>>
>> I still completely fail to see the point in acting as a filtering
>> middle man.
> 
> The problem is we are having to do some filtering because things like
> systemd were using dumb accessors that were trying to suck down 4K of
> VPD data instead of trying to parse through and read it a field at a
> time.
> 
>>> In order to work around it we just need to add a small function
>>> to drivers/pci/quirks.c that would update the VPD size reported so
>>> that it matches what the hardware is actually providing instead of
>>> what we can determine based on the VPD layout.
>>>
>>> Really working around something like this is not much different than
>>> what we would have to do if the vendor had stuffed the data in some
>>> reserved section of their PCI configuration space.
>>
>> It is, in both cases we shouldn't have VFIO or the host involved. We
>> should just let the guest config space accesses go through.
>>
>>>   We end up needing
>>> to add special quirks any time a vendor goes out-of-spec for some
>>> one-off configuration interface that only they are ever going to use.
>>
>> Cheers,
>> Ben.
> 
> If you have a suggestion on how to resolve this patches are always
> welcome.  Otherwise I think the simpler approach to fixing this
> without re-introducing the existing bugs is to just add the quirk.  I
> will try to get to it sometime this weekend if nobody else does.  It
> should be pretty straight foward, but I just don't have the time to
> pull up a kernel and generate a patch right now.


How exactly is mine - https://lkml.org/lkml/2016/8/11/200 - bad (except
missing rb/ab from Chelsio folks)? Thanks.


-- 
Alexey


[RFC PATCH kernel] PCI: Enable access to custom VPD for Chelsio devices (cxgb3)

2016-08-11 Thread Alexey Kardashevskiy
There is at least one Chelsio 10Gb card which uses VPD area to store
some custom blocks (example below). However pci_vpd_size() returns
the length of the first block only assuming that there can be only
one VPD "End Tag" and VFIO blocks access beyond that offset
(since 4e1a63555) which leads to the situation when the guest "cxgb3"
driver fails to probe the device. The host system does not have this
problem as the drives accesses the config space directly without
pci_read_vpd()/...

This adds a quirk to override the VPD size to a bigger value.

This is the controller:
Ethernet controller [0200]: Chelsio Communications Inc T310 10GbE Single Port 
Adapter [1425:0030]

This is its VPD:
# Large item 42 bytes; name 0x2 Identifier String
b'10 Gigabit Ethernet-SR PCI Express Adapter'
#002d Large item 74 bytes; name 0x10
#00 [EC] len=7: b'D76809 '
#0a [FN] len=7: b'46K7897'
#14 [PN] len=7: b'46K7897'
#1e [MN] len=4: b'1037'
#25 [FC] len=4: b'5769'
#2c [SN] len=12: b'YL102035603V'
#3b [NA] len=12: b'00145E992ED1'
#007a Small item 1 bytes; name 0xf End Tag
---
#0c00 Large item 16 bytes; name 0x2 Identifier String
b'S310E-SR-X  '
#0c13 Large item 234 bytes; name 0x10
#00 [PN] len=16: b'TBD '
#13 [EC] len=16: b'110107730D2 '
#26 [SN] len=16: b'97YL102035603V  '
#39 [NA] len=12: b'00145E992ED1'
#48 [V0] len=6: b'175000'
#51 [V1] len=6: b'26'
#5a [V2] len=6: b'26'
#63 [V3] len=6: b'2000  '
#6c [V4] len=2: b'1 '
#71 [V5] len=6: b'c2'
#7a [V6] len=6: b'0 '
#83 [V7] len=2: b'1 '
#88 [V8] len=2: b'0 '
#8d [V9] len=2: b'0 '
#92 [VA] len=2: b'0 '
#97 [RV] len=80: 
b's\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
#0d00 Large item 252 bytes; name 0x11
#00 [VC] len=16: b'122310_1222 dp  '
#13 [VD] len=16: b'610-0001-00 H1\x00\x00'
#26 [VE] len=16: b'122310_1353 fp  '
#39 [VF] len=16: b'610-0001-00 H1\x00\x00'
#4c [RW] len=173: 
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'...
#0dff Small item 0 bytes; name 0xf End Tag

Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
---
 drivers/pci/quirks.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index ee72ebe1..94d3fb5 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3241,6 +3241,18 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 
PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PORT_RIDGE,
quirk_thunderbolt_hotplug_msi);
 
+static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
+{
+   if (!dev->vpd || !dev->vpd->ops || !dev->vpd->ops->set_size)
+   return;
+
+   dev->vpd->ops->set_size(dev, max_t(unsigned int, dev->vpd->len, 0xe00));
+}
+
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO,
+   PCI_ANY_ID,
+   quirk_chelsio_extend_vpd);
+
 #ifdef CONFIG_ACPI
 /*
  * Apple: Shutdown Cactus Ridge Thunderbolt controller.
-- 
2.5.0.rc3



Re: [PATCH net-next] net/mlx4_core: Fix backward compatibility on VFs

2016-03-21 Thread Alexey Kardashevskiy

On 03/22/2016 12:56 AM, Eli Cohen wrote:

On Mon, Mar 21, 2016 at 04:02:16PM +1100, Alexey Kardashevskiy wrote:


After more tries, I found that if for whatever reason mlx4_core
fails to stop while shutting the guest down (last message is
"mlx4_core :00:00.0: mlx4_shutdown was called"), then next time
VF in guest won't start.

Example #1:

mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
mlx4_core: Initializing :00:00.0
mlx4_core :00:00.0: enabling device ( -> 0002)
mlx4_core :00:00.0: Detected virtual function - running in slave mode
mlx4_core :00:00.0: Sending reset
mlx4_core :00:00.0: Sending vhcr0
mlx4_core :00:00.0: HCA minimum page size:1
mlx4_core :00:00.0: UAR size:4096 != kernel PAGE_SIZE of 65536
mlx4_core :00:00.0: Failed to obtain slave caps


Alexey, can you verify that the value of the enable_4k_uar parameter
is false?


aik@fstn1-p1:~$ cat 
/sys/bus/pci/drivers/mlx4_core/module/parameters/enable_4k_uar

N
aik@fstn1-p1:~$







Example #2:

root@le-dbg:~# dhclient eth0
NETDEV WATCHDOG: eth0 (mlx4_core): transmit queue 11 timed out
[ cut here ]
WARNING: at /home/aik/p/guest-kernel/net/sched/sch_generic.c:303

and no IP assigned, timed out.


This is fixed by the guest restart, first restart might not help,
then the second restart will.

The host is running the latest upstream plus the patch I am replying
to. The guest is using initramdisk from debian bootstrap and vanilla
v4.2 kernel, ppc64le arch, POWER8 chip, QEMU is running with 1 CPU
and 2GB of RAM.

Does this look any familiar?



This is completely unrelated to the compatibility problem you reported
and which this patch addresses. We will reproduce in house and post a
fix.



Example #2 is but example #1 mentions "UAR size" :)



--
Alexey


Re: [PATCH net-next] net/mlx4_core: Fix backward compatibility on VFs

2016-03-20 Thread Alexey Kardashevskiy

On 03/18/2016 08:45 PM, Alexey Kardashevskiy wrote:

On 03/18/2016 03:49 AM, Eli Cohen wrote:

Commit 85743f1eb345 ("net/mlx4_core: Set UAR page size to 4KB regardless
of system page size") introduced dependency where old VF drivers without
this fix fail to load if the PF driver runs with this commit.

To resolve this add a module parameter which disables that functionality
by default.  If both the PF and VFs are running with a driver with that
commit the administrator may set the module param to true.

The module parameter is called enable_4k_uar.

Fixes: 85743f1eb345 ('net/mlx4_core: Set UAR page size to 4KB ...')
Signed-off-by: Eli Cohen <e...@mellanox.com>


Thanks!


After more tries, I found that if for whatever reason mlx4_core fails to 
stop while shutting the guest down (last message is "mlx4_core 
:00:00.0: mlx4_shutdown was called"), then next time VF in guest won't 
start.


Example #1:

mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
mlx4_core: Initializing :00:00.0
mlx4_core :00:00.0: enabling device ( -> 0002)
mlx4_core :00:00.0: Detected virtual function - running in slave mode
mlx4_core :00:00.0: Sending reset
mlx4_core :00:00.0: Sending vhcr0
mlx4_core :00:00.0: HCA minimum page size:1
mlx4_core :00:00.0: UAR size:4096 != kernel PAGE_SIZE of 65536
mlx4_core :00:00.0: Failed to obtain slave caps

Example #2:

root@le-dbg:~# dhclient eth0
NETDEV WATCHDOG: eth0 (mlx4_core): transmit queue 11 timed out
[ cut here ]
WARNING: at /home/aik/p/guest-kernel/net/sched/sch_generic.c:303

and no IP assigned, timed out.


This is fixed by the guest restart, first restart might not help, then the 
second restart will.


The host is running the latest upstream plus the patch I am replying to. 
The guest is using initramdisk from debian bootstrap and vanilla v4.2 
kernel, ppc64le arch, POWER8 chip, QEMU is running with 1 CPU and 2GB of RAM.


Does this look any familiar?






Tested-by: Alexey Kardashevskiy <a...@ozlabs.ru>





---
  drivers/net/ethernet/mellanox/mlx4/main.c | 24 ++--
  1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c
b/drivers/net/ethernet/mellanox/mlx4/main.c
index 503ec23e84cc..358f7230da58 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -105,6 +105,11 @@ module_param(enable_64b_cqe_eqe, bool, 0444);
  MODULE_PARM_DESC(enable_64b_cqe_eqe,
   "Enable 64 byte CQEs/EQEs when the FW supports this (default:
True)");

+static bool enable_4k_uar;
+module_param(enable_4k_uar, bool, 0444);
+MODULE_PARM_DESC(enable_4k_uar,
+ "Enable using 4K UAR. Should not be enabled if have VFs which
do not support 4K UARs (default: false)");
+
  #define PF_CONTEXT_BEHAVIOUR_MASK(MLX4_FUNC_CAP_64B_EQE_CQE | \
   MLX4_FUNC_CAP_EQE_CQE_STRIDE | \
   MLX4_FUNC_CAP_DMFS_A0_STATIC)
@@ -423,7 +428,11 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct
mlx4_dev_cap *dev_cap)
  /* Virtual PCI function needs to determine UAR page size from
   * firmware. Only master PCI function can set the uar page size
   */
-dev->uar_page_shift = DEFAULT_UAR_PAGE_SHIFT;
+if (enable_4k_uar)
+dev->uar_page_shift = DEFAULT_UAR_PAGE_SHIFT;
+else
+dev->uar_page_shift = PAGE_SHIFT;
+
  mlx4_set_num_reserved_uars(dev, dev_cap);
  }

@@ -2233,11 +2242,14 @@ static int mlx4_init_hca(struct mlx4_dev *dev)

  dev->caps.max_fmr_maps = (1 << (32 -
ilog2(dev->caps.num_mpts))) - 1;

-/* Always set UAR page size 4KB, set log_uar_sz accordingly */
-init_hca.log_uar_sz = ilog2(dev->caps.num_uars) +
-  PAGE_SHIFT -
-  DEFAULT_UAR_PAGE_SHIFT;
-init_hca.uar_page_sz = DEFAULT_UAR_PAGE_SHIFT - 12;
+if (enable_4k_uar) {
+init_hca.log_uar_sz = ilog2(dev->caps.num_uars) +
+PAGE_SHIFT - DEFAULT_UAR_PAGE_SHIFT;
+init_hca.uar_page_sz = DEFAULT_UAR_PAGE_SHIFT - 12;
+} else {
+init_hca.log_uar_sz = ilog2(dev->caps.num_uars);
+init_hca.uar_page_sz = PAGE_SHIFT - 12;
+}

  init_hca.mw_enabled = 0;
  if (dev->caps.flags & MLX4_DEV_CAP_FLAG_MEM_WINDOW ||







--
Alexey


Re: [PATCH net-next] net/mlx4_core: Fix backward compatibility on VFs

2016-03-19 Thread Alexey Kardashevskiy

On 03/18/2016 03:49 AM, Eli Cohen wrote:

Commit 85743f1eb345 ("net/mlx4_core: Set UAR page size to 4KB regardless
of system page size") introduced dependency where old VF drivers without
this fix fail to load if the PF driver runs with this commit.

To resolve this add a module parameter which disables that functionality
by default.  If both the PF and VFs are running with a driver with that
commit the administrator may set the module param to true.

The module parameter is called enable_4k_uar.

Fixes: 85743f1eb345 ('net/mlx4_core: Set UAR page size to 4KB ...')
Signed-off-by: Eli Cohen <e...@mellanox.com>


Thanks!


Tested-by: Alexey Kardashevskiy <a...@ozlabs.ru>





---
  drivers/net/ethernet/mellanox/mlx4/main.c | 24 ++--
  1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c 
b/drivers/net/ethernet/mellanox/mlx4/main.c
index 503ec23e84cc..358f7230da58 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -105,6 +105,11 @@ module_param(enable_64b_cqe_eqe, bool, 0444);
  MODULE_PARM_DESC(enable_64b_cqe_eqe,
 "Enable 64 byte CQEs/EQEs when the FW supports this (default: 
True)");

+static bool enable_4k_uar;
+module_param(enable_4k_uar, bool, 0444);
+MODULE_PARM_DESC(enable_4k_uar,
+"Enable using 4K UAR. Should not be enabled if have VFs which do 
not support 4K UARs (default: false)");
+
  #define PF_CONTEXT_BEHAVIOUR_MASK (MLX4_FUNC_CAP_64B_EQE_CQE | \
 MLX4_FUNC_CAP_EQE_CQE_STRIDE | \
 MLX4_FUNC_CAP_DMFS_A0_STATIC)
@@ -423,7 +428,11 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct 
mlx4_dev_cap *dev_cap)
/* Virtual PCI function needs to determine UAR page size from
 * firmware. Only master PCI function can set the uar page size
 */
-   dev->uar_page_shift = DEFAULT_UAR_PAGE_SHIFT;
+   if (enable_4k_uar)
+   dev->uar_page_shift = DEFAULT_UAR_PAGE_SHIFT;
+   else
+   dev->uar_page_shift = PAGE_SHIFT;
+
mlx4_set_num_reserved_uars(dev, dev_cap);
}

@@ -2233,11 +2242,14 @@ static int mlx4_init_hca(struct mlx4_dev *dev)

dev->caps.max_fmr_maps = (1 << (32 - 
ilog2(dev->caps.num_mpts))) - 1;

-   /* Always set UAR page size 4KB, set log_uar_sz accordingly */
-   init_hca.log_uar_sz = ilog2(dev->caps.num_uars) +
- PAGE_SHIFT -
- DEFAULT_UAR_PAGE_SHIFT;
-   init_hca.uar_page_sz = DEFAULT_UAR_PAGE_SHIFT - 12;
+   if (enable_4k_uar) {
+   init_hca.log_uar_sz = ilog2(dev->caps.num_uars) +
+   PAGE_SHIFT - 
DEFAULT_UAR_PAGE_SHIFT;
+   init_hca.uar_page_sz = DEFAULT_UAR_PAGE_SHIFT - 12;
+   } else {
+   init_hca.log_uar_sz = ilog2(dev->caps.num_uars);
+   init_hca.uar_page_sz = PAGE_SHIFT - 12;
+   }

init_hca.mw_enabled = 0;
if (dev->caps.flags & MLX4_DEV_CAP_FLAG_MEM_WINDOW ||




--
Alexey


Re: [RFC PATCH kernel] Revert "net/mlx4_core: Set UAR page size to 4KB regardless of system page size"

2016-03-18 Thread Alexey Kardashevskiy

On 03/16/2016 08:45 PM, Or Gerlitz wrote:

On Wed, Mar 16, 2016 at 10:34 AM, Alexey Kardashevskiy <a...@ozlabs.ru> wrote:


Oh. ok. It also looks like even with the reverted patch, mlx4 VF does not
work in a guest:


So where is the breakage point for you? does 4.4 works? if not, what?


Ah, my bad. It is unrelated to the kernel version.

I tried passing a PF to a guest while its VFs are already passed to another 
guest and see how exactly it blows up (AER/EEH were thrown but the host 
recovered => good) but this left the device in a weird state when I could 
not use VF in a guest anymore but it seemed to keep working on the host.


It seems like the actual adapter does not reset completely when the machine 
is rebooted, I had unplug/replug power cables to fix this.



--
Alexey


Re: [RFC PATCH kernel] Revert "net/mlx4_core: Set UAR page size to 4KB regardless of system page size"

2016-03-16 Thread Alexey Kardashevskiy

On 03/16/2016 05:09 PM, Eli Cohen wrote:

On Wed, Mar 16, 2016 at 04:49:00PM +1100, Alexey Kardashevskiy wrote:

On 03/16/2016 04:10 PM, Eli Cohen wrote:

On Wed, Mar 16, 2016 at 01:07:58PM +1100, Alexey Kardashevskiy wrote:


So with v4.5 as a host, there is no actual distro available today to
use as a guest in the next 6 months (or whatever it takes to
backport this partucular patch back there).

You could have added a module parameter to enforce the old behavoir,
at least...

And sorry but from the original commit log I could not understand
why exactly all existing guests need to be broken. Could you please
point me to a piece of documentation describing all this UAR
bisuness (what is UAR, why 128 UARs are required and for what, etc).
Thanks.



We are going to send a patch that fixes this using a module parameter.
The patch will be on top of Huy's patch.

Some background to the problem: mlx4 supported devices require 128 UAR


What does UAR stand for?

User Access Region. It's the way you interface with the hardware.



pages from PCI memory space defined by BAR2-3. Each UAR page can be
any power of 2 value from 4K up to 64K. Before Huy's patch the driver
chose UAR page size to be equal to system page size. Since PowerPC's
page size is 64K this means minimum requirement of UAR pages is not
met (default UAR BAR is 8MB and only half of it is really reserved for
UARs).


And what was the downside? afaict the performance was good...



It's not a performance issue. Defining 64KB for a UAR is not required
and wastes pci memory mapped i/o space.




More details can be found in the programmer's manual.


Can you please point me to this manual on the website? I tried,
honestly, could not find it. Thanks.


It's not publically available. If you have an FAE that work with your
company you can ask him how to get the doc.



Oh. ok. It also looks like even with the reverted patch, mlx4 VF does not 
work in a guest:


root@le-dbg:~# dhclient eth0
mlx4_en: eth0:   frag:0 - size:1518 prefix:0 stride:1536

mlx4_core :00:00.0: Internal error detected on the communication channel
mlx4_core :00:00.0: device is going to be reset
mlx4_core :00:00.0: VF reset is not needed
mlx4_core :00:00.0: device was reset successfully
mlx4_en :00:00.0: Internal error detected, restarting device
mlx4_core :00:00.0: command 0x5 failed: fw status = 0x1
mlx4_core :00:00.0: Failed to close slave function
mlx4_core :00:00.0: Detected virtual function - running in slave mode
mlx4_core :00:00.0: Sending reset


mlx4_core :00:00.0: slave is currently in the middle of FLR - Deferring 
probe
mlx4_core :00:00.0: mlx4_restart_one: ERROR: mlx4_load_one failed, 
pci_name=:00:00.0, err=-517

mlx4_core :00:00.0: mlx4_restart_one was ended, ret=-517

root@le-dbg:~# ifconfig -a
loLink encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  UP LOOPBACK RUNNING  MTU:65536  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

root@le-dbg:~# lspci -v
00:00.0 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family 
[ConnectX-3/ConnectX-3 Pro Virtual Function]

Subsystem: IBM Device 61b0
Physical Slot: C16
Flags: bus master, fast devsel, latency 0
Memory at 1012000 (64-bit, prefetchable) [size=64M]
Capabilities: [60] Express Endpoint, MSI 00
Capabilities: [9c] MSI-X: Enable- Count=52 Masked-
Capabilities: [40] Power Management version 0
Kernel driver in use: mlx4_core



--
Alexey


Re: [RFC PATCH kernel] Revert "net/mlx4_core: Set UAR page size to 4KB regardless of system page size"

2016-03-15 Thread Alexey Kardashevskiy

On 03/16/2016 04:10 PM, Eli Cohen wrote:

On Wed, Mar 16, 2016 at 01:07:58PM +1100, Alexey Kardashevskiy wrote:


So with v4.5 as a host, there is no actual distro available today to
use as a guest in the next 6 months (or whatever it takes to
backport this partucular patch back there).

You could have added a module parameter to enforce the old behavoir,
at least...

And sorry but from the original commit log I could not understand
why exactly all existing guests need to be broken. Could you please
point me to a piece of documentation describing all this UAR
bisuness (what is UAR, why 128 UARs are required and for what, etc).
Thanks.



We are going to send a patch that fixes this using a module parameter.
The patch will be on top of Huy's patch.

Some background to the problem: mlx4 supported devices require 128 UAR


What does UAR stand for?


pages from PCI memory space defined by BAR2-3. Each UAR page can be
any power of 2 value from 4K up to 64K. Before Huy's patch the driver
chose UAR page size to be equal to system page size. Since PowerPC's
page size is 64K this means minimum requirement of UAR pages is not
met (default UAR BAR is 8MB and only half of it is really reserved for
UARs).


And what was the downside? afaict the performance was good...



More details can be found in the programmer's manual.


Can you please point me to this manual on the website? I tried, honestly, 
could not find it. Thanks.



--
Alexey


Re: [RFC PATCH kernel] Revert "net/mlx4_core: Set UAR page size to 4KB regardless of system page size"

2016-03-15 Thread Alexey Kardashevskiy

On 03/15/2016 09:40 PM, Or Gerlitz wrote:

On Tue, Mar 15, 2016 at 12:19 PM, Alexey Kardashevskiy <a...@ozlabs.ru> wrote:

This reverts commit 85743f1eb34548ba4b056d2f184a3d107a3b8917.

Without this revert, POWER "pseries" KVM guests with a VF passed to a guest
using VFIO fail to bring the driver up:

mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
mlx4_core: Initializing :00:00.0
mlx4_core :00:00.0: enabling device ( -> 0002)
mlx4_core :00:00.0: Detected virtual function - running in slave mode
mlx4_core :00:00.0: Sending reset
mlx4_core :00:00.0: Sending vhcr0
mlx4_core :00:00.0: HCA minimum page size:512
mlx4_core :00:00.0: UAR size:4096 != kernel PAGE_SIZE of 65536
mlx4_core :00:00.0: Failed to obtain slave caps



Both host and guest use 64K system pages.

How to fix this properly? Thanks.


The commit message says:

"[..] Regarding backward compatibility in SR-IOV, if hypervisor has
this new code, the virtual OS must be updated. [...]"



So with v4.5 as a host, there is no actual distro available today to use as 
a guest in the next 6 months (or whatever it takes to backport this 
partucular patch back there).


You could have added a module parameter to enforce the old behavoir, at 
least...


And sorry but from the original commit log I could not understand why 
exactly all existing guests need to be broken. Could you please point me to 
a piece of documentation describing all this UAR bisuness (what is UAR, why 
128 UARs are required and for what, etc). Thanks.



--
Alexey


Re: [RFC PATCH kernel] Revert "net/mlx4_core: Set UAR page size to 4KB regardless of system page size"

2016-03-15 Thread Alexey Kardashevskiy

On 03/16/2016 02:29 AM, Christoph Hellwig wrote:

On Tue, Mar 15, 2016 at 04:23:33PM +0200, Or Gerlitz wrote:

Let us check. I was under (the maybe wrong) impression, that before this
patch both PF/VF drivers were not operative on some systems, so on those
systems it's fair to require the VF driver to be patched too.


To me it sounds like the system worked before. Alexey, can you confirm?


It worked just fine for year(s), this is definitely regression.


--
Alexey


[RFC PATCH kernel] Revert "net/mlx4_core: Set UAR page size to 4KB regardless of system page size"

2016-03-15 Thread Alexey Kardashevskiy
This reverts commit 85743f1eb34548ba4b056d2f184a3d107a3b8917.

Without this revert, POWER "pseries" KVM guests with a VF passed to a guest
using VFIO fail to bring the driver up:

mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
mlx4_core: Initializing :00:00.0
mlx4_core :00:00.0: enabling device ( -> 0002)
mlx4_core :00:00.0: Detected virtual function - running in slave mode
mlx4_core :00:00.0: Sending reset
mlx4_core :00:00.0: Sending vhcr0
mlx4_core :00:00.0: HCA minimum page size:512
mlx4_core :00:00.0: UAR size:4096 != kernel PAGE_SIZE of 65536
mlx4_core :00:00.0: Failed to obtain slave caps


Both host and guest use 64K system pages.

How to fix this properly? Thanks.



---
 drivers/infiniband/hw/mlx4/qp.c   |  7 +--
 drivers/net/ethernet/mellanox/mlx4/cq.c   |  4 +-
 drivers/net/ethernet/mellanox/mlx4/en_resources.c |  3 +-
 drivers/net/ethernet/mellanox/mlx4/en_tx.c|  4 +-
 drivers/net/ethernet/mellanox/mlx4/eq.c   |  7 ++-
 drivers/net/ethernet/mellanox/mlx4/main.c | 56 +--
 drivers/net/ethernet/mellanox/mlx4/pd.c   | 12 ++---
 include/linux/mlx4/device.h   | 13 --
 8 files changed, 22 insertions(+), 84 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index fd97534..bc5536f 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1681,12 +1681,9 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
}
 
if (qp->ibqp.uobject)
-   context->usr_page = cpu_to_be32(
-   mlx4_to_hw_uar_index(dev->dev,
-
to_mucontext(ibqp->uobject->context)->uar.index));
+   context->usr_page = 
cpu_to_be32(to_mucontext(ibqp->uobject->context)->uar.index);
else
-   context->usr_page = cpu_to_be32(
-   mlx4_to_hw_uar_index(dev->dev, dev->priv_uar.index));
+   context->usr_page = cpu_to_be32(dev->priv_uar.index);
 
if (attr_mask & IB_QP_DEST_QPN)
context->remote_qpn = cpu_to_be32(attr->dest_qp_num);
diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c 
b/drivers/net/ethernet/mellanox/mlx4/cq.c
index a849da9..3348e64 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
@@ -318,9 +318,7 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent,
if (timestamp_en)
cq_context->flags  |= cpu_to_be32(1 << 19);
 
-   cq_context->logsize_usrpage =
-   cpu_to_be32((ilog2(nent) << 24) |
-   mlx4_to_hw_uar_index(dev, uar->index));
+   cq_context->logsize_usrpage = cpu_to_be32((ilog2(nent) << 24) | 
uar->index);
cq_context->comp_eqn= 
priv->eq_table.eq[MLX4_CQ_TO_EQ_VECTOR(vector)].eqn;
cq_context->log_page_size   = mtt->page_shift - MLX4_ICM_PAGE_SHIFT;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_resources.c 
b/drivers/net/ethernet/mellanox/mlx4/en_resources.c
index 02e925d..12aab5a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_resources.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_resources.c
@@ -58,8 +58,7 @@ void mlx4_en_fill_qp_context(struct mlx4_en_priv *priv, int 
size, int stride,
} else {
context->sq_size_stride = ilog2(TXBB_SIZE) - 4;
}
-   context->usr_page = cpu_to_be32(mlx4_to_hw_uar_index(mdev->dev,
-   mdev->priv_uar.index));
+   context->usr_page = cpu_to_be32(mdev->priv_uar.index);
context->local_qpn = cpu_to_be32(qpn);
context->pri_path.ackto = 1 & 0x07;
context->pri_path.sched_queue = 0x83 | (priv->port - 1) << 6;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index e0946ab..4421bf5 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -213,9 +213,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
mlx4_en_fill_qp_context(priv, ring->size, ring->stride, 1, 0, ring->qpn,
ring->cqn, user_prio, >context);
if (ring->bf_alloced)
-   ring->context.usr_page =
-   cpu_to_be32(mlx4_to_hw_uar_index(mdev->dev,
-ring->bf.uar->index));
+   ring->context.usr_page = cpu_to_be32(ring->bf.uar->index);
 
err = mlx4_qp_to_ready(mdev->dev, >wqres.mtt, >context,
   >qp, >qp_state);
diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c 
b/drivers/net/ethernet/mellanox/mlx4/eq.c
index f613977..4696053 100644
--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
@@ -940,10 +940,9 @@ static void __iomem *mlx4_get_eq_uar(struct mlx4_dev *dev, 
struct mlx4_eq *eq)
 

Re: [RFC PATCH kernel] Revert "net/mlx4_core: Add port attribute when tracking counters"

2015-09-22 Thread Alexey Kardashevskiy

On 09/20/2015 11:51 PM, Or Gerlitz wrote:

On Tue, Sep 15, 2015 at 1:41 PM, Alexey Kardashevskiy <a...@ozlabs.ru> wrote:

Any luck with that?


I am checking with the team if they can set a PPC node to try and
reproduce the crash, on x86 they don't see it.


Somehow I cannot reproduce it anymore on v4.2 kernel which is quite 
disturbing. I'll get back as soon as I see this again...



--
Alexey
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH kernel] Revert "net/mlx4_core: Add port attribute when tracking counters"

2015-09-15 Thread Alexey Kardashevskiy

Any luck with that?


On 09/04/2015 01:36 PM, Alexey Kardashevskiy wrote:

On 09/03/2015 10:09 PM, eran ben elisha wrote:

On Mon, Aug 31, 2015 at 5:39 AM, Alexey Kardashevskiy <a...@ozlabs.ru> wrote:

On 08/30/2015 04:28 PM, Or Gerlitz wrote:


On Fri, Aug 28, 2015 at 7:06 AM, Alexey Kardashevskiy <a...@ozlabs.ru>
wrote:


68230242cdb breaks SRIOV on POWER8 system. I am not really suggesting
reverting the patch, rather asking for a fix.



thanks for the detailed report, we will look into that.

Just to be sure, when going back in time, what is the latest upstream
version where
this system/config works okay? is that 4.1 or later?



4.1 is good, 4.2 is not.







To reproduce it:

1. boot latest upstream kernel (v4.2-rc8 sha1 4941b8f, ppc64le)

2. Run:
sudo rmmod mlx4_en mlx4_ib mlx4_core
sudo modprobe mlx4_core num_vfs=4 probe_vf=4 port_type_array=2,2
debug_level=1

3. Run QEMU (just to give a complete picture):
/home/aik/qemu-system-ppc64 -enable-kvm -m 2048 -machine pseries \
-nodefaults \
-chardev stdio,id=id0,signal=off,mux=on \
-device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
-mon id=id2,chardev=id0,mode=readline -nographic -vga none \
-initrd dhclient.cpio -kernel vml400bedbg \
-device vfio-pci,id=id3,host=0003:03:00.1
What guest is used does not matter at all.

4. Wait till guest boots and then run:
dhclient
This assigns IPs to both interfaces just fine. This is essential -
if interface was not brought up since guest started, the bug does not
appear.
If interface was up and then down, this still causes the problem
(less likely though).

5. Run in the guest: shutdown -h 0
Guest prints:
mlx4_en: eth0: Close port called
mlx4_en: eth1: Close port called
mlx4_core :00:00.0: mlx4_shutdown was called
And then the host hangs. After 10-30 seconds the host console prints:
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
[qemu-system-ppc:5095]
OR
INFO: rcu_sched detected stalls on CPUs/tasks:
or some other random stuff but always related to some sort of lockup.
Backtraces are like these:

[c01e492a7ac0] [c0135b84]
smp_call_function_many+0x2f4/0x3fable)
[c01e492a7b40] [c0135db8] kick_all_cpus_sync+0x38/0x50
[c01e492a7b60] [c0048f38] pmdp_huge_get_and_clear+0x48/0x70
[c01e492a7b90] [c023181c] change_huge_pmd+0xac/0x210
[c01e492a7bf0] [c01fb9e8] change_protection+0x678/0x720
[c01e492a7d00] [c0217d38] change_prot_numa+0x28/0xa0
[c01e492a7d30] [c00e0e40] task_numa_work+0x2a0/0x370
[c01e492a7db0] [c00c5fb4] task_work_run+0xe4/0x160
[c01e492a7e00] [c00169a4] do_notify_resume+0x84/0x90
[c01e492a7e30] [c00098b8] ret_from_except_lite+0x64/0x68

OR

[c01def1b7280] [c00ff941d368] 0xc00ff941d368 (unreliable)
[c01def1b7450] [c001512c] __switch_to+0x1fc/0x350
[c01def1b7490] [c01def1b74e0] 0xc01def1b74e0
[c01def1b74e0] [c011a50c] try_to_del_timer_sync+0x5c/0x90
[c01def1b7520] [c011a590] del_timer_sync+0x50/0x70
[c01def1b7550] [c09136fc] schedule_timeout+0x15c/0x2b0
[c01def1b7620] [c0910e6c] wait_for_common+0x12c/0x230
[c01def1b7660] [c00fa22c] up+0x4c/0x80
[c01def1b76a0] [d00016323e60] __mlx4_cmd+0x320/0x940 [mlx4_core]
[c01def1b7760] [c01def1b77a0] 0xc01def1b77a0
[c01def1b77f0] [d000163528b4] mlx4_2RST_QP_wrapper+0x154/0x1e0
[mlx4_core]
[c01def1b7860] [d00016324934]
mlx4_master_process_vhcr+0x1b4/0x6c0 [mlx4_core]
[c01def1b7930] [d00016324170] __mlx4_cmd+0x630/0x940 [mlx4_core]
[c01def1b79f0] [d00016346fec]
__mlx4_qp_modify.constprop.8+0x1ec/0x350 [mlx4_core]
[c01def1b7ac0] [d00016292228] mlx4_ib_destroy_qp+0xd8/0x5d0
[mlx4_ib]
[c01def1b7b60] [d00013c7305c] ib_destroy_qp+0x1cc/0x290 [ib_core]
[c01def1b7bb0] [d00016284548]
destroy_pv_resources.isra.14.part.15+0x48/0xf0 [mlx4_ib]
[c01def1b7be0] [d00016284d28] mlx4_ib_tunnels_update+0x168/0x170
[mlx4_ib]
[c01def1b7c20] [d000162876e0]
mlx4_ib_tunnels_update_work+0x30/0x50 [mlx4_ib]
[c01def1b7c50] [c00c0d34] process_one_work+0x194/0x490
[c01def1b7ce0] [c00c11b0] worker_thread+0x180/0x5a0
[c01def1b7d80] [c00c8a0c] kthread+0x10c/0x130
[c01def1b7e30] [c00095a8] ret_from_kernel_thread+0x5c/0xb4

i.e. may or may not mention mlx4.
The issue may not happen on a first try but maximum on the second.



so when you revert commit 68230242cdb on the host all works just fine?
what guest driver are you running?



To be precise, I did checkout 68230242cdb, checked that it does not work,
then reverted 68230242cdb right there and checked that it works. I did not
try reverting later revisions yet.

My guest kernel in this test has tag v4.0. I get the same effect with some
3.18 from Ubuntu 14.04 LTS so the guest kernel version does not make a
difference afaict.



This needs a fix, I don't think the right thing to d

Re: [RFC PATCH kernel] Revert "net/mlx4_core: Add port attribute when tracking counters"

2015-09-03 Thread Alexey Kardashevskiy

On 09/03/2015 10:09 PM, eran ben elisha wrote:

On Mon, Aug 31, 2015 at 5:39 AM, Alexey Kardashevskiy <a...@ozlabs.ru> wrote:

On 08/30/2015 04:28 PM, Or Gerlitz wrote:


On Fri, Aug 28, 2015 at 7:06 AM, Alexey Kardashevskiy <a...@ozlabs.ru>
wrote:


68230242cdb breaks SRIOV on POWER8 system. I am not really suggesting
reverting the patch, rather asking for a fix.



thanks for the detailed report, we will look into that.

Just to be sure, when going back in time, what is the latest upstream
version where
this system/config works okay? is that 4.1 or later?



4.1 is good, 4.2 is not.







To reproduce it:

1. boot latest upstream kernel (v4.2-rc8 sha1 4941b8f, ppc64le)

2. Run:
sudo rmmod mlx4_en mlx4_ib mlx4_core
sudo modprobe mlx4_core num_vfs=4 probe_vf=4 port_type_array=2,2
debug_level=1

3. Run QEMU (just to give a complete picture):
/home/aik/qemu-system-ppc64 -enable-kvm -m 2048 -machine pseries \
-nodefaults \
-chardev stdio,id=id0,signal=off,mux=on \
-device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
-mon id=id2,chardev=id0,mode=readline -nographic -vga none \
-initrd dhclient.cpio -kernel vml400bedbg \
-device vfio-pci,id=id3,host=0003:03:00.1
What guest is used does not matter at all.

4. Wait till guest boots and then run:
dhclient
This assigns IPs to both interfaces just fine. This is essential -
if interface was not brought up since guest started, the bug does not
appear.
If interface was up and then down, this still causes the problem
(less likely though).

5. Run in the guest: shutdown -h 0
Guest prints:
mlx4_en: eth0: Close port called
mlx4_en: eth1: Close port called
mlx4_core :00:00.0: mlx4_shutdown was called
And then the host hangs. After 10-30 seconds the host console prints:
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
[qemu-system-ppc:5095]
OR
INFO: rcu_sched detected stalls on CPUs/tasks:
or some other random stuff but always related to some sort of lockup.
Backtraces are like these:

[c01e492a7ac0] [c0135b84]
smp_call_function_many+0x2f4/0x3fable)
[c01e492a7b40] [c0135db8] kick_all_cpus_sync+0x38/0x50
[c01e492a7b60] [c0048f38] pmdp_huge_get_and_clear+0x48/0x70
[c01e492a7b90] [c023181c] change_huge_pmd+0xac/0x210
[c01e492a7bf0] [c01fb9e8] change_protection+0x678/0x720
[c01e492a7d00] [c0217d38] change_prot_numa+0x28/0xa0
[c01e492a7d30] [c00e0e40] task_numa_work+0x2a0/0x370
[c01e492a7db0] [c00c5fb4] task_work_run+0xe4/0x160
[c01e492a7e00] [c00169a4] do_notify_resume+0x84/0x90
[c01e492a7e30] [c00098b8] ret_from_except_lite+0x64/0x68

OR

[c01def1b7280] [c00ff941d368] 0xc00ff941d368 (unreliable)
[c01def1b7450] [c001512c] __switch_to+0x1fc/0x350
[c01def1b7490] [c01def1b74e0] 0xc01def1b74e0
[c01def1b74e0] [c011a50c] try_to_del_timer_sync+0x5c/0x90
[c01def1b7520] [c011a590] del_timer_sync+0x50/0x70
[c01def1b7550] [c09136fc] schedule_timeout+0x15c/0x2b0
[c01def1b7620] [c0910e6c] wait_for_common+0x12c/0x230
[c01def1b7660] [c00fa22c] up+0x4c/0x80
[c01def1b76a0] [d00016323e60] __mlx4_cmd+0x320/0x940 [mlx4_core]
[c01def1b7760] [c01def1b77a0] 0xc01def1b77a0
[c01def1b77f0] [d000163528b4] mlx4_2RST_QP_wrapper+0x154/0x1e0
[mlx4_core]
[c01def1b7860] [d00016324934]
mlx4_master_process_vhcr+0x1b4/0x6c0 [mlx4_core]
[c01def1b7930] [d00016324170] __mlx4_cmd+0x630/0x940 [mlx4_core]
[c01def1b79f0] [d00016346fec]
__mlx4_qp_modify.constprop.8+0x1ec/0x350 [mlx4_core]
[c01def1b7ac0] [d00016292228] mlx4_ib_destroy_qp+0xd8/0x5d0
[mlx4_ib]
[c01def1b7b60] [d00013c7305c] ib_destroy_qp+0x1cc/0x290 [ib_core]
[c01def1b7bb0] [d00016284548]
destroy_pv_resources.isra.14.part.15+0x48/0xf0 [mlx4_ib]
[c01def1b7be0] [d00016284d28] mlx4_ib_tunnels_update+0x168/0x170
[mlx4_ib]
[c01def1b7c20] [d000162876e0]
mlx4_ib_tunnels_update_work+0x30/0x50 [mlx4_ib]
[c01def1b7c50] [c00c0d34] process_one_work+0x194/0x490
[c01def1b7ce0] [c00c11b0] worker_thread+0x180/0x5a0
[c01def1b7d80] [c00c8a0c] kthread+0x10c/0x130
[c01def1b7e30] [c00095a8] ret_from_kernel_thread+0x5c/0xb4

i.e. may or may not mention mlx4.
The issue may not happen on a first try but maximum on the second.



so when you revert commit 68230242cdb on the host all works just fine?
what guest driver are you running?



To be precise, I did checkout 68230242cdb, checked that it does not work,
then reverted 68230242cdb right there and checked that it works. I did not
try reverting later revisions yet.

My guest kernel in this test has tag v4.0. I get the same effect with some
3.18 from Ubuntu 14.04 LTS so the guest kernel version does not make a
difference afaict.



This needs a fix, I don't think the right thing to do is just go and
revert the commit, if the right fix misses 4.2

Re: [RFC PATCH kernel] Revert net/mlx4_core: Add port attribute when tracking counters

2015-08-30 Thread Alexey Kardashevskiy

On 08/30/2015 04:28 PM, Or Gerlitz wrote:

On Fri, Aug 28, 2015 at 7:06 AM, Alexey Kardashevskiy a...@ozlabs.ru wrote:

68230242cdb breaks SRIOV on POWER8 system. I am not really suggesting
reverting the patch, rather asking for a fix.


thanks for the detailed report, we will look into that.

Just to be sure, when going back in time, what is the latest upstream
version where
this system/config works okay? is that 4.1 or later?


4.1 is good, 4.2 is not.






To reproduce it:

1. boot latest upstream kernel (v4.2-rc8 sha1 4941b8f, ppc64le)

2. Run:
sudo rmmod mlx4_en mlx4_ib mlx4_core
sudo modprobe mlx4_core num_vfs=4 probe_vf=4 port_type_array=2,2 debug_level=1

3. Run QEMU (just to give a complete picture):
/home/aik/qemu-system-ppc64 -enable-kvm -m 2048 -machine pseries \
-nodefaults \
-chardev stdio,id=id0,signal=off,mux=on \
-device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
-mon id=id2,chardev=id0,mode=readline -nographic -vga none \
-initrd dhclient.cpio -kernel vml400bedbg \
-device vfio-pci,id=id3,host=0003:03:00.1
What guest is used does not matter at all.

4. Wait till guest boots and then run:
dhclient
This assigns IPs to both interfaces just fine. This is essential -
if interface was not brought up since guest started, the bug does not appear.
If interface was up and then down, this still causes the problem
(less likely though).

5. Run in the guest: shutdown -h 0
Guest prints:
mlx4_en: eth0: Close port called
mlx4_en: eth1: Close port called
mlx4_core :00:00.0: mlx4_shutdown was called
And then the host hangs. After 10-30 seconds the host console prints:
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-ppc:5095]
OR
INFO: rcu_sched detected stalls on CPUs/tasks:
or some other random stuff but always related to some sort of lockup.
Backtraces are like these:

[c01e492a7ac0] [c0135b84] smp_call_function_many+0x2f4/0x3fable)
[c01e492a7b40] [c0135db8] kick_all_cpus_sync+0x38/0x50
[c01e492a7b60] [c0048f38] pmdp_huge_get_and_clear+0x48/0x70
[c01e492a7b90] [c023181c] change_huge_pmd+0xac/0x210
[c01e492a7bf0] [c01fb9e8] change_protection+0x678/0x720
[c01e492a7d00] [c0217d38] change_prot_numa+0x28/0xa0
[c01e492a7d30] [c00e0e40] task_numa_work+0x2a0/0x370
[c01e492a7db0] [c00c5fb4] task_work_run+0xe4/0x160
[c01e492a7e00] [c00169a4] do_notify_resume+0x84/0x90
[c01e492a7e30] [c00098b8] ret_from_except_lite+0x64/0x68

OR

[c01def1b7280] [c00ff941d368] 0xc00ff941d368 (unreliable)
[c01def1b7450] [c001512c] __switch_to+0x1fc/0x350
[c01def1b7490] [c01def1b74e0] 0xc01def1b74e0
[c01def1b74e0] [c011a50c] try_to_del_timer_sync+0x5c/0x90
[c01def1b7520] [c011a590] del_timer_sync+0x50/0x70
[c01def1b7550] [c09136fc] schedule_timeout+0x15c/0x2b0
[c01def1b7620] [c0910e6c] wait_for_common+0x12c/0x230
[c01def1b7660] [c00fa22c] up+0x4c/0x80
[c01def1b76a0] [d00016323e60] __mlx4_cmd+0x320/0x940 [mlx4_core]
[c01def1b7760] [c01def1b77a0] 0xc01def1b77a0
[c01def1b77f0] [d000163528b4] mlx4_2RST_QP_wrapper+0x154/0x1e0 
[mlx4_core]
[c01def1b7860] [d00016324934] mlx4_master_process_vhcr+0x1b4/0x6c0 
[mlx4_core]
[c01def1b7930] [d00016324170] __mlx4_cmd+0x630/0x940 [mlx4_core]
[c01def1b79f0] [d00016346fec] __mlx4_qp_modify.constprop.8+0x1ec/0x350 
[mlx4_core]
[c01def1b7ac0] [d00016292228] mlx4_ib_destroy_qp+0xd8/0x5d0 [mlx4_ib]
[c01def1b7b60] [d00013c7305c] ib_destroy_qp+0x1cc/0x290 [ib_core]
[c01def1b7bb0] [d00016284548] 
destroy_pv_resources.isra.14.part.15+0x48/0xf0 [mlx4_ib]
[c01def1b7be0] [d00016284d28] mlx4_ib_tunnels_update+0x168/0x170 
[mlx4_ib]
[c01def1b7c20] [d000162876e0] mlx4_ib_tunnels_update_work+0x30/0x50 
[mlx4_ib]
[c01def1b7c50] [c00c0d34] process_one_work+0x194/0x490
[c01def1b7ce0] [c00c11b0] worker_thread+0x180/0x5a0
[c01def1b7d80] [c00c8a0c] kthread+0x10c/0x130
[c01def1b7e30] [c00095a8] ret_from_kernel_thread+0x5c/0xb4

i.e. may or may not mention mlx4.
The issue may not happen on a first try but maximum on the second.


so when you revert commit 68230242cdb on the host all works just fine?
what guest driver are you running?


To be precise, I did checkout 68230242cdb, checked that it does not work, 
then reverted 68230242cdb right there and checked that it works. I did not 
try reverting later revisions yet.


My guest kernel in this test has tag v4.0. I get the same effect with some 
3.18 from Ubuntu 14.04 LTS so the guest kernel version does not make a 
difference afaict.




This needs a fix, I don't think the right thing to do is just go and
revert the commit, if the right fix misses 4.2 we will get it there
through -stable


v4.2 was just released :)


--
Alexey
--
To unsubscribe from this list: send the line unsubscribe netdev

[RFC PATCH kernel] Revert net/mlx4_core: Add port attribute when tracking counters

2015-08-28 Thread Alexey Kardashevskiy
68230242cdb breaks SRIOV on POWER8 system. I am not really suggesting
reverting the patch, rather asking for a fix.

To reproduce it:

1. boot latest upstream kernel (v4.2-rc8 sha1 4941b8f, ppc64le)

2. Run:
sudo rmmod mlx4_en mlx4_ib mlx4_core
sudo modprobe mlx4_core num_vfs=4 probe_vf=4 port_type_array=2,2 debug_level=1

3. Run QEMU (just to give a complete picture):
/home/aik/qemu-system-ppc64 -enable-kvm -m 2048 -machine pseries \
-nodefaults \
-chardev stdio,id=id0,signal=off,mux=on \
-device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
-mon id=id2,chardev=id0,mode=readline -nographic -vga none \
-initrd dhclient.cpio -kernel vml400bedbg \
-device vfio-pci,id=id3,host=0003:03:00.1
What guest is used does not matter at all.

4. Wait till guest boots and then run:
dhclient
This assigns IPs to both interfaces just fine. This is essential -
if interface was not brought up since guest started, the bug does not appear.
If interface was up and then down, this still causes the problem
(less likely though).

5. Run in the guest: shutdown -h 0
Guest prints:
mlx4_en: eth0: Close port called
mlx4_en: eth1: Close port called
mlx4_core :00:00.0: mlx4_shutdown was called
And then the host hangs. After 10-30 seconds the host console prints:
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [qemu-system-ppc:5095]
OR
INFO: rcu_sched detected stalls on CPUs/tasks:
or some other random stuff but always related to some sort of lockup.
Backtraces are like these:

[c01e492a7ac0] [c0135b84] smp_call_function_many+0x2f4/0x3fable)
[c01e492a7b40] [c0135db8] kick_all_cpus_sync+0x38/0x50
[c01e492a7b60] [c0048f38] pmdp_huge_get_and_clear+0x48/0x70
[c01e492a7b90] [c023181c] change_huge_pmd+0xac/0x210
[c01e492a7bf0] [c01fb9e8] change_protection+0x678/0x720
[c01e492a7d00] [c0217d38] change_prot_numa+0x28/0xa0
[c01e492a7d30] [c00e0e40] task_numa_work+0x2a0/0x370
[c01e492a7db0] [c00c5fb4] task_work_run+0xe4/0x160
[c01e492a7e00] [c00169a4] do_notify_resume+0x84/0x90
[c01e492a7e30] [c00098b8] ret_from_except_lite+0x64/0x68

OR

[c01def1b7280] [c00ff941d368] 0xc00ff941d368 (unreliable)
[c01def1b7450] [c001512c] __switch_to+0x1fc/0x350
[c01def1b7490] [c01def1b74e0] 0xc01def1b74e0
[c01def1b74e0] [c011a50c] try_to_del_timer_sync+0x5c/0x90
[c01def1b7520] [c011a590] del_timer_sync+0x50/0x70
[c01def1b7550] [c09136fc] schedule_timeout+0x15c/0x2b0
[c01def1b7620] [c0910e6c] wait_for_common+0x12c/0x230
[c01def1b7660] [c00fa22c] up+0x4c/0x80
[c01def1b76a0] [d00016323e60] __mlx4_cmd+0x320/0x940 [mlx4_core]
[c01def1b7760] [c01def1b77a0] 0xc01def1b77a0
[c01def1b77f0] [d000163528b4] mlx4_2RST_QP_wrapper+0x154/0x1e0 
[mlx4_core]
[c01def1b7860] [d00016324934] mlx4_master_process_vhcr+0x1b4/0x6c0 
[mlx4_core]
[c01def1b7930] [d00016324170] __mlx4_cmd+0x630/0x940 [mlx4_core]
[c01def1b79f0] [d00016346fec] __mlx4_qp_modify.constprop.8+0x1ec/0x350 
[mlx4_core]
[c01def1b7ac0] [d00016292228] mlx4_ib_destroy_qp+0xd8/0x5d0 [mlx4_ib]
[c01def1b7b60] [d00013c7305c] ib_destroy_qp+0x1cc/0x290 [ib_core]
[c01def1b7bb0] [d00016284548] 
destroy_pv_resources.isra.14.part.15+0x48/0xf0 [mlx4_ib]
[c01def1b7be0] [d00016284d28] mlx4_ib_tunnels_update+0x168/0x170 
[mlx4_ib]
[c01def1b7c20] [d000162876e0] mlx4_ib_tunnels_update_work+0x30/0x50 
[mlx4_ib]
[c01def1b7c50] [c00c0d34] process_one_work+0x194/0x490
[c01def1b7ce0] [c00c11b0] worker_thread+0x180/0x5a0
[c01def1b7d80] [c00c8a0c] kthread+0x10c/0x130
[c01def1b7e30] [c00095a8] ret_from_kernel_thread+0x5c/0xb4

i.e. may or may not mention mlx4.
The issue may not happen on a first try but maximum on the second.


This is the function I am passing to the guest:
0003:03:00.1 Ethernet controller: Mellanox Technologies MT27500/MT27520 Family 
[ConnectX-3/ConnectX-3 Pro Virtual Function]
Subsystem: IBM Device 61b0
Flags: bus master, fast devsel, latency 0
[virtual] Memory at 3c108000 (64-bit, prefetchable) [size=128M]
Capabilities: access denied
Kernel driver in use: mlx4_core

And ideas? Some patches to try? Thanks!



---
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  | 90 +-
 1 file changed, 3 insertions(+), 87 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c 
b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 73db584..802eb2a 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -723,9 +723,6 @@ static void update_gid(struct mlx4_dev *dev, struct 
mlx4_cmd_mailbox *inbox,
}
 }
 
-static int handle_counter(struct mlx4_dev *dev, struct mlx4_qp_context *qpc,
-