Re: [PATCH net-next] net/mlx4_core: Fix backward compatibility on VFs

2016-03-21 Thread Alexey Kardashevskiy

On 03/22/2016 12:56 AM, Eli Cohen wrote:
> On Mon, Mar 21, 2016 at 04:02:16PM +1100, Alexey Kardashevskiy wrote:
>>
>> After more tries, I found that if for whatever reason mlx4_core
>> fails to stop while shutting the guest down (last message is
>> "mlx4_core :00:00.0: mlx4_shutdown was called"), then the next time
>> the VF in the guest won't start.
>>
>> Example #1:
>>
>> mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
>> mlx4_core: Initializing :00:00.0
>> mlx4_core :00:00.0: enabling device ( -> 0002)
>> mlx4_core :00:00.0: Detected virtual function - running in slave mode
>> mlx4_core :00:00.0: Sending reset
>> mlx4_core :00:00.0: Sending vhcr0
>> mlx4_core :00:00.0: HCA minimum page size:1
>> mlx4_core :00:00.0: UAR size:4096 != kernel PAGE_SIZE of 65536
>> mlx4_core :00:00.0: Failed to obtain slave caps
>
> Alexey, can you verify that the value of the enable_4k_uar parameter
> is false?

aik@fstn1-p1:~$ cat /sys/bus/pci/drivers/mlx4_core/module/parameters/enable_4k_uar
N
aik@fstn1-p1:~$

>> Example #2:
>>
>> root@le-dbg:~# dhclient eth0
>> NETDEV WATCHDOG: eth0 (mlx4_core): transmit queue 11 timed out
>> [ cut here ]
>> WARNING: at /home/aik/p/guest-kernel/net/sched/sch_generic.c:303
>>
>> and no IP assigned, timed out.
>>
>> A guest restart fixes this: the first restart might not help, but
>> the second one will.
>>
>> The host is running the latest upstream plus the patch I am replying
>> to. The guest is using initramdisk from debian bootstrap and vanilla
>> v4.2 kernel, ppc64le arch, POWER8 chip, QEMU is running with 1 CPU
>> and 2GB of RAM.
>>
>> Does this look at all familiar?
>
> This is completely unrelated to the compatibility problem you reported
> and which this patch addresses. We will reproduce in house and post a
> fix.

Example #2 is, but example #1 mentions "UAR size" :)

--
Alexey
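
A note on the sysfs check above: sysfs prints a bool module parameter as "Y" or "N", so the "N" output means enable_4k_uar is off on the machine where the cat was run. Below is a minimal sketch of reading it from a program, assuming the path from the shell session. The helper names parse_bool_param() and read_bool_param() are made up for illustration and are not part of mlx4_core.

```c
#include <stdio.h>

/* Hypothetical helper: interpret the text sysfs prints for a bool
 * module parameter. Returns 1 for "Y", 0 for "N", -1 otherwise. */
static int parse_bool_param(const char *text)
{
	if (!text)
		return -1;
	switch (text[0]) {
	case 'Y':
		return 1;
	case 'N':
		return 0;
	default:
		return -1;
	}
}

/* Hypothetical helper: read one parameter file and parse it.
 * Returns 1, 0, or -1 if the file is missing or unreadable. */
static int read_bool_param(const char *path)
{
	char buf[8] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);
	return parse_bool_param(buf);
}
```

Calling read_bool_param() with the path from the session above would return 0 wherever the cat prints "N", and -1 on a system where mlx4_core is not loaded.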


Re: [PATCH net-next] net/mlx4_core: Fix backward compatibility on VFs

2016-03-21 Thread Eli Cohen
On Mon, Mar 21, 2016 at 04:02:16PM +1100, Alexey Kardashevskiy wrote:
> 
> After more tries, I found that if for whatever reason mlx4_core
> fails to stop while shutting the guest down (last message is
> "mlx4_core :00:00.0: mlx4_shutdown was called"), then the next time
> the VF in the guest won't start.
> 
> Example #1:
> 
> mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
> mlx4_core: Initializing :00:00.0
> mlx4_core :00:00.0: enabling device ( -> 0002)
> mlx4_core :00:00.0: Detected virtual function - running in slave mode
> mlx4_core :00:00.0: Sending reset
> mlx4_core :00:00.0: Sending vhcr0
> mlx4_core :00:00.0: HCA minimum page size:1
> mlx4_core :00:00.0: UAR size:4096 != kernel PAGE_SIZE of 65536
> mlx4_core :00:00.0: Failed to obtain slave caps

Alexey, can you verify that the value of the enable_4k_uar parameter
is false?

> 
> Example #2:
> 
> root@le-dbg:~# dhclient eth0
> NETDEV WATCHDOG: eth0 (mlx4_core): transmit queue 11 timed out
> [ cut here ]
> WARNING: at /home/aik/p/guest-kernel/net/sched/sch_generic.c:303
> 
> and no IP assigned, timed out.
> 
> 
> A guest restart fixes this: the first restart might not help, but
> the second one will.
> 
> The host is running the latest upstream plus the patch I am replying
> to. The guest is using initramdisk from debian bootstrap and vanilla
> v4.2 kernel, ppc64le arch, POWER8 chip, QEMU is running with 1 CPU
> and 2GB of RAM.
> 
> Does this look at all familiar?
>

This is completely unrelated to the compatibility problem you reported
and which this patch addresses. We will reproduce in house and post a
fix.
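
For context on Example #1: judging only from the log text, the older VF driver seems to reject a UAR size that differs from the guest kernel's PAGE_SIZE. The sketch below restates that comparison in plain C. It is an assumption drawn from the message "UAR size:4096 != kernel PAGE_SIZE of 65536", not a copy of the driver code, and old_vf_accepts_uar() is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical stand-in for the check an older VF driver appears to
 * make, inferred from the log line in Example #1 only.
 * Returns 1 if the UAR size reported to the VF is acceptable, 0 if
 * not; on mismatch it prints a message shaped like the one in the log. */
static int old_vf_accepts_uar(unsigned long uar_size, unsigned long page_size)
{
	if (uar_size != page_size) {
		fprintf(stderr, "UAR size:%lu != kernel PAGE_SIZE of %lu\n",
			uar_size, page_size);
		return 0;
	}
	return 1;
}
```

With the values from the log (4096 and 65536) the check fails, which matches "Failed to obtain slave caps". If the PF runs with enable_4k_uar off and host and guest use the same page size, the two values would be expected to agree.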


Re: [PATCH net-next] net/mlx4_core: Fix backward compatibility on VFs

2016-03-20 Thread Alexey Kardashevskiy

On 03/18/2016 08:45 PM, Alexey Kardashevskiy wrote:
> On 03/18/2016 03:49 AM, Eli Cohen wrote:
>> Commit 85743f1eb345 ("net/mlx4_core: Set UAR page size to 4KB regardless
>> of system page size") introduced dependency where old VF drivers without
>> this fix fail to load if the PF driver runs with this commit.
>>
>> To resolve this add a module parameter which disables that functionality
>> by default.  If both the PF and VFs are running with a driver with that
>> commit the administrator may set the module param to true.
>>
>> The module parameter is called enable_4k_uar.
>>
>> Fixes: 85743f1eb345 ('net/mlx4_core: Set UAR page size to 4KB ...')
>> Signed-off-by: Eli Cohen
>
> Thanks!

After more tries, I found that if for whatever reason mlx4_core fails to
stop while shutting the guest down (last message is "mlx4_core
:00:00.0: mlx4_shutdown was called"), then the next time the VF in the
guest won't start.

Example #1:

mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
mlx4_core: Initializing :00:00.0
mlx4_core :00:00.0: enabling device ( -> 0002)
mlx4_core :00:00.0: Detected virtual function - running in slave mode
mlx4_core :00:00.0: Sending reset
mlx4_core :00:00.0: Sending vhcr0
mlx4_core :00:00.0: HCA minimum page size:1
mlx4_core :00:00.0: UAR size:4096 != kernel PAGE_SIZE of 65536
mlx4_core :00:00.0: Failed to obtain slave caps

Example #2:

root@le-dbg:~# dhclient eth0
NETDEV WATCHDOG: eth0 (mlx4_core): transmit queue 11 timed out
[ cut here ]
WARNING: at /home/aik/p/guest-kernel/net/sched/sch_generic.c:303

and no IP assigned, timed out.

A guest restart fixes this: the first restart might not help, but the
second one will.

The host is running the latest upstream plus the patch I am replying to.
The guest is using initramdisk from debian bootstrap and vanilla v4.2
kernel, ppc64le arch, POWER8 chip, QEMU is running with 1 CPU and 2GB of RAM.

Does this look at all familiar?

> Tested-by: Alexey Kardashevskiy

--
Alexey


Re: [PATCH net-next] net/mlx4_core: Fix backward compatibility on VFs

2016-03-20 Thread Or Gerlitz
On Sun, Mar 20, 2016 at 9:07 AM, Yuval Shaia  wrote:
> On Fri, Mar 18, 2016 at 11:11:06PM -0400, David Miller wrote:
>> From: Eli Cohen 
>> Date: Thu, 17 Mar 2016 18:49:42 +0200

>> > Commit 85743f1eb345 ("net/mlx4_core: Set UAR page size to 4KB regardless
>> > of system page size") introduced dependency where old VF drivers without
>> > this fix fail to load if the PF driver runs with this commit.
>> > To resolve this add a module parameter which disables that functionality
>> > by default.  If both the PF and VFs are running with a driver with that
>> > commit the administrator may set the module param to true.
>> > The module parameter is called enable_4k_uar.

> Can you consider passing this via the comm-channel and save us all from a
> new module parameter?
> Suggesting this from sys-admin perspective where (1) making this consistent
> in VF and **all** guests would be a nightmare and also (2) take into account
> in public cloud that hypervisor sys-admin is not necessarily the same person
> as guest sys-admin.

AFAIK both modified (i.e. containing the offending commit) and
non-modified VF drivers need not be aware of the fix. It should be a
PF-only param, where all types of VF driver keep working with their
source of info being the comm-channel only.

Eli, Yishai, can you confirm this is the case?

Or.


Re: [PATCH net-next] net/mlx4_core: Fix backward compatibility on VFs

2016-03-20 Thread Yuval Shaia
On Fri, Mar 18, 2016 at 11:11:06PM -0400, David Miller wrote:
> From: Eli Cohen 
> Date: Thu, 17 Mar 2016 18:49:42 +0200
> 
> > Commit 85743f1eb345 ("net/mlx4_core: Set UAR page size to 4KB regardless
> > of system page size") introduced dependency where old VF drivers without
> > this fix fail to load if the PF driver runs with this commit.
> > 
> > To resolve this add a module parameter which disables that functionality
> > by default.  If both the PF and VFs are running with a driver with that
> > commit the administrator may set the module param to true.
> > 
> > The module parameter is called enable_4k_uar.
Can you consider passing this via the comm-channel and save us all from a
new module parameter?
Suggesting this from sys-admin perspective where (1) making this consistent
in VF and **all** guests would be a nightmare and also (2) take into account
in public cloud that hypervisor sys-admin is not necessarily the same person
as guest sys-admin.
> > 
> > Fixes: 85743f1eb345 ('net/mlx4_core: Set UAR page size to 4KB ...')
> > Signed-off-by: Eli Cohen 
> 
> Applied.


Re: [PATCH net-next] net/mlx4_core: Fix backward compatibility on VFs

2016-03-19 Thread Alexey Kardashevskiy

On 03/18/2016 03:49 AM, Eli Cohen wrote:
> Commit 85743f1eb345 ("net/mlx4_core: Set UAR page size to 4KB regardless
> of system page size") introduced dependency where old VF drivers without
> this fix fail to load if the PF driver runs with this commit.
>
> To resolve this add a module parameter which disables that functionality
> by default.  If both the PF and VFs are running with a driver with that
> commit the administrator may set the module param to true.
>
> The module parameter is called enable_4k_uar.
>
> Fixes: 85743f1eb345 ('net/mlx4_core: Set UAR page size to 4KB ...')
> Signed-off-by: Eli Cohen

Thanks!

Tested-by: Alexey Kardashevskiy

--
Alexey


[PATCH net-next] net/mlx4_core: Fix backward compatibility on VFs

2016-03-18 Thread Eli Cohen
Commit 85743f1eb345 ("net/mlx4_core: Set UAR page size to 4KB regardless
of system page size") introduced dependency where old VF drivers without
this fix fail to load if the PF driver runs with this commit.

To resolve this add a module parameter which disables that functionality
by default.  If both the PF and VFs are running with a driver with that
commit the administrator may set the module param to true.

The module parameter is called enable_4k_uar.

Fixes: 85743f1eb345 ('net/mlx4_core: Set UAR page size to 4KB ...')
Signed-off-by: Eli Cohen 
---
 drivers/net/ethernet/mellanox/mlx4/main.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 503ec23e84cc..358f7230da58 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -105,6 +105,11 @@ module_param(enable_64b_cqe_eqe, bool, 0444);
 MODULE_PARM_DESC(enable_64b_cqe_eqe,
 		 "Enable 64 byte CQEs/EQEs when the FW supports this (default: True)");
 
+static bool enable_4k_uar;
+module_param(enable_4k_uar, bool, 0444);
+MODULE_PARM_DESC(enable_4k_uar,
+		 "Enable using 4K UAR. Should not be enabled if have VFs which do not support 4K UARs (default: false)");
+
 #define PF_CONTEXT_BEHAVIOUR_MASK	(MLX4_FUNC_CAP_64B_EQE_CQE | \
 					 MLX4_FUNC_CAP_EQE_CQE_STRIDE | \
 					 MLX4_FUNC_CAP_DMFS_A0_STATIC)
@@ -423,7 +428,11 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 		/* Virtual PCI function needs to determine UAR page size from
 		 * firmware. Only master PCI function can set the uar page size
 		 */
-		dev->uar_page_shift = DEFAULT_UAR_PAGE_SHIFT;
+		if (enable_4k_uar)
+			dev->uar_page_shift = DEFAULT_UAR_PAGE_SHIFT;
+		else
+			dev->uar_page_shift = PAGE_SHIFT;
+
 		mlx4_set_num_reserved_uars(dev, dev_cap);
 	}
 
@@ -2233,11 +2242,14 @@ static int mlx4_init_hca(struct mlx4_dev *dev)
 
 		dev->caps.max_fmr_maps = (1 << (32 - ilog2(dev->caps.num_mpts))) - 1;
 
-		/* Always set UAR page size 4KB, set log_uar_sz accordingly */
-		init_hca.log_uar_sz = ilog2(dev->caps.num_uars) +
-				      PAGE_SHIFT -
-				      DEFAULT_UAR_PAGE_SHIFT;
-		init_hca.uar_page_sz = DEFAULT_UAR_PAGE_SHIFT - 12;
+		if (enable_4k_uar) {
+			init_hca.log_uar_sz = ilog2(dev->caps.num_uars) +
+						    PAGE_SHIFT - DEFAULT_UAR_PAGE_SHIFT;
+			init_hca.uar_page_sz = DEFAULT_UAR_PAGE_SHIFT - 12;
+		} else {
+			init_hca.log_uar_sz = ilog2(dev->caps.num_uars);
+			init_hca.uar_page_sz = PAGE_SHIFT - 12;
+		}
 
 		init_hca.mw_enabled = 0;
 		if (dev->caps.flags & MLX4_DEV_CAP_FLAG_MEM_WINDOW ||
-- 
1.8.3.1
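
To make the arithmetic in the last two hunks concrete, here is a small userspace sketch that mirrors the two branches. It is an illustration under stated assumptions, not driver code: DEFAULT_UAR_PAGE_SHIFT is taken to be 12 (inferred from the "4KB" wording in the patch), and struct uar_cfg, compute_uar_cfg() and ilog2_u32() are hypothetical names that fold the mlx4_dev_cap() and mlx4_init_hca() assignments into one place.

```c
#include <stdio.h>

/* Assumption: 4KB UAR pages, i.e. a shift of 12, as the patch text implies. */
#define DEFAULT_UAR_PAGE_SHIFT 12

/* Hypothetical container for the three values the patch assigns. */
struct uar_cfg {
	int uar_page_shift;	/* dev->uar_page_shift */
	int log_uar_sz;		/* init_hca.log_uar_sz */
	int uar_page_sz;	/* init_hca.uar_page_sz */
};

/* Integer log2 for a nonzero value; returns -1 for 0. */
static int ilog2_u32(unsigned int v)
{
	int r = -1;

	while (v) {
		v >>= 1;
		r++;
	}
	return r;
}

/* Mirror of the enable_4k_uar branches in the patch. */
static struct uar_cfg compute_uar_cfg(int enable_4k_uar, int page_shift,
				      unsigned int num_uars)
{
	struct uar_cfg c;

	if (enable_4k_uar) {
		c.uar_page_shift = DEFAULT_UAR_PAGE_SHIFT;
		c.log_uar_sz = ilog2_u32(num_uars) + page_shift -
			       DEFAULT_UAR_PAGE_SHIFT;
		c.uar_page_sz = DEFAULT_UAR_PAGE_SHIFT - 12;
	} else {
		c.uar_page_shift = page_shift;
		c.log_uar_sz = ilog2_u32(num_uars);
		c.uar_page_sz = page_shift - 12;
	}
	return c;
}
```

As a worked case, take a 64KB kernel page (page_shift of 16, the PAGE_SIZE of 65536 seen in the log elsewhere in this thread) and a made-up count of 128 UARs. With enable_4k_uar set, the sketch yields uar_page_shift 12, log_uar_sz 11 and uar_page_sz 0; with it clear, the default after this patch, it yields 16, 7 and 4.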



Re: [PATCH net-next] net/mlx4_core: Fix backward compatibility on VFs

2016-03-18 Thread David Miller
From: Eli Cohen 
Date: Thu, 17 Mar 2016 18:49:42 +0200

> Commit 85743f1eb345 ("net/mlx4_core: Set UAR page size to 4KB regardless
> of system page size") introduced dependency where old VF drivers without
> this fix fail to load if the PF driver runs with this commit.
> 
> To resolve this add a module parameter which disables that functionality
> by default.  If both the PF and VFs are running with a driver with that
> commit the administrator may set the module param to true.
> 
> The module parameter is called enable_4k_uar.
> 
> Fixes: 85743f1eb345 ('net/mlx4_core: Set UAR page size to 4KB ...')
> Signed-off-by: Eli Cohen 

Applied.