Re: [PATCH net 00/13] Mellanox 100G mlx5 resiliency and xmit path fixes

2016-07-01 Thread Saeed Mahameed
On Fri, Jul 1, 2016 at 1:14 PM, David Miller wrote:
> From: Saeed Mahameed 
> Date: Thu, 30 Jun 2016 17:34:37 +0300
>
>> This series provides two sets of fixes to the mlx5 driver:
>>   - Resiliency fixes for reset flow and internal pci errors
>>   - xmit path fixes
>
> Series applied to 'net' but expecting all of this to be backported
> to -stable is unreasonable.
>
> PCI error paths, chip resets, timeout handling, etc. and some
> non-trivial changes.  This is not appropriate for -stable at all.
>
> What's relevant for -stable is operational bug fixes for issues
> users will hit in regular operation.
>
> And yes this is partially my own interpretation of what is appropriate
> for -stable.
>
> But if I were to set the precedent of adding all of these to -stable
> it means people will then be able to request similar things in other
> drivers.  We want to avoid having too much grey area stuff backported
> to -stable, there are too many changes going through -stable already.

Understood,

but some of these fixes are really critical to us, so I would suggest
a new, shorter list for -stable:

net/mlx5: Avoid calling sleeping function by the health poll thread
net/mlx5: Fix wait_vital for VFs and remove fixed sleep
net/mlx5e: Copy all L2 headers into inline segment
net/mlx5e: Fix select queue callback

Thanks,
Saeed.


Re: [PATCH net 00/13] Mellanox 100G mlx5 resiliency and xmit path fixes

2016-07-01 Thread Saeed Mahameed
On Fri, Jul 1, 2016 at 1:14 PM, David Miller wrote:
> From: Saeed Mahameed 
> Date: Thu, 30 Jun 2016 17:34:37 +0300
>
>> This series provides two sets of fixes to the mlx5 driver:
>>   - Resiliency fixes for reset flow and internal pci errors
>>   - xmit path fixes
>
> Series applied to 'net' but expecting all of this to be backported
> to -stable is unreasonable.
>

Thanks Dave,

One small comment on this series is that it will hit two trivial
conflicts once net is merged into current net-next.

Conflict applying: "net/mlx5e: Timeout if SQ doesn't flush during close":
Fix:
 ---
@@@ -810,12 -802,19 +820,19 @@@ static void mlx5e_close_sq(struct mlx5e
  	if (mlx5e_sq_has_room_for(sq, 1))
  		mlx5e_send_nop(sq, true);

- 	mlx5e_modify_sq(sq, MLX5_SQC_STATE_RDY, MLX5_SQC_STATE_ERR,
- 			false, 0);
 -	err = mlx5e_modify_sq(sq, MLX5_SQC_STATE_RDY,
 -			      MLX5_SQC_STATE_ERR);
++	err = mlx5e_modify_sq(sq, MLX5_SQC_STATE_RDY, MLX5_SQC_STATE_ERR,
++			      false, 0);
+ 	if (err)
+ 		set_bit(MLX5E_SQ_STATE_TX_TIMEOUT, &sq->state);

---
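
The pattern behind this patch title (stop waiting on a send queue that never drains, and record that in a state bit) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver code: `demo_sq`, `demo_sq_flush`, the poll callbacks, and the `DEMO_SQ_STATE_TX_TIMEOUT` bit are all hypothetical names, and the poll-count bound stands in for whatever time-based deadline the real driver uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bit, loosely modelled on MLX5E_SQ_STATE_TX_TIMEOUT. */
enum demo_sq_state {
    DEMO_SQ_STATE_TX_TIMEOUT,
};

/* Hypothetical queue: it is drained when the consumer catches the producer. */
struct demo_sq {
    unsigned long state; /* bit field indexed by enum demo_sq_state */
    unsigned int pc;     /* producer counter: work posted */
    unsigned int cc;     /* consumer counter: work completed */
};

typedef void (*demo_poll_fn)(struct demo_sq *sq);

/*
 * Poll until the queue drains, but give up after max_polls attempts.
 * Returns true if the queue drained; otherwise sets the timeout bit
 * and returns false so the caller can continue tearing down.
 */
static bool demo_sq_flush(struct demo_sq *sq, demo_poll_fn poll,
                          unsigned int max_polls)
{
    assert(sq != NULL && poll != NULL);

    for (unsigned int i = 0; sq->cc != sq->pc; i++) {
        if (i == max_polls) {
            sq->state |= 1UL << DEMO_SQ_STATE_TX_TIMEOUT;
            return false;
        }
        poll(sq);
    }
    return true;
}

/* Example callbacks: one completes a unit of work per poll, one never does. */
static void demo_poll_progress(struct demo_sq *sq) { sq->cc++; }
static void demo_poll_stuck(struct demo_sq *sq) { (void)sq; }
```

Calling `demo_sq_flush(&sq, demo_poll_stuck, 10)` on a queue with outstanding work returns false with the timeout bit set instead of looping forever, which is the behaviour the patch title describes for close.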

Conflict applying: "net/mlx5e: Handle RQ flush in error cases"
Fix:

---
diff --cc drivers/net/ethernet/mellanox/mlx5/core/en.h
index 6db979e,b429591..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h

@@@ -214,7 -191,7 +214,8 @@@ struct mlx5e_tstamp
  enum {
  	MLX5E_RQ_STATE_POST_WQES_ENABLE,
  	MLX5E_RQ_STATE_UMR_WQE_IN_PROGRESS,
 +	MLX5E_RQ_STATE_AM,
+ 	MLX5E_RQ_STATE_FLUSH_TIMEOUT,
  };

---

Thanks,
Saeed


Re: [PATCH net 00/13] Mellanox 100G mlx5 resiliency and xmit path fixes

2016-07-01 Thread David Miller
From: Saeed Mahameed 
Date: Thu, 30 Jun 2016 17:34:37 +0300

> This series provides two sets of fixes to the mlx5 driver:
>   - Resiliency fixes for reset flow and internal pci errors
>   - xmit path fixes

Series applied to 'net' but expecting all of this to be backported
to -stable is unreasonable.

PCI error paths, chip resets, timeout handling, etc. and some
non-trivial changes.  This is not appropriate for -stable at all.

What's relevant for -stable is operational bug fixes for issues
users will hit in regular operation.

And yes this is partially my own interpretation of what is appropriate
for -stable.

But if I were to set the precedent of adding all of these to -stable
it means people will then be able to request similar things in other
drivers.  We want to avoid having too much grey area stuff backported
to -stable, there are too many changes going through -stable already.


[PATCH net 00/13] Mellanox 100G mlx5 resiliency and xmit path fixes

2016-06-30 Thread Saeed Mahameed
Hi Dave,

This series provides two sets of fixes to the mlx5 driver:
- Resiliency fixes for reset flow and internal pci errors
- xmit path fixes

Please consider queuing these patches for -stable (4.6).

Reset flow fixes for core driver:
- Add more commands to the list of error simulated commands 
  when pci errors occur
- Avoid calling sleeping function by the health poll thread
- Fix incorrect page count when in internal error
- Fix timeout in wait vital for VFs
- Deadlock fix and Timeout handling in commands interface

Reset flow and resiliency fixes for mlx5e netdev driver:
- Handle RQ flush in error cases
- Implement ndo_tx_timeout callback
- Timeout if SQ doesn't flush during close
- Log link state changes
- Validate BW weight values of ETS

xmit path fixes:
- Fix wrong fallback assumption in select queue callback
- Account for all L2 headers when copying headers into inline segment
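
To illustrate what "all L2 headers" means in the last item, here is a minimal userspace sketch that measures the L2 header of a raw Ethernet frame, counting any stacked VLAN tags. It is not taken from the patch: the helper name `l2_header_len` and the macro names are hypothetical, and it assumes only 802.1Q (0x8100) and 802.1ad (0x88A8) tags need to be recognised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_HLEN      14     /* dst MAC + src MAC + EtherType */
#define DEMO_VLAN_TAG_LEN  4      /* TPID + TCI */
#define DEMO_TPID_8021Q    0x8100
#define DEMO_TPID_8021AD   0x88A8

/*
 * Hypothetical helper: return the L2 header length in bytes, including
 * every VLAN tag that follows the MAC addresses, or 0 if the frame is
 * too short to contain the header it announces.
 */
static size_t l2_header_len(const uint8_t *frame, size_t frame_len)
{
    size_t len = DEMO_ETH_HLEN;

    assert(frame != NULL);
    if (frame_len < len)
        return 0;

    for (;;) {
        /* The last two bytes of the header so far hold the EtherType/TPID. */
        uint16_t type = (uint16_t)((frame[len - 2] << 8) | frame[len - 1]);

        if (type != DEMO_TPID_8021Q && type != DEMO_TPID_8021AD)
            return len;             /* no further VLAN tag */
        if (frame_len < len + DEMO_VLAN_TAG_LEN)
            return 0;               /* truncated tag */
        len += DEMO_VLAN_TAG_LEN;   /* skip this tag and look again */
    }
}
```

An untagged frame yields 14 bytes, a single-tagged frame 18, and a double-tagged (QinQ) frame 22, so a copy that assumes a fixed 14 or 18 bytes would miss part of the header in the stacked case.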

Thanks,
Saeed.

Daniel Jurgens (5):
  net/mlx5: Fix incorrect page count when in internal error
  net/mlx5: Fix wait_vital for VFs and remove fixed sleep
  net/mlx5e: Timeout if SQ doesn't flush during close
  net/mlx5e: Implement ndo_tx_timeout callback
  net/mlx5e: Handle RQ flush in error cases

Matthew Finlay (1):
  net/mlx5e: Copy all L2 headers into inline segment

Mohamad Haj Yahia (4):
  net/mlx5: Fix teardown errors that happen in pci error handler
  net/mlx5: Avoid calling sleeping function by the health poll thread
  net/mlx5: Fix potential deadlock in command mode change
  net/mlx5: Add timeout handle to commands with callback

Rana Shahout (2):
  net/mlx5e: Fix select queue callback
  net/mlx5e: Validate BW weight values of ETS

Shaker Daibes (1):
  net/mlx5e: Log link state changes

 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      | 129 -
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  11 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  99 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  41 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c    |  52 -
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |  11 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |  41 +++
 .../net/ethernet/mellanox/mlx5/core/pagealloc.c    |  63 +++---
 include/linux/mlx5/driver.h                        |   1 +
 10 files changed, 335 insertions(+), 121 deletions(-)

-- 
2.8.0