Re: [PATCH net 00/13] Mellanox 100G mlx5 resiliency and xmit path fixes
On Fri, Jul 1, 2016 at 1:14 PM, David Miller wrote:
> From: Saeed Mahameed
> Date: Thu, 30 Jun 2016 17:34:37 +0300
>
>> This series provides two sets of fixes to the mlx5 driver:
>> - Resiliency fixes for reset flow and internal pci errors
>> - xmit path fixes
>
> Series applied to 'net' but expecting all of this to be backported
> to -stable is unreasonable.
>
> PCI error paths, chip resets, timeout handling, etc. and some
> non-trivial changes. This is not appropriate for -stable at all.
>
> What's relevant for -stable is operational bug fixes for issues
> users will hit in regular operation.
>
> And yes this is partially my own interpretation of what is appropriate
> for -stable.
>
> But if I were to set the precedent of adding all of these to -stable
> it means people will then be able to request similar things in other
> drivers. We want to avoid having too much grey area stuff backported
> to -stable, there are too many changes going through -stable already.

Understood, but some fixes are really critical to us, so I would
suggest a new list for -stable:

net/mlx5: Avoid calling sleeping function by the health poll thread
net/mlx5: Fix wait_vital for VFs and remove fixed sleep
net/mlx5e: Copy all L2 headers into inline segment
net/mlx5e: Fix select queue callback

Thanks,
Saeed.
Re: [PATCH net 00/13] Mellanox 100G mlx5 resiliency and xmit path fixes
On Fri, Jul 1, 2016 at 1:14 PM, David Miller wrote:
> From: Saeed Mahameed
> Date: Thu, 30 Jun 2016 17:34:37 +0300
>
>> This series provides two sets of fixes to the mlx5 driver:
>> - Resiliency fixes for reset flow and internal pci errors
>> - xmit path fixes
>
> Series applied to 'net' but expecting all of this to be backported
> to -stable is unreasonable.
>

Thanks Dave,

One small comment on this series: it will hit two trivial conflicts
once net is merged into the current net-next.

Conflict applying: "net/mlx5e: Timeout if SQ doesn't flush during close"
Fix:
---
@@@ -810,12 -802,19 +820,19 @@@ static void mlx5e_close_sq(struct mlx5e
        if (mlx5e_sq_has_room_for(sq, 1))
                mlx5e_send_nop(sq, true);

-       mlx5e_modify_sq(sq, MLX5_SQC_STATE_RDY, MLX5_SQC_STATE_ERR,
-                       false, 0);
 -      err = mlx5e_modify_sq(sq, MLX5_SQC_STATE_RDY,
 -                            MLX5_SQC_STATE_ERR);
++      err = mlx5e_modify_sq(sq, MLX5_SQC_STATE_RDY, MLX5_SQC_STATE_ERR,
++                            false, 0);
+       if (err)
+               set_bit(MLX5E_SQ_STATE_TX_TIMEOUT, &sq->state);
---

Conflict applying: "net/mlx5e: Handle RQ flush in error cases"
Fix:
---
diff --cc drivers/net/ethernet/mellanox/mlx5/core/en.h
index 6db979e,b429591..000
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@@ -214,7 -191,7 +214,8 @@@ struct mlx5e_tstamp
  enum {
        MLX5E_RQ_STATE_POST_WQES_ENABLE,
        MLX5E_RQ_STATE_UMR_WQE_IN_PROGRESS,
 +      MLX5E_RQ_STATE_AM,
+       MLX5E_RQ_STATE_FLUSH_TIMEOUT,
  };
---

Thanks,
Saeed
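
For readers following the first resolution above, here is a minimal
user-space sketch of the pattern the merged hunk preserves: keep the
return value of the state change and record a timeout flag when it
fails. Every name below (sketch_sq, sketch_modify_sq, sketch_close_sq,
SKETCH_SQ_STATE_TX_TIMEOUT) is a hypothetical stand-in, not a driver
symbol, and a plain bit-or stands in for the kernel's set_bit().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in the mlx5 driver. */
enum { SKETCH_SQ_STATE_TX_TIMEOUT = 0 };

struct sketch_sq {
        unsigned long state;  /* bit flags, loosely analogous to sq->state */
        bool hw_fails;        /* test knob: force the state change to fail */
};

/* Pretend state-change command: returns 0 on success, -1 on failure. */
static int sketch_modify_sq(struct sketch_sq *sq)
{
        return sq->hw_fails ? -1 : 0;
}

/* Shape of the merged hunk: capture the error and flag a timeout on failure. */
static void sketch_close_sq(struct sketch_sq *sq)
{
        int err = sketch_modify_sq(sq);

        if (err)
                sq->state |= 1UL << SKETCH_SQ_STATE_TX_TIMEOUT;
}

/* Later teardown code could consult the flag before waiting on the queue. */
static bool sketch_timed_out(const struct sketch_sq *sq)
{
        return (sq->state & (1UL << SKETCH_SQ_STATE_TX_TIMEOUT)) != 0;
}
```

The sketch only shows why the net-side version must win the conflict:
dropping the `err =` assignment would silently lose the failure that
the timeout flag is meant to record.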
Re: [PATCH net 00/13] Mellanox 100G mlx5 resiliency and xmit path fixes
From: Saeed Mahameed
Date: Thu, 30 Jun 2016 17:34:37 +0300

> This series provides two sets of fixes to the mlx5 driver:
> - Resiliency fixes for reset flow and internal pci errors
> - xmit path fixes

Series applied to 'net' but expecting all of this to be backported
to -stable is unreasonable.

PCI error paths, chip resets, timeout handling, etc. and some
non-trivial changes. This is not appropriate for -stable at all.

What's relevant for -stable is operational bug fixes for issues
users will hit in regular operation.

And yes this is partially my own interpretation of what is appropriate
for -stable.

But if I were to set the precedent of adding all of these to -stable
it means people will then be able to request similar things in other
drivers. We want to avoid having too much grey area stuff backported
to -stable, there are too many changes going through -stable already.
[PATCH net 00/13] Mellanox 100G mlx5 resiliency and xmit path fixes
Hi Dave,

This series provides two sets of fixes to the mlx5 driver:
- Resiliency fixes for reset flow and internal pci errors
- xmit path fixes

Please consider queuing those patches for -stable (4.6).

Reset flow fixes for core driver:
- Add more commands to the list of error simulated commands when pci
  errors occur
- Avoid calling sleeping function by the health poll thread
- Fix incorrect page count when in internal error
- Fix timeout in wait vital for VFs
- Deadlock fix and Timeout handling in commands interface

Reset flow and resiliency fixes for mlx5e netdev driver:
- Handle RQ flush in error cases
- Implement ndo_tx_timeout callback
- Timeout if SQ doesn't flush during close
- Log link state changes
- Validate BW weight values of ETS

xmit path fixes:
- Fix wrong fallback assumption in select queue callback
- Account for all L2 headers when copying headers into inline segment

Thanks,
Saeed.

Daniel Jurgens (5):
  net/mlx5: Fix incorrect page count when in internal error
  net/mlx5: Fix wait_vital for VFs and remove fixed sleep
  net/mlx5e: Timeout if SQ doesn't flush during close
  net/mlx5e: Implement ndo_tx_timeout callback
  net/mlx5e: Handle RQ flush in error cases

Matthew Finlay (1):
  net/mlx5e: Copy all L2 headers into inline segment

Mohamad Haj Yahia (4):
  net/mlx5: Fix teardown errors that happen in pci error handler
  net/mlx5: Avoid calling sleeping function by the health poll thread
  net/mlx5: Fix potential deadlock in command mode change
  net/mlx5: Add timeout handle to commands with callback

Rana Shahout (2):
  net/mlx5e: Fix select queue callback
  net/mlx5e: Validate BW weight values of ETS

Shaker Daibes (1):
  net/mlx5e: Log link state changes

 drivers/net/ethernet/mellanox/mlx5/core/cmd.c      | 129 -
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  11 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  99 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |  41 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c    |  52 -
 drivers/net/ethernet/mellanox/mlx5/core/health.c   |  11 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |  41 +++
 .../net/ethernet/mellanox/mlx5/core/pagealloc.c    |  63 +++---
 include/linux/mlx5/driver.h                        |   1 +
 10 files changed, 335 insertions(+), 121 deletions(-)

--
2.8.0