Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Thu, Dec 1, 2016 at 7:36 PM, Eric Dumazet wrote: > On Thu, 2016-12-01 at 08:08 -0800, Eric Dumazet wrote: >> On Thu, 2016-12-01 at 07:55 -0800, Eric Dumazet wrote: >> >> > So removing the spinlock is doable, but needs to add a new parameter >> > to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64() >> > before mlx4_en_fold_software_stats(dev) >> >> Untested patch would be : >> >> drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |2 - >> drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 10 + >> drivers/net/ethernet/mellanox/mlx4/en_port.c| 24 +- >> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h|3 + >> 4 files changed, 23 insertions(+), 16 deletions(-) > > The patch is wrong, since priv->port_up could change to false while we > are running and using the about to be deleted tx/rx rings. > Right, hence the regression Jesper saw ;). > > So the only safe thing to do is to remove the _bh suffix. > > Not worth trying to avoid taking a spinlock in this code. > Ack.
Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Thu, Dec 1, 2016 at 7:08 PM, Eric Dumazet wrote: > On Thu, 2016-12-01 at 18:33 +0200, Saeed Mahameed wrote: > >> Thanks for the detailed answer !! > > You're welcome. > >> >> BTW you went 5 steps ahead of my original question :)), so far you >> already have a patch without locking at all (really impressive). >> >> What i wanted to ask originally, was regarding the "_bh", i didn't >> mean to completely remove the "spin_lock_bh", >> I meant, what happens if we replace "spin_lock_bh" with "spin_lock", >> without disabling bh ? >> I gues raw "sping_lock" handles points (2 to 4) from above, but it >> won't handle long irqs. > > Thats a very good point, the _bh prefix can totally be removed, since > stats_lock is only acquired from process context. > > That was my initial point, Thanks for the help. will provide a fix patch later once 4.9 is release.
Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Thu, 2016-12-01 at 08:08 -0800, Eric Dumazet wrote: > On Thu, 2016-12-01 at 07:55 -0800, Eric Dumazet wrote: > > > So removing the spinlock is doable, but needs to add a new parameter > > to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64() > > before mlx4_en_fold_software_stats(dev) > > Untested patch would be : > > drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |2 - > drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 10 + > drivers/net/ethernet/mellanox/mlx4/en_port.c| 24 +- > drivers/net/ethernet/mellanox/mlx4/mlx4_en.h|3 + > 4 files changed, 23 insertions(+), 16 deletions(-) The patch is wrong, since priv->port_up could change to false while we are running and using the about to be deleted tx/rx rings. So the only safe thing to do is to remove the _bh suffix. Not worth trying to avoid taking a spinlock in this code.
Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Thu, 2016-12-01 at 18:33 +0200, Saeed Mahameed wrote: > Thanks for the detailed answer !! You're welcome. > > BTW you went 5 steps ahead of my original question :)), so far you > already have a patch without locking at all (really impressive). > > What i wanted to ask originally, was regarding the "_bh", i didn't > mean to completely remove the "spin_lock_bh", > I meant, what happens if we replace "spin_lock_bh" with "spin_lock", > without disabling bh ? > I gues raw "sping_lock" handles points (2 to 4) from above, but it > won't handle long irqs. Thats a very good point, the _bh prefix can totally be removed, since stats_lock is only acquired from process context.
Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Thu, Dec 1, 2016 at 5:55 PM, Eric Dumazet wrote: > On Thu, 2016-12-01 at 17:38 +0200, Saeed Mahameed wrote: > >> >> Hi Eric, Thanks for the patch, I already acked it. > > Thanks ! > >> >> I have one educational question (not related to this patch, but >> related to stats reading in general). >> I was wondering why do we need to disable bh every time we read stats >> "spin_lock_bh" ? is it essential ? >> >> I checked and in mlx4 we don't hold stats_lock in softirq >> (en_rx.c/en_tx.c), so I don't see any deadlock risk in here.. > > Excellent question, and I chose to keep the spinlock. > > That would be doable, only if we do not overwrite dev->stats. > > Current code is : > > static struct rtnl_link_stats64 * > mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) > { > struct mlx4_en_priv *priv = netdev_priv(dev); > > spin_lock_bh(&priv->stats_lock); > mlx4_en_fold_software_stats(dev); > netdev_stats_to_stats64(stats, &dev->stats); > spin_unlock_bh(&priv->stats_lock); > > return stats; > } > > If you remove the spin_lock_bh() : > > > static struct rtnl_link_stats64 * > mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) > { > struct mlx4_en_priv *priv = netdev_priv(dev); > > mlx4_en_fold_software_stats(dev); // possible races > > netdev_stats_to_stats64(stats, &dev->stats); > > return stats; > } > > 1) one mlx4_en_fold_software_stats(dev) could be preempted > on a CONFIG_PREEMPT kernel, or interrupted by long irqs. > > 2) Another cpu would also call mlx4_en_fold_software_stats(dev) while >first cpu is busy. > > 3) Then when resuming first cpu/thread, part of the dev->stats fieds > would be updated with 'old counters', > while another thread might have updated them with newer values. > > 4) A SNMP reader could then get counters that are not monotonically > increasing, > which would be confusing/buggy. > > So removing the spinlock is doable, but needs to add a new parameter > to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64() > before mlx4_en_fold_software_stats(dev) > > static struct rtnl_link_stats64 * > mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) > { > struct mlx4_en_priv *priv = netdev_priv(dev); > > netdev_stats_to_stats64(stats, &dev->stats); > > // Passing a non NULL stats asks mlx4_en_fold_software_stats() > // to not update dev->stats, but stats directly. > > mlx4_en_fold_software_stats(dev, stats) > > > return stats; > } > > Thanks for the detailed answer !! BTW you went 5 steps ahead of my original question :)), so far you already have a patch without locking at all (really impressive). What i wanted to ask originally, was regarding the "_bh", i didn't mean to completely remove the "spin_lock_bh", I meant, what happens if we replace "spin_lock_bh" with "spin_lock", without disabling bh ? I gues raw "sping_lock" handles points (2 to 4) from above, but it won't handle long irqs.
Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Thu, 2016-12-01 at 07:55 -0800, Eric Dumazet wrote: > So removing the spinlock is doable, but needs to add a new parameter > to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64() > before mlx4_en_fold_software_stats(dev) Untested patch would be : drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |2 - drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 10 + drivers/net/ethernet/mellanox/mlx4/en_port.c| 24 +- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h|3 + 4 files changed, 23 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c index d9c9f86a30df953fa555934c5406057dcaf28960..676050e352703cebe7fcaa5202a06496f7a5a0df 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c @@ -367,7 +367,7 @@ static void mlx4_en_get_ethtool_stats(struct net_device *dev, spin_lock_bh(&priv->stats_lock); - mlx4_en_fold_software_stats(dev); + mlx4_en_fold_software_stats(dev, NULL); for (i = 0; i < NUM_MAIN_STATS; i++, bitmap_iterator_inc(&it)) if (bitmap_iterator_test(&it)) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index 091b904262bc7932d3edf99cf850affb23b9ce6e..6ee9e31e59c392cb88faedf9c541b3bc6d195228 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -1321,13 +1321,9 @@ static void mlx4_en_tx_timeout(struct net_device *dev) static struct rtnl_link_stats64 * mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { - struct mlx4_en_priv *priv = netdev_priv(dev); - - spin_lock_bh(&priv->stats_lock); - mlx4_en_fold_software_stats(dev); netdev_stats_to_stats64(stats, &dev->stats); - spin_unlock_bh(&priv->stats_lock); - + /* Must be called after netdev_stats_to_stats64() */ + mlx4_en_fold_software_stats(dev, stats); return stats; } @@ -1810,7 +1806,7 @@ void mlx4_en_stop_port(struct net_device *dev, int detach) netif_tx_disable(dev); spin_lock_bh(&priv->stats_lock); - mlx4_en_fold_software_stats(dev); + mlx4_en_fold_software_stats(dev, NULL); /* Set port as not active */ priv->port_up = false; spin_unlock_bh(&priv->stats_lock); diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c index 9166d90e732858610b1407fe85cbf6cbe27f5e0b..eea042a18e3cfba62745ece4ca673c2db967b9aa 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_port.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c @@ -147,7 +147,8 @@ static unsigned long en_stats_adder(__be64 *start, __be64 *next, int num) return ret; } -void mlx4_en_fold_software_stats(struct net_device *dev) +void mlx4_en_fold_software_stats(struct net_device *dev, +struct rtnl_link_stats64 *stats) { struct mlx4_en_priv *priv = netdev_priv(dev); struct mlx4_en_dev *mdev = priv->mdev; @@ -165,9 +166,13 @@ void mlx4_en_fold_software_stats(struct net_device *dev) packets += READ_ONCE(ring->packets); bytes += READ_ONCE(ring->bytes); } - dev->stats.rx_packets = packets; - dev->stats.rx_bytes = bytes; - + if (stats) { + stats->rx_packets = packets; + stats->rx_bytes = bytes; + } else { + dev->stats.rx_packets = packets; + dev->stats.rx_bytes = bytes; + } packets = 0; bytes = 0; for (i = 0; i < priv->tx_ring_num[TX]; i++) { @@ -176,8 +181,13 @@ void mlx4_en_fold_software_stats(struct net_device *dev) packets += READ_ONCE(ring->packets); bytes += READ_ONCE(ring->bytes); } - dev->stats.tx_packets = packets; - dev->stats.tx_bytes = bytes; + if (stats) { + stats->tx_packets = packets; + stats->tx_bytes = bytes; + } else { + dev->stats.tx_packets = packets; + dev->stats.tx_bytes = bytes; + } } int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset) @@ -208,7 +218,7 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset) spin_lock_bh(&priv->stats_lock); - mlx4_en_fold_software_stats(dev); + mlx4_en_fold_software_stats(dev, NULL); priv->port_stats.rx_chksum_good = 0; priv->port_stats.rx_chksum_none = 0; diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h index 20a936428f4a44c8ca0a7161855da310f9166b50..92dbb41f425b282e9ab7c8d534f091da0ba661c3 100644 --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h @@ -755,7 +755,8 @@ void mlx
Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Thu, 2016-12-01 at 17:38 +0200, Saeed Mahameed wrote: > > Hi Eric, Thanks for the patch, I already acked it. Thanks ! > > I have one educational question (not related to this patch, but > related to stats reading in general). > I was wondering why do we need to disable bh every time we read stats > "spin_lock_bh" ? is it essential ? > > I checked and in mlx4 we don't hold stats_lock in softirq > (en_rx.c/en_tx.c), so I don't see any deadlock risk in here.. Excellent question, and I chose to keep the spinlock. That would be doable, only if we do not overwrite dev->stats. Current code is : static struct rtnl_link_stats64 * mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { struct mlx4_en_priv *priv = netdev_priv(dev); spin_lock_bh(&priv->stats_lock); mlx4_en_fold_software_stats(dev); netdev_stats_to_stats64(stats, &dev->stats); spin_unlock_bh(&priv->stats_lock); return stats; } If you remove the spin_lock_bh() : static struct rtnl_link_stats64 * mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { struct mlx4_en_priv *priv = netdev_priv(dev); mlx4_en_fold_software_stats(dev); // possible races netdev_stats_to_stats64(stats, &dev->stats); return stats; } 1) one mlx4_en_fold_software_stats(dev) could be preempted on a CONFIG_PREEMPT kernel, or interrupted by long irqs. 2) Another cpu would also call mlx4_en_fold_software_stats(dev) while first cpu is busy. 3) Then when resuming first cpu/thread, part of the dev->stats fieds would be updated with 'old counters', while another thread might have updated them with newer values. 4) A SNMP reader could then get counters that are not monotonically increasing, which would be confusing/buggy. So removing the spinlock is doable, but needs to add a new parameter to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64() before mlx4_en_fold_software_stats(dev) static struct rtnl_link_stats64 * mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { struct mlx4_en_priv *priv = netdev_priv(dev); netdev_stats_to_stats64(stats, &dev->stats); // Passing a non NULL stats asks mlx4_en_fold_software_stats() // to not update dev->stats, but stats directly. mlx4_en_fold_software_stats(dev, stats) return stats; }
Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Wed, Nov 30, 2016 at 11:00 PM, Eric Dumazet wrote: > On Wed, 2016-11-30 at 22:42 +0200, Saeed Mahameed wrote: >> On Wed, Nov 30, 2016 at 7:35 PM, Eric Dumazet wrote: >> > On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote: >> > >> >> we had/still have the proper stats they are the ones that >> >> mlx4_en_fold_software_stats is trying to cache into (they always >> >> exist), >> >> but the ones that you are trying to read from (the mlx4 rings) are gone ! >> >> >> >> This bug is totally new and as i warned, this is another symptom of >> >> the real root cause (can't sleep while reading stats). >> >> >> >> Eric what do you suggest ? Keep pre-allocated MAX_RINGS stats and >> >> always iterate over all of them to query stats ? >> >> what if you have one ring/none/1K ? how would you know how many to query ? >> > >> > I am suggesting I will fix the bug I introduced. >> > >> > Do not panic. >> > >> > >> >> Not at all, I trust you are the only one who is capable of providing >> the best solution. >> I am just trying to read your mind :-). >> >> As i said i like the solution and i want to adapt it to mlx5, so I am >> a little bit enthusiastic :) > > What about the following fix guys ? > > As a bonus we update the stats right before they are sent to monitors > via rtnetlink ;) Hi Eric, Thanks for the patch, I already acked it. I have one educational question (not related to this patch, but related to stats reading in general). I was wondering why do we need to disable bh every time we read stats "spin_lock_bh" ? is it essential ? I checked and in mlx4 we don't hold stats_lock in softirq (en_rx.c/en_tx.c), so I don't see any deadlock risk in here.. Thanks Saeed.
Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Wed, 30 Nov 2016 13:00:52 -0800 Eric Dumazet wrote: > On Wed, 2016-11-30 at 22:42 +0200, Saeed Mahameed wrote: > > On Wed, Nov 30, 2016 at 7:35 PM, Eric Dumazet > > wrote: > > > On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote: [...] > > > > > > I am suggesting I will fix the bug I introduced. > > > > > > Do not panic. > > > > > > > > > > Not at all, I trust you are the only one who is capable of providing > > the best solution. > > I am just trying to read your mind :-). > > > > As i said i like the solution and i want to adapt it to mlx5, so I am > > a little bit enthusiastic :) > > What about the following fix guys ? Confirming this fixed the crash on shutdown for me, thanks! > As a bonus we update the stats right before they are sent to monitors > via rtnetlink ;) > > > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > index > 12ea3405f442717478bf0e8882edaf0de77986cb..091b904262bc7932d3edf99cf850affb23b9ce6e > 100644 > --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c > @@ -1809,8 +1809,12 @@ void mlx4_en_stop_port(struct net_device *dev, int > detach) > > netif_tx_disable(dev); > > + spin_lock_bh(&priv->stats_lock); > + mlx4_en_fold_software_stats(dev); > /* Set port as not active */ > priv->port_up = false; > + spin_unlock_bh(&priv->stats_lock); > + > priv->counter_index = MLX4_SINK_COUNTER_INDEX(mdev->dev); > > /* Promsicuous mode */ > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c > b/drivers/net/ethernet/mellanox/mlx4/en_port.c > index > c6c4f1238923e09eced547454b86c68720292859..9166d90e732858610b1407fe85cbf6cbe27f5e0b > 100644 > --- a/drivers/net/ethernet/mellanox/mlx4/en_port.c > +++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c > @@ -154,7 +154,7 @@ void mlx4_en_fold_software_stats(struct net_device *dev) > unsigned long packets, bytes; > int i; > > - if (mlx4_is_master(mdev->dev)) > + if (!priv->port_up || mlx4_is_master(mdev->dev)) > return; > > packets = 0; > > -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Wed, 2016-11-30 at 22:42 +0200, Saeed Mahameed wrote: > On Wed, Nov 30, 2016 at 7:35 PM, Eric Dumazet wrote: > > On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote: > > > >> we had/still have the proper stats they are the ones that > >> mlx4_en_fold_software_stats is trying to cache into (they always > >> exist), > >> but the ones that you are trying to read from (the mlx4 rings) are gone ! > >> > >> This bug is totally new and as i warned, this is another symptom of > >> the real root cause (can't sleep while reading stats). > >> > >> Eric what do you suggest ? Keep pre-allocated MAX_RINGS stats and > >> always iterate over all of them to query stats ? > >> what if you have one ring/none/1K ? how would you know how many to query ? > > > > I am suggesting I will fix the bug I introduced. > > > > Do not panic. > > > > > > Not at all, I trust you are the only one who is capable of providing > the best solution. > I am just trying to read your mind :-). > > As i said i like the solution and i want to adapt it to mlx5, so I am > a little bit enthusiastic :) What about the following fix guys ? As a bonus we update the stats right before they are sent to monitors via rtnetlink ;) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index 12ea3405f442717478bf0e8882edaf0de77986cb..091b904262bc7932d3edf99cf850affb23b9ce6e 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -1809,8 +1809,12 @@ void mlx4_en_stop_port(struct net_device *dev, int detach) netif_tx_disable(dev); + spin_lock_bh(&priv->stats_lock); + mlx4_en_fold_software_stats(dev); /* Set port as not active */ priv->port_up = false; + spin_unlock_bh(&priv->stats_lock); + priv->counter_index = MLX4_SINK_COUNTER_INDEX(mdev->dev); /* Promsicuous mode */ diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c b/drivers/net/ethernet/mellanox/mlx4/en_port.c index c6c4f1238923e09eced547454b86c68720292859..9166d90e732858610b1407fe85cbf6cbe27f5e0b 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_port.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c @@ -154,7 +154,7 @@ void mlx4_en_fold_software_stats(struct net_device *dev) unsigned long packets, bytes; int i; - if (mlx4_is_master(mdev->dev)) + if (!priv->port_up || mlx4_is_master(mdev->dev)) return; packets = 0;
Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Wed, Nov 30, 2016 at 7:35 PM, Eric Dumazet wrote: > On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote: > >> we had/still have the proper stats they are the ones that >> mlx4_en_fold_software_stats is trying to cache into (they always >> exist), >> but the ones that you are trying to read from (the mlx4 rings) are gone ! >> >> This bug is totally new and as i warned, this is another symptom of >> the real root cause (can't sleep while reading stats). >> >> Eric what do you suggest ? Keep pre-allocated MAX_RINGS stats and >> always iterate over all of them to query stats ? >> what if you have one ring/none/1K ? how would you know how many to query ? > > I am suggesting I will fix the bug I introduced. > > Do not panic. > > Not at all, I trust you are the only one who is capable of providing the best solution. I am just trying to read your mind :-). As i said i like the solution and i want to adapt it to mlx5, so I am a little bit enthusiastic :) Thanks.
Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote: > we had/still have the proper stats they are the ones that > mlx4_en_fold_software_stats is trying to cache into (they always > exist), > but the ones that you are trying to read from (the mlx4 rings) are gone ! > > This bug is totally new and as i warned, this is another symptom of > the real root cause (can't sleep while reading stats). > > Eric what do you suggest ? Keep pre-allocated MAX_RINGS stats and > always iterate over all of them to query stats ? > what if you have one ring/none/1K ? how would you know how many to query ? I am suggesting I will fix the bug I introduced. Do not panic.
Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Wed, Nov 30, 2016 at 5:58 PM, Eric Dumazet wrote: > On Wed, 2016-11-30 at 15:08 +0100, Jesper Dangaard Brouer wrote: >> On Fri, 25 Nov 2016 07:46:20 -0800 Eric Dumazet >> wrote: >> >> > From: Eric Dumazet >> >> Ended up-in net-next as: >> >> commit 40931b85113dad7881d49e8759e5ad41d30a5e6c >> Author: Eric Dumazet >> Date: Fri Nov 25 07:46:20 2016 -0800 >> >> mlx4: give precise rx/tx bytes/packets counters >> >> mlx4 stats are chaotic because a deferred work queue is responsible >> to update them every 250 ms. >> >> Likely after this patch I get this crash (below), when rebooting my machine. >> Looks like a device removal order thing. >> Tested with net-next at commit 93ba5504. >> >> [...] >> [ 1967.248453] mlx5_core :02:00.1: Shutdown was called >> [ 1967.854556] mlx5_core :02:00.0: Shutdown was called >> [ 1968.443015] e1000e: EEE TX LPI TIMER: 0011 >> [ 1968.484676] sd 3:0:0:0: [sda] Synchronizing SCSI cache >> [ 1968.528354] mlx4_core :01:00.0: mlx4_shutdown was called >> [ 1968.534054] mlx4_en: mlx4p1: Close port called >> [ 1968.571156] mlx4_en :01:00.0: removed PHC >> [ 1968.575677] mlx4_en: mlx4p2: Close port called >> [ 1969.506602] BUG: unable to handle kernel NULL pointer dereference at >> 0d08 >> [ 1969.514530] IP: [] >> mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en] >> [ 1969.522963] PGD 0 [ 1969.524803] >> [ 1969.526332] Oops: [#1] PREEMPT SMP >> [ 1969.530201] Modules linked in: coretemp kvm_intel kvm irqbypass >> intel_cstate mxm_wmi i2c_i801 intel_rapl_perf i2c_smbus sg pcspkr i2c_core >> shpchp nfsd wmi video acpi_pad auth_rpcgss oid_registry nfs_acl lockd grace >> sunrpc ip_tables x_tables mlx4_en e1000e mlx5_core ptp serio_raw sd_mod >> mlx4_core pps_core devlink hid_generic >> [ 1969.559616] CPU: 3 PID: 3104 Comm: kworker/3:1 Not tainted >> 4.9.0-rc6-net-next3-01390-g93ba5504 #12 >> [ 1969.568984] Hardware name: To Be Filled By O.E.M. To Be Filled By >> O.E.M./Z97 Extreme4, BIOS P2.10 05/12/2015 >> [ 1969.578877] Workqueue: events linkwatch_event >> [ 1969.583285] task: 8803f42a task.stack: 88040b2d >> [ 1969.589238] RIP: 0010:[] [] >> mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en] >> [ 1969.600102] RSP: 0018:88040b2d3bd8 EFLAGS: 00010282 >> [ 1969.605442] RAX: 8803f432efc8 RBX: 8803f432 RCX: >> >> [ 1969.612604] RDX: RSI: RDI: >> 8803f432 >> [ 1969.619772] RBP: 88040b2d3bd8 R08: 000c R09: >> 8803f432f000 >> [ 1969.626938] R10: R11: 88040d64ac00 R12: >> 8803e5aff8dc >> [ 1969.634104] R13: 8803f4320a28 R14: 8803e5aff800 R15: >> >> [ 1969.641273] FS: () GS:88041fac() >> knlGS: >> [ 1969.649422] CS: 0010 DS: ES: CR0: 80050033 >> [ 1969.655197] CR2: 0d08 CR3: 01c07000 CR4: >> 001406e0 >> [ 1969.662366] Stack: >> [ 1969.664412] 88040b2d3be8 a0127f8e 88040b2d3c10 >> a012a23b >> [ 1969.671948] 8803e5aff8dc 8803f432 8803f432 >> 88040b2d3c30 >> [ 1969.679478] 8160ae29 8803e5aff8d8 8804088ff300 >> 88040b2d3c58 >> [ 1969.687001] Call Trace: >> [ 1969.689484] [] mlx4_en_fold_software_stats+0x1e/0x20 >> [mlx4_en] >> [ 1969.697026] [] mlx4_en_get_stats64+0x2b/0x50 [mlx4_en] >> [ 1969.703844] [] dev_get_stats+0x39/0xa0 >> [ 1969.709274] [] rtnl_fill_stats+0x40/0x130 >> [ 1969.714968] [] rtnl_fill_ifinfo+0x55b/0x1010 >> [ 1969.720921] [] rtmsg_ifinfo_build_skb+0x73/0xd0 >> [ 1969.727136] [] rtmsg_ifinfo.part.25+0x16/0x50 >> [ 1969.733176] [] rtmsg_ifinfo+0x18/0x20 >> [ 1969.738522] [] netdev_state_change+0x47/0x50 >> [ 1969.744478] [] linkwatch_do_dev+0x38/0x50 >> [ 1969.750170] [] __linkwatch_run_queue+0xe7/0x160 >> [ 1969.756385] [] linkwatch_event+0x25/0x30 >> [ 1969.761991] [] process_one_work+0x15b/0x460 >> [ 1969.767857] [] worker_thread+0x4e/0x480 >> [ 1969.773378] [] ? process_one_work+0x460/0x460 >> [ 1969.779420] [] ? process_one_work+0x460/0x460 >> [ 1969.785460] [] kthread+0xca/0xe0 >> [ 1969.790372] [] ? kthread_worker_fn+0x120/0x120 >> [ 1969.796495] [] ret_from_fork+0x22/0x30 >> [ 1969.801924] Code: 00 00 55 48 89 e5 85 d2 0f 84 90 00 00 00 83 ea 01 31 >> c9 31 f6 48 8d 87 c0 ef 00 00 4c 8d 8c d7 c8 ef 00 00 48 8b 10 48 83 c0 08 >> <4c> 8b 82 08 0d 00 00 48 8b 92 00 0d 00 00 4c 01 c6 48 01 d1 4c >> [ 1969.821969] RIP [] >> mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en] >> [ 1969.830486] RSP >> [ 1969.834002] CR2: 0d08 >> [ 1969.837440] ---[ end trace 80b9fbc1e7baed9b ]--- >> [ 1969.842102] Kernel panic - not syncing: Fatal exception in interrupt >> [ 1969.848520] Kernel Offset: disabled >> [ 1969.852050] ---[ end Kernel panic - not syncing: Fatal exception in >> interrupt >> (END) > > Hi Jesper. > > Thanks for the r
Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters
On Wed, 2016-11-30 at 15:08 +0100, Jesper Dangaard Brouer wrote: > On Fri, 25 Nov 2016 07:46:20 -0800 Eric Dumazet > wrote: > > > From: Eric Dumazet > > Ended up-in net-next as: > > commit 40931b85113dad7881d49e8759e5ad41d30a5e6c > Author: Eric Dumazet > Date: Fri Nov 25 07:46:20 2016 -0800 > > mlx4: give precise rx/tx bytes/packets counters > > mlx4 stats are chaotic because a deferred work queue is responsible > to update them every 250 ms. > > Likely after this patch I get this crash (below), when rebooting my machine. > Looks like a device removal order thing. > Tested with net-next at commit 93ba5504. > > [...] > [ 1967.248453] mlx5_core :02:00.1: Shutdown was called > [ 1967.854556] mlx5_core :02:00.0: Shutdown was called > [ 1968.443015] e1000e: EEE TX LPI TIMER: 0011 > [ 1968.484676] sd 3:0:0:0: [sda] Synchronizing SCSI cache > [ 1968.528354] mlx4_core :01:00.0: mlx4_shutdown was called > [ 1968.534054] mlx4_en: mlx4p1: Close port called > [ 1968.571156] mlx4_en :01:00.0: removed PHC > [ 1968.575677] mlx4_en: mlx4p2: Close port called > [ 1969.506602] BUG: unable to handle kernel NULL pointer dereference at > 0d08 > [ 1969.514530] IP: [] > mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en] > [ 1969.522963] PGD 0 [ 1969.524803] > [ 1969.526332] Oops: [#1] PREEMPT SMP > [ 1969.530201] Modules linked in: coretemp kvm_intel kvm irqbypass > intel_cstate mxm_wmi i2c_i801 intel_rapl_perf i2c_smbus sg pcspkr i2c_core > shpchp nfsd wmi video acpi_pad auth_rpcgss oid_registry nfs_acl lockd grace > sunrpc ip_tables x_tables mlx4_en e1000e mlx5_core ptp serio_raw sd_mod > mlx4_core pps_core devlink hid_generic > [ 1969.559616] CPU: 3 PID: 3104 Comm: kworker/3:1 Not tainted > 4.9.0-rc6-net-next3-01390-g93ba5504 #12 > [ 1969.568984] Hardware name: To Be Filled By O.E.M. To Be Filled By > O.E.M./Z97 Extreme4, BIOS P2.10 05/12/2015 > [ 1969.578877] Workqueue: events linkwatch_event > [ 1969.583285] task: 8803f42a task.stack: 88040b2d > [ 1969.589238] RIP: 0010:[] [] > mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en] > [ 1969.600102] RSP: 0018:88040b2d3bd8 EFLAGS: 00010282 > [ 1969.605442] RAX: 8803f432efc8 RBX: 8803f432 RCX: > > [ 1969.612604] RDX: RSI: RDI: > 8803f432 > [ 1969.619772] RBP: 88040b2d3bd8 R08: 000c R09: > 8803f432f000 > [ 1969.626938] R10: R11: 88040d64ac00 R12: > 8803e5aff8dc > [ 1969.634104] R13: 8803f4320a28 R14: 8803e5aff800 R15: > > [ 1969.641273] FS: () GS:88041fac() > knlGS: > [ 1969.649422] CS: 0010 DS: ES: CR0: 80050033 > [ 1969.655197] CR2: 0d08 CR3: 01c07000 CR4: > 001406e0 > [ 1969.662366] Stack: > [ 1969.664412] 88040b2d3be8 a0127f8e 88040b2d3c10 > a012a23b > [ 1969.671948] 8803e5aff8dc 8803f432 8803f432 > 88040b2d3c30 > [ 1969.679478] 8160ae29 8803e5aff8d8 8804088ff300 > 88040b2d3c58 > [ 1969.687001] Call Trace: > [ 1969.689484] [] mlx4_en_fold_software_stats+0x1e/0x20 > [mlx4_en] > [ 1969.697026] [] mlx4_en_get_stats64+0x2b/0x50 [mlx4_en] > [ 1969.703844] [] dev_get_stats+0x39/0xa0 > [ 1969.709274] [] rtnl_fill_stats+0x40/0x130 > [ 1969.714968] [] rtnl_fill_ifinfo+0x55b/0x1010 > [ 1969.720921] [] rtmsg_ifinfo_build_skb+0x73/0xd0 > [ 1969.727136] [] rtmsg_ifinfo.part.25+0x16/0x50 > [ 1969.733176] [] rtmsg_ifinfo+0x18/0x20 > [ 1969.738522] [] netdev_state_change+0x47/0x50 > [ 1969.744478] [] linkwatch_do_dev+0x38/0x50 > [ 1969.750170] [] __linkwatch_run_queue+0xe7/0x160 > [ 1969.756385] [] linkwatch_event+0x25/0x30 > [ 1969.761991] [] process_one_work+0x15b/0x460 > [ 1969.767857] [] worker_thread+0x4e/0x480 > [ 1969.773378] [] ? process_one_work+0x460/0x460 > [ 1969.779420] [] ? process_one_work+0x460/0x460 > [ 1969.785460] [] kthread+0xca/0xe0 > [ 1969.790372] [] ? kthread_worker_fn+0x120/0x120 > [ 1969.796495] [] ret_from_fork+0x22/0x30 > [ 1969.801924] Code: 00 00 55 48 89 e5 85 d2 0f 84 90 00 00 00 83 ea 01 31 c9 > 31 f6 48 8d 87 c0 ef 00 00 4c 8d 8c d7 c8 ef 00 00 48 8b 10 48 83 c0 08 <4c> > 8b 82 08 0d 00 00 48 8b 92 00 0d 00 00 4c 01 c6 48 01 d1 4c > [ 1969.821969] RIP [] > mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en] > [ 1969.830486] RSP > [ 1969.834002] CR2: 0d08 > [ 1969.837440] ---[ end trace 80b9fbc1e7baed9b ]--- > [ 1969.842102] Kernel panic - not syncing: Fatal exception in interrupt > [ 1969.848520] Kernel Offset: disabled > [ 1969.852050] ---[ end Kernel panic - not syncing: Fatal exception in > interrupt > (END) Hi Jesper. Thanks for the report. Then we have a bug in the driver, deleting some memory too soon. If we depend on having proper stats at device dismantle, we need to keep arou