Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-12-04 Thread Saeed Mahameed
On Thu, Dec 1, 2016 at 7:36 PM, Eric Dumazet  wrote:
> On Thu, 2016-12-01 at 08:08 -0800, Eric Dumazet wrote:
>> On Thu, 2016-12-01 at 07:55 -0800, Eric Dumazet wrote:
>>
>> > So removing the spinlock is doable, but needs to add a new parameter
>> > to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64()
>> > before mlx4_en_fold_software_stats(dev)
>>
>> Untested patch would be :
>>
>>  drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |2 -
>>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |   10 +
>>  drivers/net/ethernet/mellanox/mlx4/en_port.c|   24 +-
>>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h|3 +
>>  4 files changed, 23 insertions(+), 16 deletions(-)
>
> The patch is wrong, since priv->port_up could change to false while we
> are running and using the about to be deleted tx/rx rings.
>

Right, hence the regression Jesper saw ;).

>
> So the only safe thing to do is to remove the _bh suffix.
>
> Not worth trying to avoid taking a spinlock in this code.
>

Ack.


Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-12-04 Thread Saeed Mahameed
On Thu, Dec 1, 2016 at 7:08 PM, Eric Dumazet  wrote:
> On Thu, 2016-12-01 at 18:33 +0200, Saeed Mahameed wrote:
>
>> Thanks for the detailed answer !!
>
> You're welcome.
>
>>
>> BTW you went 5 steps ahead of my original question :)), so far you
>> already have a patch without locking at all (really impressive).
>>
>> What i wanted to ask originally, was regarding the "_bh", i didn't
>> mean to completely remove the "spin_lock_bh",
>> I meant, what happens if we replace "spin_lock_bh"  with "spin_lock",
>> without disabling bh ?
>> I gues raw "sping_lock" handles points (2 to 4) from above, but it
>> won't handle long irqs.
>
> Thats a very good point, the _bh prefix can totally be removed, since
> stats_lock is only acquired from process context.
>
>

That was my initial point, Thanks for the help.
will provide a fix patch later once 4.9 is release.


Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-12-01 Thread Eric Dumazet
On Thu, 2016-12-01 at 08:08 -0800, Eric Dumazet wrote:
> On Thu, 2016-12-01 at 07:55 -0800, Eric Dumazet wrote:
> 
> > So removing the spinlock is doable, but needs to add a new parameter
> > to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64()
> > before mlx4_en_fold_software_stats(dev)
> 
> Untested patch would be :
> 
>  drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |2 -
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |   10 +
>  drivers/net/ethernet/mellanox/mlx4/en_port.c|   24 +-
>  drivers/net/ethernet/mellanox/mlx4/mlx4_en.h|3 +
>  4 files changed, 23 insertions(+), 16 deletions(-)

The patch is wrong, since priv->port_up could change to false while we
are running and using the about to be deleted tx/rx rings.

So the only safe thing to do is to remove the _bh suffix.

Not worth trying to avoid taking a spinlock in this code.







Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-12-01 Thread Eric Dumazet
On Thu, 2016-12-01 at 18:33 +0200, Saeed Mahameed wrote:

> Thanks for the detailed answer !!

You're welcome.

> 
> BTW you went 5 steps ahead of my original question :)), so far you
> already have a patch without locking at all (really impressive).
> 
> What i wanted to ask originally, was regarding the "_bh", i didn't
> mean to completely remove the "spin_lock_bh",
> I meant, what happens if we replace "spin_lock_bh"  with "spin_lock",
> without disabling bh ?
> I gues raw "sping_lock" handles points (2 to 4) from above, but it
> won't handle long irqs.

Thats a very good point, the _bh prefix can totally be removed, since
stats_lock is only acquired from process context.




Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-12-01 Thread Saeed Mahameed
On Thu, Dec 1, 2016 at 5:55 PM, Eric Dumazet  wrote:
> On Thu, 2016-12-01 at 17:38 +0200, Saeed Mahameed wrote:
>
>>
>> Hi Eric, Thanks for the patch, I already acked it.
>
> Thanks !
>
>>
>> I have one educational question (not related to this patch, but
>> related to stats reading in general).
>> I was wondering why do we need to disable bh every time we read stats
>> "spin_lock_bh" ? is it essential ?
>>
>> I checked and in mlx4 we don't hold stats_lock in softirq
>> (en_rx.c/en_tx.c), so I don't see any deadlock risk in here..
>
> Excellent question, and I chose to keep the spinlock.
>
> That would be doable, only if we do not overwrite dev->stats.
>
> Current code is :
>
> static struct rtnl_link_stats64 *
> mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
> {
> struct mlx4_en_priv *priv = netdev_priv(dev);
>
> spin_lock_bh(&priv->stats_lock);
> mlx4_en_fold_software_stats(dev);
> netdev_stats_to_stats64(stats, &dev->stats);
> spin_unlock_bh(&priv->stats_lock);
>
> return stats;
> }
>
> If you remove the spin_lock_bh() :
>
>
> static struct rtnl_link_stats64 *
> mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
> {
> struct mlx4_en_priv *priv = netdev_priv(dev);
>
> mlx4_en_fold_software_stats(dev); // possible races
>
> netdev_stats_to_stats64(stats, &dev->stats);
>
> return stats;
> }
>
> 1) one mlx4_en_fold_software_stats(dev) could be preempted
> on a CONFIG_PREEMPT kernel, or interrupted by long irqs.
>
> 2) Another cpu would also call mlx4_en_fold_software_stats(dev) while
>first cpu is busy.
>
> 3) Then when resuming first cpu/thread, part of the dev->stats fieds
> would be updated with 'old counters',
> while another thread might have updated them with newer values.
>
> 4) A SNMP reader could then get counters that are not monotonically
> increasing,
> which would be confusing/buggy.
>
> So removing the spinlock is doable, but needs to add a new parameter
> to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64()
> before mlx4_en_fold_software_stats(dev)
>
> static struct rtnl_link_stats64 *
> mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
> {
> struct mlx4_en_priv *priv = netdev_priv(dev);
>
> netdev_stats_to_stats64(stats, &dev->stats);
>
> // Passing a non NULL stats asks mlx4_en_fold_software_stats()
> // to not update dev->stats, but stats directly.
>
> mlx4_en_fold_software_stats(dev, stats)
>
>
> return stats;
> }
>
>

Thanks for the detailed answer !!

BTW you went 5 steps ahead of my original question :)), so far you
already have a patch without locking at all (really impressive).

What i wanted to ask originally, was regarding the "_bh", i didn't
mean to completely remove the "spin_lock_bh",
I meant, what happens if we replace "spin_lock_bh"  with "spin_lock",
without disabling bh ?
I gues raw "sping_lock" handles points (2 to 4) from above, but it
won't handle long irqs.


Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-12-01 Thread Eric Dumazet
On Thu, 2016-12-01 at 07:55 -0800, Eric Dumazet wrote:

> So removing the spinlock is doable, but needs to add a new parameter
> to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64()
> before mlx4_en_fold_software_stats(dev)

Untested patch would be :

 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |2 -
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |   10 +
 drivers/net/ethernet/mellanox/mlx4/en_port.c|   24 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h|3 +
 4 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 
d9c9f86a30df953fa555934c5406057dcaf28960..676050e352703cebe7fcaa5202a06496f7a5a0df
 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -367,7 +367,7 @@ static void mlx4_en_get_ethtool_stats(struct net_device 
*dev,
 
spin_lock_bh(&priv->stats_lock);
 
-   mlx4_en_fold_software_stats(dev);
+   mlx4_en_fold_software_stats(dev, NULL);
 
for (i = 0; i < NUM_MAIN_STATS; i++, bitmap_iterator_inc(&it))
if (bitmap_iterator_test(&it))
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 
091b904262bc7932d3edf99cf850affb23b9ce6e..6ee9e31e59c392cb88faedf9c541b3bc6d195228
 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1321,13 +1321,9 @@ static void mlx4_en_tx_timeout(struct net_device *dev)
 static struct rtnl_link_stats64 *
 mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 {
-   struct mlx4_en_priv *priv = netdev_priv(dev);
-
-   spin_lock_bh(&priv->stats_lock);
-   mlx4_en_fold_software_stats(dev);
netdev_stats_to_stats64(stats, &dev->stats);
-   spin_unlock_bh(&priv->stats_lock);
-
+   /* Must be called after netdev_stats_to_stats64() */
+   mlx4_en_fold_software_stats(dev, stats);
return stats;
 }
 
@@ -1810,7 +1806,7 @@ void mlx4_en_stop_port(struct net_device *dev, int detach)
netif_tx_disable(dev);
 
spin_lock_bh(&priv->stats_lock);
-   mlx4_en_fold_software_stats(dev);
+   mlx4_en_fold_software_stats(dev, NULL);
/* Set port as not active */
priv->port_up = false;
spin_unlock_bh(&priv->stats_lock);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c 
b/drivers/net/ethernet/mellanox/mlx4/en_port.c
index 
9166d90e732858610b1407fe85cbf6cbe27f5e0b..eea042a18e3cfba62745ece4ca673c2db967b9aa
 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
@@ -147,7 +147,8 @@ static unsigned long en_stats_adder(__be64 *start, __be64 
*next, int num)
return ret;
 }
 
-void mlx4_en_fold_software_stats(struct net_device *dev)
+void mlx4_en_fold_software_stats(struct net_device *dev,
+struct rtnl_link_stats64 *stats)
 {
struct mlx4_en_priv *priv = netdev_priv(dev);
struct mlx4_en_dev *mdev = priv->mdev;
@@ -165,9 +166,13 @@ void mlx4_en_fold_software_stats(struct net_device *dev)
packets += READ_ONCE(ring->packets);
bytes   += READ_ONCE(ring->bytes);
}
-   dev->stats.rx_packets = packets;
-   dev->stats.rx_bytes = bytes;
-
+   if (stats) {
+   stats->rx_packets = packets;
+   stats->rx_bytes = bytes;
+   } else {
+   dev->stats.rx_packets = packets;
+   dev->stats.rx_bytes = bytes;
+   }
packets = 0;
bytes = 0;
for (i = 0; i < priv->tx_ring_num[TX]; i++) {
@@ -176,8 +181,13 @@ void mlx4_en_fold_software_stats(struct net_device *dev)
packets += READ_ONCE(ring->packets);
bytes   += READ_ONCE(ring->bytes);
}
-   dev->stats.tx_packets = packets;
-   dev->stats.tx_bytes = bytes;
+   if (stats) {
+   stats->tx_packets = packets;
+   stats->tx_bytes = bytes;
+   } else {
+   dev->stats.tx_packets = packets;
+   dev->stats.tx_bytes = bytes;
+   }
 }
 
 int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 port, u8 reset)
@@ -208,7 +218,7 @@ int mlx4_en_DUMP_ETH_STATS(struct mlx4_en_dev *mdev, u8 
port, u8 reset)
 
spin_lock_bh(&priv->stats_lock);
 
-   mlx4_en_fold_software_stats(dev);
+   mlx4_en_fold_software_stats(dev, NULL);
 
priv->port_stats.rx_chksum_good = 0;
priv->port_stats.rx_chksum_none = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h 
b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 
20a936428f4a44c8ca0a7161855da310f9166b50..92dbb41f425b282e9ab7c8d534f091da0ba661c3
 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -755,7 +755,8 @@ void mlx

Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-12-01 Thread Eric Dumazet
On Thu, 2016-12-01 at 17:38 +0200, Saeed Mahameed wrote:

> 
> Hi Eric, Thanks for the patch, I already acked it.

Thanks !

> 
> I have one educational question (not related to this patch, but
> related to stats reading in general).
> I was wondering why do we need to disable bh every time we read stats
> "spin_lock_bh" ? is it essential ?
> 
> I checked and in mlx4 we don't hold stats_lock in softirq
> (en_rx.c/en_tx.c), so I don't see any deadlock risk in here..

Excellent question, and I chose to keep the spinlock.

That would be doable, only if we do not overwrite dev->stats.

Current code is :

static struct rtnl_link_stats64 *
mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
{
struct mlx4_en_priv *priv = netdev_priv(dev);

spin_lock_bh(&priv->stats_lock);
mlx4_en_fold_software_stats(dev);
netdev_stats_to_stats64(stats, &dev->stats);
spin_unlock_bh(&priv->stats_lock);

return stats;
}

If you remove the spin_lock_bh() :

static struct rtnl_link_stats64 *
mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
{
struct mlx4_en_priv *priv = netdev_priv(dev);

mlx4_en_fold_software_stats(dev); // possible races

netdev_stats_to_stats64(stats, &dev->stats);

return stats;
}

1) one mlx4_en_fold_software_stats(dev) could be preempted
on a CONFIG_PREEMPT kernel, or interrupted by long irqs.

2) Another cpu would also call mlx4_en_fold_software_stats(dev) while
   first cpu is busy.

3) Then when resuming first cpu/thread, part of the dev->stats fieds 
would be updated with 'old counters',
while another thread might have updated them with newer values.

4) A SNMP reader could then get counters that are not monotonically
increasing,
which would be confusing/buggy.

So removing the spinlock is doable, but needs to add a new parameter
to mlx4_en_fold_software_stats() and call netdev_stats_to_stats64()
before mlx4_en_fold_software_stats(dev)

static struct rtnl_link_stats64 *
mlx4_en_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
{
struct mlx4_en_priv *priv = netdev_priv(dev);

netdev_stats_to_stats64(stats, &dev->stats);

// Passing a non NULL stats asks mlx4_en_fold_software_stats()
// to not update dev->stats, but stats directly.

mlx4_en_fold_software_stats(dev, stats)


return stats;
}




Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-12-01 Thread Saeed Mahameed
On Wed, Nov 30, 2016 at 11:00 PM, Eric Dumazet  wrote:
> On Wed, 2016-11-30 at 22:42 +0200, Saeed Mahameed wrote:
>> On Wed, Nov 30, 2016 at 7:35 PM, Eric Dumazet  wrote:
>> > On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote:
>> >
>> >> we had/still have the proper stats they are the ones that
>> >> mlx4_en_fold_software_stats is trying to cache into  (they always
>> >> exist),
>> >> but the ones that you are trying to read from (the mlx4 rings) are gone !
>> >>
>> >> This bug is totally new and as i warned, this is another symptom of
>> >> the real root cause (can't sleep while reading stats).
>> >>
>> >> Eric what do you suggest ? Keep pre-allocated MAX_RINGS stats  and
>> >> always iterate over all of them to query stats ?
>> >> what if you have one ring/none/1K ? how would you know how many to query ?
>> >
>> > I am suggesting I will fix the bug I introduced.
>> >
>> > Do not panic.
>> >
>> >
>>
>> Not at all, I trust you are the only one who is capable of providing
>> the best solution.
>> I am just trying to read your mind :-).
>>
>> As i said i like the solution and i want to adapt it to mlx5, so I am
>> a little bit enthusiastic :)
>
> What about the following fix guys ?
>
> As a bonus we update the stats right before they are sent to monitors
> via rtnetlink ;)

Hi Eric, Thanks for the patch, I already acked it.

I have one educational question (not related to this patch, but
related to stats reading in general).
I was wondering why do we need to disable bh every time we read stats
"spin_lock_bh" ? is it essential ?

I checked and in mlx4 we don't hold stats_lock in softirq
(en_rx.c/en_tx.c), so I don't see any deadlock risk in here..

 Thanks
Saeed.


Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-12-01 Thread Jesper Dangaard Brouer

On Wed, 30 Nov 2016 13:00:52 -0800 Eric Dumazet  wrote:

> On Wed, 2016-11-30 at 22:42 +0200, Saeed Mahameed wrote:
> > On Wed, Nov 30, 2016 at 7:35 PM, Eric Dumazet  
> > wrote:  
> > > On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote:
[...]
> > >
> > > I am suggesting I will fix the bug I introduced.
> > >
> > > Do not panic.
> > >
> > >  
> > 
> > Not at all, I trust you are the only one who is capable of providing
> > the best solution.
> > I am just trying to read your mind :-).
> > 
> > As i said i like the solution and i want to adapt it to mlx5, so I am
> > a little bit enthusiastic :)  
> 
> What about the following fix guys ?

Confirming this fixed the crash on shutdown for me, thanks!

 
> As a bonus we update the stats right before they are sent to monitors
> via rtnetlink ;)
> 
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 
> 12ea3405f442717478bf0e8882edaf0de77986cb..091b904262bc7932d3edf99cf850affb23b9ce6e
>  100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -1809,8 +1809,12 @@ void mlx4_en_stop_port(struct net_device *dev, int 
> detach)
>  
>   netif_tx_disable(dev);
>  
> + spin_lock_bh(&priv->stats_lock);
> + mlx4_en_fold_software_stats(dev);
>   /* Set port as not active */
>   priv->port_up = false;
> + spin_unlock_bh(&priv->stats_lock);
> +
>   priv->counter_index = MLX4_SINK_COUNTER_INDEX(mdev->dev);
>  
>   /* Promsicuous mode */
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_port.c
> index 
> c6c4f1238923e09eced547454b86c68720292859..9166d90e732858610b1407fe85cbf6cbe27f5e0b
>  100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
> @@ -154,7 +154,7 @@ void mlx4_en_fold_software_stats(struct net_device *dev)
>   unsigned long packets, bytes;
>   int i;
>  
> - if (mlx4_is_master(mdev->dev))
> + if (!priv->port_up || mlx4_is_master(mdev->dev))
>   return;
>  
>   packets = 0;
> 
> 



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-11-30 Thread Eric Dumazet
On Wed, 2016-11-30 at 22:42 +0200, Saeed Mahameed wrote:
> On Wed, Nov 30, 2016 at 7:35 PM, Eric Dumazet  wrote:
> > On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote:
> >
> >> we had/still have the proper stats they are the ones that
> >> mlx4_en_fold_software_stats is trying to cache into  (they always
> >> exist),
> >> but the ones that you are trying to read from (the mlx4 rings) are gone !
> >>
> >> This bug is totally new and as i warned, this is another symptom of
> >> the real root cause (can't sleep while reading stats).
> >>
> >> Eric what do you suggest ? Keep pre-allocated MAX_RINGS stats  and
> >> always iterate over all of them to query stats ?
> >> what if you have one ring/none/1K ? how would you know how many to query ?
> >
> > I am suggesting I will fix the bug I introduced.
> >
> > Do not panic.
> >
> >
> 
> Not at all, I trust you are the only one who is capable of providing
> the best solution.
> I am just trying to read your mind :-).
> 
> As i said i like the solution and i want to adapt it to mlx5, so I am
> a little bit enthusiastic :)

What about the following fix guys ?

As a bonus we update the stats right before they are sent to monitors
via rtnetlink ;)


diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 
12ea3405f442717478bf0e8882edaf0de77986cb..091b904262bc7932d3edf99cf850affb23b9ce6e
 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1809,8 +1809,12 @@ void mlx4_en_stop_port(struct net_device *dev, int 
detach)
 
netif_tx_disable(dev);
 
+   spin_lock_bh(&priv->stats_lock);
+   mlx4_en_fold_software_stats(dev);
/* Set port as not active */
priv->port_up = false;
+   spin_unlock_bh(&priv->stats_lock);
+
priv->counter_index = MLX4_SINK_COUNTER_INDEX(mdev->dev);
 
/* Promsicuous mode */
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_port.c 
b/drivers/net/ethernet/mellanox/mlx4/en_port.c
index 
c6c4f1238923e09eced547454b86c68720292859..9166d90e732858610b1407fe85cbf6cbe27f5e0b
 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_port.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_port.c
@@ -154,7 +154,7 @@ void mlx4_en_fold_software_stats(struct net_device *dev)
unsigned long packets, bytes;
int i;
 
-   if (mlx4_is_master(mdev->dev))
+   if (!priv->port_up || mlx4_is_master(mdev->dev))
return;
 
packets = 0;




Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-11-30 Thread Saeed Mahameed
On Wed, Nov 30, 2016 at 7:35 PM, Eric Dumazet  wrote:
> On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote:
>
>> we had/still have the proper stats they are the ones that
>> mlx4_en_fold_software_stats is trying to cache into  (they always
>> exist),
>> but the ones that you are trying to read from (the mlx4 rings) are gone !
>>
>> This bug is totally new and as i warned, this is another symptom of
>> the real root cause (can't sleep while reading stats).
>>
>> Eric what do you suggest ? Keep pre-allocated MAX_RINGS stats  and
>> always iterate over all of them to query stats ?
>> what if you have one ring/none/1K ? how would you know how many to query ?
>
> I am suggesting I will fix the bug I introduced.
>
> Do not panic.
>
>

Not at all, I trust you are the only one who is capable of providing
the best solution.
I am just trying to read your mind :-).

As i said i like the solution and i want to adapt it to mlx5, so I am
a little bit enthusiastic :)

Thanks.


Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-11-30 Thread Eric Dumazet
On Wed, 2016-11-30 at 18:46 +0200, Saeed Mahameed wrote:

> we had/still have the proper stats they are the ones that
> mlx4_en_fold_software_stats is trying to cache into  (they always
> exist),
> but the ones that you are trying to read from (the mlx4 rings) are gone !
> 
> This bug is totally new and as i warned, this is another symptom of
> the real root cause (can't sleep while reading stats).
> 
> Eric what do you suggest ? Keep pre-allocated MAX_RINGS stats  and
> always iterate over all of them to query stats ?
> what if you have one ring/none/1K ? how would you know how many to query ?

I am suggesting I will fix the bug I introduced.

Do not panic.




Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-11-30 Thread Saeed Mahameed
On Wed, Nov 30, 2016 at 5:58 PM, Eric Dumazet  wrote:
> On Wed, 2016-11-30 at 15:08 +0100, Jesper Dangaard Brouer wrote:
>> On Fri, 25 Nov 2016 07:46:20 -0800 Eric Dumazet  
>> wrote:
>>
>> > From: Eric Dumazet 
>>
>> Ended up-in net-next as:
>>
>>  commit 40931b85113dad7881d49e8759e5ad41d30a5e6c
>>  Author: Eric Dumazet 
>>  Date:   Fri Nov 25 07:46:20 2016 -0800
>>
>> mlx4: give precise rx/tx bytes/packets counters
>>
>> mlx4 stats are chaotic because a deferred work queue is responsible
>> to update them every 250 ms.
>>
>> Likely after this patch I get this crash (below), when rebooting my machine.
>> Looks like a device removal order thing.
>> Tested with net-next at commit 93ba5504.
>>
>> [...]
>> [ 1967.248453] mlx5_core :02:00.1: Shutdown was called
>> [ 1967.854556] mlx5_core :02:00.0: Shutdown was called
>> [ 1968.443015] e1000e: EEE TX LPI TIMER: 0011
>> [ 1968.484676] sd 3:0:0:0: [sda] Synchronizing SCSI cache
>> [ 1968.528354] mlx4_core :01:00.0: mlx4_shutdown was called
>> [ 1968.534054] mlx4_en: mlx4p1: Close port called
>> [ 1968.571156] mlx4_en :01:00.0: removed PHC
>> [ 1968.575677] mlx4_en: mlx4p2: Close port called
>> [ 1969.506602] BUG: unable to handle kernel NULL pointer dereference at 
>> 0d08
>> [ 1969.514530] IP: [] 
>> mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en]
>> [ 1969.522963] PGD 0 [ 1969.524803]
>> [ 1969.526332] Oops:  [#1] PREEMPT SMP
>> [ 1969.530201] Modules linked in: coretemp kvm_intel kvm irqbypass 
>> intel_cstate mxm_wmi i2c_i801 intel_rapl_perf i2c_smbus sg pcspkr i2c_core 
>> shpchp nfsd wmi video acpi_pad auth_rpcgss oid_registry nfs_acl lockd grace 
>> sunrpc ip_tables x_tables mlx4_en e1000e mlx5_core ptp serio_raw sd_mod 
>> mlx4_core pps_core devlink hid_generic
>> [ 1969.559616] CPU: 3 PID: 3104 Comm: kworker/3:1 Not tainted 
>> 4.9.0-rc6-net-next3-01390-g93ba5504 #12
>> [ 1969.568984] Hardware name: To Be Filled By O.E.M. To Be Filled By 
>> O.E.M./Z97 Extreme4, BIOS P2.10 05/12/2015
>> [ 1969.578877] Workqueue: events linkwatch_event
>> [ 1969.583285] task: 8803f42a task.stack: 88040b2d
>> [ 1969.589238] RIP: 0010:[]  [] 
>> mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en]
>> [ 1969.600102] RSP: 0018:88040b2d3bd8  EFLAGS: 00010282
>> [ 1969.605442] RAX: 8803f432efc8 RBX: 8803f432 RCX: 
>> 
>> [ 1969.612604] RDX:  RSI:  RDI: 
>> 8803f432
>> [ 1969.619772] RBP: 88040b2d3bd8 R08: 000c R09: 
>> 8803f432f000
>> [ 1969.626938] R10:  R11: 88040d64ac00 R12: 
>> 8803e5aff8dc
>> [ 1969.634104] R13: 8803f4320a28 R14: 8803e5aff800 R15: 
>> 
>> [ 1969.641273] FS:  () GS:88041fac() 
>> knlGS:
>> [ 1969.649422] CS:  0010 DS:  ES:  CR0: 80050033
>> [ 1969.655197] CR2: 0d08 CR3: 01c07000 CR4: 
>> 001406e0
>> [ 1969.662366] Stack:
>> [ 1969.664412]  88040b2d3be8 a0127f8e 88040b2d3c10 
>> a012a23b
>> [ 1969.671948]  8803e5aff8dc 8803f432 8803f432 
>> 88040b2d3c30
>> [ 1969.679478]  8160ae29 8803e5aff8d8 8804088ff300 
>> 88040b2d3c58
>> [ 1969.687001] Call Trace:
>> [ 1969.689484]  [] mlx4_en_fold_software_stats+0x1e/0x20 
>> [mlx4_en]
>> [ 1969.697026]  [] mlx4_en_get_stats64+0x2b/0x50 [mlx4_en]
>> [ 1969.703844]  [] dev_get_stats+0x39/0xa0
>> [ 1969.709274]  [] rtnl_fill_stats+0x40/0x130
>> [ 1969.714968]  [] rtnl_fill_ifinfo+0x55b/0x1010
>> [ 1969.720921]  [] rtmsg_ifinfo_build_skb+0x73/0xd0
>> [ 1969.727136]  [] rtmsg_ifinfo.part.25+0x16/0x50
>> [ 1969.733176]  [] rtmsg_ifinfo+0x18/0x20
>> [ 1969.738522]  [] netdev_state_change+0x47/0x50
>> [ 1969.744478]  [] linkwatch_do_dev+0x38/0x50
>> [ 1969.750170]  [] __linkwatch_run_queue+0xe7/0x160
>> [ 1969.756385]  [] linkwatch_event+0x25/0x30
>> [ 1969.761991]  [] process_one_work+0x15b/0x460
>> [ 1969.767857]  [] worker_thread+0x4e/0x480
>> [ 1969.773378]  [] ? process_one_work+0x460/0x460
>> [ 1969.779420]  [] ? process_one_work+0x460/0x460
>> [ 1969.785460]  [] kthread+0xca/0xe0
>> [ 1969.790372]  [] ? kthread_worker_fn+0x120/0x120
>> [ 1969.796495]  [] ret_from_fork+0x22/0x30
>> [ 1969.801924] Code: 00 00 55 48 89 e5 85 d2 0f 84 90 00 00 00 83 ea 01 31 
>> c9 31 f6 48 8d 87 c0 ef 00 00 4c 8d 8c d7 c8 ef 00 00 48 8b 10 48 83 c0 08 
>> <4c> 8b 82 08 0d 00 00 48 8b 92 00 0d 00 00 4c 01 c6 48 01 d1 4c
>> [ 1969.821969] RIP  [] 
>> mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en]
>> [ 1969.830486]  RSP 
>> [ 1969.834002] CR2: 0d08
>> [ 1969.837440] ---[ end trace 80b9fbc1e7baed9b ]---
>> [ 1969.842102] Kernel panic - not syncing: Fatal exception in interrupt
>> [ 1969.848520] Kernel Offset: disabled
>> [ 1969.852050] ---[ end Kernel panic - not syncing: Fatal exception in 
>> interrupt
>> (END)
>
> Hi Jesper.
>
> Thanks for the r

Re: Regression: [PATCH] mlx4: give precise rx/tx bytes/packets counters

2016-11-30 Thread Eric Dumazet
On Wed, 2016-11-30 at 15:08 +0100, Jesper Dangaard Brouer wrote:
> On Fri, 25 Nov 2016 07:46:20 -0800 Eric Dumazet  
> wrote:
> 
> > From: Eric Dumazet 
> 
> Ended up-in net-next as:
> 
>  commit 40931b85113dad7881d49e8759e5ad41d30a5e6c
>  Author: Eric Dumazet 
>  Date:   Fri Nov 25 07:46:20 2016 -0800
> 
> mlx4: give precise rx/tx bytes/packets counters
> 
> mlx4 stats are chaotic because a deferred work queue is responsible
> to update them every 250 ms.
> 
> Likely after this patch I get this crash (below), when rebooting my machine.
> Looks like a device removal order thing.
> Tested with net-next at commit 93ba5504.   
> 
> [...]
> [ 1967.248453] mlx5_core :02:00.1: Shutdown was called
> [ 1967.854556] mlx5_core :02:00.0: Shutdown was called
> [ 1968.443015] e1000e: EEE TX LPI TIMER: 0011
> [ 1968.484676] sd 3:0:0:0: [sda] Synchronizing SCSI cache
> [ 1968.528354] mlx4_core :01:00.0: mlx4_shutdown was called
> [ 1968.534054] mlx4_en: mlx4p1: Close port called
> [ 1968.571156] mlx4_en :01:00.0: removed PHC
> [ 1968.575677] mlx4_en: mlx4p2: Close port called
> [ 1969.506602] BUG: unable to handle kernel NULL pointer dereference at 
> 0d08
> [ 1969.514530] IP: [] 
> mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en]
> [ 1969.522963] PGD 0 [ 1969.524803] 
> [ 1969.526332] Oops:  [#1] PREEMPT SMP
> [ 1969.530201] Modules linked in: coretemp kvm_intel kvm irqbypass 
> intel_cstate mxm_wmi i2c_i801 intel_rapl_perf i2c_smbus sg pcspkr i2c_core 
> shpchp nfsd wmi video acpi_pad auth_rpcgss oid_registry nfs_acl lockd grace 
> sunrpc ip_tables x_tables mlx4_en e1000e mlx5_core ptp serio_raw sd_mod 
> mlx4_core pps_core devlink hid_generic
> [ 1969.559616] CPU: 3 PID: 3104 Comm: kworker/3:1 Not tainted 
> 4.9.0-rc6-net-next3-01390-g93ba5504 #12
> [ 1969.568984] Hardware name: To Be Filled By O.E.M. To Be Filled By 
> O.E.M./Z97 Extreme4, BIOS P2.10 05/12/2015
> [ 1969.578877] Workqueue: events linkwatch_event
> [ 1969.583285] task: 8803f42a task.stack: 88040b2d
> [ 1969.589238] RIP: 0010:[]  [] 
> mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en]
> [ 1969.600102] RSP: 0018:88040b2d3bd8  EFLAGS: 00010282
> [ 1969.605442] RAX: 8803f432efc8 RBX: 8803f432 RCX: 
> 
> [ 1969.612604] RDX:  RSI:  RDI: 
> 8803f432
> [ 1969.619772] RBP: 88040b2d3bd8 R08: 000c R09: 
> 8803f432f000
> [ 1969.626938] R10:  R11: 88040d64ac00 R12: 
> 8803e5aff8dc
> [ 1969.634104] R13: 8803f4320a28 R14: 8803e5aff800 R15: 
> 
> [ 1969.641273] FS:  () GS:88041fac() 
> knlGS:
> [ 1969.649422] CS:  0010 DS:  ES:  CR0: 80050033
> [ 1969.655197] CR2: 0d08 CR3: 01c07000 CR4: 
> 001406e0
> [ 1969.662366] Stack:
> [ 1969.664412]  88040b2d3be8 a0127f8e 88040b2d3c10 
> a012a23b
> [ 1969.671948]  8803e5aff8dc 8803f432 8803f432 
> 88040b2d3c30
> [ 1969.679478]  8160ae29 8803e5aff8d8 8804088ff300 
> 88040b2d3c58
> [ 1969.687001] Call Trace:
> [ 1969.689484]  [] mlx4_en_fold_software_stats+0x1e/0x20 
> [mlx4_en]
> [ 1969.697026]  [] mlx4_en_get_stats64+0x2b/0x50 [mlx4_en]
> [ 1969.703844]  [] dev_get_stats+0x39/0xa0
> [ 1969.709274]  [] rtnl_fill_stats+0x40/0x130
> [ 1969.714968]  [] rtnl_fill_ifinfo+0x55b/0x1010
> [ 1969.720921]  [] rtmsg_ifinfo_build_skb+0x73/0xd0
> [ 1969.727136]  [] rtmsg_ifinfo.part.25+0x16/0x50
> [ 1969.733176]  [] rtmsg_ifinfo+0x18/0x20
> [ 1969.738522]  [] netdev_state_change+0x47/0x50
> [ 1969.744478]  [] linkwatch_do_dev+0x38/0x50
> [ 1969.750170]  [] __linkwatch_run_queue+0xe7/0x160
> [ 1969.756385]  [] linkwatch_event+0x25/0x30
> [ 1969.761991]  [] process_one_work+0x15b/0x460
> [ 1969.767857]  [] worker_thread+0x4e/0x480
> [ 1969.773378]  [] ? process_one_work+0x460/0x460
> [ 1969.779420]  [] ? process_one_work+0x460/0x460
> [ 1969.785460]  [] kthread+0xca/0xe0
> [ 1969.790372]  [] ? kthread_worker_fn+0x120/0x120
> [ 1969.796495]  [] ret_from_fork+0x22/0x30
> [ 1969.801924] Code: 00 00 55 48 89 e5 85 d2 0f 84 90 00 00 00 83 ea 01 31 c9 
> 31 f6 48 8d 87 c0 ef 00 00 4c 8d 8c d7 c8 ef 00 00 48 8b 10 48 83 c0 08 <4c> 
> 8b 82 08 0d 00 00 48 8b 92 00 0d 00 00 4c 01 c6 48 01 d1 4c 
> [ 1969.821969] RIP  [] 
> mlx4_en_fold_software_stats.part.1+0x34/0xb0 [mlx4_en]
> [ 1969.830486]  RSP 
> [ 1969.834002] CR2: 0d08
> [ 1969.837440] ---[ end trace 80b9fbc1e7baed9b ]---
> [ 1969.842102] Kernel panic - not syncing: Fatal exception in interrupt
> [ 1969.848520] Kernel Offset: disabled
> [ 1969.852050] ---[ end Kernel panic - not syncing: Fatal exception in 
> interrupt
> (END)

Hi Jesper.

Thanks for the report.

Then we have a bug in the driver, deleting some memory too soon.

If we depend on having proper stats at device dismantle, we need to keep
arou