Re: [PATCH] netpoll: Check for skb->queue_mapping

2017-04-20 Thread Eric Dumazet
On Thu, Apr 20, 2017 at 12:08 PM, tndave  wrote:
>
>
> On 04/12/2017 03:37 PM, tndave wrote:
>>
>>
>>
>> On 04/06/2017 12:14 PM, Eric Dumazet wrote:
>>>
>>> On Thu, 2017-04-06 at 12:07 -0700, tndave wrote:
>>>
> +q_index = q_index % dev->real_num_tx_queues;

 cpu interrupted here and dev->real_num_tx_queues has reduced!
>
> +skb_set_queue_mapping(skb, q_index);
> +}
> +txq = netdev_get_tx_queue(dev, q_index);

 or cpu interrupted here and dev->real_num_tx_queues has reduced!
>>>
>>>
>>> If dev->real_num_tx_queues can be changed while this code is running, we
>>> are in deep, deep trouble.
>>>
>>> Better make sure that when the control path does this change, the device
>>> (and/or netpoll) is frozen and no packet can be sent.
>>
>> When the control path makes a change to real_num_tx_queues, the underlying
>> device is disabled; the netdev tx queues are also stopped/disabled, so
>> certainly no transmit is happening.
>>
>>
>> The corner case I was referring to is if netpoll's queue_process() code is
>> interrupted and, while it is not running, the control path makes a change to
>> dev->real_num_tx_queues and exits. Later on, the interrupted queue_process()
>> resumes execution and can end up with a wrong skb->queue_mapping and txq.
>> We can prevent this case with the change below:
>>
>> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
>> index 9424673..29be246 100644
>> --- a/net/core/netpoll.c
>> +++ b/net/core/netpoll.c
>> @@ -105,15 +105,21 @@ static void queue_process(struct work_struct *work)
>> while ((skb = skb_dequeue(&npinfo->txq))) {
>> struct net_device *dev = skb->dev;
>> struct netdev_queue *txq;
>> +   unsigned int q_index;
>>
>> if (!netif_device_present(dev) || !netif_running(dev)) {
>> kfree_skb(skb);
>> continue;
>> }
>>
>> -   txq = skb_get_tx_queue(dev, skb);
>> -
>> local_irq_save(flags);
>> +   /* check if skb->queue_mapping is still valid */
>> +   q_index = skb_get_queue_mapping(skb);
>> +   if (unlikely(q_index >= dev->real_num_tx_queues)) {
>> +   q_index = q_index % dev->real_num_tx_queues;
>> +   skb_set_queue_mapping(skb, q_index);
>> +   }
>> +   txq = netdev_get_tx_queue(dev, q_index);
>> HARD_TX_LOCK(dev, txq, smp_processor_id());
>> if (netif_xmit_frozen_or_stopped(txq) ||
>> netpoll_start_xmit(skb, dev, txq) != NETDEV_TX_OK) {
>>
>>
>> Thanks for your patience.
>
> Eric,
>
> Let me know if you are okay with above changes and me sending v2.

Hi Tushar

Yes, this looks good to me, thanks!


Re: [PATCH] netpoll: Check for skb->queue_mapping

2017-04-20 Thread tndave



On 04/12/2017 03:37 PM, tndave wrote:



On 04/06/2017 12:14 PM, Eric Dumazet wrote:

On Thu, 2017-04-06 at 12:07 -0700, tndave wrote:


+q_index = q_index % dev->real_num_tx_queues;

cpu interrupted here and dev->real_num_tx_queues has reduced!

+skb_set_queue_mapping(skb, q_index);
+}
+txq = netdev_get_tx_queue(dev, q_index);

or cpu interrupted here and dev->real_num_tx_queues has reduced!


If dev->real_num_tx_queues can be changed while this code is running, we
are in deep, deep trouble.

Better make sure that when the control path does this change, the device
(and/or netpoll) is frozen and no packet can be sent.

When the control path makes a change to real_num_tx_queues, the underlying
device is disabled; the netdev tx queues are also stopped/disabled, so
certainly no transmit is happening.


The corner case I was referring to is if netpoll's queue_process() code is
interrupted and, while it is not running, the control path makes a change to
dev->real_num_tx_queues and exits. Later on, the interrupted queue_process()
resumes execution and can end up with a wrong skb->queue_mapping and txq.
We can prevent this case with the change below:

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9424673..29be246 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -105,15 +105,21 @@ static void queue_process(struct work_struct *work)
while ((skb = skb_dequeue(&npinfo->txq))) {
struct net_device *dev = skb->dev;
struct netdev_queue *txq;
+   unsigned int q_index;

if (!netif_device_present(dev) || !netif_running(dev)) {
kfree_skb(skb);
continue;
}

-   txq = skb_get_tx_queue(dev, skb);
-
local_irq_save(flags);
+   /* check if skb->queue_mapping is still valid */
+   q_index = skb_get_queue_mapping(skb);
+   if (unlikely(q_index >= dev->real_num_tx_queues)) {
+   q_index = q_index % dev->real_num_tx_queues;
+   skb_set_queue_mapping(skb, q_index);
+   }
+   txq = netdev_get_tx_queue(dev, q_index);
HARD_TX_LOCK(dev, txq, smp_processor_id());
if (netif_xmit_frozen_or_stopped(txq) ||
netpoll_start_xmit(skb, dev, txq) != NETDEV_TX_OK) {


Thanks for your patience.

Eric,

Let me know if you are okay with above changes and me sending v2.

-Tushar



Re: [PATCH] netpoll: Check for skb->queue_mapping

2017-04-12 Thread tndave



On 04/06/2017 12:14 PM, Eric Dumazet wrote:

On Thu, 2017-04-06 at 12:07 -0700, tndave wrote:


+   q_index = q_index % dev->real_num_tx_queues;

cpu interrupted here and dev->real_num_tx_queues has reduced!

+   skb_set_queue_mapping(skb, q_index);
+   }
+   txq = netdev_get_tx_queue(dev, q_index);

or cpu interrupted here and dev->real_num_tx_queues has reduced!


If dev->real_num_tx_queues can be changed while this code is running, we
are in deep, deep trouble.

Better make sure that when the control path does this change, the device
(and/or netpoll) is frozen and no packet can be sent.

When the control path makes a change to real_num_tx_queues, the underlying
device is disabled; the netdev tx queues are also stopped/disabled, so
certainly no transmit is happening.


The corner case I was referring to is if netpoll's queue_process() code is
interrupted and, while it is not running, the control path makes a change to
dev->real_num_tx_queues and exits. Later on, the interrupted queue_process()
resumes execution and can end up with a wrong skb->queue_mapping and txq.
We can prevent this case with the change below:

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9424673..29be246 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -105,15 +105,21 @@ static void queue_process(struct work_struct *work)
while ((skb = skb_dequeue(&npinfo->txq))) {
struct net_device *dev = skb->dev;
struct netdev_queue *txq;
+   unsigned int q_index;

if (!netif_device_present(dev) || !netif_running(dev)) {
kfree_skb(skb);
continue;
}

-   txq = skb_get_tx_queue(dev, skb);
-
local_irq_save(flags);
+   /* check if skb->queue_mapping is still valid */
+   q_index = skb_get_queue_mapping(skb);
+   if (unlikely(q_index >= dev->real_num_tx_queues)) {
+   q_index = q_index % dev->real_num_tx_queues;
+   skb_set_queue_mapping(skb, q_index);
+   }
+   txq = netdev_get_tx_queue(dev, q_index);
HARD_TX_LOCK(dev, txq, smp_processor_id());
if (netif_xmit_frozen_or_stopped(txq) ||
netpoll_start_xmit(skb, dev, txq) != NETDEV_TX_OK) {


Thanks for your patience.

-Tushar







Re: [PATCH] netpoll: Check for skb->queue_mapping

2017-04-06 Thread Eric Dumazet
On Thu, 2017-04-06 at 12:07 -0700, tndave wrote:

> > +   q_index = q_index % dev->real_num_tx_queues;
> cpu interrupted here and dev->real_num_tx_queues has reduced!
> > +   skb_set_queue_mapping(skb, q_index);
> > +   }
> > +   txq = netdev_get_tx_queue(dev, q_index);
> or cpu interrupted here and dev->real_num_tx_queues has reduced!

If dev->real_num_tx_queues can be changed while this code is running, we
are in deep, deep trouble.

Better make sure that when the control path does this change, the device
(and/or netpoll) is frozen and no packet can be sent.





Re: [PATCH] netpoll: Check for skb->queue_mapping

2017-04-06 Thread tndave



On 04/06/2017 03:26 AM, Eric Dumazet wrote:

On Wed, 2017-04-05 at 19:06 -0700, Tushar Dave wrote:

Reducing real_num_tx_queues needs to be in sync with skb queue_mapping;
otherwise skbs with a queue_mapping greater than real_num_tx_queues
can be sent to the underlying driver and can result in a kernel panic.

One such event is running netconsole and enabling VF on the same
device. Or running netconsole and changing number of tx queues via
ethtool on same device.

e.g.




Signed-off-by: Tushar Dave 
---
 net/core/netpoll.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9424673..c572e49 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -101,6 +101,7 @@ static void queue_process(struct work_struct *work)
container_of(work, struct netpoll_info, tx_work.work);
struct sk_buff *skb;
unsigned long flags;
+   u16 q_index;

while ((skb = skb_dequeue(&npinfo->txq))) {
struct net_device *dev = skb->dev;
@@ -117,6 +118,12 @@ static void queue_process(struct work_struct *work)
HARD_TX_LOCK(dev, txq, smp_processor_id());
if (netif_xmit_frozen_or_stopped(txq) ||
netpoll_start_xmit(skb, dev, txq) != NETDEV_TX_OK) {
+   /* check if skb->queue_mapping has changed */
+   q_index = skb_get_queue_mapping(skb);
+   if (unlikely(q_index >= dev->real_num_tx_queues)) {
+   q_index = q_index % dev->real_num_tx_queues;
+   skb_set_queue_mapping(skb, q_index);
+   }
skb_queue_head(&npinfo->txq, skb);
HARD_TX_UNLOCK(dev, txq);
local_irq_restore(flags);


Hi Tushar, thank you for working on this issue.

Eric,

Thank you for reviewing my change and your valuable comments.


Where and when skb->queue_mapping has changed ?

Well, I should amend the comment,
"/* check if skb->queue_mapping has changed */" is misleading.
I should say, /* check if dev->real_num_tx_queues has changed */


It looks like the real problem is that dev->real_num_tx_queues has been
changed rather than skb->queue_mapping

Yes.


So maybe the more correct change would be to cap skb->queue_mapping even
before calling skb_get_tx_queue() ?

Otherwise, even after your patch, we might still access an invalid queue
on the device ?

This is the case for direct xmit (netdev_start_xmit). Most underlying
mq device drivers retrieve the tx queue index from skb->queue_mapping.
One possibility I see where the device driver still gets an invalid queue
index (skb->queue_mapping) with my patch is when the following occurs:
- after executing "txq = skb_get_tx_queue(dev, skb)", the cpu is interrupted;
- some other cpu reduces dev->real_num_tx_queues;
- the interrupted cpu resumes (queue_process()).
With the above sequence, at the underlying driver's xmit function we can
have skb->queue_mapping > dev->real_num_tx_queues [1].

I like your suggested patch more than mine, though, because it checks for
a change in dev->real_num_tx_queues before we even retrieve the
'netdev_queue txq'. Less chance of ending up with the wrong txq.

However, doesn't even your patch have the same issue as [1]? (see my
comment below)



Something like the following :

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9424673009c14e0fb288b8e4041dba596b37ee8d..16702d95f83ab884e605e3868cfef94615dcbc72 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -105,13 +105,20 @@ static void queue_process(struct work_struct *work)
while ((skb = skb_dequeue(&npinfo->txq))) {
struct net_device *dev = skb->dev;
struct netdev_queue *txq;
+   unsigned int q_index;

if (!netif_device_present(dev) || !netif_running(dev)) {
kfree_skb(skb);
continue;
}

-   txq = skb_get_tx_queue(dev, skb);
+   /* check if skb->queue_mapping is still valid */
+   q_index = skb_get_queue_mapping(skb);
+   if (unlikely(q_index >= dev->real_num_tx_queues)) {
+   q_index = q_index % dev->real_num_tx_queues;

cpu interrupted here and dev->real_num_tx_queues has reduced!

+   skb_set_queue_mapping(skb, q_index);
+   }
+   txq = netdev_get_tx_queue(dev, q_index);

or cpu interrupted here and dev->real_num_tx_queues has reduced!

In either case we hit the bug, or am I missing something?

The other concern is concurrent execution of netpoll's direct xmit path
and updates to dev->real_num_tx_queues (by another cpu).


Thanks.

-Tushar


local_irq_save(flags);
HARD_TX_LOCK(dev, txq, smp_processor_id());






Re: [PATCH] netpoll: Check for skb->queue_mapping

2017-04-06 Thread Eric Dumazet
On Thu, 2017-04-06 at 03:26 -0700, Eric Dumazet wrote:
> Hi Thushar, thank you for working on this issue.

I am sorry for the typo, Tushar!
Thanks!




Re: [PATCH] netpoll: Check for skb->queue_mapping

2017-04-06 Thread Eric Dumazet
On Wed, 2017-04-05 at 19:06 -0700, Tushar Dave wrote:
> Reducing real_num_tx_queues needs to be in sync with skb queue_mapping;
> otherwise skbs with a queue_mapping greater than real_num_tx_queues
> can be sent to the underlying driver and can result in a kernel panic.
> 
> One such event is running netconsole and enabling VF on the same
> device. Or running netconsole and changing number of tx queues via
> ethtool on same device.
> 
> e.g.

> 
> Signed-off-by: Tushar Dave 
> ---
>  net/core/netpoll.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 9424673..c572e49 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -101,6 +101,7 @@ static void queue_process(struct work_struct *work)
>   container_of(work, struct netpoll_info, tx_work.work);
>   struct sk_buff *skb;
>   unsigned long flags;
> + u16 q_index;
>  
>   while ((skb = skb_dequeue(&npinfo->txq))) {
>   struct net_device *dev = skb->dev;
> @@ -117,6 +118,12 @@ static void queue_process(struct work_struct *work)
>   HARD_TX_LOCK(dev, txq, smp_processor_id());
>   if (netif_xmit_frozen_or_stopped(txq) ||
>   netpoll_start_xmit(skb, dev, txq) != NETDEV_TX_OK) {
> + /* check if skb->queue_mapping has changed */
> + q_index = skb_get_queue_mapping(skb);
> + if (unlikely(q_index >= dev->real_num_tx_queues)) {
> + q_index = q_index % dev->real_num_tx_queues;
> + skb_set_queue_mapping(skb, q_index);
> + }
>   skb_queue_head(&npinfo->txq, skb);
>   HARD_TX_UNLOCK(dev, txq);
>   local_irq_restore(flags);

Hi Thushar, thank you for working on this issue.

Where and when skb->queue_mapping has changed ?

It looks like the real problem is that dev->real_num_tx_queues has been
changed rather than skb->queue_mapping

So maybe the more correct change would be to cap skb->queue_mapping even
before calling skb_get_tx_queue() ?

Otherwise, even after your patch, we might still access an invalid queue
on the device ?

Something like the following :

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9424673009c14e0fb288b8e4041dba596b37ee8d..16702d95f83ab884e605e3868cfef94615dcbc72 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -105,13 +105,20 @@ static void queue_process(struct work_struct *work)
while ((skb = skb_dequeue(&npinfo->txq))) {
struct net_device *dev = skb->dev;
struct netdev_queue *txq;
+   unsigned int q_index;
 
if (!netif_device_present(dev) || !netif_running(dev)) {
kfree_skb(skb);
continue;
}
 
-   txq = skb_get_tx_queue(dev, skb);
+   /* check if skb->queue_mapping is still valid */
+   q_index = skb_get_queue_mapping(skb);
+   if (unlikely(q_index >= dev->real_num_tx_queues)) {
+   q_index = q_index % dev->real_num_tx_queues;
+   skb_set_queue_mapping(skb, q_index);
+   }
+   txq = netdev_get_tx_queue(dev, q_index);
 
local_irq_save(flags);
HARD_TX_LOCK(dev, txq, smp_processor_id());





[PATCH] netpoll: Check for skb->queue_mapping

2017-04-05 Thread Tushar Dave
Reducing real_num_tx_queues needs to be in sync with skb queue_mapping;
otherwise skbs with a queue_mapping greater than real_num_tx_queues
can be sent to the underlying driver and can result in a kernel panic.

One such event is running netconsole and enabling VF on the same
device. Or running netconsole and changing number of tx queues via
ethtool on same device.

e.g.
Unable to handle kernel NULL pointer dereference
tsk->{mm,active_mm}->context = 1525
tsk->{mm,active_mm}->pgd = fff800130ff9a000
  \|/  \|/
  "@'/ .. \`@"
  /_| \__/ |_\
 \__U_/
kworker/48:1(475): Oops [#1]
CPU: 48 PID: 475 Comm: kworker/48:1 Tainted: G   OE
4.11.0-rc3-davem-net+ #7
Workqueue: events queue_process
task: fff80013113299c0 task.stack: fff800131132c000
TSTATE: 004480e01600 TPC: 103f9e3c TNPC: 103f9e40 Y:
Tainted: G   OE
TPC: 
g0:  g1: 3fff g2:  g3:
0001
g4: fff80013113299c0 g5: fff8001fa6808000 g6: fff800131132c000 g7:
00c0
o0: fff8001fa760c460 o1: fff8001311329a50 o2: fff8001fa7607504 o3:
0003
o4: fff8001f96e63a40 o5: fff8001311d77ec0 sp: fff800131132f0e1 ret_pc:
0049ed94
RPC: 
l0:  l1: 0800 l2:  l3:

l4: 000b2aa30e34b10d l5:  l6:  l7:
fff8001fa7605028
i0: fff80013111a8a00 i1: fff80013155a0780 i2:  i3:

i4:  i5: 0010 i6: fff800131132f1a1 i7:
103fa4b0
I7: 
Call Trace:
 [103fa4b0] ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
 [00998c74] netpoll_start_xmit+0xf4/0x200
 [00998e10] queue_process+0x90/0x160
 [00485fa8] process_one_work+0x188/0x480
 [00486410] worker_thread+0x170/0x4c0
 [0048c6b8] kthread+0xd8/0x120
 [00406064] ret_from_fork+0x1c/0x2c
 []   (null)
Disabling lock debugging due to kernel taint
Caller[103fa4b0]: ixgbe_xmit_frame+0x30/0xa0 [ixgbe]
Caller[00998c74]: netpoll_start_xmit+0xf4/0x200
Caller[00998e10]: queue_process+0x90/0x160
Caller[00485fa8]: process_one_work+0x188/0x480
Caller[00486410]: worker_thread+0x170/0x4c0
Caller[0048c6b8]: kthread+0xd8/0x120
Caller[00406064]: ret_from_fork+0x1c/0x2c
Caller[]:   (null)

Signed-off-by: Tushar Dave 
---
 net/core/netpoll.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 9424673..c572e49 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -101,6 +101,7 @@ static void queue_process(struct work_struct *work)
container_of(work, struct netpoll_info, tx_work.work);
struct sk_buff *skb;
unsigned long flags;
+   u16 q_index;
 
while ((skb = skb_dequeue(&npinfo->txq))) {
struct net_device *dev = skb->dev;
@@ -117,6 +118,12 @@ static void queue_process(struct work_struct *work)
HARD_TX_LOCK(dev, txq, smp_processor_id());
if (netif_xmit_frozen_or_stopped(txq) ||
netpoll_start_xmit(skb, dev, txq) != NETDEV_TX_OK) {
+   /* check if skb->queue_mapping has changed */
+   q_index = skb_get_queue_mapping(skb);
+   if (unlikely(q_index >= dev->real_num_tx_queues)) {
+   q_index = q_index % dev->real_num_tx_queues;
+   skb_set_queue_mapping(skb, q_index);
+   }
skb_queue_head(&npinfo->txq, skb);
HARD_TX_UNLOCK(dev, txq);
local_irq_restore(flags);
-- 
1.9.1