Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Fengguang Wu

Hi Sabrina,


The idea when this first came up was to skip the sleeping part of
disable_irq():

http://marc.info/?l=linux-netdev=142314159626052

This fell off my todolist and I didn't send the conversion patches,
which would basically look like this:


Yes it works in the several machines that had the BUG!

[   23.806847] netpoll: netconsole: local port 6665
[   23.807145] netpoll: netconsole: local IPv4 address 0.0.0.0
[   23.807494] netpoll: netconsole: interface 'eth0'
[   23.807799] netpoll: netconsole: remote port 6646
[   23.808096] netpoll: netconsole: remote IPv4 address 192.168.1.1
[   23.808474] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[   23.808910] netpoll: netconsole: local IP 192.168.1.161
[   23.811680] 28 Jul 19:42:10 ntpdate[376]: step time server 192.168.1.1 
offset 1696.257557 sec
[   23.811886] console [netcon0] enabled
[   23.812131] netconsole: network logging started

Thanks,
Fengguang



diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 41f32c0b341e..b022691e680b 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6713,20 +6713,20 @@ static irqreturn_t e1000_intr_msix(int __always_unused 
irq, void *data)

vector = 0;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_intr_msix_rx(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_intr_msix_rx(msix_irq, netdev);
enable_irq(msix_irq);

vector++;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_intr_msix_tx(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_intr_msix_tx(msix_irq, netdev);
enable_irq(msix_irq);

vector++;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_msix_other(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_msix_other(msix_irq, netdev);
enable_irq(msix_irq);
}

@@ -6750,13 +6750,13 @@ static void e1000_netpoll(struct net_device *netdev)
e1000_intr_msix(adapter->pdev->irq, netdev);
break;
case E1000E_INT_MODE_MSI:
-   disable_irq(adapter->pdev->irq);
-   e1000_intr_msi(adapter->pdev->irq, netdev);
+   if (disable_hardirq(adapter->pdev->irq))
+   e1000_intr_msi(adapter->pdev->irq, netdev);
enable_irq(adapter->pdev->irq);
break;
default:/* E1000E_INT_MODE_LEGACY */
-   disable_irq(adapter->pdev->irq);
-   e1000_intr(adapter->pdev->irq, netdev);
+   if (disable_hardirq(adapter->pdev->irq))
+   e1000_intr(adapter->pdev->irq, netdev);
enable_irq(adapter->pdev->irq);
br
ak;
}


Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Thomas Gleixner
On Thu, 28 Jul 2016, Sabrina Dubroca wrote:
> 2016-07-28, 07:43:55 +0200, Eric Dumazet wrote:
> > I would prefer having a definitive advice from Thomas Gleixner and/or
> > others if disable_irq() is forbidden from IRQ path.

Yes it is. Before we added threaded interrupt handlers it was not an issue,
but with (possibly) threaded interrupts it's an absolute no-no.

> > As I said, about all netpoll() methods in net drivers use disable_irq()
> > so a lot of patches would be needed.
> > 
> > disable_irq() should then test this condition earlier, so that we can
> > detect potential bug, even if the IRQ is not (yet) threaded.
> 
> The idea when this first came up was to skip the sleeping part of
> disable_irq():
> 
> http://marc.info/?l=linux-netdev=142314159626052
> 
> This fell off my todolist and I didn't send the conversion patches,
> which would basically look like this:
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 41f32c0b341e..b022691e680b 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -6713,20 +6713,20 @@ static irqreturn_t e1000_intr_msix(int 
> __always_unused irq, void *data)
>  
>   vector = 0;
>   msix_irq = adapter->msix_entries[vector].vector;
> - disable_irq(msix_irq);
> - e1000_intr_msix_rx(msix_irq, netdev);
> + if (disable_hardirq(msix_irq))
> + e1000_intr_msix_rx(msix_irq, netdev);
>   enable_irq(msix_irq);

That'll work nicely even when one of the affected interrupts is threaded.

Thanks,

tglx


Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Sabrina Dubroca
2016-07-28, 07:43:55 +0200, Eric Dumazet wrote:
> On Wed, 2016-07-27 at 14:38 -0700, Jeff Kirsher wrote:
> > On Tue, 2016-07-26 at 11:14 +0200, Eric Dumazet wrote:
> > > Could you try this ?
> > > 
> > > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > index
> > > f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a
> > > 938b3820b 100644
> > > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > @@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device
> > > *netdev)
> > >  {
> > > struct e1000_adapter *adapter = netdev_priv(netdev);
> > >  
> > > -   disable_irq(adapter->pdev->irq);
> > > -   e1000_intr(adapter->pdev->irq, netdev);
> > > -   enable_irq(adapter->pdev->irq);
> > > +   if (napi_schedule_prep(>napi)) {
> > > +   adapter->total_tx_bytes = 0;
> > > +   adapter->total_tx_packets = 0;
> > > +   adapter->total_rx_bytes = 0;
> > > +   adapter->total_rx_packets = 0;
> > > +   __napi_schedule(>napi);
> > > +   }
> > >  }
> > >  #endif
> > >  
> > 
> > Since this fixes the issue Fengguang saw, will you be submitting a formal
> > patch Eric? (please) I can get this queued up for Dave's net tree as soon
> > as I receive the formal patch.
> 
> I would prefer having a definitive advice from Thomas Gleixner and/or
> others if disable_irq() is forbidden from IRQ path.
> 
> As I said, about all netpoll() methods in net drivers use disable_irq()
> so a lot of patches would be needed.
> 
> disable_irq() should then test this condition earlier, so that we can
> detect potential bug, even if the IRQ is not (yet) threaded.

The idea when this first came up was to skip the sleeping part of
disable_irq():

http://marc.info/?l=linux-netdev=142314159626052

This fell off my todolist and I didn't send the conversion patches,
which would basically look like this:


diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 41f32c0b341e..b022691e680b 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6713,20 +6713,20 @@ static irqreturn_t e1000_intr_msix(int __always_unused 
irq, void *data)
 
vector = 0;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_intr_msix_rx(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_intr_msix_rx(msix_irq, netdev);
enable_irq(msix_irq);
 
vector++;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_intr_msix_tx(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_intr_msix_tx(msix_irq, netdev);
enable_irq(msix_irq);
 
vector++;
msix_irq = adapter->msix_entries[vector].vector;
-   disable_irq(msix_irq);
-   e1000_msix_other(msix_irq, netdev);
+   if (disable_hardirq(msix_irq))
+   e1000_msix_other(msix_irq, netdev);
enable_irq(msix_irq);
}
 
@@ -6750,13 +6750,13 @@ static void e1000_netpoll(struct net_device *netdev)
e1000_intr_msix(adapter->pdev->irq, netdev);
break;
case E1000E_INT_MODE_MSI:
-   disable_irq(adapter->pdev->irq);
-   e1000_intr_msi(adapter->pdev->irq, netdev);
+   if (disable_hardirq(adapter->pdev->irq))
+   e1000_intr_msi(adapter->pdev->irq, netdev);
enable_irq(adapter->pdev->irq);
break;
default:/* E1000E_INT_MODE_LEGACY */
-   disable_irq(adapter->pdev->irq);
-   e1000_intr(adapter->pdev->irq, netdev);
+   if (disable_hardirq(adapter->pdev->irq))
+   e1000_intr(adapter->pdev->irq, netdev);
enable_irq(adapter->pdev->irq);
break;
}


-- 
Sabrina


Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Valdis . Kletnieks
On Thu, 28 Jul 2016 09:45:12 +0200, Thomas Gleixner said:
> On Tue, 26 Jul 2016, nick wrote:
> > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
> > b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > index f42129d..e1830af 100644
> > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > @@ -3797,7 +3797,7 @@ static irqreturn_t e1000_intr(int irq, void *data)
> > hw->get_link_status = 1;
> > /* guard against interrupt when we're going down */
> > if (!test_bit(__E1000_DOWN, >flags))
> > -   schedule_delayed_work(>watchdog_task, 1);
> > +   mod_work(>watchdog_task, jiffies + 1);
>
> And that's not even funny anymore. Are you using a random generator to create
> these patches?

At some point, we need to decide if the occasional accidentally-correct
trivial patch from Nick is worth all the wasted maintainer time.



pgpmzfn8ooCEA.pgp
Description: PGP signature


Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-28 Thread Thomas Gleixner
On Tue, 26 Jul 2016, nick wrote:
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
> b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index f42129d..e1830af 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -3797,7 +3797,7 @@ static irqreturn_t e1000_intr(int irq, void *data)
>   hw->get_link_status = 1;
>   /* guard against interrupt when we're going down */
>   if (!test_bit(__E1000_DOWN, >flags))
> - schedule_delayed_work(>watchdog_task, 1);
> + mod_work(>watchdog_task, jiffies + 1);

And that's not even funny anymore. Are you using a random generator to create
these patches?




Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-27 Thread Eric Dumazet
On Wed, 2016-07-27 at 14:38 -0700, Jeff Kirsher wrote:
> On Tue, 2016-07-26 at 11:14 +0200, Eric Dumazet wrote:
> > Could you try this ?
> > 
> > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > index
> > f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a
> > 938b3820b 100644
> > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > @@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device
> > *netdev)
> >  {
> > struct e1000_adapter *adapter = netdev_priv(netdev);
> >  
> > -   disable_irq(adapter->pdev->irq);
> > -   e1000_intr(adapter->pdev->irq, netdev);
> > -   enable_irq(adapter->pdev->irq);
> > +   if (napi_schedule_prep(>napi)) {
> > +   adapter->total_tx_bytes = 0;
> > +   adapter->total_tx_packets = 0;
> > +   adapter->total_rx_bytes = 0;
> > +   adapter->total_rx_packets = 0;
> > +   __napi_schedule(>napi);
> > +   }
> >  }
> >  #endif
> >  
> 
> Since this fixes the issue Fengguang saw, will you be submitting a formal
> patch Eric? (please) I can get this queued up for Dave's net tree as soon
> as I receive the formal patch.

I would prefer having a definitive advice from Thomas Gleixner and/or
others if disable_irq() is forbidden from IRQ path.

As I said, about all netpoll() methods in net drivers use disable_irq()
so a lot of patches would be needed.

disable_irq() should then test this condition earlier, so that we can
detect potential bug, even if the IRQ is not (yet) threaded.

Thanks.




Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-27 Thread Jeff Kirsher
On Tue, 2016-07-26 at 11:14 +0200, Eric Dumazet wrote:
> Could you try this ?
> 
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
> b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index
> f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a
> 938b3820b 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device
> *netdev)
>  {
> struct e1000_adapter *adapter = netdev_priv(netdev);
>  
> -   disable_irq(adapter->pdev->irq);
> -   e1000_intr(adapter->pdev->irq, netdev);
> -   enable_irq(adapter->pdev->irq);
> +   if (napi_schedule_prep(>napi)) {
> +   adapter->total_tx_bytes = 0;
> +   adapter->total_tx_packets = 0;
> +   adapter->total_rx_bytes = 0;
> +   adapter->total_rx_packets = 0;
> +   __napi_schedule(>napi);
> +   }
>  }
>  #endif
>  

Since this fixes the issue Fengguang saw, will you be submitting a formal
patch Eric? (please) I can get this queued up for Dave's net tree as soon
as I receive the formal patch.

signature.asc
Description: This is a digitally signed message part


Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-27 Thread Eric Dumazet
On Wed, 2016-07-27 at 23:01 +0800, Fengguang Wu wrote:
> On Tue, Jul 26, 2016 at 06:28:33PM +0200, Eric Dumazet wrote:
> >On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote:
> >> Hi Eric,
> >>
> >> It works!
> >>
> >> On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
> >> >On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
> >> >> Greetings,
> >> >>
> >> >> This BUG message can be found in recent kernels as well as v4.4 and
> >> >> linux-stable. It happens when running
> >> >>
> >> >> modprobe netconsole netconsole=@/,$port@$server/
> >> >>
> >> >> [   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 
> >> >> 192.168.1.1 offset -673.833841 sec
> >> >> [   39.943285] netpoll: netconsole: local port 6665
> >> >> [   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
> >> >> [   39.943609] netpoll: netconsole: interface 'eth0'
> >> >> [   39.943756] netpoll: netconsole: remote port 6672
> >> >> [   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
> >> >> [   39.944099] netpoll: netconsole: remote ethernet address 
> >> >> ff:ff:ff:ff:ff:ff
> >> >> [   39.944311] netpoll: netconsole: local IP 192.168.1.193
> >> >> [   39.944514] BUG: sleeping function called from invalid context at 
> >> >> kernel/irq/manage.c:110
> >> >> [   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: 
> >> >> modprobe
> >> >> [   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 
> >> >> 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
> >> >> [   39.944518] Hardware name:  /DZ77BH-55K, BIOS 
> >> >> BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
> >> >> [   39.944522]   c90001f2f9e8 813417d9 
> >> >> 88007faba5c0
> >> >> [   39.944524]  006e c90001f2fa00 810aec03 
> >> >> 81a25948
> >> >> [   39.944525]  c90001f2fa28 810aec9a 8803e5bd9400 
> >> >> 8803e50fbd68
> >> >> [   39.944526] Call Trace:
> >> >> [   39.944533]  [] dump_stack+0x63/0x8a
> >> >> [   39.944536]  [] ___might_sleep+0xd3/0x120
> >> >> [   39.944537]  [] __might_sleep+0x4a/0x80
> >> >> [   39.944541]  [] synchronize_irq+0x38/0xa0
> >> >> [   39.944543]  [] ? __irq_put_desc_unlock+0x1e/0x40
> >> >> [   39.944545]  [] ? __disable_irq_nosync+0x43/0x60
> >> >> [   39.944547]  [] disable_irq+0x1c/0x20
> >> >> [   39.944559]  [] e1000_netpoll+0xf2/0x120 [e1000e]
> >> >> [   39.944563]  [] netpoll_poll_dev+0x5c/0x1a0
> >> >> [   39.944567]  [] ? __kmalloc_reserve+0x31/0x90
> >> >> [   39.944569]  [] netpoll_send_skb_on_dev+0x16b/0x250
> >> >> [   39.944572]  [] netpoll_send_udp+0x2ec/0x450
> >> >> [   39.944576]  [] write_msg+0xb2/0xf0 [netconsole]
> >> >> [   39.944578]  [] call_console_drivers+0x115/0x120
> >> >> [   39.944580]  [] console_unlock+0x333/0x5c0
> >> >> [   39.944583]  [] register_console+0x1c4/0x380
> >> >> [   39.944586]  [] init_netconsole+0x1c5/0x1000 
> >> >> [netconsole]
> >> >> [   39.944588]  [] ? 0xa004f000
> >> >> [   39.944591]  [] do_one_initcall+0x3d/0x150
> >> >> [   39.944592]  [] ? __might_sleep+0x4a/0x80
> >> >> [   39.944596]  [] ? 
> >> >> kmem_cache_alloc_trace+0x188/0x1e0
> >> >> [   39.944598]  [] do_init_module+0x5f/0x1d8
> >> >> [   39.944602]  [] load_module+0x1429/0x1b40
> >> >> [   39.944604]  [] ? __symbol_put+0x40/0x40
> >> >> [   39.944607]  [] ? kernel_read_file+0x178/0x1a0
> >> >> [   39.944608]  [] ? 
> >> >> kernel_read_file_from_fd+0x49/0x80
> >> >> [   39.944611]  [] SYSC_finit_module+0xc3/0xf0
> >> >> [   39.944614]  [] SyS_finit_module+0xe/0x10
> >> >> [   39.944617]  [] entry_SYSCALL_64_fastpath+0x1a/0xa9
> >> >> [   39.946384] console [netcon0] enabled
> >> >> [   39.946514] netconsole: network logging started
> >> >>
> >> >> Can this be possibly fixed?
> >> >
> >> >Could you try this ?
> >> >
> >> >diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
> >> >b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >index 
> >> >f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b
> >> > 100644
> >> >--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device 
> >> >*netdev)
> >> > {
> >> >  struct e1000_adapter *adapter = netdev_priv(netdev);
> >> >
> >> >- disable_irq(adapter->pdev->irq);
> >> >- e1000_intr(adapter->pdev->irq, netdev);
> >> >- enable_irq(adapter->pdev->irq);
> >> >+ if (napi_schedule_prep(>napi)) {
> >> >+ adapter->total_tx_bytes = 0;
> >> >+ adapter->total_tx_packets = 0;
> >> >+ adapter->total_rx_bytes = 0;
> >> >+ adapter->total_rx_packets = 0;
> >> >+ __napi_schedule(>napi);
> >> >+ }
> >>
> >> The machines are actually running e1000e driver, so I copied your
> >> approach to e1000e and it works:
> >>
> >> kern  :info  : [   16.109647] netpoll: netconsole: local port 6665
> >> kern  :info  : [   16.109961] netpoll: netconsole: local IPv4 

Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-27 Thread Fengguang Wu

On Tue, Jul 26, 2016 at 06:28:33PM +0200, Eric Dumazet wrote:

On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote:

Hi Eric,

It works!

On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
>On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
>> Greetings,
>>
>> This BUG message can be found in recent kernels as well as v4.4 and
>> linux-stable. It happens when running
>>
>> modprobe netconsole netconsole=@/,$port@$server/
>>
>> [   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 
offset -673.833841 sec
>> [   39.943285] netpoll: netconsole: local port 6665
>> [   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
>> [   39.943609] netpoll: netconsole: interface 'eth0'
>> [   39.943756] netpoll: netconsole: remote port 6672
>> [   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
>> [   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
>> [   39.944311] netpoll: netconsole: local IP 192.168.1.193
>> [   39.944514] BUG: sleeping function called from invalid context at 
kernel/irq/manage.c:110
>> [   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
>> [   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 
4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
>> [   39.944518] Hardware name:  /DZ77BH-55K, BIOS 
BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
>> [   39.944522]   c90001f2f9e8 813417d9 
88007faba5c0
>> [   39.944524]  006e c90001f2fa00 810aec03 
81a25948
>> [   39.944525]  c90001f2fa28 810aec9a 8803e5bd9400 
8803e50fbd68
>> [   39.944526] Call Trace:
>> [   39.944533]  [] dump_stack+0x63/0x8a
>> [   39.944536]  [] ___might_sleep+0xd3/0x120
>> [   39.944537]  [] __might_sleep+0x4a/0x80
>> [   39.944541]  [] synchronize_irq+0x38/0xa0
>> [   39.944543]  [] ? __irq_put_desc_unlock+0x1e/0x40
>> [   39.944545]  [] ? __disable_irq_nosync+0x43/0x60
>> [   39.944547]  [] disable_irq+0x1c/0x20
>> [   39.944559]  [] e1000_netpoll+0xf2/0x120 [e1000e]
>> [   39.944563]  [] netpoll_poll_dev+0x5c/0x1a0
>> [   39.944567]  [] ? __kmalloc_reserve+0x31/0x90
>> [   39.944569]  [] netpoll_send_skb_on_dev+0x16b/0x250
>> [   39.944572]  [] netpoll_send_udp+0x2ec/0x450
>> [   39.944576]  [] write_msg+0xb2/0xf0 [netconsole]
>> [   39.944578]  [] call_console_drivers+0x115/0x120
>> [   39.944580]  [] console_unlock+0x333/0x5c0
>> [   39.944583]  [] register_console+0x1c4/0x380
>> [   39.944586]  [] init_netconsole+0x1c5/0x1000 
[netconsole]
>> [   39.944588]  [] ? 0xa004f000
>> [   39.944591]  [] do_one_initcall+0x3d/0x150
>> [   39.944592]  [] ? __might_sleep+0x4a/0x80
>> [   39.944596]  [] ? kmem_cache_alloc_trace+0x188/0x1e0
>> [   39.944598]  [] do_init_module+0x5f/0x1d8
>> [   39.944602]  [] load_module+0x1429/0x1b40
>> [   39.944604]  [] ? __symbol_put+0x40/0x40
>> [   39.944607]  [] ? kernel_read_file+0x178/0x1a0
>> [   39.944608]  [] ? kernel_read_file_from_fd+0x49/0x80
>> [   39.944611]  [] SYSC_finit_module+0xc3/0xf0
>> [   39.944614]  [] SyS_finit_module+0xe/0x10
>> [   39.944617]  [] entry_SYSCALL_64_fastpath+0x1a/0xa9
>> [   39.946384] console [netcon0] enabled
>> [   39.946514] netconsole: network logging started
>>
>> Can this be possibly fixed?
>
>Could you try this ?
>
>diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
b/drivers/net/ethernet/intel/e1000/e1000_main.c
>index 
f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 
100644
>--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
>+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
>@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
> {
>struct e1000_adapter *adapter = netdev_priv(netdev);
>
>-   disable_irq(adapter->pdev->irq);
>-   e1000_intr(adapter->pdev->irq, netdev);
>-   enable_irq(adapter->pdev->irq);
>+   if (napi_schedule_prep(>napi)) {
>+   adapter->total_tx_bytes = 0;
>+   adapter->total_tx_packets = 0;
>+   adapter->total_rx_bytes = 0;
>+   adapter->total_rx_packets = 0;
>+   __napi_schedule(>napi);
>+   }

The machines are actually running e1000e driver, so I copied your
approach to e1000e and it works:

kern  :info  : [   16.109647] netpoll: netconsole: local port 6665
kern  :info  : [   16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
kern  :info  : [   16.110346] netpoll: netconsole: interface 'eth0'
kern  :info  : [   16.110672] netpoll: netconsole: remote port 6676
kern  :info  : [   16.110991] netpoll: netconsole: remote IPv4 address 
192.168.2.1
kern  :info  : [   16.111398] netpoll: netconsole: remote ethernet address 
ff:ff:ff:ff:ff:ff
kern  :info  : [   16.111845] netpoll: netconsole: local IP 192.168.2.3
kern  :info  : [   16.114284] console [netcon0] enabled
kern  :info  : [   16.114550] netconsole: network logging started

However I'm not sure if it'll have side effects, because this
effectively 

Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-26 Thread Eric Dumazet
On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote:
> Hi Eric,
> 
> It works!
> 
> On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
> >On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
> >> Greetings,
> >>
> >> This BUG message can be found in recent kernels as well as v4.4 and
> >> linux-stable. It happens when running
> >>
> >> modprobe netconsole netconsole=@/,$port@$server/
> >>
> >> [   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 
> >> offset -673.833841 sec
> >> [   39.943285] netpoll: netconsole: local port 6665
> >> [   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
> >> [   39.943609] netpoll: netconsole: interface 'eth0'
> >> [   39.943756] netpoll: netconsole: remote port 6672
> >> [   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
> >> [   39.944099] netpoll: netconsole: remote ethernet address 
> >> ff:ff:ff:ff:ff:ff
> >> [   39.944311] netpoll: netconsole: local IP 192.168.1.193
> >> [   39.944514] BUG: sleeping function called from invalid context at 
> >> kernel/irq/manage.c:110
> >> [   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
> >> [   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 
> >> 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
> >> [   39.944518] Hardware name:  /DZ77BH-55K, BIOS 
> >> BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
> >> [   39.944522]   c90001f2f9e8 813417d9 
> >> 88007faba5c0
> >> [   39.944524]  006e c90001f2fa00 810aec03 
> >> 81a25948
> >> [   39.944525]  c90001f2fa28 810aec9a 8803e5bd9400 
> >> 8803e50fbd68
> >> [   39.944526] Call Trace:
> >> [   39.944533]  [] dump_stack+0x63/0x8a
> >> [   39.944536]  [] ___might_sleep+0xd3/0x120
> >> [   39.944537]  [] __might_sleep+0x4a/0x80
> >> [   39.944541]  [] synchronize_irq+0x38/0xa0
> >> [   39.944543]  [] ? __irq_put_desc_unlock+0x1e/0x40
> >> [   39.944545]  [] ? __disable_irq_nosync+0x43/0x60
> >> [   39.944547]  [] disable_irq+0x1c/0x20
> >> [   39.944559]  [] e1000_netpoll+0xf2/0x120 [e1000e]
> >> [   39.944563]  [] netpoll_poll_dev+0x5c/0x1a0
> >> [   39.944567]  [] ? __kmalloc_reserve+0x31/0x90
> >> [   39.944569]  [] netpoll_send_skb_on_dev+0x16b/0x250
> >> [   39.944572]  [] netpoll_send_udp+0x2ec/0x450
> >> [   39.944576]  [] write_msg+0xb2/0xf0 [netconsole]
> >> [   39.944578]  [] call_console_drivers+0x115/0x120
> >> [   39.944580]  [] console_unlock+0x333/0x5c0
> >> [   39.944583]  [] register_console+0x1c4/0x380
> >> [   39.944586]  [] init_netconsole+0x1c5/0x1000 
> >> [netconsole]
> >> [   39.944588]  [] ? 0xa004f000
> >> [   39.944591]  [] do_one_initcall+0x3d/0x150
> >> [   39.944592]  [] ? __might_sleep+0x4a/0x80
> >> [   39.944596]  [] ? kmem_cache_alloc_trace+0x188/0x1e0
> >> [   39.944598]  [] do_init_module+0x5f/0x1d8
> >> [   39.944602]  [] load_module+0x1429/0x1b40
> >> [   39.944604]  [] ? __symbol_put+0x40/0x40
> >> [   39.944607]  [] ? kernel_read_file+0x178/0x1a0
> >> [   39.944608]  [] ? kernel_read_file_from_fd+0x49/0x80
> >> [   39.944611]  [] SYSC_finit_module+0xc3/0xf0
> >> [   39.944614]  [] SyS_finit_module+0xe/0x10
> >> [   39.944617]  [] entry_SYSCALL_64_fastpath+0x1a/0xa9
> >> [   39.946384] console [netcon0] enabled
> >> [   39.946514] netconsole: network logging started
> >>
> >> Can this be possibly fixed?
> >
> >Could you try this ?
> >
> >diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
> >b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >index 
> >f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b
> > 100644
> >--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> >+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
> > {
> > struct e1000_adapter *adapter = netdev_priv(netdev);
> >
> >-disable_irq(adapter->pdev->irq);
> >-e1000_intr(adapter->pdev->irq, netdev);
> >-enable_irq(adapter->pdev->irq);
> >+if (napi_schedule_prep(>napi)) {
> >+adapter->total_tx_bytes = 0;
> >+adapter->total_tx_packets = 0;
> >+adapter->total_rx_bytes = 0;
> >+adapter->total_rx_packets = 0;
> >+__napi_schedule(>napi);
> >+}
> 
> The machines are actually running e1000e driver, so I copied your
> approach to e1000e and it works:
> 
> kern  :info  : [   16.109647] netpoll: netconsole: local port 6665
> kern  :info  : [   16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
> kern  :info  : [   16.110346] netpoll: netconsole: interface 'eth0'
> kern  :info  : [   16.110672] netpoll: netconsole: remote port 6676
> kern  :info  : [   16.110991] netpoll: netconsole: remote IPv4 address 
> 192.168.2.1
> kern  :info  : [   16.111398] netpoll: netconsole: remote ethernet address 
> ff:ff:ff:ff:ff:ff
> kern  :info  : [   16.111845] netpoll: netconsole: local IP 192.168.2.3
> kern  

Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-26 Thread Fengguang Wu

Hi Eric,

It works!

On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:

On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:

Greetings,

This BUG message can be found in recent kernels as well as v4.4 and
linux-stable. It happens when running

modprobe netconsole netconsole=@/,$port@$server/

[   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 
offset -673.833841 sec
[   39.943285] netpoll: netconsole: local port 6665
[   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
[   39.943609] netpoll: netconsole: interface 'eth0'
[   39.943756] netpoll: netconsole: remote port 6672
[   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
[   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[   39.944311] netpoll: netconsole: local IP 192.168.1.193
[   39.944514] BUG: sleeping function called from invalid context at 
kernel/irq/manage.c:110
[   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
[   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 
4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
[   39.944518] Hardware name:  /DZ77BH-55K, BIOS 
BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
[   39.944522]   c90001f2f9e8 813417d9 
88007faba5c0
[   39.944524]  006e c90001f2fa00 810aec03 
81a25948
[   39.944525]  c90001f2fa28 810aec9a 8803e5bd9400 
8803e50fbd68
[   39.944526] Call Trace:
[   39.944533]  [] dump_stack+0x63/0x8a
[   39.944536]  [] ___might_sleep+0xd3/0x120
[   39.944537]  [] __might_sleep+0x4a/0x80
[   39.944541]  [] synchronize_irq+0x38/0xa0
[   39.944543]  [] ? __irq_put_desc_unlock+0x1e/0x40
[   39.944545]  [] ? __disable_irq_nosync+0x43/0x60
[   39.944547]  [] disable_irq+0x1c/0x20
[   39.944559]  [] e1000_netpoll+0xf2/0x120 [e1000e]
[   39.944563]  [] netpoll_poll_dev+0x5c/0x1a0
[   39.944567]  [] ? __kmalloc_reserve+0x31/0x90
[   39.944569]  [] netpoll_send_skb_on_dev+0x16b/0x250
[   39.944572]  [] netpoll_send_udp+0x2ec/0x450
[   39.944576]  [] write_msg+0xb2/0xf0 [netconsole]
[   39.944578]  [] call_console_drivers+0x115/0x120
[   39.944580]  [] console_unlock+0x333/0x5c0
[   39.944583]  [] register_console+0x1c4/0x380
[   39.944586]  [] init_netconsole+0x1c5/0x1000 [netconsole]
[   39.944588]  [] ? 0xa004f000
[   39.944591]  [] do_one_initcall+0x3d/0x150
[   39.944592]  [] ? __might_sleep+0x4a/0x80
[   39.944596]  [] ? kmem_cache_alloc_trace+0x188/0x1e0
[   39.944598]  [] do_init_module+0x5f/0x1d8
[   39.944602]  [] load_module+0x1429/0x1b40
[   39.944604]  [] ? __symbol_put+0x40/0x40
[   39.944607]  [] ? kernel_read_file+0x178/0x1a0
[   39.944608]  [] ? kernel_read_file_from_fd+0x49/0x80
[   39.944611]  [] SYSC_finit_module+0xc3/0xf0
[   39.944614]  [] SyS_finit_module+0xe/0x10
[   39.944617]  [] entry_SYSCALL_64_fastpath+0x1a/0xa9
[   39.946384] console [netcon0] enabled
[   39.946514] netconsole: network logging started

Can this be possibly fixed?


Could you try this ?

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 
f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b
 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
{
struct e1000_adapter *adapter = netdev_priv(netdev);

-   disable_irq(adapter->pdev->irq);
-   e1000_intr(adapter->pdev->irq, netdev);
-   enable_irq(adapter->pdev->irq);
+   if (napi_schedule_prep(>napi)) {
+   adapter->total_tx_bytes = 0;
+   adapter->total_tx_packets = 0;
+   adapter->total_rx_bytes = 0;
+   adapter->total_rx_packets = 0;
+   __napi_schedule(>napi);
+   }


The machines are actually running e1000e driver, so I copied your
approach to e1000e and it works:

kern  :info  : [   16.109647] netpoll: netconsole: local port 6665
kern  :info  : [   16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
kern  :info  : [   16.110346] netpoll: netconsole: interface 'eth0'
kern  :info  : [   16.110672] netpoll: netconsole: remote port 6676
kern  :info  : [   16.110991] netpoll: netconsole: remote IPv4 address 
192.168.2.1
kern  :info  : [   16.111398] netpoll: netconsole: remote ethernet address 
ff:ff:ff:ff:ff:ff
kern  :info  : [   16.111845] netpoll: netconsole: local IP 192.168.2.3
kern  :info  : [   16.114284] console [netcon0] enabled
kern  :info  : [   16.114550] netconsole: network logging started

However I'm not sure if it'll have side effects, because this
effectively disables the various checks in e1000_intr() and
e1000_intr_msi().

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9b4ec13..4f89873 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ 

Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-26 Thread Thomas Gleixner
On Tue, 26 Jul 2016, Fengguang Wu wrote:
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -3797,7 +3797,7 @@ static irqreturn_t e1000_intr(int irq, void *data)
>   hw->get_link_status = 1;
>   /* guard against interrupt when we're going down */
>   if (!test_bit(__E1000_DOWN, >flags))
> - schedule_delayed_work(>watchdog_task, 1);
> + mod_timer(>watchdog_timer, jiffies + 1);

ROTFL 



Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-26 Thread Eric Dumazet
On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
> Greetings,
> 
> This BUG message can be found in recent kernels as well as v4.4 and
> linux-stable. It happens when running
> 
> modprobe netconsole netconsole=@/,$port@$server/ 
> 
> [   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 
> offset -673.833841 sec
> [   39.943285] netpoll: netconsole: local port 6665
> [   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
> [   39.943609] netpoll: netconsole: interface 'eth0'
> [   39.943756] netpoll: netconsole: remote port 6672
> [   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
> [   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
> [   39.944311] netpoll: netconsole: local IP 192.168.1.193
> [   39.944514] BUG: sleeping function called from invalid context at 
> kernel/irq/manage.c:110
> [   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
> [   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 
> 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
> [   39.944518] Hardware name:  /DZ77BH-55K, BIOS 
> BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
> [   39.944522]   c90001f2f9e8 813417d9 
> 88007faba5c0
> [   39.944524]  006e c90001f2fa00 810aec03 
> 81a25948
> [   39.944525]  c90001f2fa28 810aec9a 8803e5bd9400 
> 8803e50fbd68
> [   39.944526] Call Trace:
> [   39.944533]  [] dump_stack+0x63/0x8a
> [   39.944536]  [] ___might_sleep+0xd3/0x120
> [   39.944537]  [] __might_sleep+0x4a/0x80
> [   39.944541]  [] synchronize_irq+0x38/0xa0
> [   39.944543]  [] ? __irq_put_desc_unlock+0x1e/0x40
> [   39.944545]  [] ? __disable_irq_nosync+0x43/0x60
> [   39.944547]  [] disable_irq+0x1c/0x20
> [   39.944559]  [] e1000_netpoll+0xf2/0x120 [e1000e]
> [   39.944563]  [] netpoll_poll_dev+0x5c/0x1a0
> [   39.944567]  [] ? __kmalloc_reserve+0x31/0x90
> [   39.944569]  [] netpoll_send_skb_on_dev+0x16b/0x250
> [   39.944572]  [] netpoll_send_udp+0x2ec/0x450
> [   39.944576]  [] write_msg+0xb2/0xf0 [netconsole]
> [   39.944578]  [] call_console_drivers+0x115/0x120
> [   39.944580]  [] console_unlock+0x333/0x5c0
> [   39.944583]  [] register_console+0x1c4/0x380
> [   39.944586]  [] init_netconsole+0x1c5/0x1000 [netconsole]
> [   39.944588]  [] ? 0xa004f000
> [   39.944591]  [] do_one_initcall+0x3d/0x150
> [   39.944592]  [] ? __might_sleep+0x4a/0x80
> [   39.944596]  [] ? kmem_cache_alloc_trace+0x188/0x1e0
> [   39.944598]  [] do_init_module+0x5f/0x1d8
> [   39.944602]  [] load_module+0x1429/0x1b40
> [   39.944604]  [] ? __symbol_put+0x40/0x40
> [   39.944607]  [] ? kernel_read_file+0x178/0x1a0
> [   39.944608]  [] ? kernel_read_file_from_fd+0x49/0x80
> [   39.944611]  [] SYSC_finit_module+0xc3/0xf0
> [   39.944614]  [] SyS_finit_module+0xe/0x10
> [   39.944617]  [] entry_SYSCALL_64_fastpath+0x1a/0xa9
> [   39.946384] console [netcon0] enabled
> [   39.946514] netconsole: network logging started
> 
> Can this be possibly fixed?

Could you try this ?

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 
f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b
 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
 {
struct e1000_adapter *adapter = netdev_priv(netdev);
 
-   disable_irq(adapter->pdev->irq);
-   e1000_intr(adapter->pdev->irq, netdev);
-   enable_irq(adapter->pdev->irq);
+   if (napi_schedule_prep(>napi)) {
+   adapter->total_tx_bytes = 0;
+   adapter->total_tx_packets = 0;
+   adapter->total_rx_bytes = 0;
+   adapter->total_rx_packets = 0;
+   __napi_schedule(>napi);
+   }
 }
 #endif