Re: ath9k, irq 16: nobody cared, 4.13.16+ kernel

2018-01-26 Thread Ben Greear

On 01/26/2018 10:04 AM, Ben Greear wrote:

Hello,

I'm testing on a heavily loaded system.  It has 6 ath9k radios, each of which 
have
200 stations on them.  When I try to start some slow-speed UDP traffic on each
virtual station, I see this splat below.  Is this just expected on a heavily
loaded system?  Or, any ideas how to make it better (I'll investigate this 
irqpoll
thing it suggests...)


Maybe a related question...  The ath9k driver has a napi budget of 512.

I added code and verified that at least sometimes it would take more than
5 jiffies in this handler.

Maybe it is a good idea to have a jiffies timeout breakout as well as a budget
to stop the while loop and exit this handler?

If so, is 5 jiffies a good number do you think?

And, from looking at the IRQ code, it seems that it will re-enable the IRQ
later even if I see the splat below, so the system should recover OK?

Thanks,
Ben




[ 2833.533351] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 2833.539111] CPU: 4 PID: 0 Comm: swapper/4 Tainted: GW  O4.13.16+ 
#36
[ 2833.539112] Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 
2.0b 05/02/2017
[ 2833.539113] Call Trace:
[ 2833.539114]  
[ 2833.539119]  dump_stack+0x63/0x87
[ 2833.539122]  __report_bad_irq+0x2e/0xc0
[ 2833.539123]  note_interrupt+0x227/0x270
[ 2833.539124]  handle_irq_event_percpu+0x40/0x50
[ 2833.539126]  handle_irq_event+0x34/0x60
[ 2833.539127]  handle_fasteoi_irq+0x78/0x140
[ 2833.539128]  handle_irq+0x18/0x30
[ 2833.539129]  do_IRQ+0x41/0xc0
[ 2833.539130]  common_interrupt+0x89/0x89
[ 2833.539133] RIP: 0010:fib_rules_lookup+0x3e/0x1a0
[ 2833.539134] RSP: 0018:88087fd03828 EFLAGS: 0283 ORIG_RAX: 
ffac
[ 2833.539135] RAX: 0353 RBX: 8807eb10c300 RCX: 88087fd03868
[ 2833.539135] RDX:  RSI: 88087fd038e0 RDI: 88085ab55b40
[ 2833.539136] RBP: 88087fd03858 R08:  R09: 0001
[ 2833.539136] R10: 81efd500 R11: 88081ae6ac00 R12: 88085ab55b40
[ 2833.539137] R13: 88087fd03868 R14: 88085ab55bc0 R15: 88087fd038e0
[ 2833.539139]  ? fib_rules_lookup+0x2a/0x1a0
[ 2833.539141]  __fib_lookup+0x4f/0x80
[ 2833.539143]  ip_route_input_rcu+0xa0d/0xc20
[ 2833.539144]  ip_route_input_noref+0x3e/0x60
[ 2833.539147]  ? ioapic_ir_ack_level+0x29/0x30
[ 2833.539147]  ? handle_fasteoi_irq+0xff/0x140
[ 2833.539149]  arp_process+0x486/0x820
[ 2833.539150]  ? do_IRQ+0x4a/0xc0
[ 2833.539151]  ? common_interrupt+0x89/0x89
[ 2833.539152]  arp_rcv+0x112/0x1b0
[ 2833.539155]  __netif_receive_skb_core+0x689/0xa90
[ 2833.539156]  __netif_receive_skb+0x13/0x60
[ 2833.539156]  ? __netif_receive_skb+0x13/0x60
[ 2833.539158]  netif_receive_skb_internal+0x1c6/0x440
[ 2833.539159]  netif_receive_skb+0x17/0x80
[ 2833.539178]  ieee80211_deliver_skb+0x165/0x1d0 [mac80211]
[ 2833.539187]  ieee80211_rx_handlers+0xe50/0x26f0 [mac80211]
[ 2833.539188]  ? skb_copy_bits+0x5f/0x260
[ 2833.539189]  ? copy_skb_header+0x12/0x90
[ 2833.539197]  ieee80211_prepare_and_rx_handle+0x638/0x1130 [mac80211]
[ 2833.539198]  ? ioapic_ir_ack_level+0x29/0x30
[ 2833.539206]  ieee80211_rx_napi+0x87d/0xa70 [mac80211]
[ 2833.539210]  ath_rx_tasklet+0xa99/0x11a0 [ath9k]
[ 2833.539213]  ath9k_tasklet+0x113/0x2e0 [ath9k]
[ 2833.539215]  tasklet_action+0x10c/0x120
[ 2833.539216]  __do_softirq+0xc1/0x2a0
[ 2833.539217]  irq_exit+0x9b/0xa0
[ 2833.539218]  do_IRQ+0x4a/0xc0
[ 2833.539219]  common_interrupt+0x89/0x89
[ 2833.539221] RIP: 0010:cpuidle_enter_state+0xf8/0x2a0
[ 2833.539221] RSP: 0018:c900031fbe78 EFLAGS: 0246 ORIG_RAX: 
ffac
[ 2833.539222] RAX: 88087fd1c000 RBX: 88087fd25f00 RCX: 001f
[ 2833.539223] RDX:  RSI: e691b77ea649 RDI: 
[ 2833.539223] RBP: c900031fbeb0 R08: c442b282beb1 R09: 02eb
[ 2833.539224] R10: 035c R11: 88087fd18ee4 R12: 0004
[ 2833.539224] R13: 0004 R14: 0004 R15: 0293855eb8ba
[ 2833.539225]  
[ 2833.539226]  ? cpuidle_enter_state+0xd4/0x2a0
[ 2833.539227]  cpuidle_enter+0x12/0x20
[ 2833.539229]  call_cpuidle+0x1e/0x40
[ 2833.539229]  do_idle+0x17f/0x1d0
[ 2833.539230]  cpu_startup_entry+0x5f/0x70
[ 2833.539231]  start_secondary+0x14b/0x180
[ 2833.539233]  secondary_startup_64+0x9f/0x9f
[ 2833.539233] handlers:
[ 2833.540606] [] ath_isr [ath9k]
[ 2833.544278] Disabling IRQ #16

Thanks,
Ben




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



ath9k, irq 16: nobody cared, 4.13.16+ kernel

2018-01-26 Thread Ben Greear

Hello,

I'm testing on a heavily loaded system.  It has 6 ath9k radios, each of which 
have
200 stations on them.  When I try to start some slow-speed UDP traffic on each
virtual station, I see this splat below.  Is this just expected on a heavily
loaded system?  Or, any ideas how to make it better (I'll investigate this 
irqpoll
thing it suggests...)

[ 2833.533351] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 2833.539111] CPU: 4 PID: 0 Comm: swapper/4 Tainted: GW  O4.13.16+ 
#36
[ 2833.539112] Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 
2.0b 05/02/2017
[ 2833.539113] Call Trace:
[ 2833.539114]  
[ 2833.539119]  dump_stack+0x63/0x87
[ 2833.539122]  __report_bad_irq+0x2e/0xc0
[ 2833.539123]  note_interrupt+0x227/0x270
[ 2833.539124]  handle_irq_event_percpu+0x40/0x50
[ 2833.539126]  handle_irq_event+0x34/0x60
[ 2833.539127]  handle_fasteoi_irq+0x78/0x140
[ 2833.539128]  handle_irq+0x18/0x30
[ 2833.539129]  do_IRQ+0x41/0xc0
[ 2833.539130]  common_interrupt+0x89/0x89
[ 2833.539133] RIP: 0010:fib_rules_lookup+0x3e/0x1a0
[ 2833.539134] RSP: 0018:88087fd03828 EFLAGS: 0283 ORIG_RAX: 
ffac
[ 2833.539135] RAX: 0353 RBX: 8807eb10c300 RCX: 88087fd03868
[ 2833.539135] RDX:  RSI: 88087fd038e0 RDI: 88085ab55b40
[ 2833.539136] RBP: 88087fd03858 R08:  R09: 0001
[ 2833.539136] R10: 81efd500 R11: 88081ae6ac00 R12: 88085ab55b40
[ 2833.539137] R13: 88087fd03868 R14: 88085ab55bc0 R15: 88087fd038e0
[ 2833.539139]  ? fib_rules_lookup+0x2a/0x1a0
[ 2833.539141]  __fib_lookup+0x4f/0x80
[ 2833.539143]  ip_route_input_rcu+0xa0d/0xc20
[ 2833.539144]  ip_route_input_noref+0x3e/0x60
[ 2833.539147]  ? ioapic_ir_ack_level+0x29/0x30
[ 2833.539147]  ? handle_fasteoi_irq+0xff/0x140
[ 2833.539149]  arp_process+0x486/0x820
[ 2833.539150]  ? do_IRQ+0x4a/0xc0
[ 2833.539151]  ? common_interrupt+0x89/0x89
[ 2833.539152]  arp_rcv+0x112/0x1b0
[ 2833.539155]  __netif_receive_skb_core+0x689/0xa90
[ 2833.539156]  __netif_receive_skb+0x13/0x60
[ 2833.539156]  ? __netif_receive_skb+0x13/0x60
[ 2833.539158]  netif_receive_skb_internal+0x1c6/0x440
[ 2833.539159]  netif_receive_skb+0x17/0x80
[ 2833.539178]  ieee80211_deliver_skb+0x165/0x1d0 [mac80211]
[ 2833.539187]  ieee80211_rx_handlers+0xe50/0x26f0 [mac80211]
[ 2833.539188]  ? skb_copy_bits+0x5f/0x260
[ 2833.539189]  ? copy_skb_header+0x12/0x90
[ 2833.539197]  ieee80211_prepare_and_rx_handle+0x638/0x1130 [mac80211]
[ 2833.539198]  ? ioapic_ir_ack_level+0x29/0x30
[ 2833.539206]  ieee80211_rx_napi+0x87d/0xa70 [mac80211]
[ 2833.539210]  ath_rx_tasklet+0xa99/0x11a0 [ath9k]
[ 2833.539213]  ath9k_tasklet+0x113/0x2e0 [ath9k]
[ 2833.539215]  tasklet_action+0x10c/0x120
[ 2833.539216]  __do_softirq+0xc1/0x2a0
[ 2833.539217]  irq_exit+0x9b/0xa0
[ 2833.539218]  do_IRQ+0x4a/0xc0
[ 2833.539219]  common_interrupt+0x89/0x89
[ 2833.539221] RIP: 0010:cpuidle_enter_state+0xf8/0x2a0
[ 2833.539221] RSP: 0018:c900031fbe78 EFLAGS: 0246 ORIG_RAX: 
ffac
[ 2833.539222] RAX: 88087fd1c000 RBX: 88087fd25f00 RCX: 001f
[ 2833.539223] RDX:  RSI: e691b77ea649 RDI: 
[ 2833.539223] RBP: c900031fbeb0 R08: c442b282beb1 R09: 02eb
[ 2833.539224] R10: 035c R11: 88087fd18ee4 R12: 0004
[ 2833.539224] R13: 0004 R14: 0004 R15: 0293855eb8ba
[ 2833.539225]  
[ 2833.539226]  ? cpuidle_enter_state+0xd4/0x2a0
[ 2833.539227]  cpuidle_enter+0x12/0x20
[ 2833.539229]  call_cpuidle+0x1e/0x40
[ 2833.539229]  do_idle+0x17f/0x1d0
[ 2833.539230]  cpu_startup_entry+0x5f/0x70
[ 2833.539231]  start_secondary+0x14b/0x180
[ 2833.539233]  secondary_startup_64+0x9f/0x9f
[ 2833.539233] handlers:
[ 2833.540606] [] ath_isr [ath9k]
[ 2833.544278] Disabling IRQ #16

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com