You have been subscribed to a public bug:

Ubuntu 12.04 LTS was installed on 2 servers on a SM15000-XE system.
Bidirectional iperf traffic was started on 7 nics between the 2 servers.
After a few hours, following stack trace was observed on the dmesg.
Sluggish network performance was observed after that


[15640.958682] sched: RT throttling activated [16924.585541]
------------[ cut here ]------------

[16924.585549] WARNING: at 
/build/buildd/linux-3.2.0/net/sched/sch_generic.c:255 
dev_watchdog+0x25a/0x270() [16924.585551] Hardware name: Sabre2 [16924.585552] 
NETDEV WATCHDOG: eth6 (e1000): transmit queue 0 timed out [16924.585553] 
Modules linked in: 8021q garp stp nfsd nfs lockd fscache auth_rpcgss nfs_acl 
sunrpc msr mac_hid lp parport e1000 [16924.585564] Pid: 0, comm: swapper/7 Not 
tainted 3.2.0-23-generic #36-Ubuntu [16924.585566] Call Trace:
> [16924.585567]  <IRQ>  [<ffffffff8106712f>] warn_slowpath_common+0x7f/0xc0 
> [16924.585575]  [<ffffffff81067226>] warn_slowpath_fmt+0x46/0x50 
> [16924.585578]  [<ffffffff8155e22a>] dev_watchdog+0x25a/0x270 [16924.585581]  
> [<ffffffff81110b80>] ? perf_rotate_context+0x110/0x220 [16924.585585]  
> [<ffffffff81083030>] ? __queue_work+0x320/0x320 [16924.585587]  
> [<ffffffff8155dfd0>] ? qdisc_reset+0x50/0x50 [16924.585589]  
> [<ffffffff8155dfd0>] ? qdisc_reset+0x50/0x50 [16924.585592]  
> [<ffffffff81076086>] call_timer_fn+0x46/0x160 [16924.585594]  
> [<ffffffff8155dfd0>] ? qdisc_reset+0x50/0x50 [16924.585597]  
> [<ffffffff810779d2>] run_timer_softirq+0x132/0x2a0 [16924.585600]  
> [<ffffffff810950c5>] ? ktime_get+0x65/0xe0 [16924.585604]  
> [<ffffffff8106e928>] __do_softirq+0xa8/0x210 [16924.585607]  
> [<ffffffff8101a779>] ? read_tsc+0x9/0x20 [16924.585609]  [<ffffffff8109c054>] 
> ? tick_program_event+0x24/0x30 [16924.585613]  [<ffffffff81666cec>] 
> call_softirq+0x1c/0x30 [16924.585616]  [<ffffffff81015305>] do_so
 ftirq+0x65/0xa0 [16924.585619]  [<ffffffff8106ed0e>] irq_exit+0x8e/0xb0 
[16924.585622]  [<ffffffff8166768e>] smp_apic_timer_interrupt+0x6e/0x99
> [16924.585625]  [<ffffffff8166555e>] apic_timer_interrupt+0x6e/0x80 
> [16924.585626]  <EOI>  [<ffffffff8136b42d>] ? intel_idle+0xed/0x150 
> [16924.585631]  [<ffffffff8136b40f>] ? intel_idle+0xcf/0x150 [16924.585634]  
> [<ffffffff81504a01>] cpuidle_idle_call+0xc1/0x280 [16924.585637]  
> [<ffffffff8101222a>] cpu_idle+0xca/0x120 [16924.585640]  [<ffffffff8163a7fa>] 
> start_secondary+0xd9/0xdb [16924.585642] 

---[ end trace 3c74a3d373267b03 ]---

[16941.890574] BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]


[16941.963686] Modules linked in: 8021q garp stp nfsd nfs lockd fscache 
auth_rpcgss nfs_acl sunrpc msr mac_hid lp parport e1000
[16941.963697] CPU 0 
[16941.963698] Modules linked in: 8021q garp stp nfsd nfs lockd fscache 
auth_rpcgss nfs_acl sunrpc msr mac_hid lp parport e1000
[16941.963704] 
[16941.963706] Pid: 0, comm: swapper/0 Tainted: G        W    3.2.0-23-generic 
#36-Ubuntu SeaMicro Sabre2/Type2 - Board Product Name1
[16941.963710] RIP: 0010:[<ffffffff81531e2f>]  [<ffffffff81531e2f>] 
__alloc_skb+0xcf/0x240
[16941.963718] RSP: 0018:ffff88043fc03c70  EFLAGS: 00010202
[16941.963719] RAX: 0000000000000000 RBX: ffff88043fc03c60 RCX: 000000000000000a
[16941.963721] RDX: ffff88040f876680 RSI: 00000000000000cc RDI: ffff880420dab478
[16941.963722] RBP: ffff88043fc03cb0 R08: 0000000000000680 R09: 0000000000000800
[16941.963724] R10: ffff880420dab500 R11: 0000000000000001 R12: ffff88043fc03be8
[16941.963725] R13: ffffffff8166555e R14: ffff88043fc03cb0 R15: ffff88040f876000
[16941.963727] FS:  0000000000000000(0000) GS:ffff88043fc00000(0000) 
knlGS:0000000000000000
[16941.963729] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[16941.963730] CR2: 00007f19afb0ad50 CR3: 0000000210999000 CR4: 00000000000406f0
[16941.963732] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[16941.963734] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[16941.963735] Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task 
ffffffff81c0d020)
[16941.963737] Stack:
[16941.987830]  ffff88043fc03ca0 00000632813265a4 ffff880424796090 
ffff880421200000
[16941.987834]  0000000000000000 00000000000000ff ffff88040f875840 
ffff8804218a7a80
[16941.987837]  ffff88043fc03cd0 ffffffff81532424 00000000000000fe 
ffffc90001877fd0
[16941.987840] Call Trace:
[16942.017153]  <IRQ> 
[16942.042416]  [<ffffffff81532424>] __netdev_alloc_skb+0x24/0x50
[16942.042423]  [<ffffffffa0003861>] e1000_alloc_rx_buffers+0x291/0x4f0 [e1000]
[16942.042427]  [<ffffffff813265c0>] ? map_single+0x60/0x60
[16942.042430]  [<ffffffffa000287a>] e1000_clean_rx_irq+0x3ba/0x4d0 [e1000]
[16942.042434]  [<ffffffffa0003561>] e1000_clean+0x51/0xc0 [e1000]
[16942.042438]  [<ffffffff81540ad4>] net_rx_action+0x134/0x290
[16942.042442]  [<ffffffff8106e928>] __do_softirq+0xa8/0x210
[16942.042446]  [<ffffffff810334ee>] ? eoi_ioapic_irq.isra.23+0x5e/0x70
[16942.042450]  [<ffffffff81666cec>] call_softirq+0x1c/0x30
[16942.042453]  [<ffffffff81015305>] do_softirq+0x65/0xa0
[16942.042456]  [<ffffffff8106ed0e>] irq_exit+0x8e/0xb0
[16942.042458]  [<ffffffff816675a3>] do_IRQ+0x63/0xe0
[16942.042462]  [<ffffffff8165c92e>] common_interrupt+0x6e/0x6e
[16942.042463]  <EOI> 
[16942.067708]  [<ffffffff8108ed70>] ? hrtimer_try_to_cancel+0x50/0xc0
[16942.067712]  [<ffffffff8109cdb6>] ? tick_nohz_restart_sched_tick+0x106/0x130
[16942.067715]  [<ffffffff8109cdb2>] ? tick_nohz_restart_sched_tick+0x102/0x130
[16942.067719]  [<ffffffff81012263>] cpu_idle+0x103/0x120
[16942.067723]  [<ffffffff816235ae>] rest_init+0x72/0x74
[16942.067728]  [<ffffffff81cfbc0d>] start_kernel+0x3ba/0x3c7
[16942.067731]  [<ffffffff81cfb388>] x86_64_start_reservations+0x132/0x136
[16942.067734]  [<ffffffff81cfb140>] ? early_idt_handlers+0x140/0x140
[16942.067737]  [<ffffffff81cfb459>] x86_64_start_kernel+0xcd/0xdc
[16942.067739] Code: df be cc 00 00 00 0f 85 48 01 00 00 40 f6 c7 02 0f 85 56 
01 00 00 40 f6 c7 04 0f 85 1c 01 00 00 89 f1 31 c0 c1 e9 03 40 f6 c6 04 <f3> 48 
ab 0f 85 f0 00 00 00 40 f6 c6 02 0f 85 ce 00 00 00 83 e6 
[16942.301153] Call Trace:
[16942.301156]  <IRQ>  [<ffffffff81532424>] __netdev_alloc_skb+0x24/0x50
[16942.301168]  [<ffffffffa0003861>] e1000_alloc_rx_buffers+0x291/0x4f0 [e1000]
[16942.301172]  [<ffffffff813265c0>] ? map_single+0x60/0x60
[16942.301175]  [<ffffffffa000287a>] e1000_clean_rx_irq+0x3ba/0x4d0 [e1000]
[16942.301178]  [<ffffffffa0003561>] e1000_clean+0x51/0xc0 [e1000]
[16942.301181]  [<ffffffff81540ad4>] net_rx_action+0x134/0x290
[16942.301186]  [<ffffffff8106e928>] __do_softirq+0xa8/0x210
[16942.301189]  [<ffffffff810334ee>] ? eoi_ioapic_irq.isra.23+0x5e/0x70
[16942.301193]  [<ffffffff81666cec>] call_softirq+0x1c/0x30
[16942.301196]  [<ffffffff81015305>] do_softirq+0x65/0xa0
[16942.301199]  [<ffffffff8106ed0e>] irq_exit+0x8e/0xb0
[16942.301201]  [<ffffffff816675a3>] do_IRQ+0x63/0xe0
[16942.301204]  [<ffffffff8165c92e>] common_interrupt+0x6e/0x6e
[16942.301205]  <EOI>  [<ffffffff8108ed70>] ? hrtimer_try_to_cancel+0x50/0xc0
[16942.301211]  [<ffffffff8109cdb6>] ? tick_nohz_restart_sched_tick+0x106/0x130
[16942.301214]  [<ffffffff8109cdb2>] ? tick_nohz_restart_sched_tick+0x102/0x130
[16942.301216]  [<ffffffff81012263>] cpu_idle+0x103/0x120
[16942.301220]  [<ffffffff816235ae>] rest_init+0x72/0x74
[16942.301233]  [<ffffffff81cfbc0d>] start_kernel+0x3ba/0x3c7
[16942.301236]  [<ffffffff81cfb388>] x86_64_start_reservations+0x132/0x136
[16942.301239]  [<ffffffff81cfb140>] ? early_idt_handlers+0x140/0x140
[16942.301242]  [<ffffffff81cfb459>] x86_64_start_kernel+0xcd/0xdc
[17102.391716] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17102.391717]   Tx Queue             <0>
[17102.391718]   TDH                  <eb>
[17102.391719]   TDT                  <d3>
[17102.391719]   next_to_use          <d3>
[17102.391720]   next_to_clean        <e5>
[17102.391720] buffer_info[next_to_clean]
[17102.391721]   time_stamp           <100402f9f>
[17102.391722]   next_to_watch        <f8>
[17102.391722]   jiffies              <100403264>
[17102.391723]   next_to_watch.status <0>
[17134.426481] INFO: task kworker/7:2:5025 blocked for more than 120 seconds.
[17134.509027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[17134.603042] kworker/7:2     D 0000000000000007     0  5025      2 0x00000000
[17134.603047]  ffff880413843af0 0000000000000046 0000000000000000 
0000000000000000
[17134.603061]  ffff880413843fd8 ffff880413843fd8 ffff880413843fd8 
0000000000013780
[17134.603066]  ffff880420e75bc0 ffff880416ecdbc0 0000000000000000 
7fffffffffffffff
[17134.603070] Call Trace:
[17134.603080]  [<ffffffff8165a55f>] schedule+0x3f/0x60
[17134.603084]  [<ffffffff8165aba5>] schedule_timeout+0x2a5/0x320
[17134.603088]  [<ffffffff8165a39f>] wait_for_common+0xdf/0x180
[17134.603094]  [<ffffffff81055fdd>] ? set_next_entity+0xad/0xd0
[17134.603099]  [<ffffffff8105f990>] ? try_to_wake_up+0x200/0x200
[17134.603102]  [<ffffffff8165a51d>] wait_for_completion+0x1d/0x20
[17134.603108]  [<ffffffff81084b53>] wait_on_cpu_work+0xd3/0xe0
[17134.603112]  [<ffffffff810822d0>] ? do_work_for_cpu+0x30/0x30
[17134.603119]  [<ffffffff81084c14>] wait_on_work+0xb4/0x120
[17134.603123]  [<ffffffff81085643>] __cancel_work_timer+0x73/0x80
[17134.603127]  [<ffffffff81085680>] cancel_work_sync+0x10/0x20
[17134.603136]  [<ffffffffa0000875>] e1000_down_and_stop+0x25/0x50 [e1000]
[17134.603150]  [<ffffffffa0004e4f>] e1000_down+0x14f/0x200 [e1000]
[17134.603155]  [<ffffffffa0006ef0>] ? e1000_change_mtu+0x1c0/0x1c0 [e1000]
[17134.603160]  [<ffffffffa0006f5e>] e1000_reset_task+0x6e/0x90 [e1000]
[17134.603162]  [<ffffffff81084e2a>] process_one_work+0x11a/0x480
[17134.603165]  [<ffffffff81085bd4>] worker_thread+0x164/0x370
[17134.603168]  [<ffffffff81085a70>] ? manage_workers.isra.29+0x130/0x130
[17134.603170]  [<ffffffff8108a42c>] kthread+0x8c/0xa0
[17134.603173]  [<ffffffff81666bf4>] kernel_thread_helper+0x4/0x10
[17134.603176]  [<ffffffff8108a3a0>] ? flush_kthread_worker+0xa0/0xa0
[17134.603178]  [<ffffffff81666bf0>] ? gs_change+0x13/0x13
[17134.603184] INFO: task kworker/7:1:5691 blocked for more than 120 seconds.
[17134.685722] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[17134.779732] kworker/7:1     D 0000000000000007     0  5691      2 0x00000000
[17134.779738]  ffff8801d0e29d10 0000000000000046 0000000000000000 
0000000000000000
[17134.779745]  ffff8801d0e29fd8 ffff8801d0e29fd8 ffff8801d0e29fd8 
0000000000013780
[17134.779750]  ffff880425775bc0 ffff88020f8d96f0 ffff880425794600 
ffff880421205040
[17134.779754] Call Trace:
[17134.779762]  [<ffffffff8165a55f>] schedule+0x3f/0x60
[17134.779764]  [<ffffffff8165b367>] __mutex_lock_slowpath+0xd7/0x150
[17134.779767]  [<ffffffff8165af7a>] mutex_lock+0x2a/0x50
[17134.779786]  [<ffffffffa0005d31>] e1000_watchdog+0x71/0x3d0 [e1000]
[17134.779792]  [<ffffffffa0005cc0>] ? e1000_update_stats+0x970/0x970 [e1000]
[17134.779798]  [<ffffffff81084e2a>] process_one_work+0x11a/0x480
[17134.779802]  [<ffffffff81085bd4>] worker_thread+0x164/0x370
[17134.779808]  [<ffffffff81085a70>] ? manage_workers.isra.29+0x130/0x130
[17134.779812]  [<ffffffff8108a42c>] kthread+0x8c/0xa0
[17134.779815]  [<ffffffff81666bf4>] kernel_thread_helper+0x4/0x10
[17134.779817]  [<ffffffff8108a3a0>] ? flush_kthread_worker+0xa0/0xa0
[17134.779822]  [<ffffffff81666bf0>] ? gs_change+0x13/0x13
[17146.285107] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17146.285108]   Tx Queue             <0>
[17146.285109]   TDH                  <ed>
[17146.285109]   TDT                  <37>
[17146.285110]   next_to_use          <37>
[17146.285111]   next_to_clean        <eb>
[17146.285111] buffer_info[next_to_clean]
[17146.285112]   time_stamp           <100405c55>
[17146.285112]   next_to_watch        <ee>
[17146.285113]   jiffies              <100405d53>
[17146.285113]   next_to_watch.status <0>
[17148.215140] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17148.215141]   Tx Queue             <0>
[17148.215142]   TDH                  <a>
[17148.215142]   TDT                  <f2>
[17148.215143]   next_to_use          <f2>
[17148.215143]   next_to_clean        <5>
[17148.215144] buffer_info[next_to_clean]
[17148.215144]   time_stamp           <100405cb2>
[17148.215145]   next_to_watch        <10>
[17148.215145]   jiffies              <100405f36>
[17148.215146]   next_to_watch.status <0>
[17223.041709] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17223.041710]   Tx Queue             <0>
[17223.041711]   TDH                  <bd>
[17223.041711]   TDT                  <20>
[17223.041712]   next_to_use          <20>
[17223.041712]   next_to_clean        <b4>
[17223.041713] buffer_info[next_to_clean]
[17223.041713]   time_stamp           <10040a73e>
[17223.041714]   next_to_watch        <be>
[17223.041714]   jiffies              <10040a866>
[17223.041715]   next_to_watch.status <0>
[17231.053339] e1000 0000:05:00.1: eth4: Detected Tx Unit Hang
[17231.053340]   Tx Queue             <0>
[17231.053341]   TDH                  <f5>
[17231.053341]   TDT                  <75>
[17231.053342]   next_to_use          <75>
[17231.053342]   next_to_clean        <f1>
[17231.053343] buffer_info[next_to_clean]
[17231.053343]   time_stamp           <10040af0d>
[17231.053344]   next_to_watch        <f9>
[17231.053344]   jiffies              <10040b03b>
[17231.053345]   next_to_watch.status <0>
[17254.591185] INFO: task kworker/7:2:5025 blocked for more than 120 seconds.
[17254.673724] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[17254.767714] kworker/7:2     D 0000000000000007     0  5025      2 0x00000000

INTERRUPT

root@ubuntu ~# cat /proc/interrupts                               100%  243KB 
242.8KB/s   00:00
>           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       
> CPU6       CPU7
>  0:         87          0          0          0          0          0         
>  0          0  IR-IO-APIC-edge      timer
>  4:      17735          0      14263          0      30126          0         
>  0          0  IR-IO-APIC-edge      serial
>  8:          1          0          0          0          0          0         
>  0          0  IR-IO-APIC-edge      rtc0
>  9:          0          0          0          0          0          0         
>  0          0  IR-IO-APIC-fasteoi   acpi
> 16:   16604523          0          0   12120410          0          0         
>  0          0  IR-IO-APIC-fasteoi   ahci, ahci, ahci, eth1, eth4
> 17:    1678725          0          0    8996499          0          0         
>  0          0  IR-IO-APIC-fasteoi   ahci, ahci, ahci, eth0, eth6
> 18:   45881173          0          0          0          0          0         
>  0          0  IR-IO-APIC-fasteoi   ahci, ahci, eth3, eth7
> 19:   59172483          0          0      20836          0          0         
>  0          0  IR-IO-APIC-fasteoi   eth2, eth5
> 40:          0          0          0          0          0          0         
>  0          0  DMAR_MSI-edge      dmar0
> 41:          0          0          0          0          0          0         
>  0          0  IR-PCI-MSI-edge      PCIe PME
> 42:          0          0          0          0          0          0         
>  0          0  IR-PCI-MSI-edge      PCIe PME
> 43:          0          0          0          0          0          0         
>  0          0  IR-PCI-MSI-edge      PCIe PME
> 44:          0          0          0          0          0          0         
>  0          0  IR-PCI-MSI-edge      PCIe PME
> 45:          0          0          0          0          0          0         
>  0          0  IR-PCI-MSI-edge      PCIe PME
> 46:          0          0          0          0          0          0         
>  0          0  IR-PCI-MSI-edge      PCIe PME
> 47:          0          0          0          0          0          0         
>  0          0  IR-PCI-MSI-edge      PCIe PME
> 48:          0          0          0          0          0          0         
>  0          0  IR-PCI-MSI-edge      PCIe PME
> NMI:       5647        157       1546       1659       1984        356        
> 997        572   Non-maskable interrupts
> LOC:     964914     979936    1409782     582208    1756677    1311067    
> 2095033    1656838   Local timer interrupts
> SPU:          0          0          0          0          0          0        
>   0          0   Spurious interrupts
> PMI:       5647        157       1546       1659       1984        356        
> 997        572   Performance monitoring interrupts
> IWI:          0          0          0          0          0          0        
>   0          0   IRQ work interrupts
> RES:     147237     712096   22528993     144356   10341760     488304    
> 3966861     554994   Rescheduling interrupts
> CAL:        174        221        212        149        223        212        
> 205        218   Function call interrupts
> TLB:        153        210        252       1558        299        517        
> 361        682   TLB shootdowns
> TRM:          0          0          0          0          0          0        
>   0          0   Thermal event interrupts
> THR:          0          0          0          0          0          0        
>   0          0   Threshold APIC interrupts
> MCE:          0          0          0          0          0          0        
>   0          0   Machine check exceptions
> MCP:         45         45         45         45         45         45        
>  45         45   Machine check polls

** Affects: irqbalance (Ubuntu)
     Importance: Undecided
         Status: Confirmed


** Tags: bot-comment
-- 
On running network and I/O traffic on SM15000-XE, "BUG: soft lockup -- CPU#0 
stuck for 22s!" seen on dmesg
https://bugs.launchpad.net/bugs/1183135
You received this bug notification because you are a member of Ubuntu Server 
Team, which is subscribed to irqbalance in Ubuntu.

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs

Reply via email to