Bug#659169: Re: Bug#659169: [2.6.32] BUG: soft lockup - CPU#7 stuck for 17163091979s! [init:9709]

2012-02-08 Thread Carlos Alberto Lopez Perez
Ben Hutchings wrote:
> Version: 2.6.32-40
> 
> On Wed, Feb 08, 2012 at 10:13:50PM +0100, Carlos Alberto Lopez Perez wrote:
>> Source: linux-2.6
>> Version: 2.6.32-5
>> Severity: normal
>>
>>
>> Hello,
>>
>> Today one of my servers stopped responding for some websites, and ssh'ing
>> into it was impossible. Ping worked and the ssh connection started, but the
>> shell didn't show up even after a long wait. Finally, a hard reset was
>> needed to bring it back.
>>
>> After the reboot I found this in kern.log:
>>
>> # tail /var/log/kern.log
>> Feb  1 09:41:00 server-i7_920 kernel: [17383037.287331] EXT4-fs (dm-29): 
>> mounted filesystem with ordered data mode
>> Feb  5 09:38:15 server-i7_920 kernel: [17727613.769052] NOHZ: 
>> local_softirq_pending 100
>> Feb  7 05:56:07 server-i7_920 kernel: [17886689.887577] e1000e: eth0 NIC 
>> Link is Down
>> Feb  7 05:59:05 server-i7_920 kernel: [17886867.166464] e1000e: eth0 NIC 
>> Link is Up 1000 Mbps Full Duplex, Flow Control: None
>> Feb  7 05:59:41 server-i7_920 kernel: [17886903.722727] e1000e: eth0 NIC 
>> Link is Down
>> Feb  7 06:00:00 server-i7_920 kernel: [17886922.309159] e1000e: eth0 NIC 
>> Link is Up 1000 Mbps Full Duplex, Flow Control: None
>> Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876326] BUG: soft lockup 
>> - CPU#7 stuck for 17163091979s! [init:9709]
> [...]
> 
> This appears to be the bug fixed by 'sched, x86: Avoid unnecessary
> overflow in sched_clock', included in longterm update 2.6.32.50 and
> Debian package version 2.6.32-40.  That bug would be triggered once
> the scheduler clock reached 18014398 seconds, which is a little after
> the last reasonable time seen in this log.
> 
> Ben.
> 
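
The 18014398-second figure in Ben's reply can be checked with a little arithmetic. The sketch below is not taken from the kernel source or from this thread. It assumes the pre-fix x86 sched_clock converted cycles to nanoseconds as (cycles * scale) >> 10 in unsigned 64-bit arithmetic, so the product wraps once the resulting nanosecond value reaches 2^(64-10) = 2^54. The shift of 10 and all names below are assumptions chosen for illustration.

```python
# Rough check of the overflow point quoted in the reply.
# Assumption: ns = (cycles * scale) >> SHIFT in 64-bit arithmetic, SHIFT = 10.

NSEC_PER_SEC = 10**9
SHIFT = 10            # assumed scale-factor shift, not confirmed by this thread
WRAP = 1 << 64        # size of the unsigned 64-bit range


def overflow_threshold_ns(shift: int = SHIFT) -> int:
    """Nanosecond value at which the 64-bit product cycles * scale wraps."""
    return 1 << (64 - shift)


def to_signed_ns(unsigned_ns: int) -> int:
    """Reinterpret an unsigned 64-bit nanosecond count as signed."""
    return unsigned_ns - WRAP if unsigned_ns >= (1 << 63) else unsigned_ns


# Threshold in whole seconds and in days of uptime.
threshold_s = overflow_threshold_ns() // NSEC_PER_SEC
threshold_days = threshold_s / 86400

# Timestamp printed with the soft lockup line: [18446744016.876326]
logged_ns = 18446744016 * NSEC_PER_SEC + 876326 * 1000
logged_signed_ns = to_signed_ns(logged_ns)

print(threshold_s)                      # seconds until the product wraps
print(round(threshold_days, 1))         # the same, in days of uptime
print(logged_signed_ns / NSEC_PER_SEC)  # logged timestamp read as signed seconds
```

Under these assumptions the threshold is 18014398 seconds, about 208.5 days of uptime, which matches the figure in the reply and falls shortly after the last sane timestamp in the log (17886922). The lockup timestamp 18446744016.876326 lies just below 2^64 nanoseconds, so read as a signed value it is roughly -56.8 seconds. That is consistent with a clock that wrapped, though the sketch does not prove that this is what happened.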

Wow!

Really amazing, thanks for the reply.

I will be upgrading the kernel ASAP.



Regards!


-- 
~~~
Carlos Alberto Lopez Perez   http://neutrino.es
Igalia - Free Software Engineering   http://www.igalia.com
~~~





Bug#659169: [2.6.32] BUG: soft lockup - CPU#7 stuck for 17163091979s! [init:9709]

2012-02-08 Thread Carlos Alberto Lopez Perez
Source: linux-2.6
Version: 2.6.32-5
Severity: normal


Hello,

Today one of my servers stopped responding for some websites, and ssh'ing into
it was impossible. Ping worked and the ssh connection started, but the shell
didn't show up even after a long wait. Finally, a hard reset was needed to
bring it back.

After the reboot I found this in kern.log:

# tail /var/log/kern.log
Feb  1 09:41:00 server-i7_920 kernel: [17383037.287331] EXT4-fs (dm-29): 
mounted filesystem with ordered data mode
Feb  5 09:38:15 server-i7_920 kernel: [17727613.769052] NOHZ: 
local_softirq_pending 100
Feb  7 05:56:07 server-i7_920 kernel: [17886689.887577] e1000e: eth0 NIC Link 
is Down
Feb  7 05:59:05 server-i7_920 kernel: [17886867.166464] e1000e: eth0 NIC Link 
is Up 1000 Mbps Full Duplex, Flow Control: None
Feb  7 05:59:41 server-i7_920 kernel: [17886903.722727] e1000e: eth0 NIC Link 
is Down
Feb  7 06:00:00 server-i7_920 kernel: [17886922.309159] e1000e: eth0 NIC Link 
is Up 1000 Mbps Full Duplex, Flow Control: None
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876326] BUG: soft lockup - 
CPU#7 stuck for 17163091979s! [init:9709]
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876371] Modules linked in: 
btrfs zlib_deflate crc32c libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos 
fat jfs xfs exportfs reiserfs ext4 jbd2 crc16 ext2 ipt_LOG sg xt_limit 
xt_tcpudp xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 
nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables dummy loop 
snd_pcm snd_timer snd soundcore snd_page_alloc ioatdma i2c_i801 i2c_core pcspkr 
dca psmouse evdev button serio_raw processor ext3 jbd mbcache dm_mod sd_mod 
crc_t10dif ahci libata uhci_hcd ehci_hcd scsi_mod usbcore nls_base e1000e 
thermal thermal_sys [last unloaded: scsi_wait_scan]
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876405] CPU 7:
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876406] Modules linked in: 
btrfs zlib_deflate crc32c libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos 
fat jfs xfs exportfs reiserfs ext4 jbd2 crc16 ext2 ipt_LOG sg xt_limit 
xt_tcpudp xt_state iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 
nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables dummy loop 
snd_pcm snd_timer snd soundcore snd_page_alloc ioatdma i2c_i801 i2c_core pcspkr 
dca psmouse evdev button serio_raw processor ext3 jbd mbcache dm_mod sd_mod 
crc_t10dif ahci libata uhci_hcd ehci_hcd scsi_mod usbcore nls_base e1000e 
thermal thermal_sys [last unloaded: scsi_wait_scan]
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876433] Pid: 9709, comm: 
init Not tainted 2.6.32-5-vserver-amd64 #1 X8STi
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876435] RIP: 
0023:[]  [] 0xf76c0430
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876440] RSP: 
002b:ffa6673c  EFLAGS: 0296
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876442] RAX: 
fff6 RBX:  RCX: ffa6673c
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876443] RDX: 
f76c0430 RSI: 000f RDI: 
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876445] RBP: 
8101166e R08:  R09: 
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876447] R10: 
 R11:  R12: 
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876448] R13: 
 R14:  R15: 
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876451] FS:  
() GS:880016bc(0063) knlGS:f74a96c0
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876453] CS:  0010 DS: 002b 
ES: 002b CR0: 8005003b
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876454] CR2: 
7fa1566220a0 CR3: 000514e5d000 CR4: 06e0
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876456] DR0: 
 DR1:  DR2: 
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876458] DR3: 
 DR6: 0ff0 DR7: 0400
Feb  8 17:28:57 server-i7_920 kernel: [18446744016.876460] Call Trace:
Feb  8 18:20:04 server-i7_920 kernel: [ 3003.130843] INFO: task cron:8748 
blocked for more than 120 seconds.
Feb  8 18:20:04 server-i7_920 kernel: [ 3003.130874] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb  8 18:20:04 server-i7_920 kernel: [ 3003.130919] cron  D 
 0  8748  25441 0x0002
Feb  8 18:20:04 server-i7_920 kernel: [ 3003.130923]  88063ccf3250 
0086  
Feb  8 18:20:04 server-i7_920 kernel: [ 3003.130927]  88055c60c7e0 
00d0 f9e0 88045d2bdfd8
Feb  8 18:20:04 server-i7_920 kernel: [ 3003.130931]  00015780 
00015780 88055c60c7e0 88055c60cad8
Feb  8 18:20:04 server-i7_920 kernel: [ 3003.130934] Call