[Bug 999755] Re: Kernel crash on EC2 VirtualBox
Bugger, I made the severity of the messages too low... :/ If they were logged at all, they might be in /var/log/syslog after an instance reboot... I had better make a better version. But it's interesting that this appears on a completely different virtualization platform. I wonder whether it could even appear on real hardware.

Thanks a lot for the detailed instructions for the test case. I was running a test program doing lots of forks and pipe communication for two days without hitting this, so it helps a lot to know exactly what the reproducing steps are.

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/999755

Title:
  Kernel crash on EC2 VirtualBox

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/999755/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
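For readers who want a feel for the kind of "lots of forks and pipe communication" load mentioned above, here is a minimal sketch. It is not the test program used in this bug (that program is not shown in the thread); the function names `fork_pipe_round` and `stress` are made up for illustration. Each round forks a child that writes into a pipe while the parent blocks in a read on the other end.

```python
import os


def fork_pipe_round(payload: bytes = b"ping") -> bytes:
    """Fork one child that writes `payload` into a pipe; the parent reads it back."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: write the payload, then exit without running parent cleanup code.
        try:
            os.close(read_fd)
            os.write(write_fd, payload)
        finally:
            os._exit(0)
    # Parent: close the unused write end so read() sees EOF once the child exits.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)  # blocks until data or EOF
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child so no zombies accumulate
    return b"".join(chunks)


def stress(rounds: int) -> int:
    """Run `rounds` fork/pipe round trips and return how many came back intact."""
    ok = 0
    for _ in range(rounds):
        if fork_pipe_round() == b"ping":
            ok += 1
    return ok
```

Calling `stress(100000)` in a loop gives a steady stream of fork, pipe write and blocking pipe read events. As noted above, a load of this general shape ran for two days without triggering the crash, so it should be treated as background load only, not as a reproducer.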
[Bug 999755] Re: Kernel crash on EC2 VirtualBox
So with those exact steps I managed to crash a real dual-core machine, which means we can rule out virtualization as the cause. The crash is merely related to whatever that command does, and likely to having more than one CPU. (It also crashed on a local Xen guest, so I can reproduce things locally.) The next step will be to test the latest upstream kernels to see whether this persists.

** Summary changed:

- Kernel crash on EC2 VirtualBox
+ Kernel crash in rb_next doin ohai loops

** Description changed:

+ Testcase:
+ 1. apt-get install build-essential ruby-1.9.3 screen
+ 2. gem install chef
+ 3. in screen session: while true; do ohai; done
+ 
+ ---
+ 
  We have a number of small and large instances running the release version of 12.04. The small instances have been completely stable. However, every large instance we have has crashed at a seemingly random interval. This is repeatable on individual systems, though not within a defined time period. It appears to be triggered by our half-hourly run of OpsCode's chef-client. We tried running the client in a tight loop to recreate the crash but were unable to trigger it quickly; it still took two days to crash again. This was affecting the 3.2.0-23-virtual kernel, so we updated to the 3.2.0-24-virtual kernel but still see the same crash.
The only information available in the system logs is: [17605315.391128] BUG: unable to handle kernel NULL pointer dereference at 0010 [17605315.391148] IP: [8130d7f1] rb_next+0x1/0x50 - [17605315.391163] PGD 1d2fdc067 PUD 1d0e3c067 PMD 0 - [17605315.391172] Oops: [#1] SMP - [17605315.391179] CPU 1 + [17605315.391163] PGD 1d2fdc067 PUD 1d0e3c067 PMD 0 + [17605315.391172] Oops: [#1] SMP + [17605315.391179] CPU 1 [17605315.391182] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables isofs acpiphp - [17605315.391209] - [17605315.391214] Pid: 28794, comm: chef-client Not tainted 3.2.0-23-virtual #36-Ubuntu + [17605315.391209] + [17605315.391214] Pid: 28794, comm: chef-client Not tainted 3.2.0-23-virtual #36-Ubuntu [17605315.391223] RIP: e030:[8130d7f1] [8130d7f1] rb_next+0x1/0x50 [17605315.391232] RSP: e02b:8801d2659c18 EFLAGS: 00010046 [17605315.391238] RAX: RBX: 8801d2eb5a00 RCX: [17605315.391244] RDX: fff0 RSI: RDI: 0010 [17605315.391250] RBP: 8801d2659c48 R08: R09: [17605315.391255] R10: 8801dff866c0 R11: 0001 R12: [17605315.391263] R13: R14: R15: 033b9e28 [17605315.391274] FS: 7fee8cc10700() GS:8801dff8f000() knlGS: [17605315.391281] CS: e033 DS: ES: CR0: 8005003b [17605315.391287] CR2: 0010 CR3: 0001d2a0b000 CR4: 2660 [17605315.391294] DR0: DR1: DR2: [17605315.391301] DR3: DR6: 0ff0 DR7: 0400 [17605315.391308] Process chef-client (pid: 28794, threadinfo 8801d2658000, task 8801d087) [17605315.391315] Stack: [17605315.391319] 8801d2659c48 8104ece9 8801d2eb5a00 8801dffa26c0 [17605315.391331] 8801d2eb5200 8801d2659c78 810544b8 [17605315.391343] 8801d2659c78 8801dffa26c0 0001 8801d08703a8 [17605315.391354] Call Trace: [17605315.391364] [8104ece9] ? pick_next_entity+0xb9/0xe0 [17605315.391373] [810544b8] pick_next_task_fair+0x38/0x70 [17605315.391382] [81652ddc] __schedule+0x14c/0x6f0 [17605315.391391] [816554ee] ? 
_raw_spin_unlock_irqrestore+0x1e/0x30 [17605315.391399] [8165344f] schedule+0x3f/0x60 [17605315.391408] [8117e119] pipe_wait+0x59/0x80 [17605315.391417] [81089340] ? add_wait_queue+0x60/0x60 [17605315.391425] [8117e87a] pipe_read+0x1da/0x330 [17605315.391433] [81174522] do_sync_read+0xd2/0x110 [17605315.391443] [8100a25d] ? xen_force_evtchn_callback+0xd/0x10 [17605315.391451] [8100aa32] ? check_events+0x12/0x20 [17605315.391459] [81298d33] ? security_file_permission+0x93/0xb0 [17605315.391466] [811749a1] ? rw_verify_area+0x61/0xf0 [17605315.391473] [81174e80] vfs_read+0xb0/0x180 [17605315.391479] [81174f9a] sys_read+0x4a/0x90 [17605315.391488] [8165d8c2] system_call_fastpath+0x16/0x1b - [17605315.391494] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 48 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48 8b 47 + [17605315.391494] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89
[Bug 999755] Re: Kernel crash on EC2 VirtualBox
We've got small EC2 instances (single processor) that haven't exhibited this behaviour, but we get it with large EC2 instances (2 CPUs); the VirtualBox machine I just reproduced it with was specifically set to 2 CPUs. It seems to me that this bug might only occur on multi-CPU boxes?

** Summary changed:

- Kernel crash on EC2 m1.large instances
+ Kernel crash on EC2 VirtualBox
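One rough way to probe the multi-CPU hypothesis on a 2-CPU machine is to run the reproducer with its whole process tree pinned to a single CPU and see whether the crash still occurs. The sketch below is an assumption-laden illustration, not something taken from this thread: `run_pinned` is a hypothetical helper, and it relies on `os.sched_setaffinity`, which is Linux-only. Pinning one process tree does not turn the kernel into a uniprocessor kernel (the other CPU still runs its own tasks and load balancing), so a clean run is only weak evidence.

```python
import os
import subprocess


def run_pinned(command, rounds, cpu=None):
    """Run `command` `rounds` times with this process and its children pinned to one CPU.

    `cpu` defaults to the lowest-numbered CPU this process is allowed to use.
    Returns the number of completed runs.
    """
    original = os.sched_getaffinity(0)  # remember the current CPU mask
    target = min(original) if cpu is None else cpu
    os.sched_setaffinity(0, {target})   # children inherit this mask
    try:
        completed = 0
        for _ in range(rounds):
            subprocess.run(
                command,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                check=False,
            )
            completed += 1
        return completed
    finally:
        os.sched_setaffinity(0, original)  # restore the original mask
```

For example, `run_pinned(["ohai"], 100000)` would approximate the `while true; do ohai; done` loop from the test case while keeping every `ohai` process on one CPU.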
[Bug 999755] Re: Kernel crash on EC2 VirtualBox
And we've just reproduced on EC2 with the debug kernel: [248587.286290] [ cut here ] [248587.286765] kernel BUG at /home/smb/precise-amd64/ubuntu-2.6/kernel/sched_fair.c:1239! [248587.286775] invalid opcode: [#1] SMP [248587.286783] CPU 0 [248587.286786] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm bluetooth parport_pc ppdev lp parport isofs acpiphp [248587.286822] [248587.286827] Pid: 18805, comm: ohai Not tainted 3.2.0-24-virtual #37+lp999755dbg1 [248587.286836] RIP: e030:[81050cd5] [81050cd5] pick_next_entity+0x105/0x110 [248587.286849] RSP: e02b:8801d02d7c28 EFLAGS: 00010096 [248587.286854] RAX: 002d RBX: 8801d20df800 RCX: 0003 [248587.286860] RDX: RSI: 81e000a0 RDI: 0004 [248587.286866] RBP: 8801d02d7c48 R08: 000a R09: [248587.286872] R10: R11: R12: 8801dff866c0 [248587.286878] R13: 8801d20de600 R14: R15: 01f41018 [248587.286889] FS: 7f5ec2010700() GS:8801dff73000() knlGS: [248587.286895] CS: e033 DS: ES: CR0: 8005003b [248587.286901] CR2: 01f09f30 CR3: 0001cec3e000 CR4: 2660 [248587.286908] DR0: DR1: DR2: [248587.286914] DR3: DR6: 0ff0 DR7: 0400 [248587.286921] Process ohai (pid: 18805, threadinfo 8801d02d6000, task 8801cf10db80) [248587.286928] Stack: [248587.286931] 8801d20df800 8801dff866c0 8801d20de600 [248587.286944] 8801d02d7c78 810544e8 8801d02d7c78 8801dff866c0 [248587.286956] 8801cf10df28 8801d02d7cf8 81652f3c [248587.286968] Call Trace: [248587.286976] [810544e8] pick_next_task_fair+0x38/0x70 [248587.286984] [81652f3c] __schedule+0x14c/0x6f0 [248587.286992] [8165564e] ? _raw_spin_unlock_irqrestore+0x1e/0x30 [248587.286999] [816535af] schedule+0x3f/0x60 [248587.287006] [8117e149] pipe_wait+0x59/0x80 [248587.287014] [81089370] ? add_wait_queue+0x60/0x60 [248587.287021] [8117e8aa] pipe_read+0x1da/0x330 [248587.287028] [81174552] do_sync_read+0xd2/0x110 [248587.287036] [8100a25d] ? xen_force_evtchn_callback+0xd/0x10 [248587.287043] [8100aa32] ? 
check_events+0x12/0x20 [248587.287051] [81298d63] ? security_file_permission+0x93/0xb0 [248587.287057] [811749d1] ? rw_verify_area+0x61/0xf0 [248587.287063] [81174eb0] vfs_read+0xb0/0x180 [248587.287069] [81174fca] sys_read+0x4a/0x90 [248587.287076] [8165da42] system_call_fastpath+0x16/0x1b [248587.287081] Code: 89 df e8 69 c1 5e 00 48 8b 73 38 48 c7 c7 7b ef 9f 81 31 c0 e8 98 c9 5e 00 48 8b 73 10 48 c7 c7 98 ef 9f 81 31 c0 e8 86 c9 5e 00 0f 0b 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 [248587.287170] RIP [81050cd5] pick_next_entity+0x105/0x110 [248587.287177] RSP 8801d02d7c28 [248587.287194] ---[ end trace 124f7d4d99f55a46 ]---
[Bug 999755] Re: Kernel crash on EC2 VirtualBox
We have had a crash with Stefan's debugging kernel running. Here is the output (doesn't look like it contains any more information). [248587.286290] [ cut here ] [248587.286765] kernel BUG at /home/smb/precise-amd64/ubuntu-2.6/kernel/sched_fair.c:1239! [248587.286775] invalid opcode: [#1] SMP [248587.286783] CPU 0 [248587.286786] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm bluetooth parport_pc ppdev lp parport isofs acpiphp [248587.286822] [248587.286827] Pid: 18805, comm: ohai Not tainted 3.2.0-24-virtual #37+lp999755dbg1 [248587.286836] RIP: e030:[81050cd5] [81050cd5] pick_next_entity+0x105/0x110 [248587.286849] RSP: e02b:8801d02d7c28 EFLAGS: 00010096 [248587.286854] RAX: 002d RBX: 8801d20df800 RCX: 0003 [248587.286860] RDX: RSI: 81e000a0 RDI: 0004 [248587.286866] RBP: 8801d02d7c48 R08: 000a R09: [248587.286872] R10: R11: R12: 8801dff866c0 [248587.286878] R13: 8801d20de600 R14: R15: 01f41018 [248587.286889] FS: 7f5ec2010700() GS:8801dff73000() knlGS: [248587.286895] CS: e033 DS: ES: CR0: 8005003b [248587.286901] CR2: 01f09f30 CR3: 0001cec3e000 CR4: 2660 [248587.286908] DR0: DR1: DR2: [248587.286914] DR3: DR6: 0ff0 DR7: 0400 [248587.286921] Process ohai (pid: 18805, threadinfo 8801d02d6000, task 8801cf10db80) [248587.286928] Stack: [248587.286931] 8801d20df800 8801dff866c0 8801d20de600 [248587.286944] 8801d02d7c78 810544e8 8801d02d7c78 8801dff866c0 [248587.286956] 8801cf10df28 8801d02d7cf8 81652f3c [248587.286968] Call Trace: [248587.286976] [810544e8] pick_next_task_fair+0x38/0x70 [248587.286984] [81652f3c] __schedule+0x14c/0x6f0 [248587.286992] [8165564e] ? _raw_spin_unlock_irqrestore+0x1e/0x30 [248587.286999] [816535af] schedule+0x3f/0x60 [248587.287006] [8117e149] pipe_wait+0x59/0x80 [248587.287014] [81089370] ? add_wait_queue+0x60/0x60 [248587.287021] [8117e8aa] pipe_read+0x1da/0x330 [248587.287028] [81174552] do_sync_read+0xd2/0x110 [248587.287036] [8100a25d] ? 
xen_force_evtchn_callback+0xd/0x10 [248587.287043] [8100aa32] ? check_events+0x12/0x20 [248587.287051] [81298d63] ? security_file_permission+0x93/0xb0 [248587.287057] [811749d1] ? rw_verify_area+0x61/0xf0 [248587.287063] [81174eb0] vfs_read+0xb0/0x180 [248587.287069] [81174fca] sys_read+0x4a/0x90 [248587.287076] [8165da42] system_call_fastpath+0x16/0x1b [248587.287081] Code: 89 df e8 69 c1 5e 00 48 8b 73 38 48 c7 c7 7b ef 9f 81 31 c0 e8 98 c9 5e 00 48 8b 73 10 48 c7 c7 98 ef 9f 81 31 c0 e8 86 c9 5e 00 0f 0b 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 [248587.287170] RIP [81050cd5] pick_next_entity+0x105/0x110 [248587.287177] RSP 8801d02d7c28 [248587.287194] ---[ end trace 124f7d4d99f55a46 ]---
[Bug 999755] Re: Kernel crash on EC2 VirtualBox
And another one from the debug kernel on EC2, with a slightly different call stack: [ 4389.480352] [ cut here ] [ 4389.480884] kernel BUG at /home/smb/precise-amd64/ubuntu-2.6/kernel/sched_fair.c:1239! [ 4389.480894] invalid opcode: [#1] SMP [ 4389.480902] CPU 0 [ 4389.480905] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm bluetooth parport_pc ppdev lp parport isofs acpiphp [ 4389.480945] [ 4389.480949] Pid: 24612, comm: ohai Not tainted 3.2.0-24-virtual #37+lp999755dbg1 [ 4389.480958] RIP: e030:[81050cd5] [81050cd5] pick_next_entity+0x105/0x110 [ 4389.480972] RSP: e02b:8801d0cdd818 EFLAGS: 00010092 [ 4389.480977] RAX: 002c RBX: 8801d1106e00 RCX: 0003 [ 4389.480983] RDX: RSI: 81e000a0 RDI: 0004 [ 4389.480988] RBP: 8801d0cdd838 R08: 000a R09: [ 4389.480994] R10: R11: R12: 8801dff866c0 [ 4389.481000] R13: 8801d1107a00 R14: 0008 R15: 8801d1e97800 [ 4389.481009] FS: 7f09fcc8a700() GS:8801dff73000() knlGS: [ 4389.481015] CS: e033 DS: ES: CR0: 8005003b [ 4389.481021] CR2: 012dbf30 CR3: 0001d0702000 CR4: 2660 [ 4389.481028] DR0: DR1: DR2: [ 4389.481035] DR3: DR6: 0ff0 DR7: 0400 [ 4389.481041] Process ohai (pid: 24612, threadinfo 8801d0cdc000, task 8801d2905b80) [ 4389.481047] Stack: [ 4389.481051] 8801d1106e00 8801dff866c0 8801d1107a00 0008 [ 4389.481063] 8801d0cdd868 810544e8 8801d0cdd868 8801dff866c0 [ 4389.481075] 8801d2905f28 8801d0cdd8e8 81652f3c [ 4389.481086] Call Trace: [ 4389.481093] [810544e8] pick_next_task_fair+0x38/0x70 [ 4389.481101] [81652f3c] __schedule+0x14c/0x6f0 [ 4389.481108] [816535af] schedule+0x3f/0x60 [ 4389.481114] [816546cd] schedule_hrtimeout_range_clock+0x14d/0x170 [ 4389.481124] [8100aa1f] ? xen_restore_fl_direct_reloc+0x4/0x4 [ 4389.481131] [8165564e] ? _raw_spin_unlock_irqrestore+0x1e/0x30 [ 4389.481139] [8108935d] ? 
add_wait_queue+0x4d/0x60 [ 4389.481146] [81654703] schedule_hrtimeout_range+0x13/0x20 [ 4389.481154] [811877d9] poll_schedule_timeout+0x49/0x70 [ 4389.481160] [81188356] do_select+0x4d6/0x600 [ 4389.481167] [811878e0] ? poll_freewait+0xe0/0xe0 [ 4389.481173] [811879d0] ? __pollwait+0xf0/0xf0 [ 4389.481179] [81005191] ? __raw_callee_save_xen_pte_val+0x11/0x1e [ 4389.481186] [81054c3e] ? update_curr+0x21e/0x230 [ 4389.481192] [8103cc65] ? pvclock_clocksource_read+0x55/0xf0 [ 4389.481199] [810553db] ? check_preempt_wakeup+0x15b/0x230 [ 4389.481207] [8104ed74] ? check_preempt_curr+0x84/0xa0 [ 4389.481214] [8104edcd] ? ttwu_do_wakeup+0x3d/0x120 [ 4389.481222] [8130fd49] ? put_dec+0x59/0x60 [ 4389.481228] [81310c1f] ? number.isra.2+0x31f/0x350 [ 4389.481236] [81323436] ? nla_parse+0x86/0xe0 [ 4389.481242] [8165564e] ? _raw_spin_unlock_irqrestore+0x1e/0x30 [ 4389.481250] [8105e610] ? try_to_wake_up+0x190/0x200 [ 4389.481257] [81188641] core_sys_select+0x1c1/0x330 [ 4389.481263] [8100aa32] ? check_events+0x12/0x20 [ 4389.481269] [8100a25d] ? xen_force_evtchn_callback+0xd/0x10 [ 4389.481276] [8100aa32] ? check_events+0x12/0x20 [ 4389.481282] [8100aa1f] ? xen_restore_fl_direct_reloc+0x4/0x4 [ 4389.481289] [81004c62] ? xen_mc_flush+0xb2/0x1c0 [ 4389.481295] [8100aa1f] ? xen_restore_fl_direct_reloc+0x4/0x4 [ 4389.481302] [811889eb] sys_select+0xbb/0x100 [ 4389.481308] [8105edf7] ? schedule_tail+0x27/0xb0 [ 4389.481314] [8165da42] system_call_fastpath+0x16/0x1b [ 4389.481319] Code: 89 df e8 69 c1 5e 00 48 8b 73 38 48 c7 c7 7b ef 9f 81 31 c0 e8 98 c9 5e 00 48 8b 73 10 48 c7 c7 98 ef 9f 81 31 c0 e8 86 c9 5e 00 0f 0b 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 [ 4389.481407] RIP [81050cd5] pick_next_entity+0x105/0x110 [ 4389.484006] RSP 8801d0cdd818 [ 4389.484006] ---[ end trace 7ee7cea7516c9821 ]---
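Since the traces in this thread differ in their outer frames (pipe read versus select), it can help to compare them mechanically and see which innermost frames they share. The following is a small sketch, assuming oops text in the flattened form shown in this thread; `reliable_frames` and `common_prefix` are hypothetical helper names. Entries marked with `?` in the trace are skipped because the kernel flags them as unreliable stack residue.

```python
import re

# Matches "[<hex address>] [? ]symbol+0xOFF/0xSIZE". Timestamps such as
# "[ 4389.481093]" contain a dot or space and therefore do not match.
FRAME_RE = re.compile(
    r"\[[0-9a-f]+\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(oops_text):
    """Return function names from the 'Call Trace:' section, skipping '?' entries."""
    _, sep, tail = oops_text.partition("Call Trace:")
    if not sep:
        return []
    trace = tail.split("Code:", 1)[0]  # stop before the code-bytes dump
    return [m.group(2) for m in FRAME_RE.finditer(trace) if m.group(1) is None]


def common_prefix(a, b):
    """Return the leading frames (innermost first) that two traces share."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared
```

Feeding the pipe_read trace and the sys_select trace above through `reliable_frames` and then `common_prefix` leaves the scheduler frames `pick_next_task_fair`, `__schedule` and `schedule`, which suggests the two call paths only differ in how the task went to sleep.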