[Bug 999755] Re: Kernel crash on EC2 VirtualBox

2012-05-29 Thread Stefan Bader
Bugger, I made the severity of the messages too low... :/ If they were
logged at all, they might be in /var/log/syslog after the instance
reboots... I had better build an improved version. But it's interesting
that this appears on a completely different virtualization platform. I
wonder whether it could even appear on real hardware. Thanks a lot for
the detailed instructions for the test case. I ran a test program doing
lots of forks and pipe communication for two days without hitting this,
so it helps a lot to know the exact steps to reproduce it.
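
For readers who want a feel for what a fork-and-pipe stress test looks
like, here is a minimal sketch in Python. It is not the test program
mentioned above (that program is not shown in this thread); the function
name, the child count and the payload size are all assumptions made for
illustration. The parent blocks in read() on the pipe, which is the same
sys_read -> pipe_read -> pipe_wait -> schedule path that shows up in the
reported traces.

```python
import os


def fork_pipe_round(n_children: int, payload: bytes) -> int:
    """Fork n_children processes that each write payload into one pipe.

    Returns the total number of bytes the parent read back.
    Hypothetical helper for illustration only; Unix-only (uses os.fork).
    Keep payload small (below PIPE_BUF) so each write is atomic.
    """
    read_fd, write_fd = os.pipe()
    pids = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Child: write the payload, then exit immediately without
            # running any of the parent's cleanup handlers.
            os.close(read_fd)
            try:
                os.write(write_fd, payload)
            finally:
                os._exit(0)
        pids.append(pid)

    # Parent: close its write end so read() sees EOF once all children exit.
    os.close(write_fd)
    total = 0
    while True:
        chunk = os.read(read_fd, 4096)  # blocks until data or EOF
        if not chunk:
            break
        total += len(chunk)
    os.close(read_fd)

    # Reap every child so no zombies accumulate across rounds.
    for pid in pids:
        os.waitpid(pid, 0)
    return total
```

Calling fork_pipe_round(8, b"ping") in a long-running loop would give a
rough approximation of such a load. As noted above, a test of this kind
ran for two days without triggering the crash, so it is at best a
starting point.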

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/999755

Title:
  Kernel crash on EC2  VirtualBox

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/999755/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 999755] Re: Kernel crash on EC2 VirtualBox

2012-05-29 Thread Stefan Bader
So with those exact steps I managed to crash a real dual-core machine,
which means we can rule out virtualization as the cause. The crash seems
to depend only on whatever that command does and, likely, on having more
than one CPU. (It also crashed on a local Xen guest, so I can reproduce
things locally.) The next step will be to test the latest upstream
kernels to see whether this persists.
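
The test case recorded in this bug's description boils down to running
ohai over and over on a machine with more than one CPU. A bounded version
of that loop could be sketched as follows. This is only an illustration:
run_loop and main are hypothetical names, the iteration count of 1000 is
arbitrary, and the sketch assumes that ohai is on the PATH (as it would
be after "gem install chef").

```python
import os
import subprocess
import sys


def run_loop(cmd, iterations):
    """Run cmd `iterations` times and return how many runs exited non-zero.

    Output of the command is discarded; only the exit status is checked.
    Raises FileNotFoundError if the command is not installed.
    """
    failures = 0
    for _ in range(iterations):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failures += 1
    return failures


def main():
    # The crash has only been reported on systems with two or more CPUs.
    if (os.cpu_count() or 1) < 2:
        print("warning: only one CPU visible to this process", file=sys.stderr)
    # Assumption: "ohai" is on the PATH; 1000 iterations is an arbitrary bound.
    failures = run_loop(["ohai"], 1000)
    print(f"{failures} of 1000 runs exited non-zero")
```

Calling main() inside a screen session, as in the test case, keeps the
loop alive if the SSH connection drops. If the kernel crashes, the loop
of course never gets to report anything.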

** Summary changed:

- Kernel crash on EC2  VirtualBox
+ Kernel crash in rb_next doin ohai loops

** Description changed:

+ Testcase:
+ 1. apt-get install build-essential ruby-1.9.3 screen
+ 2. gem install chef
+ 3. in screen session: while true; do ohai; done
+ 
+ ---
+ 
  We have a number of small and large instances running the release
  version of 12.04.  The small instances have been completely stable.
  However, every large instance we have has crashed at a seemingly random
  interval.  This is repeatable on individual systems, though not within a
  defined time period.  It appears to be triggered by our half hourly run
  of OpsCode's chef-client.  We tried running the client in a tight loop
  to recreate the crash but were unable to get it to do so in a short time
  period.  It still took two days to crash again.
  
  This was affecting the 3.2.0-23-virtual kernel, so we updated to the
  3.2.0-24-virtual kernel but still have found the same crash.  The only
  information available in the system logs is:
  
  [17605315.391128] BUG: unable to handle kernel NULL pointer dereference at 0010
  [17605315.391148] IP: [8130d7f1] rb_next+0x1/0x50
- [17605315.391163] PGD 1d2fdc067 PUD 1d0e3c067 PMD 0 
- [17605315.391172] Oops:  [#1] SMP 
- [17605315.391179] CPU 1 
+ [17605315.391163] PGD 1d2fdc067 PUD 1d0e3c067 PMD 0
+ [17605315.391172] Oops:  [#1] SMP
+ [17605315.391179] CPU 1
  [17605315.391182] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables isofs acpiphp
- [17605315.391209] 
- [17605315.391214] Pid: 28794, comm: chef-client Not tainted 3.2.0-23-virtual #36-Ubuntu  
+ [17605315.391209]
+ [17605315.391214] Pid: 28794, comm: chef-client Not tainted 3.2.0-23-virtual #36-Ubuntu
  [17605315.391223] RIP: e030:[8130d7f1]  [8130d7f1] rb_next+0x1/0x50
  [17605315.391232] RSP: e02b:8801d2659c18  EFLAGS: 00010046
  [17605315.391238] RAX:  RBX: 8801d2eb5a00 RCX:
  [17605315.391244] RDX: fff0 RSI:  RDI: 0010
  [17605315.391250] RBP: 8801d2659c48 R08:  R09:
  [17605315.391255] R10: 8801dff866c0 R11: 0001 R12:
  [17605315.391263] R13:  R14:  R15: 033b9e28
  [17605315.391274] FS:  7fee8cc10700() GS:8801dff8f000() knlGS:
  [17605315.391281] CS:  e033 DS:  ES:  CR0: 8005003b
  [17605315.391287] CR2: 0010 CR3: 0001d2a0b000 CR4: 2660
  [17605315.391294] DR0:  DR1:  DR2:
  [17605315.391301] DR3:  DR6: 0ff0 DR7: 0400
  [17605315.391308] Process chef-client (pid: 28794, threadinfo 8801d2658000, task 8801d087)
  [17605315.391315] Stack:
  [17605315.391319]  8801d2659c48 8104ece9 8801d2eb5a00 8801dffa26c0
  [17605315.391331]  8801d2eb5200  8801d2659c78 810544b8
  [17605315.391343]  8801d2659c78 8801dffa26c0 0001 8801d08703a8
  [17605315.391354] Call Trace:
  [17605315.391364]  [8104ece9] ? pick_next_entity+0xb9/0xe0
  [17605315.391373]  [810544b8] pick_next_task_fair+0x38/0x70
  [17605315.391382]  [81652ddc] __schedule+0x14c/0x6f0
  [17605315.391391]  [816554ee] ? _raw_spin_unlock_irqrestore+0x1e/0x30
  [17605315.391399]  [8165344f] schedule+0x3f/0x60
  [17605315.391408]  [8117e119] pipe_wait+0x59/0x80
  [17605315.391417]  [81089340] ? add_wait_queue+0x60/0x60
  [17605315.391425]  [8117e87a] pipe_read+0x1da/0x330
  [17605315.391433]  [81174522] do_sync_read+0xd2/0x110
  [17605315.391443]  [8100a25d] ? xen_force_evtchn_callback+0xd/0x10
  [17605315.391451]  [8100aa32] ? check_events+0x12/0x20
  [17605315.391459]  [81298d33] ? security_file_permission+0x93/0xb0
  [17605315.391466]  [811749a1] ? rw_verify_area+0x61/0xf0
  [17605315.391473]  [81174e80] vfs_read+0xb0/0x180
  [17605315.391479]  [81174f9a] sys_read+0x4a/0x90
  [17605315.391488]  [8165d8c2] system_call_fastpath+0x16/0x1b
- [17605315.391494] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 48 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48 8b 47 
+ [17605315.391494] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 

[Bug 999755] Re: Kernel crash on EC2 VirtualBox

2012-05-28 Thread Gavin Heavyside
We've got small EC2 instances (single processor) that haven't exhibited
this behaviour, but we get it with large EC2 instances (2 CPUs); the
VirtualBox machine I just reproduced it with was specifically set to 2
CPUs.

Could it be that this bug only occurs on multi-CPU boxes?

** Summary changed:

- Kernel crash on EC2 m1.large instances
+ Kernel crash on EC2  VirtualBox



[Bug 999755] Re: Kernel crash on EC2 VirtualBox

2012-05-28 Thread Gavin Heavyside
And we've just reproduced on EC2 with the debug kernel:

[248587.286290] [ cut here ]
[248587.286765] kernel BUG at /home/smb/precise-amd64/ubuntu-2.6/kernel/sched_fair.c:1239!
[248587.286775] invalid opcode:  [#1] SMP
[248587.286783] CPU 0
[248587.286786] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm bluetooth parport_pc ppdev lp parport isofs acpiphp
[248587.286822]
[248587.286827] Pid: 18805, comm: ohai Not tainted 3.2.0-24-virtual #37+lp999755dbg1
[248587.286836] RIP: e030:[81050cd5]  [81050cd5] pick_next_entity+0x105/0x110
[248587.286849] RSP: e02b:8801d02d7c28  EFLAGS: 00010096
[248587.286854] RAX: 002d RBX: 8801d20df800 RCX: 0003
[248587.286860] RDX:  RSI: 81e000a0 RDI: 0004
[248587.286866] RBP: 8801d02d7c48 R08: 000a R09:
[248587.286872] R10:  R11:  R12: 8801dff866c0
[248587.286878] R13: 8801d20de600 R14:  R15: 01f41018
[248587.286889] FS:  7f5ec2010700() GS:8801dff73000() knlGS:
[248587.286895] CS:  e033 DS:  ES:  CR0: 8005003b
[248587.286901] CR2: 01f09f30 CR3: 0001cec3e000 CR4: 2660
[248587.286908] DR0:  DR1:  DR2:
[248587.286914] DR3:  DR6: 0ff0 DR7: 0400
[248587.286921] Process ohai (pid: 18805, threadinfo 8801d02d6000, task 8801cf10db80)
[248587.286928] Stack:
[248587.286931]  8801d20df800 8801dff866c0 8801d20de600
[248587.286944]  8801d02d7c78 810544e8 8801d02d7c78 8801dff866c0
[248587.286956]   8801cf10df28 8801d02d7cf8 81652f3c
[248587.286968] Call Trace:
[248587.286976]  [810544e8] pick_next_task_fair+0x38/0x70
[248587.286984]  [81652f3c] __schedule+0x14c/0x6f0
[248587.286992]  [8165564e] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[248587.286999]  [816535af] schedule+0x3f/0x60
[248587.287006]  [8117e149] pipe_wait+0x59/0x80
[248587.287014]  [81089370] ? add_wait_queue+0x60/0x60
[248587.287021]  [8117e8aa] pipe_read+0x1da/0x330
[248587.287028]  [81174552] do_sync_read+0xd2/0x110
[248587.287036]  [8100a25d] ? xen_force_evtchn_callback+0xd/0x10
[248587.287043]  [8100aa32] ? check_events+0x12/0x20
[248587.287051]  [81298d63] ? security_file_permission+0x93/0xb0
[248587.287057]  [811749d1] ? rw_verify_area+0x61/0xf0
[248587.287063]  [81174eb0] vfs_read+0xb0/0x180
[248587.287069]  [81174fca] sys_read+0x4a/0x90
[248587.287076]  [8165da42] system_call_fastpath+0x16/0x1b
[248587.287081] Code: 89 df e8 69 c1 5e 00 48 8b 73 38 48 c7 c7 7b ef 9f 81 31 c0 e8 98 c9 5e 00 48 8b 73 10 48 c7 c7 98 ef 9f 81 31 c0 e8 86 c9 5e 00 0f 0b 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89
[248587.287170] RIP  [81050cd5] pick_next_entity+0x105/0x110
[248587.287177]  RSP 8801d02d7c28
[248587.287194] ---[ end trace 124f7d4d99f55a46 ]---



[Bug 999755] Re: Kernel crash on EC2 VirtualBox

2012-05-28 Thread Karl Matthias
We have had a crash with Stefan's debugging kernel running.  Here is the
output (doesn't look like it contains any more information).

[248587.286290] [ cut here ]
[248587.286765] kernel BUG at /home/smb/precise-amd64/ubuntu-2.6/kernel/sched_fair.c:1239!
[248587.286775] invalid opcode:  [#1] SMP
[248587.286783] CPU 0
[248587.286786] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm bluetooth parport_pc ppdev lp parport isofs acpiphp
[248587.286822]
[248587.286827] Pid: 18805, comm: ohai Not tainted 3.2.0-24-virtual #37+lp999755dbg1
[248587.286836] RIP: e030:[81050cd5]  [81050cd5] pick_next_entity+0x105/0x110
[248587.286849] RSP: e02b:8801d02d7c28  EFLAGS: 00010096
[248587.286854] RAX: 002d RBX: 8801d20df800 RCX: 0003
[248587.286860] RDX:  RSI: 81e000a0 RDI: 0004
[248587.286866] RBP: 8801d02d7c48 R08: 000a R09:
[248587.286872] R10:  R11:  R12: 8801dff866c0
[248587.286878] R13: 8801d20de600 R14:  R15: 01f41018
[248587.286889] FS:  7f5ec2010700() GS:8801dff73000() knlGS:
[248587.286895] CS:  e033 DS:  ES:  CR0: 8005003b
[248587.286901] CR2: 01f09f30 CR3: 0001cec3e000 CR4: 2660
[248587.286908] DR0:  DR1:  DR2:
[248587.286914] DR3:  DR6: 0ff0 DR7: 0400
[248587.286921] Process ohai (pid: 18805, threadinfo 8801d02d6000, task 8801cf10db80)
[248587.286928] Stack:
[248587.286931]  8801d20df800 8801dff866c0 8801d20de600
[248587.286944]  8801d02d7c78 810544e8 8801d02d7c78 8801dff866c0
[248587.286956]   8801cf10df28 8801d02d7cf8 81652f3c
[248587.286968] Call Trace:
[248587.286976]  [810544e8] pick_next_task_fair+0x38/0x70
[248587.286984]  [81652f3c] __schedule+0x14c/0x6f0
[248587.286992]  [8165564e] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[248587.286999]  [816535af] schedule+0x3f/0x60
[248587.287006]  [8117e149] pipe_wait+0x59/0x80
[248587.287014]  [81089370] ? add_wait_queue+0x60/0x60
[248587.287021]  [8117e8aa] pipe_read+0x1da/0x330
[248587.287028]  [81174552] do_sync_read+0xd2/0x110
[248587.287036]  [8100a25d] ? xen_force_evtchn_callback+0xd/0x10
[248587.287043]  [8100aa32] ? check_events+0x12/0x20
[248587.287051]  [81298d63] ? security_file_permission+0x93/0xb0
[248587.287057]  [811749d1] ? rw_verify_area+0x61/0xf0
[248587.287063]  [81174eb0] vfs_read+0xb0/0x180
[248587.287069]  [81174fca] sys_read+0x4a/0x90
[248587.287076]  [8165da42] system_call_fastpath+0x16/0x1b
[248587.287081] Code: 89 df e8 69 c1 5e 00 48 8b 73 38 48 c7 c7 7b ef 9f 81 31 c0 e8 98 c9 5e 00 48 8b 73 10 48 c7 c7 98 ef 9f 81 31 c0 e8 86 c9 5e 00 0f 0b 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89
[248587.287170] RIP  [81050cd5] pick_next_entity+0x105/0x110
[248587.287177]  RSP 8801d02d7c28
[248587.287194] ---[ end trace 124f7d4d99f55a46 ]---



[Bug 999755] Re: Kernel crash on EC2 VirtualBox

2012-05-28 Thread Gavin Heavyside
And another one from the debug kernel on EC2, with a slightly different
call stack:

[ 4389.480352] [ cut here ]
[ 4389.480884] kernel BUG at /home/smb/precise-amd64/ubuntu-2.6/kernel/sched_fair.c:1239!
[ 4389.480894] invalid opcode:  [#1] SMP
[ 4389.480902] CPU 0
[ 4389.480905] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm bluetooth parport_pc ppdev lp parport isofs acpiphp
[ 4389.480945]
[ 4389.480949] Pid: 24612, comm: ohai Not tainted 3.2.0-24-virtual #37+lp999755dbg1
[ 4389.480958] RIP: e030:[81050cd5]  [81050cd5] pick_next_entity+0x105/0x110
[ 4389.480972] RSP: e02b:8801d0cdd818  EFLAGS: 00010092
[ 4389.480977] RAX: 002c RBX: 8801d1106e00 RCX: 0003
[ 4389.480983] RDX:  RSI: 81e000a0 RDI: 0004
[ 4389.480988] RBP: 8801d0cdd838 R08: 000a R09:
[ 4389.480994] R10:  R11:  R12: 8801dff866c0
[ 4389.481000] R13: 8801d1107a00 R14: 0008 R15: 8801d1e97800
[ 4389.481009] FS:  7f09fcc8a700() GS:8801dff73000() knlGS:
[ 4389.481015] CS:  e033 DS:  ES:  CR0: 8005003b
[ 4389.481021] CR2: 012dbf30 CR3: 0001d0702000 CR4: 2660
[ 4389.481028] DR0:  DR1:  DR2:
[ 4389.481035] DR3:  DR6: 0ff0 DR7: 0400
[ 4389.481041] Process ohai (pid: 24612, threadinfo 8801d0cdc000, task 8801d2905b80)
[ 4389.481047] Stack:
[ 4389.481051]  8801d1106e00 8801dff866c0 8801d1107a00 0008
[ 4389.481063]  8801d0cdd868 810544e8 8801d0cdd868 8801dff866c0
[ 4389.481075]   8801d2905f28 8801d0cdd8e8 81652f3c
[ 4389.481086] Call Trace:
[ 4389.481093]  [810544e8] pick_next_task_fair+0x38/0x70
[ 4389.481101]  [81652f3c] __schedule+0x14c/0x6f0
[ 4389.481108]  [816535af] schedule+0x3f/0x60
[ 4389.481114]  [816546cd] schedule_hrtimeout_range_clock+0x14d/0x170
[ 4389.481124]  [8100aa1f] ? xen_restore_fl_direct_reloc+0x4/0x4
[ 4389.481131]  [8165564e] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[ 4389.481139]  [8108935d] ? add_wait_queue+0x4d/0x60
[ 4389.481146]  [81654703] schedule_hrtimeout_range+0x13/0x20
[ 4389.481154]  [811877d9] poll_schedule_timeout+0x49/0x70
[ 4389.481160]  [81188356] do_select+0x4d6/0x600
[ 4389.481167]  [811878e0] ? poll_freewait+0xe0/0xe0
[ 4389.481173]  [811879d0] ? __pollwait+0xf0/0xf0
[ 4389.481179]  [81005191] ? __raw_callee_save_xen_pte_val+0x11/0x1e
[ 4389.481186]  [81054c3e] ? update_curr+0x21e/0x230
[ 4389.481192]  [8103cc65] ? pvclock_clocksource_read+0x55/0xf0
[ 4389.481199]  [810553db] ? check_preempt_wakeup+0x15b/0x230
[ 4389.481207]  [8104ed74] ? check_preempt_curr+0x84/0xa0
[ 4389.481214]  [8104edcd] ? ttwu_do_wakeup+0x3d/0x120
[ 4389.481222]  [8130fd49] ? put_dec+0x59/0x60
[ 4389.481228]  [81310c1f] ? number.isra.2+0x31f/0x350
[ 4389.481236]  [81323436] ? nla_parse+0x86/0xe0
[ 4389.481242]  [8165564e] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[ 4389.481250]  [8105e610] ? try_to_wake_up+0x190/0x200
[ 4389.481257]  [81188641] core_sys_select+0x1c1/0x330
[ 4389.481263]  [8100aa32] ? check_events+0x12/0x20
[ 4389.481269]  [8100a25d] ? xen_force_evtchn_callback+0xd/0x10
[ 4389.481276]  [8100aa32] ? check_events+0x12/0x20
[ 4389.481282]  [8100aa1f] ? xen_restore_fl_direct_reloc+0x4/0x4
[ 4389.481289]  [81004c62] ? xen_mc_flush+0xb2/0x1c0
[ 4389.481295]  [8100aa1f] ? xen_restore_fl_direct_reloc+0x4/0x4
[ 4389.481302]  [811889eb] sys_select+0xbb/0x100
[ 4389.481308]  [8105edf7] ? schedule_tail+0x27/0xb0
[ 4389.481314]  [8165da42] system_call_fastpath+0x16/0x1b
[ 4389.481319] Code: 89 df e8 69 c1 5e 00 48 8b 73 38 48 c7 c7 7b ef 9f 81 31 c0 e8 98 c9 5e 00 48 8b 73 10 48 c7 c7 98 ef 9f 81 31 c0 e8 86 c9 5e 00 0f 0b 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89
[ 4389.481407] RIP  [81050cd5] pick_next_entity+0x105/0x110
[ 4389.484006]  RSP 8801d0cdd818
[ 4389.484006] ---[ end trace 7ee7cea7516c9821 ]---
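
To see where this call stack differs from the pipe_read one, it can help
to strip each trace down to its reliable frames (the ones without a
leading "?") and compare them. The following Python sketch does that for
short excerpts copied from the two debug-kernel traces in this thread.
The helper names and the regular expression are assumptions made for
illustration; the pattern is written against the log format as it
appears in this thread and may need adjusting for other kernels.

```python
import re

# Matches "[timestamp]  [address] [? ]symbol+0xoff/0xlen" call-trace lines.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+\[[0-9a-f]+\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(oops_text):
    """Return symbol names of call-trace frames not marked with '?'."""
    frames = []
    for line in oops_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group(1):
            frames.append(match.group(2))
    return frames


def common_prefix(a, b):
    """Return the leading frames (innermost first) shared by both traces."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


# Excerpts copied from the two debug-kernel traces in this thread.
PIPE_TRACE = """\
[248587.286976]  [810544e8] pick_next_task_fair+0x38/0x70
[248587.286984]  [81652f3c] __schedule+0x14c/0x6f0
[248587.286992]  [8165564e] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[248587.286999]  [816535af] schedule+0x3f/0x60
[248587.287006]  [8117e149] pipe_wait+0x59/0x80
"""

SELECT_TRACE = """\
[ 4389.481093]  [810544e8] pick_next_task_fair+0x38/0x70
[ 4389.481101]  [81652f3c] __schedule+0x14c/0x6f0
[ 4389.481108]  [816535af] schedule+0x3f/0x60
[ 4389.481114]  [816546cd] schedule_hrtimeout_range_clock+0x14d/0x170
"""

shared = common_prefix(reliable_frames(PIPE_TRACE), reliable_frames(SELECT_TRACE))
print(shared)
```

For these excerpts the shared frames are pick_next_task_fair, __schedule
and schedule. Both traces enter the scheduler the same way and only
differ in the system call that put the task to sleep (read on a pipe
versus select).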
