[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
This bug was fixed in the package linux - 2.6.35-24.42

---
linux (2.6.35-24.42) maverick-proposed; urgency=low

  [ Brad Figg ]

  - LP: #683422

  [ Colin Ian King ]

  * SAUCE: Allow registration of handler to multiple WMI events with same GUID
    - LP: #676997
  * SAUCE: Add WMI hotkeys support for Dell All-In-One series
    - LP: #676997
  * [Config] Enable Dell All-In-One WMI Hotkeys driver
    - LP: #676997

  [ David Woodhouse ]

  * [Upstream] Call acpi_video_register() in intel_opregion_init() failure path
    - LP: #615947

  [ Manoj Iyer ]

  * SAUCE: enable rfkill for rtl8192se driver
    - LP: #640992
  * SAUCE: Enable jack sense for Thinkpad Edge 11
    - LP: #677210

  [ Tim Gardner ]

  * [Config] Use correct be2iscsi module name in d-i/modules/scsi-modules
    - LP: #628776
  * [Config] Added NFS and related modules to virtual flavour
    - LP: #659084
  * [Config] Add support for cross compiling armel
  * Simplify the use of CROSS_COMPILER

  [ Upstream Kernel Changes ]

  * Revert (pre-stable) ACPI: enable repeated PCIEXP wakeup by clearing PCIEXP_WAKE_STS on resume
  * Revert (pre-stable) mm: Move vma_stack_continue into mm.h
  * x86, cpu: After uncapping CPUID, re-run CPU feature detection
    - LP: #672664
  * ALSA: sound/pci/rme9652: prevent reading uninitialized stack memory
    - LP: #672664
  * ALSA: oxygen: fix analog capture on Claro halo cards
    - LP: #672664
  * ALSA: hda - Add Dell Latitude E6400 model quirk
    - LP: #643891, #672664
  * ALSA: prevent heap corruption in snd_ctl_new()
    - LP: #672664
  * ALSA: rawmidi: fix oops (use after free) when unloading a driver module
    - LP: #672664
  * hwmon: (lis3) Fix Oops with NULL platform data
    - LP: #672664
  * USB: fix bug in initialization of interface minor numbers
    - LP: #672664
  * usb: musb: gadget: fix kernel panic if using out ep with FIFO_TXRX style
    - LP: #672664
  * usb: musb: gadget: restart request on clearing endpoint halt
    - LP: #672664
  * HID: hidraw, fix a NULL pointer dereference in hidraw_ioctl
    - LP: #672664
  * HID: hidraw, fix a NULL pointer dereference in hidraw_write
    - LP: #672664
  * ahci: fix module refcount breakage introduced by libahci split
    - LP: #672664
  * lib/list_sort: do not pass bad pointers to cmp callback
    - LP: #672664
  * ACPI: invoke DSDT corruption workaround on all Toshiba Satellite
    - LP: #672664
  * oprofile: Add Support for Intel CPU Family 6 / Model 29
    - LP: #672664
  * oprofile, ARM: Release resources on failure
    - LP: #672664
  * RDMA/cxgb3: Turn off RX coalescing for iWARP connections
    - LP: #672664
  * drm/radeon/kms: fix bad cast/shift in evergreen.c
    - LP: #672664
  * drm/radeon/kms: avivo cursor workaround applies to evergreen as well
    - LP: #672664
  * ARM: 6400/1: at91: fix arch_gettimeoffset fallout
    - LP: #672664
  * ARM: 6395/1: VExpress: Set bit 22 in the PL310 (cache controller) AuxCtlr register
    - LP: #672664
  * V4L/DVB: gspca - main: Fix a crash of some webcams on ARM arch
    - LP: #672664
  * V4L/DVB: gspca - sn9c20x: Bad transfer size of Bayer images
    - LP: #672664
  * mmc: sdhci-s3c: fix NULL ptr access in sdhci_s3c_remove
    - LP: #672664
  * x86/amd-iommu: Set iommu configuration flags in enable-loop
    - LP: #672664
  * x86/amd-iommu: Fix rounding-bug in __unmap_single
    - LP: #672664
  * x86/amd-iommu: Work around S3 BIOS bug
    - LP: #672664
  * tracing/x86: Don't use mcount in pvclock.c
    - LP: #672664
  * tracing/x86: Don't use mcount in kvmclock.c
    - LP: #672664
  * ksm: fix bad user data when swapping
    - LP: #672664
  * i7core_edac: fix panic in udimm sysfs attributes registration
    - LP: #672664
  * v4l1: fix 32-bit compat microcode loading translation
    - LP: #672664
  * V4L/DVB: cx231xx: Avoid an OOPS when card is unknown (card=0)
    - LP: #672664
  * V4L/DVB: IR: fix keys beeing stuck down forever
    - LP: #672664
  * V4L/DVB: Don't identify PV SBTVD Hybrid as a DibCom device
    - LP: #672664
  * Input: joydev - fix JSIOCSAXMAP ioctl
    - LP: #672664
  * Input: wacom - fix pressure in Cintiq 21UX2
    - LP: #672664
  * ioat2: fix performance regression
    - LP: #672664
  * mac80211: fix use-after-free
    - LP: #672664
  * x86, hpet: Fix bogus error check in hpet_assign_irq()
    - LP: #672664
  * x86, irq: Plug memory leak in sparse irq
    - LP: #672664
  * ubd: fix incorrect sector handling during request restart
    - LP: #672664
  * OSS: soundcard: locking bug in sound_ioctl()
    - LP: #672664
  * virtio-blk: fix request leak.
    - LP: #672664
  * ring-buffer: Fix typo of time extends per page
    - LP: #672664
  * dmaengine: fix interrupt clearing for mv_xor
    - LP: #672664
  * drivers/gpu/drm/i915/i915_gem.c: Add missing error handling code
    - LP: #672664
  * hrtimer: Preserve timer state in remove_hrtimer()
    - LP: #672664
  * i2c-pca: Fix waitforcompletion() return value
    - LP: #672664
  * reiserfs: fix
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
I've verified this:

 * start instance of (t1.micro)
   # us-east-1 ami-548c783d ebs/ubuntu-maverick-10.10-amd64-server-20101007.1
 * ssh to the instance, install the kernel, reboot
   % wget https://launchpad.net/ubuntu/+archive/primary/+files/linux-image-2.6.35-24-virtual_2.6.35-24.42_amd64.deb
   % sudo dpkg -i linux-image-2.6.35-24-virtual_2.6.35-24.42_amd64.deb
   % sudo reboot
 * ssh to the instance again, verify it is running the new kernel, then shut down
   % uname -a
   Linux ip-10-202-31-117 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 05:15:26 UTC 2010 x86_64 GNU/Linux
   % sudo poweroff
 * for each type c1.xlarge, m2.2xlarge:
   $ ec2-stop-instances ${IID}
   $ ec2-modify-instance-attribute --instance-type ${ITYPE} ${IID}
   $ ec2-start-instances ${IID}
   # test reboot 5 times (note, the cpu info hopefully
   # shows E5506 where it failed before)
   $ for i in 1 2 3 4 5; do ssh $EC2_HOST "uname -a; uptime; grep Xeon /proc/cpuinfo | head -n 1; sudo reboot" && echo $i: passed || echo $i: failed; sleep 2m; done
   $ ssh $EC2_HOST sudo poweroff

I got an instance with X5550 in both c1.xlarge and m2.2xlarge and successfully rebooted and connected 5 times in a row.

** Tags added: verification-done
** Tags removed: verification-needed

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/651370
Title: ec2 kernel crash invalid opcode [#1]

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
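The `grep Xeon /proc/cpuinfo | head -n 1` step in the reboot loop is how the tester tells which host CPU an instance landed on. Below is a minimal Python sketch of the same check. The helper names and the sample `/proc/cpuinfo` text are hypothetical, and the "suspect" list (E5506, X5550) is only the set of CPUs reported in this thread, not an authoritative list.

```python
from typing import Optional

# CPUs that this bug thread associates with the intel_idle crash (not exhaustive).
SUSPECT_MODELS = ("E5506", "X5550")


def first_xeon_model(cpuinfo_text: str) -> Optional[str]:
    """Return the first 'model name' value that mentions Xeon, or None.

    Roughly what `grep Xeon /proc/cpuinfo | head -n 1` shows, minus the label.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name" and "Xeon" in value:
            return value.strip()
    return None


def landed_on_suspect_hardware(model_name: Optional[str]) -> bool:
    """True if the model name contains one of the CPUs reported to crash."""
    return model_name is not None and any(m in model_name for m in SUSPECT_MODELS)


# Hypothetical, abbreviated /proc/cpuinfo excerpt for illustration only.
SAMPLE = (
    "processor\t: 0\n"
    "cpu family\t: 6\n"
    "model\t\t: 26\n"
    "model name\t: Intel(R) Xeon(R) CPU X5550\n"
)

if __name__ == "__main__":
    name = first_xeon_model(SAMPLE)
    print(name, landed_on_suspect_hardware(name))
```

On a live instance you would pass the contents of `/proc/cpuinfo` instead of `SAMPLE`.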
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
I can confirm that I am able to boot the latest kernel on an m2.4xlarge instance, which usually crashed because it landed on hardware that caused the intel_idle driver to load.
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
Also confirmed that I can boot the kernel from #30 in an m2.4xlarge instance. It still sees only 32 GB of memory, though (bug 667796).
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
Accepted linux into maverick-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

** Changed in: linux (Ubuntu Maverick)
       Status: In Progress => Fix Committed

** Tags added: verification-needed
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
Will a fix for this be backported to Maverick?
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
This bug was fixed in the package linux - 2.6.37-3.11

---
linux (2.6.37-3.11) natty; urgency=low

  [ Andy Whitcroft ]

  * Revert ubuntu: AUFS -- update to b37c575759dc4535ccc03241c584ad5fe69e3b25
  * Revert ubuntu: AUFS -- track changes to the arguements to fop fsync()
  * Revert ubuntu: AUFS -- update to standalone 2.6.35-rcN as at 20100601
  * Revert ubuntu: AUFS -- update to standalone 2.6.34 as at 20100601
  * Revert ubuntu: AUFS -- aufs2 base patch for linux-2.6.34
  * [Config] Disable intel_idle for -virtual kernels
    - LP: #651370
  * [Config] enforcer -- ensure we never enable CONFIG_IMA
  * debian -- pass the correct flavour name when checking configs
  * [Config] enforcer -- ensure CONFIG_INTEL_IDLE is off for -virtual
  * [Config] ensure CONFIG_IPV6=y for powerpc
  * [Config] enforcer -- ensure CONFIG_IPV6=y
  * ubuntu: AUFS -- aufs2-base.patch aufs2.1-36-UNRELEASED-20101103
  * ubuntu: AUFS -- aufs2-standalone.patch aufs2.1-36-UNRELEASED-20101103
  * ubuntu: AUFS -- update to aufs2.1-36-UNRELEASED-20101103
  * ubuntu: AUFS -- re-enable
  * ubuntu: AUFS -- track changes to work queue initialisation
  * ubuntu: AUFS -- track changes to llseek in v2.6.37-rc1
  * SAUCE: fbcon -- fix race between open and removal of framebuffers
  * SAUCE: fbcon -- fix OOPs triggered by race prevention fixes
    - LP: #614008
  * SAUCE: drm -- stop early access to drm devices

  [ Jeremy Kerr ]

  * [Config] Build-in powermac ZILOG serial driver
    - LP: #673346

  [ Kees Cook ]

  * SAUCE: nx-emu: use upstream ASLR when possible

  [ Tim Gardner ]

  * [Config] Use correct be2iscsi module name in d-i/modules/scsi-modules
    - LP: #628776

  [ Upstream Kernel Changes ]

  * i386: NX emulation
  * nx-emu: drop exec-shield sysctl, merge with disable_nx
  * nx-emu: standardize boottime message prefix
  * mmap randomization for executable mappings on 32-bit
  * exec-randomization: brk away from exec rand area

 -- Andy Whitcroft a...@canonical.com  Thu, 11 Nov 2010 23:46:37 +

** Changed in: linux (Ubuntu)
       Status: Triaged => Fix Released
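The natty changelog adds an enforcer rule, "ensure CONFIG_INTEL_IDLE is off for -virtual". The real Ubuntu enforcer uses its own rule syntax, which is not shown in this thread; what follows is only a hypothetical Python sketch of the same idea. It reads a kernel `.config` and fails if intel_idle is enabled, relying on the standard Kconfig convention that a disabled bool option is written as `# CONFIG_FOO is not set`.

```python
import re

# Kconfig writes an enabled option as "CONFIG_FOO=y" (or =m) and a disabled
# bool as the comment "# CONFIG_FOO is not set".
ENABLED = re.compile(r"CONFIG_INTEL_IDLE=(\S+)")
DISABLED = "# CONFIG_INTEL_IDLE is not set"


def intel_idle_setting(config_text: str) -> str:
    """Return the CONFIG_INTEL_IDLE value ('y', 'm', ...) or 'n' if it is off."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == DISABLED:
            return "n"
        match = ENABLED.fullmatch(line)
        if match:
            return match.group(1)
    return "n"  # an option missing from .config is treated as disabled


def check_virtual_flavour(config_text: str) -> None:
    """Raise ValueError if intel_idle is enabled (hypothetical enforcer rule)."""
    setting = intel_idle_setting(config_text)
    if setting != "n":
        raise ValueError(f"CONFIG_INTEL_IDLE={setting} must be off for -virtual")


if __name__ == "__main__":
    check_virtual_flavour("CONFIG_IPV6=y\n# CONFIG_INTEL_IDLE is not set\n")
    print("virtual config OK")
```

A check like this only guards the build configuration; instances already running an older -virtual kernel still need the boot-flag workaround discussed elsewhere in this thread.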
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
@Brandon, sorry for the late response; I have been traveling. And yes, Scott's reply is right. The comment about 68G was made because selecting this size seems to trigger the crash more reliably, but it has nothing to do with the memory size itself. It is just that requesting that size seems to get you a recent Intel box behind the covers. I only ran into this because, while looking at another bug about 68G not being detected correctly in Maverick, I could never get the instance up due to this crash.
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
@Brandon, Stefan's comment in the SRU justification about 68G of memory (which should have been 64) is really only suggesting that selection of a larger instance size seems more likely to land you on newer hardware where failure is more likely.
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
** Also affects: linux (Ubuntu Maverick)
   Importance: Undecided
       Status: New
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
** Changed in: linux (Ubuntu Maverick)
       Status: New => In Progress

** Changed in: linux (Ubuntu)
       Status: Confirmed => In Progress
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
** Description changed:

+ SRU Justification:
+
+ Impact: Booting an Intel based instance with certain CPU level will fail
+ with a panic as the driver does not seem to take into account that it is
+ running in a virtualized environment. This only is a problem with the
+ intel_idle driver.
+
+ Fix: Turning off intel_idle driver support for the virtual kernel image
+ will let it use the generic idle driver as before. As this option is
+ only changed for the virtual kernel package there is no risk of
+ regression for the generic packages.
+
+ Testcase: Booting a large instance (with 68GB of memory) very likely
+ results in this panic as the memory size will result in selecting
+ certain base hardware with Intel CPUs. Turning the option off lets those
+ instances boot again.
+
+ ---
+
  I saw a kernel crash in maverick RC testing. I will attach console output here, the system reported is the same AMI, but the issue occurred on c1.xlarge instance type.

  The crash begins like this:
- [2725458.312511] invalid opcode: [#1] SMP
- [2725458.312521] last sysfs file:
- [2725458.312526] CPU 0
+ [2725458.312511] invalid opcode: [#1] SMP
+ [2725458.312521] last sysfs file:
+ [2725458.312526] CPU 0
  [2725458.312529] Modules linked in:
- [2725458.312536]
+ [2725458.312536]
  [2725458.312541] Pid: 0, comm: swapper Not tainted 2.6.35-22-virtual #33-Ubuntu /
  [2725458.312548] RIP: e030:[8130805c] [8130805c] intel_idle+0xac/0x180
  [2725458.312565] RSP: e02b:81a01ec8 EFLAGS: 00010046

  But possibly the interesting piece of data is earlier in the log:
  [0.00] pcpu-alloc: s91520 r8192 d23168 u122880 alloc=30*4096
- [0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7
+ [0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7
  [2725457.617698] Xen: using vcpu_info placement
  [2725457.617705] Built 1 zonelists in Node order, mobility grouping on. Total pages: 1809808
  [2725457.617707] Policy zone: Normal
- [2725457.617711] Kernel command line: root=LABEL=uec-rootfs ro console=hvc0
+ [2725457.617711] Kernel command line: root=LABEL=uec-rootfs ro console=hvc0

  There, we go from an uptime of 0.00 to 2725457 seconds (757 hours) during boot.

  ProblemType: Bug
  DistroRelease: Ubuntu 10.10
  Package: linux-image-2.6.35-22-virtual 2.6.35-22.33
  Regression: No
  Reproducible: No
  ProcVersionSignature: User Name 2.6.35-22.33-virtual 2.6.35.4
  Uname: Linux 2.6.35-22-virtual x86_64
  AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
  AplayDevices: Error: [Errno 2] No such file or directory
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  CurrentDmesg:
  Date: Wed Sep 29 18:03:42 2010
  Ec2AMI: ami-7a699c13
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1c
  Ec2InstanceType: t1.micro
  Ec2Kernel: aki-427d952b
  Ec2Ramdisk: unavailable
  Frequency: This has only happened once.
  Lspci:
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  ProcCmdLine: root=LABEL=uec-rootfs ro console=hvc0
  ProcEnviron:
   PATH=(custom, user)
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcModules: acpiphp 18752 0 - Live 0xa000
  SourcePackage: linux

** Changed in: linux (Ubuntu)
       Status: In Progress => Triaged

** Changed in: linux (Ubuntu)
     Assignee: (unassigned) => Andy Whitcroft (apw)

** Changed in: linux (Ubuntu Maverick)
     Assignee: (unassigned) => John Johansen (jjohansen)

** Changed in: linux (Ubuntu Maverick)
   Importance: Undecided => Medium
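The "time travel" in the description (an uptime of 0.00 followed by 2725457 seconds, about 757 hours) can be spotted mechanically by comparing consecutive printk timestamps. Here is a minimal Python sketch under the assumption that each line starts with a `[seconds]` stamp; the function name is hypothetical. Only forward jumps are reported, so the backward steps seen in the Lucid console log attached to this bug would need a separate check.

```python
import re
from typing import List, Optional, Tuple

# A printk timestamp at the start of a line, e.g. "[2725457.617698]" or
# "[    0.000000]" (this archive shows the latter squeezed to "[0.00]").
TIMESTAMP = re.compile(r"^\[\s*(\d+(?:\.\d+)?)\]")


def largest_forward_jump(lines: List[str]) -> Optional[Tuple[float, str]]:
    """Return (seconds, line) for the largest increase between consecutive
    timestamped lines, or None if the timestamps never move forward."""
    previous: Optional[float] = None
    best: Optional[Tuple[float, str]] = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if match is None:
            continue  # untimestamped line (e.g. a "..." placeholder)
        stamp = float(match.group(1))
        if previous is not None:
            delta = stamp - previous
            if delta > 0 and (best is None or delta > best[0]):
                best = (delta, line)
        previous = stamp
    return best


# Lines copied from the description above.
BOOT_LOG = [
    "[0.00] pcpu-alloc: s91520 r8192 d23168 u122880 alloc=30*4096",
    "[2725457.617698] Xen: using vcpu_info placement",
    "[2725457.617705] Built 1 zonelists in Node order, mobility grouping on.",
]

if __name__ == "__main__":
    result = largest_forward_jump(BOOT_LOG)
    if result is not None:
        seconds, line = result
        print(f"jump of {seconds / 3600:.0f} hours before: {line}")
```

Applied to the -23.36 test log later in the thread (from [0.00] to [8698363.527286]), the same function reports a jump of about 100.7 days, matching the "roughly 100 days" remark.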
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
Stefan: the ~32 vs ~64GB memory issue is very likely orthogonal and has a separate bug now (bug 667796). This issue is solely about intel_idle vs certain CPU types under Amazon's EC2 (Xen) environment. m2.4xlarge in us-east reproduces the crash on boot readily (and also happens to exhibit the memory limit issue), and c1.xlarge reproduces it some of the time (depending which hardware you are randomly assigned).
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
John Johansen's suggested -23.36 kernel booted, but still exhibited bug 667796.

Linux ip-10-230-9-131 2.6.35-23-virtual #36~ec2 SMP Thu Oct 28 15:07:00 UTC 2010 x86_64 GNU/Linux

[0.00] PERCPU: Embedded 30 pages/cpu @88000e8c7000 s91520 r8192 d23168 u122880
[0.00] pcpu-alloc: s91520 r8192 d23168 u122880 alloc=30*4096
[0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7
[8698363.527286] trying to map vcpu_info 0 at 88000e8d2020, mfn 10569b2, offset 32
[8698363.527290] cpu 0 using vcpu_info at 88000e8d2020
[8698363.527292] trying to map vcpu_info 1 at 88000e8f0020, mfn 1056994, offset 32
[8698363.527294] cpu 1 using vcpu_info at 88000e8f0020
...

ubu...@ip-10-230-9-131:~$ free
             total       used       free     shared    buffers     cached
Mem:      32810684     669128   32141556          0       7016      32268
-/+ buffers/cache:     629844   32180840
Swap:            0          0          0
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
Mike, thanks for your test. It's interesting that we still see the time travel of roughly 100 days in your dmesg. I gather the system was otherwise usable, other than only showing 32G of memory?
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
I'm attaching the console output of a Lucid 10.04 instance from:
us-east-1 ami-4a0df923 canonical ebs/ubuntu-lucid-10.04-amd64-server-20101020

This shows very interesting time travel (both forward and backward) on an otherwise functional instance. Thus, while the kernel time messages are not pretty, they don't necessarily correlate with this bug occurring.

** Attachment added: m2.4xlarge console output showing time travel forward and back
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/651370/+attachment/1719478/+files/lucid-m2.4xlarge-console.txt
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
Brandon, sorry about failing to get the command line changed in the AMI I rebundled. I really thought I had tested that it had the proper command line before posting here. The problem in my steps above was selecting 'keep local version'; I should have chosen 'use maintainer's version'.

Regarding simple changes to the S3 AMIs, the easiest thing to do (and actually what I would recommend for *non*-simple changes too) is to download the .tar.gz file from http://uec-images.ubuntu.com/releases/maverick/, extract it, loop-mount it, modify files (or chroot and modify files), run uec-resize-image (the downloaded filesystem image is only 2G), then euca-bundle-image, euca-publish-image...

I also registered 'ami-aa42b6c3', verified that it boots on a t1.micro, and checked that it has the command line.

John is hoping to get rebuilt kernel images that would have these options in the config. He should point to them sometime soon.
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
Mikael, I opened bug 667696 to address the 32G issue. Brandon, I opened bug 667793 to address euca-bundle-vol not copying the filesystem label. I copied you each on the respective bugs.
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
There are maverick test kernels at kernel.ubuntu.com/~jj/linux-image-2.6.35-23-virtual_2.6.35-23.36~ec2_amd64.deb kernel.ubuntu.com/~jj/linux-image-2.6.35-23-virtual_2.6.35-23.36~ec2_i386.deb
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
Brandon's and Scott's workaround partly works for me, but the kernel on an instance started this way seems to detect only 32 GB of memory, even on an m2.4xlarge instance, which should have 68.4 GB available according to the EC2 instances page. Is this a side effect of the workaround, or a completely separate bug?

Maverick results:

ubu...@ip-10-230-9-87:~$ uname -a
Linux ip-10-230-9-87 2.6.35-22-virtual #35-Ubuntu SMP Sat Oct 16 23:19:29 UTC 2010 x86_64 GNU/Linux
ubu...@ip-10-230-9-87:~$ ec2metadata --instance-type
m2.4xlarge
ubu...@ip-10-230-9-87:~$ free
             total       used       free     shared    buffers     cached
Mem:      32810684     667628   32143056          0       6444      32152
-/+ buffers/cache:     629032   32181652
Swap:            0          0          0

Expected results (from a SUSE 11 guest):

ip-10-230-45-187:~ # uname -a
Linux ip-10-230-45-187 2.6.32.19-0.3-ec2 #1 SMP 2010-09-17 20:28:21 +0200 x86_64 x86_64 x86_64 GNU/Linux
ip-10-230-45-187:~ # curl http://169.254.169.254/latest/meta-data/instance-type
m2.4xlarge
ip-10-230-45-187:~ # free
             total       used       free     shared    buffers     cached
Mem:      71705116    2361584   69343532          0      10972     126424
-/+ buffers/cache:    2224188   69480928
Swap:            0          0          0
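To put the two `free` outputs on the same scale as EC2's advertised size, the "total" column (reported in KiB by `free`) can be converted to GiB. A minimal Python sketch with hypothetical helper names, using the totals from the two guests above:

```python
def mem_total_kib(free_output: str) -> int:
    """Return the 'total' column of the 'Mem:' row of `free` output (KiB)."""
    for line in free_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            return int(fields[1])
    raise ValueError("no 'Mem:' row in free output")


def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)


# 'Mem:' rows copied from the Maverick and SUSE guests above.
MAVERICK_FREE = "Mem:      32810684     667628   32143056          0       6444      32152\n"
SUSE_FREE = "Mem:      71705116    2361584   69343532          0      10972     126424\n"

if __name__ == "__main__":
    for name, text in (("Maverick", MAVERICK_FREE), ("SUSE", SUSE_FREE)):
        print(f"{name}: {kib_to_gib(mem_total_kib(text)):.2f} GiB")
```

This gives 31.29 GiB for the Maverick guest and 68.38 GiB for the SUSE guest, so the Maverick kernel stops just short of 32 GiB, while the SUSE figure lines up with the 68.4 GB on the instances page.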
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
I wasn't able to boot ami-d258acbb on m2.4xlarge. It seemed to come up without the special kernel options:

[0.00] Linux version 2.6.35-22-virtual (bui...@allspice) (gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu4) ) #33-Ubuntu SMP Sun Sep 19 21:05:42 UTC 2010 (Ubuntu 2.6.35-22.33-virtual 2.6.35.4)
[0.00] Command line: root=LABEL=uec-rootfs ro console=hvc0

It then hung in intel_idle as expected. I also confirmed the apparent 32GB memory limit on this kernel + machine type.
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
What's the method for making the S3 AMIs, by the way? When I tried before, I just did the standard ec2-bundle-vol stuff inside a fixed Maverick, but my first attempts failed because the root device didn't have LABEL=uec-rootfs in the newly launched instances, and in the second generation I manually switched the root to /dev/sda1 but had other mysterious boot failures. Is there some standard tool or script used to package the official AMIs that we can use to produce identical results (with small changes)?
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
I just tried to launch 16 m2.4xlarge instances with ami-e43e0b90 in eu-west-1b, and not a single one would boot successfully because of this bug. Is there any workaround yet?
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
Well, I had a hunch this morning that perhaps my test AMI was faulty (perhaps some stupid issue related to block-device mapping, etc., which varies between the variations on c1.xlarge), since it wasn't packaged by the same methods/tools as the official one. It seems this may be the case.

Going off the hint from Mikael that m2.4xlarge may exhibit the problem more reliably, I did the following experiment this morning, using EBS root persistence to make the change rather than custom instance-store AMIs:

1) Booted ami-548c783d (Maverick 64-bit EBS official) on m1.large in us-east-1.
2) Logged into this machine and edited /boot/grub/menu.lst manually to add "intel_idle.max_cstate=0 idle=nomwait" to the kernel boot flags.
3) Rebooted; the instance came up fine with messages showing intel_idle disabled.
4) Stopped the instance and used ec2-modify-instance-attribute to move it to type m2.4xlarge.
5) Booted on m2.4xlarge successfully, no crash (cpuinfo shows Xeon X5550, which is also model 26 like the failing c1.xlarges).
6) Edited menu.lst to remove the added boot flags and rebooted the instance again (staying on the same m2.4xlarge hardware).
7) The instance crashed on boot in the intel_idle code, as always.

Given these results, I think the kernel flags will work around this issue; I just built a bad test AMI during my first tests yesterday. Could someone rebuild a set of Maverick AMIs with these flags added from the get-go, using whatever the official method of packaging Maverick AMIs is, for public testing among those of us experiencing the bug?
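Step 2 above (adding the two boot flags to /boot/grub/menu.lst) can be sketched as a small, idempotent text edit. This is a hypothetical Python helper, not a tool used in this thread. It assumes GRUB-legacy syntax, where each boot entry has a line starting with the word `kernel`. On Ubuntu, update-grub may regenerate those lines on the next kernel upgrade, so this only illustrates the one-off manual edit. The sample menu.lst entry is made up for illustration.

```python
from typing import Iterable

# Boot flags used as the workaround in this thread.
WORKAROUND_FLAGS = ("intel_idle.max_cstate=0", "idle=nomwait")


def add_kernel_flags(menu_lst: str, flags: Iterable[str] = WORKAROUND_FLAGS) -> str:
    """Append any missing flags to each GRUB-legacy 'kernel' line.

    Other lines (title, root, initrd, comments) pass through unchanged, and
    running the function twice gives the same result.
    """
    flags = list(flags)
    out = []
    for line in menu_lst.splitlines(keepends=True):
        words = line.split()
        if words and words[0] == "kernel":
            missing = [f for f in flags if f not in words]
            if missing:
                body = line.rstrip("\r\n")
                ending = line[len(body):]  # preserve the original line ending
                line = body + " " + " ".join(missing) + ending
        out.append(line)
    return "".join(out)


# Hypothetical menu.lst entry for illustration only.
MENU = (
    "title Ubuntu 10.10, kernel 2.6.35-22-virtual\n"
    "root (hd0)\n"
    "kernel /boot/vmlinuz-2.6.35-22-virtual root=LABEL=uec-rootfs ro console=hvc0\n"
    "initrd /boot/initrd.img-2.6.35-22-virtual\n"
)

if __name__ == "__main__":
    print(add_kernel_flags(MENU), end="")
```

Reversing the edit (step 6) is just a matter of removing the same tokens from the `kernel` line.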
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
I tried to look at the crash in more detail this evening, because it's really causing me a lot of headaches now. The most recent time I tried to boot a new c1.xlarge in us-east-1 this evening, I had to go through the crash/terminate/relaunch cycle 7 times before I got a working instance. I don't have a patch or answer yet, but I have a lot of hints:

1) c1.xlarge seems to be going through some changes of underlying CPU/hardware, which could explain the randomness. It probably depends on which hardware you land on. The older ones are Xeon E5410 and the newer ones are Xeon E5506. So far, the only times I've gotten non-crashed launches and thought to check, they've all been E5410s.

2) The exact instruction throwing invalid opcode is MONITOR (0f 01 c8). The MONITOR and MWAIT instructions are used for efficient idling on newer CPUs, which I guess is the whole point of the intel_idle code we're crashing in.

3) These are not the sorts of instructions that can be executed in a VM environment like Xen without special support. Googling reveals discussions/patches to Xen for supporting these instructions in various ways (either as a hypercall encapsulating the whole monitor/wait pair, or masking the capability in CPUID so that Linux doesn't detect support and doesn't try to use it at all). Various related links:
http://lists.xensource.com/archives/html/xen-devel/2010-04/msg00043.html
http://markmail.org/thread/terab63w744x3m2r
http://www.sfr-fresh.com/unix/misc/xen-4.0.1.tar.gz:a/xen-4.0.1/docs/misc/cpuid-config-for-guest.txt

4) intel_idle can be effectively disabled from the kernel command line with intel_idle.max_cstate=0 ( http://kerneltrap.org/mailarchive/git-commits-head/2010/5/28/40718 ), which will fall back on acpi_idle behavior. If it still crashes, there's also a command-line flag, idle=nomwait, which might prevent acpi_idle from using mwait as well.

I don't know at this point where the true bug lies. It could be that the intel_idle code needs to make an exception to its detection routines under Xen. It could be that some of Amazon's Xen hosts are configured differently (wrt CPUID masking for mwait) than others. It could be any of a number of related things. However, I suspect that new AMIs for Maverick on EC2 that disable mwait from the command line in grub.conf/menu.lst, per the above, might fix this. I'll try making my own AMIs with this change in the morning and see how it goes.
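Hints 3 and 4 suggest two things to look at on an instance that does boot: whether the guest sees the MONITOR/MWAIT feature at all (Linux lists it as the `monitor` flag in /proc/cpuinfo, so a host that masks it in CPUID should make the flag disappear), and which of the workaround flags are on the kernel command line (/proc/cmdline). The following is a rough Python sketch of those two checks, with hypothetical helper names. It deliberately ignores intel_idle's separate per-model check, so a `monitor` flag alone does not mean the driver will load.

```python
from typing import Set

# Workaround boot flags discussed in point 4.
WORKAROUND_FLAGS = ("intel_idle.max_cstate=0", "idle=nomwait")


def guest_sees_monitor(cpuinfo_text: str) -> bool:
    """True if the first 'flags' line lists 'monitor' (the MONITOR/MWAIT CPUID bit)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return "monitor" in value.split()
    return False


def workaround_flags_present(cmdline: str) -> Set[str]:
    """Return which workaround flags appear as whole tokens in a kernel command line."""
    tokens = set(cmdline.split())
    return {flag for flag in WORKAROUND_FLAGS if flag in tokens}


if __name__ == "__main__":
    with open("/proc/cpuinfo") as cpuinfo, open("/proc/cmdline") as cmdline:
        print("monitor flag visible:", guest_sees_monitor(cpuinfo.read()))
        print("workaround flags:", sorted(workaround_flags_present(cmdline.read())))
```

Comparing the output on an E5410 host and on an E5506 host booted with the workaround might help tell the "CPUID masking differs between hosts" theory apart from the "intel_idle model check" theory.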
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
I forgot to add above: on the E5410 c1.xlarges that do boot successfully, the kernel output contains:

Oct 26 07:37:55 ip-10-243-51-207 kernel: [0.210255] intel_idle: MWAIT substates: 0x2220
Oct 26 07:37:55 ip-10-243-51-207 kernel: [0.210257] intel_idle: does not run on family 6 model 23

I believe this means that intel_idle figured out that it needs to disable itself on these. The E5506s are model 26 rather than 23. The intel_idle code has a case statement that switches on this model number. Model 23 (0x17) is commented out for FUTURE_USE and thus falls through to the "does not run" condition with the output above. Model 26 (0x1A) has a case statement, so intel_idle will attempt to use it.
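The model switch described above can be mimicked to predict, from /proc/cpuinfo, which of the two hosts would trigger intel_idle. This is a deliberately simplified Python sketch. It only knows the two models discussed in this thread (the real driver handles more models and also checks the vendor and MWAIT support), and the helper names are hypothetical.

```python
from typing import Dict, Tuple

# Simplified subset of intel_idle's family-6 model switch, limited to the
# models discussed in this thread; the real driver lists more models.
PROBED_MODELS = {0x1A: "model 26 (E5506, X5550 in this thread)"}
# 0x17 (model 23, the E5410 here) is commented out as FUTURE_USE per the report.


def cpu_family_model(cpuinfo_text: str) -> Tuple[int, int]:
    """Return ('cpu family', 'model') for the first processor in /proc/cpuinfo."""
    fields: Dict[str, str] = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())  # keep first CPU's values
    return int(fields["cpu family"]), int(fields["model"])


def intel_idle_decision(family: int, model: int) -> str:
    """Mimic the outcome reported in dmesg for the models above."""
    if family == 6 and model in PROBED_MODELS:
        return "probe"
    return f"does not run on family {family} model {model}"


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        family, model = cpu_family_model(f.read())
    print(intel_idle_decision(family, model))
```

For the E5410 hosts this reproduces the "does not run on family 6 model 23" message quoted above; for model 26 hosts it returns "probe", which is the path that reaches the MONITOR instruction under Xen.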
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
So far, my test instances with one or both of the MWAIT-related kernel flags have given even worse results than the original: they boot, showing intel_idle disabled, only on E5410 nodes, while the (presumably) E5506 nodes just terminate themselves quickly with no console log output at all (even after waiting a while). I've opened a web support ticket with Amazon referencing my test AMI and this bug report to ask for their input.
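When testing AMIs with these flags, it helps to confirm which of them a given instance actually booted with. A minimal Python sketch, assuming the usual whitespace-separated key=value layout of /proc/cmdline; the helper name mwait_options is hypothetical, quoted values are not handled, and in this sketch a repeated option simply keeps its last value.

```python
def mwait_options(cmdline):
    """Report which MWAIT-related boot options appear on a kernel command line.

    Simplified parser: splits on whitespace and does not handle quoted values.
    If an option is repeated, the last occurrence wins in this sketch.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return {
        "intel_idle_disabled": params.get("intel_idle.max_cstate") == "0",
        "idle_nomwait": params.get("idle") == "nomwait",
    }
```

On a live instance, pass it the contents of /proc/cmdline; for the stock command line root=LABEL=uec-rootfs ro console=hvc0 both entries come back False.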
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
Having the same issue on c1.xlarge in us-east-1 (kernel crash on boot related to intel_idle). I've booted the Maverick release AMI several times on m1.large instances fine, but I seem to have a 50%+ failure rate getting it to boot initially without crashing on c1.xlarge. You're going to need to roll new AMIs when/if this bug is fixed, because the failure means inability to boot far enough to get the kernel upgraded in the first place. FWIW, I'm only even trying Maverick because of the unresolved kernel issues with Lucid on EC2 that have been hard to pin down (divide-by-zero panics in network-related areas of the kernel, apparent disk I/O lockups triggered by runaway CPU load that apt somehow triggers, etc.). What's going on with kernels on EC2? Is anyone at Ubuntu actually testing them?
** Attachment added: console.txt https://bugs.launchpad.net/ubuntu/+source/linux/+bug/651370/+attachment/1710799/+files/console.txt
Re: [Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
On Mon, 25 Oct 2010, Brandon Black wrote:
> Having the same issue on c1.xlarge in us-east-1 (kernel crash on boot related to intel_idle). I've booted the Maverick release AMI several times on m1.large instances fine, but I seem to have a 50%+ failure rate getting it to initially boot without crashing on c1.xlarge. You're

My failure rate has been much lower than 50%, and I've run literally hundreds of instances. This bug seems to hit in fits. The kernel team is interested in fixing these bugs.

> going to need to roll new AMIs when/if this bug is fixed, because the failure means inability to boot far enough to get the kernel upgraded in the first place.

Agreed.

> FWIW, I'm only even trying Maverick because of the unresolved kernel issues with Lucid on EC2 that have been hard to pin down (divide by zero panics in network-related areas of the kernel, apparent disk i/o lockups triggered by runaway CPU load triggered by apt somehow, etc...). What's

Could you please open a bug? Use ubuntu-bug /boot/vmlinuz-$(uname -r), and please attach the console output of a kernel panic. I've not personally seen the bug you're describing.

> going on with kernels on EC2? Is anyone at Ubuntu actually testing them?

We do test the kernels. Our test suite (https://code.launchpad.net/~ubuntu-on-ec2/ubuntu-on-ec2/ec2-test) can admittedly be improved, but prior to any release we launch dozens of instances, spanning all sizes in all regions. I recently began publishing test results at https://code.launchpad.net/~ubuntu-on-ec2/ubuntu-on-ec2/ec2-test-results .
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
** Changed in: linux (Ubuntu) Importance: Undecided => Medium
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
** Attachment added: console log: us-east-1-x86_64-ami-1a9e6a73 (restarted) https://bugs.launchpad.net/ubuntu/+source/linux/+bug/651370/+attachment/1677126/+files/console-restart.txt
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
** Attachment added: console log: us-east-1-x86_64-ami-1a9e6a73 (first boot) https://bugs.launchpad.net/ubuntu/+source/linux/+bug/651370/+attachment/1677128/+files/console.txt
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
Moving this to Confirmed; I attached two other console logs showing this failure. In both cases, the clock jumped forward by hundreds of thousands of seconds.
** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
Hi Scott,
If you could also please test the latest upstream kernel available, that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.
Thanks in advance.
[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]
** Tags added: kj-triage
** Changed in: linux (Ubuntu) Status: New => Incomplete
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
** Attachment added: console log of failed instance https://bugs.launchpad.net/bugs/651370/+attachment/1654361/+files/console-restart.txt
** Attachment added: BootDmesg.txt https://bugs.launchpad.net/bugs/651370/+attachment/1654362/+files/BootDmesg.txt
** Attachment added: Dependencies.txt https://bugs.launchpad.net/bugs/651370/+attachment/1654363/+files/Dependencies.txt
** Attachment added: ProcCpuinfo.txt https://bugs.launchpad.net/bugs/651370/+attachment/1654364/+files/ProcCpuinfo.txt
** Attachment added: ProcInterrupts.txt https://bugs.launchpad.net/bugs/651370/+attachment/1654365/+files/ProcInterrupts.txt
** Attachment added: UdevDb.txt https://bugs.launchpad.net/bugs/651370/+attachment/1654366/+files/UdevDb.txt
** Attachment added: UdevLog.txt https://bugs.launchpad.net/bugs/651370/+attachment/1654367/+files/UdevLog.txt
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
** Tags added: iso-testing
[Bug 651370] Re: ec2 kernel crash invalid opcode 0000 [#1]
** Description changed:

  I saw a kernel crash in maverick RC testing. I will attach console output here, the system reported is the same AMI, but the issue occurred on c1.xlarge instance type.

  The crash begins like this:
- [2725458.312511] invalid opcode: [#1] SMP ^M
- [2725458.312521] last sysfs file: ^M
- [2725458.312526] CPU 0 ^M
- [2725458.312529] Modules linked in:^M
- [2725458.312536] ^M
- [2725458.312541] Pid: 0, comm: swapper Not tainted 2.6.35-22-virtual #33-Ubuntu /^M
- [2725458.312548] RIP: e030:[8130805c] [8130805c] intel_idle+0xac/0x180^M
- [2725458.312565] RSP: e02b:81a01ec8 EFLAGS: 00010046^M
+ [2725458.312511] invalid opcode: [#1] SMP
+ [2725458.312521] last sysfs file:
+ [2725458.312526] CPU 0
+ [2725458.312529] Modules linked in:
+ [2725458.312536]
+ [2725458.312541] Pid: 0, comm: swapper Not tainted 2.6.35-22-virtual #33-Ubuntu /
+ [2725458.312548] RIP: e030:[8130805c] [8130805c] intel_idle+0xac/0x180
+ [2725458.312565] RSP: e02b:81a01ec8 EFLAGS: 00010046

  But possibly the interesting piece of data is earlier in the log:
- [0.00] pcpu-alloc: s91520 r8192 d23168 u122880 alloc=30*4096^M
- [0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7 ^M
- [2725457.617698] Xen: using vcpu_info placement^M
- [2725457.617705] Built 1 zonelists in Node order, mobility grouping on. Total pages: 1809808^M
- [2725457.617707] Policy zone: Normal^M
- [2725457.617711] Kernel command line: root=LABEL=uec-rootfs ro console=hvc0 ^M
+ [0.00] pcpu-alloc: s91520 r8192 d23168 u122880 alloc=30*4096
+ [0.00] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7
+ [2725457.617698] Xen: using vcpu_info placement
+ [2725457.617705] Built 1 zonelists in Node order, mobility grouping on. Total pages: 1809808
+ [2725457.617707] Policy zone: Normal
+ [2725457.617711] Kernel command line: root=LABEL=uec-rootfs ro console=hvc0

  There, we go from an uptime of 0.00 to 2725457 seconds (757 hours) during boot.

  ProblemType: Bug
  DistroRelease: Ubuntu 10.10
  Package: linux-image-2.6.35-22-virtual 2.6.35-22.33
  Regression: No
  Reproducible: No
  ProcVersionSignature: User Name 2.6.35-22.33-virtual 2.6.35.4
  Uname: Linux 2.6.35-22-virtual x86_64
  AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
  AplayDevices: Error: [Errno 2] No such file or directory
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory
  CurrentDmesg:
-
+
  Date: Wed Sep 29 18:03:42 2010
  Ec2AMI: ami-7a699c13
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1c
  Ec2InstanceType: t1.micro
  Ec2Kernel: aki-427d952b
  Ec2Ramdisk: unavailable
  Frequency: This has only happened once.
  Lspci:
-
+
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  ProcCmdLine: root=LABEL=uec-rootfs ro console=hvc0
  ProcEnviron:
-  PATH=(custom, user)
-  LANG=en_US.UTF-8
-  SHELL=/bin/bash
+  PATH=(custom, user)
+  LANG=en_US.UTF-8
+  SHELL=/bin/bash
  ProcModules: acpiphp 18752 0 - Live 0xa000
  SourcePackage: linux
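The clock jump called out in the description (uptime going from 0.00 to 2725457 seconds during boot; 2725457 / 3600 is roughly 757 hours) is easy to spot mechanically in a saved console log. A minimal Python sketch, assuming each kernel line starts with a "[seconds.fraction]" timestamp as in the excerpts above; the function name largest_jump is hypothetical.

```python
import re

# Matches the "[seconds.fraction]" prefix of a kernel console line.
STAMP = re.compile(r"^\[\s*(\d+(?:\.\d+)?)\]")


def largest_jump(lines):
    """Return (seconds, line) for the biggest forward step between
    consecutive timestamped lines, or (0.0, None) if there is none."""
    best = (0.0, None)
    prev = None
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # skip lines without a kernel timestamp
        t = float(m.group(1))
        if prev is not None and t - prev > best[0]:
            best = (t - prev, line)
        prev = t
    return best
```

Fed the "pcpu-alloc" and "Xen: using vcpu_info placement" lines from the description, it reports the jump at the Xen line, about 757 hours.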