oneiric has seen the end of its life and is no longer receiving any
updates. Marking the oneiric task for this ticket as Won't Fix.
** Changed in: linux-lts-backport-oneiric (Ubuntu Oneiric)
Status: Confirmed => Won't Fix
--
** Branch linked: lp:ubuntu/lucid-proposed/linux-lts-backport-oneiric
--
** Branch linked: lp:ubuntu/lucid-updates/linux-lts-backport-oneiric
--
This bug was fixed in the package linux - 2.6.32-48.110
---
linux (2.6.32-48.110) lucid; urgency=low
[Steve Conklin]
* Release Tracking Bug
- LP: #1186340
[ Stefan Bader ]
* (config) Import Xen specific config options from ec2
- LP: #1177431
* SAUCE: xen: Send
No, you are guaranteed *not* to experience _this_ issue on an HVM guest.
That is because HVM guests use a completely different spinlock
implementation. It is possible that you see hangs/lockups, but please
open a new bug report for those because they are a different issue.
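For reference, and as a rough sketch only (simplified from the arch/x86/xen/spinlock.c shipped in these kernels, not quoted verbatim): the Xen-specific lock operations are only installed for a PV guest, which is why an HVM guest never enters this code path at all.

    /* Rough sketch, simplified from 3.x-era arch/x86/xen/spinlock.c (not verbatim).
     * Only a PV guest replaces the native spinlock operations with the Xen
     * paravirtualized ones; an HVM guest keeps the native ticket spinlocks and
     * therefore cannot hit this particular lockup. */
    void __init xen_init_spinlocks(void)
    {
            pv_lock_ops.spin_is_locked    = xen_spin_is_locked;
            pv_lock_ops.spin_is_contended = xen_spin_is_contended;
            pv_lock_ops.spin_lock         = xen_spin_lock;
            pv_lock_ops.spin_lock_flags   = xen_spin_lock_flags;
            pv_lock_ops.spin_trylock     = xen_spin_trylock;
            pv_lock_ops.spin_unlock       = xen_spin_unlock;
    }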
--
Am I the only one experiencing this issue on HVM machines? Also, does
anyone happen to know if there's a Precise AMI that fixes this issue?
--
** Branch linked: lp:ubuntu/lucid-proposed/linux-ec2
--
I did test runs for the current EC2 kernel (which we actually do not
expect to be affected at all) and for the proposed virtual/server flavour,
manually inserted into a Xen PV guest that is based on the cloud images.
Both passed.
** Tags removed: verification-needed-lucid
** Tags added:
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed'
to 'verification-done'.
If verification is not done by one week from today, this fix will
** Branch linked: lp:ubuntu/lucid-security/linux-lts-backport-oneiric
--
** Branch linked: lp:ubuntu/precise-security/linux-ti-omap4
--
This bug was fixed in the package linux - 3.5.0-27.46
---
linux (3.5.0-27.46) quantal-proposed; urgency=low
[Steve Conklin]
* Release Tracking Bug
- LP: #1159991
[ Steve Conklin ]
* Start New Release
[ Upstream Kernel Changes ]
* crypto: user - fix info leaks in
This bug was fixed in the package linux-lts-backport-oneiric -
3.0.0-32.51~lucid1
---
linux-lts-backport-oneiric (3.0.0-32.51~lucid1) lucid-proposed; urgency=low
[Steve Conklin]
* Release Tracking Bug
- LP: #1158541
[ Upstream Kernel Changes ]
* printk: fix buffer
This bug was fixed in the package linux - 3.0.0-32.51
---
linux (3.0.0-32.51) oneiric-proposed; urgency=low
[Steve Conklin]
* Release Tracking Bug
- LP: #1158340
[ Upstream Kernel Changes ]
* printk: fix buffer overflow when calling log_prefix function from
Verified that quantal-proposed passes the pgslam testcase.
** Tags removed: verification-needed-quantal
** Tags added: verification-done-quantal
** Changed in: linux (Ubuntu Quantal)
Status: Confirmed => Fix Committed
** Changed in: linux (Ubuntu Precise)
Assignee: Stefan Bader
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed'
to 'verification-done'.
If verification is not done by one week from today, this fix will
** Changed in: linux (Ubuntu Oneiric)
Status: Confirmed => Fix Committed
--
Verified on Oneiric.
** Tags removed: verification-needed-oneiric
** Tags added: verification-done-oneiric
--
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed'
to 'verification-done'.
If verification is not done by one week from today, this fix will
This bug was fixed in the package linux - 3.2.0-39.62
---
linux (3.2.0-39.62) precise-proposed; urgency=low
[Brad Figg]
* Release Tracking Bug
- LP: #1134424
[ Herton Ronaldo Krzesinski ]
* Revert "SAUCE: samsung-laptop: disable in UEFI mode"
- LP: #1117693
* d-i:
Can confirm this affects Dom0; with kernel 3.2.0-38, the attached pgslam, and Xen
set as:
GRUB_CMDLINE_XEN="dom0_mem=7000M dom0_max_vcpus=24 dom0_vcpus_pin"
I can get a crash within two minutes on 3.2.0-38. Still testing 3.2.0-39,
but it certainly gets past the two-minute mark.
--
For the record: we also experienced this issue on EC2 PV instances
(m2.2xlarge in us-east-1), using a very intensive workload (~1000 LXC
containers running a mix of web apps and databases). Running any kind of
3.X kernel (3.2, 3.4, 3.6) causes our test workload to crash in less
than one hour.
The
I'm testing 4 machines with pgslam with the kernel in precise-proposed.
There have been no issues for several days; it would be great if someone
could change the tag 'verification-needed' to 'verification-done', so
that it can be in the main repos.
Thanks!
--
** Tags removed: verification-needed-precise
** Tags added: verification-done-precise
--
It is not completely surprising, as dom0 actually is a PV guest, one with
special privileges though. But it is good to have confirmation that this
would also affect dom0 and is also fixed by the same change.
As said in comment #83, there is currently a Precise kernel in proposed that
will
Testing the fix in precise. I'll update this bug report with my findings
later this week to be sure it works.
--
I've been having similar lockups on dom0s running Ubuntu 12.04 LTS
with kernel linux-image-3.2.0-32-generic. Below is output from
the last one, which shows the same stack trace being seen in the virtual
kernel image. I took the pgslam/setup_hi1_4xlarge_for_crash_test.sh test
scripts and
** Tags added: verification-needed-precise
--
This bug is awaiting verification that the kernel in -proposed solves
the problem. Please test the kernel and update this bug with the
results. If the problem is solved, change the tag 'verification-needed'
to 'verification-done'.
If verification is not done by one week from today, this fix will
Stefan: Thanks for building the oneiric backport kernel. We have
taken your advice though and updated to Precise, running the 3.2.0-38
patched kernel you uploaded.
--
This bug was fixed in the package linux - 3.8.0-9.18
---
linux (3.8.0-9.18) raring; urgency=low
[Tim Gardner]
* Release Tracking Bug
- LP: #1135937
* [Config] CONFIG_PATA_ACPI=m
- LP: #1084783
[ Upstream Kernel Changes ]
* intel_idle: stop using driver_data for
The change from comment #79 is now upstream and also on its way back
into the releases via upstream stable. So one of the future uploads
will be carrying that change. Note that Oneiric is quite close to being
without further support. It might be wise to think about upgrading to an
LTS. But ok, I
** Also affects: linux (Ubuntu Lucid)
Importance: Undecided
Status: New
** Also affects: linux-lts-backport-oneiric (Ubuntu Lucid)
Importance: Undecided
Status: New
** Also affects: linux (Ubuntu Oneiric)
Importance: Undecided
Status: New
** Also affects:
What is the best way to test whether we are impacted by this bug? We run the
following image on c1.xlarge instances and see nodes die about every 1-2 days
now under a continuous 50% CPU load. The nodes will fail the EC2
instance status check, and all monitoring daemons on the node will stop
reporting.
However,
After finally having a breakthrough in understanding the source of the
lockup, and after further discussions upstream, the proper fix turns out to be
to change the way waiters are woken when a spinlock gets freed. A slightly
more verbose explanation of this is in the attached patch that will likely
go upstream.
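Conceptually the change is small. A rough sketch of the unlock slow path, simplified from the Xen pv-spinlock code in these kernels (not the literal patch text):

    /* Rough sketch of the unlock slow path (simplified, not verbatim).
     * Previously the loop stopped after kicking the first CPU it found
     * spinning on the lock; the change sends the wakeup IPI to every
     * waiter, so none of them can be left blocked on a lock that has
     * already been freed. */
    static void xen_spin_unlock_slow(struct xen_spinlock *xl)
    {
            int cpu;

            for_each_online_cpu(cpu) {
                    if (per_cpu(lock_spinners, cpu) == xl) {
                            xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
                            /* before the change: break;  (only the first waiter was woken) */
                    }
            }
    }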
** Changed in: linux (Ubuntu Precise)
Status: In Progress => Fix Committed
--
Stefan, let's be sure this fix gets upstream as well.
--
Started to push for Precise now. Depending on whether I get into this or
the next cycle, it could be 3 or 6 weeks. There will be requests for
testing and release notifications posted to this bug when that happens.
--
Any idea when this patch will arrive in the Precise kernel
packages?
--
Somehow I had hoped to find some way to understand why the fix works (or
what exactly goes wrong without it). But then other things and bad
long-term memory (sort of) came into play and this has not really progressed
much. So I will try to actually get this into Precise at least (since no
testing
** Description changed:
+ SRU Justification:
+
+ Impact: When running lots of threads which utilize spinlocks (the pgslam
+ testcase is quite successful in causing this), we hit a stage where the
+ spinlock is still locked but none of the CPUs seems to be actively
+ holding it. The reason for this is
Thanks for the update; hope this works, as this is a big problem for our
servers. We're thinking about downgrading to Lucid, but I'm eager to try
this out first. When do you think it can be available as a normal
package update?
--
It was circulated on the mailing list, but for simpler reference I am
adding it here (ok, I did clean up the comment section a bit).
** Patch added:
0001-xen-pv-spinlock-Never-enable-interrupts-in-xen_spin_.patch
** Tags added: patch
--
Stefan,
Is your patch somewhere accessible? Thx
--
FWIW -- we have dozens of SSD instances running the nopvspinlock patched
kernel that each serve ~1000 connections doing 10k+ QPS / 10k+ IOPS and
have uptimes in the weeks range now. I'll talk to the team about
getting some time to slot in the test1 kernel.
--
Well, the hosts are still running fine. pgslam.py eventually bombed out
on all of them with InternalError: could not open relation with OID.
Not sure why, but either way there were no lockups. It's possible
test1 resolves the issue. Konrad?
--
On Mon, Nov 12, 2012 at 08:47:11PM -, Steven Noonan wrote:
> Stefan,
> I did our internal SSD and network performance testing qualifications on
> hi1.4xlarge with CONFIG_PARAVIRT_SPINLOCKS=n in the kernel build.
> There's very little discernible difference in performance (within
> statistical
I've got 'test1' running on 19 hi1.4xlarge instances using the pgslam.py
workload. They're all up to 40G of capacity used right now, still no
lockups.
I'll leave this running for a while longer and see what happens (9 hours
so far).
--
Stefan,
I did our internal SSD and network performance testing qualifications on
hi1.4xlarge with CONFIG_PARAVIRT_SPINLOCKS=n in the kernel build.
There's very little discernible difference in performance (within
statistical noise ranges), and the benefits of disabling it are pretty
clear. Can
Steven, as I said before, that option affects both Xen and KVM
behaviour. So I want to avoid that as much as possible. If you could
please give the test1 kernel a try, which only does not re-enable
interrupts of the guest while doing the hypercall. In my testing this
also would not cause the hang
I just put a test1 version of the kernel in the same place as the
nospinlock one. From my tests it seems that the problem in the Xen
paravirt spinlock implementation is the fact that it re-enables
interrupts (the Xen upcall event channel for that vcpu) during the hypercall
to poll for the spinlock irq.
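To make that concrete, the suspect part of the lock slow path looks roughly like this (heavily simplified from the arch/x86/xen/spinlock.c in these kernels, not verbatim; xen_poll_irq() is the blocking hypercall that waits for the per-CPU lock-kicker event):

    /* Rough sketch of xen_spin_lock_slow(), heavily simplified (not verbatim).
     * The suspicious step is re-enabling guest interrupts (the upcall event
     * channel for this VCPU) around the blocking hypercall; the test1 kernel
     * simply leaves interrupts disabled here. */
    static int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enable)
    {
            int irq = __this_cpu_read(lock_kicker_irq);
            int ret = 0;

            /* ... register this CPU as spinning on the lock ... */

            do {
                    unsigned long flags;

                    xen_clear_irq_pending(irq);

                    ret = xen_spin_trylock(lock);   /* freed while we set up? */
                    if (ret)
                            break;

                    flags = arch_local_save_flags();
                    if (irq_enable)
                            raw_local_irq_enable(); /* <-- the step test1 leaves out */

                    xen_poll_irq(irq);              /* block until the unlocker kicks us */

                    raw_local_irq_restore(flags);
            } while (!xen_test_irq_pending(irq));

            /* ... unregister as a spinner; ret == 0 makes the caller retry ... */
            return ret;
    }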
Mike/Stefan, we're seeing the same freezing issue on an EC2 m2.2xl
instance - would love to hear if you had any success with the modified
kernel. Going to hvm would require us to over-provision to the 4xl
instance type, so we just backed out to an earlier distribution.
Broken:
uname -a
Linux
Early results are promising on the no spin lock kernel...we're at 14
instances running it and no lockups (been running it for 3 days). We're
going to see how the weekend goes and if successful, roll it out widely.
--
So after all, noautogroup only makes it less likely, not impossible. :/ Ok,
just to be completely sure about the pv spinlock side, I compiled a
recent Precise kernel with just that disabled and put it on people[1].
If that survives a more rigorous testing, then at least that part of the
conclusion
Thanks Stefan, we just re-rotated the machine that locked up twice
yesterday with your non-pv-spinlock kernel; we'll keep you updated on
what happens.
--
** Tags removed: kernel-key
--
Summing up my current observations and theory. In the dump I took, I had
8 VCPUs (0-7). With regard to spinlocks I saw the following:
CPU#0 claimed to be waiting on the spinlock for runqueue#0 (its own), though
the actual lock was free (this could mean that it was just woken out of the
wait but
Unfortunately, it looks like we're still freezing up even with noautogroup
enabled (set at boot using GRUB).
Booted with:
Oct 17 00:07:45 localhost kernel: [0.00] Command line:
root=UUID=3ad27d04-4ecf-493d-bb19-4710c3caf924 ro console=hvc0
noautogroup
uname -a
Linux
I think that pretty surely is the reason that Arch is different. That
flag will cause Xen to use its own paravirtualized spinlocks, which I am
suspecting of causing the problems (in some way related to having tasks
in taskgroups, which autogroup automatically causes to happen). This seems,
when the
I would be very surprised if there was anything Ubuntu specific in the
autogroup area. The whole kernel source is mainly what is upstream.
There are a few additional drivers, but really only a couple and I don't
think they get used here. One thing might be noteworthy, if 3.5.6-1-arch
translates
Sorry, right now v3.5.6 seems to have no 64bit packages. Working on
it...
--
Also I found that simply installing actually does not yet work because
pv-grub won't pick it up. Though it is simply a matter of editing
/boot/grub/menu.lst and replacing the 3.2.0-x-generic (for example) of
vmlinuz and initrd (in the first boot entry) with the new version number.
--
Re: pv-grub, yeah. The hook that updates the menu.lst is pretty ugly,
and specifically looks for *-virtual kernels, assuming that no other
kernel could possibly support Xen. Why not grep for the right option in
the installed config-$(uname -r)?
Anyway, 3.5.6 -might- fix it, but I only see one
At least for me, trying mainline v3.5.6-quantal has the same result as
before (getting the lockups). Currently having a run with the same
kernel but with noautogroup on the kernel command line, which survives so far,
but the DB is just at about 550M.
--
I suspected the same commit in v3.5.6, but neither picking that one nor
actually the full set seems to help. The noautogroup run still runs...
--
My next step(s) will be to convert the current PV setup into an HVM and
try to see whether this also breaks. In case it does, I would also run
this HVM's disks from a KVM host. I hope to see whether this behaviour is
related to PV, Xen in general, or maybe a completely generic problem.
--
Seems this is tied to a PV guest. The HVM guest using exactly the same
installation, memory, VCPUs and all did not show any issues. But there
is one big difference here. Only the PV guest uses the paravirtualized
spinlocks (which, I think, do allow a bit of nestedness). So right now I
would narrow
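The 'bit of nestedness' refers to how each CPU records which lock it is currently spinning on; roughly, from the same file (simplified, not verbatim; the real code also adjusts a per-lock spinner count with locked asm):

    /* Each CPU remembers the lock it is spinning on so the unlocker knows whom
     * to kick. The previous value is saved and restored, which allows an
     * interrupt arriving inside the slow path to spin on a different lock and
     * then hand back to the outer one -- the nesting mentioned above. */
    static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);

    static struct xen_spinlock *spinning_lock(struct xen_spinlock *xl)
    {
            struct xen_spinlock *prev = __this_cpu_read(lock_spinners);
            __this_cpu_write(lock_spinners, xl);    /* announce: spinning on xl */
            return prev;                            /* outer lock, if any */
    }

    static void unspinning_lock(struct xen_spinlock *xl, struct xen_spinlock *prev)
    {
            /* real code decrements xl->spinners here */
            __this_cpu_write(lock_spinners, prev);  /* restore the outer lock */
    }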
Just as a note, we run the same workload with no issues using Precise on
HVMs; it only reproduces on PV in production, so your findings match our
experience.
Just double checking because I might have missed something -- is there an
Ubuntu-based setup with autogroup off that doesn't freeze up?
--
Mike, I've been successfully running this configuration:
hi1.4xlarge instance in VPC
ami-eafa5883, official Ubuntu 12 PV instance-store AMI
linux 3.2.0-31-virtual (picked up on upgrade), booting with grub kernel option
noautogroup
These are running a write-heavy mysql replica load to XFS/MD/LVM
Stefan, here's the Arch Linux kernel package tree:
http://projects.archlinux.org/svntogit/packages.git/tree/linux/repos/core-x86_64?id=6b8ed4e6660afe873aef3a207b187c5eb124c855
They build from a release tarball on kernel.org with only 4 patches
applied (none of which touch scheduling or anything
OK, with a few more iterations over kernel configs, it looks like it all
comes down to this:
--- config-3.2.0-31-virtual 2012-10-10 01:02:10.0 +
+++ config-3.2.0-31-virtual-noautogroup 2012-10-11 01:33:14.886307000 +
@@ -144,7 +144,7 @@
CONFIG_USER_NS=y
CONFIG_PID_NS=y
Steven, thank you for narrowing down the problem. I am now testing
noautogroup on some test database instances where I've been
experiencing about a 20% failure rate per day.
--
Rick, I was using a Xen guest with 8 cores and ~15G of memory. The host was
CentOS 5.6 (I believe) with Xen 3.4.3. But I also saw it happen when
the same host runs Precise with Xen 4.1.2.
Stephen, now that is very interesting info. So if the kernel command line
would help but not the sysctl, that
Out of curiosity I tried an Arch Linux instance (running Linux
3.5.6-1-ARCH), which also has CONFIG_SCHED_AUTOGROUP:
# zgrep AUTOGROUP /proc/config.gz
CONFIG_SCHED_AUTOGROUP=y
I ran the same pgslam workload on it, and it filled 64G of the
/var/lib/postgres md-raid before I stopped it. This
It looks like Quantal has the same issue. Thing didn't even fill more
than 100MB before deadlocking.
--
Stefan, the kernel version in the Amazon Linux AMI that Matt pointed at
is 3.2.21-1.32.6.amzn1.x86_64, so it is very close to comparable with
the affected Ubuntu kernel (yes, there are source differences, but they
at least have a merge base of 3.2.21 so they share significant lineage).
I was able
Stefan,
We threw the debug kernel into production and it ended up dying after a
few hours so it's highly likely you're right about it just slowing down
the race behavior.
What was the setup you used to repro the bug on your side with the
scripts I gave you?
Rick
--
Matt, and what kernel version would the Amazon Linux AMI be?
Rick, that could in the most annoying way confirm a race somewhere, as we
suspected. Add some code that makes things slower and it becomes rare or
goes away completely.
--
We've run it all the way up to 334GB on the debug kernel and it's fine.
Early next week we're going to deploy the debug kernel onto a production
host.
--
Rick, thanks for the scripts. The good news is that I seem to be able to
reproduce the lockup on a local machine with less memory and fewer CPUs (not
having a really big box at hand). Now I should be able to get more
info out of that (given a bit of time).
--
For what it's worth, I started running this test case on the Amazon
Linux AMI (ami-aecd60c7) yesterday. It hasn't crashed. The DB is now 96
GiB.
--
We've been able to reproduce the bug in a more isolated environment.
I wrote a Python script (pgslam.py) that generates a load similar enough
to our production traffic. In addition, I wrote a bash script that will
set up a hi1.4xlarge EC2 instance to reproduce the issue. During the
** Attachment added: pgslam.py
https://bugs.launchpad.net/ubuntu/+source/linux-lts-backport-oneiric/+bug/1011792/+attachment/3324324/+files/pgslam.py
--
** Attachment added: setup_hi1_4xlarge_for_crash_test.sh
https://bugs.launchpad.net/ubuntu/+source/linux-lts-backport-oneiric/+bug/1011792/+attachment/3324325/+files/setup_hi1_4xlarge_for_crash_test.sh
--
FYI, the above repro was done with ami-3c994355.
--
Stefan -- we are trying to repro with the debug kernel image you linked.
So far it survived the initial one minute death, but we'll run the test
for several more hours and see if it still crashes. Right now the
database isn't very large, so it's not doing any read I/O. Once the
database starts to
We've tried both a kernel with autogroup disabled and a kernel with
sched_clock_stable=0 forced. Both crashed. Going to try a lock-debugging
kernel next.
--
(I work with Mike Krieger.)
Our quick fix for this was to use HVM guests.
Unfortunately the only way for us to reproduce this bug at the moment is
to put a database box under production load, so we have to be judicious
in our approach. It also takes several hours for us to bring up a new
guest
This has been quiet for a while. Was there a chance to try the debug
kernels or a run with autogroup disabled?
--
It is hard to say for sure. Indeed we have the same jump in time. Though
the problem in the lkml report seemed more like one task going into schedule
and never really getting scheduled, while here it looks like a real locking
issue. All CPUs (except CPU1) look to be waiting on some
** Tags added: kernel-da-key
--
** Tags added: kernel-key
--
@Matt, when you produce those CPU stack traces, how do you do that? Is
that from a dump or from somehow tapping into the still-running instance?
@smb, these are traces from running, but unresponsive, instances. I pull
the traces from the vCPU context in the hypervisor, then resolve symbols
from the
Hi Mike,
could you post the dmesg of that instance? Or actually, if it has been running
for a while, boot messages may be gone from the ring buffer. Probably
sudo grep -r . /sys/hypervisor in the guest is good enough.
So the issue was already there with Natty (2.6.38) but happens more
often since
In the hope that maybe this catches something, I put some Precise kernels
at http://people.canonical.com/~smb/lp1011792/. Those have lock
debugging enabled. If you could install the virtual packages (the extras
package is not required) on one Precise AMI and let it run. If that
locks up without any
We see the same clock jump issues and lockup issues as in this thread:
http://lists.xen.org/archives/html/xen-devel/2012-04/msg00888.html
Could it be related?
--
Hi Stefan,
Yes, the same instance that froze (collected it after a reboot).
- looking at the same instance type, does it happen on all of them
sooner or later or are there exceptions?
There is one of our instances of that type that is under the same load
but hasn't frozen in weeks. Since
One more note -- I tried stressing out the same instance using bonnie++
and some CPU-burning 'yes' processes, but it stood up okay. It does seem
somehow related to throwing heavy read traffic at PostgreSQL on these
instances.
--
Thanks, Mike, for the details. Just to make sure: you collected the info
from the same instance that locked up (either before or after a reboot)?
That would make sure that whatever information about the host really
belongs to the host where the problem happened.
As for more details, not right