After finally having a breakthrough in understanding the source of the lockup, and after further discussions upstream, the proper fix turns out to be to change the way waiters are woken when a spinlock gets freed. A slightly more verbose explanation is in the attached patch, which will likely go upstream. So there is a chance that the work-around gets replaced relatively soon. I pre-compiled current kernels with that change and uploaded them to http://people.canonical.com/~smb/lp1011792/. I have been running the pgslam testcase on those without experiencing any hangs. If anybody wants to give them an early try in production, that would be appreciated.
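To illustrate the idea of changing how waiters are woken, here is a minimal user-space sketch in C. It is not the attached patch and not kernel code. The names `spinning_on`, `kicks`, `start_waiting`, `unlock_kick_first` and `unlock_kick_all` are hypothetical, a counter stands in for the IPI, and the "wake only the first waiter" variant is my assumption about the behaviour being replaced, inferred from the patch title.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4                 /* hypothetical CPU count for this sketch */

struct lock { int id; };          /* stand-in for a spinlock */

/* Lock each CPU is currently blocked on, or NULL if it is not waiting. */
static const struct lock *spinning_on[NR_CPUS];
/* Wakeups delivered per CPU; a counter stands in for a real IPI. */
static int kicks[NR_CPUS];

/* Record that `cpu` went to sleep waiting for lock `l`. */
static void start_waiting(int cpu, const struct lock *l)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    spinning_on[cpu] = l;
}

/* Assumed earlier behaviour: stop after waking the first waiter found. */
static int unlock_kick_first(const struct lock *l)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (spinning_on[cpu] == l) {
            kicks[cpu]++;
            return 1;             /* any other waiters stay asleep */
        }
    }
    return 0;
}

/* Changed behaviour: wake every CPU that is waiting on this lock. */
static int unlock_kick_all(const struct lock *l)
{
    int woken = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (spinning_on[cpu] == l) {
            kicks[cpu]++;
            woken++;
        }
    }
    return woken;
}
```

Both functions return the number of CPUs they woke. With two CPUs waiting on the same lock, the first variant leaves one of them asleep, while the second wakes both and lets them race for the lock again. This costs some spurious wakeups, but no waiter is left depending on a single wakeup reaching the right CPU.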

** Patch added: "0001-xen-Send-spinlock-IPI-to-all-waiters.patch"
   https://bugs.launchpad.net/ubuntu/+source/linux-lts-backport-oneiric/+bug/1011792/+attachment/3530204/+files/0001-xen-Send-spinlock-IPI-to-all-waiters.patch

--
https://bugs.launchpad.net/bugs/1011792

Title:
  Kernel lockup running 3.0.0 and 3.2.0 on multiple EC2 instance types
