Noah,

thanks for testing and reporting the results. The first thing to do now is to
decide whether v1 or v3 should be the goal. v1 could be considered well tested
by now. The downside I see is that, to avoid some problems with certain older
hypervisor code, it uses real spinning spinlocks. This means that while
waiting for a lock, the virtual CPU will busy-wait (which could have some
impact on the cloud host's CPU usage). It also provides no queuing, which
means that getting the lock can be unfair in contended situations.
The v3 kernel would in principle use the same implementation, which could
theoretically be the wrong thing on older hypervisor versions (though the
chance of having an instance launched on such an older host version is likely
getting smaller every day). At least it is the same risk as we have now, and
the lockups happened on newer hypervisor versions. So I would tend towards the
v3 solution, but for that it would be good to have more hours of testing with
v3 to see that it does not show other problems that might be related to this
change.
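
To illustrate the difference between a lock that only spins and one that
queues its waiters, here is a minimal user-space sketch in C11. It is not
taken from any of the test kernels or from the hypervisor code. All type and
function names (spin_lock_t, ticket_lock_t and so on) are made up for this
example, and a ticket lock is only one common way of getting queuing.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Plain spinning lock (hypothetical names): every waiter polls the same
 * flag, and whichever CPU wins the next test-and-set gets the lock.
 * There is no ordering among waiters, so a waiter can be starved. */
typedef struct {
    atomic_flag locked;
} spin_lock_t;

static void spin_lock_init(spin_lock_t *l)
{
    atomic_flag_clear(&l->locked);
}

/* Returns true if the lock was free and is now held by the caller. */
static bool spin_trylock(spin_lock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

static void spin_lock(spin_lock_t *l)
{
    while (!spin_trylock(l))
        ; /* busy wait: the (virtual) CPU burns cycles here */
}

static void spin_unlock(spin_lock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Ticket lock (hypothetical names): each waiter draws a ticket and is
 * served strictly in ticket order, which gives FIFO fairness. Waiters
 * still spin, but only until their own number comes up. */
typedef struct {
    atomic_uint next;  /* next ticket to hand out */
    atomic_uint owner; /* ticket currently allowed to hold the lock */
} ticket_lock_t;

static void ticket_lock_init(ticket_lock_t *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

/* Draw a ticket; this fixes the caller's place in the queue. */
static unsigned ticket_take(ticket_lock_t *l)
{
    return atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);
}

/* Spin until the given ticket is the one being served. */
static void ticket_wait(ticket_lock_t *l, unsigned ticket)
{
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != ticket)
        ; /* busy wait until it is our turn */
}

/* Release the lock by passing it on to the next ticket in line. */
static void ticket_unlock(ticket_lock_t *l)
{
    atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```

With the plain lock a waiter simply calls spin_lock() and hopes to win the
race. With the ticket variant a waiter calls ticket_take() followed by
ticket_wait() on the returned number, so the order of acquisition is the
order of arrival.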

Normally the process to get a change into an official kernel is to
propose it for SRU (stable release update). I will propose the patches
for inclusion, and once accepted they go into a proposed kernel.
Normally those kernels are prepared and made available, and verification
then has to be done within a week, which does not work for a bug like
this. But if there is reasonable confidence that a test kernel has been
running on your busy instances without the original issue and without
new stability problems, this should be a good argument.

Since the time I built the current v3 kernels there have been other
updates, too, so I will go ahead and prepare a new set of those. I will
post here when they are ready. If you could then start migrating your
instances to them and report back here when you feel confident about
the stability, I would start the steps required to integrate the
changes into the official kernels.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/929941

Title:
  Kernel deadlock in scheduler on multiple EC2 instance types
