We always do a burn-in and memory test before putting systems into production.
Should a system crash multiple times, we revert to a known stable
kernel version. If that doesn't solve the problem, we move the virtual machines
off and re-test the hardware. We are also talking about dozens of identical
systems that all exhibit the same behaviour, which is why it is
so frustrating: we never know whether a new version will be stable or not.
I know that memory errors are a common cause of such problems, and I would
readily agree to run memtest86 again, but the point is that before we
upgraded to the kernels mentioned above, the systems had been stable, some
for more than a year (if you discount the necessary reboot after 208 days
because of the sched clock overflow).
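
For anyone wondering where the 208 days come from, here is a rough
back-of-the-envelope sketch. It assumes the commonly cited explanation: the
kernel converts TSC cycles to nanoseconds with a 10-bit fixed-point shift, so
the 64-bit intermediate product overflows once the nanosecond value reaches
2^54. The shift value and the function name are my assumptions for
illustration, not something taken from this bug report.

```python
# Sketch only: estimates when a 64-bit intermediate product overflows if
# nanoseconds are computed as (cycles * scale) >> SCALE_SHIFT.
# SCALE_SHIFT = 10 is an assumption based on the commonly cited explanation.
SCALE_SHIFT = 10

NS_PER_SECOND = 1_000_000_000
SECONDS_PER_DAY = 86_400


def overflow_days(shift: int = SCALE_SHIFT) -> float:
    """Return the uptime in days at which the scaled value no longer fits in 64 bits."""
    # The pre-shift product must stay below 2**64, so the resulting
    # nanosecond count must stay below 2**(64 - shift).
    max_ns = 2 ** (64 - shift)
    return max_ns / NS_PER_SECOND / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"overflow after roughly {overflow_days():.1f} days")
```

Under that assumption the result lands between 208 and 209 days, which matches
the uptime at which we had to reboot.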

I haven't given up on -40 yet, but the best uptime we have achieved so far
is 8 days, and we can only have so many crashes before we have to go back
to a stable version (currently 2.6.32-38 or 3.0.0-15) in order to keep
customers happy.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/809313

Title:
  mcelog errors and server freeze with qemu-kvm 0.12.3 and linux-
  image-2.6.32-32-server

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/809313/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
