We always do a burn-in and memory test before putting systems into production. Should a system crash repeatedly, we revert to a known stable kernel version. If that doesn't solve the problem, we move the virtual machines off and re-test the hardware. We are also talking about dozens of identical systems that exhibit the same behaviour, which is also why it is so frustrating: we never know whether a new version will be stable or not. I know that memory errors are a common cause of such problems, and I would fully agree to run memtest86 again, but the point is that before we upgraded to the kernels mentioned above, the systems had been stable, some for more than a year (if you discount the necessary reboot after 208 days caused by the sched clock overflow).
I haven't given up on -40 yet, but the best uptime we have come up with so far is 8 days, and we can only tolerate so many crashes before we have to go back to a stable version (currently 2.6.32-38 or 3.0.0-15) in order to keep customers happy.

-- 
https://bugs.launchpad.net/bugs/809313
Title: mcelog errors and server freeze with qemu-kvm 0.12.3 and linux-image-2.6.32-32-server
