I am seeing this problem on heavily utilized HP G9 servers, specifically when the RAID battery fails. The battery fails, the chassis prints a similar error in dmesg, and the heavily utilized process (in this case Java/Elasticsearch) stops responding. A reboot recovers the host, and the HP utilities then report that the RAID battery is bad. We swap the battery, reboot the host, and it recovers.
We've had this occur 6 different times on 6 different boxes. The HP G9s are known to have troublesome RAID batteries; I just didn't expect the kernel to have such an issue with it.

[Tue Oct 17 01:06:15 2017] /build/linux-EO9xOi/linux-4.4.0/mm/pgtable-generic.c:33: bad pmd ffff88156c4320b8(000000212e6009e2)

uname -a
Linux xx 4.4.0-59-generic #80-Ubuntu SMP Fri Jan 6 17:47:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/issue
Ubuntu 16.04.1 LTS \n \l

-- 
https://bugs.launchpad.net/bugs/1644056

Title:
  kernel BUG at /build/linux-lts-xenial-gUF4JR/linux-lts-xenial-4.4.0/mm/huge_memory.c:1931!
