Hi!

I want to keep you updated: the problem still isn't fixed, so I'm running
this simple script via cron to avoid an uncontrolled kernel panic:
---snip---
#!/usr/bin/sh
# Detect RAM corruption. If detected log a message and reboot
# to prevent kernel panic

#cron jobs need a PATH
PATH=/sbin:/usr/sbin:/usr/bin:/bin
if journalctl -b -g 'Code: Bad RIP value|BUG: Bad rss-counter state mm:' >/dev/null
then
    MSG='RAM corruption detected, starting pro-active reboot'
    logger -t reboot-before-panic -p local0.notice "$MSG"
    shutdown -r +1 "$MSG"
fi
---
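
To check the pattern without waiting for the next incident, the match can be tried against a sample line. This is only a minimal sketch: the helper name has_corruption_marker is made up for illustration, and it uses grep -E instead of journalctl -g, which is an assumption that only holds because this simple alternation behaves the same in both regex flavours.

```sh
#!/bin/sh
# Hypothetical helper (not part of the cron script above):
# reads log text on stdin and returns 0 if one of the two
# corruption markers is present, non-zero otherwise.
has_corruption_marker() {
    grep -E -q 'Code: Bad RIP value|BUG: Bad rss-counter state mm:'
}

# Dry run against the kernel line quoted in this message.
sample='Mar 26 23:00:01 h19 kernel: BUG: Bad rss-counter state mm:00000000acc74328 idx:1 val:14'
if printf '%s\n' "$sample" | has_corruption_marker
then
    echo "marker found"
else
    echo "no marker"
fi
```

The same helper could be fed from "journalctl -b -k" to look at kernel messages only, but that variant is untested here.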

I still suspect it might be related to snapshots being made. After a few days
of uptime, the problems started again, like this:
Mar 26 23:00:01 h19 systemd[1]: Started Timeline of Snapper Snapshots.
Mar 26 23:00:01 h19 dbus-daemon[5700]: [system] Activating via systemd: service name='org.opensuse.Snapper' unit='snapperd.service' requested by ':1.343' (uid=0 pid=11200 comm="/usr/lib/snapper/systemd-helper --timeline ")
Mar 26 23:00:01 h19 systemd[1]: Starting DBus interface for snapper...
Mar 26 23:00:01 h19 dbus-daemon[5700]: [system] Successfully activated service 'org.opensuse.Snapper'
Mar 26 23:00:01 h19 systemd[1]: Started DBus interface for snapper.
Mar 26 23:00:01 h19 systemd[1]: snapper-timeline.service: Succeeded.
Mar 26 23:00:01 h19 systemd[1]: Created slice Slice /system/systemd-coredump.
Mar 26 23:00:01 h19 systemd[1]: Started Process Core Dump (PID 11227/UID 0).
Mar 26 23:00:01 h19 systemd-coredump[11231]: Process 11226 (run-crons) of user 0 dumped core.

                                                  Stack trace of thread 11226:
                                                  #0  0x00007f89ff9dacdb raise (libc.so.6 + 0x4acdb)
                                                  #1  0x00007f89ff9dc324 abort (libc.so.6 + 0x4c324)
                                                  #2  0x00007f89ffa20b07 __libc_message (libc.so.6 + 0x90b07)
                                                  #3  0x00007f89ffa28b8a malloc_printerr (libc.so.6 + 0x98b8a)
                                                  #4  0x00007f89ffa2a634 _int_free (libc.so.6 + 0x9a634)
                                                  #5  0x000055c998de3963 command_substitute (bash + 0x9f963)
                                                  #6  0x000055c998ddb380 n/a (bash + 0x97380)
                                                  #7  0x000055c998ddda57 n/a (bash + 0x99a57)
                                                  #8  0x000055c998ddcb94 n/a (bash + 0x98b94)
                                                  #9  0x000055c998dc8955 n/a (bash + 0x84955)
                                                  #10 0x000055c998dc756d execute_command_internal (bash + 0x8356d)
                                                  #11 0x000055c998dc86e1 execute_command (bash + 0x846e1)
                                                  #12 0x000055c998dc76fd execute_command_internal (bash + 0x836fd)
                                                  #13 0x000055c998dc86e1 execute_command (bash + 0x846e1)
                                                  #14 0x000055c998dc8516 execute_command_internal (bash + 0x84516)
                                                  #15 0x000055c998dc773c execute_command_internal (bash + 0x8373c)
                                                  #16 0x000055c998dc86e1 execute_command (bash + 0x846e1)
                                                  #17 0x000055c998dc8007 execute_command_internal (bash + 0x84007)
                                                  #18 0x000055c998dc86e1 execute_command (bash + 0x846e1)
                                                  #19 0x000055c998dbce2b reader_loop (bash + 0x78e2b)
                                                  #20 0x000055c998dbcabc main (bash + 0x78abc)
                                                  #21 0x00007f89ff9c52bd __libc_start_main (libc.so.6 + 0x352bd)
                                                  #22 0x000055c998df729a _start (bash + 0xb329a)
Mar 26 23:00:01 h19 systemd[1]: systemd-coredump@0-11227-0.service: Succeeded.
Mar 26 23:00:01 h19 kernel: BUG: Bad rss-counter state mm:00000000acc74328 idx:1 val:14
Mar 26 23:01:01 h19 systemd[1]: snapperd.service: Succeeded.
Mar 26 23:05:01 h19 reboot-before-panic[12356]: RAM corruption detected, starting pro-active reboot

Regards,
Ulrich

