Hi,

When I repeatedly ran 'node standby' and 'node online', lrmd crashed with SIGSEGV because "op->id" in cancel_recurring_action() was NULL.
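
As an illustration of the failure mode described above, here is a minimal sketch in C of building the cancel message without ever handing a NULL pointer to a %s conversion (passing NULL there is undefined behavior in C). The struct action type and the printable() and format_cancel_msg() helpers are hypothetical names made up for this sketch. This is not Pacemaker's code and not a confirmed fix, and it would not help if op itself pointed to memory that had already been freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the action object; NOT Pacemaker's real type. */
struct action {
    const char *id;   /* may be NULL, as reported for op->id */
};

/* Return a printable string even when s is NULL. */
const char *printable(const char *s)
{
    return s != NULL ? s : "<null>";
}

/*
 * Format the cancel message into buf without passing NULL to %s.
 * Returns the number of characters snprintf would have written.
 */
int format_cancel_msg(char *buf, size_t len, const struct action *op)
{
    assert(buf != NULL && len > 0);

    if (op == NULL) {
        /* No action object at all: say so instead of dereferencing it. */
        return snprintf(buf, len, "Cancelling operation %s", "<no action>");
    }
    return snprintf(buf, len, "Cancelling operation %s", printable(op->id));
}
```

With a valid id the message matches the one in the log ("Cancelling operation prmPg_monitor_10000"). With a NULL id the placeholder is logged instead, which would also make the bad state visible in the logs.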

Dec 17 19:01:21 vm3 crmd[2433]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Dec 17 19:01:21 vm3 crmd[2433]: info: do_te_invoke: Processing graph 437 (ref=pe_calc-dc-1387274481-5672) derived from /var/lib/pacemaker/pengine/pe-input-437.bz2
Dec 17 19:01:21 vm3 crmd[2433]: notice: te_rsc_command: Initiating action 17: stop prmStonith4_stop_0 on vm3 (local)
Dec 17 19:01:21 vm3 crmd[2433]: info: do_lrm_rsc_op: Performing key=17:437:0:40d7b9a2-c373-4459-a811-9c225d1a9555 op=prmStonith4_stop_0
Dec 17 19:01:21 vm3 lrmd[2430]: info: log_execute: executing - rsc:prmStonith4 action:stop call_id:3487
Dec 17 19:01:21 vm3 stonith-ng[2429]: info: stonith_command: Processed st_device_remove from lrmd.2430: OK (0)
Dec 17 19:01:21 vm3 lrmd[2430]: info: log_finished: finished - rsc:prmStonith4 action:stop call_id:3487 exit-code:0 exec-time:0ms queue-time:0ms
Dec 17 19:01:21 vm3 pengine[2432]: notice: process_pe_message: Calculated Transition 437: /var/lib/pacemaker/pengine/pe-input-437.bz2
Dec 17 19:01:21 vm3 crmd[2433]: notice: te_rsc_command: Initiating action 33: stop prmPg_stop_0 on vm3 (local)
Dec 17 19:01:21 vm3 lrmd[2430]: info: cancel_recurring_action: Cancelling operation prmPg_monitor_10000
Dec 17 19:01:21 vm3 crmd[2433]: info: do_lrm_rsc_op: Performing key=33:437:0:40d7b9a2-c373-4459-a811-9c225d1a9555 op=prmPg_stop_0
Dec 17 19:01:21 vm3 lrmd[2430]: info: log_execute: executing - rsc:prmPg action:stop call_id:3489
Dec 17 19:01:21 vm3 crmd[2433]: info: process_lrm_event: LRM operation prmStonith4_monitor_3600000 (call=3473, status=1, cib-update=0, confirmed=true) Cancelled
Dec 17 19:01:21 vm3 crmd[2433]: notice: process_lrm_event: LRM operation prmStonith4_stop_0 (call=3487, rc=0, cib-update=3090, confirmed=true) ok
Dec 17 19:01:21 vm3 crmd[2433]: info: process_lrm_event: LRM operation prmPg_monitor_10000 (call=3485, status=1, cib-update=0, confirmed=true) Cancelled
Dec 17 19:01:21 vm3 crmd[2433]: info: match_graph_event: Action prmStonith4_stop_0 (17) confirmed on vm3 (rc=0)
Dec 17 19:01:21 vm3 crmd[2433]: notice: te_rsc_command: Initiating action 40: stop prmPing_stop_0 on vm3 (local)
Dec 17 19:01:21 vm3 cib[2428]: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/crmd/3090, version=0.440.2)
Dec 17 19:01:21 vm3 stonith-ng[2429]: info: crm_client_destroy: Destroying 0 events
Dec 17 19:01:21 vm3 pacemakerd[2424]: error: child_death_dispatch: Managed process 2430 (lrmd) dumped core
Dec 17 19:01:21 vm3 pacemakerd[2424]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=2430, core=1)
Dec 17 19:01:21 vm3 pacemakerd[2424]: notice: pcmk_process_exit: Respawning failed child process: lrmd
Dec 17 19:01:21 vm3 pacemakerd[2424]: error: pcmk_process_exit: Rebooting system
Dec 17 19:10:40 vm3 root: Mark:pcmk:1387275040

$ gdb /usr/libexec/pacemaker/lrmd core.2430
(gdb) bt
#0  0x000000323f8480ac in vfprintf () from /lib64/libc.so.6
#1  0x000000323f86f9d2 in vsnprintf () from /lib64/libc.so.6
#2  0x0000003fcb81726d in qb_log_real_va_ (cs=0x3fcf208658, ap=0x7ffff6f5fc80) at log.c:230
#3  0x0000003fcb8173ea in qb_log_real_ (cs=0x3fcf208658) at log.c:255
#4  0x0000003fcf003a9c in cancel_recurring_action (op=0xb9fae0) at services.c:356
#5  0x0000003fcf003bc6 in services_action_cancel (name=0xb9f350 "prmPing", action=0xb9ee90 "monitor", interval=10000) at services.c:381
#6  0x0000000000406595 in cancel_op (rsc_id=0xb9f350 "prmPing", action=0xb9ee90 "monitor", interval=10000) at lrmd.c:1197
#7  0x00000000004067aa in process_lrmd_rsc_cancel (client=0xb926c0, id=7030, request=0xb95ad0) at lrmd.c:1261
#8  0x0000000000406a51 in process_lrmd_message (client=0xb926c0, id=7030, request=0xb95ad0) at lrmd.c:1300
#9  0x0000000000402a06 in lrmd_ipc_dispatch (c=0xb91af0, data=0x7f9f30acbc08, size=362) at main.c:141
#10 0x0000003fcb8126f8 in _process_request_ (c=0xb91af0, ms_timeout=10) at ipcs.c:698
#11 0x0000003fcb812ad5 in qb_ipcs_dispatch_connection_request (fd=5, revents=1, data=0xb91af0) at ipcs.c:801
#12 0x0000003fcc0327b1 in gio_read_socket (gio=0xb92880, condition=G_IO_IN, data=0xb91138) at mainloop.c:437
#13 0x0000003fc9c3feb2 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#14 0x0000003fc9c43d68 in ?? () from /lib64/libglib-2.0.so.0
#15 0x0000003fc9c44275 in g_main_loop_run () from /lib64/libglib-2.0.so.0
#16 0x00000000004030cc in main (argc=1, argv=0x7ffff6f606c8) at main.c:314

I am still investigating the cause, but I have not found it yet.
Because the crm_report is large, I have put it here:
https://drive.google.com/file/d/0B9eNn1AWfKD4WGY5bllMQW1BbDA/edit?usp=sharing

Best Regards,
Kazunori INOUE

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org