Hi! Inspecting the logs after the cluster had rebalanced resources, I'm wondering: It looks as if pacemaker-controld does log a success message when a local migration succeeded, but not if a remote one did.
Actions planned: Migrate prm_xen_test-jeos1 ( h16 -> h18 ) Migrate prm_xen_test-jeos2 ( h18 -> h19 ) Migrate prm_xen_test-jeos4 ( h16 -> h18 ) Like this (h19 being DC): h19 pacemaker-controld[7539]: notice: Initiating migrate_from operation prm_xen_test-jeos2_migrate_from_0 locally on h19 h19 pacemaker-controld[7539]: notice: Result of migrate_from operation for prm_xen_test-jeos2 on h19: ok But for a non-local migration it looks quite different: h19 pacemaker-controld[7539]: notice: Initiating migrate_to operation prm_xen_test-jeos1_migrate_to_0 on h16 h19 pacemaker-controld[7539]: notice: Initiating migrate_to operation prm_xen_test-jeos4_migrate_to_0 on h16 h19 pacemaker-controld[7539]: notice: Initiating migrate_from operation prm_xen_test-jeos4_migrate_from_0 on h18 h19 pacemaker-controld[7539]: notice: Initiating migrate_from operation prm_xen_test-jeos1_migrate_from_0 on h18 h19 pacemaker-controld[7539]: notice: Initiating stop operation prm_xen_test-jeos4_stop_0 on h16 h19 pacemaker-controld[7539]: notice: Initiating stop operation prm_xen_test-jeos1_stop_0 on h16 h19 pacemaker-controld[7539]: notice: Initiating monitor operation prm_xen_test-jeos4_monitor_600000 on h18 h19 pacemaker-controld[7539]: notice: Initiating monitor operation prm_xen_test-jeos1_monitor_600000 on h18 h19 pacemaker-controld[7539]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE (no more messages for a while) On the remote nodes I see: h18 pacemaker-controld[7131]: notice: Result of migrate_from operation for prm_xen_test-jeos4 on h18: ok h18 pacemaker-controld[7131]: notice: Result of migrate_from operation for prm_xen_test-jeos1 on h18: ok h16 pacemaker-controld[7223]: notice: Result of migrate_to operation for prm_xen_test-jeos4 on h16: ok h16 pacemaker-controld[7223]: notice: Result of migrate_to operation for prm_xen_test-jeos1 on h16: ok The other thing I noticed is that a start operation is logged with execution time like this: h19 pacemaker-execd[7536]: notice: prm_cron_snap_test-jeos2 start (call 170, PID 11079) exited with status 0 (execution time 55ms, queue time 0ms) But for a migration there is no such timing info (logged by the DC; it's logged on the remote node). Considering that migrations can time-out, wouldn't that be a useful info to have as well? Regards, Ulrich _______________________________________________ Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/