On Wed, Oct 27, 2010 at 12:36 PM, Andrew Beekhof <[email protected]> wrote:
> On Thu, Oct 21, 2010 at 10:30 AM, <[email protected]> wrote:
>> Hi,
>>
>> We checked the behavior with no-quorum-policy set to freeze.
>> In a cluster where the freeze setting had taken effect, we stopped the service.
>>
>> However, stopping the service took a very long time.
>>
>> We set "shutdown-escalation" to five minutes to shorten the test.
>> But stopping the service on one node took more than five minutes.
>>
>> I confirmed it with the following procedure.
>>
>> Step1) Start four nodes and send cib.xml.
>> Step2) Block Heartbeat communication to split the cluster into two-node partitions.
>> Step3) The nodes freeze.
>> Step4) On both nodes of one partition, we stop Heartbeat at the same time.
>>
>> [r...@srv03 ~]# service heartbeat stop
>> Stopping High-Availability services:
>> [r...@srv04 ~]# service heartbeat stop
>> Stopping High-Availability services:
>>
>> Step5) Heartbeat on one node stops within a few minutes.
>> [r...@srv04 ~]# service heartbeat stop
>> Stopping High-Availability services: [ OK ]
>>
>> Step6) But Heartbeat on the other node does not stop until considerably more time passes.
>> * The shutdown-escalation timer starts, but the time we set (5min) does not seem to take effect.
>>
>> [r...@srv03 ~]# service heartbeat stop
>> Stopping High-Availability services: [ OK ]
>>
>> Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC: srv03
>> Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request for srv03 (state=S_IDLE)
>> Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us average, 5% utilization) in the last 10min
>> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP) just popped!
>> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: do_log: FSA: Input I_STOP from crm_timer_popped() received in state S_IDLE
>> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_state_transition: State transition S_IDLE -> S_STOPPING [ input=I_STOP cause=C_TIMER_POPPED origin=crm_timer_popped ]
>> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_dc_release: DC role released
>> Oct 21 16:57:20 srv03 crmd: [4432]: info: stop_subsystem: Sent -TERM to pengine: [5007]
>>
>>
>> Is it correct behavior for this service stop to take so long?
>
> It's what I would expect to happen, but it's possibly not ideal.

Wait, I think I read that wrong.
I would expect that, no matter what, pacemaker would exit after shutdown-escalation.
You're saying it didn't? Better create a bug and attach the logs.

>
>> * Because the log was very big, I did not attach it.
>> * If the log is necessary, I will send it via Bugzilla.
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: [email protected]
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>
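
For anyone filing the bug, it may help to state the size of the delay explicitly. The following is only a rough sketch in Python: it takes the two srv03 timestamps from the quoted log (the shutdown request and the "Shutdown Escalation (I_STOP) just popped!" message) and compares the gap with the 5min value given in the report. The year and the variable names are assumptions made for illustration, since syslog lines carry no year, and it assumes the timer is expected to start at the shutdown request.

```python
from datetime import datetime, timedelta

# Timestamps copied from the srv03 log lines quoted in this thread.
# The year 2010 is assumed from the mail date; it does not change the difference.
shutdown_requested = datetime(2010, 10, 21, 16, 46, 57)  # do_shutdown_req
escalation_popped = datetime(2010, 10, 21, 16, 57, 20)   # crm_timer_popped

# shutdown-escalation value as stated in the report (5min).
configured = timedelta(minutes=5)

elapsed = escalation_popped - shutdown_requested
overshoot = elapsed - configured

print("elapsed:   ", elapsed)
print("configured:", configured)
print("overshoot: ", overshoot)
```

By these two log lines the timer popped a little over ten minutes after the shutdown request, which matches the report that the 5min setting did not seem to take effect. The sketch says nothing about why.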
