On Thu, Oct 28, 2010 at 3:11 AM, <[email protected]> wrote:
> Hi Andrew,
>
>> Wait, I think I read that wrong.
>> I would expect that no-matter what that pacemaker would exit after
>> shutdown-escalation.
>>
>> You're saying it didn't?
>> Better create a bug and attach the logs.
>
> At the time of Step4, srv03 and srv04 requested a stop of the Heartbeat
> service.
>
> According to the log, the stop request for srv03 was made at 16:46:57.
>
> Because I set "shutdown-escalation" to five minutes, I expected the
> srv03 node to stop at about 16:52:00.
>
> But the srv03 node started stopping at 16:57:20.
>
> Is my understanding of "shutdown-escalation" wrong?

I don't think so, I think you probably found a bug.

>
>> Better create a bug and attach the logs.
>
> ok.
> Please wait....
>
> Best Regards,
> Hideo Yamauchi.
>
>> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC: srv03
>> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request for srv03 (state=S_IDLE)
>> >> Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us average, 5% utilization) in the last 10min
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP) just popped!
>
> --- Andrew Beekhof <[email protected]> wrote:
>
>> On Wed, Oct 27, 2010 at 12:36 PM, Andrew Beekhof <[email protected]> wrote:
>> > On Thu, Oct 21, 2010 at 10:30 AM, <[email protected]> wrote:
>> >> Hi,
>> >>
>> >> We checked the behavior when no-quorum-policy is set to freeze.
>> >> In a cluster where the freeze setting had taken effect, we stopped
>> >> the service.
>> >>
>> >> However, stopping the service took a very long time.
>> >>
>> >> We set "shutdown-escalation" to five minutes to shorten the test.
>> >> But stopping the service on one node takes more than five minutes.
>> >>
>> >> I confirmed it with the following procedure.
>> >>
>> >> Step1) Start four nodes and send cib.xml.
>> >> Step2) Block Heartbeat communication so that the cluster is divided
>> >> into two-node partitions.
>> >> Step3) The nodes freeze.
>> >> Step4) On the two nodes of one partition, we stop Heartbeat at the
>> >> same time.
>> >>
>> >> [r...@srv03 ~]# service heartbeat stop
>> >> Stopping High-Availability services:
>> >> [r...@srv04 ~]# service heartbeat stop
>> >> Stopping High-Availability services:
>> >>
>> >> Step5) Heartbeat on one node stops within a few minutes.
>> >> [r...@srv04 ~]# service heartbeat stop
>> >> Stopping High-Availability services:                    [  OK  ]
>> >>
>> >> Step6) But Heartbeat on the other node does not stop until more
>> >> time has passed.
>> >>  * The shutdown-escalation timer starts, but the time we set (5min)
>> >>    does not seem to take effect.
>> >>
>> >> [r...@srv03 ~]# service heartbeat stop
>> >> Stopping High-Availability services:                    [  OK  ]
>> >>
>> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC: srv03
>> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request for srv03 (state=S_IDLE)
>> >> Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us average, 5% utilization) in the last 10min
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP) just popped!
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: do_log: FSA: Input I_STOP from crm_timer_popped() received in state S_IDLE
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_state_transition: State transition S_IDLE -> S_STOPPING [ input=I_STOP cause=C_TIMER_POPPED origin=crm_timer_popped ]
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_dc_release: DC role released
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: stop_subsystem: Sent -TERM to pengine: [5007]
>> >>
>> >>
>> >> Is it correct behavior for this service stop to take so long?
>> >
>> > It's what I would expect to happen, but it's possibly not ideal.
>>
>> Wait, I think I read that wrong.
>> I would expect that no-matter what that pacemaker would exit after
>> shutdown-escalation.
>>
>> You're saying it didn't?
>> Better create a bug and attach the logs.
>>
>> >
>> >>  * Because the log was very big, I did not attach it.
>> >>  * If the log is necessary, I will send it in Bugzilla.
>> >>
>> >> Best Regards,
>> >> Hideo Yamauchi.
>> >>
>> >>
>> >> _______________________________________________
>> >> Pacemaker mailing list: [email protected]
>> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >>
>> >> Project Home: http://www.clusterlabs.org
>> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
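
The timing gap discussed in this thread can be checked with simple arithmetic on the two srv03 log timestamps. What follows is only a minimal Python sketch and is not part of the original messages. The two log lines are copied from the srv03 excerpt quoted in the thread, with their line wraps joined. The year 2010 is an assumption taken from the message dates, since syslog stamps omit it. The helper names parse_timestamp and find_time are hypothetical.

```python
from datetime import datetime, timedelta

# Log lines copied from the srv03 excerpt in this thread (line wraps joined).
LOG_LINES = [
    "Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: "
    "Sending shutdown request to DC: srv03",
    "Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: "
    "Shutdown Escalation (I_STOP) just popped!",
]

# Value the reporter says was configured for shutdown-escalation.
CONFIGURED_ESCALATION = timedelta(minutes=5)


def parse_timestamp(line: str, year: int = 2010) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' syslog stamp.

    Syslog omits the year, so it is passed in (2010 is an assumption).
    %b expects English month abbreviations in the default C locale.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def find_time(lines, marker: str) -> datetime:
    """Return the timestamp of the first line that contains marker."""
    for line in lines:
        if marker in line:
            return parse_timestamp(line)
    raise ValueError(f"marker not found: {marker}")


requested = find_time(LOG_LINES, "do_shutdown_req")
popped = find_time(LOG_LINES, "crm_timer_popped")

observed = popped - requested                      # actual wait before I_STOP
expected_pop = requested + CONFIGURED_ESCALATION   # when 5min would have expired
overshoot = observed - CONFIGURED_ESCALATION

print("shutdown requested:     ", requested.time())
print("expected escalation by: ", expected_pop.time())
print("observed escalation at: ", popped.time())
print("observed delay:         ", observed)
print("overshoot:              ", overshoot)
```

Run as written, this reports an observed delay of 0:10:23 between the shutdown request and the escalation timer firing. That is 5 minutes 23 seconds longer than the configured five minutes, which matches the discrepancy raised in the thread.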
