Hi Andrew,
> Wait, I think I read that wrong.
> I would expect that no-matter what that pacemaker would exit after
> shutdown-escalation.
>
> You're saying it didn't?
> Better create a bug and attach the logs.

In Step 4, srv03 and srv04 each requested a stop of the Heartbeat service.
According to the log, srv03 sent its stop request at 16:46:57.

Because I set "shutdown-escalation" to five minutes, I expected the srv03 node to stop at about 16:52:00.
However, the srv03 node did not begin stopping until 16:57:20.

Is my understanding of "shutdown-escalation" wrong?

> Better create a bug and attach the logs.

OK. Please wait.

Best Regards,
Hideo Yamauchi.

> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC: srv03
> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request for srv03 (state=S_IDLE)
> >> Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us average, 5% utilization) in the last 10min
> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP) just popped!

--- Andrew Beekhof <[email protected]> wrote:

> On Wed, Oct 27, 2010 at 12:36 PM, Andrew Beekhof <[email protected]> wrote:
> > On Thu, Oct 21, 2010 at 10:30 AM, <[email protected]> wrote:
> >> Hi,
> >>
> >> We confirmed movement when we set freeze in no-quorum-policy.
> >> In the cluster that freeze setting became effective, we stopped the
> >> service.
> >>
> >> However, a stop of the service took time very much.
> >>
> >> We set "shutdown-escalation" for five minutes to shorten the time for test.
> >> But, a stop of the service of one node takes time more than five minutes.
> >>
> >> I confirmed it in the next procedure.
> >>
> >> Step1) Start four nodes and send cib.xml.
> >> Step2) Intercept Heartbeat communication and divide it in two nodes.
> >> Step3) The node does freeze.
> >> Step4) In two divided one nodes, we stop Heartbeat at the same time.
> >>
> >> [r...@srv03 ~]# service heartbeat stop
> >> Stopping High-Availability services:
> >> [r...@srv04 ~]# service heartbeat stop
> >> Stopping High-Availability services:
> >>
> >> Step5) Heartbeat of one node stops in a few minutes.
> >> [r...@srv04 ~]# service heartbeat stop
> >> Stopping High-Availability services:                    [  OK  ]
> >>
> >> Step6) But, Heartbeat of one node does not stop anymore unless,
> >> furthermore, time passes.
> >>  * The timer of shutdown-escalation starts, but time when we set
> >>    it(5min) does not seem to become effective.
> >>
> >> [r...@srv03 ~]# service heartbeat stop
> >> Stopping High-Availability services:                    [  OK  ]
> >>
> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC: srv03
> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request for srv03 (state=S_IDLE)
> >> Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us average, 5% utilization) in the last 10min
> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP) just popped!
> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: do_log: FSA: Input I_STOP from crm_timer_popped() received in state S_IDLE
> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_state_transition: State transition S_IDLE -> S_STOPPING [ input=I_STOP cause=C_TIMER_POPPED origin=crm_timer_popped ]
> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_dc_release: DC role released
> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: stop_subsystem: Sent -TERM to pengine: [5007]
> >>
> >>
> >> Is it right movement to take time to this service stop?
> >
> > It's what I would expect to happen, but its possibly not ideal.
>
> Wait, I think I read that wrong.
> I would expect that no-matter what that pacemaker would exit after
> shutdown-escalation.
>
> You're saying it didn't?
> Better create a bug and attach the logs.
>
> >
> >>  * Because the log was very big, I did not attach it.
> >>  * If log is necessary, I send it in Bugzilla.
> >>
> >> Best Regards,
> >> Hideo Yamauchi.
> >>
> >>
> >> _______________________________________________
> >> Pacemaker mailing list: [email protected]
> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>
> >> Project Home: http://www.clusterlabs.org
> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
