On Thu, Oct 21, 2010 at 10:30 AM, <[email protected]> wrote:
> Hi,
>
> We checked the behavior when no-quorum-policy is set to freeze.
> In a cluster where the freeze setting had taken effect, we stopped the service.
>
> However, stopping the service took a very long time.
>
> We set "shutdown-escalation" to five minutes to shorten the time for the test.
> But stopping the service on one node takes more than five minutes.
>
> I confirmed it with the following procedure.
>
> Step1) Start four nodes and send cib.xml.
> Step2) Intercept Heartbeat communication and split the cluster into two groups of two nodes.
> Step3) The nodes freeze.
> Step4) On the two nodes of one partition, we stop Heartbeat at the same time.
>
> [r...@srv03 ~]# service heartbeat stop
> Stopping High-Availability services:
> [r...@srv04 ~]# service heartbeat stop
> Stopping High-Availability services:
>
> Step5) Heartbeat on one node stops within a few minutes.
> [r...@srv04 ~]# service heartbeat stop
> Stopping High-Availability services: [ OK ]
>
> Step6) But Heartbeat on the other node does not stop until considerably more time has passed.
> * The shutdown-escalation timer starts, but the time we set (5min) does not seem to take effect.
>
> [r...@srv03 ~]# service heartbeat stop
> Stopping High-Availability services: [ OK ]
>
> Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC: srv03
> Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request for srv03 (state=S_IDLE)
> Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us average, 5% utilization) in the last 10min
> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP) just popped!
> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: do_log: FSA: Input I_STOP from crm_timer_popped() received in state S_IDLE
> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_state_transition: State transition S_IDLE -> S_STOPPING [ input=I_STOP cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_dc_release: DC role released
> Oct 21 16:57:20 srv03 crmd: [4432]: info: stop_subsystem: Sent -TERM to pengine: [5007]
>
>
> Is it correct behavior for this service stop to take so long?

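For reference, the gap between the shutdown request and the escalation timer firing can be read off the two srv03 timestamps in the quoted log. A minimal sketch in Python, assuming the year 2010 (taken from the message date, since syslog timestamps omit it) and an English/C locale for the month name; the helper name `elapsed_seconds` is only illustrative:

```python
from datetime import datetime

# Syslog timestamps carry no year, so one is prepended before parsing.
# "%b" expects an English month abbreviation ("Oct") in the C/English locale.
LOG_FORMAT = "%Y %b %d %H:%M:%S"


def elapsed_seconds(start: str, end: str, year: int = 2010) -> float:
    """Return the seconds between two syslog-style timestamps (same year assumed)."""
    t0 = datetime.strptime(f"{year} {start}", LOG_FORMAT)
    t1 = datetime.strptime(f"{year} {end}", LOG_FORMAT)
    return (t1 - t0).total_seconds()


# Timestamps copied from the quoted srv03 log.
shutdown_requested = "Oct 21 16:46:57"  # do_shutdown_req
escalation_popped = "Oct 21 16:57:20"   # crm_timer_popped

gap = elapsed_seconds(shutdown_requested, escalation_popped)
print(f"{gap:.0f} s ({gap / 60:.1f} min)")  # 623 s (10.4 min)
```

That is roughly ten minutes between the request and the escalation, against the five minutes reported as configured. The excerpt does not show when the timer was actually armed, so this measures only request-to-escalation time.
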
It's what I would expect to happen, but it's possibly not ideal.

> * Because the log was very big, I did not attach it.
> * If the log is necessary, I will send it via Bugzilla.
>
> Best Regards,
> Hideo Yamauchi.
>
>
> _______________________________________________
> Pacemaker mailing list: [email protected]
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
