Re: [Pacemaker] Time to a service stop is very long.

renayama19661014 Wed, 27 Oct 2010 18:14:41 -0700

Hi Andrew,


> Wait, I think I read that wrong.
> I would expect that, no matter what, pacemaker would exit after
> shutdown-escalation.
> 
> You're saying it didn't?
> Better create a bug and attach the logs.

At Step4, a stop of the Heartbeat service was requested on both srv03 and srv04.

According to the log, the stop request on srv03 was issued at 16:46:57.

Because I set "shutdown-escalation" to five minutes, I expected the srv03
node to stop at about 16:52:00.

However, the srv03 node only began to stop at 16:57:20.

Is my understanding of "shutdown-escalation" wrong?
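
To make the arithmetic explicit, here is a minimal Python sketch of the expected and observed times. The two timestamps come from the srv03 log; the year is not present in the log lines, so 2010 is an assumption, and the variable names are only illustrative.

```python
from datetime import datetime, timedelta

# Timestamps from the srv03 log. The year is assumed (syslog lines omit it).
shutdown_requested = datetime(2010, 10, 21, 16, 46, 57)
escalation_popped = datetime(2010, 10, 21, 16, 57, 20)

# shutdown-escalation as configured for this test.
shutdown_escalation = timedelta(minutes=5)

# When the escalation timer should have fired if it started at the request.
expected_pop = shutdown_requested + shutdown_escalation

# How long the node actually waited, and how far past the setting that is.
observed_wait = escalation_popped - shutdown_requested
overrun = escalation_popped - expected_pop

print("expected escalation:", expected_pop.time())
print("observed wait:      ", observed_wait)
print("overrun:            ", overrun)
```

By this calculation the timer popped more than five minutes later than the configured value would suggest.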

> Better create a bug and attach the logs.

ok.
Please wait....

Best Regards,
Hideo Yamauchi.

> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC: srv03
> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request for srv03 (state=S_IDLE)
> >> Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us average, 5% utilization) in the last 10min
> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP) just popped!
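
For a larger log, the same two events could be located with a script instead of by eye. The following is only a rough sketch: the regular expression is my assumption about the line layout seen in the excerpt above, `event_time` is a hypothetical helper, and the year is assumed because the log lines do not carry one.

```python
import re
from datetime import datetime

# The four lines quoted above, unwrapped.
LOG = """\
Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC: srv03
Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request for srv03 (state=S_IDLE)
Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us average, 5% utilization) in the last 10min
Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP) just popped!
"""

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> <daemon>: [<pid>]: <level>: <function>: <message>"
LINE = re.compile(
    r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) (\S+): \[\d+\]: \w+: (\w+): "
)

def event_time(log, function, year=2010):
    """Return the timestamp of the first line logged by `function`, or None.

    Hypothetical helper. Month names are parsed with %b, so an English
    (or C) locale is assumed.
    """
    for line in log.splitlines():
        m = LINE.match(line)
        if m and m.group(4) == function:
            return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return None

requested = event_time(LOG, "do_shutdown_req")
popped = event_time(LOG, "crm_timer_popped")
elapsed = popped - requested
print("shutdown request to escalation:", elapsed)
```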



--- Andrew Beekhof <[email protected]> wrote:

> On Wed, Oct 27, 2010 at 12:36 PM, Andrew Beekhof <[email protected]> wrote:
> > On Thu, Oct 21, 2010 at 10:30 AM, <[email protected]> wrote:
> >> Hi,
> >>
> >> We checked the behavior with no-quorum-policy set to freeze.
> >> In a cluster where the freeze setting had taken effect, we stopped the
> >> service.
> >>
> >> However, stopping the service took a very long time.
> >>
> >> We set "shutdown-escalation" to five minutes to shorten the test.
> >> Even so, stopping the service on one node takes more than five minutes.
> >>
> >> I confirmed this with the following procedure.
> >>
> >> Step1) Start four nodes and send cib.xml.
> >> Step2) Block Heartbeat communication to split the cluster into two groups of two nodes.
> >> Step3) The nodes freeze.
> >> Step4) On the two nodes of one group, we stop Heartbeat at the same time.
> >>
> >> [r...@srv03 ~]# service heartbeat stop
> >> Stopping High-Availability services:
> >> [r...@srv04 ~]# service heartbeat stop
> >> Stopping High-Availability services:
> >>
> >> Step5) Heartbeat on one node stops within a few minutes.
> >> [r...@srv04 ~]# service heartbeat stop
> >> Stopping High-Availability services:                    [  OK  ]
> >>
> >> Step6) However, Heartbeat on the other node does not stop until much
> >> more time has passed.
> >>  * The shutdown-escalation timer starts, but the time we set (5min)
> >> does not seem to take effect.
> >>
> >> [r...@srv03 ~]# service heartbeat stop
> >> Stopping High-Availability services:                    [  OK  ]
> >>
> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC: srv03
> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request for srv03 (state=S_IDLE)
> >> Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us average, 5% utilization) in the last 10min
> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP) just popped!
> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: do_log: FSA: Input I_STOP from crm_timer_popped() received in state S_IDLE
> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_state_transition: State transition S_IDLE -> S_STOPPING [ input=I_STOP cause=C_TIMER_POPPED origin=crm_timer_popped ]
> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_dc_release: DC role released
> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: stop_subsystem: Sent -TERM to pengine: [5007]
> >>
> >>
> >> Is it correct behavior for the service stop to take this long?
> >
> > It's what I would expect to happen, but it's possibly not ideal.
> 
> Wait, I think I read that wrong.
> I would expect that, no matter what, pacemaker would exit after
> shutdown-escalation.
> 
> You're saying it didn't?
> Better create a bug and attach the logs.
> 
> >
> >>  * Because the log was very big, I did not attach it.
> >>  * If the log is needed, I will send it via Bugzilla.
> >>
> >> Best Regards,
> >> Hideo Yamauchi.
> >>
> >>
> >> _______________________________________________
> >> Pacemaker mailing list: [email protected]
> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>
> >> Project Home: http://www.clusterlabs.org
> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >> Bugs: 
> >> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
> >>
> >
> 

