Re: [Pacemaker] Time to a service stop is very long.

Andrew Beekhof Wed, 27 Oct 2010 22:21:29 -0700

On Thu, Oct 28, 2010 at 3:11 AM,  <[email protected]> wrote:
> Hi Andrew,
>
>
>> Wait, I think I read that wrong.
>> I would expect that, no matter what, Pacemaker would exit after
>> shutdown-escalation.
>>
>> You're saying it didn't?
>> Better create a bug and attach the logs.
>
> At Step 4, we requested a stop of the Heartbeat service on srv03 and srv04.
>
> According to the log, srv03 sent its shutdown request at 16:46:57.
>
> Because I set "shutdown-escalation" to five minutes, I expected the srv03
> node to stop at about 16:52:00.
>
> However, the srv03 node did not begin stopping until 16:57:20.
>
> Is my understanding of "shutdown-escalation" wrong?


I don't think so; I think you have probably found a bug.

>
>> Better create a bug and attach the logs.
>
> ok.
> Please wait...
>
> Best Regards,
> Hideo Yamauchi.
>
>> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC: srv03
>> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request for srv03 (state=S_IDLE)
>> >> Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us average, 5% utilization) in the last 10min
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP) just popped!
>
>
>
> --- Andrew Beekhof <[email protected]> wrote:
>
>> On Wed, Oct 27, 2010 at 12:36 PM, Andrew Beekhof <[email protected]> wrote:
>> > On Thu, Oct 21, 2010 at 10:30 AM, <[email protected]> wrote:
>> >> Hi,
>> >>
>> >> We tested the cluster's behavior with no-quorum-policy set to freeze.
>> >> While the freeze setting was in effect, we stopped the service.
>> >>
>> >> However, stopping the service took a very long time.
>> >>
>> >> To shorten the test, we set "shutdown-escalation" to five minutes.
>> >> But stopping the service on one node still takes more than five minutes.
>> >>
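For reference, the two cluster properties used in this test live in the crm_config section of cib.xml. The following is a minimal Python sketch, not part of Pacemaker, that reads them back and converts the escalation value to seconds. Assumptions: the XML fragment is hypothetical (only the property names and the values freeze and 5min come from this thread; the nvpair ids are assumed), and duration_to_seconds is a hypothetical helper that recognises only a simplified subset of time units, treating a bare number as seconds.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical crm_config fragment: property names and values come from the
# test described above; the ids follow the common cib-bootstrap-options naming
# and are assumptions, not copied from the poster's cib.xml.
CIB_FRAGMENT = """
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <nvpair id="cib-bootstrap-options-no-quorum-policy"
            name="no-quorum-policy" value="freeze"/>
    <nvpair id="cib-bootstrap-options-shutdown-escalation"
            name="shutdown-escalation" value="5min"/>
  </cluster_property_set>
</crm_config>
"""

# Simplified unit table (assumption): a bare number is treated as seconds,
# and only these suffixes are recognised.
UNIT_SECONDS = {"": 1, "s": 1, "sec": 1, "m": 60, "min": 60, "h": 3600, "hr": 3600}


def duration_to_seconds(value: str) -> int:
    """Convert a duration such as '5min' or '300' into whole seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value)
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def read_cluster_properties(xml_text: str) -> dict:
    """Map every nvpair name to its value in a crm_config fragment."""
    root = ET.fromstring(xml_text.strip())
    return {nv.get("name"): nv.get("value") for nv in root.iter("nvpair")}


props = read_cluster_properties(CIB_FRAGMENT)
escalation = duration_to_seconds(props["shutdown-escalation"])
print(props["no-quorum-policy"], escalation)  # freeze 300
```

With shutdown-escalation at 5min, a node that requests shutdown should be forced down roughly 300 seconds later, which is the expectation the report is testing.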
>> >> I reproduced it with the following procedure.
>> >>
>> >> Step1) Start four nodes and send cib.xml.
>> >> Step2) Block Heartbeat communication to split the cluster into two partitions of two nodes each.
>> >> Step3) The nodes freeze.
>> >> Step4) In one of the two partitions, we stop Heartbeat on both nodes at the same time.
>> >>
>> >> [r...@srv03 ~]# service heartbeat stop
>> >> Stopping High-Availability services:
>> >> [r...@srv04 ~]# service heartbeat stop
>> >> Stopping High-Availability services:
>> >>
>> >> Step5) Heartbeat on one node stops within a few minutes.
>> >> [r...@srv04 ~]# service heartbeat stop
>> >> Stopping High-Availability services:                       [  OK  ]
>> >>
>> >> Step6) But Heartbeat on the other node does not stop until much more time has passed.
>> >>  * The shutdown-escalation timer starts, but the value we set (5min) does
>> >>    not seem to take effect.
>> >>
>> >> [r...@srv03 ~]# service heartbeat stop
>> >> Stopping High-Availability services:                       [  OK  ]
>> >>
>> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC: srv03
>> >> Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request for srv03 (state=S_IDLE)
>> >> Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us average, 5% utilization) in the last 10min
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP) just popped!
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: do_log: FSA: Input I_STOP from crm_timer_popped() received in state S_IDLE
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_state_transition: State transition S_IDLE -> S_STOPPING [ input=I_STOP cause=C_TIMER_POPPED origin=crm_timer_popped ]
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_dc_release: DC role released
>> >> Oct 21 16:57:20 srv03 crmd: [4432]: info: stop_subsystem: Sent -TERM to pengine: [5007]
>> >>
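The gap between the shutdown request and the escalation timer can be checked directly from the timestamps in the log lines above. This is a minimal Python sketch under stated assumptions: it uses the srv03 lines with the mail client's wrapping undone, supplies the year 2010 because syslog omits it, takes the 5min shutdown-escalation value from this report, and first_event_time is a hypothetical helper. It only measures the delay; it does not explain it.

```python
import re
from datetime import datetime, timedelta

# The srv03 lines quoted above, with the mail client's line wrapping undone.
LOG = """\
Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC: srv03
Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us average, 5% utilization) in the last 10min
Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP) just popped!
"""

SHUTDOWN_ESCALATION = timedelta(minutes=5)  # value set for this test
ASSUMED_YEAR = 2010  # syslog timestamps omit the year; taken from the mail date
TIMESTAMP = re.compile(r"^([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) ")


def first_event_time(log: str, marker: str) -> datetime:
    """Return the timestamp of the first log line containing marker (hypothetical helper)."""
    for line in log.splitlines():
        match = TIMESTAMP.match(line)
        if match and marker in line:
            return datetime.strptime(f"{ASSUMED_YEAR} {match.group(1)}", "%Y %b %d %H:%M:%S")
    raise LookupError(f"no log line contains {marker!r}")


requested = first_event_time(LOG, "do_shutdown_req")
popped = first_event_time(LOG, "Shutdown Escalation")
expected = requested + SHUTDOWN_ESCALATION

print("requested:", requested.time())  # 16:46:57
print("expected: ", expected.time())   # 16:51:57
print("popped:   ", popped.time())     # 16:57:20
print("overshoot:", popped - expected)  # 0:05:23
```

The timer fired 10 minutes 23 seconds after the request, about 5 minutes 23 seconds later than a 5min escalation would predict, which matches the roughly 16:52 expectation stated in the follow-up.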
>> >>
>> >> Is it correct behavior for the service stop to take this long?
>> >
>> > It's what I would expect to happen, but it's possibly not ideal.
>>
>> Wait, I think I read that wrong.
>> I would expect that, no matter what, Pacemaker would exit after
>> shutdown-escalation.
>>
>> You're saying it didn't?
>> Better create a bug and attach the logs.
>>
>> >
>> >>  * The log is very large, so I did not attach it.
>> >>  * If you need the log, I will attach it in Bugzilla.
>> >>
>> >> Best Regards,
>> >> Hideo Yamauchi.
>> >>
>> >>
>> >> _______________________________________________
>> >> Pacemaker mailing list: [email protected]
>> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >>
>> >> Project Home: http://www.clusterlabs.org
>> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >> Bugs: 
>> >> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>> >>
>> >
>>