Re: [Pacemaker] Time to a service stop is very long.

Andrew Beekhof Wed, 27 Oct 2010 03:42:35 -0700

On Wed, Oct 27, 2010 at 12:36 PM, Andrew Beekhof <[email protected]> wrote:
> On Thu, Oct 21, 2010 at 10:30 AM,  <[email protected]> wrote:
>> Hi,
>>
>> We tested the behavior with no-quorum-policy set to freeze.
>> In a cluster where the freeze setting had taken effect, we stopped the service.
>>
>> However, stopping the service took a very long time.
>>
>> We set "shutdown-escalation" to five minutes to shorten the test.
>> But stopping the service on one node takes more than five minutes.
>>
>> I confirmed it with the following procedure.
>>
>> Step1) Start four nodes and load cib.xml.
>> Step2) Block Heartbeat communication to split the cluster into two groups of two nodes.
>> Step3) The nodes freeze.
>> Step4) On the two nodes of one group, we stop Heartbeat at the same time.
>>
>> [r...@srv03 ~]# service heartbeat stop
>> Stopping High-Availability services:
>> [r...@srv04 ~]# service heartbeat stop
>> Stopping High-Availability services:
>>
>> Step5) Heartbeat on one node stops within a few minutes.
>> [r...@srv04 ~]# service heartbeat stop
>> Stopping High-Availability services:                       [  OK  ]
>>
>> Step6) But Heartbeat on the other node does not stop until considerably
>> more time has passed.
>>  * The shutdown-escalation timer starts, but the time we set (5 min)
>> does not seem to take effect.
>>
>> [r...@srv03 ~]# service heartbeat stop
>> Stopping High-Availability services:                       [  OK  ]
>>
>> Oct 21 16:46:57 srv03 crmd: [4432]: info: do_shutdown_req: Sending shutdown request to DC: srv03
>> Oct 21 16:46:57 srv03 crmd: [4432]: info: handle_shutdown_request: Creating shutdown request for srv03 (state=S_IDLE)
>> Oct 21 16:53:07 srv03 cib: [4428]: info: cib_stats: Processed 805 operations (38149.00us average, 5% utilization) in the last 10min
>> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: crm_timer_popped: Shutdown Escalation (I_STOP) just popped!
>> Oct 21 16:57:20 srv03 crmd: [4432]: ERROR: do_log: FSA: Input I_STOP from crm_timer_popped() received in state S_IDLE
>> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_state_transition: State transition S_IDLE -> S_STOPPING [ input=I_STOP cause=C_TIMER_POPPED origin=crm_timer_popped ]
>> Oct 21 16:57:20 srv03 crmd: [4432]: info: do_dc_release: DC role released
>> Oct 21 16:57:20 srv03 crmd: [4432]: info: stop_subsystem: Sent -TERM to pengine: [5007]
>>
>>
>> Is it correct behavior for this service stop to take so long?
>
> It's what I would expect to happen, but it's possibly not ideal.


Wait, I think I read that wrong.
I would expect that, no matter what, Pacemaker would exit after
shutdown-escalation.

You're saying it didn't?
Better create a bug and attach the logs.
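
As a rough check of the timings in the quoted log, the gap between the
shutdown request (16:46:57) and the escalation timer popping (16:57:20) can
be computed with a short script. This is only a sketch: both timestamps are
copied from the srv03 log above, the 5-minute value is the
shutdown-escalation setting described in the report, and it assumes the
timer was started at the moment of the shutdown request.

```python
from datetime import datetime, timedelta

# Both events are on the same day in the quoted log, so only the
# time of day is parsed.
TIME_FORMAT = "%H:%M:%S"
shutdown_requested = datetime.strptime("16:46:57", TIME_FORMAT)
escalation_popped = datetime.strptime("16:57:20", TIME_FORMAT)

# shutdown-escalation as configured for the test (5 minutes).
configured_escalation = timedelta(minutes=5)

# Time between the shutdown request and the timer popping, and how far
# that exceeds the configured value (assuming the timer started at the
# shutdown request).
elapsed = escalation_popped - shutdown_requested
overrun = elapsed - configured_escalation

print(f"elapsed: {elapsed}")
print(f"overrun: {overrun}")
```

If the elapsed time is well above the configured value, as it is here, that
supports attaching the full logs to a bug report.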

>
>>  * Because the log is very large, I did not attach it.
>>  * If the log is needed, I will send it via Bugzilla.
>>
>> Best Regards,
>> Hideo Yamauchi.
>>
>>
>> _______________________________________________
>> Pacemaker mailing list: [email protected]
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: 
>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>
>
