Re: [ClusterLabs] How does failure-timeout works, will the resource not be scheduled when setting too short?

2018-05-27 Thread Ken Gaillot
On Sun, 2018-05-20 at 10:19 +0800, lkxjtu wrote:
> 
> I have two pacemaker resources. We call them A and B. Because of
> environmental reasons, their start methods and monitor methods always
> return failure (OCF_ERR_GENERIC). The following are their
> configurations (the cluster property start-failure-is-fatal is false):
> 
> primitive A A \
>     op monitor interval=20 timeout=120 \
>     op stop interval=0 timeout=120 on-fail=restart \
>     op start interval=0 timeout=240 on-fail=restart \
>     meta failure-timeout=60s
> primitive B B \
>     op monitor interval=20 timeout=120 \
>     op stop interval=0 timeout=120 on-fail=restart \
>     op start interval=0 timeout=240 on-fail=restart \
>     meta failure-timeout=60s
> clone A_cl A
> clone B_cl B
> 
> Their operations take different amounts of time:
> A:
> start = 60s   monitor < 1s    stop = 80s
> B:
> start < 1s    monitor < 1s    stop < 1s 
> 
> Resource A is scheduled normally: it is repeatedly started and
> stopped. But for resource B, only the recurring monitor fails over
> and over, with no start or stop ever attempted. Also, no fail-count
> for B is shown in "crm status -f".
> 
> Either of two changes gets B scheduled again:
> 1. Set failure-timeout of B from 60s to 600s
> 2. Modify the OCF agent of A so that its stop method returns as soon
>    as possible
> 
> I tested it several times, and the results were the same. Why is the
> resource not scheduled when failure-timeout is set too short? And what
> does it have to do with the time-consuming stop of another resource?
> Is this a bug?
> 
> My pacemaker version is 1.1.16. Any suggestion is welcome. Thank you!
> 
> 
> James
> 2018-05-20

That behavior is unexpected. Can you share logs?
-- 
Ken Gaillot 
___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] How does failure-timeout works, will the resource not be scheduled when setting too short?

2018-05-19 Thread lkxjtu

I have two pacemaker resources. We call them A and B. Because of environmental
reasons, their start methods and monitor methods always return failure
(OCF_ERR_GENERIC). The following are their configurations (the cluster property
start-failure-is-fatal is false):

primitive A A \
    op monitor interval=20 timeout=120 \
    op stop interval=0 timeout=120 on-fail=restart \
    op start interval=0 timeout=240 on-fail=restart \
    meta failure-timeout=60s
primitive B B \
    op monitor interval=20 timeout=120 \
    op stop interval=0 timeout=120 on-fail=restart \
    op start interval=0 timeout=240 on-fail=restart \
    meta failure-timeout=60s
clone A_cl A
clone B_cl B

Their operations take different amounts of time:
A:
start = 60s   monitor < 1s    stop = 80s
B:
start < 1s    monitor < 1s    stop < 1s
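
For anyone trying to reproduce this, the timings for A could be imitated with a stand-in script along the following lines. This is only a sketch, not the real agent: the function names and the START_DELAY / STOP_DELAY variables are made up for illustration, the meta-data and validate-all actions a full OCF agent needs are left out, and it assumes that stop succeeds (the post only says that start and monitor fail).

```shell
#!/bin/sh
# Hypothetical stand-in for resource A's agent (not a complete OCF agent).
# Standard OCF return codes:
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_UNIMPLEMENTED=3

# Assumed knobs, in seconds; the defaults mirror A's reported timings.
: "${START_DELAY:=60}"
: "${STOP_DELAY:=80}"

# start takes a long time and then fails
agent_start()   { sleep "$START_DELAY"; return "$OCF_ERR_GENERIC"; }
# monitor fails immediately
agent_monitor() { return "$OCF_ERR_GENERIC"; }
# stop takes a long time; assumed to succeed
agent_stop()    { sleep "$STOP_DELAY"; return "$OCF_SUCCESS"; }

agent_main() {
    case "$1" in
        start)   agent_start ;;
        monitor) agent_monitor ;;
        stop)    agent_stop ;;
        *)       return "$OCF_ERR_UNIMPLEMENTED" ;;
    esac
}

# Dispatch only when an action was passed on the command line.
if [ $# -gt 0 ]; then
    agent_main "$1"
    exit $?
fi
```

Setting both delays to under a second would give the same shape for B.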

Resource A is scheduled normally: it is repeatedly started and stopped. But for
resource B, only the recurring monitor fails over and over, with no start or
stop ever attempted. Also, no fail-count for B is shown in "crm status -f".

Either of two changes gets B scheduled again:
1. Set failure-timeout of B from 60s to 600s
2. Modify the OCF agent of A so that its stop method returns as soon as possible
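
For reference, the first change would amount to the following definition of B, assuming everything else stays as posted above (only the meta line differs):

```
primitive B B \
    op monitor interval=20 timeout=120 \
    op stop interval=0 timeout=120 on-fail=restart \
    op start interval=0 timeout=240 on-fail=restart \
    meta failure-timeout=600s
```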

I tested it several times, and the results were the same. Why is the resource
not scheduled when failure-timeout is set too short? And what does it have to
do with the time-consuming stop of another resource? Is this a bug?

My pacemaker version is 1.1.16. Any suggestion is welcome. Thank you!


James
2018-05-20