Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-10 Thread Alan Robertson
On 09/07/2014 08:12 PM, Andrew Beekhof wrote:
 On 5 Sep 2014, at 2:22 pm, renayama19661...@ybb.ne.jp wrote:

 I looked into it a little, and it seems to be caused by a timer event in
 lrmd's g_main_loop firing after less time than the monitor timeout.
 So if you create a trivial program with g_main_loop and a timer, and then
 change the system time, does the timer expire early?

 This problem does not seem to happen with the lrmd of PM1.0.
 cluster-glue was probably using custom timeout code.

Yes it was.  Exactly because of this well-known problem.  I think recent
versions of the glib code have fixed that.  I can't tell you how many
different bugs we ran into that related to timing - like this -- or time
wraparound.  There were probably a dozen time-related bugs.  Most of
them weren't in the LRM code, but the rest of the universe -- like this one.

I filed the bug against glib probably 8-10 years ago.  It takes a while
for things to get fixed, then it takes even longer for them to get fixed
in the various distros.  Some of them are 5+ years behind current code. 
FreeBSD had a similar problem - even with our custom code because they
weren't following POSIX.  But I filed the bug against them, and they
fixed it (eventually).

-- Alan Robertson
   al...@unix.sh

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-09 Thread renayama19661014
Hi Andrew,

I tested this in various ways.

The behavior depends on the version of glib.
 * The problem occurs on RHEL6.x.
 * The problem does not occur on RHEL7.0.

So the problem is already fixed in newer versions of glib.

The following glib change appears to be the fix:
 * https://github.com/GNOME/glib/commit/91113a8aeea40cc2d7dda65b09537980bb602a06#diff-fc9b4bb280a13f8e51c51b434e7d26fd

Many users still need correct behavior with the old glib.
 * At least until they move to RHEL7...

Would you consider modifying Pacemaker to support the old version?
 * It could be modeled on the old G_() function.

Best Regards,
Hideo Yamauchi.





Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-09 Thread Andrew Beekhof

On 10 Sep 2014, at 2:48 pm, renayama19661...@ybb.ne.jp wrote:

 Hi Andrew,
 
 I tested this in various ways.

 The behavior depends on the version of glib.
  * The problem occurs on RHEL6.x.
  * The problem does not occur on RHEL7.0.

 So the problem is already fixed in newer versions of glib.

 The following glib change appears to be the fix:
  * https://github.com/GNOME/glib/commit/91113a8aeea40cc2d7dda65b09537980bb602a06#diff-fc9b4bb280a13f8e51c51b434e7d26fd

 Many users still need correct behavior with the old glib.
  * At least until they move to RHEL7...

 Would you consider modifying Pacemaker to support the old version?
  * It could be modeled on the old G_() function.

I'll file a bug against glib on RHEL6 so that it gets fixed there.
Can you send me your simple reproducer program?

 


Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-09 Thread renayama19661014
Hi Andrew,

Thank you for your comments.

 I'll file a bug against glib on RHEL6 so that it gets fixed there.
 Can you send me your simple reproducer program?




I change the system time while timer_func2() is firing periodically.
After that, the next time timer_func2() runs, the time-out of timer_func()
also fires, earlier than planned.

-
#include <stdio.h>
#include <stdlib.h>
#include <glib.h>
#include <sys/times.h>

/* One-shot timer: reports that it expired and exits. */
gboolean timer_func(gpointer data){
        printf("TIMER EXPIRE!\n");
        fflush(stdout);
        exit(1);
        return FALSE;   /* not reached */
}

/* Periodic timer: prints the current tick count every 5 seconds. */
gboolean timer_func2(gpointer data){
        clock_t         ret;
        struct tms buff;

        ret = times(&buff);

        printf("TIMER2 EXPIRE! %ld\n", (long)ret);
        fflush(stdout);
        return TRUE;
}

int main(int argc, char** argv){
        GMainLoop *m;
        clock_t         ret;
        struct tms buff;

        m = g_main_loop_new(NULL, FALSE);
        g_timeout_add(5000, timer_func2, NULL);
        /* The archived copy shows "6" here, which looks truncated.
           60000 ms is assumed; it must be longer than the 5000 ms timer. */
        g_timeout_add(60000, timer_func, NULL);
        ret = times(&buff);
        printf("START! %ld\n", (long)ret);
        g_main_loop_run(m);
        return 0;
}
-



Many Thanks,
Hideo Yamauchi.




Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-08 Thread renayama19661014
Hi Andrew,

 I looked into it a little, and it seems to be caused by a timer event in
 lrmd's g_main_loop firing after less time than the monitor timeout.

 So if you create a trivial program with g_main_loop and a timer, and
 then change the system time, does the timer expire early?

 Yes.

 That sounds like a glib bug. Ideally we'd get it fixed there rather than
 work-around it in pacemaker.
 Have you spoken to them at all?

No.
I will investigate the glib library a little more, and then talk to the
glib community.

I may get back to you on this afterwards.

Many Thanks,
Hideo Yamauchi.




Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-08 Thread Andrew Beekhof

On 8 Sep 2014, at 7:12 pm, renayama19661...@ybb.ne.jp wrote:

 Hi Andrew,
 
 I looked into it a little, and it seems to be caused by a timer event in
 lrmd's g_main_loop firing after less time than the monitor timeout.

 So if you create a trivial program with g_main_loop and a timer, and
 then change the system time, does the timer expire early?

 Yes.

 That sounds like a glib bug. Ideally we'd get it fixed there rather than
 work-around it in pacemaker.
 Have you spoken to them at all?

 No.
 I will investigate the glib library a little more, and then talk to the
 glib community.

 I may get back to you on this afterwards.

Cool. I somewhat expect them to say working as designed.
Which would be unfortunate, but it shouldn't be too hard to work around.




Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-07 Thread Andrew Beekhof

On 5 Sep 2014, at 2:22 pm, renayama19661...@ybb.ne.jp wrote:

 
 I looked into it a little, and it seems to be caused by a timer event in
 lrmd's g_main_loop firing after less time than the monitor timeout.

So if you create a trivial program with g_main_loop and a timer, and then
change the system time, does the timer expire early?

 This problem does not seem to happen with the lrmd of PM1.0.

cluster-glue was probably using custom timeout code.




Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-07 Thread renayama19661014
Hi Andrew,

Thank you for your comments.

 I looked into it a little, and it seems to be caused by a timer event in
 lrmd's g_main_loop firing after less time than the monitor timeout.

 So if you create a trivial program with g_main_loop and a timer, and then
 change the system time, does the timer expire early?

Yes.

 This problem does not seem to happen with the lrmd of PM1.0.

 cluster-glue was probably using custom timeout code.

I looked at the implementation in glue, too.
It seems the time-out handling of the new lrmd needs an implementation
similar to the one in glue.


Best Regards,
Hideo Yamauchi.




Re: [Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-07 Thread Andrew Beekhof

On 8 Sep 2014, at 12:46 pm, renayama19661...@ybb.ne.jp wrote:

 Hi Andrew,
 
 Thank you for your comments.

 I looked into it a little, and it seems to be caused by a timer event in
 lrmd's g_main_loop firing after less time than the monitor timeout.

 So if you create a trivial program with g_main_loop and a timer, and then
 change the system time, does the timer expire early?

 Yes.

That sounds like a glib bug. Ideally we'd get it fixed there rather than 
work-around it in pacemaker.
Have you spoken to them at all?

 
  


[Pacemaker] [Problem] lrmd detects monitor time-out by revision of the system time.

2014-09-04 Thread renayama19661014
Hi All,

We confirmed that lrmd reports a monitor time-out when the system time is
changed.
Because the system time can be corrected while ntpd is in use, this is a
serious problem.

The problem can be reproduced with the following procedure.

Step1) Start Pacemaker on a single node.
[root@snmp1 ~]# start pacemaker.combined
pacemaker.combined start/running, process 11382

Step2) Load a simple crm configuration.

trac2915-3.crm
primitive prmDummyA ocf:pacemaker:Dummy1 \
    op start interval=0s timeout=60s on-fail=restart \
    op monitor interval=10s timeout=30s on-fail=restart \
    op stop interval=0s timeout=60s on-fail=block
group grpA prmDummyA
location rsc_location-grpA-1 grpA \
    rule $id=rsc_location-grpA-1-rule   200: #uname eq snmp1 \
    rule $id=rsc_location-grpA-1-rule-0 100: #uname eq snmp2

property $id=cib-bootstrap-options \
    no-quorum-policy=ignore \
    stonith-enabled=false \
    crmd-transition-delay=2s
rsc_defaults $id=rsc-options \
    resource-stickiness=INFINITY \
    migration-threshold=1
--

[root@snmp1 ~]# crm configure load update trac2915-3.crm 
WARNING: rsc_location-grpA-1: referenced node snmp2 does not exist

[root@snmp1 ~]# crm_mon -1 -Af
Last updated: Fri Sep  5 13:09:45 2014
Last change: Fri Sep  5 13:09:13 2014
Stack: corosync
Current DC: snmp1 (3232238180) - partition WITHOUT quorum
Version: 1.1.12-561c4cf
1 Nodes configured
1 Resources configured


Online: [ snmp1 ]

 Resource Group: grpA
     prmDummyA  (ocf::pacemaker:Dummy1):        Started snmp1 

Node Attributes:
* Node snmp1:

Migration summary:
* Node snmp1: 

Step3) Just after the monitor of the resource has started, we move the system
time forward by more than the monitor timeout (timeout=30s).
[root@snmp1 ~]#  date -s +40sec
Fri Sep  5 13:11:04 JST 2014

Step4) The monitor times out.

[root@snmp1 ~]# crm_mon -1 -Af
Last updated: Fri Sep  5 13:11:24 2014
Last change: Fri Sep  5 13:09:13 2014
Stack: corosync
Current DC: snmp1 (3232238180) - partition WITHOUT quorum
Version: 1.1.12-561c4cf
1 Nodes configured
1 Resources configured


Online: [ snmp1 ]


Node Attributes:
* Node snmp1:

Migration summary:
* Node snmp1: 
   prmDummyA: migration-threshold=1 fail-count=1 last-failure='Fri Sep  5 13:11:04 2014'

Failed actions:
    prmDummyA_monitor_1 on snmp1 'unknown error' (1): call=7, status=Timed Out, last-rc-change='Fri Sep  5 13:11:04 2014', queued=0ms, exec=0ms


I looked into it a little, and it seems to be caused by a timer event in
lrmd's g_main_loop firing after less time than the monitor timeout.

This problem does not seem to happen with the lrmd of PM1.0.

Best Regards,
Hideo Yamauchi.

