Re: [Pacemaker] Pacemaker/corosync freeze

2014-09-04 Thread Sreenivasa
Hi Attila,

Did you try compiling libqb 0.17.0? I'm wondering whether that solved your issue.
I am seeing the same issue. Please let me know if you have already solved it.

Thanks
Sreenivasa 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[Pacemaker] [Problem] lrmd detects a monitor time-out when the system time is changed.

2014-09-04 Thread renayama19661014
Hi All,

We confirmed that lrmd causes a monitor time-out when the system time is changed.
This is a serious problem when the system time is adjusted, for example by ntpd.

This problem can be reproduced with the following procedure.

Step1) Start Pacemaker on a single node.
[root@snmp1 ~]# start pacemaker.combined
pacemaker.combined start/running, process 11382

Step2) Load a simple crm configuration.

trac2915-3.crm
primitive prmDummyA ocf:pacemaker:Dummy1 \
    op start interval=0s timeout=60s on-fail=restart \
    op monitor interval=10s timeout=30s on-fail=restart \
    op stop interval=0s timeout=60s on-fail=block
group grpA prmDummyA
location rsc_location-grpA-1 grpA \
    rule $id=rsc_location-grpA-1-rule   200: #uname eq snmp1 \
    rule $id=rsc_location-grpA-1-rule-0 100: #uname eq snmp2

property $id=cib-bootstrap-options \
    no-quorum-policy=ignore \
    stonith-enabled=false \
    crmd-transition-delay=2s
rsc_defaults $id=rsc-options \
    resource-stickiness=INFINITY \
    migration-threshold=1
--

[root@snmp1 ~]# crm configure load update trac2915-3.crm 
WARNING: rsc_location-grpA-1: referenced node snmp2 does not exist

[root@snmp1 ~]# crm_mon -1 -Af
Last updated: Fri Sep  5 13:09:45 2014
Last change: Fri Sep  5 13:09:13 2014
Stack: corosync
Current DC: snmp1 (3232238180) - partition WITHOUT quorum
Version: 1.1.12-561c4cf
1 Nodes configured
1 Resources configured


Online: [ snmp1 ]

 Resource Group: grpA
     prmDummyA  (ocf::pacemaker:Dummy1):        Started snmp1 

Node Attributes:
* Node snmp1:

Migration summary:
* Node snmp1: 

Step3) Just after the resource monitor starts, advance the system time by more than 
the monitor timeout (timeout=30s).
[root@snmp1 ~]#  date -s +40sec
Fri Sep  5 13:11:04 JST 2014

Step4) The monitor times out.

[root@snmp1 ~]# crm_mon -1 -Af
Last updated: Fri Sep  5 13:11:24 2014
Last change: Fri Sep  5 13:09:13 2014
Stack: corosync
Current DC: snmp1 (3232238180) - partition WITHOUT quorum
Version: 1.1.12-561c4cf
1 Nodes configured
1 Resources configured


Online: [ snmp1 ]


Node Attributes:
* Node snmp1:

Migration summary:
* Node snmp1: 
   prmDummyA: migration-threshold=1 fail-count=1 last-failure='Fri Sep  5 
13:11:04 2014'

Failed actions:
    prmDummyA_monitor_1 on snmp1 'unknown error' (1): call=7, status=Timed 
Out, last-rc-change='Fri Sep  5 13:11:04 2014', queued=0ms, exec=0ms


I confirmed several things; the problem seems to be caused by a time-out event 
somehow firing in lrmd's g_main_loop after a period shorter than the monitor 
timeout once the system time has been stepped forward.
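
For reference, below is a minimal, hypothetical GLib sketch (it is not lrmd's 
actual code) that illustrates the suspected mechanism: a deadline computed from 
the wall clock appears to expire as soon as the clock is stepped forward with 
"date -s", while a deadline based on the monotonic clock only expires after a 
real 30 seconds.

/* timeout-jump.c -- hypothetical illustration only, not lrmd's code.
 * A deadline taken from the wall clock "expires" as soon as the system
 * time is stepped forward, while a monotonic deadline does not.
 * Build: gcc timeout-jump.c $(pkg-config --cflags --libs glib-2.0)
 */
#include <glib.h>
#include <stdio.h>
#include <time.h>

#define TIMEOUT_SEC 30

static time_t wall_deadline;  /* from time(), follows "date -s" jumps   */
static gint64 mono_deadline;  /* from the monotonic clock, unaffected   */

static gboolean check_deadlines(gpointer data)
{
    GMainLoop *loop = data;

    if (time(NULL) >= wall_deadline)
        printf("wall-clock deadline reached (false time-out after a jump)\n");

    if (g_get_monotonic_time() >= mono_deadline) {
        printf("monotonic deadline reached (a real %ds elapsed)\n", TIMEOUT_SEC);
        g_main_loop_quit(loop);
    }
    return TRUE;  /* poll again in one second */
}

int main(void)
{
    GMainLoop *loop = g_main_loop_new(NULL, FALSE);

    wall_deadline = time(NULL) + TIMEOUT_SEC;
    mono_deadline = g_get_monotonic_time() + TIMEOUT_SEC * G_USEC_PER_SEC;

    /* Run "date -s +40sec" while this is running: only the wall-clock
     * message appears early, mimicking the premature monitor time-out. */
    g_timeout_add_seconds(1, check_deadlines, loop);
    g_main_loop_run(loop);
    g_main_loop_unref(loop);
    return 0;
}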

This problem does not seem to occur in the lrmd of Pacemaker 1.0.

Best Regards,
Hideo Yamauchi.

