Hi, I found the error. The monitor was failing on just one node, and it was always the passive node. I had a broken symlink from /var/www to my DRBD device. After fixing it, the ClusterMonitor runs just fine.
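For anyone who runs into the same symptom, a check along the following lines might help spot a dangling link quickly. It is only a minimal sketch: the helper name `is_broken_symlink` is made up for illustration, and /var/www is simply the path from this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) only if the path is a symlink
# whose target does not exist; fails (exit 1) in every other case.
is_broken_symlink() {
    # -L tests the link itself; -e follows the link and fails if the
    # target is missing.
    [ -L "$1" ] && [ ! -e "$1" ]
}

# Path taken from the thread; adjust the list for your own setup.
for path in /var/www; do
    if is_broken_symlink "$path"; then
        echo "broken symlink: $path -> $(readlink "$path")"
    fi
done
```

Note that if the link target lives on storage that is mounted on only one node at a time, the check may report the link as dangling on the other node even when the link itself is correct, so the result has to be read with the mount state in mind.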
Best Regards,
Sebastian Koch

-----Original Message-----
From: Dejan Muhamedagic [mailto:deja...@fastmail.fm]
Sent: Thursday, June 24, 2010 15:33
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] ClusterMon failing: call=220, rc=1, status=complete): unknown error

On Thu, Jun 24, 2010 at 02:14:51PM +0200, Koch, Sebastian wrote:
> Hi,
>
> I got a small issue with the ClusterMon agent. The monitor
> actions for this agent seem to fail (if I look into syslog,
> you'll find it below) and I am not able to troubleshoot it. I
> tried to start the agent on the failed node by hand but I
> don't see startup / status errors. The ClusterMon seems to fail
> only on the passive node, therefore I thought it should be a
> problem caused by missing www directories or something else but
> I cannot see the error.
>
> ------------------------------------------------------------------------
> r...@pilot01-node2:~/clustercompare# /usr/lib/ocf/resource.d/heartbeat/ClusterMon validate-all
> Validate OK
> r...@pilot01-node2:~/clustercompare# /usr/lib/ocf/resource.d/heartbeat/ClusterMon stop; echo "res: $?"
> res: 0
> r...@pilot01-node2:~/clustercompare# /usr/lib/ocf/resource.d/heartbeat/ClusterMon start; echo "res: $?"
> res: 0
> r...@pilot01-node2:~/clustercompare# /usr/lib/ocf/resource.d/heartbeat/ClusterMon status; echo "res: $?"
> usage: /usr/lib/ocf/resource.d/heartbeat/ClusterMon {start|stop|monitor|validate-all|meta-data}
>
> Expects to have a fully populated OCF RA-compliant environment set.
> res: 3

If you want to run it by hand you need to set the parameters
(OCF_RESKEY_*) and export OCF_ROOT=/usr/lib/ocf.

> ------------------------------------------------------------------------
>
> I can see that ClusterMon is started and even the html output
> works but there is still this error.

Take a look at the logs, in particular for output from
ClusterMon and lrmd.

Thanks,

Dejan

> ------------------------------------------------------------------------
> ============
> Last updated: Thu Jun 24 14:02:48 2010
> Stack: openais
> Current DC: pilot01-node2 - partition with quorum
> Version: 1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
>
> Online: [ pilot01-node1 pilot01-node2 ]
>
>  Resource Group: grp_MySQL
>      res_Filesystem (ocf::heartbeat:Filesystem): Started pilot01-node2
>      res_ClusterIP (ocf::heartbeat:IPaddr2): Started pilot01-node2
>      res_MySQL (lsb:mysql): Started pilot01-node2
>      res_Apache (lsb:apache2): Started pilot01-node2
>      res_ClusterMonitor (ocf::pacemaker:ClusterMon): Started pilot01-node2
>      res_Nagios (lsb:nagios3): Started pilot01-node2
>  Master/Slave Set: ms_drbd_mysql0
>      Masters: [ pilot01-node2 ]
>      Slaves: [ pilot01-node1 ]
>  Clone Set: cl-pinggw
>      Started: [ pilot01-node1 pilot01-node2 ]
>  Monitor-Cluster (ocf::pacemaker:ClusterMon): Started pilot01-node2 (unmanaged) FAILED
>
> Failed actions:
>     Monitor-Cluster_stop_0 (node=pilot01-node2, call=220, rc=1, status=complete): unknown error
> ------------------------------------------------------------------------
>
> I linked /var/www on both nodes to my cluster drbd storage.
>
> ------------------------------------------------------------------------
> r...@pilot01-node1:/mnt/cluster/var/www# ll /var/www
> lrwxrwxrwx 1 root root 20 23. Jun 17:06 /var/www -> /mnt/cluster/var/www
> ------------------------------------------------------------------------
>
> This is my configuration.
>
> ------------------------------------------------------------------------
> node pilot01-node1 \
>         attributes standby="off"
> node pilot01-node2 \
>         attributes standby="off"
> primitive Monitor-Cluster ocf:pacemaker:ClusterMon \
>         params htmlfile="/mnt/cluster/var/www/cluster-monitor.html" \
>         params pidfile="/var/run/rlb-cluster-monitor.pid" \
>         op start interval="0" timeout="90s" \
>         op stop interval="0" timeout="100s"
> primitive drbd_pilot0 ocf:linbit:drbd \
>         params drbd_resource="pilot0" \
>         operations $id="drbd_pilot0-operations" \
>         op monitor interval="15s"
> primitive pinggw ocf:pacemaker:pingd \
>         params host_list="10.1.1.162" multiplier="200" \
>         op monitor interval="10s"
> primitive res_Apache lsb:apache2 \
>         operations $id="res_Apache-operations" \
>         op monitor interval="15s" timeout="20s" start-delay="15s"
> primitive res_ClusterIP ocf:heartbeat:IPaddr2 \
>         params iflabel="ClusterIP" ip="10.1.1.12" nic="eth0" cidr_netmask="24" \
>         operations $id="res_ClusterIP_1-operations" \
>         op monitor start-delay="0" interval="10s"
> primitive res_ClusterMonitor ocf:pacemaker:ClusterMon \
>         params htmlfile="/mnt/cluster/var/www/cluster-monitor.html" \
>         params pidfile="/var/run/rlb-cluster-monitor.pid" \
>         op start interval="0" timeout="90s" \
>         op stop interval="0" timeout="100s" \
>         meta target-role="Started"
> primitive res_Filesystem ocf:heartbeat:Filesystem \
>         params fstype="xfs" directory="/mnt/cluster" device="/dev/drbd0" options="noatime,nodiratime,barrier=0"
> primitive res_MySQL lsb:mysql
> primitive res_Nagios lsb:nagios3 \
>         operations $id="res_Nagios-operations" \
>         op monitor interval="15s" timeout="20s" \
>         meta target-role="Started"
> group grp_MySQL res_Filesystem res_ClusterIP res_MySQL res_Apache res_ClusterMonitor res_Nagios
> ms ms_drbd_mysql0 drbd_pilot0 \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> clone cl-pinggw pinggw \
>         meta globally-unique="false"
> location drbd-fence-by-handler-ms_drbd_mysql0 ms_drbd_mysql0 \
>         rule $id="drbd-fence-by-handler-rule-ms_drbd_mysql0" $role="Master" -inf: #uname ne pilot01-node2
> location grp_MySQL-with-pinggw grp_MySQL \
>         rule $id="grp_MySQL-with-pinggw-rule-1" -inf: not_defined pingd or pingd lte 0
> colocation col_drbd_on_mysql inf: grp_MySQL ms_drbd_mysql0:Master
> order mysql_after_drbd inf: ms_drbd_mysql0:promote grp_MySQL:start
> property $id="cib-bootstrap-options" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         dc-version="1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75" \
>         cluster-infrastructure="openais" \
>         last-lrm-refresh="1277380951" \
>         symmetric-cluster="true" \
>         default-action-timeout="240s"
> ------------------------------------------------------------------------
>
> Sebastian Koch
>
> NETZWERK GmbH
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker