> Date: Mon, 23 Jul 2012 12:16:20 +0200
> From: Andreas Kurz
>
> On 07/23/2012 07:06 AM, David Barchas wrote:
> > Hello.
> >
> > I have been working on this for 3 days now, and must be so stressed out
> > that I am being blinded to what is probably an obvious cause of this. In
> > a word, HELP.
> >
> > I am trying specifically to utilize ocf:heartbeat:IPaddr2, but this
> > issue seems to occur with any of the ocf:heartbeat agents. I will just
> > focus on IPaddr2 for purposes of figuring this out, but it happens
> > exactly the same with any of the default agents. However, I can
> > successfully use ocf:linbit:drbd, for example. It seems to be limited to
> > the RAs that are installed along with coro/pace in the resource-agents
> > package.
> >
> >
> What are the exact package versions you have installed?
>
> pacemaker*
> resource-agents
> cluster-glue*
>

Bah, all the info I provide and I miss that.

clusterlib-3.0.12.1-32.el6.x86_64
cluster-glue-1.0.5-6.el6.x86_64
cluster-glue-libs-1.0.5-6.el6.x86_64
pacemaker-cli-1.1.7-6.el6.x86_64
pacemaker-libs-1.1.7-6.el6.x86_64
pacemaker-cluster-libs-1.1.7-6.el6.x86_64
pacemaker-1.1.7-6.el6.x86_64
resource-agents-3.9.2-12.el6.x86_64

My full rpm -qa, just in case it's helpful:
http://pastebin.com/d2y7Sii4

> >
> >
> > I am using CentOS 6.3, fully updated (though this happens in 6.2 with no
> > updates as well). Install pacemaker/coro from default repo. I have
> > stripped everything down to figure this out in vmware and just install
> > centos, update it, install pace/coro (no drbd for this discussion),
> > configure coro, and then start it. pacemaker starts up fine (or at least
> > I think it's fine). I can set quorum ignore for example from crm. (crm
> > configure property no-quorum-policy="ignore")
> >
> > here is the process list
> > root  1447  0.3  0.6  556080  6636 ?  Ssl  21:09  0:00 corosync
> > 499   1453  0.0  0.5   88720  5556 ?  S    21:09  0:00  \_ /usr/libexec/pacemaker/cib
> > root  1454  0.0  0.3   86968  3488 ?  S    21:09  0:00  \_ /usr/libexec/pacemaker/stonithd
> > root  1455  0.0  0.2   76188  2492 ?  S    21:09  0:00  \_ /usr/lib64/heartbeat/lrmd
> > 499   1456  0.0  0.3   91160  3432 ?  S    21:09  0:00  \_ /usr/libexec/pacemaker/attrd
> > 499   1457  0.0  0.3   87440  3824 ?  S    21:09  0:00  \_ /usr/libexec/pacemaker/pengine
> > 499   1458  0.0  0.3   91312  3884 ?  S    21:09  0:00  \_ /usr/libexec/pacemaker/crmd
> >
> >
> so you are using plugin version 0 to start Pacemaker .... That would
> explain why /etc/init.d/pacemaker is unable to start ... it is already
> started by Corosync.

I mostly included that info "just in case", and because it's confusing to
me that I can't start pacemaker even from a fresh install, before
configuring or starting corosync.

>
> >
> > 499 is hacluster btw.
> >
> > ***BUT***
> >
> > When I run as root the following:
> > # crm ra meta ocf:heartbeat:IPaddr2
> >
> > I get this response:
> > lrmadmin[1484]: 2012/07/22_13:28:23 ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
> > ERROR: ocf:heartbeat:IPaddr2: could not parse meta-data:
> >
> > And this is in /var/log/messages:
> > Jul 22 16:35:14 MST lrmd: [48093]: ERROR: get_resource_meta: pclose failed: Resource temporarily unavailable
> > Jul 22 16:35:14 MST lrmd: [48093]: WARN: on_msg_get_metadata: empty metadata for ocf::heartbeat::IPaddr2.
> > Jul 22 16:35:14 MST lrmd: [48093]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 200 ms (> 100 ms) before being called (GSource: 0x187df10)
> > Jul 22 16:35:14 MST lrmd: [48093]: info: G_SIG_dispatch: started at 429616889 should have started at 429616869
> > Jul 22 16:35:14 MST lrmadmin: [48254]: ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
> >
> > I am using crm ra meta as a way to test, but crm will not accept my
> > trying to add the resource as a primitive either.
> >
> > In my research, I have found that often it's permissions. So just to
> > rule that out I set my entire system to 777 permissions. No joy.
> >
> > Another suggestion I find often has been to set OCF_ROOT (export
> > OCF_ROOT=/usr/lib/ocf) and then do
> > /usr/lib/ocf/resource.d/heartbeat/IPaddr2 meta-data.
> > That produces the desired output, but it does not work before I export.
> > And CRM still does not accept my meta request.
> >
> > Another suggestion I find is to make sure that shellfuncs exists in the
> > agents folder. The soft links exist:
> > lrwxrwxrwx. 1 root root 32 Jul 22 04:08 .ocf-binaries -> ../../lib/heartbeat/ocf-binaries
> > lrwxrwxrwx. 1 root root 35 Jul 22 04:08 .ocf-directories -> ../../lib/heartbeat/ocf-directories
> > lrwxrwxrwx. 1 root root 35 Jul 22 04:08 .ocf-returncodes -> ../../lib/heartbeat/ocf-returncodes
> > lrwxrwxrwx. 1 root root 34 Jul 22 04:08 .ocf-shellfuncs -> ../../lib/heartbeat/ocf-shellfuncs
> >
> > And just to make sure, I created un-hidden soft links as well, with no joy.
>
> Strange, such errors are typically related to wrong paths for
> initialization of environment and helper functions:
>
> # Initialization:
>
> : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
> . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
>
> DRBD agent has an extra fallback check, that may be the reason that it
> still works ...
>
> # Resource-agents have moved their ocf-shellfuncs file around.
> # There are supposed to be symlinks or wrapper files in the old location,
> # pointing to the new one, but people seem to get it wrong all the time.
> # Try several locations.
>
> if test -n "${OCF_FUNCTIONS_DIR}" ; then
>     if test -e "${OCF_FUNCTIONS_DIR}/ocf-shellfuncs" ; then
>         . "${OCF_FUNCTIONS_DIR}/ocf-shellfuncs"
>     elif test -e "${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs" ; then
>         . "${OCF_FUNCTIONS_DIR}/.ocf-shellfuncs"
>     fi
> else
>     if test -e "${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs" ; then
>         . "${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs"
>     elif test -e "${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs"; then
>         . "${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs"
>     fi
> fi
>

I noticed this as well, and I tried updating the IPaddr2 agent to use the
directory code from DRBD (what you have above), with no success either.
Pace was already running at the time, though. I assume it doesn't load all
the agents into RAM once and never read them again, but instead executes
them when needed, so caching shouldn't be the issue.

I am going to try that again though, because it really does sound like it
could fix it, even if it wouldn't explain why it's busted in the first
place. Right now I'll take a hack if it works. Pretty sure it won't though.
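
The lookup order in the DRBD fallback quoted above can also be checked
outside of any agent. What follows is only a sketch: find_shellfuncs is a
hypothetical helper, not something shipped with resource-agents, and it
covers only the two locations the quoted code tries when
OCF_FUNCTIONS_DIR is unset.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not part of
# resource-agents): print the first usable ocf-shellfuncs under an OCF root,
# trying the same two locations as the quoted DRBD fallback.
# Usage: find_shellfuncs OCF_ROOT
find_shellfuncs() {
    root=$1
    for candidate in \
        "$root/lib/heartbeat/ocf-shellfuncs" \
        "$root/resource.d/heartbeat/.ocf-shellfuncs"
    do
        # test -r follows symlinks, so a dangling link counts as missing.
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no ocf-shellfuncs found under $root" >&2
    return 1
}
```

Called as `find_shellfuncs /usr/lib/ocf`, it would show whether either
location resolves to a real file, which a plain ls of the symlinks does
not tell you.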
>
>
> Regards,
> Andreas

Thanks for the help. Greatly appreciated.
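
The manual test described in the thread (export OCF_ROOT, then run the
agent with meta-data) can be wrapped into a small repeatable check. This
is a sketch under assumptions: check_agent_metadata is a hypothetical
helper, and running the agent with only PATH and OCF_ROOT set is a choice
made here for repeatability, not a description of how lrmd invokes agents.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not part of Pacemaker
# or resource-agents): run an OCF agent's meta-data action by hand and
# report whether it exits cleanly and prints anything.
# Usage: check_agent_metadata OCF_ROOT PROVIDER AGENT
check_agent_metadata() {
    root=$1
    provider=$2
    agent=$3
    script="$root/resource.d/$provider/$agent"

    if [ ! -x "$script" ]; then
        echo "not executable: $script" >&2
        return 2
    fi

    # Pass OCF_ROOT explicitly and drop the rest of the environment, so the
    # result does not depend on what the calling shell happened to export.
    out=$(env -i PATH="$PATH" OCF_ROOT="$root" "$script" meta-data 2>/dev/null)
    rc=$?

    if [ "$rc" -ne 0 ]; then
        echo "meta-data exited with status $rc" >&2
        return 1
    fi
    if [ -z "$out" ]; then
        echo "meta-data produced no output" >&2
        return 1
    fi
    printf '%s\n' "$out"
}
```

On the system in this thread the call would be
`check_agent_metadata /usr/lib/ocf heartbeat IPaddr2`. If that prints the
XML while crm ra meta still fails, the agent script and its helper files
are probably fine and the problem lies in how the agent is being invoked.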
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org