On Thu, Nov 19, 2009 at 2:39 AM, Luke Bigum <lbi...@iseek.com.au> wrote:
> Angie,
>
> I can't tell exactly what you've provided. Can you post your CRM
> configuration (the output of 'crm configure show')? While you're at it, also
> provide 'crm_verify -LV' and 'crm_mon -fo1'.

Here are the outputs:

# crm configure show
node test1.localdomain
node test2.localdomain
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="10.0.0.102" cidr_netmask="255.255.255.0" \
        op monitor interval="10s"
primitive LoadBalancer lsb:haproxy \
        op monitor interval="10s"
primitive WebSite ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="1min"
colocation LoadBalancer-with-ClusterIP inf: LoadBalancer ClusterIP
order LoadBalancer-after-ClusterIP inf: ClusterIP LoadBalancer
property $id="cib-bootstrap-options" \
        stonith-enabled="false" \
        expected-quorum-votes="2" \
        dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
        cluster-infrastructure="openais" \
        no-quorum-policy="ignore"

# crm_verify -VL
crm_verify[14263]: 2009/11/19_12:22:57 WARN: unpack_rsc_op: Processing failed op WebSite_start_0 on test1.localdomain: unknown error
crm_verify[14263]: 2009/11/19_12:22:57 WARN: unpack_rsc_op: Processing failed op WebSite_start_0 on test2.localdomain: unknown error
crm_verify[14263]: 2009/11/19_12:22:57 WARN: common_apply_stickiness: Forcing WebSite away from test1.localdomain after 1000000 failures (max=1000000)
crm_verify[14263]: 2009/11/19_12:22:57 WARN: common_apply_stickiness: Forcing WebSite away from test2.localdomain after 1000000 failures (max=1000000)
crm_verify[14263]: 2009/11/19_12:22:57 WARN: native_color: Resource WebSite cannot run anywhere
Warnings found during check: config may not be valid

# crm_mon -fo1
============
Last updated: Thu Nov 19 12:29:41 2009
Stack: openais
Current DC: test1.localdomain - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ test1.localdomain test2.localdomain ]

ClusterIP (ocf::heartbeat:IPaddr2): Started test1.localdomain
LoadBalancer (lsb:haproxy): Started test1.localdomain

Operations:
* Node test1.localdomain:
   ClusterIP: migration-threshold=1000000
    + (4) start: rc=0 (ok)
    + (5) monitor: interval=10000ms rc=0 (ok)
   LoadBalancer: migration-threshold=1000000
    + (6) start: rc=0 (ok)
    + (7) monitor: interval=10000ms rc=0 (ok)
   WebSite: migration-threshold=1000000 fail-count=1000000
    + (9) start: rc=1 (unknown error)
    + (10) stop: rc=0 (ok)
* Node test2.localdomain:
   WebSite: migration-threshold=1000000 fail-count=1000000
    + (5) start: rc=1 (unknown error)
    + (6) stop: rc=0 (ok)

Failed actions:
    WebSite_start_0 (node=test1.localdomain, call=9, rc=1, status=complete): unknown error
    WebSite_start_0 (node=test2.localdomain, call=5, rc=1, status=complete): unknown error

> This looks suspicious though:
>
> Nov 19 01:25:08 test2 crmd: [24251]: info: process_lrm_event: LRM operation
> WebServer_monitor_60000 (call=483, rc=-2, cib-update=0, confirmed=true)
> Cancelled unknown exec error
>
> Personally I'd start with the OCF RA and leave lsb:httpd alone. From the
> above error message, something inside lsb:httpd is returning -2, which is
> not a supported return code.
>
> Depending on how confident you are with shell scripts, you might find it
> helpful to eliminate Pacemaker from the equation and call the Resource Agent
> script yourself to debug problems manually, like so...

I'll do this and report back to you.
> Disable your resource so Pacemaker doesn't interfere:
>
> crm_resource -r WebSite -m -p target-role -v stopped
>
> Then move into the RA directory and set a necessary environment variable:
>
> cd /usr/lib/ocf/resource.d/heartbeat
> export OCF_ROOT=/usr/lib/ocf
>
> Start testing the apache RA, setting the only mandatory environment
> variable for ocf:heartbeat:apache:
>
> export OCF_RESKEY_configfile=/path/to/your/main/apache/config
> ./apache start
> echo $?
>
> That should echo "0" for success. Judging by your logs, you can start
> Apache but the monitor is failing:
>
> ./apache monitor
> echo $?
>
> If that doesn't echo "0", you might get a helpful error message explaining
> what's wrong. You might have to read through the apache script itself to
> figure out why it's failing. Finally, test the 'stop' operation:
>
> ./apache stop
> echo $?
>
> That should echo "0" as well. If this all works for you, but the resource in
> Pacemaker is still not working, then it's probably something in your CIB
> (like a bad attribute), as you've just done pretty much exactly what
> Pacemaker will do.
>
> Let us know how you go.

Sure, I will. Thank you so much.

> Tod
>
> *Luke Bigum*
> *Systems Administrator*
> (p) 1300 661 668
> (f) 1300 661 540
> (e) lbi...@iseek.com.au
> http://www.iseek.com.au
> Level 1, 100 Ipswich Road Woolloongabba QLD 4102
>
> This e-mail and any files transmitted with it may contain confidential and
> privileged material for the sole use of the intended recipient. Any review,
> use, distribution or disclosure by others is strictly prohibited. If you are
> not the intended recipient (or authorised to receive for the recipient),
> please contact the sender by reply e-mail and delete all copies of this
> message.
>
> *From:* Angie T. Muhammad [mailto:angie.taw...@gmail.com]
> *Sent:* Thursday 19 November 2009 9:57 AM
> *To:* pacemaker@oss.clusterlabs.org
> *Subject:* [Pacemaker] Error starting Apache on 2 nodes cluster
>
> Hello
> I'm a pacemaker and openais beginner.
> I followed the document 'Clusters from Scratch' and I successfully managed
> to create and monitor the 'ClusterIP' and 'LoadBalancer' resources.
>
> But whenever I try to start Apache:
> # crm configure primitive WebSite ocf:heartbeat:apache params
> configfile=/etc/httpd/conf/httpd.conf op monitor interval=1min
>
> whether using (ocf:heartbeat:apache) or (lsb:httpd), I get the following
> errors when watching crm_mon:
>
> ============
> Last updated: Thu Nov 19 01:38:33 2009
> Stack: openais
> Current DC: test1.localdomain - partition with quorum
> Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
> 2 Nodes configured, 2 expected votes
> 3 Resources configured.
> ============
>
> Online: [ test1.localdomain test2.localdomain ]
>
> ClusterIP (ocf::heartbeat:IPaddr2): Started test1.localdomain
> LoadBalancer (lsb:haproxy): Started test1.localdomain
>
> Failed actions:
> WebSite_start_0 (node=test1.localdomain, call=9, rc=1,
> status=complete): unknown error
> WebSite_start_0 (node=test2.localdomain, call=5, rc=1,
> status=complete): unknown error
>
> /************************************************************************************************************/
>
> Note that I am using:
> CentOS 5.4
> openais-0.80.5-15.1
> pacemaker-1.0.5-4.1
> # chkconfig httpd off
> server-status is not enabled in my httpd.conf ...
>
> I always check for apache processes before configuring my crm, using:
>
> # ps aux | grep httpd
> /* to make sure there are no zombie processes */
>
> # /etc/init.d/httpd status
> /* to guarantee it's stopped and nothing is locked */
>
> Last but not least, I am attaching the *last 100 lines of my
> /var/log/messages* from the 2nd node to help you help me.
> I have been stuck in this loop for four days now and I have no idea why the
> crm can't start apache, even though everything runs smoothly when I start
> it manually!
>
> Thank you in advance
> --
> All the best,
> Angie
>
> _______________________________________________
> Pacemaker mailing list
> Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker

--
All the best,
Angie
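
For reference, the manual test Luke describes above could be wrapped in a small script along these lines. This is only a sketch: the helper name run_ra_op and the RA override variable are made up for illustration, the paths are the ones quoted in the thread, and it assumes the WebSite resource has already been disabled in Pacemaker as described above.

```shell
#!/bin/sh
# Sketch only: run_ra_op and the RA variable are hypothetical names,
# not part of Pacemaker or the resource-agents package.

# Run one resource agent action by hand and report its exit code.
run_ra_op() {
    ra="$1"
    op="$2"
    "$ra" "$op"
    rc=$?
    echo "$op: rc=$rc"
    return "$rc"
}

# Environment the apache RA expects, using the paths from the thread.
# Adjust both if your installation differs.
OCF_ROOT="${OCF_ROOT:-/usr/lib/ocf}"
OCF_RESKEY_configfile="${OCF_RESKEY_configfile:-/etc/httpd/conf/httpd.conf}"
export OCF_ROOT OCF_RESKEY_configfile

# Assumed location of the agent (the directory Luke changes into).
RA="${RA:-$OCF_ROOT/resource.d/heartbeat/apache}"

if [ -x "$RA" ]; then
    # Same order as in the mail: start, monitor, stop.
    for op in start monitor stop; do
        run_ra_op "$RA" "$op" || echo "  -> $op failed; check the RA output above"
    done
else
    echo "RA not found at $RA; nothing tested"
fi
```

Each action should report rc=0 if the agent is healthy. A non-zero code from start matches the rc=1 (unknown error) that crm_mon shows, and any message the agent prints just before it is the place to start reading.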