Re: [Linux-HA] Need HA Help - standby / online not switching automatically
Hi Lars,

Thank you for the tools to look at things. However, on a whim, before getting into them (since DRBD was looking fine in that scenario), I decided to just run through the install on a different pair of VMs, making sure I used the gitco.de repository for drbd83 and the clusterlabs repo for pacemaker (heartbeat and everything else comes with it once the libesmtp requirement is settled, in this case by using a later EPEL install: rpm -ivH epel-release-5-4.noarch.rpm).

Using the exact same configuration in crm (except standby is "off" on both VMs, of course), when I do the same crm node standby on one, the other takes over, and then back again, no problem. I am going to go back and either reinstall the other, and/or compare each and every rpm and source to see which is broken, or just save my install procedure. Now off to learn what you mentioned about crm resource move. Thanks again.

Regards,
Randy

On 5/20/2011 1:03 AM, Lars Ellenberg wrote:
> On Thu, May 19, 2011 at 11:53:24PM -0700, Randy Katz wrote:
>> Lars,
>>
>> Thank you much for the answer on the "standby" issue.
>> It seems that that was the tip of my real issue. So now I have both nodes
>> coming online. And it seems ha1 starts fine with all the resources starting.
>>
>> With them both online, if I issue: crm node standby ha1.iohost.com
> Why.
>
> Learn about "crm resource move".
> (and unmove, for that matter).
>
>> Then I see IP Takeover on ha2, but the other resources do not start,
>> ever; it remains:
>>
>> Node ha1.iohost.com (b159178d-c19b-4473-aa8e-13e487b65e33): standby
>> Online: [ ha2.iohost.com ]
>>
>> Resource Group: WebServices
>>     ip1       (ocf::heartbeat:IPaddr2):    Started ha2.iohost.com
>>     ip1arp    (ocf::heartbeat:SendArp):    Started ha2.iohost.com
>>     fs_webfs  (ocf::heartbeat:Filesystem): Stopped
>>     fs_mysql  (ocf::heartbeat:Filesystem): Stopped
>>     apache2   (lsb:httpd):                 Stopped
>>     mysql     (ocf::heartbeat:mysql):      Stopped
>> Master/Slave Set: ms_drbd_mysql
>>     Slaves: [ ha2.iohost.com ]
>>     Stopped: [ drbd_mysql:0 ]
>> Master/Slave Set: ms_drbd_webfs
>>     Slaves: [ ha2.iohost.com ]
>>     Stopped: [ drbd_webfs:0 ]
>>
>> Looking in the recent log I see this:
>>
>> May 20 12:46:42 ha2.iohost.com pengine: [3117]: info: native_color: Resource fs_webfs cannot run anywhere
>>
>> I am not sure why it cannot promote the other resources on ha2; I
>> checked drbd before putting ha1 on standby and it was up to date.
> Double-check the status of drbd:
> # cat /proc/drbd
>
> Check what the cluster would do, and why:
> # ptest -LVVV -s
> [add more Vs to see more detail, but brace yourself for maximum confusion ;-)]
>
> Check for constraints that get in the way:
> # crm configure show | grep -Ee 'location|order'
>
> Check the "master scores" in the cib:
> # cibadmin -Ql -o status | grep master
>
> Look at the actions that have been performed on the resource,
> on both nodes:
>                vv-- the ID of your primitive
> # grep "lrmd:.*drbd_mysql" /var/log/ha.log
> or wherever that ends up on your box
>
>> Here are the surrounding log entries; the only thing I changed in the
>> config is standby="off" on both nodes:
>>
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: group_print: Resource Group: WebServices
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: ip1 (ocf::heartbeat:IPaddr2): Started ha2.iohost.com
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: ip1arp (ocf::heartbeat:SendArp): Started ha2.iohost.com
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: fs_webfs (ocf::heartbeat:Filesystem): Stopped
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: fs_mysql (ocf::heartbeat:Filesystem): Stopped
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: apache2 (lsb:httpd): Stopped
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: mysql (ocf::heartbeat:mysql): Stopped
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: clone_print: Master/Slave Set: ms_drbd_mysql
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print: Slaves: [ ha2.iohost.com ]
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print: Stopped: [ drbd_mysql:0 ]
>> May 20 12:
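Lars's "crm resource move" suggestion from above looks roughly like this in practice. This is a sketch; the group name WebServices is taken from the configuration posted elsewhere in this thread:

```shell
# Move the WebServices group to ha2; under the hood this adds a
# temporary location constraint pinning the group to that node.
crm resource move WebServices ha2.iohost.com

# Remove that constraint again afterwards; until you do, the group
# stays pinned to ha2 and normal failback cannot happen.
crm resource unmove WebServices
```

The key point is the second command: move is implemented as a constraint, not a one-off action, which is why unmove matters.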
Re: [Linux-HA] Need HA Help - standby / online not switching automatically
Regards,
Randy

On 5/19/2011 11:19 PM, Lars Ellenberg wrote:
> On Thu, May 19, 2011 at 03:46:37PM -0700, Randy Katz wrote:
>> To clarify, I was not seeking a quick response. I just noticed that the
>> threads I searched were NEVER answered, with the problem that I reported.
>> That being said, and about standby:
>>
>> Why does my node come up as standby and not as online?
> Because you put it there.
>
> The standby setting (as a few others) can take a "lifetime",
> and usually that defaults to "forever", though you can explicitly
> specify an "until reboot", which actually means until restart of the
> cluster system on that node.
>
>> Is there a setting in my conf file that affects that?
>> Or is it another issue? Is it configuration? Please advise.
>>
>> Thanks,
>> Randy
>>
>> PS - Here are some threads where it seems they were never answered, one
>> going back 3 years:
>>
>> http://www.mail-archive.com/linux-ha@lists.linux-ha.org/msg09886.html
>> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg07663.html
>> http://lists.community.tummy.com/pipermail/linux-ha/2008-August/034310.html
> Then they have probably been solved off list, via IRC or support,
> or by the original user finally having a facepalm experience.
>
> Besides, yes, it happens that threads go unanswered, most of the time
> because the question was badly asked ("does not work. why?"), and those
> that could figure it out were distracted by more important things,
> or decided that, at that time, trying to figure it out was too time
> consuming.
>
> That's life.
>
> If it happens to you, do a friendly bump,
> and/or try to ask a smarter version of the question ;-)
>
> Most of the time, the answer is in the logs and the config.
>
> But please break the issue down to a minimal configuration,
> and post that minimal config plus logs of one "incident".
> Don't post your 2 MB xml config plus a 2 GB log,
> and expect people to dig through that for fun.
>
> BTW, none of the quoted threads has anything to do with your experience,
> afaiks.
>
>> On 5/19/2011 3:16 AM, Lars Ellenberg wrote:
>>> On Wed, May 18, 2011 at 09:55:00AM -0700, Randy Katz wrote:
>>>> ps - I searched a lot online and I see this issue coming up,
>>> I doubt that _this_ issue comes up that often ;-)
>>>
>>>> and then after about 3-4 emails they request the resources and
>>>> constraints and then there is never an answer to the thread, why?!
>>> Hey, it's not even a day since you provided the config.
>>> People have day jobs.
>>> People get _paid_ to do support on these kinds of things,
>>> so they probably first deal with requests by paying customers.
>>>
>>> If you need SLAs, you may need to check out a support contract.
>>>
>>> Otherwise you need to be patient.
>>>
>>> From what I read, you have probably just misunderstood some concepts.
>>> "Standby" is not what I think you think it is ;-)
>>>
>>> "Standby" is NOT for deciding where resources will be placed.
>>>
>>> "Standby" is for manually switching a node into a mode where it WILL NOT
>>> run any resources. And it WILL NOT leave that state by itself.
>>> It is not supposed to.
>>>
>>> You switch a node into standby if you want to do maintenance on that
>>> node, do major software, system or hardware upgrades, or otherwise
>>> expect that it won't be useful to run resources there.
>>>
>>> It won't even run DRBD secondaries.
>>> It will run nothing there.
>>>
>>> If you want automatic failover, DO NOT put your nodes in standby.
>>> Because, if you do, they cannot take over resources.
>>>
>>> You have to have your nodes online for any kind of failover to happen.
>>>
>>> If you want to have a "preferred" location for your resources,
>>> use location constraints.
>>>
>>> Does that help?
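The "preferred location" advice above can be sketched in crm shell syntax like this (the constraint name and score are illustrative; a finite score expresses a preference while still allowing failover):

```shell
# Prefer ha1 for the WebServices group with score 100.
# An "inf:" score would instead forbid running anywhere else,
# which is usually not what you want for failover setups.
crm configure location prefer-ha1 WebServices 100: ha1.iohost.com
```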
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Need HA Help - standby / online not switching automatically
To clarify, I was not seeking a quick response. I just noticed that the threads I searched were NEVER answered, with the problem that I reported. That being said, and about standby:

Why does my node come up as standby and not as online? Is there a setting in my conf file that affects that? Or is it another issue? Is it configuration? Please advise.

Thanks,
Randy

PS - Here are some threads where it seems they were never answered, one going back 3 years:

http://www.mail-archive.com/linux-ha@lists.linux-ha.org/msg09886.html
http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg07663.html
http://lists.community.tummy.com/pipermail/linux-ha/2008-August/034310.html

On 5/19/2011 3:16 AM, Lars Ellenberg wrote:
> On Wed, May 18, 2011 at 09:55:00AM -0700, Randy Katz wrote:
>> ps - I searched a lot online and I see this issue coming up,
> I doubt that _this_ issue comes up that often ;-)
>
>> and then after about 3-4 emails they request the resources and
>> constraints and then there is never an answer to the thread, why?!
> Hey, it's not even a day since you provided the config.
> People have day jobs.
> People get _paid_ to do support on these kinds of things,
> so they probably first deal with requests by paying customers.
>
> If you need SLAs, you may need to check out a support contract.
>
> Otherwise you need to be patient.
>
> From what I read, you have probably just misunderstood some concepts.
>
> "Standby" is not what I think you think it is ;-)
>
> "Standby" is NOT for deciding where resources will be placed.
>
> "Standby" is for manually switching a node into a mode where it WILL NOT
> run any resources. And it WILL NOT leave that state by itself.
> It is not supposed to.
>
> You switch a node into standby if you want to do maintenance on that
> node, do major software, system or hardware upgrades, or otherwise
> expect that it won't be useful to run resources there.
>
> It won't even run DRBD secondaries.
> It will run nothing there.
>
> If you want automatic failover, DO NOT put your nodes in standby.
> Because, if you do, they can not take over resources.
>
> You have to have your nodes online for any kind of failover to happen.
>
> If you want to have a "preferred" location for your resources,
> use location constraints.
>
> Does that help?
Re: [Linux-HA] Need HA Help - standby / online not switching automatically
ps - I searched a lot online and I see this issue coming up, and then after about 3-4 emails they request the resources and constraints, and then there is never an answer to the thread, why?!

On 5/17/2011 10:00 PM, Randy Katz wrote:
> Configs as follows, drbd.conf:
>
> global {
>     usage-count no;
>     # minor-count dialog-refresh disable-ip-verification
> }
>
> resource r0 {
>     protocol C;
>     syncer {
>         rate 4M;
>     }
>     startup {
>         wfc-timeout 15;
>         degr-wfc-timeout 60;
>     }
>     net {
>         cram-hmac-alg sha1;
>         shared-secret "vz2700.1";
>     }
>     on ha1.iohost.com {
>         device /dev/drbd0;
>         disk /dev/vg0/mysql;
>         address 10.1.1.197:7788;
>         meta-disk internal;
>     }
>     on ha2.iohost.com {
>         device /dev/drbd0;
>         disk /dev/vg0/mysql;
>         address 10.1.1.187:7788;
>         meta-disk internal;
>     }
> }
> resource r1 {
>     protocol C;
>     syncer {
>         rate 4M;
>     }
>     startup {
>         wfc-timeout 15;
>         degr-wfc-timeout 60;
>     }
>     net {
>         cram-hmac-alg sha1;
>         shared-secret "vz2700.1";
>     }
>     on ha1.iohost.com {
>         device /dev/drbd1;
>         disk /dev/vg0/html;
>         address 10.1.1.197:7789;
>         meta-disk internal;
>     }
>     on ha2.iohost.com {
>         device /dev/drbd1;
>         disk /dev/vg0/html;
>         address 10.1.1.187:7789;
>         meta-disk internal;
>     }
> }
>
> ha.cf:
>
> # Logging
> debug 1
> use_logd false
> logfile /var/log/heartbeat.log
> logfacility daemon
>
> # Misc Options
> traditional_compression off
> compression bz2
> coredumps true
>
> # Communications
> udpport 691
> bcast eth0
> autojoin any
>
> # Thresholds (in seconds)
> keepalive 1
> warntime 6
> deadtime 10
> initdead 15
> crm yes
>
> crm dump:
>
> node $id="b159178d-c19b-4473-aa8e-13e487b65e33" ha1.iohost.com \
>     attributes standby="off"
> node $id="b7f38306-1cb7-4f8d-8ff2-4332bbea6e78" ha2.iohost.com \
>     attributes standby="on"
> primitive apache2 lsb:httpd \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="120" start-delay="15" \
>     meta target-role="Started"
> primitive drbd_mysql ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="15s"
> primitive drbd_webfs ocf:linbit:drbd \
>     params drbd_resource="r1" \
>     op monitor interval="15s"
> primitive fs_mysql ocf:heartbeat:Filesystem \
>     params device="/dev/drbd/by-res/r0" directory="/var/lib/mysql" fstype="ext3" \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="120" \
>     meta target-role="Started"
> primitive fs_webfs ocf:heartbeat:Filesystem \
>     params device="/dev/drbd/by-res/r1" directory="/var/www/html" fstype="ext3" \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="120" \
>     meta target-role="Started"
> primitive ip1 ocf:heartbeat:IPaddr2 \
>     params ip="72.9.147.196" nic="eth0" \
>     op monitor interval="5s"
> primitive ip1arp ocf:heartbeat:SendArp \
>     params ip="72.9.147.196" nic="eth0"
> primitive mysql ocf:heartbeat:mysql \
>     params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" user="mysql" group="mysql" log="/var/log/mysqld.log" pid="/var/run/mysqld/mysqld.pid" datadir="/var/lib/mysql" socket="/var/lib/mysql/mysql.sock" \
>     op monitor interval="30s" timeout="30s" \
>     op start interval="0" timeout="120" \
>     op stop interval="0" timeout="120"
> group WebServices ip1 ip1
Re: [Linux-HA] HA Nodes Port 691 UDP
ok thanks and I see there is 694 reserved for ha-clusters.

On 5/18/2011 7:17 AM, mike wrote:
> On 11-05-18 09:31 AM, Randy Katz wrote:
>> Hi, does anyone on this list know why there are UDP requests on port 691
>> of the HA nodes? I turned on firewalling and my crm_mon would not show
>> both nodes' status until I allowed UDP port 691 to flow through, please
>> advise,
>>
>> Regards,
>> Randy
>
> What is in your ha.cf file? Communications on my clusters take place
> over port 693. Maybe you set yours to 691??
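For reference, whatever udpport is configured in ha.cf has to be open between the nodes; 694/udp is the IANA-registered ha-cluster port, and this setup uses 691. A minimal iptables sketch, assuming eth0 is the heartbeat interface as in the posted ha.cf:

```shell
# Allow heartbeat's UDP traffic on the port configured in ha.cf (udpport 691).
iptables -A INPUT  -i eth0 -p udp --dport 691 -j ACCEPT
iptables -A OUTPUT -o eth0 -p udp --dport 691 -j ACCEPT
```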
[Linux-HA] HA Nodes Port 691 UDP
Hi, does anyone on this list know why there are UDP requests on port 691 of the HA nodes? I turned on firewalling and my crm_mon would not show both nodes' status until I allowed UDP port 691 to flow through, please advise,

Regards,
Randy
Re: [Linux-HA] Need HA Help - standby / online not switching automatically
Configs as follows, drbd.conf:

global {
    usage-count no;
    # minor-count dialog-refresh disable-ip-verification
}

resource r0 {
    protocol C;
    syncer {
        rate 4M;
    }
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "vz2700.1";
    }
    on ha1.iohost.com {
        device /dev/drbd0;
        disk /dev/vg0/mysql;
        address 10.1.1.197:7788;
        meta-disk internal;
    }
    on ha2.iohost.com {
        device /dev/drbd0;
        disk /dev/vg0/mysql;
        address 10.1.1.187:7788;
        meta-disk internal;
    }
}
resource r1 {
    protocol C;
    syncer {
        rate 4M;
    }
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
    }
    net {
        cram-hmac-alg sha1;
        shared-secret "vz2700.1";
    }
    on ha1.iohost.com {
        device /dev/drbd1;
        disk /dev/vg0/html;
        address 10.1.1.197:7789;
        meta-disk internal;
    }
    on ha2.iohost.com {
        device /dev/drbd1;
        disk /dev/vg0/html;
        address 10.1.1.187:7789;
        meta-disk internal;
    }
}

ha.cf:

# Logging
debug 1
use_logd false
logfile /var/log/heartbeat.log
logfacility daemon

# Misc Options
traditional_compression off
compression bz2
coredumps true

# Communications
udpport 691
bcast eth0
autojoin any

# Thresholds (in seconds)
keepalive 1
warntime 6
deadtime 10
initdead 15
crm yes

crm dump:

node $id="b159178d-c19b-4473-aa8e-13e487b65e33" ha1.iohost.com \
    attributes standby="off"
node $id="b7f38306-1cb7-4f8d-8ff2-4332bbea6e78" ha2.iohost.com \
    attributes standby="on"
primitive apache2 lsb:httpd \
    op start interval="0" timeout="60" \
    op stop interval="0" timeout="120" start-delay="15" \
    meta target-role="Started"
primitive drbd_mysql ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="15s"
primitive drbd_webfs ocf:linbit:drbd \
    params drbd_resource="r1" \
    op monitor interval="15s"
primitive fs_mysql ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/r0" directory="/var/lib/mysql" fstype="ext3" \
    op start interval="0" timeout="60" \
    op stop interval="0" timeout="120" \
    meta target-role="Started"
primitive fs_webfs ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/r1" directory="/var/www/html" fstype="ext3" \
    op start interval="0" timeout="60" \
    op stop interval="0" timeout="120" \
    meta target-role="Started"
primitive ip1 ocf:heartbeat:IPaddr2 \
    params ip="72.9.147.196" nic="eth0" \
    op monitor interval="5s"
primitive ip1arp ocf:heartbeat:SendArp \
    params ip="72.9.147.196" nic="eth0"
primitive mysql ocf:heartbeat:mysql \
    params binary="/usr/bin/mysqld_safe" config="/etc/my.cnf" user="mysql" group="mysql" log="/var/log/mysqld.log" pid="/var/run/mysqld/mysqld.pid" datadir="/var/lib/mysql" socket="/var/lib/mysql/mysql.sock" \
    op monitor interval="30s" timeout="30s" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120"
group WebServices ip1 ip1arp fs_webfs fs_mysql apache2 mysql \
    meta target-role="Started"
ms ms_drbd_mysql drbd_mysql \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms ms_drbd_webfs drbd_webfs \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation apache2_with_ip inf: apache2 ip1
colocation apache2_with_mysql inf: apache2 ms_drbd_mysql:Master
colocation apache2_with_webfs inf: apache2 ms_drbd_webfs:Master
colocation fs_on_drbd inf: fs_mysql ms_drbd_mysql:Master
colocation ip_with_ip_arp inf: ip1 ip1arp
colocation mysqlfs_on_drbd inf: fs_mysql ms_drbd_mysql:Master
colocation webfs_on_drbd inf: fs_webfs ms_drbd_webfs:Master
order apache-after-webfs inf: fs_webfs:start apache2:start
order arp-after-ip inf: ip1:start ip1arp:start
order fs-mysql-after-drbd inf: ms_drbd_mysql:promote fs_mysql:start
order fs-webfs-after-drbd inf: ms_drbd_webfs:promote fs_webfs:start
order mysql-after-fs-mysql inf: fs_mysql:start mysql:start
property $id="cib-bootstrap-options" \
    dc-version="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87" \
    cluster-infrastructure="Heartbeat" \
    expected-quorum-votes="1" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"

On 5/17/2011 9:53 PM, Michael Schwartzkopff
wrote: >> If I do on ha2: crm node online ha2.iohost.com i
Re: [Linux-HA] Need HA Help - standby / online not switching automatically
If I do on ha2: crm node online ha2.iohost.com, it starts the VIP (it will ping), but it does not do the DRBD mounts and does not start the web or mysql services. If I then issue crm node online ha1.iohost.com on ha1, it will make ha2 online with all services active! Then if I make ha2 standby, ha1 will become online with all services, just fine! Any insights will be greatly appreciated, thanks!

Randy

On 5/17/2011 9:44 PM, Randy Katz wrote:
> In the logs, on ha2, I see at the time of crm node standby ha1:
>
> May 18 10:32:54 ha2.iohost.com cib: [2378]: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-25.raw
> May 18 10:32:54 ha2.iohost.com cib: [2378]: info: write_cib_contents: Wrote version 0.102.0 of the CIB to disk (digest: b445d9afde4b209981c3da08d4c24ecc)
> May 18 10:32:54 ha2.iohost.com cib: [2378]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.f5FXZH (digest: /var/lib/heartbeat/crm/cib.irSIZ7)
> May 18 10:32:54 ha2.iohost.com cib: [7779]: info: Managed write_cib_contents process 2378 exited with return code 0.
> May 18 10:33:11 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: > flush message from ha1.iohost.com > May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: > flush message from ha1.iohost.com > May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: > flush message from ha1.iohost.com > May 18 10:33:12 ha2.iohost.com attrd: [7782]: info: attrd_ha_callback: > flush message from ha1.iohost.com > May 18 10:35:14 ha2.iohost.com cib: [7779]: info: cib_stats: Processed > 48 operations (13125.00us average, 0% utilization) in the last 10min > > And on ha1: > > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: - > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: - > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: - > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: - > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: - id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33"> > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: - id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33-standby" /> > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: - > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: - > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: - > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: - > May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: > abort_transition_graph: need_abort:59 - Triggered transition abort > (complete=1) : Non-status change > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: - > May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: need_abort: Aborting > on change to admin_epoch > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: + > May 17 22:33:02 
ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: + > May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_state_transition: > State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC > cause=C_FSA_INTERNAL origin=abort_transition_graph ] > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: + > May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_state_transition: > All 2 cluster nodes are eligible to run resources. > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: + > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: + id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33"> > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: + id="nodes-b159178d-c19b-4473-aa8e-13e487b65e33-standby" /> > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: + > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: + > May 17 22:33:02 ha1.iohost.com crmd: [8656]: info: do_pe_invoke: Query > 337: Requesting the current CIB: S_POLICY_ENGINE > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: + > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: + > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: log_data_element: > cib:diff: + > May 17 22:33:02 ha1.iohost.com cib: [8652]: info: cib_process_request: > Operation complete: op cib_modify for section nodes > (origin=local/crm_attribute/4, version=0.102.1): ok (rc=0) > May 17 22:33:02 h
Re: [Linux-HA] Need HA Help - standby / online not switching automatically
rsc_merge_weights: ms_drbd_mysql: Rolling back scores from fs_mysql
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource drbd_mysql:0 cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: native_color: Resource drbd_mysql:1 cannot run anywhere
May 17 22:33:02 ha1.iohost.com pengine: [8685]: info: rsc_merge_weights: ms_drbd_mysql: Rolling back scores from apache2

On 5/17/2011 9:28 PM, Randy Katz wrote:
> Hi,
>
> Relatively new to HA, though I have been using Xen and reading
> this list here and there; now I need some help.
>
> I have 2 physical nodes, let's call them node1/node2.
> In each I have VMs (Xen paravirt / ha1 & ha2). In each VM I have
> 2 LVs which are DRBD'd (r0 and r1: mysql data and html data). There is a
> VIP between them, resolving the website, which is a simple
> Wordpress blog (so it has a database), and it works well.
>
> When I start them (reboot the VMs) they start up fine: ha1 is
> online (primary) and ha2 is standby (secondary). If I:
>
> 1. crm node standby ha1.iohost.com - sometimes ha2.iohost.com takes
> over, sometimes I am left with 2 nodes on standby; not sure why.
> 2. If both nodes are in standby and I issue: crm node online ha1.iohost.com,
> sometimes ha2 will become active, as it should have when ha1
> went standby, sometimes ha1 will become active, and sometimes they will
> remain standby; not sure why.
>
> Question: How do I test and debug this? What parameters in which config
> file affect this behavior?
>
> Thank you in advance,
> Randy
[Linux-HA] Need HA Help - standby / online not switching automatically
Hi,

Relatively new to HA, though I have been using Xen and reading this list here and there; now I need some help.

I have 2 physical nodes, let's call them node1/node2. In each I have VMs (Xen paravirt / ha1 & ha2). In each VM I have 2 LVs which are DRBD'd (r0 and r1: mysql data and html data). There is a VIP between them, resolving the website, which is a simple Wordpress blog (so it has a database), and it works well.

When I start them (reboot the VMs) they start up fine: ha1 is online (primary) and ha2 is standby (secondary). If I:

1. crm node standby ha1.iohost.com - sometimes ha2.iohost.com takes over, sometimes I am left with 2 nodes on standby; not sure why.
2. If both nodes are in standby and I issue crm node online ha1.iohost.com, sometimes ha2 will become active (as it should have when ha1 went standby), sometimes ha1 will become active, and sometimes they will remain standby; not sure why.

Question: How do I test and debug this? What parameters in which config file affect this behavior?

Thank you in advance,
Randy
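For the "how do I test and debug this" question, the diagnostic commands suggested later in this thread (for Pacemaker 1.0.x on Heartbeat) are a good starting point:

```shell
# One-shot snapshot of node and resource state
crm_mon -1

# Ask the policy engine what it would do next, and why
ptest -L -VV -s

# List location and order constraints that may pin resources
crm configure show | grep -Ee 'location|order'

# Inspect DRBD replication state before and after a switchover
cat /proc/drbd
```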
[Linux-HA] drbd compile question
Hi,

I loaded openSUSE 11.4 and installed corosync, pacemaker, drbd and xen, and now it says the drbd tools are not up to date. There is no other tool/package than drbd, so I would like to just compile drbd from source. The kernel is currently 2.6.37.1-1.2-xen. On the drbd site it says that would correspond to drbd 8.3.9.

Other than the prefix and config directories, I notice there are other configuration options for drbd; must they be set in order for it to work properly with corosync, etc.? I am referring specifically to these options:

  --with-xen          Enable Xen integration
  --with-pacemaker    Enable Pacemaker integration
  --with-heartbeat    Enable Heartbeat integration

Thank you in advance,
Randy
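A source build with those integration switches enabled would look roughly like this. This is a sketch only: the install paths are illustrative, and the --with-* flags are the ones quoted in the question above:

```shell
# Build drbd 8.3.9 userland with Xen/Pacemaker/Heartbeat integration enabled.
tar xzf drbd-8.3.9.tar.gz
cd drbd-8.3.9
./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc \
            --with-xen --with-pacemaker --with-heartbeat
make
make install
```

As far as I know, the --with-* switches control which integration helpers and resource agents get built and installed; the core userland tools build without them, but the Pacemaker/Heartbeat flags are what provide the cluster integration pieces.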
Re: [Linux-HA] Sort of crm commands but offline?
This might sound obvious, but is an ssh call acceptable?

On 3/23/2011 8:38 AM, Alain.Moulle wrote:
> Hi,
>
> I'm looking for a command which will give me information about the HA cluster,
> such as, for example, all node hostnames which are in the same HA cluster, BUT
> from a node where Pacemaker is not active.
>
> For example: I have a cluster with node1, node2, node3.
> Pacemaker is running on node2 & node3.
> Pacemaker is not running on node1, so any crm command returns:
> "Signon to CIB failed: connection failed
> Init failed, could not perform requested operations"
> I'm on node1: I want to know (by script) if Pacemaker is active
> on at least one other node in the HA cluster, including the node
> where I am (so node1).
>
> Is there a command which could give me such information "offline",
> or do I have to scan the uname fields in the record
> and ssh to other nodes to get the information?
>
> Thanks,
> Alain
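If ssh is acceptable, a minimal sketch of the polling idea (the node list is hypothetical; crmadmin -S asks a node's crmd for its status, so the loop succeeds if Pacemaker answers anywhere):

```shell
#!/bin/sh
# Exit 0 if Pacemaker responds on at least one of the cluster nodes.
for n in node1 node2 node3; do
    if ssh "$n" "crmadmin -S $n" >/dev/null 2>&1; then
        echo "Pacemaker active (answered on $n)"
        exit 0
    fi
done
echo "Pacemaker not reachable on any node"
exit 1
```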
Re: [Linux-HA] question on Creating an Active/Passive iSCSI configuration
This was posted quite a while ago; anyone care to look into it and answer? Thank you in advance, Randy

On 3/14/2011 6:18 AM, Dejan Muhamedagic wrote:
> On Fri, Mar 11, 2011 at 10:23:37AM -0800, Randy Katz wrote:
>> On 3/11/2011 3:29 AM, Dejan Muhamedagic wrote:
>>> Hi,
>>>
>>> On Fri, Mar 11, 2011 at 01:36:25AM -0800, Randy Katz wrote:
>>>> On 3/11/2011 12:50 AM, RaSca wrote:
>>>>> Il giorno Ven 11 Mar 2011 07:32:32 CET, Randy Katz ha scritto:
>>>>>> ps - in /var/log/messages I find this:
>>>>>>
>>>>>> Mar 10 22:31:45 drbd1 lrmd: [3274]: ERROR: get_resource_meta: pclose
>>>>>> failed: Interrupted system call
>>>>>> Mar 10 22:31:45 drbd1 lrmd: [3274]: WARN: on_msg_get_metadata: empty
>>>>>> metadata for ocf::linbit::drbd.
>>>>>> Mar 10 22:31:45 drbd1 lrmadmin: [3481]: ERROR:
>>>>>> lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply
>>>>>> message of rmetadata with function get_ret_from_msg.
>>>>> [...]
>>>>>
>>>>> Hi,
>>>>> I think that the message "no such resource agent" explains what
>>>>> the matter is.
>>>>> Does the file /usr/lib/ocf/resource.d/linbit/drbd exist? Is the drbd
>>>>> file executable? Have you correctly installed the drbd packages?
>>>>>
>>>>> Check those things; you can try to reinstall drbd.
>>>>>
>>>> Hi
>>>>
>>>> # ls -l /usr/lib/ocf/resource.d/linbit/drbd
>>>> -rwxr-xr-x 1 root root 24523 Jun 4 2010 /usr/lib/ocf/resource.d/linbit/drbd
>>> Which cluster-glue version do you run?
>>> Try also:
>>>
>>> # lrmadmin -C
>>> # lrmadmin -P ocf drbd
>>> # export OCF_ROOT=/usr/lib/ocf
>>> # /usr/lib/ocf/resource.d/linbit/drbd meta-data
>> I am running from a source build/install as per clusterlabs.org, as the
>> rpm's had broken dependencies and would not install. I have now blown away
>> that CentOS (one of them) machine and installed openSUSE, as they said
>> everything was included; but it seems that holds on 11.3, not on 11.4. On 11.4
>> the install is broken, and so now
> I guess that openSUSE would like to hear about it too, just in
> which way it is broken.

I did an openSUSE 11.4 install from DVD. I then used zypper to install pacemaker, heartbeat, corosync, and libpacemaker3. I ended up with a clusterlabs.repo and older versions, and had to break a dependency for pacemaker or it would not install. I found out later that there are later versions, precompiled, in the openSUSE repository; you just need to call the specific versions and they will install. I had to remove the previous ones, as some new dependencies were created. The versions I ended up with are:

Name: pacemaker      Version: 1.1.5-3.2      Arch: x86_64   Vendor: openSUSE
Name: libpacemaker3  Version: 1.1.5-3.2      Arch: x86_64   Vendor: openSUSE
Name: heartbeat      Version: 3.0.4-25.28.1  Arch: x86_64   Vendor: openSUSE
Name: corosync       Version: 1.3.0-3.1      Arch: x86_64   Vendor: openSUSE

At this point I was not sure whether to install iet or tgt; I saw some examples with tgt, so I installed that. So far it looks like I have CRM, and I have mocked up the example from ha-iscsi.pdf (trying to mitigate some of the errors; there are errors!). I noticed the floating IP addresses do not ping, so I added a new set and they ping, though the original ones do not; perhaps something else in the config is prohibiting that.
Here are my current crm config commands:

property stonith-enabled="false"
property no-quorum-policy="ignore"
property default-resource-stickiness="200"
primitive res_drbd_iscsivg01 ocf:linbit:drbd \
    params drbd_resource="iscsivg01" \
    op monitor interval="10s"
ms ms_drbd_iscsivg01 res_drbd_iscsivg01 \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
primitive res_drbd_iscsivg02 ocf:linbit:drbd \
    params drbd_resource="iscsivg02" \
    op monitor interval="10s"
ms ms_drbd_iscsivg02 res_drbd_iscsivg02 \
    meta clone-max="2" notify="true"
primitive res_ip_alicebob01 ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.218" cidr_netmask="24" \
    op monitor interval="10s"
primitive res_ip_alicebob02 ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.219" cidr_netmask="24" \
    op monitor interval="10s"
primitive res_ip_c1c201 ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.220" cidr_netmask="24" \
    op monitor interval="10s"
primitive res_ip_c1c202 ocf:heartbeat:IPaddr2 p
Re: [Linux-HA] question on Creating an Active/Passive iSCSI configuration
On 3/14/2011 6:18 AM, Dejan Muhamedagic wrote:
> On Fri, Mar 11, 2011 at 10:23:37AM -0800, Randy Katz wrote:
>> On 3/11/2011 3:29 AM, Dejan Muhamedagic wrote:
>>> Hi,
>>>
>>> On Fri, Mar 11, 2011 at 01:36:25AM -0800, Randy Katz wrote:
>>>> On 3/11/2011 12:50 AM, RaSca wrote:
>>>>> On Fri, 11 Mar 2011 07:32:32 CET, Randy Katz wrote:
>>>>>> ps - in /var/log/messages I find this:
>>>>>>
>>>>>> Mar 10 22:31:45 drbd1 lrmd: [3274]: ERROR: get_resource_meta: pclose failed: Interrupted system call
>>>>>> Mar 10 22:31:45 drbd1 lrmd: [3274]: WARN: on_msg_get_metadata: empty metadata for ocf::linbit::drbd.
>>>>>> Mar 10 22:31:45 drbd1 lrmadmin: [3481]: ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
>>>>> [...]
>>>>>
>>>>> Hi,
>>>>> I think the message "no such resource agent" explains what the matter is.
>>>>> Does the file /usr/lib/ocf/resource.d/linbit/drbd exist? Is the drbd file executable? Have you correctly installed the drbd packages?
>>>>>
>>>>> Check those things; you can try to reinstall drbd.
>>>>>
>>>> Hi
>>>>
>>>> # ls -l /usr/lib/ocf/resource.d/linbit/drbd
>>>> -rwxr-xr-x 1 root root 24523 Jun  4  2010 /usr/lib/ocf/resource.d/linbit/drbd
>>> Which cluster-glue version do you run?
>>> Try also:
>>>
>>> # lrmadmin -C
>>> # lrmadmin -P ocf drbd
>>> # export OCF_ROOT=/usr/lib/ocf
>>> # /usr/lib/ocf/resource.d/linbit/drbd meta-data
>> I am running from a source build/install as per clusterlabs.org, as the RPMs had broken dependencies and would not install. I have now blown away that CentOS machine (one of them) and installed openSUSE, as they said everything was included, but it seems that is true on 11.3, not on 11.4; on 11.4 the install is broken, and so now
> I guess that openSUSE would like to hear about it too, just in
> which way it is broken.

I did an openSUSE 11.4 install from DVD.
I then used zypper to install pacemaker, heartbeat, corosync, and libpacemaker3. I ended up with a clusterlabs.repo and older versions, and had to break a dependency for pacemaker or it would not install. I found out later that there are newer versions, precompiled, in the openSUSE repository; you just need to request the specific versions and they will install. I had to remove the previous ones, as some new dependencies were created. The versions I ended up with are:

Name: pacemaker      Version: 1.1.5-3.2      Arch: x86_64  Vendor: openSUSE
Name: libpacemaker3  Version: 1.1.5-3.2      Arch: x86_64  Vendor: openSUSE
Name: heartbeat      Version: 3.0.4-25.28.1  Arch: x86_64  Vendor: openSUSE
Name: corosync       Version: 1.3.0-3.1      Arch: x86_64  Vendor: openSUSE

At this point I was not sure whether to install iet or tgt; I saw some examples with tgt, so I installed that. So far it looks like I have CRM working, and I have mocked up the example from ha-iscsi.pdf (trying to mitigate some of the errors; there are errors!). I noticed the floating IP addresses do not ping, so I added a new set, and they ping, though the original ones do not; perhaps something else in the config is prohibiting that.
Here are my current crm config commands:

property stonith-enabled="false"
property no-quorum-policy="ignore"
property default-resource-stickiness="200"
primitive res_drbd_iscsivg01 ocf:linbit:drbd \
    params drbd_resource="iscsivg01" \
    op monitor interval="10s"
ms ms_drbd_iscsivg01 res_drbd_iscsivg01 \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
primitive res_drbd_iscsivg02 ocf:linbit:drbd \
    params drbd_resource="iscsivg02" \
    op monitor interval="10s"
ms ms_drbd_iscsivg02 res_drbd_iscsivg02 \
    meta clone-max="2" notify="true"
primitive res_ip_alicebob01 ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.218" cidr_netmask="24" \
    op monitor interval="10s"
primitive res_ip_alicebob02 ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.219" cidr_netmask="24" \
    op monitor interval="10s"
primitive res_ip_c1c201 ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.220" cidr_netmask="24" \
    op monitor interval="10s"
primitive res_ip_c1c202 ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.221" cidr_netmask="24" \
    op monitor interval=
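For what it's worth, a configuration in the ha-iscsi.pdf style usually also needs colocation and ordering constraints tying the floating IPs (and later the target/LU resources) to the DRBD master; without them the cluster may start an IP on a node where nothing else can run, which would match the "floating IPs do not ping" symptom. A hedged sketch in crm shell against the primitives above (the group and constraint names here are my own invention, not from the original post):

```
group rg_iscsivg01 res_ip_alicebob01 res_ip_alicebob02
colocation col_iscsivg01_on_drbd inf: rg_iscsivg01 ms_drbd_iscsivg01:Master
order ord_drbd_before_iscsivg01 inf: ms_drbd_iscsivg01:promote rg_iscsivg01:start
```

The colocation pins the group to whichever node holds the DRBD master role, and the order rule delays starting the group until the promotion has happened.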
Re: [Linux-HA] question on Creating an Active/Passive iSCSI configuration
On 3/11/2011 3:29 AM, Dejan Muhamedagic wrote:
> Hi,
>
> On Fri, Mar 11, 2011 at 01:36:25AM -0800, Randy Katz wrote:
>> On 3/11/2011 12:50 AM, RaSca wrote:
>>> On Fri, 11 Mar 2011 07:32:32 CET, Randy Katz wrote:
>>>> ps - in /var/log/messages I find this:
>>>>
>>>> Mar 10 22:31:45 drbd1 lrmd: [3274]: ERROR: get_resource_meta: pclose failed: Interrupted system call
>>>> Mar 10 22:31:45 drbd1 lrmd: [3274]: WARN: on_msg_get_metadata: empty metadata for ocf::linbit::drbd.
>>>> Mar 10 22:31:45 drbd1 lrmadmin: [3481]: ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
>>> [...]
>>>
>>> Hi,
>>> I think the message "no such resource agent" explains what the matter is.
>>> Does the file /usr/lib/ocf/resource.d/linbit/drbd exist? Is the drbd file executable? Have you correctly installed the drbd packages?
>>>
>>> Check those things; you can try to reinstall drbd.
>>>
>> Hi
>>
>> # ls -l /usr/lib/ocf/resource.d/linbit/drbd
>> -rwxr-xr-x 1 root root 24523 Jun  4  2010 /usr/lib/ocf/resource.d/linbit/drbd
> Which cluster-glue version do you run?
> Try also:
>
> # lrmadmin -C
> # lrmadmin -P ocf drbd
> # export OCF_ROOT=/usr/lib/ocf
> # /usr/lib/ocf/resource.d/linbit/drbd meta-data

I am running from a source build/install as per clusterlabs.org, as the RPMs had broken dependencies and would not install. I have now blown away that CentOS machine (one of them) and installed openSUSE, as they said everything was included, but it seems that is true on 11.3, not on 11.4; on 11.4 the install is broken, and so now I am running some later versions and running into some other issues; I will report back with findings. Which OS distro is the least trouble to get this stuff running on? I just want to get it running, run a few tests, and then figure out where to go from there.
Thanks,
Randy
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] question on Creating an Active/Passive iSCSI configuration
On 3/11/2011 12:50 AM, RaSca wrote:
> On Fri, 11 Mar 2011 07:32:32 CET, Randy Katz wrote:
>> ps - in /var/log/messages I find this:
>>
>> Mar 10 22:31:45 drbd1 lrmd: [3274]: ERROR: get_resource_meta: pclose failed: Interrupted system call
>> Mar 10 22:31:45 drbd1 lrmd: [3274]: WARN: on_msg_get_metadata: empty metadata for ocf::linbit::drbd.
>> Mar 10 22:31:45 drbd1 lrmadmin: [3481]: ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
> [...]
>
> Hi,
> I think the message "no such resource agent" explains what the matter is.
> Does the file /usr/lib/ocf/resource.d/linbit/drbd exist? Is the drbd file executable? Have you correctly installed the drbd packages?
>
> Check those things; you can try to reinstall drbd.

Hi

# ls -l /usr/lib/ocf/resource.d/linbit/drbd
-rwxr-xr-x 1 root root 24523 Jun  4  2010 /usr/lib/ocf/resource.d/linbit/drbd

DRBD is running fine; I have set up that part of it already. I am using the ha-iscsi.pdf and up to this point everything is fine.

Randy
Re: [Linux-HA] question on Creating an Active/Passive iSCSI configuration
ps - in /var/log/messages I find this:

Mar 10 22:31:45 drbd1 lrmd: [3274]: ERROR: get_resource_meta: pclose failed: Interrupted system call
Mar 10 22:31:45 drbd1 lrmd: [3274]: WARN: on_msg_get_metadata: empty metadata for ocf::linbit::drbd.
Mar 10 22:31:45 drbd1 lrmadmin: [3481]: ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.

On 3/10/2011 10:29 PM, Randy Katz wrote:
> Hi,
>
> I hope this is the correct forum for this; it is crm:
>
> The initial configuration commands worked, or so it seems (I am not sure how to check).
>
> crm(live)# configure
> crm(live)configure# property stonith-enabled="false"
> crm(live)configure# property no-quorum-policy="ignore"
> crm(live)configure# property default-resource-stickiness="200"
> crm(live)configure# commit
>
> Then the following; the first command gives an error:
>
> # crm
> crm(live)# configure
> crm(live)configure# primitive res_drbd_iscsivg01 \
>    > ocf:linbit:drbd \
>    > params drbd_resource="iscsivg01" \
>    > op monitor interval="10"
> lrmadmin[3437]: 2011/03/10_22:29:03 ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
> ERROR: ocf:linbit:drbd: could not parse meta-data:
> ERROR: ocf:linbit:drbd: no such resource agent
>
> I am a bit lost and not sure what to check from here.
> Thank you in advance,
> Randy
[Linux-HA] question on Creating an Active/Passive iSCSI configuration
Hi,

I hope this is the correct forum for this; it is crm:

The initial configuration commands worked, or so it seems (I am not sure how to check).

crm(live)# configure
crm(live)configure# property stonith-enabled="false"
crm(live)configure# property no-quorum-policy="ignore"
crm(live)configure# property default-resource-stickiness="200"
crm(live)configure# commit

Then the following; the first command gives an error:

# crm
crm(live)# configure
crm(live)configure# primitive res_drbd_iscsivg01 \
   > ocf:linbit:drbd \
   > params drbd_resource="iscsivg01" \
   > op monitor interval="10"
lrmadmin[3437]: 2011/03/10_22:29:03 ERROR: lrm_get_rsc_type_metadata(578): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.
ERROR: ocf:linbit:drbd: could not parse meta-data:
ERROR: ocf:linbit:drbd: no such resource agent

I am a bit lost and not sure what to check from here.
Thank you in advance,
Randy
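The "no such resource agent" error means the LRM could not obtain valid metadata from the agent. The checks suggested later in this thread (file exists, is executable, prints metadata) can be scripted roughly as follows; this is only a sketch assuming the default OCF layout, and the `status` variable is my own scaffolding:

```shell
# Verify the ocf:linbit:drbd agent is present, executable, and emits
# metadata, mirroring the manual checks suggested in this thread.
OCF_ROOT=${OCF_ROOT:-/usr/lib/ocf}
ra="$OCF_ROOT/resource.d/linbit/drbd"

if [ -x "$ra" ]; then
    # A working agent prints an XML <resource-agent> document on stdout.
    if OCF_ROOT="$OCF_ROOT" "$ra" meta-data 2>/dev/null | grep -q '<resource-agent'; then
        status="metadata OK"
    else
        status="agent present but metadata is broken or empty"
    fi
else
    status="agent missing or not executable: $ra"
fi
echo "$status"
```

If this prints "agent missing", the drbd package (or the part of it shipping the OCF agent) is not installed where the cluster expects it; if the metadata is empty, the agent script itself is failing, which would also explain the lrmd "empty metadata" warnings.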