[Linux-ha-dev] [PATCH] RA dev guide: ocf_run: replace -v with -q
# HG changeset patch
# User Tim Serong tser...@suse.com
# Date 1313069842 -36000
# Node ID 1949ea5878aa58466be5883484a3c91dbdbdc57e
# Parent  1caec681e0a823874c3185b8385e70c4a29a126f
RA dev guide: ocf_run: replace -v with -q

diff -r 1caec681e0a8 -r 1949ea5878aa dev-guides/ra-dev-guide.txt
--- a/dev-guides/ra-dev-guide.txt	Wed Feb 23 14:02:44 2011 +0100
+++ b/dev-guides/ra-dev-guide.txt	Thu Aug 11 23:37:22 2011 +1000
@@ -1204,17 +1204,18 @@ With the command specified above, the re
 +frobnicate --spam=eggs+ and capture its output and exit code. If the
 exit code is nonzero (indicating an error), +ocf_run+ logs the command
 output with the +err+ logging severity, and
-the resource agent subsequently exits.
+the resource agent subsequently exits. If the exit code is zero
+(indicating success), any command output will be logged with the +info+
+logging severity.

-If the resource agent wishes to capture the output of _both_ a
-successful and a failed command execution, it can use the +-v+ flag
-with +ocf_run+. In the example below, +ocf_run+ will log any output
-from the command with the +info+ severity if the command exit code is
-zero (indicating success), and with +err+ if it is nonzero.
+If the resource agent wishes to ignore the output of a successful
+command execution, it can use the +-q+ flag with +ocf_run+. In the
+example below, +ocf_run+ will only log output if the command exit code
+is nonzero.

 [source,bash]
 --
-ocf_run -v frobnicate --spam=eggs || exit $OCF_ERR_GENERIC
+ocf_run -q frobnicate --spam=eggs || exit $OCF_ERR_GENERIC
 --

 Finally, if the resource agent wants to log the output of a command
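As a side note for readers of the guide, a minimal sketch of how a start action might use ocf_run with the -q flag; the frobnicate command and the OCF_ERR_GENERIC handling follow the guide's example above, while the surrounding function and the frobnicate_monitor helper are purely hypothetical:

frobnicate_start() {
    # If the service is already running there is nothing to do.
    if frobnicate_monitor; then
        ocf_log info "frobnicate is already running"
        return $OCF_SUCCESS
    fi

    # -q: log nothing on success; on failure ocf_run logs the command
    # output with 'err' severity before we bail out.
    ocf_run -q frobnicate --spam=eggs || return $OCF_ERR_GENERIC

    return $OCF_SUCCESS
}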
Re: [Linux-HA] Problem with kvm virtual machine and cluster
On Wed, Aug 10, 2011 at 11:15 PM, Maloja01 maloj...@arcor.de wrote:

The order constraints do work as I assume, but I guess you have run into a
pitfall: a clone is marked as up if one instance in the cluster has started
successfully. The order constraint does not say that the clone on the same
node must be up. Use a colocation constraint to get that.

Kind regards
Fabian

On 08/10/2011 01:43 PM, i...@umbertocarrara.it wrote:

Hi, excuse me for my poor English (I use Google to help me with the
translation) and I am a newbie in clustering :-).

I'm trying to set up a cluster with three nodes for virtualization. I have
used a how-to that I found at http://www.linbit.com/support/ha-kvm.pdf to
configure the cluster. The volumes of the VMs are shared from an Openfiler
cluster over iSCSI, which works well, and the VMs start fine on the hosts
when I am outside the cluster. The problem is that the VMs start before
libvirt and the open-iscsi initiator; I have set an order rule, but it does
not seem to work. Afterwards, once the services are started, the cluster
cannot restart the machine, so the output of crm_mon -1 is:

Last updated: Wed Aug 10 12:40:20 2011
Stack: openais
Current DC: host1 - partition with quorum
Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
3 Nodes configured, 3 expected votes
2 Resources configured.

Online: [ host1 host2 host3 ]

Clone Set: BackEndClone
    Started: [ host1 host2 host3 ]
Samba  (ocf::heartbeat:VirtualDomain):  Started [ host1 host2 host3 ]

Failed actions:
    Samba_monitor_0 (node=host1, call=15, rc=1, status=complete): unknown error
    Samba_stop_0 (node=host1, call=16, rc=1, status=complete): unknown error
    Samba_monitor_0 (node=host2, call=12, rc=1, status=complete): unknown error
    Samba_stop_0 (node=host2, call=13, rc=1, status=complete): unknown error
    Samba_monitor_0 (node=host3, call=12, rc=1, status=complete): unknown error
    Samba_stop_0 (node=host3, call=13, rc=1, status=complete): unknown error

This is my cluster config:

root@host1:~# crm configure show
node host1 \
        attributes standby=on
node host2 \
        attributes standby=on
node host3 \
        attributes standby=on
primitive Iscsi lsb:open-iscsi \
        op monitor interval=30
primitive Samba ocf:heartbeat:VirtualDomain \
        params config=/etc/libvirt/qemu/samba.iso.xml \
        meta allow-migrate=true \
        op monitor interval=30
primitive Virsh lsb:libvirt-bin \
        op monitor interval=30
group BackEnd Iscsi Virsh
clone BackEndClone BackEnd \
        meta target-role=Started
colocation SambaOnBackEndClone inf: Samba BackEndClone
order SambaBeforeBackEndClone inf: BackEndClone Samba
property $id=cib-bootstrap-options \
        dc-version=1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b \
        cluster-infrastructure=openais \
        expected-quorum-votes=3 \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        default-action-timeout=100 \
        last-lrm-refresh=1312970592
rsc_defaults $id=rsc-options \
        resource-stickiness=200

My log is:

Aug 10 13:36:34 host1 pengine: [1923]: info: get_failcount: Samba has failed INFINITY times on host1
Aug 10 13:36:34 host1 pengine: [1923]: WARN: common_apply_stickiness: Forcing Samba away from host1 after 100 failures (max=100)
Aug 10 13:36:34 host1 pengine: [1923]: info: get_failcount: Samba has failed INFINITY times on host2
Aug 10 13:36:34 host1 pengine: [1923]: WARN: common_apply_stickiness: Forcing Samba away from host2 after 100 failures (max=100)
Aug 10 13:36:34 host1 pengine: [1923]: info: get_failcount: Samba has failed INFINITY times on host3
Aug 10 13:36:34 host1 pengine: [1923]: WARN: common_apply_stickiness: Forcing Samba away from host3 after 100 failures (max=100)
Aug 10 13:36:34 host1 pengine: [1923]: info: native_merge_weights: BackEndClone: Rolling back scores from Samba
Aug 10 13:36:34 host1 pengine: [1923]: info: native_color: Unmanaged resource Samba allocated to 'nowhere': failed
Aug 10 13:36:34 host1 pengine: [1923]: WARN: native_create_actions: Attempting recovery of resource Samba
Aug 10 13:36:34 host1 pengine: [1923]: notice: LogActions: Leave resource Iscsi:0 (Started host1)
Aug 10 13:36:34 host1 pengine: [1923]: notice: LogActions: Leave resource Virsh:0 (Started host1)
Aug 10 13:36:34 host1 pengine: [1923]: notice: LogActions: Leave resource Iscsi:1 (Started host2)
Aug 10 13:36:34 host1 pengine: [1923]: notice: LogActions: Leave resource Virsh:1 (Started host2)
Aug 10 13:36:34 host1 pengine: [1923]: notice: LogActions: Leave resource Iscsi:2 (Started host3)
Aug 10 13:36:34 host1 pengine: [1923]: notice: LogActions: Leave resource Virsh:2 (Started host3)
Aug 10 13:36:34 host1 pengine: [1923]: notice: LogActions: Leave resource Samba (Started unmanaged)
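For readers following along, a minimal sketch of the order-plus-colocation pattern Fabian describes, using the resource names from the configuration above (the posted configuration already contains an equivalent pair; this only spells the pattern out, and the constraint names are arbitrary):

# Samba may only run on a node that has an active BackEndClone instance ...
colocation SambaOnBackEndClone inf: Samba BackEndClone
# ... and may only be started after the clone has been started there
order BackEndCloneBeforeSamba inf: BackEndClone Samba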
Re: [Linux-HA] remove resource WITHOUT moving the other resources
On Thu, Aug 11, 2011 at 5:28 AM, ala...@julian-seifert.de wrote:

Hi List,

I have a little problem with my two-node Pacemaker cluster (active/passive
setup): vpsnode01-rz (currently the active node) and vpsnode01-nk (passive).
It runs a bunch of OpenVZ containers, grouped together and colocated with
wherever the DRBD master resource is running. I have a problem with a newly
created container, openvzve_itv:

1. It is recognised as running on the passive node (which it is NOT, and
which would not be possible anyway, as the VEs are located on the DRBD
storage backends).

Then the agent is broken.

2. I am NOT able to remove the openvzve_itv resource WITHOUT ptest suggesting
that EVERY other resource will switch nodes.

To clarify what I just described, I pasted the crm_mon output and the current
configuration here: http://pastebin.ubuntu.com/653023/

Now what I am looking for is a way to completely delete/remove openvzve_itv
without affecting the other resources.

is-managed-default=false and/or resource-stickiness=INFINITY

I tried deleting everything in the configuration (crm configure edit) related
to openvzve_itv (namely the resource itself and its entry in the openvz
group), but as I mentioned earlier, when running ptest vvv nograph after that
(and before committing, of course) it suggests that ALL resources will get
moved.

PTEST:

crm(live)configure# ptest vvv nograph
ptest[7623]: 2011/08/10_21:27:50 notice: unpack_config: On loss of CCM Quorum: Ignore
ptest[7623]: 2011/08/10_21:27:50 notice: unpack_rsc_op: Hard error - openvzve_itv_start_0 failed with rc=5: Preventing openvzve_itv from re-starting on vpsnode01-nk
ptest[7623]: 2011/08/10_21:27:50 WARN: unpack_rsc_op: Processing failed op openvzve_itv_start_0 on vpsnode01-nk: not installed (5)
ptest[7623]: 2011/08/10_21:27:50 WARN: unpack_rsc_op: Processing failed op openvzve_itv_stop_0 on vpsnode01-nk: unknown error (1)
ptest[7623]: 2011/08/10_21:27:50 notice: clone_print: Clone Set: connected
ptest[7623]: 2011/08/10_21:27:50 notice: short_print: Started: [ vpsnode01-nk vpsnode01-rz ]
ptest[7623]: 2011/08/10_21:27:50 notice: group_print: Resource Group: openvz
ptest[7623]: 2011/08/10_21:27:50 notice: native_print: fs_openvz (ocf::heartbeat:Filesystem): Started vpsnode01-rz
ptest[7623]: 2011/08/10_21:27:50 notice: native_print: ip_openvz (ocf::heartbeat:IPaddr2): Started vpsnode01-rz
ptest[7623]: 2011/08/10_21:27:50 notice: native_print: openvzve_plesk953 (ocf::heartbeat:ManageVE): Started vpsnode01-rz
ptest[7623]: 2011/08/10_21:27:50 notice: native_print: openvzve_stream3 (ocf::heartbeat:ManageVE): Started vpsnode01-rz
ptest[7623]: 2011/08/10_21:27:50 notice: native_print: openvzve_mail (ocf::heartbeat:ManageVE): Started vpsnode01-rz
ptest[7623]: 2011/08/10_21:27:50 notice: native_print: openvzve_dns1 (ocf::heartbeat:ManageVE): Started vpsnode01-rz
ptest[7623]: 2011/08/10_21:27:50 notice: native_print: openvzve_itv (ocf::heartbeat:ManageVE): Started vpsnode01-nk (unmanaged) FAILED
ptest[7623]: 2011/08/10_21:27:50 notice: clone_print: Master/Slave Set: ms_drbd_openvz
ptest[7623]: 2011/08/10_21:27:50 notice: short_print: Masters: [ vpsnode01-rz ]
ptest[7623]: 2011/08/10_21:27:50 notice: short_print: Slaves: [ vpsnode01-nk ]
ptest[7623]: 2011/08/10_21:27:50 notice: common_apply_stickiness: openvzve_plesk953 can fail 99 more times on vpsnode01-rz before being forced off
ptest[7623]: 2011/08/10_21:27:50 WARN: common_apply_stickiness: Forcing openvzve_itv away from vpsnode01-nk after 100 failures (max=100)
ptest[7623]: 2011/08/10_21:27:50 notice: RecurringOp: Start recurring monitor (10s) for openvzve_plesk953 on vpsnode01-nk
ptest[7623]: 2011/08/10_21:27:50 notice: RecurringOp: Start recurring monitor (10s) for openvzve_stream3 on vpsnode01-nk
ptest[7623]: 2011/08/10_21:27:50 notice: RecurringOp: Start recurring monitor (10s) for openvzve_mail on vpsnode01-nk
ptest[7623]: 2011/08/10_21:27:50 notice: RecurringOp: Start recurring monitor (10s) for openvzve_dns1 on vpsnode01-nk
ptest[7623]: 2011/08/10_21:27:50 notice: RecurringOp: Start recurring monitor (20s) for drbd_openvz:1 on vpsnode01-rz
ptest[7623]: 2011/08/10_21:27:50 ERROR: create_notification_boundaries: Creating boundaries for ms_drbd_openvz
ptest[7623]: 2011/08/10_21:27:50 ERROR: create_notification_boundaries: Creating boundaries for ms_drbd_openvz
ptest[7623]: 2011/08/10_21:27:50 notice: RecurringOp: Start recurring monitor (20s) for drbd_openvz:1 on vpsnode01-rz
ptest[7623]: 2011/08/10_21:27:50 ERROR: create_notification_boundaries: Creating boundaries for ms_drbd_openvz
ptest[7623]: 2011/08/10_21:27:50 ERROR: create_notification_boundaries: Creating boundaries for ms_drbd_openvz
ptest[7623]: 2011/08/10_21:27:50 notice: LogActions: Leave resource ping:0 (Started vpsnode01-nk)
ptest[7623]: 2011/08/10_21:27:50 notice: LogActions: Leave
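To make the two hints above concrete, one possible (untested) sequence in the crm shell, using the resource name from the ptest output; the idea is to pin everything in place, or stop managing resources entirely, before the failed resource is deleted:

# Pin every resource to the node it is currently running on ...
crm configure rsc_defaults resource-stickiness=INFINITY
# ... or, alternatively, stop the cluster from acting on resources at all:
# crm configure property is-managed-default=false

# Clear the failure history of the broken resource, then delete it
# (the openvz group definition may also need editing to drop the member)
crm resource cleanup openvzve_itv
crm configure delete openvzve_itv

# Afterwards, restore resource-stickiness (or is-managed-default) to its
# previous value.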
[Linux-HA] Antw: Re: Link recovery?
Michael Moon moo...@yahoo.com wrote on 11.08.2011 at 02:09 in message
1313021358.88558.yahoomail...@web39413.mail.mud.yahoo.com:

From: Ulrich Windl ulrich.wi...@rz.uni-regensburg.de
To: linux-ha@lists.linux-ha.org linux-ha@lists.linux-ha.org; Michael Moon moo...@yahoo.com
Sent: Tuesday, August 9, 2011 11:44 PM
Subject: Antw: [Linux-HA] Link recovery?

Did you try # corosync-cfgtool -s or examine the logs?

Why would I try corosync-cfgtool -s if I am not running corosync? Did I
examine the logs? Did you even read my posting?

Sorry, I mixed up heartbeat with pacemaker.

When I unplug eth1 on Box A, both ha-log files correctly show that eth1 on
Box A is dead. If I plug the connection back in, Box A reports that eth1 is
back up, but Box B continues to show that the link to Box A is dead.
[Linux-HA] Antw: Re: Q: default vs. default (e.g. exportfs)
Andrew Beekhof and...@beekhof.net wrote on 11.08.2011 at 07:57 in message
CAEDLWG3UfkJsYf3x9CUu45K9vdO1rce7FF9V1sooHkdp_X=x...@mail.gmail.com:

On Sat, Aug 6, 2011 at 12:01 AM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:

Hi!

I frequently see problems I don't understand: When configuring an exportfs
resource using the crm shell without explicitly specifying operations or
timeouts, I get warnings like these:

WARNING: prm_nfs_v03: default timeout 20s for start is smaller than the advised 40

I wonder: If the default is 40s,

It is not the default. It is the recommended minimum for that operation on
that resource.

OK, let's rephrase it: If there is an advertised minimum, and I do not
specify a timeout, why isn't that advertised minimum used (as a default)?

Ulrich

and I specify none, why isn't that default used? Is it because CRM has its
own defaults?

Regards,
Ulrich
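For reference, the warning goes away if the advised value is given explicitly when defining the primitive. A sketch only: the start timeout of 40 comes from the warning above, while the export parameters and the other operation values are placeholders:

primitive prm_nfs_v03 ocf:heartbeat:exportfs \
        params directory=/srv/nfs/v03 clientspec=10.0.0.0/24 fsid=3 \
        op start timeout=40 \
        op stop timeout=40 \
        op monitor interval=30 timeout=40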
[Linux-HA] Renaming a running resource: to do, or not to do?
Hi!

Using the crm shell, you cannot rename a running resource. However, I managed
to do it via a shadow CIB: I renamed the resource in the shadow CIB, then
committed the shadow CIB.

From the XML changes, I got the impression that the old primitive is removed
and then the new primitive is added. This caused the old resource to be
stopped, the new one to be started, and one resource that was a successor in
the group to be restarted. There was a temporary active orphan (the old name)
and "Configuration WARNINGs found during PE processing", but that vanished
when the states changed (transitions completed).

So obviously there is no rename operation for resources. However, when you add
more and more resources to your cluster, you might reach the point where some
renaming for consistency would be a good idea. In principle that could be done
online without taking any resource down, but the LRM seems not to be prepared
for that. Are there any technical reasons for that?

Regards,
Ulrich
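For readers who want to reproduce the shadow-CIB approach described above, a rough sketch of the command sequence in an interactive crm session (the shadow name is arbitrary; as noted, the commit still results in the renamed resource being stopped and started):

# crm
crm(live)# cib new rename-work          # create a shadow copy and switch to it
crm(rename-work)# configure edit        # rename the primitive (and any references) in the editor
crm(rename-work)# cib commit rename-work   # push the shadow configuration to the live cluster
crm(rename-work)# cib use live          # switch back to the live CIB
crm(live)# cib delete rename-work       # discard the shadow copy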
[Linux-HA] ocf:heartbeat:Xen: shutdown timeout
Hi!

Sorry if this has been discussed before, but I think ocf:heartbeat:Xen does
not do what the documentation says about the shutdown timeout:

<parameter name="shutdown_timeout">
<longdesc lang="en">
The Xen agent will first try an orderly shutdown using xm shutdown. Should
this not succeed within this timeout, the agent will escalate to xm destroy,
forcibly killing the node. If this is not set, it will default to two-thirds
of the stop action timeout. Setting this value to 0 forces an immediate
destroy.
</longdesc>

The code to set the timeout is this:

if [ -n "$OCF_RESKEY_shutdown_timeout" ]; then
    timeout=$OCF_RESKEY_shutdown_timeout
elif [ -n "$OCF_RESKEY_CRM_meta_timeout" ]; then
    # Allow 2/3 of the action timeout for the orderly shutdown
    # (The origin unit is ms, hence the conversion)
    timeout=$((OCF_RESKEY_CRM_meta_timeout/1500))
else
    timeout=60
fi

The primitive was configured like this:

primitive prm_v02_xen ocf:heartbeat:Xen \
        params xmfile=/etc/xen/vm/v02 \
        op start timeout=300 \
        op stop timeout=300 \
        op monitor interval=1200 timeout=90

So I'd expect two-thirds of 300s to be 200s. However, the syslog says:

Aug 11 10:14:37 h01 Xen[25140]: INFO: Xen domain v02 will be stopped (timeout: 13s)
Aug 11 10:14:50 h01 Xen[25140]: WARNING: Xen domain v02 will be destroyed!

According to the code, that's printed here:

if [ $timeout -gt 0 ]; then
    ocf_log info "Xen domain $dom will be stopped (timeout: ${timeout}s)"

So I guess something is wrong.

Regards,
Ulrich
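A quick sanity check of the conversion, purely illustrative: with the configured stop timeout of 300s, OCF_RESKEY_CRM_meta_timeout should arrive as 300000 (ms) and the formula would indeed give 200s. The observed 13s is what the same formula produces for 20000 ms, i.e. a 20s timeout, which suggests (though does not prove) that the agent saw a 20s default action timeout rather than the configured 300s:

# expected with the configured stop timeout of 300s (= 300000 ms):
echo $((300000/1500))   # -> 200
# value that would produce the observed 13s:
echo $((20000/1500))    # -> 13, i.e. a 20s (20000 ms) timeout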
[Linux-HA] about STONITH in HA
Hi All,

This is Sam from the Ericsson IPWorks product maintenance team. We have an
urgent problem with our Linux HA solution. I am not sure if this is the right
mailbox, but it would be very much appreciated if anyone could help us.

Our product uses SLES 10 SP4 x86_64 with HA version 2.1.4-0.24.9. We have a
problem with the STONITH implementation. There are only two nodes in the HA
cluster. If there is a split-brain situation, will the two HA nodes both shut
down their peer at the same time? If we instead let STONITH run on only one
of the HA nodes, is that a correct configuration? Is there any best practice
for a STONITH implementation in an HA cluster that has only two nodes?

Thanks,
Sam
Re: [Linux-HA] about STONITH in HA
On Thu, Aug 11, 2011 at 9:29 PM, Sam Sun sam@ericsson.com wrote:

Hi All,

This is Sam from the Ericsson IPWorks product maintenance team. We have an
urgent problem with our Linux HA solution. I am not sure if this is the right
mailbox, but it would be very much appreciated if anyone could help us.

Our product uses SLES 10 SP4 x86_64 with HA version 2.1.4-0.24.9.

I'd contact SUSE - you pay them to give you their full attention :-)

We have a problem with the STONITH implementation. There are only two nodes
in the HA cluster. If there is a split-brain situation, will the two HA nodes
both shut down their peer at the same time?

Yes.

If we instead let STONITH run on only one of the HA nodes, is that a correct
configuration?

No.

Is there any best practice for a STONITH implementation in an HA cluster that
has only two nodes?

Thanks,
Sam
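Not from this thread, but as an illustration of the usual two-node pattern: each node gets a fencing resource capable of powering off the other node, and a location constraint keeps each fencing resource away from the node it would fence. A sketch in crm shell syntax (the device type, addresses and credentials are hypothetical; on heartbeat 2.1.4 the same idea would be expressed directly in the CIB XML or via the GUI):

primitive stonith-nodeA stonith:external/ipmi \
        params hostname=nodeA ipaddr=192.168.1.10 userid=admin passwd=secret interface=lan
primitive stonith-nodeB stonith:external/ipmi \
        params hostname=nodeB ipaddr=192.168.1.11 userid=admin passwd=secret interface=lan
# a fencing device must never run on the node it is meant to fence
location l-stonith-nodeA stonith-nodeA -inf: nodeA
location l-stonith-nodeB stonith-nodeB -inf: nodeB
property stonith-enabled=true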
Re: [Linux-HA] Antw: Re: Q: default vs. default (e.g. exportfs)
On Thu, Aug 11, 2011 at 5:08 PM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:

Andrew Beekhof and...@beekhof.net wrote on 11.08.2011 at 07:57 in message
CAEDLWG3UfkJsYf3x9CUu45K9vdO1rce7FF9V1sooHkdp_X=x...@mail.gmail.com:

On Sat, Aug 6, 2011 at 12:01 AM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:

Hi!

I frequently see problems I don't understand: When configuring an exportfs
resource using the crm shell without explicitly specifying operations or
timeouts, I get warnings like these:

WARNING: prm_nfs_v03: default timeout 20s for start is smaller than the advised 40

I wonder: If the default is 40s,

It is not the default. It is the recommended minimum for that operation on
that resource.

OK, let's rephrase it: If there is an advertised minimum, and I do not
specify a timeout, why isn't that advertised minimum used (as a default)?

Because it's only a recommendation.

Because you may have configured it outside of the shell - with a tool which
doesn't know about the agent's metadata.

Because the metadata might be different on other machines in the cluster.

Ulrich

and I specify none, why isn't that default used? Is it because CRM has its
own defaults?

Regards,
Ulrich
Re: [Linux-HA] Renaming a running resource: to do, or not to do?
On Thu, Aug 11, 2011 at 5:37 PM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:

Hi!

Using the crm shell, you cannot rename a running resource. However, I managed
to do it via a shadow CIB: I renamed the resource in the shadow CIB, then
committed the shadow CIB. From the XML changes, I got the impression that the
old primitive is removed and then the new primitive is added. This caused the
old resource to be stopped, the new one to be started, and one resource that
was a successor in the group to be restarted. There was a temporary active
orphan (the old name) and "Configuration WARNINGs found during PE processing",
but that vanished when the states changed (transitions completed).

So obviously there is no rename operation for resources. However, when you add
more and more resources to your cluster, you might reach the point where some
renaming for consistency would be a good idea. In principle that could be done
online without taking any resource down, but the LRM seems not to be prepared
for that. Are there any technical reasons for that?

The resource name is the equivalent of a primary key in a database table.
It's the sole point of comparison when deciding whether two resources are the
same; therefore rename is not a valid operation to consider. Any
implementation would have to use delete + create underneath.

Regards,
Ulrich