Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
On Tue, 2011-05-10 at 08:24 +0200, Andrew Beekhof wrote: On Mon, May 9, 2011 at 8:44 PM, Holger Teutsch holger.teut...@web.de wrote: On Wed, 2011-04-27 at 13:25 +0200, Andrew Beekhof wrote: On Sun, Apr 24, 2011 at 4:31 PM, Holger Teutsch holger.teut...@web.de wrote: ... Remaining diffs seem to be not related to my changes. Unlikely I'm afraid. We run the regression tests after every commit and complain loudly if they fail. What is the regression test output? That's the output of tools/regression.sh of pacemaker-devel *without* my patches: Version: parent: 10731:bf7b957f4cbe tip; see attachment. There seems to be something not quite right with your environment. Had you built the tools directory before running the test? Yes, + install. In a clean chroot it passes on both opensuse and fedora: http://build.clusterlabs.org:8010/builders/opensuse-11.3-i386-devel/builds/48/steps/cli_test/logs/stdio and http://build.clusterlabs.org:8010/builders/fedora-13-x86_64-devel/builds/48/steps/cli_test/logs/stdio What distro are you on? Opensuse 11.4 Could you try running it as: /full/path/to/pacemaker/sources/tools/regression.sh The PATH magic that allows the tests to be run from the source directory may not be fully functional. Did not help, will do further investigation. -holger
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
On Fri, 2011-05-06 at 16:15 +0200, Andrew Beekhof wrote: On Fri, May 6, 2011 at 12:28 PM, Holger Teutsch holger.teut...@web.de wrote: On Fri, 2011-05-06 at 11:03 +0200, Andrew Beekhof wrote: On Fri, May 6, 2011 at 9:53 AM, Andrew Beekhof and...@beekhof.net wrote: On Thu, May 5, 2011 at 5:43 PM, Holger Teutsch holger.teut...@web.de wrote: On Fri, 2011-04-29 at 09:41 +0200, Andrew Beekhof wrote: Unfortunately the devel code does not run at all in my environment so I have to fix this first. Oh? I ran CTS on it the other day and it was fine here. I installed pacemaker-devel on top of a compilation of pacemaker-1.1. In addition I tried make uninstall for both versions and then again make install for devel. Pacemaker does not come up, crm_mon shows nodes as offline. I suspect reason is May 5 17:09:34 devel1 crmd: [5942]: notice: crmd_peer_update: Status update: Client devel1/crmd now has status [online] (DC=null) May 5 17:09:34 devel1 crmd: [5942]: info: crm_update_peer: Node devel1: id=1790093504 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00111312 (new) May 5 17:09:34 devel1 crmd: [5942]: info: pcmk_quorum_notification: Membership 0: quorum retained (0) May 5 17:09:34 devel1 crmd: [5942]: debug: do_fsa_action: actions:trace: #011// A_STARTED May 5 17:09:34 devel1 crmd: [5942]: info: do_started: Delaying start, no membership data (0010) ^ May 5 17:09:34 devel1 crmd: [5942]: debug: register_fsa_input_adv: Stalling the FSA pending further input: cause=C_FSA_INTERNAL Any ideas ? Hg version? Corosync config? I'm running -devel here right now and things are fine. Uh, I think I see now. Try http://hg.clusterlabs.org/pacemaker/1.1/rev/b94ce5673ce4 Yeah, I realized afterwards that it was specific to devel. What does your corosync config look like? I run corosync-1.3.0-3.1.x86_64. 
It's exactly the same config that worked with pacemaker 1.1 rev 10608:b4f456380f60:

# Please read the corosync.conf.5 manual page
compatibility: whitetank

aisexec {
    # Run as root - this is necessary to be able to manage
    # resources with Pacemaker
    user: root
    group: root
}

service {
    # Load the Pacemaker Cluster Resource Manager
    ver: 1
    name: pacemaker
    use_mgmtd: yes
    use_logd: yes
}

totem {
    # The only valid version is 2
    version: 2

    # How long before declaring a token lost (ms)
    token: 5000

    # How many token retransmits before forming a new configuration
    token_retransmits_before_loss_const: 10

    # How long to wait for join messages in the membership protocol (ms)
    join: 60

    # How long to wait for consensus to be achieved before starting
    # a new round of membership configuration (ms)
    consensus: 6000

    # Turn off the virtual synchrony filter
    vsftype: none

    # Number of messages that may be sent by one processor on
    # receipt of the token
    max_messages: 20

    # Limit generated nodeids to 31-bits (positive signed integers)
    clear_node_high_bit: yes

    # Disable encryption
    secauth: off

    # How many threads to use for encryption/decryption
    threads: 0

    # Optionally assign a fixed node id (integer)
    # nodeid: 1234

    rrp_mode: active

    interface {
        ringnumber: 0
        # The following values need to be set based on your environment
        bindnetaddr: 192.168.178.0
        mcastaddr: 226.94.40.1
        mcastport: 5409
    }
    interface {
        ringnumber: 1
        # The following values need to be set based on your environment
        bindnetaddr: 10.1.1.0
        mcastaddr: 226.94.41.1
        mcastport: 5411
    }
}

logging {
    fileline: off
    to_stderr: no
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: on
    timestamp: off
}

amf {
    mode: disabled
}
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
I had 1.1, but Dejan asked me to rebase my patches on devel. So, long story short: devel now works after upgrading to the rev you mentioned and I got back to working on my patches. Thanx Holger On Mon, 2011-05-09 at 10:58 +0200, Andrew Beekhof wrote: I thought you said you were running 1.1? May 5 17:09:33 devel1 pacemakerd: [5929]: info: read_config: Reading configure for stack: corosync This message is specific to the devel branch. Update to get the following fix and you should be fine: http://hg.clusterlabs.org/pacemaker/devel/rev/84ef5401322f
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
On Wed, 2011-04-27 at 13:25 +0200, Andrew Beekhof wrote: On Sun, Apr 24, 2011 at 4:31 PM, Holger Teutsch holger.teut...@web.de wrote: ... Remaining diffs seem to be not related to my changes. Unlikely I'm afraid. We run the regression tests after every commit and complain loudly if they fail. What is the regression test output? That's the output of tools/regression.sh of pacemaker-devel *without* my patches: Version: parent: 10731:bf7b957f4cbe tip see attachment -holger Using local binaries from: . * Passed: cibadmin - Require --force for CIB erasure * Passed: cibadmin - Allow CIB erasure with --force * Passed: cibadmin - Query CIB * Passed: crm_attribute - Set cluster option * Passed: cibadmin - Query new cluster option * Passed: cibadmin - Query cluster options * Passed: cibadmin - Delete nvpair * Passed: cibadmin - Create operaton should fail with: -21, The object already exists * Passed: cibadmin - Modify cluster options section * Passed: cibadmin - Query updated cluster option * Passed: crm_attribute - Set duplicate cluster option * Passed: crm_attribute - Setting multiply defined cluster option should fail with -216, Could not set cluster option * Passed: crm_attribute - Set cluster option with -s * Passed: crm_attribute - Delete cluster option with -i * Passed: cibadmin - Create node entry * Passed: cibadmin - Create node status entry * Passed: crm_attribute - Create node attribute * Passed: cibadmin - Query new node attribute * Passed: cibadmin - Digest calculation * Passed: cibadmin - Replace operation should fail with: -45, Update was older than existing configuration * Passed: crm_standby- Default standby value * Passed: crm_standby- Set standby status * Passed: crm_standby- Query standby value * Passed: crm_standby- Delete standby value * Passed: cibadmin - Create a resource * Passed: crm_resource - Create a resource meta attribute * Passed: crm_resource - Query a resource meta attribute * Passed: crm_resource - Remove a resource meta attribute * Passed: crm_resource - Create a resource attribute * Passed: crm_resource - List the configured resources * Passed: crm_resource - Set a resource's fail-count * Passed: crm_resource - Require a destination when migrating a resource that is stopped * Passed: crm_resource - Don't support migration to non-existant locations * Passed: crm_resource - Migrate a resource * Passed: crm_resource - Un-migrate a resource --- ./regression.exp2011-05-09 20:26:27.669381187 +0200 +++ ./regression.out2011-05-09 20:38:27.112098949 +0200 @@ -616,7 +616,7 @@ /status /cib * Passed: crm_resource - List the configured resources -cib epoch=16 num_updates=2 admin_epoch=0 validate-with=pacemaker-1.2 +cib epoch=16 num_updates=1 admin_epoch=0 validate-with=pacemaker-1.2 configuration crm_config cluster_property_set id=cib-bootstrap-options/ @@ -642,19 +642,13 @@ constraints/ /configuration status -node_state id=clusterNode-UUID uname=clusterNode-UNAME - transient_attributes id=clusterNode-UUID -instance_attributes id=status-clusterNode-UUID - nvpair id=status-clusterNode-UUID-fail-count-dummy name=fail-count-dummy value=10/ -/instance_attributes - /transient_attributes -/node_state +node_state id=clusterNode-UUID uname=clusterNode-UNAME/ /status /cib * Passed: crm_resource - Set a resource's fail-count Resource dummy not moved: not-active and no preferred location specified. 
Error performing operation: cib object missing -cib epoch=16 num_updates=2 admin_epoch=0 validate-with=pacemaker-1.2 +cib epoch=16 num_updates=1 admin_epoch=0 validate-with=pacemaker-1.2 configuration crm_config cluster_property_set id=cib-bootstrap-options/ @@ -680,19 +674,13 @@ constraints/ /configuration status -node_state id=clusterNode-UUID uname=clusterNode-UNAME - transient_attributes id=clusterNode-UUID -instance_attributes id=status-clusterNode-UUID - nvpair id=status-clusterNode-UUID-fail-count-dummy name=fail-count-dummy value=10/ -/instance_attributes - /transient_attributes -/node_state +node_state id=clusterNode-UUID uname=clusterNode-UNAME/ /status /cib * Passed: crm_resource - Require a destination when migrating a resource that is stopped Error performing operation: i.dont.exist is not a known node Error performing operation: The object/attribute does not exist -cib epoch=16 num_updates=2 admin_epoch=0 validate-with=pacemaker-1.2 +cib epoch=16 num_updates=1 admin_epoch=0 validate-with=pacemaker-1.2 configuration crm_config cluster_property_set id=cib-bootstrap-options/ @@ -718,13 +706,7 @@ constraints/ /configuration status -node_state id=clusterNode-UUID uname=clusterNode-UNAME - transient_attributes id=clusterNode-UUID
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
On Fri, 2011-05-06 at 11:03 +0200, Andrew Beekhof wrote: On Fri, May 6, 2011 at 9:53 AM, Andrew Beekhof and...@beekhof.net wrote: On Thu, May 5, 2011 at 5:43 PM, Holger Teutsch holger.teut...@web.de wrote: On Fri, 2011-04-29 at 09:41 +0200, Andrew Beekhof wrote: Unfortunately the devel code does not run at all in my environment so I have to fix this first. Oh? I ran CTS on it the other day and it was fine here. I installed pacemaker-devel on top of a compilation of pacemaker-1.1. In addition I tried make uninstall for both versions and then again make install for devel. Pacemaker does not come up, crm_mon shows nodes as offline. I suspect reason is:

May 5 17:09:34 devel1 crmd: [5942]: notice: crmd_peer_update: Status update: Client devel1/crmd now has status [online] (DC=null)
May 5 17:09:34 devel1 crmd: [5942]: info: crm_update_peer: Node devel1: id=1790093504 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00111312 (new)
May 5 17:09:34 devel1 crmd: [5942]: info: pcmk_quorum_notification: Membership 0: quorum retained (0)
May 5 17:09:34 devel1 crmd: [5942]: debug: do_fsa_action: actions:trace: #011// A_STARTED
May 5 17:09:34 devel1 crmd: [5942]: info: do_started: Delaying start, no membership data (0010) ^
May 5 17:09:34 devel1 crmd: [5942]: debug: register_fsa_input_adv: Stalling the FSA pending further input: cause=C_FSA_INTERNAL

Any ideas ? Hg version? Corosync config? I'm running -devel here right now and things are fine. Uh, I think I see now. Try http://hg.clusterlabs.org/pacemaker/1.1/rev/b94ce5673ce4 Page not found.
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
On Fri, 2011-04-29 at 09:41 +0200, Andrew Beekhof wrote: Unfortunately the devel code does not run at all in my environment so I have to fix this first. Oh? I ran CTS on it the other day and it was fine here. I installed pacemaker-devel on top of a compilation of pacemaker-1.1. In addition I tried make uninstall for both versions and then again make install for devel. Pacemaker does not come up, crm_mon shows nodes as offline. I suspect reason is May 5 17:09:34 devel1 crmd: [5942]: notice: crmd_peer_update: Status update: Client devel1/crmd now has status [online] (DC=null) May 5 17:09:34 devel1 crmd: [5942]: info: crm_update_peer: Node devel1: id=1790093504 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00111312 (new) May 5 17:09:34 devel1 crmd: [5942]: info: pcmk_quorum_notification: Membership 0: quorum retained (0) May 5 17:09:34 devel1 crmd: [5942]: debug: do_fsa_action: actions:trace: #011// A_STARTED May 5 17:09:34 devel1 crmd: [5942]: info: do_started: Delaying start, no membership data (0010) ^ May 5 17:09:34 devel1 crmd: [5942]: debug: register_fsa_input_adv: Stalling the FSA pending further input: cause=C_FSA_INTERNAL Any ideas ? -holger May 5 17:09:33 devel1 pacemakerd: [5929]: info: Invoked: pacemakerd May 5 17:09:33 devel1 pacemakerd: [5929]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root May 5 17:09:33 devel1 pacemakerd: [5929]: info: get_cluster_type: Cluster type is: 'corosync' May 5 17:09:33 devel1 pacemakerd: [5929]: info: read_config: Reading configure for stack: corosync May 5 17:09:33 devel1 corosync[2101]: [CONFDB] lib_init_fn: conn=0x6872f0 May 5 17:09:33 devel1 pacemakerd: [5929]: info: config_find_next: Processing additional logging options... May 5 17:09:33 devel1 pacemakerd: [5929]: info: get_config_opt: Found 'on' for option: debug May 5 17:09:33 devel1 pacemakerd: [5929]: info: get_config_opt: Found 'no' for option: to_logfile May 5 17:09:33 devel1 pacemakerd: [5929]: info: get_config_opt: Found 'yes' for option: to_syslog May 5 17:09:33 devel1 pacemakerd: [5929]: info: get_config_opt: Found 'daemon' for option: syslog_facility May 5 17:09:33 devel1 corosync[2101]: [CONFDB] exit_fn for conn=0x6872f0 May 5 17:09:33 devel1 pacemakerd: [5931]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root May 5 17:09:33 devel1 pacemakerd: [5931]: info: main: Starting Pacemaker 1.1.5 (Build: unknown): ncurses corosync-quorum corosync May 5 17:09:33 devel1 pacemakerd: [5931]: info: main: Maximum core file size is: 18446744073709551615 May 5 17:09:33 devel1 pacemakerd: [5931]: debug: cluster_connect_cfg: Our nodeid: 1790093504 May 5 17:09:33 devel1 pacemakerd: [5931]: debug: cluster_connect_cfg: Adding fd=5 to mainloop May 5 17:09:33 devel1 corosync[2101]: [CPG ] lib_init_fn: conn=0x68bfc0, cpd=0x6903f0 May 5 17:09:33 devel1 pacemakerd: [5931]: debug: cluster_connect_cpg: Our nodeid: 1790093504 May 5 17:09:33 devel1 pacemakerd: [5931]: debug: cluster_connect_cpg: Adding fd=6 to mainloop May 5 17:09:33 devel1 pacemakerd: [5931]: info: update_node_processes: 0x60d920 Node 1790093504 now known as devel1 (was: (null)) May 5 17:09:33 devel1 pacemakerd: [5931]: info: update_node_processes: Node devel1 now has process list: 0002 (was ) May 5 17:09:33 devel1 corosync[2101]: [TOTEM ] mcasted message added to pending queue May 5 17:09:33 devel1 corosync[2101]: [CPG ] got mcast request on 0x68bfc0 May 5 17:09:33 devel1 corosync[2101]: [TOTEM ] Delivering 24 to 25 May 5 17:09:33 devel1 corosync[2101]: [TOTEM ] 
Delivering MCAST message with seq 25 to pending delivery queue May 5 17:09:33 devel1 corosync[2101]: [CPG ] got procjoin message from cluster node 1790093504 May 5 17:09:33 devel1 corosync[2101]: [TOTEM ] Received ringid(192.168.178.106:332) seq 25 May 5 17:09:33 devel1 corosync[2101]: [TOTEM ] Received ringid(192.168.178.106:332) seq 25 May 5 17:09:33 devel1 corosync[2101]: [TOTEM ] mcasted message added to pending queue May 5 17:09:33 devel1 corosync[2101]: [TOTEM ] Delivering 25 to 26 May 5 17:09:33 devel1 corosync[2101]: [TOTEM ] Delivering MCAST message with seq 26 to pending delivery queue May 5 17:09:33 devel1 corosync[2101]: [TOTEM ] Received ringid(192.168.178.106:332) seq 26 May 5 17:09:33 devel1 corosync[2101]: [TOTEM ] Received ringid(192.168.178.106:332) seq 26 May 5 17:09:33 devel1 corosync[2101]: [TOTEM ] releasing messages up to and including 25 May 5 17:09:33 devel1 pacemakerd: [5931]: info: start_child: Forked child 5935 for process stonith-ng May 5 17:09:33 devel1 pacemakerd: [5931]: info: update_node_processes: Node devel1 now has process list: 0012 (was
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
Hi Dejan, On Tue, 2011-04-26 at 17:58 +0200, Dejan Muhamedagic wrote: Hi Holger, On Sun, Apr 24, 2011 at 04:31:33PM +0200, Holger Teutsch wrote: On Mon, 2011-04-11 at 20:50 +0200, Andrew Beekhof wrote: why? CMD_ERR(Resource %s not moved: specifying --master is not supported for --move-from\n, rsc_id); it did not look sensible to me but I can't recall the exact reasons 8-) It's now implemented. also the legacy handling is a little off - do a make install and run tools/regression.sh and you'll see what i mean. Remaining diffs seem to be not related to my changes. other than that the crm_resource part looks pretty good. can you add some regression testcases in tools/ too please? Will add them once the code is in the repo. Latest diffs are attached. The diffs seem to be against the 1.1 code, but this should go into the devel repository. Can you please rebase the patches against the devel code. Unfortunately the devel code does not run at all in my environment so I have to fix this first. - holger Cheers, Dejan -holger
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
On Mon, 2011-04-11 at 20:50 +0200, Andrew Beekhof wrote: why? CMD_ERR(Resource %s not moved: specifying --master is not supported for --move-from\n, rsc_id); it did not look sensible to me but I can't recall the exact reasons 8-) It's now implemented. also the legacy handling is a little off - do a make install and run tools/regression.sh and you'll see what i mean. Remaining diffs seem to be not related to my changes. other than that the crm_resource part looks pretty good. can you add some regression testcases in tools/ too please? Will add them once the code is in the repo. Latest diffs are attached. -holger diff -r b4f456380f60 shell/modules/ui.py.in --- a/shell/modules/ui.py.in Thu Mar 17 09:41:25 2011 +0100 +++ b/shell/modules/ui.py.in Sun Apr 24 16:18:59 2011 +0200 @@ -738,8 +738,9 @@ rsc_status = crm_resource -W -r '%s' rsc_showxml = crm_resource -q -r '%s' rsc_setrole = crm_resource --meta -r '%s' -p target-role -v '%s' -rsc_migrate = crm_resource -M -r '%s' %s -rsc_unmigrate = crm_resource -U -r '%s' +rsc_move_to = crm_resource --move-to -r '%s' %s +rsc_move_from = crm_resource --move-from -r '%s' %s +rsc_move_cleanup = crm_resource --move-cleanup -r '%s' rsc_cleanup = crm_resource -C -r '%s' -H '%s' rsc_cleanup_all = crm_resource -C -r '%s' rsc_param = { @@ -776,8 +777,12 @@ self.cmd_table[demote] = (self.demote,(1,1),0) self.cmd_table[manage] = (self.manage,(1,1),0) self.cmd_table[unmanage] = (self.unmanage,(1,1),0) +# the next two commands are deprecated self.cmd_table[migrate] = (self.migrate,(1,4),0) self.cmd_table[unmigrate] = (self.unmigrate,(1,1),0) +self.cmd_table[move-to] = (self.move_to,(2,4),0) +self.cmd_table[move-from] = (self.move_from,(1,4),0) +self.cmd_table[move-cleanup] = (self.move_cleanup,(1,1),0) self.cmd_table[param] = (self.param,(3,4),1) self.cmd_table[meta] = (self.meta,(3,4),1) self.cmd_table[utilization] = (self.utilization,(3,4),1) @@ -846,9 +851,67 @@ if not is_name_sane(rsc): return False return set_deep_meta_attr(is-managed,false,rsc) +def move_to(self,cmd,*args): +usage: move-to rsc[:master] node [lifetime] [force] +elem = args[0].split(':') +rsc = elem[0] +master = False +if len(elem) 1: +master = elem[1] +if master != master: +common_error(%s is invalid, specify 'master' % master) +return False +master = True +if not is_name_sane(rsc): +return False +node = args[1] +lifetime = None +force = False +if len(args) == 3: +if args[2] == force: +force = True +else: +lifetime = args[2] +elif len(args) == 4: +if args[2] == force: +force = True +lifetime = args[3] +elif args[3] == force: +force = True +lifetime = args[2] +else: +syntax_err((cmd,force)) +return False + +opts = '' +if node: +opts = --node='%s' % node +if lifetime: +opts = %s --lifetime='%s' % (opts,lifetime) +if force or user_prefs.get_force(): +opts = %s --force % opts +if master: +opts = %s --master % opts +return ext_cmd(self.rsc_move_to % (rsc,opts)) == 0 + def migrate(self,cmd,*args): -usage: migrate rsc [node] [lifetime] [force] -rsc = args[0] +Deprecated: migrate rsc [node] [lifetime] [force] +common_warning(migrate is deprecated, use move-to or move-from) +if len(args) = 2 and args[1] in listnodes(): +return self.move_to(cmd, *args) +return self.move_from(cmd, *args) + +def move_from(self,cmd,*args): +usage: move-from rsc[:master] [node] [lifetime] [force] +elem = args[0].split(':') +rsc = elem[0] +master = False +if len(elem) 1: +master = elem[1] +if master != master: +common_error(%s is invalid, specify 'master' % master) +return False +master = True if not is_name_sane(rsc): 
return False node = None @@ -888,12 +951,18 @@ opts = %s --lifetime='%s' % (opts,lifetime) if force or user_prefs.get_force(): opts = %s --force % opts -return ext_cmd(self.rsc_migrate % (rsc,opts)) == 0 -def unmigrate(self,cmd,rsc): -usage: unmigrate rsc +if master: +opts = %s --master % opts +return ext_cmd(self.rsc_move_from % (rsc,opts)) == 0 +def move_cleanup(self,cmd,rsc): +usage: move_cleanup rsc if not is_name_sane(rsc):
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
On Thu, 2011-04-07 at 12:33 +0200, Dejan Muhamedagic wrote:

New syntax:
---
crm_resource --move-from --resource myresource --node mynode
  - all resource variants: check whether active on mynode, then create standby constraint
crm_resource --move-from --resource myresource
  - primitive/group: set --node `current_node`, then create standby constraint
  - clone/master: refused
crm_resource --move-to --resource myresource --node mynode
  - all resource variants: create prefer constraint
crm_resource --move-to --resource myresource --master --node mynode
  - master: check whether active as slave on mynode, then create prefer constraint for master role
  - others: refused
crm_resource --move-cleanup --resource myresource
  - zap constraints

As we are already short on meaningful single-letter options I vote for long options only.

Backwards Compatibility:
crm_resource {-M|--move} --resource myresource
  - output deprecation warning
  - treat as crm_resource --move-from --resource myresource
crm_resource {-M|--move} --resource myresource --node mynode
  - output deprecation warning
  - treat as crm_resource --move-to --resource myresource --node mynode
crm_resource {-U|--unmove} --resource myresource
  - output deprecation warning
  - treat as crm_resource --move-cleanup --resource myresource

All looks fine to me.

For the shell: Should we go for similar commands or keep migrate-XXX? migrate is a bit of a misnomer, could be confused with the migrate operation. I'd vote to leave the old migrate/unmigrate as deprecated and introduce just the move-from/to/cleanup variants. Dejan

Andrew, please find attached the patches that implement these commands for review. They require the patch Low: lib/common/utils.c: Don't try to print unprintable option values in crm_help that I sent separately because it is not directly related to the movement stuff. I think that the preceding discussions were very valuable to fully understand the issues and implications, and I'm confident that the new command set is consistent and behaves with a predictable outcome. Regards Holger

diff -r b4f456380f60 doc/crm_cli.txt --- a/doc/crm_cli.txt Thu Mar 17 09:41:25 2011 +0100 +++ b/doc/crm_cli.txt Fri Apr 08 14:23:59 2011 +0200 @@ -810,28 +810,44 @@ unmanage rsc ... -[[cmdhelp_resource_migrate,migrate a resource to another node]] - `migrate` (`move`) - -Migrate a resource to a different node. If node is left out, the -resource is migrated by creating a constraint which prevents it from -running on the current node. Additionally, you may specify a +[[cmdhelp_resource_move-to,move a resource to another node]] + `move-to` + +Move a resource to a different node. The resource is moved by creating +a constraint which forces it to run on the specified node. +Additionally, you may specify a lifetime for the constraint---once it +expires, the location constraint will no longer be active. +For a master resource specify rsc:master to move the master role. + +Usage: +... +move-to rsc[:master] node [lifetime] [force] +... + +[[cmdhelp_resource_move-from,move a resource away from the specified node]] + `move-from` + +Move a resource away from the specified node. +If node is left out, the node where the resource is currently active +is used. +The resource is moved by creating a constraint which prevents it from +running on the specified node. Additionally, you may specify a lifetime for the constraint---once it expires, the location constraint will no longer be active. Usage: ... -migrate rsc [node] [lifetime] [force] +move-from rsc [node] [lifetime] [force] ...
-[[cmdhelp_resource_unmigrate,unmigrate a resource to another node]] - `unmigrate` (`unmove`) - -Remove the constraint generated by the previous migrate command. +[[cmdhelp_resource_move-cleanup,Cleanup previously created move constraint]] + `move-cleanup` + +Remove the constraint generated by the previous move-to/move-from command. Usage: ... -unmigrate rsc +move-cleanup rsc ... [[cmdhelp_resource_param,manage a parameter of a resource]] diff -r b4f456380f60 tools/crm_resource.c --- a/tools/crm_resource.c Thu Mar 17 09:41:25 2011 +0100 +++ b/tools/crm_resource.c Fri Apr 08 15:02:39 2011 +0200 @@ -52,7 +52,8 @@ const char *prop_id = NULL; const char *prop_set = NULL; char *move_lifetime = NULL; -char rsc_cmd = 'L'; +int move_master = 0; +int rsc_cmd = 'L'; char *our_pid = NULL; IPC_Channel *crmd_channel = NULL; char *xml_file = NULL; @@ -192,6 +193,33 @@ return 0; } +/* return role of resource on node */ +static int +role_on_node(resource_t *rsc, const char *node_uname) +{ +GListPtr lpc = NULL; + +if(rsc-variant pe_native) { +/* recursively call down */ +
[Pacemaker] [PATCH]Low: lib/common/utils.c: Don't try to print unprintable option values in crm_help
Hi, during work on the move-XXX stuff I discovered this. Regards Holger # HG changeset patch # User Holger Teutsch holger.teut...@web.de # Date 1302259903 -7200 # Branch mig # Node ID caed31174dc966450a31da048b640201980870a8 # Parent 9451c288259b7b9fd6f32f5df01d47569e570c58 Low: lib/common/utils.c: Don't try to print unprintable option values in crm_help diff -r 9451c288259b -r caed31174dc9 lib/common/utils.c --- a/lib/common/utils.c Tue Apr 05 13:24:21 2011 +0200 +++ b/lib/common/utils.c Fri Apr 08 12:51:43 2011 +0200 @@ -2281,7 +2281,13 @@ fprintf(stream, %s\n, crm_long_options[i].desc); } else { - fprintf(stream, -%c, --%s%c%s\t%s\n, crm_long_options[i].val, crm_long_options[i].name, +/* is val printable as char ? */ +if(crm_long_options[i].val = UCHAR_MAX) { +fprintf(stream, -%c,, crm_long_options[i].val); +} else { +fputs(, stream); +} + fprintf(stream, --%s%c%s\t%s\n, crm_long_options[i].name, crm_long_options[i].has_arg?'=':' ',crm_long_options[i].has_arg?value:, crm_long_options[i].desc?crm_long_options[i].desc:); }
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
On Thu, 2011-04-07 at 08:57 +0200, Andrew Beekhof wrote: On Wed, Apr 6, 2011 at 5:48 PM, Holger Teutsch holger.teut...@web.de wrote: On Wed, 2011-04-06 at 15:38 +0200, Dejan Muhamedagic wrote: On Wed, Apr 06, 2011 at 01:00:36PM +0200, Andrew Beekhof wrote: On Tue, Apr 5, 2011 at 12:27 PM, Dejan Muhamedagic deja...@fastmail.fm wrote: Ah, right, sorry, wanted to ask about the difference between move-off and move. The description looks the same as for move. Is it that in this case it is for clones, so crm_resource needs an extra node parameter? You wrote in the doc: +Migrate a resource (-instance for clones/masters) off the specified node. The '-instance' looks somewhat funny. Why not say Move/migrate a clone or master/slave instance away from the specified node? I must say that I still find all this quite confusing, i.e. now we have move, unmove, and move-off, but it's probably just me :) Not just you. The problem is that we didn't fully understand all the use case permutations at the time. I think, notwithstanding legacy compatibility, move should probably be renamed to move-to and this new option be called move-from. That seems more obvious and syntactically consistent with the rest of the system. Yes, move-to and move-from seem more consistent than the other options. The problem is that the old move is at times one and then at times another. In the absence of a host name, each uses the current location for the named group/primitive resource and complains for clones. The biggest question in my mind is what to call unmove... move-cleanup perhaps? move-remove? :D Actually, though the word is a bit awkward, unmove sounds fine to me. I would vote for move-cleanup. It's consistent with move-XXX, and to my (german) ears unmove seems to stand for the previous move being undone and the stuff coming back. BTW: Has someone already tried out the code or do you trust me 8-D ? I trust no-one - which is why we have regression tests :-) Stay tuned for updated patches...

Now, after an additional discussion round I propose the following. Please note that for consistency the --node argument is optional for --move-from.

New syntax:
---
crm_resource --move-from --resource myresource --node mynode
  - all resource variants: check whether active on mynode, then create standby constraint
crm_resource --move-from --resource myresource
  - primitive/group: set --node `current_node`, then create standby constraint
  - clone/master: refused
crm_resource --move-to --resource myresource --node mynode
  - all resource variants: create prefer constraint
crm_resource --move-to --resource myresource --master --node mynode
  - master: check whether active as slave on mynode, then create prefer constraint for master role
  - others: refused
crm_resource --move-cleanup --resource myresource
  - zap constraints

As we are already short on meaningful single-letter options I vote for long options only.

Backwards Compatibility:
crm_resource {-M|--move} --resource myresource
  - output deprecation warning
  - treat as crm_resource --move-from --resource myresource
crm_resource {-M|--move} --resource myresource --node mynode
  - output deprecation warning
  - treat as crm_resource --move-to --resource myresource --node mynode
crm_resource {-U|--unmove} --resource myresource
  - output deprecation warning
  - treat as crm_resource --move-cleanup --resource myresource

For the shell: should we go for similar commands or keep migrate-XXX?

Coming back to Dejan's proposal of move-remove: that can be implemented by re-executing the last move (a remove).
Reimplementing unmove as an undo of the last move, you would have shortcuts for your favorite move operation:
move
move-unmove - back
move-remove - and forth
Just kidding... - holger
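To make the semantics of the proposal above concrete, here is a rough sketch of the location constraints the new commands would boil down to, written in crm shell syntax (the constraint ids and resource/node names are illustrative only, not the exact ids crm_resource generates):

# crm_resource --move-to --resource myresource --master --node mynode
# roughly corresponds to a prefer constraint for the Master role:
location cli-prefer-myresource myresource \
    rule $id=cli-prefer-rule-myresource $role=Master inf: #uname eq mynode

# crm_resource --move-from --resource myresource --node mynode
# roughly corresponds to a standby constraint keeping the resource off mynode:
location cli-standby-myresource myresource \
    rule $id=cli-standby-rule-myresource -inf: #uname eq mynode

# crm_resource --move-cleanup --resource myresource would then remove these constraints again.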
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
On Wed, 2011-04-06 at 15:38 +0200, Dejan Muhamedagic wrote: On Wed, Apr 06, 2011 at 01:00:36PM +0200, Andrew Beekhof wrote: On Tue, Apr 5, 2011 at 12:27 PM, Dejan Muhamedagic deja...@fastmail.fm wrote: Ah, right, sorry, wanted to ask about the difference between move-off and move. The description looks the same as for move. Is it that in this case it is for clones, so crm_resource needs an extra node parameter? You wrote in the doc: +Migrate a resource (-instance for clones/masters) off the specified node. The '-instance' looks somewhat funny. Why not say Move/migrate a clone or master/slave instance away from the specified node? I must say that I still find all this quite confusing, i.e. now we have move, unmove, and move-off, but it's probably just me :) Not just you. The problem is that we didn't fully understand all the use case permutations at the time. I think, notwithstanding legacy compatibility, move should probably be renamed to move-to and this new option be called move-from. That seems more obvious and syntactically consistent with the rest of the system. Yes, move-to and move-from seem more consistent than the other options. The problem is that the old move is at times one and then at times another. In the absence of a host name, each uses the current location for the named group/primitive resource and complains for clones. The biggest question in my mind is what to call unmove... move-cleanup perhaps? move-remove? :D Actually, though the word is a bit awkward, unmove sounds fine to me. I would vote for move-cleanup. It's consistent with move-XXX, and to my (german) ears unmove seems to stand for the previous move being undone and the stuff coming back. BTW: Has someone already tried out the code or do you trust me 8-D ? Stay tuned for updated patches... - holger Thanks, Dejan
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
On Mon, 2011-04-04 at 21:31 +0200, Holger Teutsch wrote: On Mon, 2011-04-04 at 15:24 +0200, Andrew Beekhof wrote: On Mon, Apr 4, 2011 at 2:43 PM, Holger Teutsch holger.teut...@web.de wrote: On Mon, 2011-04-04 at 11:05 +0200, Andrew Beekhof wrote: On Sat, Mar 19, 2011 at 11:55 AM, Holger Teutsch holger.teut...@web.de wrote: Hi Dejan, On Fri, 2011-03-18 at 14:24 +0100, Dejan Muhamedagic wrote: Hi, On Fri, Mar 18, 2011 at 12:21:40PM +0100, Holger Teutsch wrote: Hi, I would like to submit 2 patches of an initial implementation for discussion. .. To recall: crm_resource --move resource creates a standby rule that moves the resource off the currently active node while crm_resource --move resource --node newnode creates a prefer rule that moves the resource to the new node. When dealing with clones and masters the behavior was random as the code only considers the node where the first instance of the clone was started. The new code behaves consistently for the master role of an m/s resource. The options --master and rsc:master are somewhat redundant as a slave move is not supported. Currently it's more an acknowledgement of the user. On the other hand it is desirable (and was requested several times on the ML) to stop a single resource instance of a clone or master on a specific node. Should that be implemented by something like crm_resource --move-off --resource myresource --node devel2 ? or should crm_resource refuse to work on clones and/or should moving the master role be the default for m/s resources and the --master option discarded ? I think that we also need to consider the case when clone-max is less than the number of nodes. If I understood correctly what you were saying. So, all of move slave and move master and move clone should be possible. I think the following use cases cover what can be done with such kind of interface: crm_resource --moveoff --resource myresource --node mynode - all resource variants: check whether active on mynode, then create standby constraint crm_resource --move --resource myresource - primitive/group: convert to --moveoff --node `current_node` - clone/master: refused crm_resource --move --resource myresource --node mynode - primitive/group: create prefer constraint - clone/master: refused Not sure this needs to be refused. I see the problem that the node where the resource instance should be moved off had to be specified as well to get predictable behavior. Consider a a 2 way clone on a 3 node cluster. If the clone is active on A and B what should crm_resource --move --resource myClone --node C do ? I would expect it to create the +inf constraint for C but no contraint(s) for the current location(s) You are right. These are different and valid use cases. crm_resource --move --resource myClone --node C - I want an instance on C, regardless where it is moved off crm_resource --move-off --resource myClone --node C - I want the instance moved off C, regardless where it is moved on I tried them out with a reimplementation of the patch on a 3 node cluster with a resource with clone-max=2. The behavior appears logical (at least to me 8-) ). This would require an additional --from-node or similar. Other than that the proposal looks sane. My first thought was to make --move behave like --move-off if the resource is a clone or /ms, but since the semantics are the exact opposite, that might introduce introduce more problems than it solves. That was my perception as well. Does the original crm_resource patch implement this? 
No, I will submit an updated version later this week. - holger Hi, I submit revised patches for review. Summarizing preceding discussions the following functionality is implemented: crm_resource --move-off --resource myresource --node mynode - all resource variants: check whether active on mynode, then create standby constraint crm_resource --move --resource myresource - primitive/group: convert to --move-off --node `current_node` - clone/master: refused crm_resource --move --resource myresource --node mynode - all resource variants: create prefer constraint crm_resource --move --resource myresource --master --node mynode - master: check whether active as slave on mynode, then create prefer constraint for master role - others: refused The patch shell_migrate.diff supports this in the shell. This stuff is agnostic of what crm_migrate really does. Regards Holger diff -r b4f456380f60 doc/crm_cli.txt --- a/doc/crm_cli.txt Thu Mar 17 09:41:25 2011 +0100 +++ b/doc/crm_cli.txt Mon Apr 04
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
Hi Dejan, On Tue, 2011-04-05 at 13:40 +0200, Dejan Muhamedagic wrote: Hi Holger, On Tue, Apr 05, 2011 at 01:19:56PM +0200, Holger Teutsch wrote: Hi Dejan, On Tue, 2011-04-05 at 12:27 +0200, Dejan Muhamedagic wrote: On Tue, Apr 05, 2011 at 12:10:48PM +0200, Holger Teutsch wrote: Hi Dejan, On Tue, 2011-04-05 at 11:48 +0200, Dejan Muhamedagic wrote: Hi Holger, On Mon, Apr 04, 2011 at 09:31:02PM +0200, Holger Teutsch wrote: On Mon, 2011-04-04 at 15:24 +0200, Andrew Beekhof wrote: [...] crm_resource --move-off --resource myClone --node C - I want the instance moved off C, regardless where it is moved on What is the difference between move-off and unmigrate (-U)? --move-off - create a constraint that a resource should *not* run on the specific node (partly as before --move without --node) -U: zap all migration constraints (as before) Ah, right, sorry, wanted to ask about the difference between move-off and move. The description looks the same as for move. Is it that in this case it is for clones so crm_resource needs an extra node parameter? You wrote in the doc: +Migrate a resource (-instance for clones/masters) off the specified node. The '-instance' looks somewhat funny. Why not say Move/migrate a clone or master/slave instance away from the specified node? Moving away works for all kinds of resources so the text now looks like: diff -r b4f456380f60 doc/crm_cli.txt --- a/doc/crm_cli.txt Thu Mar 17 09:41:25 2011 +0100 +++ b/doc/crm_cli.txt Tue Apr 05 13:08:10 2011 +0200 @@ -818,10 +818,25 @@ running on the current node. Additionally, you may specify a lifetime for the constraint---once it expires, the location constraint will no longer be active. +For a master resource specify rsc:master to move the master role. Usage: ... -migrate rsc [node] [lifetime] [force] +migrate rsc[:master] [node] [lifetime] [force] +... + +[[cmdhelp_resource_migrateoff,migrate a resource off the specified node]] + `migrateoff` (`moveoff`) + +Migrate a resource away from the specified node. +The resource is migrated by creating a constraint which prevents it from +running on the specified node. Additionally, you may specify a +lifetime for the constraint---once it expires, the location +constraint will no longer be active. + +Usage: +... +migrateoff rsc node [lifetime] [force] ... [[cmdhelp_resource_unmigrate,unmigrate a resource to another node]] I must say that I still find all this quite confusing, i.e. now we have move, unmove, and move-off, but it's probably just me :) Think of move == move-to then it is simpler 8-) ... keeping in mind that for backward compatibility crm_resource --move --resource myResource is equivalent crm_resource --move-off --resource myResource --node $(current node) But as there is no current node for clones / masters the old implementation did some random movements... OK. Thanks for the clarification. I'd like to revise my previous comment about restricting use of certain constructs. For instance, in this case, if the command would result in a random movement then the shell should at least issue a warning about it. Or perhaps refuse to do that completely. I didn't take a look yet at the code, perhaps you've already done that. Thanks, Dejan I admit you have to specify more verbosely what you want to achieve but then the patched versions (based on patches I submitted today around 10:01) execute consistent and without surprises - at least for my test cases. 
Regards Holger
Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
Hi Dejan, On Mon, 2011-03-21 at 16:11 +0100, Dejan Muhamedagic wrote: Hi Holger, On Sat, Mar 19, 2011 at 11:55:57AM +0100, Holger Teutsch wrote: Hi Dejan, On Fri, 2011-03-18 at 14:24 +0100, Dejan Muhamedagic wrote: Hi, On Fri, Mar 18, 2011 at 12:21:40PM +0100, Holger Teutsch wrote: Hi, I would like to submit 2 patches of an initial implementation for discussion. .. To recall: crm_resource --move resource creates a standby rule that moves the resource off the currently active node while crm_resource --move resource --node newnode creates a prefer rule that moves the resource to the new node. When dealing with clones and masters the behavior was random as the code only considers the node where the first instance of the clone was started. The new code behaves consistently for the master role of an m/s resource. The options --master and rsc:master are somewhat redundant as a slave move is not supported. Currently it's more an acknowledgement of the user. On the other hand it is desirable (and was requested several times on the ML) to stop a single resource instance of a clone or master on a specific node. Should that be implemented by something like crm_resource --move-off --resource myresource --node devel2 ? or should crm_resource refuse to work on clones and/or should moving the master role be the default for m/s resources and the --master option discarded ? I think that we also need to consider the case when clone-max is less than the number of nodes. If I understood correctly what you were saying. So, all of move slave and move master and move clone should be possible. I think the following use cases cover what can be done with such kind of interface: crm_resource --moveoff --resource myresource --node mynode - all resource variants: check whether active on mynode, then create standby constraint crm_resource --move --resource myresource - primitive/group: convert to --moveoff --node `current_node` - clone/master: refused crm_resource --move --resource myresource --node mynode - primitive/group: create prefer constraint - clone/master: refused crm_resource --move --resource myresource --master --node mynode - master: create prefer constraint for master role - others: refused They should work (witch foreseeable outcome!) regardless of the setting of clone-max. This seems quite complicated to me. Took me a while to figure out what's what and where :) Why bother doing the thinking for I'm afraid the matter *is* complicated. The current implementation of crm_resource --move --resource myResource (without node name) is moving off the resource from the node it is currently active on by creating a standby constraint. For clones and masters there is no such *single* active node the constraint can be constructed for. Consider this use case: I have 2 nodes and a clone or master and would like to safely get rid of one instance on a particular node (e.g. with agents 1.0.5 the slave of a DB2 HADR pair 8-) ). No idea how that should be done without a move-off functionality. users? The only case which seems to me worth considering is refusing setting role for non-ms resources. Otherwise, let's let the user move things around and enjoy the consequences. Definitely not true for production clusters. The tools should produce least surprise consequences. Cheers, Over the weekend I implemented the above mentioned functionality. 
Drop me a note if you want to play with an early snapshot 8-) Regards Holger
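To illustrate the use case discussed above, a hypothetical session with the proposed interface (resource and node names are invented, and the exact option spelling may still change as the discussion evolves):

# two-node cluster, ms_db2 is the master/slave resource of a DB2 HADR pair;
# take the instance on devel2 (the slave) out of service without touching the master:
crm_resource --moveoff --resource ms_db2 --node devel2

# later, drop the standby constraint again so the instance may return:
crm_resource --unmove --resource ms_db2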
[Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter
Hi, I would like to submit 2 patches of an initial implementation for discussion. Patch 1 implements migration of the Master role of an m/s resource to another node in crm_resource Patch 2 adds support for the shell. crm_resource does this with options crm_resource --move --resource ms_test --master --node devel2 The shell does the same with crm resource migrate ms_test:master devel2 crm_resource insist on the options --master --node xxx if dealing with m/s resources. It is not easy to assess the expectations that a move command should fulfill for something more complex than a group. To recall: crm_resource --move resource creates a standby rule that moves the resource off the currently active node while crm_resource --move resource --node newnode creates a prefer rule that moves the resource to the new node. When dealing with clones and masters the behavior was random as the code only considers the node where the first instance of the clone was started. The new code behaves consistently for the master role of an m/s resource. The options --master and rsc:master are somewhat redundant as a slave move is not supported. Currently it's more an acknowledgement of the user. On the other hand it is desirable (and was requested several times on the ML) to stop a single resource instance of a clone or master on a specific node. Should that be implemented by something like crm_resource --move-off --resource myresource --node devel2 ? or should crm_resource refuse to work on clones and/or should moving the master role be the default for m/s resources and the --master option discarded ? Regards Holger # HG changeset patch # User Holger Teutsch holger.teut...@web.de # Date 1300439791 -3600 # Branch mig # Node ID dac1a4eae844f0bd857951b1154a171c80c25772 # Parent b4f456380f60bd308acdc462215620f5bf530854 crm_resource.c: Add support for move of Master role of a m/s resource diff -r b4f456380f60 -r dac1a4eae844 tools/crm_resource.c --- a/tools/crm_resource.c Thu Mar 17 09:41:25 2011 +0100 +++ b/tools/crm_resource.c Fri Mar 18 10:16:31 2011 +0100 @@ -52,6 +52,7 @@ const char *prop_id = NULL; const char *prop_set = NULL; char *move_lifetime = NULL; +int move_master = 0; char rsc_cmd = 'L'; char *our_pid = NULL; IPC_Channel *crmd_channel = NULL; @@ -192,6 +193,32 @@ return 0; } +/* is m/s resource in master role on a host? 
*/ +static int +is_master_on(resource_t *rsc, const char *check_uname) +{ +GListPtr lpc = NULL; + +if(rsc-variant pe_native) { +/* recursively call down */ + GListPtr gIter = rsc-children; + for(; gIter != NULL; gIter = gIter-next) { + if(is_master_on(gIter-data, check_uname)) + return 1; +} + return 0; +} + +for(lpc = rsc-running_on; lpc != NULL; lpc = lpc-next) { + node_t *node = (node_t*)lpc-data; + if(rsc-variant == pe_native rsc-role == RSC_ROLE_MASTER +safe_str_eq(node-details-uname, check_uname)) { +return 1; +} +} +return 0; +} + #define cons_string(x) x?x:NA static void print_cts_constraints(pe_working_set_t *data_set) @@ -797,6 +824,7 @@ static int move_resource( const char *rsc_id, +int move_master, const char *existing_node, const char *preferred_node, cib_t * cib_conn) { @@ -935,6 +963,10 @@ crm_xml_add(rule, XML_ATTR_ID, id); crm_free(id); +if(move_master) { +crm_xml_add(rule, XML_RULE_ATTR_ROLE, Master); +} + crm_xml_add(rule, XML_RULE_ATTR_SCORE, INFINITY_S); crm_xml_add(rule, XML_RULE_ATTR_BOOLEAN_OP, and); @@ -1093,6 +1125,8 @@ crm_free(prefix); } +/* out of single letter options */ +#define OPT_MASTER (256 + 'm') static struct crm_option long_options[] = { /* Top-level Options */ {help,0, 0, '?', \t\tThis text}, @@ -1120,10 +1154,10 @@ {get-property,1, 0, 'G', Display the 'class', 'type' or 'provider' of a resource, 1}, {set-property,1, 0, 'S', (Advanced) Set the class, type or provider of a resource, 1}, {move,0, 0, 'M', - \t\tMove a resource from its current location, optionally specifying a destination (-N) and/or a period for which it should take effect (-u) + \t\tMove a resource from its current location, optionally specifying a role (--master), a destination (-N) and/or a period for which it should take effect (-u) \n\t\t\t\tIf -N is not specified, the cluster will force the resource to move by creating a rule for the current location and a score of -INFINITY \n\t\t\t\tNOTE: This will prevent the resource from running on this node until the constraint is removed with -U}, -{un-move, 0, 0, 'U', \tRemove all constraints created by a move command}, +{un-move, 0, 0, 'U', \t\tRemove all constraints created by a move command}, {-spacer-, 1, 0, '-', \nAdvanced Commands:}, {delete, 0, 0, 'D', \t\tDelete a resource from the CIB}, @@ -1137,6 +1171,7 @@ {resource-type, 1, 0, 't', Resource type
Re: [Pacemaker] Failing back a multi-state resource eg. DRBD
On Mon, 2011-03-07 at 14:21 +0100, Dejan Muhamedagic wrote: Hi, On Fri, Mar 04, 2011 at 09:12:46AM -0500, David McCurley wrote: Are you wanting to move all the resources back or just that one resource? I'm still learning, but one simple way I move all resources back from nodeb to nodea is like this: # on nodeb sudo crm node standby # now services migrate to nodea # still on nodeb sudo crm node online This may be a naive way to do it but it works for now :) Yes, that would work. Though that would also make all other resources move from the standby node. There is also a crm resource migrate to migrate individual resources. For that, see here: resource migrate has no option to move ms resources, i.e. to make another node the master. What would work right now is to create a temporary location constraint: location tmp1 ms-drbd0 \ rule $id=tmp1-rule $role=Master inf: #uname eq nodea Then, once the drbd got promoted on nodea, just remove the constraint: crm configure delete tmp1 Obviously, we'd need to make some improvements here. resource migrate uses crm_resource to insert the location constraint, perhaps we should update it to also accept the role parameter. Can you please make an enhancement bugzilla report so that this doesn't get lost. Thanks, Dejan Hi Dejan, it seems that the original author did not file the bug. I entered it as http://developerbugs.linux-foundation.org/show_bug.cgi?id=2567 Regards Holger
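Spelled out as a complete session, the workaround Dejan describes would look roughly like this (resource and node names taken from the example above):

# force the master role of ms-drbd0 onto nodea with a temporary constraint:
crm configure location tmp1 ms-drbd0 \
    rule $id=tmp1-rule $role=Master inf: #uname eq nodea

# once drbd has been promoted on nodea, remove the constraint again:
crm configure delete tmp1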
Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated
Hi Dejan, On Thu, 2011-03-10 at 10:14 +0100, Dejan Muhamedagic wrote: Hi Holger, On Wed, Mar 09, 2011 at 07:58:02PM +0100, Holger Teutsch wrote: Hi Dejan, On Wed, 2011-03-09 at 14:00 +0100, Dejan Muhamedagic wrote: Hi Holger, In order to show the intention of the arguments clearer: Instead of def _verify(self, set_obj_semantic, set_obj_all = None): if not set_obj_all: set_obj_all = set_obj_semantic rc1 = set_obj_all.verify() if user_prefs.check_frequency != never: rc2 = set_obj_semantic.semantic_check(set_obj_all) else: rc2 = 0 return rc1 and rc2 = 1 def verify(self,cmd): usage: verify if not cib_factory.is_cib_sane(): return False return self._verify(mkset_obj(xml)) This way (always passing both args): def _verify(self, set_obj_semantic, set_obj_all): rc1 = set_obj_all.verify() if user_prefs.check_frequency != never: rc2 = set_obj_semantic.semantic_check(set_obj_all) else: rc2 = 0 return rc1 and rc2 = 1 def verify(self,cmd): usage: verify if not cib_factory.is_cib_sane(): return False set_obj_all = mkset_obj(xml) return self._verify(set_obj_all, set_obj_all) See patch set_obj_all.diff My only remaining concern is performance. Though the meta-data is cached, perhaps it will pay off to save the RAInfo instance with the element. But we can worry about that later. I can work on this as next step. I'll do some testing on really big configurations and try to gauge the impact. OK The patch makes some regression tests blow: + File /usr/lib64/python2.6/site-packages/crm/ui.py, line 1441, in verify +return self._verify(mkset_obj(xml)) + File /usr/lib64/python2.6/site-packages/crm/ui.py, line 1433, in _verify +rc2 = set_obj_semantic.semantic_check(set_obj_all) + File /usr/lib64/python2.6/site-packages/crm/cibconfig.py, line 294, in semantic_check +rc = self.__check_unique_clash(set_obj_all) + File /usr/lib64/python2.6/site-packages/crm/cibconfig.py, line 274, in __check_unique_clash +process_primitive(node, clash_dict) + File /usr/lib64/python2.6/site-packages/crm/cibconfig.py, line 259, in process_primitive +if ra_params[ name ]['unique'] == '1': +KeyError: 'OCF_CHECK_LEVEL' Can't recall why OCF_CHECK_LEVEL appears here. There must be some good explanation :) The good explanation is: Not only params are in instance_atributes ... but OCF_CHECK_LEVEL as well within operations ... The latest version no longer blows the test - semantic_check.diff Regards Holger # HG changeset patch # User Holger Teutsch holger.teut...@web.de # Date 1299775617 -3600 # Branch hot # Node ID 30730ccc0aa09c3a476a18c6d95c680b3595 # Parent 9fa61ee6e35ef190f4126e163e9bfe6911e35541 Low: Shell: Rename variable set_obj_verify to set_obj_all as it always contains all objects Simplify usage of this var in [_]verify, pass to CibObjectSet.semantic_check diff -r 9fa61ee6e35e -r 30730ccc0aa0 shell/modules/cibconfig.py --- a/shell/modules/cibconfig.py Wed Mar 09 13:41:27 2011 +0100 +++ b/shell/modules/cibconfig.py Thu Mar 10 17:46:57 2011 +0100 @@ -230,7 +230,7 @@ See below for specific implementations. ''' pass -def semantic_check(self): +def semantic_check(self, set_obj_all): ''' Test objects for sanity. This is about semantics. 
''' diff -r 9fa61ee6e35e -r 30730ccc0aa0 shell/modules/ui.py.in --- a/shell/modules/ui.py.in Wed Mar 09 13:41:27 2011 +0100 +++ b/shell/modules/ui.py.in Thu Mar 10 17:46:57 2011 +0100 @@ -1425,12 +1425,10 @@ set_obj = mkset_obj(*args) err_buf.release() # show them, but get an ack from the user return set_obj.edit() -def _verify(self, set_obj_semantic, set_obj_verify = None): -if not set_obj_verify: -set_obj_verify = set_obj_semantic -rc1 = set_obj_verify.verify() +def _verify(self, set_obj_semantic, set_obj_all): +rc1 = set_obj_all.verify() if user_prefs.check_frequency != never: -rc2 = set_obj_semantic.semantic_check() +rc2 = set_obj_semantic.semantic_check(set_obj_all) else: rc2 = 0 return rc1 and rc2 = 1 @@ -1438,7 +1436,8 @@ usage: verify if not cib_factory.is_cib_sane(): return False -return self._verify(mkset_obj(xml)) +set_obj_all = mkset_obj(xml) +return self._verify(set_obj_all, set_obj_all) def save(self,cmd,*args): usage: save [xml] filename if not cib_factory.is_cib_sane(): # HG changeset patch # User Holger Teutsch holger.teut...@web.de # Date 1299779740 -3600 # Branch hot # Node ID 73021c988d92c5dad4c503af9f8826f5d1c34373 # Parent
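The KeyError shown above is triggered by OCF_CHECK_LEVEL, which appears in the instance_attributes of an operation rather than among the agent's parameters, so it has no entry in the dictionary returned by RAInfo.params(). A tolerant lookup along these lines avoids the problem (a minimal sketch, not the committed patch; the helper name and inputs are illustrative):

  def unique_nvpairs(ra_params, nvpairs):
      """Return the (name, value) pairs that the agent metadata marks unique='1'.
      ra_params: dict of parameter metadata, as returned by RAInfo.params()
      nvpairs:   iterable of (name, value) tuples from instance_attributes"""
      result = []
      for name, value in nvpairs:
          meta = ra_params.get(name)   # None for e.g. OCF_CHECK_LEVEL
          if meta and meta.get('unique') == '1':
              result.append((name, value))
      return result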
Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated
Hi Dejan, On Tue, 2011-03-08 at 19:26 +0100, Holger Teutsch wrote: Hi Dejan, On Tue, 2011-03-08 at 12:07 +0100, Dejan Muhamedagic wrote: Hi Holger, On Tue, Mar 08, 2011 at 09:15:01AM +0100, Holger Teutsch wrote: On Fri, 2011-03-04 at 13:06 +0100, Holger Teutsch wrote: On Thu, 2011-03-03 at 10:55 +0100, Florian Haas wrote: On 2011-03-03 10:43, Holger Teutsch wrote: Hi, I submit a patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated for discussion. ... It looks good, just a few notes. The check function should move to the CibObjectSetRaw class and be invoked from Will move it there. semantic_check(). There's rc1 = set_obj_verify.verify() if user_prefs.check_frequency != never: rc2 = set_obj_semantic.semantic_check() The last should be changed to: rc2 = set_obj_semantic.semantic_check(set_obj_verify) set_obj_verify always contains all CIB elements (well, that means that its name should probably be changed too :). Now, the code should check _only_ new and changed primitives which are contained in set_obj_semantic. That's because we don't want to repeatedly print warnings for all objects on commit, but only for those which were added/changed in the meantime. On the other hand, verify is an explicit check and in that case the whole CIB is always verified. +ra_class = prim.getAttribute(class) +ra_provider = prim.getAttribute(provider) +ra_type = prim.getAttribute(type) +ra_id = prim.getAttribute(id) + +ra = RAInfo(ra_class, ra_type, ra_provider) There's a convenience function get_ra(node) for this. I did not use this as I need all ra_XXX value anyhow later in the code for building k. +if ra == None: +return +ra_params = ra.params() + +attributes = prim.getElementsByTagName(instance_attributes) +if len(attributes) == 0: +return + +for p in attributes[0].getElementsByTagName(nvpair): +name = p.getAttribute(name) +if ra_params[ name ]['unique'] == '1': +value = p.getAttribute(value) +k = (ra_class, ra_provider, ra_type, name, value) +try: +clash_dict[k].append(ra_id) +except: +clash_dict[k] = [ra_id] +return + +clash_dict = {} +for p in cib_factory.mkobj_list(xml,type:primitive): This would become: for p in all_obj_list: # passed from _verify() if is_primitive(p.node): +process_primitive(p.node, clash_dict) Or perhaps to loop through self.obj_list and build clash_dict against all elements? Otherwise, you'll need to skip elements which don't pass the check but are not new/changed (in self.obj_list). I did not pass set_obj_verify in semantic check as this variable only by chance contains the right values. - holger Output: crm(live)configure# primitive ip1a ocf:heartbeat:IPaddr2 params ip=1.2.3.4 meta target-role=stopped crm(live)configure# primitive ip1b ocf:heartbeat:IPaddr2 params ip=1.2.3.4 meta target-role=stopped crm(live)configure# commit WARNING: Resources ip1a,ip1b violate uniqueness for parameter ip: 1.2.3.4 Do you still want to commit? y crm(live)configure# primitive ip2a ocf:heartbeat:IPaddr2 params ip=1.2.3.5 meta target-role=stopped crm(live)configure# commit crm(live)configure# primitive ip2b ocf:heartbeat:IPaddr2 params ip=1.2.3.5 meta target-role=stopped crm(live)configure# primitive ip3 ocf:heartbeat:IPaddr2 params ip=1.2.3.6 meta target-role=stopped crm(live)configure# commit WARNING: Resources ip2a,ip2b violate uniqueness for parameter ip: 1.2.3.5 Do you still want to commit? 
y crm(live)configure# primitive dummy_1 ocf:heartbeat:Dummy params fake=abc meta target-role=stopped crm(live)configure# primitive dummy_2 ocf:heartbeat:Dummy params fake=abc meta target-role=stopped crm(live)configure# primitive dummy_3 ocf:heartbeat:Dummy meta target-role=stopped crm(live)configure# commit crm(live)configure# verify WARNING: Resources ip1a,ip1b violate uniqueness for parameter ip: 1.2.3.4 WARNING: Resources ip2a,ip2b violate uniqueness for parameter ip: 1.2.3.5 crm(live)configure# diff -r a35d8d6d0ab1 shell/modules/cibconfig.py --- a/shell/modules/cibconfig.py Wed Mar 09 11:21:03 2011 +0100 +++ b/shell/modules/cibconfig.py Wed Mar 09 13:20:14 2011 +0100 @@ -230,11 +230,66 @@ See below for specific implementations. ''' pass + +def __check_unique_clash(self): +'Check whether resource parameters with attribute unique clash' + +def process_primitive
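The clash detection sketched in this exchange keys a dictionary by (class, provider, type, parameter, value) and warns only when a new or changed resource takes part in a collision. A self-contained illustration of that logic (the data is made up; the real code walks the CIB objects):

  clash_dict = {}   # (class, provider, type, param, value) -> [resource ids]

  def record(key, rsc_id):
      clash_dict.setdefault(key, []).append(rsc_id)

  # built from *all* primitives in the CIB ...
  record(("ocf", "heartbeat", "IPaddr2", "ip", "1.2.3.4"), "ip1a")
  record(("ocf", "heartbeat", "IPaddr2", "ip", "1.2.3.4"), "ip1b")

  # ... but a warning is printed only if a new/changed resource is involved
  new_or_changed = set(["ip1b"])
  for key, resources in clash_dict.items():
      if len(resources) > 1 and set(resources) & new_or_changed:
          print("WARNING: Resources %s violate uniqueness for parameter %s: %s"
                % (",".join(sorted(resources)), key[3], key[4]))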
Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated
Hi Dejan, On Wed, 2011-03-09 at 14:00 +0100, Dejan Muhamedagic wrote: Hi Holger, This would become: for p in all_obj_list: # passed from _verify() if is_primitive(p.node): +process_primitive(p.node, clash_dict) Or perhaps to loop through self.obj_list and build clash_dict against all elements? Otherwise, you'll need to skip elements which don't pass the check but are not new/changed (in self.obj_list). I did not pass set_obj_verify in semantic check as this variable only by chance contains the right values. But it's not by chance. As I wrote earlier, it always contains the whole CIB. It has to, otherwise crm_verify wouldn't work. It should actually be renamed to set_obj_all or similar. Since we already have that list created, it's better to reuse it than to create another one from scratch. Further, we may need to add more semantic checks which would require looking at the whole CIB. OK, I implemented it this way. In order to show the intention of the arguments clearer: Instead of def _verify(self, set_obj_semantic, set_obj_all = None): if not set_obj_all: set_obj_all = set_obj_semantic rc1 = set_obj_all.verify() if user_prefs.check_frequency != never: rc2 = set_obj_semantic.semantic_check(set_obj_all) else: rc2 = 0 return rc1 and rc2 = 1 def verify(self,cmd): usage: verify if not cib_factory.is_cib_sane(): return False return self._verify(mkset_obj(xml)) This way (always passing both args): def _verify(self, set_obj_semantic, set_obj_all): rc1 = set_obj_all.verify() if user_prefs.check_frequency != never: rc2 = set_obj_semantic.semantic_check(set_obj_all) else: rc2 = 0 return rc1 and rc2 = 1 def verify(self,cmd): usage: verify if not cib_factory.is_cib_sane(): return False set_obj_all = mkset_obj(xml) return self._verify(set_obj_all, set_obj_all) My only remaining concern is performance. Though the meta-data is cached, perhaps it will pay off to save the RAInfo instance with the element. But we can worry about that later. I can work on this as next step. Cheers, Dejan - holger diff -r a35d8d6d0ab1 shell/modules/cibconfig.py --- a/shell/modules/cibconfig.py Wed Mar 09 11:21:03 2011 +0100 +++ b/shell/modules/cibconfig.py Wed Mar 09 19:53:50 2011 +0100 @@ -230,11 +230,68 @@ See below for specific implementations. 
''' pass -def semantic_check(self): + +def __check_unique_clash(self, set_obj_all): +'Check whether resource parameters with attribute unique clash' + +def process_primitive(prim, clash_dict): +''' +Update dict clash_dict with +(ra_class, ra_provider, ra_type, name, value) - [ resourcename ] +if parameter name should be unique +''' +ra_class = prim.getAttribute(class) +ra_provider = prim.getAttribute(provider) +ra_type = prim.getAttribute(type) +ra_id = prim.getAttribute(id) + +ra = RAInfo(ra_class, ra_type, ra_provider) +if ra == None: +return +ra_params = ra.params() + +attributes = prim.getElementsByTagName(instance_attributes) +if len(attributes) == 0: +return + +for p in attributes[0].getElementsByTagName(nvpair): +name = p.getAttribute(name) +if ra_params[ name ]['unique'] == '1': +value = p.getAttribute(value) +k = (ra_class, ra_provider, ra_type, name, value) +try: +clash_dict[k].append(ra_id) +except: +clash_dict[k] = [ra_id] +return + +# we check the whole CIB for clashes as a clash may originate between +# an object already committed and a new one +clash_dict = {} +for obj in set_obj_all.obj_list: +node = obj.node +if is_primitive(node): +process_primitive(node, clash_dict) + +# but we only warn if a 'new' object is involved +check_set = set([o.node.getAttribute(id) for o in self.obj_list if is_primitive(o.node)]) + +rc = 0 +for param, resources in clash_dict.items(): +# at least one new object must be involved +if len(resources) 1 and len(set(resources) check_set) 0: +rc = 2 +msg = 'Resources %s violate uniqueness for parameter %s: %s' %\ +(,.join(sorted(resources)), param[3], param[4]) +common_warning(msg) + +return rc + +def semantic_check(self,
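The patch accumulates resource ids with a try/except KeyError around the list append. A collections.defaultdict expresses the same accumulation without the exception handling; purely a style alternative, the behaviour is identical:

  from collections import defaultdict

  clash_dict = defaultdict(list)
  clash_dict[("ocf", "heartbeat", "IPaddr2", "ip", "1.2.3.4")].append("ip1a")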
Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated
On Fri, 2011-03-04 at 13:06 +0100, Holger Teutsch wrote: On Thu, 2011-03-03 at 10:55 +0100, Florian Haas wrote: On 2011-03-03 10:43, Holger Teutsch wrote: Hi, I submit a patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated for discussion. I'll leave it do Dejan to review the code, but I love the functionality. Thanks a lot for tackling this. My only suggestion for an improvement is to make the warning message a bit more terse, as in: WARNING: Resources ip1a, ip1b violate uniqueness for parameter ip: 1.2.3.4 Florian, I see your point. Although my formatting allows for an unlimited number of collisions ( 8-) ) in real life we will only have 2 or 3. Will change this together with Dejan's hints. Cheers, Florian Florian + Dejan, here the version with terse output. The code got terser as well. - holger crm(live)configure# primitive ip1a ocf:heartbeat:IPaddr2 params ip=1.2.3.4 meta target-role=stopped crm(live)configure# primitive ip1b ocf:heartbeat:IPaddr2 params ip=1.2.3.4 meta target-role=stopped crm(live)configure# primitive ip2a ocf:heartbeat:IPaddr2 params ip=1.2.3.5 meta target-role=stopped crm(live)configure# primitive ip2b ocf:heartbeat:IPaddr2 params ip=1.2.3.5 meta target-role=stopped crm(live)configure# primitive ip3 ocf:heartbeat:IPaddr2 params ip=1.2.3.6 meta target-role=stopped crm(live)configure# primitive dummy_1 ocf:heartbeat:Dummy params fake=abc meta target-role=stopped crm(live)configure# primitive dummy_2 ocf:heartbeat:Dummy params fake=abc meta target-role=stopped crm(live)configure# primitive dummy_3 ocf:heartbeat:Dummy meta target-role=stopped crm(live)configure# commit WARNING: Resources ip1a,ip1b violate uniqueness for parameter ip: 1.2.3.4 WARNING: Resources ip2a,ip2b violate uniqueness for parameter ip: 1.2.3.5 Do you still want to commit? 
diff -r cf4e9febed8e shell/modules/ui.py.in --- a/shell/modules/ui.py.in Wed Feb 23 14:52:34 2011 +0100 +++ b/shell/modules/ui.py.in Tue Mar 08 09:11:38 2011 +0100 @@ -1509,6 +1509,55 @@ return False set_obj = mkset_obj(xml) return ptestlike(set_obj.ptest,'vv',cmd,*args) + +def __check_unique_clash(self): +'Check whether resource parameters with attribute unique clash' + +def process_primitive(prim, clash_dict): +''' +Update dict clash_dict with +(ra_class, ra_provider, ra_type, name, value) - [ resourcename ] +if parameter name should be unique +''' +ra_class = prim.getAttribute(class) +ra_provider = prim.getAttribute(provider) +ra_type = prim.getAttribute(type) +ra_id = prim.getAttribute(id) + +ra = RAInfo(ra_class, ra_type, ra_provider) +if ra == None: +return +ra_params = ra.params() + +attributes = prim.getElementsByTagName(instance_attributes) +if len(attributes) == 0: +return + +for p in attributes[0].getElementsByTagName(nvpair): +name = p.getAttribute(name) +if ra_params[ name ]['unique'] == '1': +value = p.getAttribute(value) +k = (ra_class, ra_provider, ra_type, name, value) +try: +clash_dict[k].append(ra_id) +except: +clash_dict[k] = [ra_id] +return + +clash_dict = {} +for p in cib_factory.mkobj_list(xml,type:primitive): +process_primitive(p.node, clash_dict) + +no_clash = 1 +for param, resources in clash_dict.items(): +if len(resources) 1: +no_clash = 0 +msg = 'Resources %s violate uniqueness for parameter %s: %s' %\ +(,.join(sorted(resources)), param[3], param[4]) +common_warning(msg) + +return no_clash + def commit(self,cmd,force = None): usage: commit [force] if force and force != force: @@ -1523,7 +1572,8 @@ rc1 = cib_factory.is_current_cib_equal() rc2 = cib_factory.is_cib_empty() or \ self._verify(mkset_obj(xml,changed),mkset_obj(xml)) -if rc1 and rc2: +rc3 = self.__check_unique_clash() +if rc1 and rc2 and rc3: return cib_factory.commit() if force or user_prefs.get_force(): common_info(commit forced) ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
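The rule discussed in the review, commit warning only about new or changed objects while verify checks the whole CIB, comes down to which sets the two entry points hand to _verify(). Roughly (a sketch reusing the function names from the patches above, not runnable on its own):

  def commit_check(ui):
      set_obj_all = mkset_obj("xml")                  # every object in the CIB
      set_obj_changed = mkset_obj("xml", "changed")   # only new/modified objects
      # semantic warnings are computed for the changed objects, against all
      return ui._verify(set_obj_changed, set_obj_all)

  def verify_check(ui):
      set_obj_all = mkset_obj("xml")
      # explicit verify: everything is checked
      return ui._verify(set_obj_all, set_obj_all)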
Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated
Hi Dejan, On Tue, 2011-03-08 at 12:07 +0100, Dejan Muhamedagic wrote: Hi Holger, On Tue, Mar 08, 2011 at 09:15:01AM +0100, Holger Teutsch wrote: On Fri, 2011-03-04 at 13:06 +0100, Holger Teutsch wrote: On Thu, 2011-03-03 at 10:55 +0100, Florian Haas wrote: On 2011-03-03 10:43, Holger Teutsch wrote: Hi, I submit a patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated for discussion. ... It looks good, just a few notes. The check function should move to the CibObjectSetRaw class and be invoked from Will move it there. semantic_check(). There's rc1 = set_obj_verify.verify() if user_prefs.check_frequency != never: rc2 = set_obj_semantic.semantic_check() The last should be changed to: rc2 = set_obj_semantic.semantic_check(set_obj_verify) set_obj_verify always contains all CIB elements (well, that means that its name should probably be changed too :). Now, the code should check _only_ new and changed primitives which are contained in set_obj_semantic. That's because we don't want to repeatedly print warnings for all objects on commit, but only for those which were added/changed in the meantime. On the other hand, verify is an explicit check and in that case the whole CIB is always verified. +ra_class = prim.getAttribute(class) +ra_provider = prim.getAttribute(provider) +ra_type = prim.getAttribute(type) +ra_id = prim.getAttribute(id) + +ra = RAInfo(ra_class, ra_type, ra_provider) There's a convenience function get_ra(node) for this. I did not use this as I need all ra_XXX value anyhow later in the code for building k. +if ra == None: +return +ra_params = ra.params() + +attributes = prim.getElementsByTagName(instance_attributes) +if len(attributes) == 0: +return + +for p in attributes[0].getElementsByTagName(nvpair): +name = p.getAttribute(name) +if ra_params[ name ]['unique'] == '1': +value = p.getAttribute(value) +k = (ra_class, ra_provider, ra_type, name, value) +try: +clash_dict[k].append(ra_id) +except: +clash_dict[k] = [ra_id] +return + +clash_dict = {} +for p in cib_factory.mkobj_list(xml,type:primitive): This would become: for p in all_obj_list: # passed from _verify() if is_primitive(p.node): +process_primitive(p.node, clash_dict) Or perhaps to loop through self.obj_list and build clash_dict against all elements? Otherwise, you'll need to skip elements which don't pass the check but are not new/changed (in self.obj_list). The typical occurrences of clashes will originate from old objects and new/changed objects. I think I have to loop over all objects to build clash dict and then ... + +no_clash = 1 +for param, resources in clash_dict.items(): +if len(resources) 1: ... only emit a warning if the intersection of a clash set with changed objects is not empty. +no_clash = 0 +msg = 'Resources %s violate uniqueness for parameter %s: %s' %\ +(,.join(sorted(resources)), param[3], param[4]) +common_warning(msg) + +return no_clash + I will submit an updated version later this week. -holger ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
Re: [Pacemaker] Re: FW: time pressure - software raid cluster, raid1 resource agent, help needed
Hi, SAN drivers often cave large timeouts configured, so are you patient enough ? At least this demonstrates that the problem is currently not in the cluster... - holger On Mon, 2011-03-07 at 11:04 +0100, patrik.rappo...@knapp.com wrote: Hy, thx for answer. I tested this now, the problem is, mdadm hangs totally when we simulate the fail of one storage. (we already tried two ways: 1. removing the mapping., 2. removing one path, and then disabling the remaining path through the port on the san switch - which is nearly the same like a total fail of the storage). So I can't get the output of mdadm, because it hangs. I think it must be a problem with mdadm. This is my mdadm.conf: DEVICE /dev/mapper/3600a0b800050c94e07874d2e0028_part1 /dev/mapper/3600a0b8000511f5414b14d2df1b1_part1 /dev/mapper/3600a0b800050c94e07874d2e0028_part2 /dev/mapper/3600a0b8000511f5414b14d2df1b1_part2 /dev/mapper/3600a0b800050c94e07874d2e0028_part3 /dev/mapper/3600a0b8000511f5414b14d2df1b1_part3 ARRAY /dev/md0 metadata=0.90 UUID=c411c076:bb28916f:d50a93ef:800dd1f0 ARRAY /dev/md1 metadata=0.90 UUID=522279fa:f3cdbe3a:d50a93ef:800dd1f0 ARRAY /dev/md2 metadata=0.90 UUID=01e07d7d:5305e46c:d50a93ef:800dd1f0 kr Patrik Mit freundlichen Grüßen / Best Regards Patrik Rapposch, BSc System Administration KNAPP Systemintegration GmbH Waltenbachstraße 9 8700 Leoben, Austria Phone: +43 3842 805-915 Fax: +43 3842 805-500 patrik.rappo...@knapp.com www.KNAPP.com Commercial register number: FN 138870x Commercial register court: Leoben The information in this e-mail (including any attachment) is confidential and intended to be for the use of the addressee(s) only. If you have received the e-mail by mistake, any disclosure, copy, distribution or use of the contents of the e-mail is prohibited, and you must delete the e-mail from your system. As e-mail can be changed electronically KNAPP assumes no responsibility for any alteration to this e-mail or its attachments. KNAPP has taken every reasonable precaution to ensure that any attachment to this e-mail has been swept for virus. However, KNAPP does not accept any liability for damage sustained as a result of such attachment being virus infected and strongly recommend that you carry out your own virus check before opening any attachment. Holger Teutsch holger.teut...@web.de 06.03.2011 19:56 Bitte antworten an The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org An The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org Kopie Thema Re: [Pacemaker] WG: time pressure - software raid cluster, raid1 ressource agent, help needed On Sun, 2011-03-06 at 12:40 +0100, patrik.rappo...@knapp.com wrote: Hi, assume the basic problem is in your raid configuration. If you unmap one box the devices should not be in status FAIL but in degraded. So what is the exit status of mdadm --detail --test /dev/md0 after unmapping ? Furthermore I would start start with one isolated group containing the raid, LVM, and FS to keep it simple. Regards Holger Hy, does anyone have an idea to that? I only have the servers till next week friday, so to my regret I am under time pressure :( Like I already wrote, I would appreciate and test any idea of you. Also if someone already made clusters with lvm-mirror, I would be happy to get a cib or some configuration examples. Thank you very much in advance. 
kr Patrik patrik.rappo...@knapp.com 03.03.2011 15:11Bitte antworten anThe Pacemaker cluster resource manager An pacemaker@oss.clusterlabs.org Kopie Blindkopie Thema [Pacemaker] software raid cluster, raid1 ressource agent,help needed Good Day, I have a 2 node active/passive cluster which is connected to two ibm 4700 storages. I configured 3 raids and I use the Raid1 ressource agent for managing the Raid1s in the cluster. When I now disable the mapping of one storage, to simulate the fail of one storage, the Raid1 Ressources change to the State FAILED and the second node then takes over the ressources and is able to start the raid devices. So I am confused, why the active node can't keep the raid1 ressources and the former passive node takes them over and can start them correct. I would really appreciate your advice, or maybe someone already has a example configuration for Raid1 with two storages. Thank you very much in advance. Attached you can find my cib.xml. kr Patrik Mit freundlichen Grüßen / Best Regards Patrik Rapposch, BSc System Administration KNAPP Systemintegration GmbH Waltenbachstraße 9 8700 Leoben, Austria Phone: +43 3842 805-915 Fax: +43 3842 805-500
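To tell a degraded array from a dead one after unmapping a box, the exit status of the command mentioned above is the quickest check. The codes below are the ones documented in the mdadm man page; if the command itself blocks, the hang is below md, in the multipath/SAN layer, which matches the observation that the cluster is not at fault here:

  mdadm --detail --test /dev/md0 ; echo $?
  #   0  array is functioning normally
  #   1  array is degraded (at least one failed device) - expected after unmapping one box
  #   2  array has multiple failed devices and is unusable
  #   4  error while getting information about the device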
Re: [Pacemaker] FW: time pressure - software raid cluster, raid1 resource agent, help needed
On Sun, 2011-03-06 at 12:40 +0100, patrik.rappo...@knapp.com wrote: Hi, assume the basic problem is in your raid configuration. If you unmap one box the devices should not be in status FAIL but in degraded. So what is the exit status of mdadm --detail --test /dev/md0 after unmapping ? Furthermore I would start start with one isolated group containing the raid, LVM, and FS to keep it simple. Regards Holger Hy, does anyone have an idea to that? I only have the servers till next week friday, so to my regret I am under time pressure :( Like I already wrote, I would appreciate and test any idea of you. Also if someone already made clusters with lvm-mirror, I would be happy to get a cib or some configuration examples. Thank you very much in advance. kr Patrik patrik.rappo...@knapp.com 03.03.2011 15:11Bitte antworten anThe Pacemaker cluster resource manager An pacemaker@oss.clusterlabs.org Kopie Blindkopie Thema [Pacemaker] software raid cluster, raid1 ressource agent,help needed Good Day, I have a 2 node active/passive cluster which is connected to two ibm 4700 storages. I configured 3 raids and I use the Raid1 ressource agent for managing the Raid1s in the cluster. When I now disable the mapping of one storage, to simulate the fail of one storage, the Raid1 Ressources change to the State FAILED and the second node then takes over the ressources and is able to start the raid devices. So I am confused, why the active node can't keep the raid1 ressources and the former passive node takes them over and can start them correct. I would really appreciate your advice, or maybe someone already has a example configuration for Raid1 with two storages. Thank you very much in advance. Attached you can find my cib.xml. kr Patrik Mit freundlichen Grüßen / Best Regards Patrik Rapposch, BSc System Administration KNAPP Systemintegration GmbH Waltenbachstraße 9 8700 Leoben, Austria Phone: +43 3842 805-915 Fax: +43 3842 805-500 patrik.rappo...@knapp.com www.KNAPP.com Commercial register number: FN 138870x Commercial register court: Leoben The information in this e-mail (including any attachment) is confidential and intended to be for the use of the addressee(s) only. If you have received the e-mail by mistake, any disclosure, copy, distribution or use of the contents of the e-mail is prohibited, and you must delete the e-mail from your system. As e-mail can be changed electronically KNAPP assumes no responsibility for any alteration to this e-mail or its attachments. KNAPP has taken every reasonable precaution to ensure that any attachment to this e-mail has been swept for virus. However, KNAPP does not accept any liability for damage sustained as a result of such attachment being virus infected and strongly recommend that you carry out your own virus check before opening any attachment. 
___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated
On Thu, 2011-03-03 at 10:55 +0100, Florian Haas wrote: On 2011-03-03 10:43, Holger Teutsch wrote: Hi, I submit a patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated for discussion. I'll leave it do Dejan to review the code, but I love the functionality. Thanks a lot for tackling this. My only suggestion for an improvement is to make the warning message a bit more terse, as in: WARNING: Resources ip1a, ip1b violate uniqueness for parameter ip: 1.2.3.4 Florian, I see your point. Although my formatting allows for an unlimited number of collisions ( 8-) ) in real life we will only have 2 or 3. Will change this together with Dejan's hints. Cheers, Florian ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
[Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated
Hi, I submit a patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated for discussion. devel1:~ # crm configure crm(live)configure# primitive ip1a ocf:heartbeat:IPaddr2 params ip=1.2.3.4 meta target-role=stopped crm(live)configure# primitive ip1b ocf:heartbeat:IPaddr2 params ip=1.2.3.4 meta target-role=stopped crm(live)configure# primitive ip2a ocf:heartbeat:IPaddr2 params ip=1.2.3.5 meta target-role=stopped crm(live)configure# primitive ip2b ocf:heartbeat:IPaddr2 params ip=1.2.3.5 meta target-role=stopped crm(live)configure# primitive ip3 ocf:heartbeat:IPaddr2 params ip=1.2.3.6 meta target-role=stopped crm(live)configure# primitive dummy_1 ocf:heartbeat:Dummy params fake=abc meta target-role=stopped crm(live)configure# primitive dummy_2 ocf:heartbeat:Dummy params fake=abc meta target-role=stopped crm(live)configure# primitive dummy_3 ocf:heartbeat:Dummy meta target-role=stopped crm(live)configure# commit WARNING: Violations of instance parameters with attribute unique detected: Agent ocf:heartbeat:IPaddr2 parameter ip value 1.2.3.4 in resources ip1a ip1b Agent ocf:heartbeat:IPaddr2 parameter ip value 1.2.3.5 in resources ip2a ip2b Do you still want to commit? n crm(live)configure# The code now lives in ui.py. I'm not sure whether it should be considered as more cib related an be moved to some other module. Regards Holger diff -r cf4e9febed8e -r 810c5ea83873 shell/modules/ui.py.in --- a/shell/modules/ui.py.in Wed Feb 23 14:52:34 2011 +0100 +++ b/shell/modules/ui.py.in Thu Mar 03 10:24:51 2011 +0100 @@ -1509,6 +1509,60 @@ return False set_obj = mkset_obj(xml) return ptestlike(set_obj.ptest,'vv',cmd,*args) + +def __check_unique_clash(self): +'Check whether resource parameters with attribute unique clash' + +def process_primitive(prim, clash_dict): +''' +Update dict clash_dict with +(ra_class, ra_provider, ra_type, name, value) - [ resourcename ] +if parameter name should be unique +''' +ra_class = prim.getAttribute(class) +ra_provider = prim.getAttribute(provider) +ra_type = prim.getAttribute(type) +ra_id = prim.getAttribute(id) + +ra = RAInfo(ra_class, ra_type, ra_provider) +if ra == None: +return +ra_params = ra.params() + +attributes = prim.getElementsByTagName(instance_attributes) +if len(attributes) == 0: +return + +for p in attributes[0].getElementsByTagName(nvpair): +name = p.getAttribute(name) +if ra_params[ name ]['unique'] == '1': +value = p.getAttribute(value) +k = (ra_class, ra_provider, ra_type, name, value) +try: +clash_dict[k].append(ra_id) +except: +clash_dict[k] = [ra_id] +return + +clash_dict = {} +for p in cib_factory.mkobj_list(xml,type:primitive): +process_primitive(p.node, clash_dict) + +clash_msg = [] +for param, resources in clash_dict.items(): +if len(resources) 1: +tag = ':'.join(param[:3]) +clash_msg.append(' Agent %s parameter %s value %s in resources'\ +%(tag, param[3], param[4])) +for r in sorted(resources): +clash_msg.append( %s%r) +clash_msg.append() + +if len(clash_msg) 0: +common_warning(Violations of instance parameters with attribute unique detected:) +print \n.join(clash_msg) +return 0 +return 1 def commit(self,cmd,force = None): usage: commit [force] if force and force != force: @@ -1523,7 +1577,8 @@ rc1 = cib_factory.is_current_cib_equal() rc2 = cib_factory.is_cib_empty() or \ self._verify(mkset_obj(xml,changed),mkset_obj(xml)) -if rc1 and rc2: +rc3 = self.__check_unique_clash() +if rc1 and rc2 and rc3: return cib_factory.commit() if force or user_prefs.get_force(): common_info(commit forced) ___ Pacemaker mailing list: 
Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
[Pacemaker] rpm packages in Pacemaker's rpm-next do not install
At least for Opensuse 11.3 x86_64. May be the title of my previous mail was misleading. -h On Tue, 2011-02-15 at 17:57 +0100, Holger Teutsch wrote: Hi, the packages from rpm-next(64bit) for opensuse 11.3 do not install there (at least true for 1.1.4 and 1.1.5). The plugin is in ./usr/lib/lcrso/pacemaker.lcrso but should be in ./usr/lib64/lcrso/pacemaker.lcrso I think the patch below (borrowed from the 'official' packages) cures. Regards Holger diff -r 43a11c0daae4 pacemaker.spec --- a/pacemaker.spec Mon Feb 14 15:25:13 2011 +0100 +++ b/pacemaker.spec Tue Feb 15 17:50:27 2011 +0100 @@ -1,3 +1,7 @@ +%if 0%{?suse_version} +%define _libexecdir %{_libdir} +%endif + %global gname haclient %global uname hacluster %global pcmk_docdir %{_docdir}/%{name} ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
[Pacemaker] Packages for Opensuse 11.3 don't build / install
Hi, the packages from rpm-next(64bit) for opensuse 11.3 do not install there (at least true for 1.1.4 and 1.1.5). The plugin is in ./usr/lib/lcrso/pacemaker.lcrso but should be in ./usr/lib64/lcrso/pacemaker.lcrso I think the patch below (borrowed from the 'official' packages) cures. Regards Holger diff -r 43a11c0daae4 pacemaker.spec --- a/pacemaker.specMon Feb 14 15:25:13 2011 +0100 +++ b/pacemaker.specTue Feb 15 17:50:27 2011 +0100 @@ -1,3 +1,7 @@ +%if 0%{?suse_version} +%define _libexecdir %{_libdir} +%endif + %global gname haclient %global uname hacluster %global pcmk_docdir %{_docdir}/%{name} ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
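A quick way to see where the plugin ended up on an installed system, assuming the plugin file is packaged in the main pacemaker package (it may live in a sub-package depending on the spec):

  rpm -ql pacemaker | grep lcrso
  # expected on 64-bit SUSE: /usr/lib64/lcrso/pacemaker.lcrso

  # what %_libdir expands to on the build host
  rpm --eval '%{_libdir}'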
Re: [Pacemaker] Howto write a STONITH agent
On Fri, 2011-01-14 at 17:10 +0100, Christoph Herrmann wrote: -Ursprüngliche Nachricht- Von: Dejan Muhamedagic deja...@fastmail.fm Gesendet: Fr 14.01.2011 12:31 An: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org; Betreff: Re: [Pacemaker] Howto write a STONITH agent Hi, On Thu, Jan 13, 2011 at 09:09:38PM +0100, Christoph Herrmann wrote: Hi, I have some brand new HP Blades with ILO Boards (iLO 2 Standard Blade Edition 1.81 ...) But I'm not able to connect with them via the external/riloe agent. When i try: stonith -t external/riloe -p hostlist=node1 ilo_hostname=ilo1 ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 ilo_powerdown_method=power -S Try this: stonith -t external/riloe hostlist=node1 ilo_hostname=ilo1 ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 ilo_powerdown_method=power -S thats much better (looks like PEBKAC ;-), thanks! But it is not reliable. I've tested it about 10 times and 5 times it hangs. That's not what I want. I had the same experience. Ilo is _extremely_ slow and unreliable. Go for external/ipmi. That works very fast and reliable. It is available with ILO 2.x firmware. - holger ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
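A rough equivalent of the riloe test above for external/ipmi. The exact parameter names depend on the cluster-glue release, so listing them first is safer; the values shown (address, user, password, interface) are placeholders:

  # show which parameters the plugin expects
  stonith -t external/ipmi -n

  # then a status test along these lines
  stonith -t external/ipmi hostname=node1 ipaddr=10.0.0.11 \
          userid=ipmiuser passwd=ipmipass interface=lanplus -S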
Re: [Pacemaker] Split-site cluster in two locations
On Tue, 2011-01-11 at 10:21 +0100, Christoph Herrmann wrote: -Ursprüngliche Nachricht- Von: Andrew Beekhof and...@beekhof.net Gesendet: Di 11.01.2011 09:01 An: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org; CC: Michael Schwartzkopff mi...@clusterbau.com; Betreff: Re: [Pacemaker] Split-site cluster in two locations On Tue, Dec 28, 2010 at 10:21 PM, Anton Altaparmakov ai...@cam.ac.uk wrote: Hi, On 28 Dec 2010, at 20:32, Michael Schwartzkopff wrote: Hi, I have four nodes in a split site scenario located in two computing centers. STONITH is enabled. Is there and best practise how to deal with this setup? Does it make sense to set expected-quorum-votes to 3 to make the whole setup still running with one data center online? Is this possible at all? Is quorum needed with STONITH enabled? Is there a quorum server available already? I couldn't see a quorum server in Pacemaker so I have installed a third dummy node which is not allowed to run any resources (using location constraints and setting the cluster to not be symmetric) which just acts as a third vote. I am hoping this effectively acts as a quorum server as a node that looses connectivity will lose quorum and shut down its services whilst the other real node will retain connectivity and thus quorum due to the dummy node still being present. Obviously this is quite wasteful of servers as you can only run a single Pacemaker instance on a server (as far as I know) so that is a lot of dummy servers when you run multiple pacemaker clusters... Solution for us is to use virtualization - one physical server with VMs and each VM is a dummy node for a cluster... With recent 1.1.x builds it should be possible to run just the corosync piece (no pacemaker). As long as you have only two computing centers it doesn't matter if you run a corosync only piece or whatever on a physikal or a virtual machine. The question is: How to configure a four node (or six node, an even number bigger then two) corosync/pacemaker cluster to continue services if you have a blackout in one computing center (you will always loose (at least) one half of your nodes), but to shutdown everything if you have less then half of the node available. Are there any best practices on how to deal with clusters in two computing centers? Anything like an external quorum node or a quorum partition? I'd like to set the expected-quorum-votes to 3 but this is not possible (with corosync-1.2.6 and pacemaker-1.1.2 on SLES11 SP1) Does anybody know why? Currently, the only way I can figure out is to run the cluster with no-quorum-policy=ignore. But I don't like that. Any suggestions? Best regards Christoph Hi, I assume the only solution is to work with manual intervention, i.e. the stonith meatware module. Whenever a site goes down a human being has to confirm that it is lost, pull the power cords or the inter-site links so it will not come back unintentionally. Then confirm with meatclient on the healthy site that the no longer reachable site can be considered gone. From theory this can be configured with an additional meatware stonith resource with lower priority. The intention is to let your regular stonith resources do the work with meatware as last resort. Although I was not able to get this running with versions packaged with SLES11 SP1. The priority was not honored and a lot of zombie meatware processes were left over. I found some patches in the upstream repositories that seem to address these problems but I didn't follow up. 
Regards Holger ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
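The configuration shape described above, a meatware device as last resort next to the regular fencing devices, would look roughly like this in crm syntax. As noted in the message, whether the priority ordering is actually honoured depends on the pacemaker version, so treat this as a sketch of the idea rather than a verified recipe:

  primitive st-meat stonith:meatware \
          params hostlist="nodea nodeb" \
          meta priority="10"

  # once an operator has confirmed that a site is really gone,
  # acknowledge it on the surviving side:
  meatclient -c nodea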
Re: [Pacemaker] Fail-count and failure timeout
The resource failed when the sleep expired, i.e. each 600 secs. Now I changed the resource to sleep 7200, failure-timeout 3600 i.e. to values far beyond the recheck-interval opf 15m. Now everything behaves as expected. Mit freundlichen Grüßen / Kind regards Holger Teutsch From: Andrew Beekhof and...@beekhof.net To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org Date: 05.10.2010 11:09 Subject:Re: [Pacemaker] Fail-count and failure timeout On Tue, Oct 5, 2010 at 11:07 AM, Andrew Beekhof and...@beekhof.net wrote: On Fri, Oct 1, 2010 at 3:40 PM, holger.teut...@fresenius-netcare.com wrote: Hi, I observed the following in pacemaker Versions 1.1.3 and tip up to patch 10258. In a small test environment to study fail-count behavior I have one resource anything doing sleep 600 with monitoring interval 10 secs. The failure-timeout is 300. I would expect to never see a failcount higher than 1. Why? The fail-count is only reset when the PE runs... which is on a failure and/or after the cluster-recheck-interval So I'd expect a maximum of two. Actually this is wrong. There is no maximum, because there needs to have been 300s since the last failure when the PE runs. And since it only runs when the resource fails, it is never reset. cluster-recheck-interval = time [15min] Polling interval for time based changes to options, resource parameters and constraints. The Cluster is primarily event driven, however the configuration can have elements that change based on time. To ensure these changes take effect, we can optionally poll the cluster’s status for changes. Allowed values: Zero disables polling. Positive values are an interval in seconds (unless other SI units are specified. eg. 5min) I observed some sporadic clears but mostly the count is increasing by 1 each 10 minutes. Am I mistaken or is this a bug ? Hard to say without logs. What value did it reach? 
Regards Holger -- complete cib for reference --- cib epoch=32 num_updates=0 admin_epoch=0 validate-with=pacemaker-1.2 crm_feature_set=3.0.4 have-quorum=0 cib-last-written=Fri Oct 1 14:17:31 2010 dc-uuid=hotlx configuration crm_config cluster_property_set id=cib-bootstrap-options nvpair id=cib-bootstrap-options-dc-version name=dc-version value=1.1.3-09640bd6069e677d5eed65203a6056d9bf562e67/ nvpair id=cib-bootstrap-options-cluster-infrastructure name=cluster-infrastructure value=openais/ nvpair id=cib-bootstrap-options-expected-quorum-votes name=expected-quorum-votes value=2/ nvpair id=cib-bootstrap-options-no-quorum-policy name=no-quorum-policy value=ignore/ nvpair id=cib-bootstrap-options-stonith-enabled name=stonith-enabled value=false/ nvpair id=cib-bootstrap-options-start-failure-is-fatal name=start-failure-is-fatal value=false/ nvpair id=cib-bootstrap-options-last-lrm-refresh name=last-lrm-refresh value=1285926879/ /cluster_property_set /crm_config nodes node id=hotlx uname=hotlx type=normal/ /nodes resources primitive class=ocf id=test provider=heartbeat type=anything meta_attributes id=test-meta_attributes nvpair id=test-meta_attributes-target-role name=target-role value=started/ nvpair id=test-meta_attributes-failure-timeout name=failure-timeout value=300/ /meta_attributes operations id=test-operations op id=test-op-monitor-10 interval=10 name=monitor on-fail=restart timeout=20s/ op id=test-op-start-0 interval=0 name=start on-fail=restart timeout=20s/ /operations instance_attributes id=test-instance_attributes nvpair id=test-instance_attributes-binfile name=binfile value=sleep 600/ /instance_attributes /primitive /resources constraints/ /configuration /cib ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http
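The combination that made the expiry work in the end, a failure-timeout well above the recheck interval, in crm syntax. The resource mirrors the test primitive from the CIB above and the node name hotlx likewise; the commands for inspecting and clearing the counter are included for completeness:

  crm configure property cluster-recheck-interval="5min"
  crm configure primitive test ocf:heartbeat:anything \
          params binfile="sleep 7200" \
          meta failure-timeout="3600" \
          op monitor interval="10" timeout="20s" on-fail="restart"

  # inspect or reset the fail count by hand
  crm resource failcount test show hotlx
  crm resource cleanup test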
Re: [Pacemaker] Dependency on either of two resources
Hi, a similar or related use case that we tried to solve without success: - a stretch cluster with two disk boxes - a LUN on each disk box guarded by an individual SFEX - a mirror (raid1 or clvm) that survives an outage of one disk box - the mirror should be started if at least one SFEX can be obtained and the other one could not be obtained on a different node IMHO sdb is not an alternative as this introduces a SPOF. Mit freundlichen Grüßen / Kind regards Holger Teutsch From: Vladislav Bogdanov bub...@hoster-ok.com To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org Date: 04.10.2010 06:33 Subject:[Pacemaker] Dependency on either of two resources Hi all, just wondering, is there a way to make resource depend on (be colocated with) either of two other resources? Use case is iSCSI initiator connection to iSCSI target with two portals. Idea is to have f.e. device manager multipath resource depend on both iSCSI connection resources, but in a soft way, so fail of any single iSCSI connection will not cause multipath resource to stop, but fail of both connections will cause it. I should be missing something but I cannot find answer is it possible with current pacemaker. Can someone bring some light? Best, Vladislav ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
[Pacemaker] Fail-count and failure timeout
Hi, I observed the following in pacemaker Versions 1.1.3 and tip up to patch 10258. In a small test environment to study fail-count behavior I have one resource anything doing sleep 600 with monitoring interval 10 secs. The failure-timeout is 300. I would expect to never see a failcount higher than 1. I observed some sporadic clears but mostly the count is increasing by 1 each 10 minutes. Am I mistaken or is this a bug ? Regards Holger -- complete cib for reference --- cib epoch=32 num_updates=0 admin_epoch=0 validate-with=pacemaker-1.2 crm_feature_set=3.0.4 have-quorum=0 cib-last-written=Fri Oct 1 14:17:31 2010 dc-uuid=hotlx configuration crm_config cluster_property_set id=cib-bootstrap-options nvpair id=cib-bootstrap-options-dc-version name=dc-version value=1.1.3-09640bd6069e677d5eed65203a6056d9bf562e67/ nvpair id=cib-bootstrap-options-cluster-infrastructure name=cluster-infrastructure value=openais/ nvpair id=cib-bootstrap-options-expected-quorum-votes name=expected-quorum-votes value=2/ nvpair id=cib-bootstrap-options-no-quorum-policy name=no-quorum-policy value=ignore/ nvpair id=cib-bootstrap-options-stonith-enabled name=stonith-enabled value=false/ nvpair id=cib-bootstrap-options-start-failure-is-fatal name=start-failure-is-fatal value=false/ nvpair id=cib-bootstrap-options-last-lrm-refresh name=last-lrm-refresh value=1285926879/ /cluster_property_set /crm_config nodes node id=hotlx uname=hotlx type=normal/ /nodes resources primitive class=ocf id=test provider=heartbeat type=anything meta_attributes id=test-meta_attributes nvpair id=test-meta_attributes-target-role name=target-role value=started/ nvpair id=test-meta_attributes-failure-timeout name=failure-timeout value=300/ /meta_attributes operations id=test-operations op id=test-op-monitor-10 interval=10 name=monitor on-fail=restart timeout=20s/ op id=test-op-start-0 interval=0 name=start on-fail=restart timeout=20s/ /operations instance_attributes id=test-instance_attributes nvpair id=test-instance_attributes-binfile name=binfile value=sleep 600/ /instance_attributes /primitive /resources constraints/ /configuration /cib ___ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker