Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-05-10 Thread Holger Teutsch
On Tue, 2011-05-10 at 08:24 +0200, Andrew Beekhof wrote:
 On Mon, May 9, 2011 at 8:44 PM, Holger Teutsch holger.teut...@web.de wrote:
  On Wed, 2011-04-27 at 13:25 +0200, Andrew Beekhof wrote:
  On Sun, Apr 24, 2011 at 4:31 PM, Holger Teutsch holger.teut...@web.de 
  wrote:
  ...
   Remaining diffs seem to be not related to my changes.
 
  Unlikely I'm afraid.  We run the regression tests after every commit
  and complain loudly if they fail.
  What is the regression test output?
 
  That's the output of tools/regression.sh of pacemaker-devel *without* my
  patches:
  Version: parent: 10731:bf7b957f4cbe tip
 
  see attachment
 
 There seems to be something not quite right with your environment.
 Had you built the tools directory before running the test?
Yes, + install

 In a clean chroot it passes on both openSUSE and Fedora:
 
 http://build.clusterlabs.org:8010/builders/opensuse-11.3-i386-devel/builds/48/steps/cli_test/logs/stdio
 and
 
 http://build.clusterlabs.org:8010/builders/fedora-13-x86_64-devel/builds/48/steps/cli_test/logs/stdio
 
 What distro are you on?
 
openSUSE 11.4

 Could you try running it as:
 /full/path/to/pacemaker/sources/tools/regression.sh
 
 The PATH magic that allows the tests to be run from the source
 directory may not be fully functional.
 
Did not help, will do further investigation.
-holger





Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-05-09 Thread Holger Teutsch
On Fri, 2011-05-06 at 16:15 +0200, Andrew Beekhof wrote:
 On Fri, May 6, 2011 at 12:28 PM, Holger Teutsch holger.teut...@web.de wrote:
  On Fri, 2011-05-06 at 11:03 +0200, Andrew Beekhof wrote:
  On Fri, May 6, 2011 at 9:53 AM, Andrew Beekhof and...@beekhof.net wrote:
   On Thu, May 5, 2011 at 5:43 PM, Holger Teutsch holger.teut...@web.de 
   wrote:
   On Fri, 2011-04-29 at 09:41 +0200, Andrew Beekhof wrote:
   
Unfortunately the devel code does not run at all in my environment 
so I
have to fix this first.
  
   Oh?  I ran CTS on it the other day and it was fine here.
  
  
   I installed pacemaker-devel on top of a compilation of pacemaker-1.1. In
   addition I tried make uninstall for both versions and then again
   make install for devel. Pacemaker does not come up, crm_mon shows
   nodes as offline.
  
   I suspect reason is
   May  5 17:09:34 devel1 crmd: [5942]: notice: crmd_peer_update: Status 
   update: Client devel1/crmd now has status [online] (DC=null)
   May  5 17:09:34 devel1 crmd: [5942]: info: crm_update_peer: Node 
   devel1: id=1790093504 state=unknown addr=(null) votes=0 born=0 seen=0 
   proc=00111312 (new)
   May  5 17:09:34 devel1 crmd: [5942]: info: pcmk_quorum_notification: 
   Membership 0: quorum retained (0)
   May  5 17:09:34 devel1 crmd: [5942]: debug: do_fsa_action: 
   actions:trace: #011// A_STARTED
   May  5 17:09:34 devel1 crmd: [5942]: info: do_started: Delaying start, 
   no membership data (0010)
 
   ^
   May  5 17:09:34 devel1 crmd: [5942]: debug: register_fsa_input_adv: 
   Stalling the FSA pending further input: cause=C_FSA_INTERNAL
  
   Any ideas ?
  
   Hg version?  Corosync config?
   I'm running -devel here right now and things are fine.
 
  Uh, I think I see now.
  Try http://hg.clusterlabs.org/pacemaker/1.1/rev/b94ce5673ce4
 
 
 Yeah, I realized afterwards that it was specific to devel.
 What does your corosync config look like?
I run corosync-1.3.0-3.1.x86_64.
It's exactly the same config that worked with
pacemaker 1.1 rev 10608:b4f456380f60



# Please read the corosync.conf.5 manual page
compatibility: whitetank

aisexec {
# Run as root - this is necessary to be able to manage
# resources with Pacemaker
user:   root
group:  root
}

service {
# Load the Pacemaker Cluster Resource Manager
ver:1
name:   pacemaker
use_mgmtd:  yes
use_logd:   yes
}

totem {
# The only valid version is 2
version:2

# How long before declaring a token lost (ms)
token:  5000

# How many token retransmits before forming a new configuration
token_retransmits_before_loss_const: 10

# How long to wait for join messages in the membership protocol (ms)
join:   60

# How long to wait for consensus to be achieved before starting
# a new round of membership configuration (ms)
consensus:  6000

# Turn off the virtual synchrony filter
vsftype:none

# Number of messages that may be sent by one processor on
# receipt of the token
max_messages:   20

# Limit generated nodeids to 31-bits (positive signed integers)
clear_node_high_bit: yes

# Disable encryption
secauth:off

# How many threads to use for encryption/decryption
threads:0

# Optionally assign a fixed node id (integer)
# nodeid:   1234

rrp_mode:   active

interface {
ringnumber: 0

# The following values need to be set based on your environment
bindnetaddr:192.168.178.0
mcastaddr:  226.94.40.1
mcastport:  5409
}

interface {
ringnumber: 1

# The following values need to be set based on your environment
bindnetaddr:10.1.1.0
mcastaddr:  226.94.41.1
mcastport:  5411
}
}

logging {
fileline:   off
to_stderr:  no
to_logfile: no
to_syslog:  yes
syslog_facility: daemon
debug:  on
timestamp:  off
}

amf {
mode: disabled
}



Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-05-09 Thread Holger Teutsch
I had 1.1 but Dejan asked me to rebase my patches on devel.

So long story short: devel now works after upgrading to the rev you
mentioned and I got back to working on my patches.

Thanx
Holger

On Mon, 2011-05-09 at 10:58 +0200, Andrew Beekhof wrote:
 I thought you said you were running 1.1?
 
 May  5 17:09:33 devel1 pacemakerd: [5929]: info: read_config: Reading
 configure for stack: corosync
 
 This message is specific to the devel branch.
 
 Update to get the following fix and you should be fine:
 http://hg.clusterlabs.org/pacemaker/devel/rev/84ef5401322f
 





Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-05-09 Thread Holger Teutsch
On Wed, 2011-04-27 at 13:25 +0200, Andrew Beekhof wrote:
 On Sun, Apr 24, 2011 at 4:31 PM, Holger Teutsch holger.teut...@web.de wrote:
...
  Remaining diffs seem to be not related to my changes.
 
 Unlikely I'm afraid.  We run the regression tests after every commit
 and complain loudly if they fail.
 What is the regression test output?

That's the output of tools/regression.sh of pacemaker-devel *without* my
patches:
Version: parent: 10731:bf7b957f4cbe tip

see attachment
-holger


Using local binaries from: .
* Passed: cibadmin   - Require --force for CIB erasure
* Passed: cibadmin   - Allow CIB erasure with --force
* Passed: cibadmin   - Query CIB
* Passed: crm_attribute  - Set cluster option
* Passed: cibadmin   - Query new cluster option
* Passed: cibadmin   - Query cluster options
* Passed: cibadmin   - Delete nvpair
* Passed: cibadmin   - Create operaton should fail with: -21, The object 
already exists
* Passed: cibadmin   - Modify cluster options section
* Passed: cibadmin   - Query updated cluster option
* Passed: crm_attribute  - Set duplicate cluster option
* Passed: crm_attribute  - Setting multiply defined cluster option should fail 
with -216, Could not set cluster option
* Passed: crm_attribute  - Set cluster option with -s
* Passed: crm_attribute  - Delete cluster option with -i
* Passed: cibadmin   - Create node entry
* Passed: cibadmin   - Create node status entry
* Passed: crm_attribute  - Create node attribute
* Passed: cibadmin   - Query new node attribute
* Passed: cibadmin   - Digest calculation
* Passed: cibadmin   - Replace operation should fail with: -45, Update was 
older than existing configuration
* Passed: crm_standby- Default standby value
* Passed: crm_standby- Set standby status
* Passed: crm_standby- Query standby value
* Passed: crm_standby- Delete standby value
* Passed: cibadmin   - Create a resource
* Passed: crm_resource   - Create a resource meta attribute
* Passed: crm_resource   - Query a resource meta attribute
* Passed: crm_resource   - Remove a resource meta attribute
* Passed: crm_resource   - Create a resource attribute
* Passed: crm_resource   - List the configured resources
* Passed: crm_resource   - Set a resource's fail-count
* Passed: crm_resource   - Require a destination when migrating a resource that 
is stopped
* Passed: crm_resource   - Don't support migration to non-existant locations
* Passed: crm_resource   - Migrate a resource
* Passed: crm_resource   - Un-migrate a resource
--- ./regression.exp2011-05-09 20:26:27.669381187 +0200
+++ ./regression.out2011-05-09 20:38:27.112098949 +0200
@@ -616,7 +616,7 @@
   /status
 /cib
 * Passed: crm_resource   - List the configured resources
-cib epoch=16 num_updates=2 admin_epoch=0 validate-with=pacemaker-1.2 
+cib epoch=16 num_updates=1 admin_epoch=0 validate-with=pacemaker-1.2 
   configuration
 crm_config
   cluster_property_set id=cib-bootstrap-options/
@@ -642,19 +642,13 @@
 constraints/
   /configuration
   status
-node_state id=clusterNode-UUID uname=clusterNode-UNAME
-  transient_attributes id=clusterNode-UUID
-instance_attributes id=status-clusterNode-UUID
-  nvpair id=status-clusterNode-UUID-fail-count-dummy 
name=fail-count-dummy value=10/
-/instance_attributes
-  /transient_attributes
-/node_state
+node_state id=clusterNode-UUID uname=clusterNode-UNAME/
   /status
 /cib
 * Passed: crm_resource   - Set a resource's fail-count
 Resource dummy not moved: not-active and no preferred location specified.
 Error performing operation: cib object missing
-cib epoch=16 num_updates=2 admin_epoch=0 validate-with=pacemaker-1.2 
+cib epoch=16 num_updates=1 admin_epoch=0 validate-with=pacemaker-1.2 
   configuration
 crm_config
   cluster_property_set id=cib-bootstrap-options/
@@ -680,19 +674,13 @@
 constraints/
   /configuration
   status
-node_state id=clusterNode-UUID uname=clusterNode-UNAME
-  transient_attributes id=clusterNode-UUID
-instance_attributes id=status-clusterNode-UUID
-  nvpair id=status-clusterNode-UUID-fail-count-dummy 
name=fail-count-dummy value=10/
-/instance_attributes
-  /transient_attributes
-/node_state
+node_state id=clusterNode-UUID uname=clusterNode-UNAME/
   /status
 /cib
 * Passed: crm_resource   - Require a destination when migrating a resource 
that is stopped
 Error performing operation: i.dont.exist is not a known node
 Error performing operation: The object/attribute does not exist
-cib epoch=16 num_updates=2 admin_epoch=0 validate-with=pacemaker-1.2 
+cib epoch=16 num_updates=1 admin_epoch=0 validate-with=pacemaker-1.2 
   configuration
 crm_config
   cluster_property_set id=cib-bootstrap-options/
@@ -718,13 +706,7 @@
 constraints/
   /configuration
   status
-node_state id=clusterNode-UUID uname=clusterNode-UNAME
-  transient_attributes id=clusterNode-UUID

Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-05-06 Thread Holger Teutsch
On Fri, 2011-05-06 at 11:03 +0200, Andrew Beekhof wrote:
 On Fri, May 6, 2011 at 9:53 AM, Andrew Beekhof and...@beekhof.net wrote:
  On Thu, May 5, 2011 at 5:43 PM, Holger Teutsch holger.teut...@web.de 
  wrote:
  On Fri, 2011-04-29 at 09:41 +0200, Andrew Beekhof wrote:
  
   Unfortunately the devel code does not run at all in my environment so I
   have to fix this first.
 
  Oh?  I ran CTS on it the other day and it was fine here.
 
 
  I installed pacemaker-devel on top of a compilation of pacemaker-1.1. In
  addition I tried make uninstall for both versions and then again
  make install for devel. Pacemaker does not come up, crm_mon shows
  nodes as offline.
 
  I suspect reason is
  May  5 17:09:34 devel1 crmd: [5942]: notice: crmd_peer_update: Status 
  update: Client devel1/crmd now has status [online] (DC=null)
  May  5 17:09:34 devel1 crmd: [5942]: info: crm_update_peer: Node devel1: 
  id=1790093504 state=unknown addr=(null) votes=0 born=0 seen=0 
  proc=00111312 (new)
  May  5 17:09:34 devel1 crmd: [5942]: info: pcmk_quorum_notification: 
  Membership 0: quorum retained (0)
  May  5 17:09:34 devel1 crmd: [5942]: debug: do_fsa_action: actions:trace: 
  #011// A_STARTED
  May  5 17:09:34 devel1 crmd: [5942]: info: do_started: Delaying start, no 
  membership data (0010)

  ^
  May  5 17:09:34 devel1 crmd: [5942]: debug: register_fsa_input_adv: 
  Stalling the FSA pending further input: cause=C_FSA_INTERNAL
 
  Any ideas ?
 
  Hg version?  Corosync config?
  I'm running -devel here right now and things are fine.
 
 Uh, I think I see now.
 Try http://hg.clusterlabs.org/pacemaker/1.1/rev/b94ce5673ce4
 

Page not found.





Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-05-05 Thread Holger Teutsch
On Fri, 2011-04-29 at 09:41 +0200, Andrew Beekhof wrote:
 
  Unfortunately the devel code does not run at all in my environment so I
  have to fix this first.
 
 Oh?  I ran CTS on it the other day and it was fine here.
 

I installed pacemaker-devel on top of a compilation of pacemaker-1.1. In
addition I tried make uninstall for both versions and then again
make install for devel. Pacemaker does not come up, crm_mon shows
nodes as offline.

I suspect reason is
May  5 17:09:34 devel1 crmd: [5942]: notice: crmd_peer_update: Status update: 
Client devel1/crmd now has status [online] (DC=null)
May  5 17:09:34 devel1 crmd: [5942]: info: crm_update_peer: Node devel1: 
id=1790093504 state=unknown addr=(null) votes=0 born=0 seen=0 
proc=00111312 (new)
May  5 17:09:34 devel1 crmd: [5942]: info: pcmk_quorum_notification: Membership 
0: quorum retained (0)
May  5 17:09:34 devel1 crmd: [5942]: debug: do_fsa_action: actions:trace: 
#011// A_STARTED
May  5 17:09:34 devel1 crmd: [5942]: info: do_started: Delaying start, no 
membership data (0010)
   
^
May  5 17:09:34 devel1 crmd: [5942]: debug: register_fsa_input_adv: Stalling 
the FSA pending further input: cause=C_FSA_INTERNAL

Any ideas ?
-holger




May  5 17:09:33 devel1 pacemakerd: [5929]: info: Invoked: pacemakerd 
May  5 17:09:33 devel1 pacemakerd: [5929]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
May  5 17:09:33 devel1 pacemakerd: [5929]: info: get_cluster_type: Cluster type is: 'corosync'
May  5 17:09:33 devel1 pacemakerd: [5929]: info: read_config: Reading configure for stack: corosync
May  5 17:09:33 devel1 corosync[2101]:  [CONFDB] lib_init_fn: conn=0x6872f0
May  5 17:09:33 devel1 pacemakerd: [5929]: info: config_find_next: Processing additional logging options...
May  5 17:09:33 devel1 pacemakerd: [5929]: info: get_config_opt: Found 'on' for option: debug
May  5 17:09:33 devel1 pacemakerd: [5929]: info: get_config_opt: Found 'no' for option: to_logfile
May  5 17:09:33 devel1 pacemakerd: [5929]: info: get_config_opt: Found 'yes' for option: to_syslog
May  5 17:09:33 devel1 pacemakerd: [5929]: info: get_config_opt: Found 'daemon' for option: syslog_facility
May  5 17:09:33 devel1 corosync[2101]:  [CONFDB] exit_fn for conn=0x6872f0
May  5 17:09:33 devel1 pacemakerd: [5931]: info: crm_log_init_worker: Changed active directory to /var/lib/heartbeat/cores/root
May  5 17:09:33 devel1 pacemakerd: [5931]: info: main: Starting Pacemaker 1.1.5 (Build: unknown):  ncurses corosync-quorum corosync
May  5 17:09:33 devel1 pacemakerd: [5931]: info: main: Maximum core file size is: 18446744073709551615
May  5 17:09:33 devel1 pacemakerd: [5931]: debug: cluster_connect_cfg: Our nodeid: 1790093504
May  5 17:09:33 devel1 pacemakerd: [5931]: debug: cluster_connect_cfg: Adding fd=5 to mainloop
May  5 17:09:33 devel1 corosync[2101]:  [CPG   ] lib_init_fn: conn=0x68bfc0, cpd=0x6903f0
May  5 17:09:33 devel1 pacemakerd: [5931]: debug: cluster_connect_cpg: Our nodeid: 1790093504
May  5 17:09:33 devel1 pacemakerd: [5931]: debug: cluster_connect_cpg: Adding fd=6 to mainloop
May  5 17:09:33 devel1 pacemakerd: [5931]: info: update_node_processes: 0x60d920 Node 1790093504 now known as devel1 (was: (null))
May  5 17:09:33 devel1 pacemakerd: [5931]: info: update_node_processes: Node devel1 now has process list: 0002 (was )
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] mcasted message added to pending queue
May  5 17:09:33 devel1 corosync[2101]:  [CPG   ] got mcast request on 0x68bfc0
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Delivering 24 to 25
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Delivering MCAST message with seq 25 to pending delivery queue
May  5 17:09:33 devel1 corosync[2101]:  [CPG   ] got procjoin message from cluster node 1790093504
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Received ringid(192.168.178.106:332) seq 25
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Received ringid(192.168.178.106:332) seq 25
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] mcasted message added to pending queue
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Delivering 25 to 26
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Delivering MCAST message with seq 26 to pending delivery queue
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Received ringid(192.168.178.106:332) seq 26
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] Received ringid(192.168.178.106:332) seq 26
May  5 17:09:33 devel1 corosync[2101]:  [TOTEM ] releasing messages up to and including 25
May  5 17:09:33 devel1 pacemakerd: [5931]: info: start_child: Forked child 5935 for process stonith-ng
May  5 17:09:33 devel1 pacemakerd: [5931]: info: update_node_processes: Node devel1 now has process list: 0012 (was 

Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-04-28 Thread Holger Teutsch
Hi Dejan,

On Tue, 2011-04-26 at 17:58 +0200, Dejan Muhamedagic wrote:
 Hi Holger,
 
 On Sun, Apr 24, 2011 at 04:31:33PM +0200, Holger Teutsch wrote:
  On Mon, 2011-04-11 at 20:50 +0200, Andrew Beekhof wrote:
   why?
    CMD_ERR("Resource %s not moved: specifying --master is not supported for --move-from\n", rsc_id);
   
  it did not look sensible to me but I can't recall the exact reasons 8-)
  It's now implemented.
   also the legacy handling is a little off - do a make install and run
   tools/regression.sh and you'll see what i mean.
  
  Remaining diffs seem to be not related to my changes.
  
   other than that the crm_resource part looks pretty good.
   can you add some regression testcases in tools/ too please?
   
  Will add them once the code is in the repo.
  
  Latest diffs are attached.
 
 The diffs seem to be against the 1.1 code, but this should go
 into the devel repository. Can you please rebase the patches
 against the devel code.
 
Unfortunately the devel code does not run at all in my environment so I
have to fix this first.

- holger
 Cheers,
 
 Dejan
 
  -holger
  
 





Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-04-24 Thread Holger Teutsch
On Mon, 2011-04-11 at 20:50 +0200, Andrew Beekhof wrote:
 why?
 CMD_ERR("Resource %s not moved: specifying --master is not supported for --move-from\n", rsc_id);
 
it did not look sensible to me but I can't recall the exact reasons 8-)
It's now implemented.
 also the legacy handling is a little off - do a make install and run
 tools/regression.sh and you'll see what i mean.

Remaining diffs seem to be not related to my changes.

 other than that the crm_resource part looks pretty good.
 can you add some regression testcases in tools/ too please?
 
Will add them once the code is in the repo.

Latest diffs are attached.

-holger

diff -r b4f456380f60 shell/modules/ui.py.in
--- a/shell/modules/ui.py.in	Thu Mar 17 09:41:25 2011 +0100
+++ b/shell/modules/ui.py.in	Sun Apr 24 16:18:59 2011 +0200
@@ -738,8 +738,9 @@
     rsc_status = "crm_resource -W -r '%s'"
     rsc_showxml = "crm_resource -q -r '%s'"
     rsc_setrole = "crm_resource --meta -r '%s' -p target-role -v '%s'"
-    rsc_migrate = "crm_resource -M -r '%s' %s"
-    rsc_unmigrate = "crm_resource -U -r '%s'"
+    rsc_move_to = "crm_resource --move-to -r '%s' %s"
+    rsc_move_from = "crm_resource --move-from -r '%s' %s"
+    rsc_move_cleanup = "crm_resource --move-cleanup -r '%s'"
     rsc_cleanup = "crm_resource -C -r '%s' -H '%s'"
     rsc_cleanup_all = "crm_resource -C -r '%s'"
     rsc_param =  {
@@ -776,8 +777,12 @@
         self.cmd_table["demote"] = (self.demote,(1,1),0)
         self.cmd_table["manage"] = (self.manage,(1,1),0)
         self.cmd_table["unmanage"] = (self.unmanage,(1,1),0)
+        # the next two commands are deprecated
         self.cmd_table["migrate"] = (self.migrate,(1,4),0)
         self.cmd_table["unmigrate"] = (self.unmigrate,(1,1),0)
+        self.cmd_table["move-to"] = (self.move_to,(2,4),0)
+        self.cmd_table["move-from"] = (self.move_from,(1,4),0)
+        self.cmd_table["move-cleanup"] = (self.move_cleanup,(1,1),0)
         self.cmd_table["param"] = (self.param,(3,4),1)
         self.cmd_table["meta"] = (self.meta,(3,4),1)
         self.cmd_table["utilization"] = (self.utilization,(3,4),1)
@@ -846,9 +851,67 @@
         if not is_name_sane(rsc):
             return False
         return set_deep_meta_attr("is-managed","false",rsc)
+    def move_to(self,cmd,*args):
+        "usage: move-to <rsc>[:master] <node> [<lifetime>] [force]"
+        elem = args[0].split(':')
+        rsc = elem[0]
+        master = False
+        if len(elem) > 1:
+            master = elem[1]
+            if master != "master":
+                common_error("%s is invalid, specify 'master'" % master)
+                return False
+            master = True
+        if not is_name_sane(rsc):
+            return False
+        node = args[1]
+        lifetime = None
+        force = False
+        if len(args) == 3:
+            if args[2] == "force":
+                force = True
+            else:
+                lifetime = args[2]
+        elif len(args) == 4:
+            if args[2] == "force":
+                force = True
+                lifetime = args[3]
+            elif args[3] == "force":
+                force = True
+                lifetime = args[2]
+            else:
+                syntax_err((cmd,"force"))
+                return False
+
+        opts = ''
+        if node:
+            opts = "--node='%s'" % node
+        if lifetime:
+            opts = "%s --lifetime='%s'" % (opts,lifetime)
+        if force or user_prefs.get_force():
+            opts = "%s --force" % opts
+        if master:
+            opts = "%s --master" % opts
+        return ext_cmd(self.rsc_move_to % (rsc,opts)) == 0
+
     def migrate(self,cmd,*args):
-        "usage: migrate <rsc> [<node>] [<lifetime>] [force]"
-        rsc = args[0]
+        "Deprecated: migrate <rsc> [<node>] [<lifetime>] [force]"
+        common_warning("migrate is deprecated, use move-to or move-from")
+        if len(args) >= 2 and args[1] in listnodes():
+            return self.move_to(cmd, *args)
+        return self.move_from(cmd, *args)
+
+    def move_from(self,cmd,*args):
+        "usage: move-from <rsc>[:master] [<node>] [<lifetime>] [force]"
+        elem = args[0].split(':')
+        rsc = elem[0]
+        master = False
+        if len(elem) > 1:
+            master = elem[1]
+            if master != "master":
+                common_error("%s is invalid, specify 'master'" % master)
+                return False
+            master = True
         if not is_name_sane(rsc):
             return False
         node = None
@@ -888,12 +951,18 @@
             opts = "%s --lifetime='%s'" % (opts,lifetime)
         if force or user_prefs.get_force():
             opts = "%s --force" % opts
-        return ext_cmd(self.rsc_migrate % (rsc,opts)) == 0
-    def unmigrate(self,cmd,rsc):
-        "usage: unmigrate <rsc>"
+        if master:
+            opts = "%s --master" % opts
+        return ext_cmd(self.rsc_move_from % (rsc,opts)) == 0
+    def move_cleanup(self,cmd,rsc):
+        "usage: move_cleanup <rsc>"
         if not is_name_sane(rsc):
 

Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-04-08 Thread Holger Teutsch
On Thu, 2011-04-07 at 12:33 +0200, Dejan Muhamedagic wrote:
  New syntax:
  ---
  
  crm_resource --move-from --resource myresource --node mynode
 - all resource variants: check whether active on mynode, then create 
  standby constraint
  
  crm_resource --move-from --resource myresource
 - primitive/group: set --node `current_node`, then create standby 
  constraint
 - clone/master: refused
  
  crm_resource --move-to --resource myresource --node mynode
- all resource variants: create prefer constraint
  
  crm_resource --move-to --resource myresource --master --node mynode
- master: check whether active as slave on mynode, then create prefer 
  constraint for master role
- others: refused
  
  crm_resource --move-cleanup --resource myresource
- zap constraints
  
  As we are already short on meaningful single letter options I vote for long 
  options only.
  
  Backwards Compatibility:
  
  
  crm_resource {-M|--move} --resource myresource
- output deprecation warning
- treat as crm_resource --move-from --resource myresource
  
  crm_resource {-M|--move} --resource myresource --node mynode
- output deprecation warning
- treat as crm_resource --move-to --resource myresource --node mynode
  
  crm_resource {-U|--unmove} --resource myresource
- output deprecation warning
- treat as crm_resource --move-cleanup --resource myresource
 
 All looks fine to me.
 
  For the shell:
  Should we go for similar commands or keep migrate-XXX
 
 migrate is a bit of a misnomer, could be confused with the
 migrate operation. I'd vote to leave old migrate/unmigrate
 as deprecated and introduce just move-from/to/cleanup variants.
 

Dejan & Andrew,
please find attached the patches that implement these commands for
review. They require the patch
 
Low: lib/common/utils.c: Don't try to print unprintable option values in 
crm_help

that I sent separately because it is not directly related to the
movement stuff.

I think the preceding discussions were very valuable for fully
understanding the issues and implications, and I'm confident that the new
command set is consistent and behaves predictably.

Regards
Holger


diff -r b4f456380f60 doc/crm_cli.txt
--- a/doc/crm_cli.txt	Thu Mar 17 09:41:25 2011 +0100
+++ b/doc/crm_cli.txt	Fri Apr 08 14:23:59 2011 +0200
@@ -810,28 +810,44 @@
 unmanage rsc
 ...
 
-[[cmdhelp_resource_migrate,migrate a resource to another node]]
- `migrate` (`move`)
-
-Migrate a resource to a different node. If node is left out, the
-resource is migrated by creating a constraint which prevents it from
-running on the current node. Additionally, you may specify a
+[[cmdhelp_resource_move-to,move a resource to another node]]
+ `move-to`
+
+Move a resource to a different node. The resource is moved by creating
+a constraint which forces it to run on the specified node.
+Additionally, you may specify a lifetime for the constraint---once it
+expires, the location constraint will no longer be active.
+For a master resource specify rsc:master to move the master role.
+
+Usage:
+...
+move-to <rsc>[:master] <node> [<lifetime>] [force]
+...
+
+[[cmdhelp_resource_move-from,move a resource away from the specified node]]
+ `move-from`
+
+Move a resource away from the specified node. 
+If node is left out, the node where the resource is currently active
+is used.
+The resource is moved by creating a constraint which prevents it from
+running on the specified node. Additionally, you may specify a
 lifetime for the constraint---once it expires, the location
 constraint will no longer be active.
 
 Usage:
 ...
-migrate <rsc> [<node>] [<lifetime>] [force]
+move-from <rsc> [<node>] [<lifetime>] [force]
 ...
 
-[[cmdhelp_resource_unmigrate,unmigrate a resource to another node]]
- `unmigrate` (`unmove`)
-
-Remove the constraint generated by the previous migrate command.
+[[cmdhelp_resource_move-cleanup,Cleanup previously created move constraint]]
+ `move-cleanup`
+
+Remove the constraint generated by the previous move-to/move-from command.
 
 Usage:
 ...
-unmigrate <rsc>
+move-cleanup <rsc>
 ...
 
 [[cmdhelp_resource_param,manage a parameter of a resource]]
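Assuming the doc and shell patches in this thread are applied, the new commands could be driven from a script roughly like the sketch below. Illustration only: the resource and node names are examples, and the lifetime value is just a sample ISO8601 duration.

import subprocess

def crm_resource_cmd(*words):
    # Run a "crm resource ..." command and return True on success (sketch only).
    return subprocess.call(["crm", "resource"] + list(words)) == 0

crm_resource_cmd("move-to", "ms_test:master", "devel2", "PT10M")  # prefer constraint for the Master role
crm_resource_cmd("move-from", "dummy")                            # standby constraint on the current node
crm_resource_cmd("move-cleanup", "ms_test")                       # drop the generated constraints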
diff -r b4f456380f60 tools/crm_resource.c
--- a/tools/crm_resource.c	Thu Mar 17 09:41:25 2011 +0100
+++ b/tools/crm_resource.c	Fri Apr 08 15:02:39 2011 +0200
@@ -52,7 +52,8 @@
 const char *prop_id = NULL;
 const char *prop_set = NULL;
 char *move_lifetime = NULL;
-char rsc_cmd = 'L';
+int move_master = 0;
+int rsc_cmd = 'L';
 char *our_pid = NULL;
 IPC_Channel *crmd_channel = NULL;
 char *xml_file = NULL;
@@ -192,6 +193,33 @@
 return 0;
 }
 
+/* return role of resource on node */
+static int
+role_on_node(resource_t *rsc, const char *node_uname)
+{
+GListPtr lpc = NULL;
+
+if(rsc->variant > pe_native) {
+/* recursively call down */
+	

[Pacemaker] [PATCH]Low: lib/common/utils.c: Don't try to print unprintable option values in crm_help

2011-04-08 Thread Holger Teutsch
Hi,
during work on the move-XXX stuff I discovered this.
Regards
Holger

# HG changeset patch
# User Holger Teutsch holger.teut...@web.de
# Date 1302259903 -7200
# Branch mig
# Node ID caed31174dc966450a31da048b640201980870a8
# Parent  9451c288259b7b9fd6f32f5df01d47569e570c58
Low: lib/common/utils.c: Don't try to print unprintable option values in crm_help

diff -r 9451c288259b -r caed31174dc9 lib/common/utils.c
--- a/lib/common/utils.c	Tue Apr 05 13:24:21 2011 +0200
+++ b/lib/common/utils.c	Fri Apr 08 12:51:43 2011 +0200
@@ -2281,7 +2281,13 @@
 		fprintf(stream, "%s\n", crm_long_options[i].desc);
 		
 	} else {
-		fprintf(stream, " -%c, --%s%c%s\t%s\n", crm_long_options[i].val, crm_long_options[i].name,
+		/* is val printable as char ? */
+		if(crm_long_options[i].val <= UCHAR_MAX) {
+			fprintf(stream, " -%c,", crm_long_options[i].val);
+		} else {
+			fputs("    ", stream);
+		}
+		fprintf(stream, " --%s%c%s\t%s\n", crm_long_options[i].name,
 			crm_long_options[i].has_arg?'=':' ',crm_long_options[i].has_arg?"value":"",
 			crm_long_options[i].desc?crm_long_options[i].desc:"");
 	}
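
To make the intent of the change above easier to see: the patch only prints the "-x," short-option column when the option's value is still a printable single character; long-only options (such as the OPT_MASTER = 256 + 'm' introduced in the crm_resource patches) get blank padding instead. A rough Python sketch of the same guard, purely illustrative and not part of the patch:

# Illustration only: mirrors the guard in the crm_help patch above.
UCHAR_MAX = 255

def format_option(val, name, has_arg, desc):
    # Print " -x," only when the option value still fits into a single character;
    # long-only options (val > UCHAR_MAX) get blank padding instead.
    short = " -%c," % chr(val) if val <= UCHAR_MAX else "    "
    arg = "=value" if has_arg else ""
    return "%s --%s%s\t%s" % (short, name, arg, desc)

print(format_option(ord('M'), "move", False, "Move a resource"))
print(format_option(256 + ord('m'), "master", False, "Act on the Master role"))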


Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-04-07 Thread Holger Teutsch
On Thu, 2011-04-07 at 08:57 +0200, Andrew Beekhof wrote:
 On Wed, Apr 6, 2011 at 5:48 PM, Holger Teutsch holger.teut...@web.de wrote:
  On Wed, 2011-04-06 at 15:38 +0200, Dejan Muhamedagic wrote:
  On Wed, Apr 06, 2011 at 01:00:36PM +0200, Andrew Beekhof wrote:
   On Tue, Apr 5, 2011 at 12:27 PM, Dejan Muhamedagic deja...@fastmail.fm 
   wrote:
Ah, right, sorry, wanted to ask about the difference between
move-off and move. The description looks the same as for move. Is
it that in this case it is for clones so crm_resource needs an
extra node parameter? You wrote in the doc:
   
   +Migrate a resource (-instance for clones/masters) off the 
specified node.
   
The '-instance' looks somewhat funny. Why not say Move/migrate a
clone or master/slave instance away from the specified node?
   
I must say that I still find all this quite confusing, i.e. now
we have move, unmove, and move-off, but it's probably just me :)
  
   Not just you.  The problem is that we didn't fully understand all the
   use case permutations at the time.
  
   I think, notwithstanding legacy compatibility, move should probably
   be renamed to move-to and this new option be called move-from.
   That seems more obvious and syntactically consistent with the rest of
   the system.
 
  Yes, move-to and move-from seem more consistent than other
  options. The problem is that the old move is at times one and
  then at times another.
 
   In the absence of a host name, each uses the current location for the
   named group/primitive resource and complains for clones.
  
   The biggest question in my mind is what to call unmove...
   move-cleanup perhaps?
 
  move-remove? :D
  Actually, though the word is a bit awkward, unmove sounds fine
  to me.
 
  I would vote for move-cleanup. It's consistent with move-XXX, and to my
  (German) ears unmove sounds like the previous move being undone and the
  stuff coming back.
 
  BTW: Has someone already tried out the code or do you trust me 8-D ?
 
 I trust no-one - which is why we have regression tests :-)
 
 
  Stay tuned for updated patches...

Now, after an additional discussion round I propose the following:
Please note that for consistency the --node argument is optional for 
--move-from

New syntax:
---

crm_resource --move-from --resource myresource --node mynode
   - all resource variants: check whether active on mynode, then create 
standby constraint

crm_resource --move-from --resource myresource
   - primitive/group: set --node `current_node`, then create standby constraint
   - clone/master: refused

crm_resource --move-to --resource myresource --node mynode
  - all resource variants: create prefer constraint

crm_resource --move-to --resource myresource --master --node mynode
  - master: check whether active as slave on mynode, then create prefer 
constraint for master role
  - others: refused

crm_resource --move-cleanup --resource myresource
  - zap constraints

As we are already short on meaningful single letter options I vote for long 
options only.

Backwards Compatibility:


crm_resource {-M|--move} --resource myresource
  - output deprecation warning
  - treat as crm_resource --move-from --resource myresource

crm_resource {-M|--move} --resource myresource --node mynode
  - output deprecation warning
  - treat as crm_resource --move-to --resource myresource --node mynode

crm_resource {-U|--unmove} --resource myresource
  - output deprecation warning
  - treat as crm_resource --move-cleanup --resource myresource
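
For illustration only, the backwards-compatibility mapping above could be prototyped as a thin wrapper along these lines; the run_* helpers are hypothetical stand-ins for the new commands, not actual crm_resource code:

# Illustration of the proposed legacy-option mapping (all names hypothetical).
import sys

def run_move_to(rsc, node):
    print("would run: crm_resource --move-to -r %s --node %s" % (rsc, node))

def run_move_from(rsc):
    # no --node: crm_resource determines the current node itself
    print("would run: crm_resource --move-from -r %s" % rsc)

def run_move_cleanup(rsc):
    print("would run: crm_resource --move-cleanup -r %s" % rsc)

def legacy_move(rsc, node=None):
    sys.stderr.write("-M/--move is deprecated, use --move-to/--move-from\n")
    # without a node: old behaviour = push the resource off its current node
    # with a node:    old behaviour = pull the resource onto the given node
    return run_move_to(rsc, node) if node else run_move_from(rsc)

def legacy_unmove(rsc):
    sys.stderr.write("-U/--un-move is deprecated, use --move-cleanup\n")
    return run_move_cleanup(rsc)

legacy_move("myresource")              # -> --move-from --resource myresource
legacy_move("myresource", "mynode")    # -> --move-to --resource myresource --node mynode
legacy_unmove("myresource")            # -> --move-cleanup --resource myresource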

For the shell:
Should we go for similar commands or keep migrate-XXX


Coming back to Dejan's proposal of move-remove:

That can be implemented by re-executing the last move (a remove).
Reimplementing unmove as an undo of the last move gives you shortcuts for your
favorite move operation

move
move-unmove - back
move-remove - and forth

Just kidding...
 

 
  - holger
 





Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-04-06 Thread Holger Teutsch
On Wed, 2011-04-06 at 15:38 +0200, Dejan Muhamedagic wrote:
 On Wed, Apr 06, 2011 at 01:00:36PM +0200, Andrew Beekhof wrote:
  On Tue, Apr 5, 2011 at 12:27 PM, Dejan Muhamedagic deja...@fastmail.fm 
  wrote:
   Ah, right, sorry, wanted to ask about the difference between
   move-off and move. The description looks the same as for move. Is
   it that in this case it is for clones so crm_resource needs an
   extra node parameter? You wrote in the doc:
  
  +Migrate a resource (-instance for clones/masters) off the 
   specified node.
  
   The '-instance' looks somewhat funny. Why not say Move/migrate a
   clone or master/slave instance away from the specified node?
  
   I must say that I still find all this quite confusing, i.e. now
   we have move, unmove, and move-off, but it's probably just me :)
  
  Not just you.  The problem is that we didn't fully understand all the
  use case permutations at the time.
  
  I think, notwithstanding legacy compatibility, move should probably
  be renamed to move-to and this new option be called move-from.
  That seems more obvious and syntactically consistent with the rest of
  the system.
 
 Yes, move-to and move-from seem more consistent than other
 options. The problem is that the old move is at times one and
 then at times another.
 
  In the absence of a host name, each uses the current location for the
  named group/primitive resource and complains for clones.
  
  The biggest question in my mind is what to call unmove...
  move-cleanup perhaps?
 
 move-remove? :D
 Actually, though the word is a bit awkward, unmove sounds fine
 to me.

I would vote for move-cleanup. It's consistent with move-XXX, and to my
(German) ears unmove sounds like the previous move being undone and the
stuff coming back.

BTW: Has someone already tried out the code or do you trust me 8-D ?

Stay tuned for updated patches...

- holger
 
 Thanks,
 
 Dejan
 





Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-04-05 Thread Holger Teutsch
On Mon, 2011-04-04 at 21:31 +0200, Holger Teutsch wrote:
 On Mon, 2011-04-04 at 15:24 +0200, Andrew Beekhof wrote:
  On Mon, Apr 4, 2011 at 2:43 PM, Holger Teutsch holger.teut...@web.de 
  wrote:
   On Mon, 2011-04-04 at 11:05 +0200, Andrew Beekhof wrote:
   On Sat, Mar 19, 2011 at 11:55 AM, Holger Teutsch holger.teut...@web.de 
   wrote:
Hi Dejan,
   
On Fri, 2011-03-18 at 14:24 +0100, Dejan Muhamedagic wrote:
Hi,
   
On Fri, Mar 18, 2011 at 12:21:40PM +0100, Holger Teutsch wrote:
 Hi,
 I would like to submit 2 patches of an initial implementation for
 discussion.
..
 To recall:

 crm_resource --move resource
 creates a standby rule that moves the resource off the currently
 active node

 while

 crm_resource --move resource --node newnode
 creates a prefer rule that moves the resource to the new node.

 When dealing with clones and masters the behavior was random as the 
 code
 only considers the node where the first instance of the clone was
 started.

 The new code behaves consistently for the master role of an m/s
 resource. The options --master and rsc:master are somewhat 
 redundant
 as a slave move is not supported. Currently it's more an
 acknowledgement of the user.

 On the other hand it is desirable (and was requested several times 
 on
 the ML) to stop a single resource instance of a clone or master on a
 specific node.

 Should that be implemented by something like

 crm_resource --move-off --resource myresource --node devel2 ?

 or should

 crm_resource refuse to work on clones

 and/or should moving the master role be the default for m/s 
 resources
 and the --master option discarded ?
   
I think that we also need to consider the case when clone-max is
less than the number of nodes. If I understood correctly what you
were saying. So, all of move slave and move master and move clone
should be possible.
   
   
I think the following use cases cover what can be done with such kind 
of
interface:
   
crm_resource --moveoff --resource myresource --node mynode
  - all resource variants: check whether active on mynode, then 
create standby constraint
   
crm_resource --move --resource myresource
  - primitive/group: convert to --moveoff --node `current_node`
  - clone/master: refused
   
crm_resource --move --resource myresource --node mynode
 - primitive/group: create prefer constraint
 - clone/master: refused
  
   Not sure this needs to be refused.
  
   I see the problem that the node where the resource instance should be
   moved off had to be specified as well to get predictable behavior.
  
   Consider a a 2 way clone on a 3 node cluster.
   If the clone is active on A and B what should
  
   crm_resource --move --resource myClone --node C
  
   do ?
  
  I would expect it to create the +inf constraint for C but no
  contraint(s) for the current location(s)
 
 You are right. These are different and valid use cases.
 
 crm_resource --move --resource myClone --node C
- I want an instance on C, regardless where it is moved off
 
 crm_resource --move-off --resource myClone --node C
- I want the instance moved off C, regardless where it is moved on
 
 I tried them out with a reimplementation of the patch on a 3 node
 cluster with a resource with clone-max=2. The behavior appears logical
 (at least to me 8-) ).
 
  
   This would require an additional --from-node or similar.
  
   Other than that the proposal looks sane.
  
   My first thought was to make --move behave like --move-off if the
   resource is a clone or /ms, but since the semantics are the exact
   opposite, that might introduce introduce more problems than it solves.
  
   That was my perception as well.
  
  
   Does the original crm_resource patch implement this?
  
   No, I will submit an updated version later this week.
  
   - holger

Hi,
I submit revised patches for review.
Summarizing preceding discussions the following functionality is
implemented:

crm_resource --move-off --resource myresource --node mynode
   - all resource variants: check whether active on mynode, then create 
standby constraint

crm_resource --move --resource myresource
   - primitive/group: convert to --move-off --node `current_node`
   - clone/master: refused

crm_resource --move --resource myresource --node mynode
  - all resource variants: create prefer constraint

crm_resource --move --resource myresource --master --node mynode
  - master: check whether active as slave on mynode, then create prefer 
constraint for master role
  - others: refused
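
The "check whether active (as slave) on mynode" steps above amount to a recursive walk over the resource tree, much like the is_master_on() helper in the crm_resource.c patches elsewhere in this thread. A rough Python transliteration, for illustration only (the Resource class is a simplified stand-in, not the pengine data model):

# Simplified stand-in for the pengine resource tree (illustration only).
class Resource(object):
    def __init__(self, name, children=None, running_on=None, role=None):
        self.name = name
        self.children = children or []      # clone/ms instances
        self.running_on = running_on or []  # node names
        self.role = role                    # "Master", "Slave", "Started", ...

def is_active_on(rsc, node, role=None):
    # True if rsc (or any instance of it) runs on node, optionally in 'role'.
    if rsc.children:
        # clones and master/slave resources: recurse into the instances
        return any(is_active_on(child, node, role) for child in rsc.children)
    if node not in rsc.running_on:
        return False
    return role is None or rsc.role == role

ms = Resource("ms_test", children=[
    Resource("ms_test:0", running_on=["devel1"], role="Master"),
    Resource("ms_test:1", running_on=["devel2"], role="Slave"),
])
print(is_active_on(ms, "devel2"))            # True:  --move-off may proceed
print(is_active_on(ms, "devel2", "Slave"))   # True:  --move --master --node devel2 ok
print(is_active_on(ms, "devel2", "Master"))  # False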


The patch shell_migrate.diff supports this in the shell. This stuff is
agnostic of what crm_migrate really does.

Regards
Holger

diff -r b4f456380f60 doc/crm_cli.txt
--- a/doc/crm_cli.txt	Thu Mar 17 09:41:25 2011 +0100
+++ b/doc/crm_cli.txt	Mon Apr 04

Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-04-05 Thread Holger Teutsch
Hi Dejan,

On Tue, 2011-04-05 at 13:40 +0200, Dejan Muhamedagic wrote:
 Hi Holger,
 
 On Tue, Apr 05, 2011 at 01:19:56PM +0200, Holger Teutsch wrote:
  Hi Dejan,
  
  On Tue, 2011-04-05 at 12:27 +0200, Dejan Muhamedagic wrote:
   On Tue, Apr 05, 2011 at 12:10:48PM +0200, Holger Teutsch wrote:
Hi Dejan,

On Tue, 2011-04-05 at 11:48 +0200, Dejan Muhamedagic wrote:
 Hi Holger,
 
 On Mon, Apr 04, 2011 at 09:31:02PM +0200, Holger Teutsch wrote:
  On Mon, 2011-04-04 at 15:24 +0200, Andrew Beekhof wrote:
 [...]
  
  crm_resource --move-off --resource myClone --node C
 - I want the instance moved off C, regardless where it is moved 
  on
 
 What is the difference between move-off and unmigrate (-U)?

--move-off - create a constraint that a resource should *not* run on
the specific node (partly as before --move without --node)

-U: zap all migration constraints (as before) 
   
   Ah, right, sorry, wanted to ask about the difference between
   move-off and move. The description looks the same as for move. Is
   it that in this case it is for clones so crm_resource needs an
   extra node parameter? You wrote in the doc:
   
 +Migrate a resource (-instance for clones/masters) off the specified 
   node.
   
   The '-instance' looks somewhat funny. Why not say Move/migrate a
   clone or master/slave instance away from the specified node?
  
  Moving away works for all kinds of resources so the text now looks like:
  
  diff -r b4f456380f60 doc/crm_cli.txt
  --- a/doc/crm_cli.txt   Thu Mar 17 09:41:25 2011 +0100
  +++ b/doc/crm_cli.txt   Tue Apr 05 13:08:10 2011 +0200
  @@ -818,10 +818,25 @@
   running on the current node. Additionally, you may specify a
   lifetime for the constraint---once it expires, the location
   constraint will no longer be active.
  +For a master resource specify rsc:master to move the master role.
   
   Usage:
   ...
   -migrate <rsc> [<node>] [<lifetime>] [force]
   +migrate <rsc>[:master] [<node>] [<lifetime>] [force]
  +...
  +
  +[[cmdhelp_resource_migrateoff,migrate a resource off the specified
  node]]
  + `migrateoff` (`moveoff`)
  +
  +Migrate a resource away from the specified node. 
  +The resource is migrated by creating a constraint which prevents it
  from
  +running on the specified node. Additionally, you may specify a
  +lifetime for the constraint---once it expires, the location
  +constraint will no longer be active.
  +
  +Usage:
  +...
   +migrateoff <rsc> <node> [<lifetime>] [force]
   ...
   
   [[cmdhelp_resource_unmigrate,unmigrate a resource to another node]]
  
   
   I must say that I still find all this quite confusing, i.e. now
   we have move, unmove, and move-off, but it's probably just me :)
  
  Think of move == move-to then it is simpler 8-)
  
  ... keeping in mind that for backward compatibility
  
  crm_resource --move --resource myResource
  
  is equivalent
  
  crm_resource --move-off --resource myResource --node $(current node)
  
  But as there is no current node for clones / masters the old
  implementation did some random movements...
 
 OK. Thanks for the clarification. I'd like to revise my previous
 comment about restricting use of certain constructs. For
 instance, in this case, if the command would result in a random
 movement then the shell should at least issue a warning about it.
 Or perhaps refuse to do that completely. I didn't take a look yet
 at the code, perhaps you've already done that.
 
 Thanks,
 
 Dejan
 
 

I admit you have to specify more verbosely what you want to achieve but
then the patched versions (based on patches I submitted today around
10:01) execute consistently and without surprises - at least for my test
cases. 

Regards
Holger





Re: [Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-03-21 Thread Holger Teutsch
Hi Dejan,

On Mon, 2011-03-21 at 16:11 +0100, Dejan Muhamedagic wrote:
 Hi Holger,
 
 On Sat, Mar 19, 2011 at 11:55:57AM +0100, Holger Teutsch wrote:
  Hi Dejan,
  
  On Fri, 2011-03-18 at 14:24 +0100, Dejan Muhamedagic wrote:
   Hi,
   
   On Fri, Mar 18, 2011 at 12:21:40PM +0100, Holger Teutsch wrote:
Hi,
I would like to submit 2 patches of an initial implementation for
discussion.
  ..
To recall:

crm_resource --move resource
creates a standby rule that moves the resource off the currently
active node

while

crm_resource --move resource --node newnode
creates a prefer rule that moves the resource to the new node.

When dealing with clones and masters the behavior was random as the code
only considers the node where the first instance of the clone was
started.

The new code behaves consistently for the master role of an m/s
resource. The options --master and rsc:master are somewhat redundant
as a slave move is not supported. Currently it's more an
acknowledgement of the user.

On the other hand it is desirable (and was requested several times on
the ML) to stop a single resource instance of a clone or master on a
specific node.

Should that be implemented by something like
 
crm_resource --move-off --resource myresource --node devel2 ?

or should

crm_resource refuse to work on clones

and/or should moving the master role be the default for m/s resources
and the --master option discarded ?
   
   I think that we also need to consider the case when clone-max is
   less than the number of nodes. If I understood correctly what you
   were saying. So, all of move slave and move master and move clone
   should be possible.
   
  
  I think the following use cases cover what can be done with such kind of
  interface:
  
  crm_resource --moveoff --resource myresource --node mynode
 - all resource variants: check whether active on mynode, then create 
  standby constraint
  
  crm_resource --move --resource myresource
 - primitive/group: convert to --moveoff --node `current_node`
 - clone/master: refused
  
  crm_resource --move --resource myresource --node mynode
- primitive/group: create prefer constraint
- clone/master: refused
  
  crm_resource --move --resource myresource --master --node mynode
- master: create prefer constraint for master role
- others: refused
  
  They should work (with foreseeable outcome!) regardless of the setting of
  clone-max.
 
 This seems quite complicated to me. Took me a while to figure
 out what's what and where :) Why bother doing the thinking for

I'm afraid the matter *is* complicated. The current implementation of 

crm_resource --move --resource myResource

(without node name) is moving off the resource from the node it is
currently active on by creating a standby constraint. For clones and
masters there is no such *single* active node the constraint can be
constructed for.

Consider this use case:
I have 2 nodes and a clone or master and would like to safely get rid of
one instance on a particular node (e.g. with agents 1.0.5 the slave of a
DB2 HADR pair 8-) ). No idea how that should be done without a move-off
functionality. 

 users? The only case which seems to me worth considering is
 refusing setting role for non-ms resources. Otherwise, let's let
 the user move things around and enjoy the consequences.

Definitely not true for production clusters. The tools should produce
least-surprise consequences.
  
 
 Cheers,
 

Over the weekend I implemented the above-mentioned functionality. Drop
me a note if you want to play with an early snapshot 8-)

Regards
Holger 




[Pacemaker] [PATCH]Bug 2567 - crm resource migrate should support an optional role parameter

2011-03-18 Thread Holger Teutsch
Hi,
I would like to submit 2 patches of an initial implementation for
discussion.

Patch 1 implements migration of the Master role of an m/s resource to
another node in crm_resource
Patch 2 adds support for the shell.

crm_resource does this with options

crm_resource --move --resource ms_test --master --node devel2

The shell does the same with

crm resource migrate ms_test:master devel2

crm_resource insists on the options --master and --node xxx when dealing
with m/s resources.

It is not easy to assess the expectations that a move command should
fulfill for something more complex than a group.

To recall:

crm_resource --move resource
creates a standby rule that moves the resource off the currently
active node

while

crm_resource --move resource --node newnode
creates a prefer rule that moves the resource to the new node.

When dealing with clones and masters the behavior was random as the code
only considers the node where the first instance of the clone was
started.

The new code behaves consistently for the master role of an m/s
resource. The options --master and rsc:master are somewhat redundant
as a slave move is not supported. Currently it's more an
acknowledgement of the user.
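
For readers who have not looked at move_resource() in crm_resource.c: the standby and prefer rules mentioned above are both plain location constraints, differing only in the score and, for the master case, a role attribute on the rule. A small Python sketch of the two shapes; the id naming and exact attributes are illustrative and not necessarily byte-for-byte what crm_resource writes:

# Illustration only: the two kinds of constraints discussed above; crm_resource
# builds them via the CIB API, this just prints an approximate XML shape.
def location_constraint(rsc, node, prefer, role=None):
    kind = "prefer" if prefer else "standby"
    score = "INFINITY" if prefer else "-INFINITY"
    role_attr = ' role="%s"' % role if role else ""
    lines = [
        '<rsc_location id="cli-%s-%s" rsc="%s">' % (kind, rsc, rsc),
        '  <rule id="cli-%s-rule-%s"%s score="%s" boolean-op="and">'
        % (kind, rsc, role_attr, score),
        '    <expression attribute="#uname" operation="eq" value="%s"/>' % node,
        '  </rule>',
        '</rsc_location>',
    ]
    return "\n".join(lines)

# "--move <rsc>" without --node: keep the resource away from its current node.
print(location_constraint("dummy", "devel1", prefer=False))
# "--move <rsc> --node <n>" (plus --master for m/s): pull it onto the node.
print(location_constraint("ms_test", "devel2", prefer=True, role="Master"))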

On the other hand it is desirable (and was requested several times on
the ML) to stop a single resource instance of a clone or master on a
specific node.

Should that be implemented by something like
 
crm_resource --move-off --resource myresource --node devel2 ?

or should

crm_resource refuse to work on clones

and/or should moving the master role be the default for m/s resources
and the --master option discarded ?


Regards
Holger
# HG changeset patch
# User Holger Teutsch holger.teut...@web.de
# Date 1300439791 -3600
# Branch mig
# Node ID dac1a4eae844f0bd857951b1154a171c80c25772
# Parent  b4f456380f60bd308acdc462215620f5bf530854
crm_resource.c: Add support for move of Master role of a m/s resource

diff -r b4f456380f60 -r dac1a4eae844 tools/crm_resource.c
--- a/tools/crm_resource.c	Thu Mar 17 09:41:25 2011 +0100
+++ b/tools/crm_resource.c	Fri Mar 18 10:16:31 2011 +0100
@@ -52,6 +52,7 @@
 const char *prop_id = NULL;
 const char *prop_set = NULL;
 char *move_lifetime = NULL;
+int move_master = 0;
 char rsc_cmd = 'L';
 char *our_pid = NULL;
 IPC_Channel *crmd_channel = NULL;
@@ -192,6 +193,32 @@
 return 0;
 }
 
+/* is m/s resource in master role on a host? */
+static int
+is_master_on(resource_t *rsc, const char *check_uname)
+{
+    GListPtr lpc = NULL;
+
+    if(rsc->variant > pe_native) {
+        /* recursively call down */
+	GListPtr gIter = rsc->children;
+	for(; gIter != NULL; gIter = gIter->next) {
+	   if(is_master_on(gIter->data, check_uname))
+               return 1;
+        }
+	return 0;
+    }
+
+    for(lpc = rsc->running_on; lpc != NULL; lpc = lpc->next) {
+	node_t *node = (node_t*)lpc->data;
+	if(rsc->variant == pe_native && rsc->role == RSC_ROLE_MASTER
+           && safe_str_eq(node->details->uname, check_uname)) {
+            return 1;
+        }
+    }
+    return 0;
+}
+
 #define cons_string(x) x?x:"NA"
 static void
 print_cts_constraints(pe_working_set_t *data_set) 
@@ -797,6 +824,7 @@
 static int
 move_resource(
 const char *rsc_id,
+int move_master,
 const char *existing_node, const char *preferred_node,
 cib_t *	cib_conn) 
 {
@@ -935,6 +963,10 @@
 	crm_xml_add(rule, XML_ATTR_ID, id);
 	crm_free(id);
 
+	if(move_master) {
+	    crm_xml_add(rule, XML_RULE_ATTR_ROLE, "Master");
+	}
+
 	crm_xml_add(rule, XML_RULE_ATTR_SCORE, INFINITY_S);
 	crm_xml_add(rule, XML_RULE_ATTR_BOOLEAN_OP, "and");
 	
@@ -1093,6 +1125,8 @@
 crm_free(prefix);
 }	
 
+/* out of single letter options */
+#define OPT_MASTER (256 + 'm')
 static struct crm_option long_options[] = {
     /* Top-level Options */
     {"help",    0, 0, '?', "\t\tThis text"},
@@ -1120,10 +1154,10 @@
     {"get-property",  1, 0, 'G', "Display the 'class', 'type' or 'provider' of a resource", 1},
     {"set-property",  1, 0, 'S', "(Advanced) Set the class, type or provider of a resource", 1},
     {"move",    0, 0, 'M',
-     "\t\tMove a resource from its current location, optionally specifying a destination (-N) and/or a period for which it should take effect (-u)"
+     "\t\tMove a resource from its current location, optionally specifying a role (--master), a destination (-N) and/or a period for which it should take effect (-u)"
      "\n\t\t\t\tIf -N is not specified, the cluster will force the resource to move by creating a rule for the current location and a score of -INFINITY"
      "\n\t\t\t\tNOTE: This will prevent the resource from running on this node until the constraint is removed with -U"},
-    {"un-move", 0, 0, 'U', "\tRemove all constraints created by a move command"},
+    {"un-move", 0, 0, 'U', "\t\tRemove all constraints created by a move command"},

     {"-spacer-",	1, 0, '-', "\nAdvanced Commands:"},
     {"delete",  0, 0, 'D', "\t\tDelete a resource from the CIB"},
@@ -1137,6 +1171,7 @@
     {"resource-type",	1, 0, 't', "Resource type

Re: [Pacemaker] Failing back a multi-state resource eg. DRBD

2011-03-11 Thread Holger Teutsch
On Mon, 2011-03-07 at 14:21 +0100, Dejan Muhamedagic wrote:
 Hi,
 
 On Fri, Mar 04, 2011 at 09:12:46AM -0500, David McCurley wrote:
  Are you wanting to move all the resources back or just that one resource?
  
  I'm still learning, but one simple way I move all resources back from nodeb 
  to nodea is like this:
  
  # on nodeb
  sudo crm node standby
  # now services migrate to nodea
  # still on nodeb
  sudo crm node online
  
  This may be a naive way to do it but it works for now :)
 
 Yes, that would work. Though that would also make all other
 resources move from the standby node.
 
  There is also a crm resource migrate to migrate individual resources.  
  For that, see here:
 
 resource migrate has no option to move ms resources, i.e. to make
 another node the master.
 
 What would work right now is to create a temporary location
 constraint:
 
 location tmp1 ms-drbd0 \
 rule $id=tmp1-rule $role=Master inf: #uname eq nodea
 
 Then, once the drbd got promoted on nodea, just remove the
 constraint:
 
 crm configure delete tmp1
 
 Obviously, we'd need to make some improvements here. resource
 migrate uses crm_resource to insert the location constraint,
 perhaps we should update it to also accept the role parameter.
 
 Can you please make an enhancement bugzilla report so that this
 doesn't get lost.
 
 Thanks,
 
 Dejan

Hi Dejan,
it seems that the original author did not file the bug.
I entered it as

http://developerbugs.linux-foundation.org/show_bug.cgi?id=2567

Regards
Holger
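
P.S. With the crm_resource patch posted for this bug (see the changeset
further up on this page), moving the master role would presumably be
invoked roughly like this (option letters taken from the patch's help
text; treat it as a sketch of the intended interface, not the final one):

crm_resource --move --master -r ms-drbd0 -N nodea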



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-10 Thread Holger Teutsch
Hi Dejan,
On Thu, 2011-03-10 at 10:14 +0100, Dejan Muhamedagic wrote:
 Hi Holger,
 
 On Wed, Mar 09, 2011 at 07:58:02PM +0100, Holger Teutsch wrote:
  Hi Dejan,
  
  On Wed, 2011-03-09 at 14:00 +0100, Dejan Muhamedagic wrote:
   Hi Holger,
  

  
  In order to show the intention of the arguments clearer:
  
  Instead of
  
  def _verify(self, set_obj_semantic, set_obj_all = None):
      if not set_obj_all:
          set_obj_all = set_obj_semantic
      rc1 = set_obj_all.verify()
      if user_prefs.check_frequency != "never":
          rc2 = set_obj_semantic.semantic_check(set_obj_all)
      else:
          rc2 = 0
      return rc1 and rc2 <= 1
  def verify(self,cmd):
      "usage: verify"
      if not cib_factory.is_cib_sane():
          return False
      return self._verify(mkset_obj("xml"))
  
  This way (always passing both args):
  
  def _verify(self, set_obj_semantic, set_obj_all):
      rc1 = set_obj_all.verify()
      if user_prefs.check_frequency != "never":
          rc2 = set_obj_semantic.semantic_check(set_obj_all)
      else:
          rc2 = 0
      return rc1 and rc2 <= 1
  def verify(self,cmd):
      "usage: verify"
      if not cib_factory.is_cib_sane():
          return False
      set_obj_all = mkset_obj("xml")
      return self._verify(set_obj_all, set_obj_all)

See patch set_obj_all.diff

 
   My only remaining concern is performance. Though the meta-data is
   cached, perhaps it will pay off to save the RAInfo instance with
   the element. But we can worry about that later.
   
  
  I can work on this as next step.
 
 I'll do some testing on really big configurations and try to
 gauge the impact.

OK

 
 The patch makes some regression tests blow:
 
 +  File "/usr/lib64/python2.6/site-packages/crm/ui.py", line 1441, in verify
 +    return self._verify(mkset_obj("xml"))
 +  File "/usr/lib64/python2.6/site-packages/crm/ui.py", line 1433, in _verify
 +    rc2 = set_obj_semantic.semantic_check(set_obj_all)
 +  File "/usr/lib64/python2.6/site-packages/crm/cibconfig.py", line 294, in semantic_check
 +    rc = self.__check_unique_clash(set_obj_all)
 +  File "/usr/lib64/python2.6/site-packages/crm/cibconfig.py", line 274, in __check_unique_clash
 +    process_primitive(node, clash_dict)
 +  File "/usr/lib64/python2.6/site-packages/crm/cibconfig.py", line 259, in process_primitive
 +    if ra_params[ name ]['unique'] == '1':
 +KeyError: 'OCF_CHECK_LEVEL'
 
 Can't recall why OCF_CHECK_LEVEL appears here. There must be some
 good explanation :)

The good explanation is: not only params are in instance_attributes ...
but OCF_CHECK_LEVEL as well, within operations ...

The latest version no longer blows the test - semantic_check.diff

Regards
Holger
# HG changeset patch
# User Holger Teutsch holger.teut...@web.de
# Date 1299775617 -3600
# Branch hot
# Node ID 30730ccc0aa09c3a476a18c6d95c680b3595
# Parent  9fa61ee6e35ef190f4126e163e9bfe6911e35541
Low: Shell: Rename variable set_obj_verify to set_obj_all as it always contains all objects
Simplify usage of this var in [_]verify, pass to CibObjectSet.semantic_check

diff -r 9fa61ee6e35e -r 30730ccc0aa0 shell/modules/cibconfig.py
--- a/shell/modules/cibconfig.py	Wed Mar 09 13:41:27 2011 +0100
+++ b/shell/modules/cibconfig.py	Thu Mar 10 17:46:57 2011 +0100
@@ -230,7 +230,7 @@
         See below for specific implementations.
         '''
         pass
-    def semantic_check(self):
+    def semantic_check(self, set_obj_all):
         '''
         Test objects for sanity. This is about semantics.
         '''
diff -r 9fa61ee6e35e -r 30730ccc0aa0 shell/modules/ui.py.in
--- a/shell/modules/ui.py.in	Wed Mar 09 13:41:27 2011 +0100
+++ b/shell/modules/ui.py.in	Thu Mar 10 17:46:57 2011 +0100
@@ -1425,12 +1425,10 @@
         set_obj = mkset_obj(*args)
         err_buf.release() # show them, but get an ack from the user
         return set_obj.edit()
-    def _verify(self, set_obj_semantic, set_obj_verify = None):
-        if not set_obj_verify:
-            set_obj_verify = set_obj_semantic
-        rc1 = set_obj_verify.verify()
+    def _verify(self, set_obj_semantic, set_obj_all):
+        rc1 = set_obj_all.verify()
         if user_prefs.check_frequency != "never":
-            rc2 = set_obj_semantic.semantic_check()
+            rc2 = set_obj_semantic.semantic_check(set_obj_all)
         else:
             rc2 = 0
         return rc1 and rc2 <= 1
@@ -1438,7 +1436,8 @@
         "usage: verify"
         if not cib_factory.is_cib_sane():
             return False
-        return self._verify(mkset_obj("xml"))
+        set_obj_all = mkset_obj("xml")
+        return self._verify(set_obj_all, set_obj_all)
     def save(self,cmd,*args):
         "usage: save [xml] filename"
         if not cib_factory.is_cib_sane():
# HG changeset patch
# User Holger Teutsch holger.teut...@web.de
# Date 1299779740 -3600
# Branch hot
# Node ID 73021c988d92c5dad4c503af9f8826f5d1c34373
# Parent

Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-09 Thread Holger Teutsch
Hi Dejan,

On Tue, 2011-03-08 at 19:26 +0100, Holger Teutsch wrote:
 Hi Dejan,
 
 On Tue, 2011-03-08 at 12:07 +0100, Dejan Muhamedagic wrote:
  Hi Holger,
  
  On Tue, Mar 08, 2011 at 09:15:01AM +0100, Holger Teutsch wrote:
   On Fri, 2011-03-04 at 13:06 +0100, Holger Teutsch wrote:
On Thu, 2011-03-03 at 10:55 +0100, Florian Haas wrote:
 On 2011-03-03 10:43, Holger Teutsch wrote:
  Hi,
  I submit a patch for
  bugzilla 2541: Shell should warn if parameter uniqueness is 
  violated
  for discussion.
 
 ...
  It looks good, just a few notes. The check function should move
  to the CibObjectSetRaw class and be invoked from
 
 Will move it there.
 
  semantic_check(). There's
  
  rc1 = set_obj_verify.verify()
  if user_prefs.check_frequency != never:
  rc2 = set_obj_semantic.semantic_check()
  
  The last should be changed to:
  
  rc2 = set_obj_semantic.semantic_check(set_obj_verify)
  
  set_obj_verify always contains all CIB elements (well, that means
  that its name should probably be changed too :). Now, the code
  should check _only_ new and changed primitives which are
  contained in set_obj_semantic. That's because we don't want to
  repeatedly print warnings for all objects on commit, but only for
  those which were added/changed in the meantime. On the other
  hand, verify is an explicit check and in that case the whole CIB
  is always verified.
  
   
   +ra_class = prim.getAttribute(class)
   +ra_provider = prim.getAttribute(provider)
   +ra_type = prim.getAttribute(type)
   +ra_id = prim.getAttribute(id)
   +
   +ra = RAInfo(ra_class, ra_type, ra_provider)
  
  There's a convenience function get_ra(node) for this.
  
 
 I did not use this as I need all ra_XXX value anyhow later in the code
 for building k.
 
   +if ra == None:
   +return
   +ra_params = ra.params()
   +
   +attributes = prim.getElementsByTagName(instance_attributes)
   +if len(attributes) == 0:
   +return
   +
   +for p in attributes[0].getElementsByTagName(nvpair):
   +name = p.getAttribute(name)
   +if ra_params[ name ]['unique'] == '1':
   +value = p.getAttribute(value)
   +k = (ra_class, ra_provider, ra_type, name, value)
   +try:
   +clash_dict[k].append(ra_id)
   +except:
   +clash_dict[k] = [ra_id]
   +return
   +
   +clash_dict = {}
   +for p in cib_factory.mkobj_list(xml,type:primitive):
  
  This would become:
  
 for p in all_obj_list: # passed from _verify()
 if is_primitive(p.node):
  
   +process_primitive(p.node, clash_dict)
  
  Or perhaps to loop through self.obj_list and build clash_dict
  against all elements? Otherwise, you'll need to skip elements
  which don't pass the check but are not new/changed (in
  self.obj_list).
  
 

I did not pass set_obj_verify in semantic check as this variable only
by chance contains the right values.

- holger

Output:
crm(live)configure# primitive ip1a ocf:heartbeat:IPaddr2 params ip=1.2.3.4 
meta target-role=stopped
crm(live)configure# primitive ip1b ocf:heartbeat:IPaddr2 params ip=1.2.3.4 
meta target-role=stopped
crm(live)configure# commit
WARNING: Resources ip1a,ip1b violate uniqueness for parameter ip: 1.2.3.4
Do you still want to commit? y
crm(live)configure# primitive ip2a ocf:heartbeat:IPaddr2 params ip=1.2.3.5 
meta target-role=stopped
crm(live)configure# commit
crm(live)configure# primitive ip2b ocf:heartbeat:IPaddr2 params ip=1.2.3.5 
meta target-role=stopped
crm(live)configure# primitive ip3 ocf:heartbeat:IPaddr2 params ip=1.2.3.6 
meta target-role=stopped
crm(live)configure# commit
WARNING: Resources ip2a,ip2b violate uniqueness for parameter ip: 1.2.3.5
Do you still want to commit? y
crm(live)configure# primitive dummy_1 ocf:heartbeat:Dummy params fake=abc 
meta target-role=stopped
crm(live)configure# primitive dummy_2 ocf:heartbeat:Dummy params fake=abc 
meta target-role=stopped
crm(live)configure# primitive dummy_3 ocf:heartbeat:Dummy meta 
target-role=stopped
crm(live)configure# commit
crm(live)configure# verify
WARNING: Resources ip1a,ip1b violate uniqueness for parameter ip: 1.2.3.4
WARNING: Resources ip2a,ip2b violate uniqueness for parameter ip: 1.2.3.5
crm(live)configure# 

diff -r a35d8d6d0ab1 shell/modules/cibconfig.py
--- a/shell/modules/cibconfig.py	Wed Mar 09 11:21:03 2011 +0100
+++ b/shell/modules/cibconfig.py	Wed Mar 09 13:20:14 2011 +0100
@@ -230,11 +230,66 @@
         See below for specific implementations.
         '''
         pass
+
+    def __check_unique_clash(self):
+        'Check whether resource parameters with attribute unique clash'
+
+        def process_primitive

Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-09 Thread Holger Teutsch
Hi Dejan,

On Wed, 2011-03-09 at 14:00 +0100, Dejan Muhamedagic wrote:
 Hi Holger,


This would become:

   for p in all_obj_list: # passed from _verify()
   if is_primitive(p.node):

 +process_primitive(p.node, clash_dict)

Or perhaps to loop through self.obj_list and build clash_dict
against all elements? Otherwise, you'll need to skip elements
which don't pass the check but are not new/changed (in
self.obj_list).

   
  
  I did not pass set_obj_verify in semantic check as this variable only
  by chance contains the right values.
 
 But it's not by chance. As I wrote earlier, it always contains
 the whole CIB. It has to, otherwise crm_verify wouldn't work. It
 should actually be renamed to set_obj_all or similar. Since we
 already have that list created, it's better to reuse it than to
 create another one from scratch. Further, we may need to add more
 semantic checks which would require looking at the whole CIB.
 

OK, I implemented it this way.

In order to show the intention of the arguments clearer:

Instead of

def _verify(self, set_obj_semantic, set_obj_all = None):
    if not set_obj_all:
        set_obj_all = set_obj_semantic
    rc1 = set_obj_all.verify()
    if user_prefs.check_frequency != "never":
        rc2 = set_obj_semantic.semantic_check(set_obj_all)
    else:
        rc2 = 0
    return rc1 and rc2 <= 1
def verify(self,cmd):
    "usage: verify"
    if not cib_factory.is_cib_sane():
        return False
    return self._verify(mkset_obj("xml"))

This way (always passing both args):

def _verify(self, set_obj_semantic, set_obj_all):
    rc1 = set_obj_all.verify()
    if user_prefs.check_frequency != "never":
        rc2 = set_obj_semantic.semantic_check(set_obj_all)
    else:
        rc2 = 0
    return rc1 and rc2 <= 1
def verify(self,cmd):
    "usage: verify"
    if not cib_factory.is_cib_sane():
        return False
    set_obj_all = mkset_obj("xml")
    return self._verify(set_obj_all, set_obj_all)


 My only remaining concern is performance. Though the meta-data is
 cached, perhaps it will pay off to save the RAInfo instance with
 the element. But we can worry about that later.
 

I can work on this as next step.

 Cheers,
 
 Dejan
 

- holger

diff -r a35d8d6d0ab1 shell/modules/cibconfig.py
--- a/shell/modules/cibconfig.py	Wed Mar 09 11:21:03 2011 +0100
+++ b/shell/modules/cibconfig.py	Wed Mar 09 19:53:50 2011 +0100
@@ -230,11 +230,68 @@
         See below for specific implementations.
         '''
         pass
-    def semantic_check(self):
+
+    def __check_unique_clash(self, set_obj_all):
+        'Check whether resource parameters with attribute unique clash'
+
+        def process_primitive(prim, clash_dict):
+            '''
+            Update dict clash_dict with
+            (ra_class, ra_provider, ra_type, name, value) -> [ resourcename ]
+            if parameter name should be unique
+            '''
+            ra_class = prim.getAttribute("class")
+            ra_provider = prim.getAttribute("provider")
+            ra_type = prim.getAttribute("type")
+            ra_id = prim.getAttribute("id")
+
+            ra = RAInfo(ra_class, ra_type, ra_provider)
+            if ra == None:
+                return
+            ra_params = ra.params()
+
+            attributes = prim.getElementsByTagName("instance_attributes")
+            if len(attributes) == 0:
+                return
+
+            for p in attributes[0].getElementsByTagName("nvpair"):
+                name = p.getAttribute("name")
+                if ra_params[ name ]['unique'] == '1':
+                    value = p.getAttribute("value")
+                    k = (ra_class, ra_provider, ra_type, name, value)
+                    try:
+                        clash_dict[k].append(ra_id)
+                    except:
+                        clash_dict[k] = [ra_id]
+            return
+
+        # we check the whole CIB for clashes as a clash may originate between
+        # an object already committed and a new one
+        clash_dict = {}
+        for obj in set_obj_all.obj_list:
+            node = obj.node
+            if is_primitive(node):
+                process_primitive(node, clash_dict)
+
+        # but we only warn if a 'new' object is involved
+        check_set = set([o.node.getAttribute("id") for o in self.obj_list if is_primitive(o.node)])
+
+        rc = 0
+        for param, resources in clash_dict.items():
+            # at least one new object must be involved
+            if len(resources) > 1 and len(set(resources) & check_set) > 0:
+                rc = 2
+                msg = 'Resources %s violate uniqueness for parameter %s: %s' %\
+                      (",".join(sorted(resources)), param[3], param[4])
+                common_warning(msg)
+
+        return rc
+
+    def semantic_check(self, 

Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-08 Thread Holger Teutsch
On Fri, 2011-03-04 at 13:06 +0100, Holger Teutsch wrote:
 On Thu, 2011-03-03 at 10:55 +0100, Florian Haas wrote:
  On 2011-03-03 10:43, Holger Teutsch wrote:
   Hi,
   I submit a patch for
   bugzilla 2541: Shell should warn if parameter uniqueness is violated
   for discussion.
  
  I'll leave it do Dejan to review the code, but I love the functionality.
  Thanks a lot for tackling this. My only suggestion for an improvement is
  to make the warning message a bit more terse, as in:
  
  WARNING: Resources ip1a, ip1b violate uniqueness for parameter ip:
  1.2.3.4
  
 
 Florian,
 I see your point. Although my formatting allows for an unlimited number
 of collisions ( 8-) ) in real life we will only have 2 or 3. Will change
 this together with Dejan's hints.
 
  Cheers,
  Florian
  
Florian + Dejan,
here the version with terse output. The code got terser as well.
- holger

crm(live)configure# primitive ip1a ocf:heartbeat:IPaddr2 params ip=1.2.3.4 
meta target-role=stopped
crm(live)configure# primitive ip1b ocf:heartbeat:IPaddr2 params ip=1.2.3.4 
meta target-role=stopped
crm(live)configure# primitive ip2a ocf:heartbeat:IPaddr2 params ip=1.2.3.5 
meta target-role=stopped
crm(live)configure# primitive ip2b ocf:heartbeat:IPaddr2 params ip=1.2.3.5 
meta target-role=stopped
crm(live)configure# primitive ip3 ocf:heartbeat:IPaddr2 params ip=1.2.3.6 
meta target-role=stopped
crm(live)configure# primitive dummy_1 ocf:heartbeat:Dummy params fake=abc 
meta target-role=stopped
crm(live)configure# primitive dummy_2 ocf:heartbeat:Dummy params fake=abc 
meta target-role=stopped
crm(live)configure# primitive dummy_3 ocf:heartbeat:Dummy meta 
target-role=stopped
crm(live)configure# commit
WARNING: Resources ip1a,ip1b violate uniqueness for parameter ip: 1.2.3.4
WARNING: Resources ip2a,ip2b violate uniqueness for parameter ip: 1.2.3.5
Do you still want to commit? 


diff -r cf4e9febed8e shell/modules/ui.py.in
--- a/shell/modules/ui.py.in	Wed Feb 23 14:52:34 2011 +0100
+++ b/shell/modules/ui.py.in	Tue Mar 08 09:11:38 2011 +0100
@@ -1509,6 +1509,55 @@
             return False
         set_obj = mkset_obj("xml")
         return ptestlike(set_obj.ptest,'vv',cmd,*args)
+
+    def __check_unique_clash(self):
+        'Check whether resource parameters with attribute unique clash'
+
+        def process_primitive(prim, clash_dict):
+            '''
+            Update dict clash_dict with
+            (ra_class, ra_provider, ra_type, name, value) -> [ resourcename ]
+            if parameter name should be unique
+            '''
+            ra_class = prim.getAttribute("class")
+            ra_provider = prim.getAttribute("provider")
+            ra_type = prim.getAttribute("type")
+            ra_id = prim.getAttribute("id")
+
+            ra = RAInfo(ra_class, ra_type, ra_provider)
+            if ra == None:
+                return
+            ra_params = ra.params()
+
+            attributes = prim.getElementsByTagName("instance_attributes")
+            if len(attributes) == 0:
+                return
+
+            for p in attributes[0].getElementsByTagName("nvpair"):
+                name = p.getAttribute("name")
+                if ra_params[ name ]['unique'] == '1':
+                    value = p.getAttribute("value")
+                    k = (ra_class, ra_provider, ra_type, name, value)
+                    try:
+                        clash_dict[k].append(ra_id)
+                    except:
+                        clash_dict[k] = [ra_id]
+            return
+
+        clash_dict = {}
+        for p in cib_factory.mkobj_list("xml","type:primitive"):
+            process_primitive(p.node, clash_dict)
+
+        no_clash = 1
+        for param, resources in clash_dict.items():
+            if len(resources) > 1:
+                no_clash = 0
+                msg = 'Resources %s violate uniqueness for parameter %s: %s' %\
+                      (",".join(sorted(resources)), param[3], param[4])
+                common_warning(msg)
+
+        return no_clash
+
     def commit(self,cmd,force = None):
         "usage: commit [force]"
         if force and force != "force":
@@ -1523,7 +1572,8 @@
         rc1 = cib_factory.is_current_cib_equal()
         rc2 = cib_factory.is_cib_empty() or \
             self._verify(mkset_obj("xml","changed"),mkset_obj("xml"))
-        if rc1 and rc2:
+        rc3 = self.__check_unique_clash()
+        if rc1 and rc2 and rc3:
             return cib_factory.commit()
         if force or user_prefs.get_force():
             common_info("commit forced")
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-08 Thread Holger Teutsch
Hi Dejan,

On Tue, 2011-03-08 at 12:07 +0100, Dejan Muhamedagic wrote:
 Hi Holger,
 
 On Tue, Mar 08, 2011 at 09:15:01AM +0100, Holger Teutsch wrote:
  On Fri, 2011-03-04 at 13:06 +0100, Holger Teutsch wrote:
   On Thu, 2011-03-03 at 10:55 +0100, Florian Haas wrote:
On 2011-03-03 10:43, Holger Teutsch wrote:
 Hi,
 I submit a patch for
 bugzilla 2541: Shell should warn if parameter uniqueness is violated
 for discussion.

...
 It looks good, just a few notes. The check function should move
 to the CibObjectSetRaw class and be invoked from

Will move it there.

 semantic_check(). There's
 
 rc1 = set_obj_verify.verify()
 if user_prefs.check_frequency != never:
   rc2 = set_obj_semantic.semantic_check()
 
 The last should be changed to:
 
   rc2 = set_obj_semantic.semantic_check(set_obj_verify)
 
 set_obj_verify always contains all CIB elements (well, that means
 that its name should probably be changed too :). Now, the code
 should check _only_ new and changed primitives which are
 contained in set_obj_semantic. That's because we don't want to
 repeatedly print warnings for all objects on commit, but only for
 those which were added/changed in the meantime. On the other
 hand, verify is an explicit check and in that case the whole CIB
 is always verified.
 
  
  +ra_class = prim.getAttribute(class)
  +ra_provider = prim.getAttribute(provider)
  +ra_type = prim.getAttribute(type)
  +ra_id = prim.getAttribute(id)
  +
  +ra = RAInfo(ra_class, ra_type, ra_provider)
 
 There's a convenience function get_ra(node) for this.
 

I did not use this as I need all ra_XXX value anyhow later in the code
for building k.

  +if ra == None:
  +return
  +ra_params = ra.params()
  +
  +attributes = prim.getElementsByTagName(instance_attributes)
  +if len(attributes) == 0:
  +return
  +
  +for p in attributes[0].getElementsByTagName(nvpair):
  +name = p.getAttribute(name)
  +if ra_params[ name ]['unique'] == '1':
  +value = p.getAttribute(value)
  +k = (ra_class, ra_provider, ra_type, name, value)
  +try:
  +clash_dict[k].append(ra_id)
  +except:
  +clash_dict[k] = [ra_id]
  +return
  +
  +clash_dict = {}
  +for p in cib_factory.mkobj_list(xml,type:primitive):
 
 This would become:
 
for p in all_obj_list: # passed from _verify()
  if is_primitive(p.node):
 
  +process_primitive(p.node, clash_dict)
 
 Or perhaps to loop through self.obj_list and build clash_dict
 against all elements? Otherwise, you'll need to skip elements
 which don't pass the check but are not new/changed (in
 self.obj_list).
 

The typical occurrences of clashes will originate from old objects and
new/changed objects.

I think I have to loop over all objects to build clash dict  and
then ... 

  +
  +no_clash = 1
  +for param, resources in clash_dict.items():
  +if len(resources)  1:

... only emit a warning if the intersection of a clash set with
changed objects is not empty. 

  +no_clash = 0
  +msg = 'Resources %s violate uniqueness for parameter %s: 
  %s' %\
  +(,.join(sorted(resources)), param[3], param[4])
  +common_warning(msg)
  +
  +return no_clash
  +

I will submit an updated version later this week.

-holger


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Antwort: Re: WG: time pressure - software raid cluster, raid1 ressource agent, help needed

2011-03-07 Thread Holger Teutsch
Hi,
SAN drivers often have large timeouts configured, so are you patient
enough?
At least this demonstrates that the problem is currently not in the
cluster...
- holger
On Mon, 2011-03-07 at 11:04 +0100, patrik.rappo...@knapp.com wrote:
 Hy, 
 
 thx for answer. I tested this now, the problem is, mdadm hangs totally
 when we simulate the fail of one storage. (we already tried two ways:
 1. removing the mapping., 2. removing one path, and then disabling the
 remaining path through the port on the san switch - which is nearly
 the same like a total fail of the storage). 
 
 So I can't get the output of mdadm, because it hangs. 
 
 I think it must be a problem with mdadm. This is my mdadm.conf: 
 
 DEVICE /dev/mapper/3600a0b800050c94e07874d2e0028_part1 
 /dev/mapper/3600a0b8000511f5414b14d2df1b1_part1 
 /dev/mapper/3600a0b800050c94e07874d2e0028_part2 
 /dev/mapper/3600a0b8000511f5414b14d2df1b1_part2 
 /dev/mapper/3600a0b800050c94e07874d2e0028_part3 
 /dev/mapper/3600a0b8000511f5414b14d2df1b1_part3 
 ARRAY /dev/md0 metadata=0.90 UUID=c411c076:bb28916f:d50a93ef:800dd1f0 
 ARRAY /dev/md1 metadata=0.90 UUID=522279fa:f3cdbe3a:d50a93ef:800dd1f0 
 ARRAY /dev/md2 metadata=0.90
 UUID=01e07d7d:5305e46c:d50a93ef:800dd1f0 
 
 kr Patrik 
 
 
 Mit freundlichen Grüßen / Best Regards
 
 Patrik Rapposch, BSc
 System Administration
 
 KNAPP Systemintegration GmbH
 Waltenbachstraße 9
 8700 Leoben, Austria 
 Phone: +43 3842 805-915
 Fax: +43 3842 805-500
 patrik.rappo...@knapp.com 
 www.KNAPP.com
 
 Commercial register number: FN 138870x
 Commercial register court: Leoben
 
 The information in this e-mail (including any attachment) is
 confidential and intended to be for the use of the addressee(s) only.
 If you have received the e-mail by mistake, any disclosure, copy,
 distribution or use of the contents of the e-mail is prohibited, and
 you must delete the e-mail from your system. As e-mail can be changed
 electronically KNAPP assumes no responsibility for any alteration to
 this e-mail or its attachments. KNAPP has taken every reasonable
 precaution to ensure that any attachment to this e-mail has been swept
 for virus. However, KNAPP does not accept any liability for damage
 sustained as a result of such attachment being virus infected and
 strongly recommend that you carry out your own virus check before
 opening any attachment. 
 
 
 Holger Teutsch holger.teut...@web.de
 06.03.2011 19:56
 
 Reply to: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
 Cc:
 Subject: Re: [Pacemaker] WG: time pressure - software raid cluster, raid1 ressource agent, help needed
 
 
 On Sun, 2011-03-06 at 12:40 +0100, patrik.rappo...@knapp.com wrote:
 Hi,
 assume the basic problem is in your raid configuration.
 
 If you unmap one box the devices should not be in status FAIL but in
 degraded.
 
 So what is the exit status of
 
 mdadm --detail --test /dev/md0
 
 after unmapping ?
 
 Furthermore I would start with one isolated group containing the
 raid, LVM, and FS to keep it simple.
 
 Regards
 Holger
 
   Hy, 
  
  
  does anyone have an idea to that? I only have the servers till next
  week friday, so to my regret I am under time pressure :(
  
  
  
  Like I already wrote, I would appreciate and test any idea of you.
  Also if someone already made clusters with lvm-mirror, I would be
  happy to get a cib or some configuration examples.
  
   
  
  
  
  
  
  Thank you very much in advance.
  
   
  
  
  
  
  
  kr Patrik
  
  
  
  
  
  patrik.rappo...@knapp.com
  03.03.2011 15:11Bitte antworten anThe Pacemaker cluster resource
  manager
  
  An   pacemaker@oss.clusterlabs.org
  Kopie   
  Blindkopie   
  Thema   [Pacemaker] software raid cluster, raid1 ressource
 agent,help
  needed
  
  
  Good Day, 
  
  I have a 2 node active/passive cluster which is connected to two
  ibm
  4700 storages. I configured 3 raids and I use the Raid1 ressource
  agent for managing the Raid1s in the cluster. 
  When I now disable the mapping of one storage, to simulate the fail
 of
  one storage, the Raid1 Ressources change to the State FAILED and
 the
  second node then takes over the ressources and is able to start the
  raid devices. 
  
  So I am confused, why the active node can't keep the raid1
 ressources
  and the former passive node takes them over and can start them
  correct. 
  
  I would really appreciate your advice, or maybe someone already has
 a
  example configuration for Raid1 with two storages.
  
  Thank you very much in advance. Attached you can find my cib.xml. 
  
  kr Patrik 
  
  
  
  Mit freundlichen Grüßen / Best Regards
  
  Patrik Rapposch, BSc
  System Administration
  
  KNAPP Systemintegration GmbH
  Waltenbachstraße 9
  8700 Leoben, Austria 
  Phone: +43 3842 805-915
  Fax: +43 3842 805-500

Re: [Pacemaker] WG: time pressure - software raid cluster, raid1 ressource agent, help needed

2011-03-06 Thread Holger Teutsch
On Sun, 2011-03-06 at 12:40 +0100, patrik.rappo...@knapp.com wrote:
Hi,
assume the basic problem is in your raid configuration.

If you unmap one box the devices should not be in status FAIL but in
degraded.

So what is the exit status of

mdadm --detail --test /dev/md0

after unmapping ?
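
For reference, mdadm(8) documents the exit codes of --detail --test roughly
as follows (please double-check against your mdadm version); a quick check
from a shell could look like this:

mdadm --detail --test /dev/md0
case $? in
    0) echo "array is active and clean" ;;
    1) echo "array is degraded - expected after losing one box" ;;
    2) echo "array is dead / not functional" ;;
    4) echo "error getting information about the device" ;;
esac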

Furthermore I would start with one isolated group containing the
raid, LVM, and FS to keep it simple.

Regards
Holger

  Hy, 
 
 
 does anyone have an idea to that? I only have the servers till next
 week friday, so to my regret I am under time pressure :(
 
 
 
 Like I already wrote, I would appreciate and test any idea of you.
 Also if someone already made clusters with lvm-mirror, I would be
 happy to get a cib or some configuration examples.
 
  
 
 
 
 
 
 Thank you very much in advance.
 
  
 
 
 
 
 
 kr Patrik
 
 
 
 
 
 patrik.rappo...@knapp.com
 03.03.2011 15:11Bitte antworten anThe Pacemaker cluster resource
 manager
 
 An   pacemaker@oss.clusterlabs.org
 Kopie   
 Blindkopie   
 Thema   [Pacemaker] software raid cluster, raid1 ressource agent,help
 needed
 
 
 Good Day, 
 
 I have a 2 node active/passive cluster which is connected to two  ibm
 4700 storages. I configured 3 raids and I use the Raid1 ressource
 agent for managing the Raid1s in the cluster. 
 When I now disable the mapping of one storage, to simulate the fail of
 one storage, the Raid1 Ressources change to the State FAILED and the
 second node then takes over the ressources and is able to start the
 raid devices. 
 
 So I am confused, why the active node can't keep the raid1 ressources
 and the former passive node takes them over and can start them
 correct. 
 
 I would really appreciate your advice, or maybe someone already has a
 example configuration for Raid1 with two storages.
 
 Thank you very much in advance. Attached you can find my cib.xml. 
 
 kr Patrik 
 
 
 
 Mit freundlichen Grüßen / Best Regards
 
 Patrik Rapposch, BSc
 System Administration
 
 KNAPP Systemintegration GmbH
 Waltenbachstraße 9
 8700 Leoben, Austria 
 Phone: +43 3842 805-915
 Fax: +43 3842 805-500
 patrik.rappo...@knapp.com 
 www.KNAPP.com 
 
 Commercial register number: FN 138870x
 Commercial register court: Leoben
 
 The information in this e-mail (including any attachment) is
 confidential and intended to be for the use of the addressee(s) only.
 If you have received the e-mail by mistake, any disclosure, copy,
 distribution or use of the contents of the e-mail is prohibited, and
 you must delete the e-mail from your system. As e-mail can be changed
 electronically KNAPP assumes no responsibility for any alteration to
 this e-mail or its attachments. KNAPP has taken every reasonable
 precaution to ensure that any attachment to this e-mail has been swept
 for virus. However, KNAPP does not accept any liability for damage
 sustained as a result of such attachment being virus infected and
 strongly recommend that you carry out your own virus check before
 opening any attachment.
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started:
 http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-04 Thread Holger Teutsch
On Thu, 2011-03-03 at 10:55 +0100, Florian Haas wrote:
 On 2011-03-03 10:43, Holger Teutsch wrote:
  Hi,
  I submit a patch for
  bugzilla 2541: Shell should warn if parameter uniqueness is violated
  for discussion.
 
 I'll leave it do Dejan to review the code, but I love the functionality.
 Thanks a lot for tackling this. My only suggestion for an improvement is
 to make the warning message a bit more terse, as in:
 
 WARNING: Resources ip1a, ip1b violate uniqueness for parameter ip:
 1.2.3.4
 

Florian,
I see your point. Although my formatting allows for an unlimited number
of collisions ( 8-) ) in real life we will only have 2 or 3. Will change
this together with Dejan's hints.

 Cheers,
 Florian
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-03 Thread Holger Teutsch
Hi,
I submit a patch for
bugzilla 2541: Shell should warn if parameter uniqueness is violated
for discussion.

devel1:~ # crm configure
crm(live)configure# primitive ip1a ocf:heartbeat:IPaddr2 params ip=1.2.3.4 
meta target-role=stopped
crm(live)configure# primitive ip1b ocf:heartbeat:IPaddr2 params ip=1.2.3.4 
meta target-role=stopped
crm(live)configure# primitive ip2a ocf:heartbeat:IPaddr2 params ip=1.2.3.5 
meta target-role=stopped
crm(live)configure# primitive ip2b ocf:heartbeat:IPaddr2 params ip=1.2.3.5 
meta target-role=stopped
crm(live)configure# primitive ip3 ocf:heartbeat:IPaddr2 params ip=1.2.3.6 
meta target-role=stopped
crm(live)configure# primitive dummy_1 ocf:heartbeat:Dummy params fake=abc 
meta target-role=stopped
crm(live)configure# primitive dummy_2 ocf:heartbeat:Dummy params fake=abc 
meta target-role=stopped
crm(live)configure# primitive dummy_3 ocf:heartbeat:Dummy meta 
target-role=stopped
crm(live)configure# commit
WARNING: Violations of instance parameters with attribute unique detected:
  Agent ocf:heartbeat:IPaddr2 parameter ip value 1.2.3.4 in resources
  ip1a
  ip1b

  Agent ocf:heartbeat:IPaddr2 parameter ip value 1.2.3.5 in resources
  ip2a
  ip2b

Do you still want to commit? n
crm(live)configure#

The code now lives in ui.py. I'm not sure whether it should be considered as
more CIB-related and be moved to some other module.

Regards
Holger

diff -r cf4e9febed8e -r 810c5ea83873 shell/modules/ui.py.in
--- a/shell/modules/ui.py.in	Wed Feb 23 14:52:34 2011 +0100
+++ b/shell/modules/ui.py.in	Thu Mar 03 10:24:51 2011 +0100
@@ -1509,6 +1509,60 @@
             return False
         set_obj = mkset_obj("xml")
         return ptestlike(set_obj.ptest,'vv',cmd,*args)
+
+    def __check_unique_clash(self):
+        'Check whether resource parameters with attribute unique clash'
+
+        def process_primitive(prim, clash_dict):
+            '''
+            Update dict clash_dict with
+            (ra_class, ra_provider, ra_type, name, value) -> [ resourcename ]
+            if parameter name should be unique
+            '''
+            ra_class = prim.getAttribute("class")
+            ra_provider = prim.getAttribute("provider")
+            ra_type = prim.getAttribute("type")
+            ra_id = prim.getAttribute("id")
+
+            ra = RAInfo(ra_class, ra_type, ra_provider)
+            if ra == None:
+                return
+            ra_params = ra.params()
+
+            attributes = prim.getElementsByTagName("instance_attributes")
+            if len(attributes) == 0:
+                return
+
+            for p in attributes[0].getElementsByTagName("nvpair"):
+                name = p.getAttribute("name")
+                if ra_params[ name ]['unique'] == '1':
+                    value = p.getAttribute("value")
+                    k = (ra_class, ra_provider, ra_type, name, value)
+                    try:
+                        clash_dict[k].append(ra_id)
+                    except:
+                        clash_dict[k] = [ra_id]
+            return
+
+        clash_dict = {}
+        for p in cib_factory.mkobj_list("xml","type:primitive"):
+            process_primitive(p.node, clash_dict)
+
+        clash_msg = []
+        for param, resources in clash_dict.items():
+            if len(resources) > 1:
+                tag = ':'.join(param[:3])
+                clash_msg.append('  Agent %s parameter %s value %s in resources'\
+                                 %(tag, param[3], param[4]))
+                for r in sorted(resources):
+                    clash_msg.append("  %s"%r)
+                clash_msg.append("")
+
+        if len(clash_msg) > 0:
+            common_warning("Violations of instance parameters with attribute unique detected:")
+            print "\n".join(clash_msg)
+            return 0
+        return 1
     def commit(self,cmd,force = None):
         "usage: commit [force]"
         if force and force != "force":
@@ -1523,7 +1577,8 @@
         rc1 = cib_factory.is_current_cib_equal()
         rc2 = cib_factory.is_cib_empty() or \
             self._verify(mkset_obj("xml","changed"),mkset_obj("xml"))
-        if rc1 and rc2:
+        rc3 = self.__check_unique_clash()
+        if rc1 and rc2 and rc3:
             return cib_factory.commit()
         if force or user_prefs.get_force():
             common_info("commit forced")
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] rpm packages in Pacemaker's rpm-next do not install

2011-02-17 Thread Holger Teutsch
At least for Opensuse 11.3 x86_64. Maybe the title of my previous mail
was misleading.
-h
On Tue, 2011-02-15 at 17:57 +0100, Holger Teutsch wrote:
 Hi,
 the packages from rpm-next(64bit) for opensuse 11.3 do not install there (at 
 least true for 1.1.4 and 1.1.5).
 
 The plugin is in
 ./usr/lib/lcrso/pacemaker.lcrso
 
 but should be in
 ./usr/lib64/lcrso/pacemaker.lcrso
 
 I think the patch below (borrowed from the 'official' packages) cures.
 Regards
 Holger
 
 
 diff -r 43a11c0daae4 pacemaker.spec
 --- a/pacemaker.spec  Mon Feb 14 15:25:13 2011 +0100
 +++ b/pacemaker.spec  Tue Feb 15 17:50:27 2011 +0100
 @@ -1,3 +1,7 @@
 +%if 0%{?suse_version}
 +%define _libexecdir %{_libdir}
 +%endif
 +
  %global gname haclient
  %global uname hacluster
  %global pcmk_docdir %{_docdir}/%{name}
 
 
 
 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker
 
 Project Home: http://www.clusterlabs.org
 Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs: 
 http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Packages for Opensuse 11.3 don't build / install

2011-02-15 Thread Holger Teutsch
Hi,
the packages from rpm-next(64bit) for opensuse 11.3 do not install there (at 
least true for 1.1.4 and 1.1.5).

The plugin is in
./usr/lib/lcrso/pacemaker.lcrso

but should be in
./usr/lib64/lcrso/pacemaker.lcrso

I think the patch below (borrowed from the 'official' packages) cures.
Regards
Holger


diff -r 43a11c0daae4 pacemaker.spec
--- a/pacemaker.specMon Feb 14 15:25:13 2011 +0100
+++ b/pacemaker.specTue Feb 15 17:50:27 2011 +0100
@@ -1,3 +1,7 @@
+%if 0%{?suse_version}
+%define _libexecdir %{_libdir}
+%endif
+
 %global gname haclient
 %global uname hacluster
 %global pcmk_docdir %{_docdir}/%{name}



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Howto write a STONITH agent

2011-01-15 Thread Holger Teutsch
On Fri, 2011-01-14 at 17:10 +0100, Christoph Herrmann wrote:
 -Ursprüngliche Nachricht-
 Von: Dejan Muhamedagic deja...@fastmail.fm
 Gesendet: Fr 14.01.2011 12:31
 An: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org; 
 Betreff: Re: [Pacemaker] Howto write a STONITH agent
 
  Hi,
  
  On Thu, Jan 13, 2011 at 09:09:38PM +0100, Christoph Herrmann wrote:
   Hi,
   
   I have some brand new HP Blades with ILO Boards (iLO 2 Standard Blade 
   Edition 
  1.81 ...)
   But I'm not able to connect with them via the external/riloe agent.
   When i try:
   
   stonith -t external/riloe -p hostlist=node1 ilo_hostname=ilo1  
  ilo_user=ilouser ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 
  ilo_powerdown_method=power -S
  
  Try this:
  
  stonith -t external/riloe hostlist=node1 ilo_hostname=ilo1  
  ilo_user=ilouser 
  ilo_password=ilopass ilo_can_reset=1 ilo_protocol=2.0 
  ilo_powerdown_method=power -S
 
 that's much better (looks like PEBKAC ;-), thanks! But it is not reliable. 
 I've tested it about 10 times
 and 5 times it hangs.  That's not what I want.
I had the same experience. Ilo is _extremely_ slow and unreliable.

Go for external/ipmi.

That works very fast and reliable. It is available with ILO 2.x
firmware.
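
A test call along the lines of the riloe command above might look roughly
like this (the parameter names are from memory of the external/ipmi
metadata -- list them first with "stonith -t external/ipmi -n" and adjust):

stonith -t external/ipmi -n
stonith -t external/ipmi hostname=node1 ipaddr=node1-ipmi userid=ipmiuser \
    passwd=ipmipass interface=lan -S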

- holger


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Split-site cluster in two locations

2011-01-11 Thread Holger Teutsch
On Tue, 2011-01-11 at 10:21 +0100, Christoph Herrmann wrote:
 -Ursprüngliche Nachricht-
 Von: Andrew Beekhof and...@beekhof.net
 Gesendet: Di 11.01.2011 09:01
 An: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org; 
 CC: Michael Schwartzkopff mi...@clusterbau.com; 
 Betreff: Re: [Pacemaker] Split-site cluster in two locations
 
  On Tue, Dec 28, 2010 at 10:21 PM, Anton Altaparmakov ai...@cam.ac.uk 
  wrote:
   Hi,
  
   On 28 Dec 2010, at 20:32, Michael Schwartzkopff wrote:
   Hi,
  
   I have four nodes in a split site scenario located in two computing 
   centers.
   STONITH is enabled.
  
   Is there and best practise how to deal with this setup? Does it make 
   sense to
   set expected-quorum-votes to 3 to make the whole setup still running 
   with
   one data center online? Is this possible at all?
  
   Is quorum needed with STONITH enabled?
  
   Is there a quorum server available already?
  
   I couldn't see a quorum server in Pacemaker so I have installed a third 
   dummy 
  node which is not allowed to run any resources (using location constraints 
  and 
  setting the cluster to not be symmetric) which just acts as a third vote.  
  I am 
  hoping this effectively acts as a quorum server as a node that looses 
  connectivity will lose quorum and shut down its services whilst the other 
  real 
  node will retain connectivity and thus quorum due to the dummy node still 
  being 
  present.
  
   Obviously this is quite wasteful of servers as you can only run a single 
  Pacemaker instance on a server (as far as I know) so that is a lot of dummy 
  servers when you run multiple pacemaker clusters...  Solution for us is to 
  use 
  virtualization - one physical server with VMs and each VM is a dummy node 
  for a 
  cluster...
  
  With recent 1.1.x builds it should be possible to run just the
  corosync piece (no pacemaker).
  
 
 As long as you have only two computing centers it doesn't matter if you run a 
 corosync
 only piece or whatever on a physical or a virtual machine. The question is: 
 How to
 configure a four node (or six node, an even number bigger than two) 
 corosync/pacemaker
 cluster to continue services if you have a blackout in one computing center 
 (you will
 always loose (at least) one half of your nodes), but to shutdown everything 
 if you have
 less then half of the node available. Are there any best practices on how to 
 deal with
 clusters in two computing centers? Anything like an external quorum node or a 
 quorum
 partition? I'd like to set the expected-quorum-votes to 3 but this is not 
 possible
 (with corosync-1.2.6 and pacemaker-1.1.2 on SLES11 SP1) Does anybody know why?
 Currently, the only way I can figure out is to run the cluster with 
 no-quorum-policy=ignore. But I don't like that. Any suggestions?
 
 
 Best regards
 
   Christoph

Hi,
I assume the only solution is to work with manual intervention, i.e. the
stonith meatware module.
Whenever a site goes down a human being has to confirm that it is lost,
pull the power cords or the inter-site links so it will not come back
unintentionally.

Then confirm with meatclient on the healthy site that the no longer
reachable site can be considered gone.

In theory this can be configured with an additional meatware stonith
resource with lower priority. The intention is to let your regular
stonith resources do the work with meatware as last resort.
However, I was not able to get this running with the versions packaged with
SLES11 SP1: the priority was not honored and a lot of zombie meatware
processes were left over.
I found some patches in the upstream repositories that seem to address
these problems but I didn't follow up.
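
For the record, the last-resort piece I had in mind looks roughly like this
(sketch only, untested for the reasons above; host names are placeholders,
and the exact semantics of the stonith "priority" meta attribute should be
checked against the documentation -- the intent is just to rank meatware
behind the regular stonith resources):

primitive st-meat stonith:meatware \
        params hostlist="nodea nodeb" \
        meta priority="1"

# once a human has confirmed that the unreachable site is really gone:
meatclient -c nodea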

Regards
Holger


 


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Fail-count and failure timeout

2010-10-05 Thread Holger . Teutsch
The resource failed when the sleep expired, i.e. every 600 secs.
Now I changed the resource to

sleep 7200, failure-timeout 3600

i.e. to values far beyond the recheck-interval of 15m.

Now everything behaves as expected.
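
In crm shell terms the adjusted test setup corresponds roughly to this
(sketch; same resource as in the cib quoted below, just with the new
values, and the default recheck interval spelled out for clarity):

property cluster-recheck-interval="15min"
primitive test ocf:heartbeat:anything \
        params binfile="sleep 7200" \
        meta failure-timeout="3600" target-role="started" \
        op monitor interval="10" timeout="20s" on-fail="restart"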
 
Mit freundlichen Grüßen / Kind regards 

Holger Teutsch 





From:   Andrew Beekhof and...@beekhof.net
To: The Pacemaker cluster resource manager 
pacemaker@oss.clusterlabs.org
Date:   05.10.2010 11:09
Subject:Re: [Pacemaker] Fail-count and failure timeout



On Tue, Oct 5, 2010 at 11:07 AM, Andrew Beekhof and...@beekhof.net 
wrote:
 On Fri, Oct 1, 2010 at 3:40 PM,  holger.teut...@fresenius-netcare.com 
wrote:
 Hi,
 I observed the following in pacemaker Versions 1.1.3 and tip up to 
patch
 10258.

 In a small test environment to study fail-count behavior I have one 
resource

 anything
 doing sleep 600 with monitoring interval 10 secs.

 The failure-timeout is 300.

 I would expect to never see a failcount higher than 1.

 Why?

 The fail-count is only reset when the PE runs... which is on a failure
 and/or after the cluster-recheck-interval
 So I'd expect a maximum of two.

Actually this is wrong.
There is no maximum, because there needs to have been 300s since the
last failure when the PE runs.
And since it only runs when the resource fails, it is never reset.


   cluster-recheck-interval = time [15min]
  Polling interval for time based changes to options,
 resource parameters and constraints.

  The Cluster is primarily event driven, however the
 configuration can have elements that change based on time. To ensure
 these changes take effect, we can optionally poll  the  cluster’s
  status for changes. Allowed values: Zero disables
 polling. Positive values are an interval in seconds (unless other SI
 units are specified. eg. 5min)




 I observed some sporadic clears but mostly the count is increasing by 1 
each
 10 minutes.

 Am I mistaken or is this a bug ?

 Hard to say without logs.  What value did it reach?


 Regards
 Holger

 -- complete cib for reference ---

 <cib epoch="32" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.4" have-quorum="0" cib-last-written="Fri Oct  1 14:17:31 2010" dc-uuid="hotlx">
   <configuration>
     <crm_config>
       <cluster_property_set id="cib-bootstrap-options">
         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.3-09640bd6069e677d5eed65203a6056d9bf562e67"/>
         <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
         <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
         <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
         <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
         <nvpair id="cib-bootstrap-options-start-failure-is-fatal" name="start-failure-is-fatal" value="false"/>
         <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1285926879"/>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node id="hotlx" uname="hotlx" type="normal"/>
     </nodes>
     <resources>
       <primitive class="ocf" id="test" provider="heartbeat" type="anything">
         <meta_attributes id="test-meta_attributes">
           <nvpair id="test-meta_attributes-target-role" name="target-role" value="started"/>
           <nvpair id="test-meta_attributes-failure-timeout" name="failure-timeout" value="300"/>
         </meta_attributes>
         <operations id="test-operations">
           <op id="test-op-monitor-10" interval="10" name="monitor" on-fail="restart" timeout="20s"/>
           <op id="test-op-start-0" interval="0" name="start" on-fail="restart" timeout="20s"/>
         </operations>
         <instance_attributes id="test-instance_attributes">
           <nvpair id="test-instance_attributes-binfile" name="binfile" value="sleep 600"/>
         </instance_attributes>
       </primitive>
     </resources>
     <constraints/>
   </configuration>
 </cib>

 ___
 Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker

 Project Home: http://www.clusterlabs.org
 Getting started: 
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
 Bugs:
 
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: 
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http

Re: [Pacemaker] Dependency on either of two resources

2010-10-04 Thread Holger . Teutsch
Hi,
a similar or related use case that we tried to solve without success:
- a stretch cluster with two disk boxes
- a LUN on each disk box guarded by an individual SFEX
- a mirror (raid1 or clvm) that survives an outage of one disk box
- the mirror should be started if at least one SFEX can be obtained and 
the other one could not be obtained on a different node

IMHO sdb is not an alternative as this introduces a SPOF.
 
Mit freundlichen Grüßen / Kind regards 

Holger Teutsch 





From:   Vladislav Bogdanov bub...@hoster-ok.com
To: The Pacemaker cluster resource manager 
pacemaker@oss.clusterlabs.org
Date:   04.10.2010 06:33
Subject:[Pacemaker] Dependency on either of two resources



Hi all,

just wondering, is there a way to make resource depend on (be colocated
with) either of two other resources?

Use case is iSCSI initiator connection to iSCSI target with two portals.
Idea is to have f.e. device manager multipath resource depend on both
iSCSI connection resources, but in a soft way, so fail of any single
iSCSI connection will not cause multipath resource to stop, but fail of
both connections will cause it.

I should be missing something but I cannot find answer is it possible
with current pacemaker. Can someone bring some light?

Best,
Vladislav

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: 
http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Fail-count and failure timeout

2010-10-01 Thread Holger . Teutsch
Hi,
I observed the following in pacemaker Versions 1.1.3 and tip up to patch 
10258.

In a small test environment to study fail-count behavior I have one 
resource

anything
doing sleep 600 with monitoring interval 10 secs.

The failure-timeout is 300.

I would expect to never see a failcount higher than 1.

I observed some sporadic clears but mostly the count is increasing by 1 
each 10 minutes. 

Am I mistaken or is this a bug ?

Regards
Holger

-- complete cib for reference ---

<cib epoch="32" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.4" have-quorum="0" cib-last-written="Fri Oct  1 14:17:31 2010" dc-uuid="hotlx">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.3-09640bd6069e677d5eed65203a6056d9bf562e67"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
        <nvpair id="cib-bootstrap-options-start-failure-is-fatal" name="start-failure-is-fatal" value="false"/>
        <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1285926879"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="hotlx" uname="hotlx" type="normal"/>
    </nodes>
    <resources>
      <primitive class="ocf" id="test" provider="heartbeat" type="anything">
        <meta_attributes id="test-meta_attributes">
          <nvpair id="test-meta_attributes-target-role" name="target-role" value="started"/>
          <nvpair id="test-meta_attributes-failure-timeout" name="failure-timeout" value="300"/>
        </meta_attributes>
        <operations id="test-operations">
          <op id="test-op-monitor-10" interval="10" name="monitor" on-fail="restart" timeout="20s"/>
          <op id="test-op-start-0" interval="0" name="start" on-fail="restart" timeout="20s"/>
        </operations>
        <instance_attributes id="test-instance_attributes">
          <nvpair id="test-instance_attributes-binfile" name="binfile" value="sleep 600"/>
        </instance_attributes>
      </primitive>
    </resources>
    <constraints/>
  </configuration>
</cib>
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker