Andrew Beekhof wrote:
> You might want to check out Martin's packages.
> If I understood correctly, he's built the version of clvm used by SUSE
> (which we know works) against 0.80.5
>
> Look for his email with the subject "lvm2-clvm RPMs in opensuse.org
> package repo?"

Thanks! I installed Martin's packages. Here's what I have:

pacemaker-openais      1.0.3+svn20090522-2~bpo50+1
clvm-openais           2.02.44-4~bpo50+1
libopenais-legacy-2    0.80.5+svn20090522-2~bpo50+1
openais-legacy         0.80.5+svn20090522-2~bpo50+1
heartbeat-common       2.99.2+sles11r9-3~bpo50+1
libheartbeat2          2.99.2+sles11r9-3~bpo50+1

Now, soon after I start clvmd, aisexec dies with a segv (in openais_conn_private_data_get).

On my 3-node test cluster, I start openais on all nodes, then I start clvmd on one of the nodes. Not long after, aisexec dies on the other nodes.

Here are the last messages logged by aisexec:

May 28 16:19:04.924914 [TOTEM] entering GATHER state from 11.
May 28 16:19:05.079052 [TOTEM] Saving state aru 20 high seq received 20
May 28 16:19:05.079094 [TOTEM] Storing new sequence id for ring 298
May 28 16:19:05.079155 [TOTEM] entering COMMIT state.
May 28 16:19:05.079500 [TOTEM] entering RECOVERY state.
May 28 16:19:05.079558 [TOTEM] position [0] member 142.135.16.107:
May 28 16:19:05.079571 [TOTEM] previous ring seq 660 rep 142.135.16.107
May 28 16:19:05.079578 [TOTEM] aru a high delivered a received flag 1
May 28 16:19:05.079587 [TOTEM] position [1] member 142.135.16.109:
May 28 16:19:05.079594 [TOTEM] previous ring seq 660 rep 142.135.16.109
May 28 16:19:05.079612 [TOTEM] aru 20 high delivered 20 received flag 1
May 28 16:19:05.079627 [TOTEM] Did not need to originate any messages in recovery.
May 28 16:19:05.080669 [CLM ] CLM CONFIGURATION CHANGE
May 28 16:19:05.080711 [CLM ] New Configuration:
May 28 16:19:05.080724 [CLM ] r(0) ip(142.135.16.109)
May 28 16:19:05.080733 [CLM ] Members Left:
May 28 16:19:05.080774 [CLM ] Members Joined:
May 28 16:19:05.080790 [crm ] notice: pcmk_peer_update: Transitional membership event on ring 664: memb=1, new=0, lost=0
May 28 16:19:05.080805 [crm ] info: pcmk_peer_update: memb: lab09 1829799822
May 28 16:19:05.080843 [CLM ] CLM CONFIGURATION CHANGE
May 28 16:19:05.080855 [CLM ] New Configuration:
May 28 16:19:05.080865 [CLM ] r(0) ip(142.135.16.107)
May 28 16:19:05.080901 [CLM ] r(0) ip(142.135.16.109)
May 28 16:19:05.080914 [CLM ] Members Left:
May 28 16:19:05.080923 [CLM ] Members Joined:
May 28 16:19:05.080938 [CLM ] r(0) ip(142.135.16.107)
May 28 16:19:05.080972 [crm ] notice: pcmk_peer_update: Stable membership event on ring 664: memb=2, new=1, lost=0
May 28 16:19:05.080985 [MAIN ] info: update_member: Node 1796245390/lab07 is now: member
May 28 16:19:05.081001 [crm ] info: pcmk_peer_update: NEW: lab07 1796245390
May 28 16:19:05.081036 [crm ] info: pcmk_peer_update: MEMB: lab07 1796245390
May 28 16:19:05.081044 [crm ] info: pcmk_peer_update: MEMB: lab09 1829799822
May 28 16:19:05.081063 [crm ] info: send_member_notification: Sending membership update 664 to 2 children
May 28 16:19:05.081118 [SYNC ] This node is within the primary component and will provide service.
May 28 16:19:05.081144 [TOTEM] entering OPERATIONAL state.
May 28 16:19:05.082382 [MAIN ] info: update_member: 0x7f1188002510 Node 1796245390 (lab07) born on: 664
May 28 16:19:05.082416 [crm ] info: send_member_notification: Sending membership update 664 to 2 children
May 28 16:19:05.082757 [CLM ] got nodejoin message 142.135.16.107
May 28 16:19:05.082832 [CLM ] got nodejoin message 142.135.16.109
May 28 16:19:05.087292 [CPG ] got joinlist message from node 1829799822

Then it crashes. Martin (or anybody), have you seen this?

I attached my openais.conf file. Maybe I'm doing something stupid in there?

Alain

-- 
Alain St-Denis
Supercomputing, Systems and Storage / Superinformatique, systèmes et stockage,
High Performance Computing Support / Soutien aux calculs en haute performance
Chief Information Officer Branch / Direction Générale du dirigeant principal de l'information
Environment Canada / Environnement Canada
Tel: +1 514 421 4697

# Please read the openais.conf.5 manual page

aisexec {
        # Run as root - this is necessary to be able to manage resources with Pacemaker
        user: root
        group: root
}

service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver: 0
}

totem {
        version: 2

        # How long before declaring a token lost (ms)
        token: 10000

        # How many token retransmits before forming a new configuration
        token_retransmits_before_loss_const: 20

        # How long to wait for join messages in the membership protocol (ms)
        join: 60

        # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
        consensus: 4800

        # Turn off the virtual synchrony filter
        vsftype: none

        # Number of messages that may be sent by one processor on receipt of the token
        max_messages: 20

        # Limit generated nodeids to 31-bits (positive signed integers)
        clear_node_high_bit: yes

        # Disable encryption
        secauth: off

        # How many threads to use for encryption/decryption
        threads: 0

        # Optionally assign a fixed node id (integer)
        # nodeid: 1234

        interface {
                ringnumber: 0

                # The following values need to be set based on your environment
                bindnetaddr: 142.135.16.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

logging {
        debug: on
        fileline: off
        to_syslog: yes
        to_stderr: yes
        syslog_facility: daemon
        timestamp: on
}

amf {
        mode: disabled
}

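A side note on the node IDs in the log, since the config leaves `nodeid` commented out and sets `clear_node_high_bit: yes`: the IDs reported for lab07 and lab09 match their ring 0 IPv4 addresses read as 32-bit little-endian integers with the top bit masked off. The sketch below only reproduces that arithmetic from the values in this message. The function name `nodeid_from_ipv4` is hypothetical, the little-endian reading is an assumption inferred from the numbers (not taken from the openais source), and it has no bearing on the crash itself.

```python
import ipaddress


def nodeid_from_ipv4(addr: str, clear_high_bit: bool = True) -> int:
    """Hypothetical helper: derive a node id from an IPv4 address.

    Assumption: the 4 address bytes (network order) are read as a
    little-endian 32-bit integer, as the log values suggest.
    """
    packed = ipaddress.IPv4Address(addr).packed  # 4 bytes, network order
    nodeid = int.from_bytes(packed, "little")
    if clear_high_bit:
        # Mirrors the intent of clear_node_high_bit: keep the id within 31 bits
        nodeid &= 0x7FFFFFFF
    return nodeid


if __name__ == "__main__":
    # Addresses and host names as they appear in the aisexec log
    for name, addr in (("lab07", "142.135.16.107"), ("lab09", "142.135.16.109")):
        print(name, addr, nodeid_from_ipv4(addr))
```

Comparing the printed values with the `pcmk_peer_update` lines is a quick way to check which address each numeric ID in the log refers to.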
_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker