Rosser,
actually, the problem is much simpler to solve than you would expect.
If you look closely at your config, you will see:
> member {
>     mamberaddr: 10.198.156.47
       ^
> }
> member {
>     memberaddr: 10.198.156.48
> }
You have mAmberaddr instead of mEmberaddr ;)
We check for this in 2.x, but not in flatiron. I will see whether there
is some non-intrusive way to add this kind of check there too.
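
In the meantime, the fix on your side is just the one-letter rename. For
node H's file the interface block would then read as follows (node I's
file needs the same one-letter fix in its first member block):

    interface {
        member {
            memberaddr: 10.198.156.47
        }
        member {
            memberaddr: 10.198.156.48
        }

        ringnumber: 0
        bindnetaddr: 10.198.156.0
        mcastport: 5405
        ttl: 1
    }

If you want to double-check what corosync actually parsed, running
corosync-objctl on a running node dumps the object database; grepping
its output for "member" should show whether both memberaddr keys made
it in.
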
Honza
ROSSER Brad (SPARQ) wrote:
> Hi. I'm having very basic problems getting Corosync to work for a 2-node
> cluster using unicast UDP transport.
>
> I'm running Red Hat Enterprise Linux 6.2 on the two nodes, with Red Hat's
> Pacemaker 1.1.6-3 and Corosync 1.4.1-4 RPM packages installed.
>
> I followed the steps in the (excellent) 'Pacemaker 1.1 - Clusters from
> Scratch' manual and got to the end of Chapter 5, which deals with testing
> failover of a simple virtual IP resource from one node to the other upon
> shutting down Corosync & Pacemaker on the first node. At that point I
> stopped and repeated the exercise with the transport changed from the
> default UDP multicast transport to 'udpu', as I will be forced to use
> unicast UDP in my final configuration. I modified my corosync.conf files on
> the two nodes and I am having significant problems.
>
> Both test nodes are KVM virtual machines sitting 'side by side' on the same
> hypervisor. Node H has the address 10.198.156.47; node I has address
> 10.198.156.48. I've appended the corosync.conf files to the end of this
> message. The Pacemaker 'pcmk' plugin is set to version '1'.
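
(Side note for anyone following along: with the plugin at version 1,
corosync only loads the plugin, and pacemakerd is started separately by
its own init script. Per Clusters from Scratch, the block that sets this
is presumably along these lines, either in corosync.conf itself or in a
file under /etc/corosync/service.d/:

    service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver: 1
    }
)
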
>
> My first problem is that the cluster won't start until *both* nodes are
> started. When I start the corosync service on node H it goes through the
> regular startup sequence in the log file with no problems but then - instead
> of forming a one-node cluster and establishing the virtual IP resource
> (no-quorum-policy is set to "ignore") - it goes into a loop, producing these
> messages every two seconds:
>
> corosync [TOTEM ] Totem is unable to form a cluster because of an operating
> system or network fault. The most common cause of this message is that the
> local firewall is configured improperly.
>
> When I start the pacemaker service only the 'pacemakerd' daemon starts; it
> doesn't fire off the various heartbeat processes (including crmd).
>
> It's only when I start corosync and pacemaker on the second node, node I,
> that both machines form a cluster, the heartbeat processes start on both, and
> the virtual IP resource is configured on node H.
>
> So that's my first problem: when a single node is started cold, the cluster
> doesn't form as it does with multicast. The first node goes into its error
> message loop until the second node is also started.
>
> The problem is compounded further by what happens when I test a shutdown of
> the second node, machine I. When pacemaker and corosync are shut down on
> node I, node H again starts printing the same 'network fault' message every
> two seconds. However the pacemaker/heartbeat processes remain up on H and
> crm_mon correctly reports that node I is offline, etc.
>
> BUT ... when I restart corosync and pacemaker once more on node I immediately
> afterwards, *both* nodes go into the 'network fault' loop, with node I just
> like H at the start (no heartbeat processes spawned by pacemakerd) and
> crm_mon still reporting node I as offline. The pacemakerd process ultimately
> exits on node I about 9 minutes later saying:
>
> pacemakerd: [15668]: ERROR: main: Couldn't connect to Corosync's CFG service
>
> There seems to be something quite wrong here -
>
> 1. The first node won't form a cluster of its own; it only starts the
> heartbeat process once the second node starts up;
>
> 2. Both nodes fail to form a cluster when one of the nodes is stopped and
> restarted.
>
> The 'no-quorum-policy' is set to 'ignore' (as per chapter 5).
>
> There are no firewall (iptables) rules at all on either of the VM nodes.
>
> UDP was tested as working fine between the two nodes (with netcat, nc).
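
(For the record, a test along these lines is what I would expect here,
using the addresses and totem port from the config below; the exact nc
flags depend on which netcat flavor RHEL 6 ships, and the node-h$/node-i$
prompts are just illustrative:

    node-i$ nc -u -l 5405            # traditional netcat wants: nc -u -l -p 5405
    node-h$ echo ping | nc -u 10.198.156.48 5405

If "ping" shows up on the listener, plain UDP on the totem port is fine
between the nodes.)
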
>
> And if I put back the original multicast corosync.conf files on both nodes
> everything goes back to working as expected. So no other files or settings
> seem to be involved.
>
> (As an aside I'll mention that I couldn't get node I to work at all - it
> produced a group of pcmk error messages every few seconds - until I modified
> its corosync.conf to have itself, node I, as the first 'member' in the
> totem.interface block. It wouldn't work at all when listed second.)
>
> Can anyone help? I won't be able to use multicast in my final production
> configuration, so I desperately need corosync to be able to work properly
> with udpu. I'd be most grateful for any assistance.
>
> Thanks,
>
>
> Brad
>
>
> This is the output of 'crm configure show':
>
> node node_h
> node node_i
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip="10.198.156.50" cidr_netmask="32" \
>     op monitor interval="30s"
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore"
>
>
> This is the corosync.conf of node 'H':
>
> compatibility: whitetank
>
> totem {
>     version: 2
>     secauth: off
>     threads: 0
>
>     interface {
>         member {
>             mamberaddr: 10.198.156.47
>         }
>         member {
>             memberaddr: 10.198.156.48
>         }
>
>         ringnumber: 0
>         bindnetaddr: 10.198.156.0
>         mcastport: 5405
>         ttl: 1
>     }
>
>     transport: udpu
> }
>
> logging {
>     fileline: off
>     to_stderr: no
>     to_logfile: yes
>     to_syslog: yes
>     logfile: /var/log/cluster/corosync.log
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: AMF
>         debug: off
>     }
> }
>
> amf {
>     mode: disabled
> }
>
>
> This is the file for node I:
>
> compatibility: whitetank
>
> totem {
>     version: 2
>     secauth: off
>     threads: 0
>
>     interface {
>         member {
>             mamberaddr: 10.198.156.48
>         }
>         member {
>             memberaddr: 10.198.156.47
>         }
>
>         ringnumber: 0
>         bindnetaddr: 10.198.156.0
>         mcastport: 5405
>         ttl: 1
>     }
>
>     transport: udpu
> }
>
> logging {
>     fileline: off
>     to_stderr: no
>     to_logfile: yes
>     to_syslog: yes
>     logfile: /var/log/cluster/corosync.log
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: AMF
>         debug: off
>     }
> }
>
> amf {
>     mode: disabled
> }
>
>
> Thanks again.
>
>
_______________________________________________
Openais mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/openais