Rosser,
actually, the problem is much simpler to solve than you would expect.
If you look closely at your config, you will see:
> member {
>     mamberaddr: 10.198.156.47
       ^
> }
> member {
>     memberaddr: 10.198.156.48
> }
You have mAmberaddr instead of mEmberaddr ;)
We check for this in 2.x, but not in flatiron. I will see whether there
is some non-intrusive way to add this kind of check there too.
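
In the meantime, the fix on your side is just the one-letter rename. For
node H's file the interface block would then read as follows (node I's
file needs the same one-letter fix in its first member block):

    interface {
        member {
            memberaddr: 10.198.156.47
        }
        member {
            memberaddr: 10.198.156.48
        }

        ringnumber: 0
        bindnetaddr: 10.198.156.0
        mcastport: 5405
        ttl: 1
    }

If you want to double-check what corosync actually parsed, running
corosync-objctl on a running node dumps the object database; grepping
its output for "member" should show whether both memberaddr keys made
it in.
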
Honza
ROSSER Brad (SPARQ) wrote:
> Hi. I'm having very basic problems getting Corosync to work for a 2-node
> cluster using unicast UDP transport.
>
> I'm running Red Hat Enterprise Linux 6.2 on the two nodes, with Red Hat's
> Pacemaker 1.1.6-3 and Corosync 1.4.1-4 RPM packages installed.
>
> I followed the steps in the (excellent) 'Pacemaker 1.1 - Clusters from
> Scratch' manual and got to the end of Chapter 5, which deals with testing
> failover of a simple virtual IP resource from one node to the other upon
> shutting down Corosync & Pacemaker on the first node. At that point I
> stopped and repeated the exercise with the transport changed from the
> default UDP multicast transport to 'udpu', as I will be forced to use
> unicast UDP in my final configuration. I modified my corosync.conf files on
> the two nodes and I am having significant problems.
>
> Both test nodes are KVM virtual machines sitting 'side by side' on the same
> hypervisor. Node H has the address 10.198.156.47; node I has address
> 10.198.156.48. I've appended the corosync.conf files to the end of this
> message. The Pacemaker 'pcmk' plugin is set to version '1'.
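
(Side note for anyone following along: with the plugin at version 1,
corosync only loads the plugin, and pacemakerd is started separately by
its own init script. Per Clusters from Scratch, the block that sets this
is presumably along these lines, either in corosync.conf itself or in a
file under /etc/corosync/service.d/:

    service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver: 1
    }
)
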
>
> My first problem is that the cluster won't start until *both* nodes are
> started. When I start the corosync service on node H it goes through the
> regular startup sequence in the log file with no problems but then - instead
> of forming a one-node cluster and establishing the virtual IP resource
> (no-quorum-policy is set to "ignore") - it goes into a loop, producing these
> messages every two seconds:
>
> corosync [TOTEM ] Totem is unable to form a cluster because of an operating
> system or network fault. The most common cause of this message is that the
> local firewall is configured improperly.
>
> When I start the pacemaker service only the 'pacemakerd' daemon starts; it
> doesn't fire off the various heartbeat processes (including crmd).
>
> It's only when I start corosync and pacemaker on the second node, node I,
> that both machines form a cluster, the heartbeat processes start on both, and
> the virtual IP resource is configured on node H.
>
> So that's my first problem: when a single node is started cold, the cluster
> doesn't form as it does with multicast. The first node goes into its error
> message loop until the second node is also started.
>
> The problem is compounded further by what happens when I test a shutdown of
> the second node, machine I. When pacemaker and corosync are shut down on
> node I, node H again starts printing the same 'network fault' message every
> two seconds. However the pacemaker/heartbeat processes remain up on H and
> crm_mon correctly reports that node I is offline, etc.
>
> BUT ... when I restart corosync and pacemaker once more on node I immediately
> afterwards, *both* nodes go into the 'network fault' loop, with node I just
> like H at the start (no heartbeat processes spawned by pacemakerd) and
> crm_mon still reporting node I as offline. The pacemakerd process ultimately
> exits on node I about 9 minutes later saying:
>
> pacemakerd: [15668]: ERROR: main: Couldn't connect to Corosync's CFG service
>
> There seems to be something quite wrong here -
>
> 1. The first node won't form a cluster of its own; it only starts the
> heartbeat process once the second node starts up;
>
> 2. Both nodes fail to form a cluster when one of the nodes is stopped and
> restarted.
>
> The 'no-quorum-policy' is set to 'ignore' (as per chapter 5).
>
> There are no firewall (iptables) rules at all on either of the VM nodes.
>
> UDP was tested as working fine between the two nodes (with netcat, nc).
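
(For the record, a test along these lines is what I would expect here,
using the addresses and totem port from the config below; the exact nc
flags depend on which netcat flavor RHEL 6 ships, and the node-h$/node-i$
prompts are just illustrative:

    node-i$ nc -u -l 5405            # traditional netcat wants: nc -u -l -p 5405
    node-h$ echo ping | nc -u 10.198.156.48 5405

If "ping" shows up on the listener, plain UDP on the totem port is fine
between the nodes.)
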
>
> And if I put back the original multicast corosync.conf files on both nodes
> everything goes back to working as expected. So no other files or settings
> seem to be involved.
>
> (As an aside I'll mention that I couldn't get node I to work at all - it
> produced a group of pcmk error messages every few seconds - until I modified
> its corosync.conf to have itself, node I, as the first 'member' in the
> totem.interface block. It wouldn't work at all when listed second.)
>
> Can anyone help? I won't be able to use multicast in my final production
> configuration, so I desperately need corosync to be able to work properly
> with udpu. I'd be most grateful for any assistance.
>
> Thanks,
>
>
> Brad
>
>
> This is the output of 'crm configure show':
>
> node node_h
> node node_i
> primitive ClusterIP ocf:heartbeat:IPaddr2 \
>     params ip="10.198.156.50" cidr_netmask="32" \
>     op monitor interval="30s"
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore"
>
>
> This is the corosync.conf of node 'H':
>
> compatibility: whitetank
>
> totem {
>     version: 2
>     secauth: off
>     threads: 0
>
>     interface {
>         member {
>             mamberaddr: 10.198.156.47
>         }
>         member {
>             memberaddr: 10.198.156.48
>         }
>
>         ringnumber: 0
>         bindnetaddr: 10.198.156.0
>         mcastport: 5405
>         ttl: 1
>     }
>
>     transport: udpu
> }
>
> logging {
>     fileline: off
>     to_stderr: no
>     to_logfile: yes
>     to_syslog: yes
>     logfile: /var/log/cluster/corosync.log
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: AMF
>         debug: off
>     }
> }
>
> amf {
>     mode: disabled
> }
>
>
> This is the file for node I:
>
> compatibility: whitetank
>
> totem {
>     version: 2
>     secauth: off
>     threads: 0
>
>     interface {
>         member {
>             mamberaddr: 10.198.156.48
>         }
>         member {
>             memberaddr: 10.198.156.47
>         }
>
>         ringnumber: 0
>         bindnetaddr: 10.198.156.0
>         mcastport: 5405
>         ttl: 1
>     }
>
>     transport: udpu
> }
>
> logging {
>     fileline: off
>     to_stderr: no
>     to_logfile: yes
>     to_syslog: yes
>     logfile: /var/log/cluster/corosync.log
>     debug: off
>     timestamp: on
>     logger_subsys {
>         subsys: AMF
>         debug: off
>     }
> }
>
> amf {
>     mode: disabled
> }
>
>
> Thanks again.
>
>
_______________________________________________
Openais mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/openais