Re: [ClusterLabs] Users Digest, Vol 15, Issue 18

2016-04-13 Thread dinor geler
Thank you Digimer, I will try it on CentOS 7.
Thanks again for the help.


On Wed, Apr 13, 2016 at 1:00 PM,  wrote:

> Send Users mailing list submissions to
> users@clusterlabs.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://clusterlabs.org/mailman/listinfo/users
> or, via email, send a message with subject or body 'help' to
> users-requ...@clusterlabs.org
>
> You can reach the person managing the list at
> users-ow...@clusterlabs.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Users digest..."
>
>
> Today's Topics:
>
>1. Re: Totem is unable to form a cluster because of an operating
>   system or network fault (Digimer)
>2. HA meetup at OpenStack Summit (Ken Gaillot)
>3. Re: HA meetup at OpenStack Summit (Digimer)
>4. Re: Totem is unable to form a cluster because of an operating
>   system or network fault (Jan Friesse)
>
>
> --
>
> Message: 1
> Date: Tue, 12 Apr 2016 11:46:21 -0400
> From: Digimer 
> To: Cluster Labs - All topics related to open-source clustering
> welcomed
> Subject: Re: [ClusterLabs] Totem is unable to form a cluster because
> of an operating system or network fault
> Message-ID: <570d184d.6000...@alteeve.ca>
> Content-Type: text/plain; charset=windows-1252
>
> On 12/04/16 07:44 AM, dinor geler wrote:
> > Hi,
> > I am trying to configure MySQL on Ubuntu according to this article:
> >
> https://azure.microsoft.com/en-in/documentation/articles/virtual-machines-linux-classic-mysql-cluster/
> >
> > two node cluster
> >
> >
> > looking on corosync log :
> >
> >
> > Apr 12 11:01:09 corosync [TOTEM ] Totem is unable to form a cluster
> > because of an operating system or network fault. The most common cause
> > of this message is that the local firewall is configured improperly.
> > Apr 12 11:01:11 corosync [TOTEM ] Totem is unable to form a cluster
> > because of an operating system or network fault. The most common cause
> > of this message is that the local firewall is configured improperly.
> > Apr 12 11:01:13 corosync [TOTEM ] Totem is unable to form a cluster
> > because of an operating system or network fault. The most common cause
> > of this message is that the local firewall is configured improperly.
> > Apr 12 11:01:16 corosync [TOTEM ] Totem is unable to form a cluster
> > because of an operating system or network fault. The most common cause
> > of this message is that the local firewall is configured improperly.
> > Apr 12 11:01:18 corosync [TOTEM ] Totem is unable to form a cluster
> > because of an operating system or network fault. The most common cause
> > of this message is that the local firewall is configured improperly.
> > Apr 12 11:01:20 corosync [TOTEM ] Totem is unable to form a cluster
> > because of an operating system or network fault. The most common cause
> > of this message is that the local firewall is configured improperly.
> > Apr 12 11:01:22 corosync [TOTEM ] Totem is unable to form a cluster
> > because of an operating system or network fault. The most common cause
> > of this message is that the local firewall is configured improperly.
> > Apr 12 11:01:24 corosync [TOTEM ] Totem is unable to form a cluster
> > because of an operating system or network fault. The most common cause
> > of this message is that the local firewall is configured improperly.
> > Apr 12 11:01:27 corosync [TOTEM ] Totem is unable to form a cluster
> > because of an operating system or network fault. The most common cause
> > of this message is that the local firewall is configured improperly.
> > Apr 12 11:01:29 corosync [TOTEM ] Totem is unable to form a cluster
> > because of an operating system or network fault. The most common cause
> > of this message is that the local firewall is configured improperly.
> > Apr 12 11:01:31 corosync [TOTEM ] Totem is unable to form a cluster
> > because of an operating system or network fault. The most common cause
> > of this message is that the local firewall is configured improperly.
> >
> >
> >
> > totem {
> >   version: 2
> >   crypto_cipher: none
> >   crypto_hash: none
> >   interface {
> > ringnumber: 0
> > bindnetaddr: 10.1.0.0
> > mcastport: 5405
> > ttl: 1
> >   }
> >   transport: udpu
> > }
> > logging {
> >   fileline: off
> >   to_logfile: yes
> >   to_syslog: yes
> >   logfile: /var/log/corosync/corosync.log
> >   debug: off
> >   timestamp: on
> >   logger_subsys {
> > subsys: QUORUM
> > debug: off
> > }
> >   }
> > nodelist {
> >   node {
> > ring0_addr: 10.1.0.6
> > nodeid: 1
> >   }
> >   node {
> > ring0_addr: 10.1.0.7
> > nodeid: 2
> >   }
> > }
> > quorum {
> >   provider: corosync_votequorum
> > }
> >
> >
> > If I initiate a tcpdump on node 2 and start either a netcat or nmap, I
> > see the packets arrive at the destination host on UDP port 5405.
> >
> >

Re: [ClusterLabs] service flap as nodes join and leave

2016-04-13 Thread Christopher Harvey


On Wed, Apr 13, 2016, at 12:36 PM, Ken Gaillot wrote:
> On 04/13/2016 11:23 AM, Christopher Harvey wrote:
> > I have a 3-node cluster (see the bottom of this email for 'pcs config'
> > output). The MsgBB-Active and AD-Active services both flap
> > whenever a node joins or leaves the cluster. I trigger the leave and
> > join with a pacemaker service start and stop on any node.
> 
> That's the default behavior of clones used in ordering constraints. If
> you set interleave=true on your clones, each dependent clone instance
> will only care about the depended-on instances on its own node, rather
> than all nodes.
> 
> See
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_clone_options
> 
> While the interleave=true behavior is much more commonly used,
> interleave=false is the default because it's safer -- the cluster
> doesn't know anything about the cloned service, so it can't assume the
> service is OK with it. Since you know what your service does, you can
> set interleave=true for services that can handle it.

Hi Ken,

Thanks for pointing out that attribute to me. I applied it as follows:
 Clone: Router-clone
  Meta Attrs: clone-max=2 clone-node-max=1 interleave=true
  Resource: Router (class=ocf provider=solace type=Router)
   Meta Attrs: migration-threshold=1 failure-timeout=1s
   Operations: start interval=0s timeout=2 (Router-start-interval-0s)
   stop interval=0s timeout=2 (Router-stop-interval-0s)
   monitor interval=1s (Router-monitor-interval-1s)

It doesn't seem to change the behavior. Moreover, I found that I can
start/stop the pacemaker instance on the vmr-132-5 node and produce the
same flap on the MsgBB-Active resource on the vmr-132-3 node. The Router
clones are never shut down or started. I would have thought that if
everything else in the cluster is constant, vmr-132-5 could never affect
resources on the other two nodes.
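
One way to dig into why pacemaker schedules those restarts is to replay the
transition with crm_simulate (a rough sketch; the pe-input file name below is
just an example, pick one from /var/lib/pacemaker/pengine/ from around the
time of the flap):

# show placement scores and the actions that would run against the live CIB
crm_simulate -sL
# or replay a saved transition
crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-42.bz2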

> > Here is the happy steady state setup:
> > 
> > 3 nodes and 4 resources configured
> > 
> > Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
> > 
> >  Clone Set: Router-clone [Router]
> >  Started: [ vmr-132-3 vmr-132-4 ]
> > MsgBB-Active    (ocf::solace:MsgBB-Active):     Started vmr-132-3
> > AD-Active       (ocf::solace:AD-Active):        Started vmr-132-3
> > 
> > [root@vmr-132-4 ~]# supervisorctl stop pacemaker
> > no change, except vmr-132-4 goes offline
> > [root@vmr-132-4 ~]# supervisorctl start pacemaker
> > vmr-132-4 comes back online
> > MsgBB-Active and AD-Active flap very quickly (<1s)
> > Steady state is resumed.
> > 
> > Why should the fact that vmr-132-4 coming and going affect the service
> > on any other node?
> > 
> > Thanks,
> > Chris
> > 
> > Cluster Name:
> > Corosync Nodes:
> >  192.168.132.5 192.168.132.4 192.168.132.3
> > Pacemaker Nodes:
> >  vmr-132-3 vmr-132-4 vmr-132-5
> > 
> > Resources:
> >  Clone: Router-clone
> >   Meta Attrs: clone-max=2 clone-node-max=1
> >   Resource: Router (class=ocf provider=solace type=Router)
> >Meta Attrs: migration-threshold=1 failure-timeout=1s
> >Operations: start interval=0s timeout=2 (Router-start-timeout-2)
> >stop interval=0s timeout=2 (Router-stop-timeout-2)
> >monitor interval=1s (Router-monitor-interval-1s)
> >  Resource: MsgBB-Active (class=ocf provider=solace type=MsgBB-Active)
> >   Meta Attrs: migration-threshold=2 failure-timeout=1s
> >   Operations: start interval=0s timeout=2 (MsgBB-Active-start-timeout-2)
> >   stop interval=0s timeout=2 (MsgBB-Active-stop-timeout-2)
> >   monitor interval=1s (MsgBB-Active-monitor-interval-1s)
> >  Resource: AD-Active (class=ocf provider=solace type=AD-Active)
> >   Meta Attrs: migration-threshold=2 failure-timeout=1s
> >   Operations: start interval=0s timeout=2 (AD-Active-start-timeout-2)
> >   stop interval=0s timeout=2 (AD-Active-stop-timeout-2)
> >   monitor interval=1s (AD-Active-monitor-interval-1s)
> > 
> > Stonith Devices:
> > Fencing Levels:
> > 
> > Location Constraints:
> >   Resource: AD-Active
> > Disabled on: vmr-132-5 (score:-INFINITY) (id:ADNotOnMonitor)
> >   Resource: MsgBB-Active
> > Enabled on: vmr-132-4 (score:100) (id:vmr-132-4Priority)
> > Enabled on: vmr-132-3 (score:250) (id:vmr-132-3Priority)
> > Disabled on: vmr-132-5 (score:-INFINITY) (id:MsgBBNotOnMonitor)
> >   Resource: Router-clone
> > Disabled on: vmr-132-5 (score:-INFINITY) (id:RouterNotOnMonitor)
> > Ordering Constraints:
> >   Resource Sets:
> > set Router-clone MsgBB-Active sequential=true
> > (id:pcs_rsc_set_Router-clone_MsgBB-Active) setoptions kind=Mandatory
> > (id:pcs_rsc_order_Router-clone_MsgBB-Active)
> > set MsgBB-Active AD-Active sequential=true
> > (id:pcs_rsc_set_MsgBB-Active_AD-Active) setoptions kind=Mandatory
> > (id:pcs_rsc_order_MsgBB-Active_AD-Active)
> > Colocation Constraints:
> >   MsgBB-Active with Router-clone (score:INFINITY)
> >   (id:colocation-MsgBB-

Re: [ClusterLabs] service flap as nodes join and leave

2016-04-13 Thread Ken Gaillot
On 04/13/2016 11:23 AM, Christopher Harvey wrote:
> I have a 3-node cluster (see the bottom of this email for 'pcs config'
> output). The MsgBB-Active and AD-Active services both flap
> whenever a node joins or leaves the cluster. I trigger the leave and
> join with a pacemaker service start and stop on any node.

That's the default behavior of clones used in ordering constraints. If
you set interleave=true on your clones, each dependent clone instance
will only care about the depended-on instances on its own node, rather
than all nodes.

See
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_clone_options

While the interleave=true behavior is much more commonly used,
interleave=false is the default because it's safer -- the cluster
doesn't know anything about the cloned service, so it can't assume the
service is OK with it. Since you know what your service does, you can
set interleave=true for services that can handle it.
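
For example, with pcs something along these lines should do it (pcs 0.9.x
syntax; substitute your own clone id):

# set the meta attribute on the existing clone
pcs resource meta Router-clone interleave=true
# check that it was applied
pcs resource show Router-clone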

> Here is the happy steady state setup:
> 
> 3 nodes and 4 resources configured
> 
> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
> 
>  Clone Set: Router-clone [Router]
>  Started: [ vmr-132-3 vmr-132-4 ]
> MsgBB-Active    (ocf::solace:MsgBB-Active):     Started vmr-132-3
> AD-Active       (ocf::solace:AD-Active):        Started vmr-132-3
> 
> [root@vmr-132-4 ~]# supervisorctl stop pacemaker
> no change, except vmr-132-4 goes offline
> [root@vmr-132-4 ~]# supervisorctl start pacemaker
> vmr-132-4 comes back online
> MsgBB-Active and AD-Active flap very quickly (<1s)
> Steady state is resumed.
> 
> Why should the fact that vmr-132-4 coming and going affect the service
> on any other node?
> 
> Thanks,
> Chris
> 
> Cluster Name:
> Corosync Nodes:
>  192.168.132.5 192.168.132.4 192.168.132.3
> Pacemaker Nodes:
>  vmr-132-3 vmr-132-4 vmr-132-5
> 
> Resources:
>  Clone: Router-clone
>   Meta Attrs: clone-max=2 clone-node-max=1
>   Resource: Router (class=ocf provider=solace type=Router)
>Meta Attrs: migration-threshold=1 failure-timeout=1s
>Operations: start interval=0s timeout=2 (Router-start-timeout-2)
>stop interval=0s timeout=2 (Router-stop-timeout-2)
>monitor interval=1s (Router-monitor-interval-1s)
>  Resource: MsgBB-Active (class=ocf provider=solace type=MsgBB-Active)
>   Meta Attrs: migration-threshold=2 failure-timeout=1s
>   Operations: start interval=0s timeout=2 (MsgBB-Active-start-timeout-2)
>   stop interval=0s timeout=2 (MsgBB-Active-stop-timeout-2)
>   monitor interval=1s (MsgBB-Active-monitor-interval-1s)
>  Resource: AD-Active (class=ocf provider=solace type=AD-Active)
>   Meta Attrs: migration-threshold=2 failure-timeout=1s
>   Operations: start interval=0s timeout=2 (AD-Active-start-timeout-2)
>   stop interval=0s timeout=2 (AD-Active-stop-timeout-2)
>   monitor interval=1s (AD-Active-monitor-interval-1s)
> 
> Stonith Devices:
> Fencing Levels:
> 
> Location Constraints:
>   Resource: AD-Active
> Disabled on: vmr-132-5 (score:-INFINITY) (id:ADNotOnMonitor)
>   Resource: MsgBB-Active
> Enabled on: vmr-132-4 (score:100) (id:vmr-132-4Priority)
> Enabled on: vmr-132-3 (score:250) (id:vmr-132-3Priority)
> Disabled on: vmr-132-5 (score:-INFINITY) (id:MsgBBNotOnMonitor)
>   Resource: Router-clone
> Disabled on: vmr-132-5 (score:-INFINITY) (id:RouterNotOnMonitor)
> Ordering Constraints:
>   Resource Sets:
> set Router-clone MsgBB-Active sequential=true
> (id:pcs_rsc_set_Router-clone_MsgBB-Active) setoptions kind=Mandatory
> (id:pcs_rsc_order_Router-clone_MsgBB-Active)
> set MsgBB-Active AD-Active sequential=true
> (id:pcs_rsc_set_MsgBB-Active_AD-Active) setoptions kind=Mandatory
> (id:pcs_rsc_order_MsgBB-Active_AD-Active)
> Colocation Constraints:
>   MsgBB-Active with Router-clone (score:INFINITY)
>   (id:colocation-MsgBB-Active-Router-clone-INFINITY)
>   AD-Active with MsgBB-Active (score:1000)
>   (id:colocation-AD-Active-MsgBB-Active-1000)
> 
> Resources Defaults:
>  No defaults set
> Operations Defaults:
>  No defaults set
> 
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-recheck-interval: 1s
>  dc-version: 1.1.13-10.el7_2.2-44eb2dd
>  have-watchdog: false
>  maintenance-mode: false
>  start-failure-is-fatal: false
>  stonith-enabled: false


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] service flap as nodes join and leave

2016-04-13 Thread Christopher Harvey
I have a 3-node cluster (see the bottom of this email for 'pcs config'
output). The MsgBB-Active and AD-Active services both flap
whenever a node joins or leaves the cluster. I trigger the leave and
join with a pacemaker service start and stop on any node.

Here is the happy steady state setup:

3 nodes and 4 resources configured

Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]

 Clone Set: Router-clone [Router]
 Started: [ vmr-132-3 vmr-132-4 ]
MsgBB-Active    (ocf::solace:MsgBB-Active):     Started vmr-132-3
AD-Active       (ocf::solace:AD-Active):        Started vmr-132-3

[root@vmr-132-4 ~]# supervisorctl stop pacemaker
no change, except vmr-132-4 goes offline
[root@vmr-132-4 ~]# supervisorctl start pacemaker
vmr-132-4 comes back online
MsgBB-Active and AD-Active flap very quickly (<1s)
Steady state is resumed.

Why should the fact that vmr-132-4 coming and going affect the service
on any other node?

Thanks,
Chris

Cluster Name:
Corosync Nodes:
 192.168.132.5 192.168.132.4 192.168.132.3
Pacemaker Nodes:
 vmr-132-3 vmr-132-4 vmr-132-5

Resources:
 Clone: Router-clone
  Meta Attrs: clone-max=2 clone-node-max=1
  Resource: Router (class=ocf provider=solace type=Router)
   Meta Attrs: migration-threshold=1 failure-timeout=1s
   Operations: start interval=0s timeout=2 (Router-start-timeout-2)
   stop interval=0s timeout=2 (Router-stop-timeout-2)
   monitor interval=1s (Router-monitor-interval-1s)
 Resource: MsgBB-Active (class=ocf provider=solace type=MsgBB-Active)
  Meta Attrs: migration-threshold=2 failure-timeout=1s
  Operations: start interval=0s timeout=2 (MsgBB-Active-start-timeout-2)
  stop interval=0s timeout=2 (MsgBB-Active-stop-timeout-2)
  monitor interval=1s (MsgBB-Active-monitor-interval-1s)
 Resource: AD-Active (class=ocf provider=solace type=AD-Active)
  Meta Attrs: migration-threshold=2 failure-timeout=1s
  Operations: start interval=0s timeout=2 (AD-Active-start-timeout-2)
  stop interval=0s timeout=2 (AD-Active-stop-timeout-2)
  monitor interval=1s (AD-Active-monitor-interval-1s)

Stonith Devices:
Fencing Levels:

Location Constraints:
  Resource: AD-Active
Disabled on: vmr-132-5 (score:-INFINITY) (id:ADNotOnMonitor)
  Resource: MsgBB-Active
Enabled on: vmr-132-4 (score:100) (id:vmr-132-4Priority)
Enabled on: vmr-132-3 (score:250) (id:vmr-132-3Priority)
Disabled on: vmr-132-5 (score:-INFINITY) (id:MsgBBNotOnMonitor)
  Resource: Router-clone
Disabled on: vmr-132-5 (score:-INFINITY) (id:RouterNotOnMonitor)
Ordering Constraints:
  Resource Sets:
set Router-clone MsgBB-Active sequential=true
(id:pcs_rsc_set_Router-clone_MsgBB-Active) setoptions kind=Mandatory
(id:pcs_rsc_order_Router-clone_MsgBB-Active)
set MsgBB-Active AD-Active sequential=true
(id:pcs_rsc_set_MsgBB-Active_AD-Active) setoptions kind=Mandatory
(id:pcs_rsc_order_MsgBB-Active_AD-Active)
Colocation Constraints:
  MsgBB-Active with Router-clone (score:INFINITY)
  (id:colocation-MsgBB-Active-Router-clone-INFINITY)
  AD-Active with MsgBB-Active (score:1000)
  (id:colocation-AD-Active-MsgBB-Active-1000)

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-recheck-interval: 1s
 dc-version: 1.1.13-10.el7_2.2-44eb2dd
 have-watchdog: false
 maintenance-mode: false
 start-failure-is-fatal: false
 stonith-enabled: false

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] HA meetup at OpenStack Summit

2016-04-13 Thread Digimer
On 13/04/16 10:16 AM, Ken Gaillot wrote:
> On 04/12/2016 06:39 PM, Digimer wrote:
>> On 12/04/16 07:09 PM, Ken Gaillot wrote:
>>> Hi everybody,
>>>
>>> The upcoming OpenStack Summit is April 25-29 in Austin, Texas (US). Some
>>> regular ClusterLabs contributors are going, so I was wondering if anyone
>>> would like to do an informal meetup sometime during the summit.
>>>
>>> It looks like the best time would be that Wednesday, either lunch (at
>>> the venue) or dinner (offsite). It might also be possible to reserve a
>>> small (10-person) meeting room, or just meet informally in the expo hall.
>>>
>>> Anyone interested? Preferences/conflicts?
>>
>> Informal meet-up, or to try and get work done?
> 
> Informal, though of course HA will be the likely topic of conversation :)

OK, it would be expensive to come down for $drinks, but I think we're still
working on something semi-official for late summer/early spring.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] HA meetup at OpenStack Summit

2016-04-13 Thread Ken Gaillot
On 04/12/2016 06:39 PM, Digimer wrote:
> On 12/04/16 07:09 PM, Ken Gaillot wrote:
>> Hi everybody,
>>
>> The upcoming OpenStack Summit is April 25-29 in Austin, Texas (US). Some
>> regular ClusterLabs contributors are going, so I was wondering if anyone
>> would like to do an informal meetup sometime during the summit.
>>
>> It looks like the best time would be that Wednesday, either lunch (at
>> the venue) or dinner (offsite). It might also be possible to reserve a
>> small (10-person) meeting room, or just meet informally in the expo hall.
>>
>> Anyone interested? Preferences/conflicts?
> 
> Informal meet-up, or to try and get work done?

Informal, though of course HA will be the likely topic of conversation :)

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] HA meetup at OpenStack Summit & Vault?

2016-04-13 Thread Lars Marowsky-Bree
On 2016-04-12T19:39:25, Digimer  wrote:

Alas, I won't make it to the Summit, but if anyone else is at Vault (the
week before in Raleigh), I'd be happy to meet!


Regards,
Lars

-- 
Architect SDS
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Totem is unable to form a cluster because of an operating system or network fault

2016-04-13 Thread Jan Friesse

Hi,
I am trying to configure MySQL on Ubuntu according to this article:
https://azure.microsoft.com/en-in/documentation/articles/virtual-machines-linux-classic-mysql-cluster/

two node cluster


looking on corosync log :


Apr 12 11:01:09 corosync [TOTEM ] Totem is unable to form a cluster because
of an operating system or network fault. The most common cause of this
message is that the local firewall is configured improperly.
Apr 12 11:01:11 corosync [TOTEM ] Totem is unable to form a cluster because
of an operating system or network fault. The most common cause of this
message is that the local firewall is configured improperly.
Apr 12 11:01:13 corosync [TOTEM ] Totem is unable to form a cluster because
of an operating system or network fault. The most common cause of this
message is that the local firewall is configured improperly.
Apr 12 11:01:16 corosync [TOTEM ] Totem is unable to form a cluster because
of an operating system or network fault. The most common cause of this
message is that the local firewall is configured improperly.
Apr 12 11:01:18 corosync [TOTEM ] Totem is unable to form a cluster because
of an operating system or network fault. The most common cause of this
message is that the local firewall is configured improperly.
Apr 12 11:01:20 corosync [TOTEM ] Totem is unable to form a cluster because
of an operating system or network fault. The most common cause of this
message is that the local firewall is configured improperly.
Apr 12 11:01:22 corosync [TOTEM ] Totem is unable to form a cluster because
of an operating system or network fault. The most common cause of this
message is that the local firewall is configured improperly.
Apr 12 11:01:24 corosync [TOTEM ] Totem is unable to form a cluster because
of an operating system or network fault. The most common cause of this
message is that the local firewall is configured improperly.
Apr 12 11:01:27 corosync [TOTEM ] Totem is unable to form a cluster because
of an operating system or network fault. The most common cause of this
message is that the local firewall is configured improperly.
Apr 12 11:01:29 corosync [TOTEM ] Totem is unable to form a cluster because
of an operating system or network fault. The most common cause of this
message is that the local firewall is configured improperly.
Apr 12 11:01:31 corosync [TOTEM ] Totem is unable to form a cluster because
of an operating system or network fault. The most common cause of this
message is that the local firewall is configured improperly.
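
Since the message points at the local firewall, a quick check is to list the
rules and, if needed, open the corosync ports. A minimal sketch, assuming
iptables (or ufw) on Ubuntu; 5404:5406 covers the ports corosync commonly
uses:

# list current INPUT rules with packet counters
iptables -L INPUT -n -v
# insert an allow rule ahead of any blanket REJECT/DROP
iptables -I INPUT -p udp -m udp --dport 5404:5406 -j ACCEPT
# or, if ufw manages the firewall:
ufw allow 5404:5406/udp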



totem {
   version: 2
   crypto_cipher: none
   crypto_hash: none
   interface {
 ringnumber: 0
 bindnetaddr: 10.1.0.0
 mcastport: 5405
 ttl: 1
   }
   transport: udpu
}
logging {
   fileline: off
   to_logfile: yes
   to_syslog: yes
   logfile: /var/log/corosync/corosync.log
   debug: off
   timestamp: on
   logger_subsys {
 subsys: QUORUM
 debug: off
 }
   }
nodelist {
   node {
 ring0_addr: 10.1.0.6
 nodeid: 1
   }
   node {
 ring0_addr: 10.1.0.7
 nodeid: 2
   }
}
quorum {
   provider: corosync_votequorum
}
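
(Side note: for a two-node votequorum setup, corosync usually also wants
two_node: 1, otherwise quorum is lost as soon as one node goes away. This is
based on the votequorum(5) defaults, not on the article being followed; a
sketch:)

quorum {
   provider: corosync_votequorum
   two_node: 1
}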


If I initiate a tcpdump on node 2 and start either a netcat or nmap, I see
the packets arrive at the destination host on UDP port 5405.

I do see Corosync listening on the IP/port:

root@node-2:/home/dinor# netstat -an | grep -i 5405
udp        0      0 10.1.0.7:5405           0.0.0.0:*

root@node-1:/home/dinor# netstat -an | grep -i 5405
udp        0      0 10.1.0.6:5405           0.0.0.0:*

On node 1 I start a netcat to port 5405 via UDP:

netcat -D -4 -u 10.1.0.7 5405

Here you type some text and hit Enter.

On node 1 tcpdump we see the data sent to IP 10.1.0.7:

root@node-1:/var/log/corosync# tcpdump -n udp port 5405
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
10:08:24.484533 IP 10.1.0.6.44299 > 10.1.0.7.5405: UDP, length 26

On node 2 tcpdump I see the data arrive:

root@node-2:/var/log/corosync# tcpdump -n udp port 5405
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
10:08:24.484892 IP 10.1.0.6.44299 > 10.1.0.7.5405: UDP, length 26

I also tested sending UDP packets from node 2 – all OK.

So connectivity seems to be OK.

The port scanner also shows the port as open:

root@node-1:/home/dinor# nmap -sUV 10.1.0.7 -p 5402-5405

Starting Nmap 5.21 ( http://nmap.org ) at 2016-04-12 10:31 UTC
Nmap scan report for node-2 (10.1.0.7)
Host is up (0.00060s latency).
PORT     STATE         SERVICE VERSION
5402/udp closed        unknown
5403/udp closed        unknown
5404/udp closed        unknown
5405/udp open|filtered unknown
MAC Address: 12:34:56:78:9A:BC (Unknown)

Service detection performed. Please report any incorrect results at
http://nmap.org/submit/ .
Nmap done: 1 IP address (1 host up) scanned in 79.07 seconds

There is no FW and