Re: [ClusterLabs] corosync.conf - reload failed

2018-01-24 Thread bliu

Hi, Sriram


On 01/24/2018 07:02 PM, Sriram wrote:


Hi,


I'm trying to update the corosync.conf mcast IP in an existing cluster.

I use the "corosync-cfgtool -R" command to reload the corosync 
configuration.


corosync-cfgtool -R
Reloading corosync.conf...
Done

But the corosync multicast IP has not been changed.

That is by design. Not everything can be changed at runtime. Here is the 
list of options that cannot be changed at runtime (for corosync-2.x); a 
restart sketch follows the list:


totem.secauth
totem.crypto_hash
totem.crypto_cipher
totem.version
totem.threads
totem.ip_version
totem.rrp_mode
totem.netmtu
totem.interface.ringnumber
totem.interface.bindnetaddr
totem.interface.mcastaddr
totem.interface.broadcast
totem.interface.mcastport
totem.interface.ttl
totem.vsftype
totem.transport
totem.cluster_name
quorum.provider
qb.ipc_type
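
For any of these, a reload is not enough and corosync has to be restarted 
on every node. A rough sketch, assuming systemd-managed services and a 
maintenance window in which the whole cluster may go down (service handling 
can differ per distribution):

# on every node: change totem.interface.mcastaddr in the config
vi /etc/corosync/corosync.conf
# stop pacemaker and corosync on all nodes, then start them again
systemctl stop pacemaker corosync
systemctl start corosync pacemaker
# verify the value corosync is actually using
corosync-cmapctl | grep mcastaddr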

Bin

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] SLES 12 SP2 HAE

2017-04-06 Thread bliu

Hi, Cristiano
  Pacemaker only gets the cluster name from corosync on the first 
initialization. It then stores the name in the CIB.
Even if you change the cluster name in corosync.conf, Pacemaker will not 
update it.
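
A small sketch of inspecting and changing the stored value by hand 
(assuming crm_attribute is available; as far as I know the name is kept 
as the cluster-name property in the CIB):

# show the name Pacemaker has stored
crm_attribute --query --name cluster-name
# overwrite it manually
crm_attribute --name cluster-name --update NEW_NAME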


On 04/06/2017 04:16 PM, Cristiano Coltro wrote:


HI Zhen & Ulrich

I have tried to duplicate the issue but I wasn't able to reproduce it.

Corosync is updated all the time

So I will not open any bug for the moment.

I'm sorry for the wrong heads up!

What is sure is that crm resource show does not dynamically reflect the 
cluster name.


It is written ONLY during ha-cluster-init.

If you change the cluster name after the cluster is initialized, you have 
to change it manually with crm configure edit


Is this working as designed?

Thanks,

Cristiano

Cristiano Coltro

Premium Support Engineer

mail: cristiano.col...@microfocus.com

mobile: +39 335 1435589

phone +39 02 36634936

__

-Original Message-

From: Zhen Ren

Sent: Thursday, April 6, 2017 4:01 AM

To: Cluster Labs - All topics related to open-source clustering 
welcomed ; Cristiano Coltro 



Subject: Re: [ClusterLabs] SLES 12 SP2 HAE

Hi Cristiano,

Thanks for your report. I suggest you file a bug for this at 
https://bugzilla.suse.com so that it can be routed to the right person 
quickly.


On 04/05/2017 03:26 PM, Cristiano Coltro wrote:

> Hi all,

> I was noticing some behaviour on SLES 12 Sp2 HAE

>

>

>1.  Cluster Name

> If you go to yast2 > cluster and change the cluster name, the 
change is not reflected in /etc/corosync/corosync.conf, which shows the 
default name "hacluster"


>

>2.  Expected votes

> With a 2 node cluster you have the corosync.conf configured like 
this for the quorum section:


>

> provider: corosync_votequorum

>  expected_votes: 2

>  two_node: 1

>

> and it's correct

>

> If for some reason you redo an "ha-cluster-init"on the same first node

> WITHOUT overwriting the corosync.conf and then you perform an

> "ha-cluster-join" on the second node the corosync.conf changes like

> this

>

>provider: corosync_votequorum

>  expected_votes: 3

>  two_node: 0

>

>  so it seems that the cluster does not check the number of 
nodes but simply adds +1 on every join you perform IF you DON'T 
overwrite the original corosync.conf.


>

> Are 1 & 2 expected behaviour? Any experience on that?

It's probably a bug to fix, I think, so don't mix yast and the HA 
bootstrap scripts to set up your cluster ATM :). A sketch of restoring the 
two-node quorum settings by hand is below.
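
This is only a sketch, assuming the inflated expected_votes came from 
re-running the bootstrap scripts and the cluster really has two nodes. A 
reload with corosync-cfgtool -R may be enough for the quorum section; if 
the values do not change, restart corosync on both nodes:

# corosync.conf, quorum section, on both nodes
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}

corosync-cfgtool -R      # ask every node to re-read its own corosync.conf
corosync-quorumtool -s   # verify expected votes and the two_node flag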


Regards,

Eric

>

> Thanks,

> Cristiano

>

>

> Cristiano Coltro

> Premium Support Engineer

>

> mail:

> cristiano.col...@microfocus.com >

> mobile: +39 335 1435589

> phone +39 02 36634936

> __


>

>

>

>




___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




Re: [ClusterLabs] corosync cannot acquire quorum

2017-03-12 Thread bliu

I usually use `corosync-cfgtool -R` instead


On 03/13/2017 11:08 AM, cys wrote:

At first there were 2 nodes, nodeA and nodeC. nodeB was added later.
It seems corosync reload failed.
It's strange that `crm corosync reload` did not print any error message.



On 2017-03-13 at 09:35, "bliu" <b...@suse.com> wrote:


 Hi,

   wait_for_all is set on nodeC but not on nodeB. Could you disable
   wait_for_all on nodeC and then retry?

On 03/11/2017 10:50 AM, cys wrote:

We have a cluster containing 3 nodes (nodeA, nodeB, nodeC).
After nodeA is taken offline (by ifdown, this may not be
 right?), nodeC cannot acquire quorum while nodeB can.

NodeC: corosync-quorumtool -s
Quorum information
--
Date: Sat Mar 11 10:42:22 2017
Quorum provider:  corosync_votequorum
Nodes:2
Node ID:  2
Ring ID:  2/24
Quorate:  No

Votequorum information
--
Expected votes:   3
Highest expected: 3
Total votes:  2
Quorum:  2 Activity blocked
Flags:WaitForAll

Membership information
--
Nodeid  Votes Name

 2  1 68.68.68.15 (local)
 3  1 68.68.68.16

NodeB: corosync-quorumtool -s
Quorum information
--
Date: Sat Mar 11 10:45:44 2017
Quorum provider:  corosync_votequorum
Nodes:2
Node ID:  3
Ring ID:  2/24
Quorate:  Yes

Votequorum information
--
Expected votes:   3
Highest expected: 3
Total votes:  2
Quorum:  2
Flags:Quorate

Membership information
--
Nodeid  Votes Name
 2  1 68.68.68.15
 3  1 68.68.68.16 (local)

So what's the problem?
Thanks.






___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




Re: [ClusterLabs] corosync cannot acquire quorum

2017-03-12 Thread bliu

Hi,

wait_for_all is set on nodeC but not on nodeB. Could you disable 
wait_for_all on nodeC

and then retry?
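
As a minimal sketch of what I mean, assuming the flag comes from 
corosync.conf on nodeC (wait_for_all is also switched on implicitly when 
two_node is set, so it may not be spelled out in the file):

# corosync.conf on nodeC, quorum section
quorum {
    provider: corosync_votequorum
    expected_votes: 3
    wait_for_all: 0
}

# then restart (or reload) corosync on nodeC and re-check
corosync-quorumtool -s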

On 03/11/2017 10:50 AM, cys wrote:

We have a cluster containing 3 nodes (nodeA, nodeB, nodeC).
After nodeA is taken offline (by ifdown, this may not be right?), nodeC 
cannot acquire quorum while nodeB can.

NodeC: corosync-quorumtool -s
Quorum information
--
Date: Sat Mar 11 10:42:22 2017
Quorum provider:  corosync_votequorum
Nodes:2
Node ID:  2
Ring ID:  2/24
Quorate:  No

Votequorum information
--
Expected votes:   3
Highest expected: 3
Total votes:  2
Quorum:  2 Activity blocked
Flags:WaitForAll

Membership information
--
Nodeid  Votes Name

 2  1 68.68.68.15 (local)
 3  1 68.68.68.16

NodeB: corosync-quorumtool -s
Quorum information
--
Date: Sat Mar 11 10:45:44 2017
Quorum provider:  corosync_votequorum
Nodes:2
Node ID:  3
Ring ID:  2/24
Quorate:  Yes

Votequorum information
--
Expected votes:   3
Highest expected: 3
Total votes:  2
Quorum:  2
Flags:Quorate

Membership information
--
Nodeid  Votes Name
 2  1 68.68.68.15
 3  1 68.68.68.16 (local)

So what's the problem?
Thanks.


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




Re: [ClusterLabs] Antw: Corosync ring marked as FAULTY

2017-02-22 Thread bliu

Hi


On 02/22/2017 04:24 PM, Denis Gribkov wrote:


Hi,

Just tried - no packets were captured, even after I re-enabled ring 0 on all 
nodes.



Re-enabling ring 0 will just reset the faulty counter to 0, nothing more.

Did you specify the interface with "-i " when you 
ran tcpdump? If you did and still saw nothing, corosync is not talking on 
the multicast address, and you need to check whether your private network 
supports multicast.
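
A quick way to test that is omping, as a sketch (the group address and port 
here match the ring 0 settings mentioned in this thread, 226.0.0.1 and 5505; 
adjust them to your setup and run it on two or more nodes at the same time):

omping -c 10 -m 226.0.0.1 -p 5505 PRIVATE_IP_1 PRIVATE_IP_2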



___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Corosync ring marked as FAULTY

2017-02-22 Thread bliu

Hi, Denis

could you try tcpdump "udp port 5505" on the private network to see whether 
there are any packets?
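
For example, something along these lines on each node (ethX is just a 
placeholder for whichever NIC carries the private VLAN):

tcpdump -n -i ethX udp port 5505
# or, to look only at the ring 0 multicast traffic:
tcpdump -n -i ethX host 226.0.0.1 and udp port 5505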



On 02/22/2017 03:47 PM, Denis Gribkov wrote:


In our case it does not create problems, since all nodes are located in 
a few networks which are served by a single router.


There are also no errors detected on public ring 1, unlike private 
ring 0.


I have a suspicion that this error could be related to the private VLAN 
settings, but unfortunately I have no good idea how to find the issue.


On 22/02/17 09:37, Ulrich Windl wrote:

Is "ttl 1" a good idea for a public network?


Denis Gribkov wrote on 21.02.2017 at 18:26 in message

<4f5543c4-b80c-659d-ed5e-7a99e1482...@itsts.net>:

Hi Everyone.

I have a 16-node asynchronous cluster configured with the Corosync redundant
ring feature.

Each node has 2 similarly connected/configured NICs. One NIC is connected
to the public network,

the other to our private VLAN. When I checked the Corosync rings'
operability I found:

# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
  id  = 192.168.1.54
  status  = Marking ringid 0 interface 192.168.1.54 FAULTY
RING ID 1
  id  = 111.11.11.1
  status  = ring 1 active with no faults

After some digging I identified that if I re-enable the
failed ring with the command:

# corosync-cfgtool -r

RING ID 0 is marked as "active" for a few minutes, but afterwards it is
permanently marked as faulty again.

The log has no useful info, just a single message:

corosync[21740]:   [TOTEM ] Marking ringid 0 interface 192.168.1.54 FAULTY

And no message like:

[TOTEM ] Automatically recovered ring 1


My corosync.conf looks like:

compatibility: whitetank

totem {
  version: 2
  secauth: on
  threads: 4
  rrp_mode: passive

  interface {

  member {
  memberaddr: PRIVATE_IP_1
  }

...

  member {
  memberaddr: PRIVATE_IP_16
  }

  ringnumber: 0
  bindnetaddr: PRIVATE_NET_ADDR
  mcastaddr: 226.0.0.1
  mcastport: 5505
  ttl: 1
  }

 interface {

  member {
  memberaddr: PUBLIC_IP_1
  }
...

  member {
  memberaddr: PUBLIC_IP_16
  }

  ringnumber: 1
  bindnetaddr: PUBLIC_NET_ADDR
  mcastaddr: 224.0.0.1
  mcastport: 5405
  ttl: 1
  }

  transport: udpu
}

logging {
  to_stderr: no
  to_logfile: yes
  logfile: /var/log/cluster/corosync.log
  logfile_priority: info
  to_syslog: yes
  syslog_priority: warning
  debug: on
  timestamp: on
}

I tried to change rrp_mode and mcastaddr/mcastport for ringnumber: 0,
but the result was similar.

I checked multicast/unicast operability using the omping utility and didn't
find any issues.

Also, no errors were found on the network equipment for our private VLAN.

Why did Corosync decide to permanently disable the second ring? How can I
debug the issue?

Other properties:

Corosync Cluster Engine, version '1.4.7'

Pacemaker properties:
   cluster-infrastructure: cman
   cluster-recheck-interval: 5min
   dc-version: 1.1.14-8.el6-70404b0
   expected-quorum-votes: 3
   have-watchdog: false
   last-lrm-refresh: 1484068350
   maintenance-mode: false
   no-quorum-policy: ignore
   pe-error-series-max: 1000
   pe-input-series-max: 1000
   pe-warn-series-max: 1000
   stonith-action: reboot
   stonith-enabled: false
   symmetric-cluster: false

Thank you.

--
Regards Denis Gribkov






--
Regards Denis Gribkov


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




Re: [ClusterLabs] HALVM problem with 2 nodes cluster

2017-01-18 Thread bliu



On 01/18/2017 04:45 PM, Marco Marino wrote:
Hi, I'm trying to build a 2-node cluster that manages a volume 
group.
Basically I have a SAN connected to both nodes that exposes 1 LUN, so 
both nodes see a disk /dev/sdb. From one node I did:

fdisk /dev/sdb  <- create a partition with type = 8e (LVM)
pvcreate /dev/sdb1
vgcreate myvg /dev/sdb1

then

pcs resource create halvm LVM volgrpname=myvg exclusive=true

Last command fails with an error: "LVM: myvg did not activate correctly"

Reading /usr/lib/ocf/resource.d/heartbeat/LVM, this happens because it 
seems that I need at least one logical volume inside the volume group 
before creating the resource. Is this correct?

Yes
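
For example, any small logical volume in the VG is enough (the name and 
size here are arbitrary):

lvcreate -n lv_data -L 100M myvg

Then try creating the halvm resource again.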
Furthermore, how can I set volume_list in lvm.conf? Actually in 
lvm.conf I have:

locking_type =3
use_lvmetad = 0

locking_type = 1
use_lvmetad = 1
volume_list = [ "vg-with-root-lv" ]


Thank you




___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Stonith : meta-data contains no resource-agent element

2016-11-28 Thread bliu

Hi,
   The SSH stonith agent is just a development demo, if I remember 
correctly. If you are using openSUSE, you need to install libglue-devel.

I think there are similar packages on other distributions.
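
For example, on openSUSE something like the following should bring back 
the metadata for the external/* plugins (package name as I recall it; it 
may differ between releases):

zypper install libglue-devel
# then check that the agent reports its metadata again
crm ra info stonith:external/ssh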


On 11/21/2016 04:05 AM, jitendra.jaga...@dell.com wrote:


Hello Pacemaker admins,

We have recently updated to Pacemaker version 1.1.15

Before that we were using Pacemaker version 1.1.10

With Pacemaker version 1.1.10, our Stonith ssh agent use to work.

Now once we upgraded to Pacemaker version 1.1.15 we see below errors 
for basic Stonith configuration as explained in 
[http://clusterlabs.org/doc/crm_fencing.html]


=

crm configure primitive st-ssh stonith:external/ssh params 
hostlist="node1 node2"


ERROR: stonith:external/ssh: meta-data contains no resource-agent element

ERROR: None

ERROR: stonith:external/ssh: meta-data contains no resource-agent element

ERROR: stonith:external/ssh: no such resource agent



Additional info

Below is the stonith -L output

=

/home/root# stonith -L

apcmaster

apcsmart

baytech

cyclades

drac3

external/drac5

external/dracmc-telnet

external/hetzner

external/hmchttp

external/ibmrsa

external/ibmrsa-telnet

external/ipmi

external/ippower9258

external/kdumpcheck

external/libvirt

external/nut

external/rackpdu

external/riloe

external/sbd

external/ssh

external/vcenter

external/vmware

external/xen0

external/xen0-ha

ibmhmc

meatware

null

nw_rpc100s

rcd_serial

rps10

ssh

suicide

wti_nps

=

The above agents are in the /usr/lib/stonith/plugins directory.

Can anyone please provide a solution to resolve the above issue?

Thanks

Jitendra



___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org




Re: [ClusterLabs] corosync-2.3.5 with ipv6 and multicast

2016-03-30 Thread bliu

Hi,
Thanks for your confirmation. I have two questions:

On 03/30/2016 05:11 PM, Jan Friesse wrote:



Hi, Honza
   I want to use corosync-2.3.5 with ipv6 and multicast, and I see
that I must specify the nodeid in the nodelist,
but I also see a nodeid in totem in totemconfig.c. Are they the same?


For a given node it's the same. Generally it's recommended to use the 
nodelist nodeid. It's better simply because you can then copy the config 
file to other nodes without changes.


Also, if you specify both, a warning is logged and the nodelist nodeid is used.


Can I specify the nodeid in totem
instead of the one in the nodelist?


You can, but as long as you don't have a really good reason (a really 
dynamic cluster without knowledge of the nodes beforehand) it's better to 
use the node list.
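
As a sketch only, a nodelist for an IPv6/multicast setup could look roughly 
like this (all addresses are placeholders; for IPv6 the nodeid has to be set 
explicitly, and bindnetaddr has to be the node's own full address, so that 
line differs per node):

totem {
    version: 2
    ip_version: ipv6
    interface {
        ringnumber: 0
        bindnetaddr: fd00:1234::10
        mcastaddr: ff15::1
        mcastport: 5405
    }
}

nodelist {
    node { ring0_addr: fd00:1234::10  nodeid: 1 }
    node { ring0_addr: fd00:1234::11  nodeid: 2 }
}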


If I use node list with multicast, should I add all nodes to the node 
list? And if so, what's the difference between unicast and multicast in 
this case except the sock addr?



Regards,
  Honza



Thanks
Bin Liu





___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] split brain cluster

2015-11-16 Thread Bliu
Richard,
could you use tcpdump to check whether there is any communication between the two 
nodes? And please check the firewall as well.
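
For example (ethX is a placeholder for the interface the nodes use to reach 
each other; the config below uses udpu, so the traffic to look for is 
unicast UDP on the corosync port, 5405 by default):

tcpdump -n -i ethX udp port 5405
# and a quick look at the firewall rules, e.g. on a firewalld system:
firewall-cmd --list-all
iptables -L -n | grep 5405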

bin

> On 2015-11-16 at 21:43, Richard Korsten wrote:
> 
> Hello Cluster guru's.
> 
> I'm having a bit of trouble with a cluster of ours. After an outage of 1 node 
> it went into a split-brain situation where the nodes aren't talking to each 
> other. Both say the other node is offline. I've tried to get them both up and 
> running again by stopping and starting the cluster services on both nodes, 
> one at a time, without luck.
> 
> I've been trying to reproduce the problem with a set of test servers but I 
> can't seem to get it into the same state. 
> 
> Because of this I'm looking for some help, because I'm not that familiar with 
> pacemaker/corosync.
> 
> this is the output of the command pcs status:
> Cluster name: MXloadbalancer 
> Last updated: Mon Nov 16 10:18:44 2015 
> Last change: Fri Nov 6 15:35:22 2015 
> Stack: corosync 
> Current DC: bckilb01 (1) - partition WITHOUT quorum 
> Version: 1.1.12-a14efad 
> 2 Nodes configured 
> 3 Resources configured 
> 
> Online: [ bckilb01 ] 
> OFFLINE: [ bckilb02 ] 
> 
> Full list of resources:
>  haproxy (systemd:haproxy): Stopped 
> 
> Resource Group: MXVIP 
> ip-192.168.250.200 (ocf::heartbeat:IPaddr2): Stopped 
> ip-192.168.250.201 (ocf::heartbeat:IPaddr2): Stopped 
> 
> PCSD Status: 
> bckilb01: Online 
> bckilb02: Online 
> 
> Daemon Status: 
> corosync: active/enabled 
> pacemaker: active/enabled 
> pcsd: active/enabled 
> 
> 
> And the config:
> totem { 
> version: 2 
> secauth: off 
> cluster_name: MXloadbalancer 
> transport: udpu }
>  
> nodelist { 
> node { ring0_addr: bckilb01 nodeid: 1 } 
> node { ring0_addr: bckilb02 nodeid: 2 } } 
> quorum { provider: corosync_votequorum two_node: 1 } 
> logging { to_syslog: yes }
> 
> If anyone has an idea how to get them working together again, please let me 
> know.
> 
> Greetings Richard

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org