[ClusterLabs] One cluster with two groups of nodes

2017-11-08 Thread Alberto Mijares
Hi guys, nice to say hello here.

I've been assigned a very particular task: there's a
pacemaker-based cluster with 6 nodes. A system runs on three nodes
(group A), while the other three are hot-standby spares (group B).

Resources from group A are never supposed to be relocated individually
onto nodes from group B. However, if any of the resources from group A
fails, all resources must be relocated to group B. It's an "all or
nothing" failover.

Ideally, you would split the cluster into two clusters and implement
Cluster-Sites and Tickets Management; however, it's not possible.

Taking all this into account, can you kindly suggest a strategy for
achieving the goal? I have some ideas but I'd like to hear from those
who have a lot more experience than me.
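
For concreteness, one possible building block (a minimal sketch only, not from
the thread; node names are hypothetical, and it covers only the site
preference, not the all-or-nothing trigger itself, which would still need e.g.
colocation sets or ticket constraints on top):

# tag each node with the group it belongs to (crm shell)
crm node attribute a1 set site A
crm node attribute a2 set site A
crm node attribute a3 set site A
crm node attribute b1 set site B
crm node attribute b2 set site B
crm node attribute b3 set site B

# prefer group-A nodes for a given resource (repeat per resource)
crm configure location prefer-site-A my_resource \
    rule 100: site eq A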

Thanks in advance,


Alberto Mijares

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Corosync OCFS2

2017-11-08 Thread Gang He
Hello David,

If you want to use OCFS2 with the Pacemaker stack, you do not need ocfs2_controld
in the new version, nor do you need to configure an o2cb resource.

I can give you a crm configuration example from a SLE12SP3 environment (nothing
has really changed since SLE12SP1):

crm(live/tb-node1)configure# show
node 1084784015: tb-node2
node 1084784039: tb-node1
node 1084784110: tb-node3
primitive dlm ocf:pacemaker:controld \
op monitor interval=60 timeout=60
primitive fs1 Filesystem \
params directory="/mnt/shared" fstype=ocfs2 device="/dev/sdb1" \
op start timeout=60s interval=0 \
op stop timeout=60s interval=0 \
op monitor interval=20s timeout=40s
primitive stonith-libvirt stonith:external/libvirt \
params hostlist="tb-node1,tb-node2,tb-node3" hypervisor_uri="qemu+tcp://192.168.125.1/system" \
op monitor interval=60 timeout=120 \
meta target-role=Started
group base-group dlm fs1
clone base-clone base-group \
meta interleave=true
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.17-3.1-36d2962a8 \
cluster-infrastructure=corosync \
cluster-name=hacluster \
stonith-enabled=true \
placement-strategy=balanced
rsc_defaults rsc-options: \
resource-stickiness=1 \
migration-threshold=3
op_defaults op-options: \
timeout=600 \
record-pending=true
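
For completeness, a hedged sketch of how the shared filesystem itself could be
prepared for the Pacemaker stack and the result checked (device path and slot
count are placeholders; option names assume a reasonably recent ocfs2-tools):

# format the shared device for the pcmk cluster stack
mkfs.ocfs2 --cluster-stack=pcmk --cluster-name=hacluster -N 3 /dev/sdb1

# once base-clone is started, every node should show dlm and fs1 running
crm_mon -1
mount | grep ocfs2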


Thanks
Gang



>>> 
> I'm trying to set up a 2-node cluster using OCFS2 with a Pacemaker and
> Corosync stack on Debian. I attempted to use ocf:heartbeat:o2cb to satisfy
> the o2cb requirement of OCFS2, but found that the required daemon
> o2cb_controld.pcmk is not available for Debian because it was dependent
> on OpenAIS which is no longer part of Corosync. I've reviewed the
> relevant code for this daemon, but I am not familiar with the Corosync
> or OpenAIS APIs in order to make the necessary conversion. The relevant
> code is less than 200 lines long and can be found here:
> https://oss.oracle.com/git/gitweb.cgi?p=ocfs2-tools.git;a=blob;f=ocfs2_controld/pacemaker.c;h=18f776a748ca4d39f06c9bad84c7faf5fe0c6910;hb=HEAD
> Can someone take a look at this code and tell me if it can be converted
> to Corosync, and if so point me in the direction of how to begin? Is
> Corosync CPG the replacement for OpenAIS? 
> 
> I'm able to get OCFS2 working with lsb:o2cb, but OCFS2 fails the
> ping_pong test provided with ctdb, which is my ultimate goal here. From
> my understanding, o2cb must use o2cb_controld.pcmk in order for OCFS2 to
> function correctly with regard to ctdb. I obviously haven't been able to
> test this configuration due to the current OpenAIS requirement of
> o2cb_controld.pcmk. 
> 
> Thanks, 
> 
> David Ellingsworth


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Corosync OCFS2

2017-11-08 Thread David Ellingsworth
I'm trying to set up a 2-node cluster using OCFS2 with a Pacemaker and
Corosync stack on Debian. I attempted to use ocf:heartbeat:o2cb to satisfy
the o2cb requirement of OCFS2, but found that the required daemon
o2cb_controld.pcmk is not available for Debian because it was dependent
on OpenAIS which is no longer part of Corosync. I've reviewed the
relevant code for this daemon, but I am not familiar with the Corosync
or OpenAIS APIs in order to make the necessary conversion. The relevant
code is less than 200 lines long and can be found here:
https://oss.oracle.com/git/gitweb.cgi?p=ocfs2-tools.git;a=blob;f=ocfs2_controld/pacemaker.c;h=18f776a748ca4d39f06c9bad84c7faf5fe0c6910;hb=HEAD
Can someone take a look at this code and tell me if it can be converted
to Corosync, and if so point me in the direction of how to begin? Is
Corosync CPG the replacement for OpenAIS? 

I'm able to get OCFS2 working with lsb:o2cb, but OCFS2 fails the
ping_pong test provided with ctdb, which is my ultimate goal here. From
my understanding, o2cb must use o2cb_controld.pcmk in order for OCFS2 to
function correctly with regard to ctdb. I obviously haven't been able to
test this configuration due to the current OpenAIS requirement of
o2cb_controld.pcmk. 

Thanks, 

David Ellingsworth
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Antw: monitor operation for ASTERISK on node_name: 7 (not running)

2017-11-08 Thread Ulrich Windl
Hi!

It may be worth the time to find out which command is used to monitor the 
status of asterisk.
Then maybe run that command repeatedly in a shell loop to find out what it 
returns.

I guess something is slow to respond; maybe just increase the timeout...
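
For a systemd resource like the ASTERISK primitive quoted below, the monitor
essentially checks the unit's active state, so a rough hand-run approximation
of such a loop (a sketch; the unit name is taken from the configuration quoted
below) would be:

while true; do
    date
    systemctl is-active asterisk.service || echo "monitor would fail (rc=$?)"
    sleep 2
done

If the state query turns out to be intermittently slow rather than genuinely
reporting the unit as inactive, relaxing the "op monitor interval=2s
timeout=30s" line on the ASTERISK primitive would be the place to start.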

Regards,
Ulrich

>>> Donat Zenichev  wrote on 07.11.2017 at 17:07 in message:
> Hi guys.
> I've just started to work with pacemaker and have a problem with a monitored
> service.
> 
> I've already configured three Active/Stand-by clusters with pacemaker.
> Running resources:
> IPaddr2
> asterisk daemon
> bacula fd
> snmp daemon
> 
> The first and second clusters are working fine - I didn't notice any failures.
> But the third cluster fails too frequently.
> 
> Schema is similar for all clusters:
> Master (active) <-corosync-> Slave(Stand-by)
> 
> There is no difference between the three clusters, apart from server load.
> The Asterisk running on the third cluster has 500+ customers and processes
> many more calls than the others.
> 
> As a result, the cluster periodically thinks that asterisk is not running:
> * ASTERISK_monitor_2000 on node1-Master 'not running' (7): call=85,
> status=complete, exitreason='none',
> last-rc-change='Tue Nov  7 15:06:16 2017', queued=0ms, exec=0ms
> 
> And restarts it (because of the on-fail=restart parameter on the asterisk primitive).
> But in fact asterisk is working fine and nothing happens to it.
> I parsed the full asterisk log and found nothing that could explain the behavior
> of pacemaker.
> 
> All machines are virtual (not containers, but Proxmox VMs). They have
> enough resources; each has 8 cores at 3 GHz and 8 GB of RAM.
> I tried to increase the machines' resources - I doubled them - but it
> changed nothing.
> And it seems that machine resources are not the root of the problem;
> resource monitoring showed that the cores are not loaded more than 10%.
> 
> Configurations.
> 
> Corosync config:
> totem {
> version: 2
> cluster_name: asterisk
> 
> token: 1000
> token_retransmit: 31
> hold: 31
> 
> token_retransmits_before_loss_const: 0
> 
> clear_node_high_bit: yes
> 
> crypto_cipher: none
> 
> crypto_hash: none
> rrp_mode: active
> transport: udpu
> 
> interface {
> member {
> memberaddr: 10.100.1.1
> }
> member {
> memberaddr: 10.100.1.2
> }
> 
> ringnumber: 0
> bindnetaddr: 10.100.1.1
> 
> mcastport: 5405
> ttl: 1
> }
> }
> 
> quorum {
> provider: corosync_votequorum
> expected_votes: 2
> }
> 
> logging block is skipped.
> 
> 
> 
> Pacemaker config:
> 
> node 178676749: node1-Master
> node 178676750: node2-Slave
> primitive ASTERISK systemd:asterisk \
> op monitor interval=2s timeout=30s on-fail=restart \
> op start interval=0 timeout=30s \
> op stop interval=0 timeout=30s \
> meta migration-threshold=2 failure-timeout=1800s target-role=Started
> primitive BACULA systemd:bacula-fd \
> op monitor interval=30s timeout=60s on-fail=restart \
> op start interval=0 timeout=30s \
> op stop interval=0 timeout=30s \
> meta migration-threshold=2 failure-timeout=1800s
> primitive IPSHARED IPaddr2 \
> params ip=here.my.real.ip.address nic=ens18 cidr_netmask=29 \
> meta migration-threshold=2 target-role=Started \
> op monitor interval=20 timeout=60 on-fail=restart
> primitive SNMP systemd:snmpd \
> op monitor interval=30s timeout=60s on-fail=restart \
> op start interval=0 timeout=30s \
> op stop interval=0 timeout=30s \
> meta migration-threshold=2 failure-timeout=1800s target-role=Started
> order ASTERISK_AFTER_IPSHARED Mandatory: IPSHARED ASTERISK SNMP
> colocation ASTERISK_WITH_IPSHARED inf: ASTERISK IPSHARED
> location PREFER_BACULA BACULA 100: node1-Master
> location PREFER_MASTER ASTERISK 100: node1-Master
> location PREFER_SNMP SNMP 100: node1-Master
> property cib-bootstrap-options: \
> cluster-recheck-interval=5s \
> start-failure-is-fatal=false \
> stonith-enabled=false \
> no-quorum-policy=ignore \
> have-watchdog=false \
> dc-version=1.1.16-94ff4df \
> cluster-infrastructure=corosync \
> cluster-name=virtual2
> 
> 
> Asterisk systemd config:
> 
> [Unit]
> Description=Asterisk
> 
> [Service]
> ExecStart=/etc/init.d/asterisk start
> ExecStop=/etc/init.d/asterisk stop
> PIDFile=/var/run/asterisk/asterisk.pid
> 
> 
> 
> Corosync log:
> 
> Nov 07 15:06:16 [3958] node1-Master   crmd: info:
> process_lrm_event: Result
> of monitor operation for ASTERISK on node1-Master: 7 (not running) |
> call=85 key=ASTERISK_monitor_2000 confirmed=false cib-update=106
> Nov 07 15:06:16 [3953] node1-Master cib: info:
> cib_process_request: Forwarding cib_modify operation for section status to
> all (origin=local/crmd/106)
> Nov 07 15:06:16 [3953] node1-Master 

Re: [ClusterLabs] Hawk vs pcs Web UI

2017-11-08 Thread Ken Gaillot
On Tue, 2017-11-07 at 10:30 -0600, David Kent wrote:
> Hello,
> 
> What is the difference between Hawk and the pcs Web UI? (Why are
> there two projects, not one? Are they targeting different use cases?)
> 
> I was surprised to see two Web UI implementations under the same
> ClusterLabs umbrella. I assume there's a reason, but I can't find it.
> I'm running Pacemaker on RHEL, so the pcs Web UI works out of the
> box. Hawk looks like it could run on RHEL with a little work building
> from source, but it's been tricky. I'm trying to figure out if it's
> worth the time, and what I might be missing out on.
> 
> Thanks,
> David

The Pacemaker project provides low-level command-line tools
(crm_resource, etc.) and C APIs that anyone can use to build a front-
end. There are quite a few front-ends available, the most common
being the crm-shell and pcs CLIs, and the hawk and pcs GUIs.

The reason for that particular split is mainly historical. SuSE, which
adopted Pacemaker first, developed hawk. Red Hat, which adopted
Pacemaker later, developed pcs as an interface that would be more
comfortable to users transitioning from its earlier rgmanager-based
clusters.

There are advantages and disadvantages to having multiple front-ends.
It allows more experimentation and evolution, and tailoring interfaces
to specific use cases. On the other hand, it divides developers'
limited time, and makes how-to's less universal.

At the recent ClusterLabs Summit, we discussed consolidating the back-
ends of hawk/crm-shell/pcs. Since these tools are gaining the ability
to manage other cluster components such as corosync, booth, and sbd,
the Pacemaker back-end alone is not sufficient.

Pcs has its own back-end daemon, pcsd, that runs on all the nodes
(regardless of whether the cluster itself is running), and provides a
python API to manage the various cluster components. It also
coordinates configurations between all nodes in the cluster, and
authenticates pcs users. Most likely, pcsd would become the basis for
the common back-end. This would be mostly invisible to users, but it
would lessen the split in developer time, while still allowing
diversity of front-ends.

Choosing an interface boils down to distro and personal preference. The
easiest option is to go with what's available in your OS's repositories
(some, such as Debian, provide both). If you have a strong preference,
you can always build your favorite yourself (which is less of an option
if you are using an enterprise distro and want everything supported).
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] monitor operation for ASTERISK on node_name: 7 (not running)

2017-11-08 Thread Donat Zenichev
Hi guys.
I've just started to work with pacemaker and have a problem with a monitored
service.

I've already configured three Active/Stand-by clusters with pacemaker.
Running resources:
IPaddr2
asterisk daemon
bacula fd
snmp daemon

The first and second clusters are working fine - I didn't notice any failures.
But the third cluster fails too frequently.

Schema is similar for all clusters:
Master (active) <-corosync-> Slave(Stand-by)

There is no difference between the three clusters, apart from server load.
The Asterisk running on the third cluster has 500+ customers and processes
many more calls than the others.

As a result, the cluster periodically thinks that asterisk is not running:
* ASTERISK_monitor_2000 on node1-Master 'not running' (7): call=85,
status=complete, exitreason='none',
last-rc-change='Tue Nov  7 15:06:16 2017', queued=0ms, exec=0ms

And restarts it (because of the on-fail=restart parameter on the asterisk primitive).
But in fact asterisk is working fine and nothing happens to it.
I parsed the full asterisk log and found nothing that could explain the behavior
of pacemaker.

All machines are virtual (not containers, but Proxmox VMs). They have
enough resources; each has 8 cores at 3 GHz and 8 GB of RAM.
I tried to increase the machines' resources - I doubled them - but it
changed nothing.
And it seems that machine resources are not the root of the problem;
resource monitoring showed that the cores are not loaded more than 10%.

Configurations.

Corosync config:
totem {
version: 2
cluster_name: asterisk

token: 1000
token_retransmit: 31
hold: 31

token_retransmits_before_loss_const: 0

clear_node_high_bit: yes

crypto_cipher: none

crypto_hash: none
rrp_mode: active
transport: udpu

interface {
member {
memberaddr: 10.100.1.1
}
member {
memberaddr: 10.100.1.2
}

ringnumber: 0
bindnetaddr: 10.100.1.1

mcastport: 5405
ttl: 1
}
}

quorum {
provider: corosync_votequorum
expected_votes: 2
}

logging block is skipped.
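
(As an aside, for a two-node votequorum setup the commonly documented
alternative to a fixed expected_votes is the two_node option - a hedged sketch,
not necessarily what is wanted here:)

quorum {
provider: corosync_votequorum
two_node: 1
}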



Pacemaker config:

node 178676749: node1-Master
node 178676750: node2-Slave
primitive ASTERISK systemd:asterisk \
op monitor interval=2s timeout=30s on-fail=restart \
op start interval=0 timeout=30s \
op stop interval=0 timeout=30s \
meta migration-threshold=2 failure-timeout=1800s target-role=Started
primitive BACULA systemd:bacula-fd \
op monitor interval=30s timeout=60s on-fail=restart \
op start interval=0 timeout=30s \
op stop interval=0 timeout=30s \
meta migration-threshold=2 failure-timeout=1800s
primitive IPSHARED IPaddr2 \
params ip=here.my.real.ip.address nic=ens18 cidr_netmask=29 \
meta migration-threshold=2 target-role=Started \
op monitor interval=20 timeout=60 on-fail=restart
primitive SNMP systemd:snmpd \
op monitor interval=30s timeout=60s on-fail=restart \
op start interval=0 timeout=30s \
op stop interval=0 timeout=30s \
meta migration-threshold=2 failure-timeout=1800s target-role=Started
order ASTERISK_AFTER_IPSHARED Mandatory: IPSHARED ASTERISK SNMP
colocation ASTERISK_WITH_IPSHARED inf: ASTERISK IPSHARED
location PREFER_BACULA BACULA 100: node1-Master
location PREFER_MASTER ASTERISK 100: node1-Master
location PREFER_SNMP SNMP 100: node1-Master
property cib-bootstrap-options: \
cluster-recheck-interval=5s \
start-failure-is-fatal=false \
stonith-enabled=false \
no-quorum-policy=ignore \
have-watchdog=false \
dc-version=1.1.16-94ff4df \
cluster-infrastructure=corosync \
cluster-name=virtual2


Asterisk systemd config:

[Unit]
Description=Asterisk

[Service]
ExecStart=/etc/init.d/asterisk start
ExecStop=/etc/init.d/asterisk stop
PIDFile=/var/run/asterisk/asterisk.pid
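
As an aside, that unit has no Type= setting (and no [Install] section), so
systemd treats it as Type=simple even though the init script forks; whether
that contributes to the intermittent "not running" results is not established
here, but a sketch of a more conventional wrapper for a forking init script
(assuming the script daemonizes and writes the PID file shown above) would be:

[Unit]
Description=Asterisk PBX
After=network.target

[Service]
Type=forking
ExecStart=/etc/init.d/asterisk start
ExecStop=/etc/init.d/asterisk stop
PIDFile=/var/run/asterisk/asterisk.pid

[Install]
WantedBy=multi-user.target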



Corosync log:

Nov 07 15:06:16 [3958] node1-Master   crmd: info:
process_lrm_event: Result
of monitor operation for ASTERISK on node1-Master: 7 (not running) |
call=85 key=ASTERISK_monitor_2000 confirmed=false cib-update=106
Nov 07 15:06:16 [3953] node1-Master cib: info:
cib_process_request: Forwarding cib_modify operation for section status to
all (origin=local/crmd/106)
Nov 07 15:06:16 [3953] node1-Master cib: info: cib_perform_op: Diff:
--- 0.38.37 2
Nov 07 15:06:16 [3953] node1-Master cib: info: cib_perform_op: Diff:
+++ 0.38.38 (null)
Nov 07 15:06:16 [3953] node1-Master cib: info: cib_perform_op: +
/cib:  @num_updates=38
Nov 07 15:06:16 [3953] node1-Master cib: info: cib_perform_op: +
/cib/status/node_state[@id='178676749']/lrm[@id='178676749']/lrm_resources/lrm_resource[@id='ASTERISK']/lrm_rsc_op[@id='ASTERISK_last_failure_0']:
@transition-key=2:29672:0:d96266b4-0e4d-4718-8af5-7b6e2edf4934,
@transition-magic=0:7;2:29672:0:d96266b4-0e4d-4718-8af5-7b6e2edf4934,
@call-id=85, @last-rc-change=1510059976
Nov 07 15:06:16 [3953] node1-Master cib: info:

Re: [ClusterLabs] boothd-site/boothd-arbitrator: WARN: packet timestamp older than previous one

2017-11-08 Thread Nicolas Huillard
Le mardi 07 novembre 2017 à 13:41 +0100, Dejan Muhamedagic a écrit :
> Hi,
> 
> On Mon, Nov 06, 2017 at 10:52:12AM +0100, Nicolas Huillard wrote:
> > Hello,
> > 
> > I have many of the above syslog messages from boothd (counting
> > all servers, that's nearly a hundred per day).
> > All sites are synchronized using NTP, but according to the source
> > (https://github.com/ClusterLabs/booth/blob/master/src/transport.c),
> > that specific message isn't even tied to maxtimeskew (which I
> > forced to 120s, because I wondered if it wrongly defaulted to 0s).
> > This message is output unless the new timestamp is larger (>) than
> > the previous one from the same site.
> 
> Right.
> 
> > Adding debug on the arbitrator, it appears that every exchange
> > between the servers is made of at least 2 messages in each
> > direction. Could it be that two consecutive messages have the exact
> > same timestamp, and thus trigger the warning?
> 
> The time resolution used should be sufficiently fine (i.e.
> microseconds) so that the timestamps of two consecutive packets
> are not the same. At least I'd expect that to be so. Now, what is
> the actual time resolution on your platform? Maybe you should
> check the clocksource? Still, if these are not some special kind
> of computing platform, I suppose that there shouldn't be any
> surprises.

My servers are regular Intel platforms ("Intel(R) Atom(TM) CPU C2750 @
2.40GHz" and "Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz"), using Debian
stretch with kernel 4.9.51.

According to boot-time logs, TSC is selected as the clocksource:
kernel: [0.00] clocksource: refined-jiffies: mask: 0x 
max_cycles: 0x, max_idle_ns: 7645519600211568 ns
kernel: [0.00] clocksource: hpet: mask: 0x max_cycles: 
0x, max_idle_ns: 133484882848 ns
kernel: [1.512907] clocksource: jiffies: mask: 0x max_cycles: 
0x, max_idle_ns: 764504178510 ns
kernel: [1.905780] clocksource: Switched to clocksource hpet
kernel: [1.93] clocksource: acpi_pm: mask: 0xff max_cycles: 
0xff, max_idle_ns: 2085701024 ns
kernel: [3.861842] tsc: Refined TSC clocksource calibration: 2099.998 MHz
kernel: [3.861859] clocksource: tsc: mask: 0x max_cycles: 
0x1e452ea631d, max_idle_ns: 440795244572 ns
kernel: [5.345808] clocksource: Switched to clocksource tsc

https://superuser.com/questions/393969/what-does-clocksource-tsc-unstable-mean#393978
"In short, on modern systems, the TSC sucks for measuring time
accurately" and "There is no promise that the timestamp counters of
multiple CPUs on a single motherboard will be synchronized"
Oops: this means that the timestamps sent may not be increasing,
depending on which core boothd runs on...
The CPUs are currently underutilised, which can lead to increased
discrepancy between the cores' TSCs.
I can either:
* switch to another clocksource (I don't yet know how to do that; see the sketch below)
* lock boothd on a specific core (I don't know if I can do that; also sketched below)
* ignore these messages altogether (the next one re. "timestamp older
than skew" will still happen)
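
Regarding the first two options, a hedged sketch (paths and tools as typically
found on Debian; not verified on these particular machines):

# switch the clocksource away from tsc at runtime
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
# (persistently: boot with clocksource=hpet on the kernel command line)

# pin the running boothd to a single core
taskset -pc 0 "$(pidof boothd)"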

> How often are these messages logged, compared to the expire (or
> renewal_freq) time?

3 nodes, 5 min expire, 2 tickets = 1728 messages per day, vs. 108
messages on all hosts in the last 24h = 6% failures.
Apparently, the Xeon D fails more often than the Atom C.
I wonder why this problem is not more widely experienced (not that it's
a big problem).

> Thanks,
> 
> Dejan

Thanks for your work!

-- 
Nicolas Huillard

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org