[ClusterLabs] One cluster with two groups of nodes
Hi guys, nice to say hello here. I've been assigned a very particular task: there's a Pacemaker-based cluster with 6 nodes. A system runs on three nodes (group A), while the other three are hot-standby spares (group B). Resources from group A are never supposed to be relocated individually onto nodes from group B. However, if any of the resources from group A fails, all resources must be relocated to group B. It's an "all or nothing" failover.

Ideally, you would split the cluster into two clusters and implement cluster sites and ticket management; however, that's not possible here. Taking all this into account, can you kindly suggest a strategy for achieving this goal? I have some ideas, but I'd like to hear from those who have a lot more experience than me.

Thanks in advance,

Alberto Mijares

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
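[Editor's note, not from the original thread: one way to start sketching this in crm shell is to tag nodes with a custom attribute and drive all location preferences off that attribute, so the preference can be changed in one place. All node, resource, and attribute names below are hypothetical, and this alone does not enforce the "all or nothing" semantics — that still needs collocation constraints among the group-A resources, or an external agent that flips scores on any failure:]

```shell
# Hypothetical node names; mark which failover group each node is in
crm node attribute nodeA1 set site A
crm node attribute nodeB1 set site B
# ...repeat for the remaining four nodes

# Each resource prefers site A and is merely allowed on site B;
# rule-based scores let one node attribute govern the whole placement.
crm configure location rsc1_prefers_A rsc1 rule 200: site eq A
crm configure location rsc1_allowed_B rsc1 rule 0: site eq B
```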
Re: [ClusterLabs] Corosync OCFS2
Hello David,

If you want to use OCFS2 with the Pacemaker stack, you do not need ocfs2_controld in the new version, and you do not need to configure an o2cb resource either. I can give you a crm demo from a SLE12SP3 environment (actually there has not been any change since SLE12SP1):

crm(live/tb-node1)configure# show
node 1084784015: tb-node2
node 1084784039: tb-node1
node 1084784110: tb-node3
primitive dlm ocf:pacemaker:controld \
        op monitor interval=60 timeout=60
primitive fs1 Filesystem \
        params directory="/mnt/shared" fstype=ocfs2 device="/dev/sdb1" \
        op start timeout=60s interval=0 \
        op stop timeout=60s interval=0 \
        op monitor interval=20s timeout=40s
primitive stonith-libvirt stonith:external/libvirt \
        params hostlist="tb-node1,tb-node2,tb-node3" hypervisor_uri="qemu+tcp://192.168.125.1/system" \
        op monitor interval=60 timeout=120 \
        meta target-role=Started
group base-group dlm fs1
clone base-clone base-group \
        meta interleave=true
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.17-3.1-36d2962a8 \
        cluster-infrastructure=corosync \
        cluster-name=hacluster \
        stonith-enabled=true \
        placement-strategy=balanced
rsc_defaults rsc-options: \
        resource-stickiness=1 \
        migration-threshold=3
op_defaults op-options: \
        timeout=600 \
        record-pending=true

Thanks
Gang

>>>
> I'm trying to set up a 2 node cluster using OCFS2 with a Pacemaker and
> Corosync stack on Debian. I attempted to use ocf:heartbeat:o2cb to satisfy
> the o2cb requirement of OCFS2, but found that the required daemon
> o2cb_controld.pcmk is not available for Debian because it was dependent
> on OpenAIS which is no longer part of Corosync. I've reviewed the
> relevant code for this daemon, but I am not familiar with the Corosync
> or OpenAIS APIs in order to make the necessary conversion.
> The relevant code is less than 200 lines long and can be found here:
> https://oss.oracle.com/git/gitweb.cgi?p=ocfs2-tools.git;a=blob;f=ocfs2_controld/pacemaker.c;h=18f776a748ca4d39f06c9bad84c7faf5fe0c6910;hb=HEAD
> Can someone take a look at this code and tell me if it can be converted
> to Corosync, and if so point me in the direction of how to begin? Is
> Corosync CPG the replacement for OpenAIS?
>
> I'm able to get OCFS2 working with lsb:o2cb, but OCFS2 fails the
> ping_pong test provided with ctdb which is my ultimate goal here. From
> my understanding, o2cb must use o2cb_controld.pcmk in order for OCFS2 to
> function correctly in regards to ctdb. I obviously haven't been able to
> test this configuration due to the current OpenAIS requirement of
> o2cb_controld.pcmk.
>
> Thanks,
>
> David Ellingsworth
[ClusterLabs] Corosync OCFS2
I'm trying to set up a 2 node cluster using OCFS2 with a Pacemaker and Corosync stack on Debian. I attempted to use ocf:heartbeat:o2cb to satisfy the o2cb requirement of OCFS2, but found that the required daemon o2cb_controld.pcmk is not available for Debian because it was dependent on OpenAIS, which is no longer part of Corosync. I've reviewed the relevant code for this daemon, but I am not familiar enough with the Corosync or OpenAIS APIs to make the necessary conversion.

The relevant code is less than 200 lines long and can be found here:
https://oss.oracle.com/git/gitweb.cgi?p=ocfs2-tools.git;a=blob;f=ocfs2_controld/pacemaker.c;h=18f776a748ca4d39f06c9bad84c7faf5fe0c6910;hb=HEAD

Can someone take a look at this code and tell me if it can be converted to Corosync, and if so point me in the direction of how to begin? Is Corosync CPG the replacement for OpenAIS?

I'm able to get OCFS2 working with lsb:o2cb, but OCFS2 fails the ping_pong test provided with ctdb, which is my ultimate goal here. From my understanding, o2cb must use o2cb_controld.pcmk in order for OCFS2 to function correctly with regard to ctdb. I obviously haven't been able to test this configuration due to the current OpenAIS requirement of o2cb_controld.pcmk.

Thanks,

David Ellingsworth
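[Editor's sketch, not from the thread: the ctdb ping_pong coherence test mentioned above is usually run simultaneously on every node against the same file on the shared filesystem, with the lock count set to the number of nodes plus one. The mount point and file name here are assumptions:]

```shell
# Run on each of the 2 nodes at the same time, with nodes + 1 = 3 locks:
ping_pong /mnt/shared/ping_pong.dat 3

# The -rw variant additionally checks that writes are coherently
# visible across nodes through the byte-range locks:
ping_pong -rw /mnt/shared/ping_pong.dat 3
```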
[ClusterLabs] Antw: monitor operation for ASTERISK on node_name: 7 (not running)
Hi!

It may be worth the time to find out which command is used to monitor the status of asterisk. Then maybe run that command repeatedly in a shell loop to find out what it returns. I guess something is slow to respond; maybe just increase the timeout...

Regards,
Ulrich

>>> Donat Zenichev wrote on 07.11.2017 at 17:07 in message:
> Hi guys.
> I've just started to work with pacemaker and have a problem with a monitored
> service.
>
> I've already configured three Active/Stand-by clusters with pacemaker.
> Running resources:
> IPaddr2
> asterisk daemon
> bacula fd
> snmp daemon
>
> First and second cluster are working fine - I didn't notice any failures.
> But the third cluster fails too frequently.
>
> Schema is similar for all clusters:
> Master (active) <-corosync-> Slave (Stand-by)
>
> There is no difference between the three clusters, apart from server load.
> Asterisk running on the third cluster has 500+ customers and processes
> many more calls than the others.
>
> As a result, the cluster periodically thinks that asterisk is not running:
> * ASTERISK_monitor_2000 on node1-Master 'not running' (7): call=85,
> status=complete, exitreason='none',
> last-rc-change='Tue Nov 7 15:06:16 2017', queued=0ms, exec=0ms
>
> And restarts it (because of the on-fail=restart parameter for the asterisk
> primitive). But in fact asterisk is working fine and nothing is wrong with it.
> I parsed the full asterisk log and found nothing that could explain the
> behavior of pacemaker.
>
> All machines are virtual (not containers, but proxmox VMs). They have
> enough resources; each has 8 cores @ 3GHz and 8GB RAM.
> I tried to increase resources on the machines - I doubled them up, but it
> changed nothing.
> It seems that machine resources are not the root of the problem;
> resource monitoring showed that the cores are not loaded more than 10%.
>
> Configurations.
>
> Corosync config:
>
> totem {
>     version: 2
>     cluster_name: asterisk
>
>     token: 1000
>     token_retransmit: 31
>     hold: 31
>
>     token_retransmits_before_loss_const: 0
>
>     clear_node_high_bit: yes
>
>     crypto_cipher: none
>
>     crypto_hash: none
>     rrp_mode: active
>     transport: udpu
>
>     interface {
>         member {
>             memberaddr: 10.100.1.1
>         }
>         member {
>             memberaddr: 10.100.1.2
>         }
>
>         ringnumber: 0
>         bindnetaddr: 10.100.1.1
>
>         mcastport: 5405
>         ttl: 1
>     }
> }
>
> quorum {
>     provider: corosync_votequorum
>     expected_votes: 2
> }
>
> The logging block is skipped.
>
> Pacemaker config:
>
> node 178676749: node1-Master
> node 178676750: node2-Slave
> primitive ASTERISK systemd:asterisk \
>     op monitor interval=2s timeout=30s on-fail=restart \
>     op start interval=0 timeout=30s \
>     op stop interval=0 timeout=30s \
>     meta migration-threshold=2 failure-timeout=1800s target-role=Started
> primitive BACULA systemd:bacula-fd \
>     op monitor interval=30s timeout=60s on-fail=restart \
>     op start interval=0 timeout=30s \
>     op stop interval=0 timeout=30s \
>     meta migration-threshold=2 failure-timeout=1800s
> primitive IPSHARED IPaddr2 \
>     params ip=here.my.real.ip.address nic=ens18 cidr_netmask=29 \
>     meta migration-threshold=2 target-role=Started \
>     op monitor interval=20 timeout=60 on-fail=restart
> primitive SNMP systemd:snmpd \
>     op monitor interval=30s timeout=60s on-fail=restart \
>     op start interval=0 timeout=30s \
>     op stop interval=0 timeout=30s \
>     meta migration-threshold=2 failure-timeout=1800s target-role=Started
> order ASTERISK_AFTER_IPSHARED Mandatory: IPSHARED ASTERISK SNMP
> colocation ASTERISK_WITH_IPSHARED inf: ASTERISK IPSHARED
> location PREFER_BACULA BACULA 100: node1-Master
> location PREFER_MASTER ASTERISK 100: node1-Master
> location PREFER_SNMP SNMP 100: node1-Master
> property cib-bootstrap-options: \
>     cluster-recheck-interval=5s \
>     start-failure-is-fatal=false \
>     stonith-enabled=false \
>     no-quorum-policy=ignore \
>     have-watchdog=false \
>     dc-version=1.1.16-94ff4df \
>     cluster-infrastructure=corosync \
>     cluster-name=virtual2
>
> Asterisk systemd config:
>
> [Unit]
> Description=Asterisk
>
> [Service]
> ExecStart=/etc/init.d/asterisk start
> ExecStop=/etc/init.d/asterisk stop
> PIDFile=/var/run/asterisk/asterisk.pid
>
> Corosync log:
>
> Nov 07 15:06:16 [3958] node1-Master crmd: info: process_lrm_event: Result
> of monitor operation for ASTERISK on node1-Master: 7 (not running) |
> call=85 key=ASTERISK_monitor_2000 confirmed=false cib-update=106
> Nov 07 15:06:16 [3953] node1-Master cib: info:
> cib_process_request: Forwarding cib_modify operation for section status to
> all (origin=local/crmd/106)
> Nov 07 15:06:16 [3953] node1-Master
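[Editor's sketch of Ulrich's loop suggestion. Pacemaker monitors systemd-class resources through systemd itself, so `systemctl is-active` is the closest hand-run equivalent (an approximation on my part; the service name is taken from the poster's config):]

```shell
# Log every moment systemd reports asterisk as anything but "active",
# with a timestamp, to catch the transient blips the monitor may see.
while true; do
    state=$(systemctl is-active asterisk)
    if [ "$state" != "active" ]; then
        echo "$(date '+%F %T') asterisk state: $state"
    fi
    sleep 1
done
```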
Re: [ClusterLabs] Hawk vs pcs Web UI
On Tue, 2017-11-07 at 10:30 -0600, David Kent wrote:
> Hello,
>
> What is the difference between Hawk and the pcs Web UI? (Why are
> there two projects, not one? Are they targeting different use cases?)
>
> I was surprised to see two Web UI implementations under the same
> ClusterLabs umbrella. I assume there's a reason, but I can't find it.
> I'm running Pacemaker on RHEL, so the pcs Web UI works out of the
> box. Hawk looks like it could run on RHEL with a little work building
> from source, but it's been tricky. I'm trying to figure out if it's
> worth the time, and what I might be missing out on.
>
> Thanks,
> David

The Pacemaker project provides low-level command-line tools (crm_resource, etc.) and C APIs that anyone can use to build a front-end. There are quite a few front-ends available, with the most common being the crm-shell and pcs CLIs and the hawk and pcs GUIs.

The reason for that particular split is mainly historical. SuSE, which adopted Pacemaker first, developed hawk. Red Hat, which adopted Pacemaker later, developed pcs as an interface that would be more comfortable to users transitioning from its earlier rgmanager-based clusters.

There are advantages and disadvantages to having multiple front-ends. It allows more experimentation and evolution, and tailoring interfaces to specific use cases. On the other hand, it divides developers' limited time, and makes how-to's less universal.

At the recent ClusterLabs Summit, we discussed consolidating the back-ends of hawk/crm-shell/pcs. Since these tools are gaining the ability to manage other cluster components such as corosync, booth, and sbd, the Pacemaker back-end alone is not sufficient. Pcs has its own back-end daemon, pcsd, that runs on all the nodes (regardless of whether the cluster itself is running), and provides a python API to manage the various cluster components. It also coordinates configurations between all nodes in the cluster, and authenticates pcs users.
Most likely, pcsd would become the basis for the common back-end. This would be mostly invisible to users, but it would lessen the split in developer time, while still allowing diversity of front-ends.

Choosing an interface boils down to distro and personal preference. The easiest option is to go with what's available in your OS's repositories (some, such as Debian, provide both). If you have a strong preference, you can always build your favorite yourself (which is less of an option if you are using an enterprise distro and want everything supported).

--
Ken Gaillot
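[Editor's illustration of the front-end split; the resource name and address are made up. The same primitive expressed in each CLI:]

```shell
# pcs (Red Hat lineage)
pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
    ip=192.0.2.10 cidr_netmask=24 \
    op monitor interval=30s

# crm shell (SUSE lineage), equivalent configuration
crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip=192.0.2.10 cidr_netmask=24 \
    op monitor interval=30s
```

Both commands edit the same Pacemaker CIB underneath, which is why a shared back-end is plausible.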
[ClusterLabs] monitor operation for ASTERISK on node_name: 7 (not running)
Hi guys. I've just started to work with pacemaker and have a problem with a monitored service.

I've already configured three Active/Stand-by clusters with pacemaker. Running resources:
IPaddr2
asterisk daemon
bacula fd
snmp daemon

First and second cluster are working fine - I didn't notice any failures. But the third cluster fails too frequently.

Schema is similar for all clusters:
Master (active) <-corosync-> Slave (Stand-by)

There is no difference between the three clusters, apart from server load. Asterisk running on the third cluster has 500+ customers and processes many more calls than the others.

As a result, the cluster periodically thinks that asterisk is not running:
* ASTERISK_monitor_2000 on node1-Master 'not running' (7): call=85, status=complete, exitreason='none',
last-rc-change='Tue Nov 7 15:06:16 2017', queued=0ms, exec=0ms

And restarts it (because of the on-fail=restart parameter for the asterisk primitive). But in fact asterisk is working fine and nothing is wrong with it. I parsed the full asterisk log and found nothing that could explain the behavior of pacemaker.

All machines are virtual (not containers, but proxmox VMs). They have enough resources; each has 8 cores @ 3GHz and 8GB RAM. I tried to increase resources on the machines - I doubled them up, but it changed nothing. It seems that machine resources are not the root of the problem; resource monitoring showed that the cores are not loaded more than 10%.

Configurations.

Corosync config:

totem {
    version: 2
    cluster_name: asterisk

    token: 1000
    token_retransmit: 31
    hold: 31

    token_retransmits_before_loss_const: 0

    clear_node_high_bit: yes

    crypto_cipher: none

    crypto_hash: none
    rrp_mode: active
    transport: udpu

    interface {
        member {
            memberaddr: 10.100.1.1
        }
        member {
            memberaddr: 10.100.1.2
        }

        ringnumber: 0
        bindnetaddr: 10.100.1.1

        mcastport: 5405
        ttl: 1
    }
}

quorum {
    provider: corosync_votequorum
    expected_votes: 2
}

The logging block is skipped.
Pacemaker config:

node 178676749: node1-Master
node 178676750: node2-Slave
primitive ASTERISK systemd:asterisk \
    op monitor interval=2s timeout=30s on-fail=restart \
    op start interval=0 timeout=30s \
    op stop interval=0 timeout=30s \
    meta migration-threshold=2 failure-timeout=1800s target-role=Started
primitive BACULA systemd:bacula-fd \
    op monitor interval=30s timeout=60s on-fail=restart \
    op start interval=0 timeout=30s \
    op stop interval=0 timeout=30s \
    meta migration-threshold=2 failure-timeout=1800s
primitive IPSHARED IPaddr2 \
    params ip=here.my.real.ip.address nic=ens18 cidr_netmask=29 \
    meta migration-threshold=2 target-role=Started \
    op monitor interval=20 timeout=60 on-fail=restart
primitive SNMP systemd:snmpd \
    op monitor interval=30s timeout=60s on-fail=restart \
    op start interval=0 timeout=30s \
    op stop interval=0 timeout=30s \
    meta migration-threshold=2 failure-timeout=1800s target-role=Started
order ASTERISK_AFTER_IPSHARED Mandatory: IPSHARED ASTERISK SNMP
colocation ASTERISK_WITH_IPSHARED inf: ASTERISK IPSHARED
location PREFER_BACULA BACULA 100: node1-Master
location PREFER_MASTER ASTERISK 100: node1-Master
location PREFER_SNMP SNMP 100: node1-Master
property cib-bootstrap-options: \
    cluster-recheck-interval=5s \
    start-failure-is-fatal=false \
    stonith-enabled=false \
    no-quorum-policy=ignore \
    have-watchdog=false \
    dc-version=1.1.16-94ff4df \
    cluster-infrastructure=corosync \
    cluster-name=virtual2

Asterisk systemd config:

[Unit]
Description=Asterisk

[Service]
ExecStart=/etc/init.d/asterisk start
ExecStop=/etc/init.d/asterisk stop
PIDFile=/var/run/asterisk/asterisk.pid

Corosync log:

Nov 07 15:06:16 [3958] node1-Master crmd: info: process_lrm_event: Result of monitor operation for ASTERISK on node1-Master: 7 (not running) | call=85 key=ASTERISK_monitor_2000 confirmed=false cib-update=106
Nov 07 15:06:16 [3953] node1-Master cib: info: cib_process_request: Forwarding cib_modify operation for section status to all (origin=local/crmd/106)
Nov 07 15:06:16
[3953] node1-Master cib: info: cib_perform_op: Diff: --- 0.38.37 2
Nov 07 15:06:16 [3953] node1-Master cib: info: cib_perform_op: Diff: +++ 0.38.38 (null)
Nov 07 15:06:16 [3953] node1-Master cib: info: cib_perform_op: + /cib: @num_updates=38
Nov 07 15:06:16 [3953] node1-Master cib: info: cib_perform_op: + /cib/status/node_state[@id='178676749']/lrm[@id='178676749']/lrm_resources/lrm_resource[@id='ASTERISK']/lrm_rsc_op[@id='ASTERISK_last_failure_0']: @transition-key=2:29672:0:d96266b4-0e4d-4718-8af5-7b6e2edf4934, @transition-magic=0:7;2:29672:0:d96266b4-0e4d-4718-8af5-7b6e2edf4934, @call-id=85, @last-rc-change=1510059976
Nov 07 15:06:16 [3953] node1-Master cib: info:
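[Editor's note, not part of the original post: a 2-second monitor interval is unusually aggressive for a systemd resource on a loaded VM, and relaxing it is a cheap first experiment. A hedged crmsh sketch; the chosen values are illustrative only:]

```shell
# Re-load the primitive with a calmer monitor: poll every 10s and
# allow a slow systemd/DBus reply before declaring "not running".
cat > /tmp/asterisk-monitor.crm <<'EOF'
primitive ASTERISK systemd:asterisk \
        op monitor interval=10s timeout=60s on-fail=restart \
        op start interval=0 timeout=30s \
        op stop interval=0 timeout=30s \
        meta migration-threshold=2 failure-timeout=1800s target-role=Started
EOF
crm configure load update /tmp/asterisk-monitor.crm
```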
Re: [ClusterLabs] boothd-site/boothd-arbitrator: WARN: packet timestamp older than previous one
On Tuesday, 07 November 2017 at 13:41 +0100, Dejan Muhamedagic wrote:
> Hi,
>
> On Mon, Nov 06, 2017 at 10:52:12AM +0100, Nicolas Huillard wrote:
> > Hello,
> >
> > I have many of those above syslog messages from boothd (counting
> > all servers, that's nearly one hundred per day).
> > All sites are synchronized using NTP, but according to the source
> > (https://github.com/ClusterLabs/booth/blob/master/src/transport.c),
> > that specific message isn't even tied to maxtimeskew (which I
> > forced to 120s, because I wondered if it wrongly defaulted to 0s).
> > This message is output unless the new timestamp is larger (>) than
> > the previous one from the same site.
>
> Right.
>
> > Adding debug on the arbitrator, it appears that every exchange
> > between the servers is made of at least 2 messages in each
> > direction. Could it be that two consecutive messages have the exact
> > same timestamp, and thus trigger the warning?
>
> The time resolution used should be sufficiently fine (i.e.
> microseconds) so that the timestamps of two consecutive packets
> are not the same. At least I'd expect that to be so. Now, what is
> the actual time resolution on your platform? Maybe you should
> check the clocksource? Still, if these are not some special kind
> of computing platform, I suppose that there shouldn't be any
> surprises.

My servers are regular Intel platforms ("Intel(R) Atom(TM) CPU C2750 @ 2.40GHz" and "Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz"), using Debian stretch with kernel 4.9.51.
According to boot-time logs, TSC is selected as the clocksource:

kernel: [0.00] clocksource: refined-jiffies: mask: 0x max_cycles: 0x, max_idle_ns: 7645519600211568 ns
kernel: [0.00] clocksource: hpet: mask: 0x max_cycles: 0x, max_idle_ns: 133484882848 ns
kernel: [1.512907] clocksource: jiffies: mask: 0x max_cycles: 0x, max_idle_ns: 764504178510 ns
kernel: [1.905780] clocksource: Switched to clocksource hpet
kernel: [1.93] clocksource: acpi_pm: mask: 0xff max_cycles: 0xff, max_idle_ns: 2085701024 ns
kernel: [3.861842] tsc: Refined TSC clocksource calibration: 2099.998 MHz
kernel: [3.861859] clocksource: tsc: mask: 0x max_cycles: 0x1e452ea631d, max_idle_ns: 440795244572 ns
kernel: [5.345808] clocksource: Switched to clocksource tsc

https://superuser.com/questions/393969/what-does-clocksource-tsc-unstable-mean#393978 says, in short, that "on modern systems, the TSC sucks for measuring time accurately" and "There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized".

Oops: this means that the timestamps sent may not be increasing, depending on which core boothd runs on... The CPUs are currently underutilised, which can lead to increased discrepancy between the cores' TSCs. I can either:
* switch to another clocksource (I don't yet know how to do that)
* lock boothd on a specific core (I don't know if I can do that)
* ignore these messages altogether (the next one re. "timestamp older than skew" will still happen)

> How often are these messages logged, compared to the expire (or
> renewal_freq) time?

3 nodes, 5 min expire, 2 tickets = 1728 messages per day, vs. 108 messages on all hosts in the last 24h = 6% fail. Apparently, the Xeon D fails more often than the Atom C. I wonder why this problem is not more widely experienced (not that it's a big problem).

> Thanks,
>
> Dejan

Thanks for your work!
--
Nicolas Huillard
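[Editor's addendum for the first two options listed above; these are standard Linux mechanisms, not advice from the thread, and the boothd PID lookup is illustrative:]

```shell
# Inspect which clocksources the kernel offers and which one is active
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
cat /sys/devices/system/clocksource/clocksource0/current_clocksource

# Switch to hpet at runtime (as root); make it permanent by booting
# with clocksource=hpet on the kernel command line
echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource

# Pin a running boothd to CPU 0 so it always reads the same core's TSC
taskset -pc 0 "$(pidof boothd)"
```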