Re: [ClusterLabs] Using "mandatory" startup order but avoiding depending clones from restart after member of parent clone fails

2017-02-09 Thread Ken Gaillot
On 02/09/2017 08:50 AM, Alejandro Comisario wrote:
> Ken, thanks for your reply.
> 
> Since our setup uses an active/active mysql clone, I think that an
> order constraint is the only way to ensure what I want.
> So, a simple question, making the order "Advisory", and taking into
> consideration that keystone "maybe" starts before mysql, making it fail
> because of the database connection.
> 
> If I set on-fail="restart" for the start and monitor actions on the
> keystone clone (and all the dependent clones), of course also setting the
> cib option start-failure-is-fatal=false, to make sure that if it fails
> it will restart until everything is ok.
> 
> would that make sense as a "workaround" for that?
> 
> best.

Yes, that would work. By default, a resource is allowed up to 1,000,000
failures before the cluster stops retrying it on that node. Of course, you
can clean up the failures to start over (or set a failure-timeout to do that
automatically).
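
For reference, a minimal sketch of those settings in crm shell syntax (the
failure-timeout value below is only an assumption; pick whatever retry window
suits you):

    # allow failed starts to be retried instead of being fatal on that node
    crm configure property start-failure-is-fatal=false
    # let recorded failures expire automatically after a while
    crm configure rsc_defaults failure-timeout=600s
    # or clear failures by hand once the underlying problem is fixed
    crm resource cleanup p_keystone-clone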

> On Thu, Feb 9, 2017 at 12:18 AM, Ken Gaillot <kgail...@redhat.com> wrote:
> 
> On 02/06/2017 05:25 PM, Alejandro Comisario wrote:
> > guys, really happy to post my first doubt.
> >
> > I'm kind of having a "conceptual" issue that's bringing me lots of
> > issues. I need to ensure that the startup order of resources is mandatory,
> > but that is causing me a huge problem: if just one of the members of
> > a clone goes down and up (but not all members), all resources depending
> > on it are restarted (which is bad). My workaround is to set the order as
> > advisory, but that doesn't assure a strict startup order.
> >
> > eg. clone_b runs on servers_B, and depends on clone_a that runs on
> servers_A.
> >
> > I'll put an example of how I have everything defined between these
> > two clones.
> >
> > ### clone_A running on servers A (location rule)
> > primitive p_mysql mysql-wss \
> > op monitor timeout=55 interval=60 enabled=true on-fail=restart \
> > op start timeout=475 interval=0 on-fail=restart \
> > op stop timeout=175 interval=0 \
> > params socket="/var/run/mysqld/mysqld.sock"
> > pid="/var/run/mysqld/mysqld.pid" test_passwd="XXX" test_user=root \
> > meta is-managed=true
> >
> > clone p_mysql-clone p_mysql \
> > meta target-role=Started interleave=false globally-unique=false
> >
> > location mysql_location p_mysql-clone resource-discovery=never \
> > rule -inf: galera ne 1
> >
> > ### clone_B running on servers B (location rule)
> > primitive p_keystone apache \
> > params configfile="/etc/apache2/apache2.conf" \
> > op monitor on-fail=restart interval=60s timeout=60s \
> > op start on-fail=restart interval=0 \
> > meta target-role=Started migration-threshold=2 failure-timeout=60s
> > resource-stickiness=300
> >
> > clone p_keystone-clone p_keystone \
> > meta target-role=Started interleave=false globally-unique=false
> >
> > location keystone_location p_keystone-clone resource-discovery=never \
> > rule -inf: keystone ne 1
> >
> > order p_clone-mysql-before-p_keystone INF: p_mysql-clone
> p_keystone-clone:start
> >
> > Again, just to make my point: if p_mysql-clone loses even one member
> > of the clone, then ONLY when that member gets back, all members of
> > p_keystone-clone get restarted, and that's NOT what I need. If I
> > change the order from mandatory to advisory, I get what I want
> > regarding the behaviour when instances of the clone come
> > and go, but I lose the strictness of the startup order, which is
> > critical for me.
> >
> > How can I fix this problem?
> > .. can I?
> 
> I don't think pacemaker can model your desired situation currently.
> 
> In OpenStack configs that I'm familiar with, the mysql server (usually
> galera) is a master-slave clone, and the constraint used is "promote
> mysql then start keystone". That way, if a slave goes away and comes
> back, it has no effect.
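
For illustration, the promote-based ordering described above could be
expressed in crm shell syntax roughly like this (a sketch, assuming the
galera resource is wrapped in a master/slave resource named p_mysql-master):

    order p_mysql-promote-before-keystone INF: p_mysql-master:promote p_keystone-clone:start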

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Failed reload

2017-02-09 Thread Ken Gaillot
On 02/08/2017 02:15 AM, Ferenc Wágner wrote:
> Hi,
> 
> There was an interesting discussion on this list about "Doing reload
> right" last July (which I still haven't digested entirely).  Now I've
> got a related question about the current and intended behavior: what
> happens if a reload operation fails?  I found some suggestions in
> http://ocf.community.tummy.narkive.com/RngPlNfz/adding-reload-to-the-ocf-specification,
> from 11 years back, and the question wasn't clear cut at all.  Now I'm
> contemplating adding best-effort reloads to an RA, but not sure what
> behavior I can expect and depend on in the long run.  I'd be grateful
> for your insights.

Seeing this, it occurs to me that commit messages should have links to
relevant mailing list threads, it makes the reason for various choices
so much clearer :-)

As with any operation, a reload failure should be handled according to
its exit code and its on-fail attribute. (Well, any operation except
notify, as became apparent in another recent thread.)

By default, that means a restart (stop then start).



Re: [ClusterLabs] Q (SLES11 SP4): Delay after node came up (info: throttle_send_command: New throttle mode: 0000 (was ffffffff))

2017-02-09 Thread Ken Gaillot
On 01/16/2017 04:25 AM, Ulrich Windl wrote:
> Hi!
> 
> I have a question: The following happened in our 3-node cluster (n1, n2, n3):
> n3 was DC, n2 was taken offline, n2 came online again, n1 rebooted (went 
> offline/online), then n2 rebooted (offline/online).
> 
> I observed a significant delay after all three nodes were online before 
> resources were started. Actually the start seemed to be triggered by some crm 
> restart action on n3.
> 
> Logs on n3 (DC) look like this:
> cib: info: cib_process_request:  Completed cib_modify operation for 
> section status: OK (rc=0, origin=local/crmd/359, version=1.99.1)
> crmd:   notice: handle_request:   Current ping state: S_TRANSITION_ENGINE
> (...many more...)
> stonith-ng: info: plugin_handle_membership: Membership 3328: 
> quorum retained
> crmd: info: plugin_handle_membership: Membership 3328: quorum 
> retained
> [...]
> stonith-ng: info: plugin_handle_membership: Membership 3328: 
> quorum retained
> [...]
> cib: info: cib_process_request:  Completed cib_modify operation for 
> section status: OK (rc=0, origin=local/crmd/365, version=1.99.3)
> crmd: info: crmd_cs_dispatch: Setting expected votes to 3
> crmd: info: plugin_handle_membership: Membership 3328: quorum 
> retained
> [...]
> crmd: info: crmd_cs_dispatch: Setting expected votes to 3
> crmd: info: do_state_transition:  State transition 
> S_TRANSITION_ENGINE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL 
> origin=peer_update_callback ]
> crmd: info: do_dc_join_offer_one: An unknown node joined - (re-)offer 
> to any unconfirmed nodes
> (???what's that?)

This is normal when a node joins the cluster. The DC's cluster layer
detects any joins, and the DC's crmd responds to that by offering
membership to the new node(s).

> crmd: info: join_make_offer:  Making join offers based on membership 3328
> crmd: info: join_make_offer:  join-2: Sending offer to n2
> crmd: info: crm_update_peer_join: join_make_offer: Node n2[739512325] 
> - join-2 phase 0 -> 1
> crmd: info: join_make_offer:  Skipping n1: already known 4
> crmd: info: join_make_offer:  Skipping n3: already known 4

Above we can see that n1 and n3 already have confirmed membership, and
the newly joined n2 gets offered membership.

> crmd:   notice: abort_transition_graph:   Transition aborted: Peer Halt 
> (source=do_te_invoke:168, 0)

This is one of the common log messages I think can be improved. "Peer
Halt" in this case does not mean the peer halted, but rather that the
transition was halted due to a peer event (in this case the join).

> cib: info: cib_process_request:  Completed cib_modify operation for 
> section crm_config: OK (rc=0, origin=local/crmd/375, version=1.99.5)
> crmd: info: crm_update_peer_join: do_dc_join_filter_offer: Node 
> n2[739512325] - join-2 phase 1 -> 2
> crmd: info: crm_update_peer_expected: do_dc_join_filter_offer: 
> Node n2[739512325] - expected state is now member (was down)
> crmd: info: abort_transition_graph:   Transition aborted: Peer Halt 
> (source=do_te_invoke:168, 0)
> crmd: info: do_state_transition:  State transition S_INTEGRATION -> 
> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL 
> origin=check_join_state ]
> crmd: info: crmd_join_phase_log:  join-2: n2=integrated
> crmd: info: crmd_join_phase_log:  join-2: n1=confirmed
> crmd: info: crmd_join_phase_log:  join-2: n3=confirmed
> crmd:   notice: do_dc_join_finalize:  join-2: Syncing the CIB from n2 to 
> the rest of the cluster
> [...]
> cib: info: cib_process_replace:  Replaced 1.99.5 with 1.99.5 from n2
> cib: info: cib_process_request:  Completed cib_replace operation for 
> section 'all': OK (rc=0, origin=n2/crmd/376, version=1.99.5)
> crmd: info: crm_update_peer_join: finalize_join_for: Node 
> n2[739512325] - join-2 phase 2 -> 3
> crmd: info: do_log:   FSA: Input I_WAIT_FOR_EVENT from do_te_invoke() 
> received in state S_FINALIZE_JOIN
> crmd: info: abort_transition_graph:   Transition aborted: Peer Halt 
> (source=do_te_invoke:168, 0)
> [...]
> cib: info: cib_file_write_with_digest:   Wrote version 1.99.0 of the 
> CIB to disk (digest: 6e71ae6f4a1d2619cc64c91d40f55a32)
> (??? We already had 1.99.5)

Only .0's are written to disk -- the .x's contain updates to dynamic
information (like the status section) and are in-memory only.

> cib: info: cib_process_request:  Completed cib_modify operation for 
> section status: OK (rc=0, origin=n2/attrd/3, version=1.99.6)
> crmd: info: crm_update_peer_join: do_dc_join_ack: Node n2[739512325] 
> - join-2 phase 3 -> 4
> crmd: info: do_dc_join_ack:   join-2: Updating node state to member for n2
> [...]
> crmd:   notice: handle_request:   Current ping state: S_FINALIZE_JOIN
> crmd: info: do_log:   FSA: Input I_WAIT_FOR_EVENT from do_te_in

Re: [ClusterLabs] Pacemaker cluster not working after switching from 1.0 to 1.1

2017-02-09 Thread Ken Gaillot
On 01/16/2017 01:16 PM, Rick Kint wrote:
> 
>> Date: Mon, 16 Jan 2017 09:15:44 -0600
>> From: Ken Gaillot 
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] Pacemaker cluster not working after
>> switching from 1.0 to 1.1 (resend as plain text)
>> Message-ID: 
>> Content-Type: text/plain; charset=utf-8
>>
>> A preliminary question -- what cluster layer are you running?
>>
>> Pacemaker 1.0 worked with heartbeat or corosync 1, while Ubuntu 14.04
>> ships with corosync 2 by default, IIRC. There were major incompatible
>> changes between corosync 1 and 2, so it's important to get that right
>> before looking at pacemaker.
>>
>> A general note, when making such a big jump in the pacemaker version,
>> I'd recommend running "cibadmin --upgrade" both before exporting 
>> the
>> configuration from 1.0, and again after deploying it on 1.1. This will
>> apply any transformations needed in the CIB syntax. Pacemaker will do
>> this on the fly, but doing it manually lets you see any issues early, as
>> well as being more efficient.
> 
> TL;DR
> - Thanks.
> - Cluster mostly works so I don't think it's a corosync issue.
> - Configuration XML is actually created with crm shell.
> - Is there a summary of changes from 1.0 to 1.1?
> 
> 
> Thanks for the quick reply.
> 
> 
> Corosync is v2.3.3. We've already been through the issues getting corosync 
> working. 
> 
> The cluster works in many ways:
> - Pacemaker sees both nodes.
> - Pacemaker starts all the resources.
> - Pacemaker promotes an instance of the stateful Encryptor resource to Master/active.
> - If the node running the active Encryptor goes down, the standby Encryptor 
> is promoted and the DC changes.
> - Manual failover works (fiddling with the master-score attribute).
> 
> The problem is that a failure in one of the dependencies doesn't cause 
> promotion anymore.
> 
> 
> 
> 
> Thanks for the cibadmin command, I missed that when reading the docs.
> 
> I omitted some detail. I didn't export the XML from the old cluster to the 
> new cluster. We create the configuration with the crm shell, not with XML. 
> The sequence of events is
> 
> 
> - install corosync, pacemaker, etc.
> - apply local config file changes.
> - start corosync and pacemaker on both nodes in cluster.
> - verify that cluster is formed (crm_mon shows both nodes online, but no 
> resources).
> - create cluster by running script which passes a here document to the crm 
> shell.
> - verify that cluster is formed
> 
> 
> The crm shell version is "1.2.5+hg1034-1ubuntu4". I've checked the XML 
> against the "Pacemaker Configuration Explained" doc and it looks OK to my 
> admittedly non-knowledgeable eye.
> 
> I tried the cibadmin command in hopes that this might tell me something, but 
> it made no changes. "cib_verify --live-check" doesn't complain either.
> I copied the XML from a Pacemaker 1.0.X system to a Pacemaker 1.1.X system 
> and ran "cibadmin --upgrade" on it. Nothing changed there either. 
> 
> 
> 
> Is there a quick summary of changes from 1.0 to 1.1 somewhere? The "Pacemaker 
> 1.1 Configuration Explained" doc has a section entitled "What is new in 1.0" 
> but nothing for 1.1. I wouldn't be surprised if there is something obvious 
> that I'm missing and it would help if I could limit my search space.

No, there's just the change log, which is quite detailed.

There was no defining change from 1.0 to 1.1. Originally, it was planned
that 1.1 would be a "development" branch with new features, and 1.0
would be a "production" branch with bugfixes only. It proved too much
work to maintain two separate branches, so the 1.0 line was ended, and
1.1 became the sole production branch.

> I've done quite a bit of experimentation: changed the syntax of the 
> colocation constraints, added ordering constraints, and fiddled with 
> timeouts. When I was doing the port to Ubuntu, I tested resource agent exit 
> status but I'll go back and check that again. Any other suggestions?
> 
> 
> BTW, I've fixed some issues with the Pacemaker init script running on Ubuntu. 
> Should these go to Clusterlabs or the Debian/Ubuntu maintainer?

It depends on whether they're using the init script provided upstream,
or their own (which I suspect is more likely).

> CONFIGURATION
> 
> 
> Here's the configuration again, hopefully with indentation preserved this 
> time:
> 
> 
> 
>   
> id="cib-bootstrap-options-stonith-enabled"/>
> id="cib-bootstrap-o

Re: [ClusterLabs] Problems with corosync and pacemaker with error scenarios

2017-02-09 Thread Ken Gaillot
On 01/16/2017 11:18 AM, Gerhard Wiesinger wrote:
> Hello Ken,
> 
> thank you for the answers.
> 
> On 16.01.2017 16:43, Ken Gaillot wrote:
>> On 01/16/2017 08:56 AM, Gerhard Wiesinger wrote:
>>> Hello,
>>>
>>> I'm new to corosync and pacemaker and I want to setup a nginx cluster
>>> with quorum.
>>>
>>> Requirements:
>>> - 3 Linux machines
>>> - On 2 machines a floating IP should be handled and nginx as a load
>>> balancing proxy
>>> - The 3rd machine is for quorum only, no services must run there
>>>
>>> Installed corosync/pacemaker on all 3 nodes; firewall ports opened are:
>>> 5404, 5405, 5406 for udp in both directions
>> If you're using firewalld, the easiest configuration is:
>>
>>firewall-cmd --permanent --add-service=high-availability
>>
>> If not, depending on what you're running, you may also want to open  TCP
>> ports 2224 (pcsd), 3121 (Pacemaker Remote), and 21064 (DLM).
> 
> I'm using shorewall on the lb01/lb02 nodes and firewalld on kvm01.
> 
> pcs status
> Cluster name: lbcluster
> Stack: corosync
> Current DC: lb01 (version 1.1.16-1.fc25-94ff4df) - partition with quorum
> Last updated: Mon Jan 16 16:46:52 2017
> Last change: Mon Jan 16 15:07:59 2017 by root via cibadmin on lb01
> 
> 3 nodes configured
> 40 resources configured
> 
> Online: [ kvm01 lb01 lb02 ]
> 
> Full list of resources:
> ...
> 
> Daemon Status:
>   corosync: active/enabled
>   pacemaker: active/enabled
>   pcsd: inactive/disabled
> 
> BTW: I'm not running pcsd; as far as I know it is for UI configuration
> only. So ports 2224 (pcsd), 3121 (Pacemaker Remote), and 21064
> (DLM) are closed. Shouldn't be a problem, right?

pcs uses pcsd for most of its commands, so if you want to use pcs, it
should be enabled and allowed between nodes.

You don't have Pacemaker Remote nodes, so you can leave that port
closed. DLM is only necessary for certain resource types (such as clvmd).
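
As a minimal sketch, opening the pcsd port on the firewalld node could look
like the following (the shorewall nodes would need an equivalent rule in
their own configuration):

    firewall-cmd --permanent --add-port=2224/tcp   # pcsd
    firewall-cmd --reload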

>>> OS: Fedora 25
>>>
>>> Configuration of corosync (only the bindnetaddr is different on every
>>> maschine) and pacemaker below.
>> FYI you don't need a different bindnetaddr. You can (and generally
>> should) use the *network* address, which is the same on all hosts.
> 
> Only lb01 and lb02 are on the same network, kvm01 is on a different
> location and network therefore.

I'm not familiar with corosync nodes on the same ring using different
networks, but I suppose it's OK since you're using udpu, with ring0_addr
specified for each node.

>>> Configuration works so far but error test scenarios don't work like
>>> expected:
>>> 1.) I had cases in testing without qourum and quorum again where the
>>> cluster kept in Stopped state
>>>I had to restart the whole stack to get it online again (killall -9
>>> corosync;systemctl restart corosync;systemctl restart pacemaker)
>>>Any ideas?
>> It will be next to impossible to say without logs. It's definitely not
>> expected behavior. Stopping is the correct response to losing quorum;
>> perhaps quorum is not being properly restored for some reason. What is
>> your test methodology?
> 
> I had it when I rebooted just one node.
> 
> Testing scenarios are:
> *) Rebooting
> *) Starting/stopping corosync
> *) network down simulation on lb01/lb02
> *) putting an interface down with ifconfig eth1:1 down (simulation of
> losing an IP address)
> *) see also below
> 
> Tested now again with all nodes up (I've configured 13 IP addresses; for
> the sake of a faster overview I posted only the config for 2 IP
> addresses):
> No automatic recovery happens.
> e.g. ifconfig eth1:1 down
>  Resource Group: ClusterNetworking
>  ClusterIP_01   (ocf::heartbeat:IPaddr2):   FAILED lb02
>  ClusterIPRoute_01  (ocf::heartbeat:Route): FAILED lb02
>  ClusterIPRule_01   (ocf::heartbeat:Iprule):Started lb02
>  ClusterIP_02   (ocf::heartbeat:IPaddr2):   FAILED lb02
>  ClusterIPRoute_02  (ocf::heartbeat:Route): FAILED lb02 (blocked)
>  ClusterIPRule_02   (ocf::heartbeat:Iprule):Stopped
>  ClusterIP_03   (ocf::heartbeat:IPaddr2):   Stopped
>  ClusterIPRoute_03  (ocf::heartbeat:Route): Stopped
>  ClusterIPRule_03   (ocf::heartbeat:Iprule):Stopped
> ...
>  ClusterIP_13   (ocf::heartbeat:IPaddr2):   Stopped
>  ClusterIPRoute_13  (ocf::heartbeat:Route): Stopped
>  ClusterIPRule_13   (ocf::heartbeat:Iprule):Stopped
>  webserver  (ocf::heartbeat:nginx): Stopped
>

Re: [ClusterLabs] Trouble setting up selfcompiled Apache in a pacemaker cluster on Oracle Linux 6.8

2017-02-09 Thread Ken Gaillot
On 01/16/2017 10:16 AM, Souvignier, Daniel wrote:
> Hi List,
> 
>  
> 
> I’ve got trouble getting Apache to work in a Pacemaker cluster I set up
> between two Oracle Linux 6.8 hosts. The cluster itself works just fine,
> but Apache won’t come up. The thing here is that this Apache differs from a
> basic setup because it is self-compiled and therefore lives in
> /usr/local/apache2. Also it is the latest version available (2.4.25),
> which could also be causing problems. To be able to debug, I went into
> the file /usr/lib/ocf/resource.d/heartbeat/apache and „verbosed“ it by
> simply adding set -x. This way, I can extract the script's output from
> the logfile /var/log/cluster/corosync.log, which I appended to this
> email (hopefully it won’t get filtered).
> 
>  
> 
> The command I used to invoke the apache script mentioned above was:
> 
> pcs resource create httpd ocf:heartbeat:apache
> configfile=/usr/local/apache2/conf/httpd.conf
> httpd=/usr/local/apache2/bin/httpd
> statusurl=http://localhost/server-status
> envfiles=/usr/local/apache2/bin/envvars op monitor interval=60s
> 
>  
> 
> Before you ask: the paths are correct and mod_status is also configured
> correctly (works fine when starting Apache manually). I should also add
> that the two nodes which form this cluster are virtual (Vmware vSphere)
> hosts and living in the same network (so no firewalling between them,
> there is a dedicated firewall just before the network). I assume that it
> has something to do with the handling of the pid file, but I couldn’t
> seem to fix it until now. I pointed Apache to create the pid file in
> /var/run/httpd.pid, but that didn’t work either. Suggestions on how to
> solve this? Thanks in advance!

Do you have SELinux enabled? If so, check /var/log/audit/audit.log for
denials.

It looks like the output you attached is from the cluster's initial
probe (one-time monitor operation), and not the start operation.

> Kind regards,
> 
> Daniel Souvignier
> 
> P.S.: If you need the parameters I compiled Apache with, I can tell you,
> but I don’t think that it is relevant here.
> 
> --
> Daniel Souvignier
> IT Center
> Group: Linux-based Applications
> Department: Systems and Operations
> RWTH Aachen University
> Seffenter Weg 23
> 52074 Aachen
> Tel.: +49 241 80-29267
> souvign...@itc.rwth-aachen.de
> www.itc.rwth-aachen.de



Re: [ClusterLabs] two node cluster: vm starting - shutting down 15min later - starting again 15min later ... and so on

2017-02-09 Thread Ken Gaillot
On 02/09/2017 10:48 AM, Lentes, Bernd wrote:
> Hi,
> 
> I have a two-node cluster with a vm as a resource. Currently I'm just testing 
> and playing. My vm boots and shuts down again in 15min gaps.
> Surely this is related to "PEngine Recheck Timer (I_PE_CALC) just popped 
> (90ms)" found in the logs. I googled, and it is said that this
> is due to a time-based rule 
> (http://oss.clusterlabs.org/pipermail/pacemaker/2009-May/001647.html). OK.
> But I don't have any time-based rules.
> This is the config for my vm:
> 
> primitive prim_vm_mausdb VirtualDomain \
> params config="/var/lib/libvirt/images/xml/mausdb_vm.xml" \
> params hypervisor="qemu:///system" \
> params migration_transport=ssh \
> op start interval=0 timeout=90 \
> op stop interval=0 timeout=95 \
> op monitor interval=30 timeout=30 \
> op migrate_from interval=0 timeout=100 \
> op migrate_to interval=0 timeout=120 \
> meta allow-migrate=true \
> meta target-role=Started \
> utilization cpu=2 hv_memory=4099
> 
> The only constraint concerning the vm I had was a location constraint (which 
> I didn't create).

What is the constraint? If its ID starts with "cli-", it was created by
a command-line tool (such as crm_resource, crm shell or pcs, generally
for a "move" or "ban" command).

> OK, this timer is available; I can set it to zero to disable it.

The timer is used for multiple purposes; I wouldn't recommend disabling
it. Also, this doesn't fix the problem; the problem will still occur
whenever the cluster recalculates, just not on a regular time schedule.

> But why does it influence my vm in such a manner?
> 
> Excerp from the log:
> 
> ...
> Feb  9 16:19:38 ha-idg-1 VirtualDomain(prim_vm_mausdb)[13148]: INFO: Domain 
> mausdb_vm already stopped.
> Feb  9 16:19:38 ha-idg-1 crmd[8407]:   notice: process_lrm_event: Operation 
> prim_vm_mausdb_stop_0: ok (node=ha-idg-1, call=401, rc=0, cib-update=340, 
> confirmed=true)
> Feb  9 16:19:38 ha-idg-1 kernel: [852506.947196] device vnet0 entered 
> promiscuous mode
> Feb  9 16:19:38 ha-idg-1 kernel: [852507.008770] br0: port 2(vnet0) entering 
> forwarding state
> Feb  9 16:19:38 ha-idg-1 kernel: [852507.008775] br0: port 2(vnet0) entering 
> forwarding state
> Feb  9 16:19:38 ha-idg-1 kernel: [852507.172120] qemu-kvm: sending ioctl 5326 
> to a partition!
> Feb  9 16:19:38 ha-idg-1 kernel: [852507.172133] qemu-kvm: sending ioctl 
> 80200204 to a partition!
> Feb  9 16:19:41 ha-idg-1 crmd[8407]:   notice: process_lrm_event: Operation 
> prim_vm_mausdb_start_0: ok (node=ha-idg-1, call=402, rc=0, cib-update=341, 
> confirmed=true)
> Feb  9 16:19:41 ha-idg-1 crmd[8407]:   notice: process_lrm_event: Operation 
> prim_vm_mausdb_monitor_3: ok (node=ha-idg-1, call=403, rc=0, 
> cib-update=342, confirmed=false)
> Feb  9 16:19:48 ha-idg-1 kernel: [852517.049015] vnet0: no IPv6 routers 
> present
> ...
> Feb  9 16:34:41 ha-idg-1 VirtualDomain(prim_vm_mausdb)[18272]: INFO: Issuing 
> graceful shutdown request for domain mausdb_vm.
> Feb  9 16:35:06 ha-idg-1 kernel: [853434.550089] br0: port 2(vnet0) entering 
> forwarding state
> Feb  9 16:35:06 ha-idg-1 kernel: [853434.550160] device vnet0 left 
> promiscuous mode
> Feb  9 16:35:06 ha-idg-1 kernel: [853434.550165] br0: port 2(vnet0) entering 
> disabled state
> Feb  9 16:35:06 ha-idg-1 ifdown: vnet0
> Feb  9 16:35:06 ha-idg-1 ifdown: Interface not available and no configuration 
> found.
> Feb  9 16:35:07 ha-idg-1 crmd[8407]:   notice: process_lrm_event: Operation 
> prim_vm_mausdb_stop_0: ok (node=ha-idg-1, call=405, rc=0, cib-update=343, 
> confirmed=true)
> ...
> 
> I deleted the location constraint, and since then the vm has been running fine for 35min already.

The logs don't go far back enough to have an idea why the VM was
stopped. Also, logs from the other node might be relevant, if it was the
DC (controller) at the time.

> System is SLES 11 SP4 64bit, vm is SLES 10 SP4 64bit.
> 
> Thanks.
> 
> Bernd



Re: [ClusterLabs] two node cluster: vm starting - shutting down 15min later - starting again 15min later ... and so on

2017-02-10 Thread Ken Gaillot
On 02/10/2017 06:49 AM, Lentes, Bernd wrote:
> 
> 
> - On Feb 10, 2017, at 1:10 AM, Ken Gaillot kgail...@redhat.com wrote:
> 
>> On 02/09/2017 10:48 AM, Lentes, Bernd wrote:
>>> Hi,
>>>
>>> i have a two node cluster with a vm as a resource. Currently i'm just 
>>> testing
>>> and playing. My vm boots and shuts down again in 15min gaps.
>>> Surely this is related to "PEngine Recheck Timer (I_PE_CALC) just popped
>>> (90ms)" found in the logs. I googled, and it is said that this
>>> is due to time-based rule
>>> (http://oss.clusterlabs.org/pipermail/pacemaker/2009-May/001647.html). OK.
>>> But i don't have any time-based rules.
>>> This is the config for my vm:
>>>
>>> primitive prim_vm_mausdb VirtualDomain \
>>> params config="/var/lib/libvirt/images/xml/mausdb_vm.xml" \
>>> params hypervisor="qemu:///system" \
>>> params migration_transport=ssh \
>>> op start interval=0 timeout=90 \
>>> op stop interval=0 timeout=95 \
>>> op monitor interval=30 timeout=30 \
>>> op migrate_from interval=0 timeout=100 \
>>> op migrate_to interval=0 timeout=120 \
>>> meta allow-migrate=true \
>>> meta target-role=Started \
>>> utilization cpu=2 hv_memory=4099
>>>
>>> The only constraint concerning the vm i had was a location (which i didn't
>>> create).
>>
>> What is the constraint? If its ID starts with "cli-", it was created by
>> a command-line tool (such as crm_resource, crm shell or pcs, generally
>> for a "move" or "ban" command).
>>
> I deleted the one I mentioned, but now I have two again. I didn't create them.
> Does the crm create constraints itself?
> 
> location cli-ban-prim_vm_mausdb-on-ha-idg-2 prim_vm_mausdb role=Started -inf: 
> ha-idg-2
> location cli-prefer-prim_vm_mausdb prim_vm_mausdb role=Started inf: ha-idg-2

The command-line tool you use creates them.

If you're using crm_resource, they're created by crm_resource
--move/--ban. If you're using pcs, they're created by pcs resource
move/ban. Etc.

> One location constraint is inf, the other -inf, for the same resource on the same node.
> Isn't that senseless?

Yes, but that's what you told it to do :-)

The command-line tools move or ban resources by setting constraints to
achieve that effect. Those constraints are permanent until you remove them.

How to clear them again depends on which tool you use ... crm_resource
--clear, pcs resource clear, etc.
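
For example (a sketch using the resource name from this thread):

    # with crm_resource:
    crm_resource --resource prim_vm_mausdb --clear
    # or with pcs:
    pcs resource clear prim_vm_mausdb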

> 
> "crm resorce scores" show -inf for that resource on that node:
> native_color: prim_vm_mausdb allocation score on ha-idg-1: 100
> native_color: prim_vm_mausdb allocation score on ha-idg-2: -INFINITY
> 
> Is -inf stronger ?
> Is it true that only the values for "native_color" are notable ?
> 
> A principle question: When i have trouble to start/stop/migrate resources,
> is it senseful to do a "crm resource cleanup" before trying again ?
> (Beneath finding the reason for the trouble).

It's best to figure out what the problem is first, make sure that's
taken care of, then clean up. The cluster might or might not do anything
when you clean up, depending on what stickiness you have, your failure
handling settings, etc.
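
A minimal sketch of that workflow (the resource name is the one from this
thread):

    # inspect the recorded failures first
    crm_mon -1 --failcounts
    # once the underlying problem is fixed, clear the failure history
    crm resource cleanup prim_vm_mausdb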

> Sorry for asking basic stuff. I read a lot beforehand, but in practice it's totally 
> different.
> Although I just have a vm as a resource, and I'm only testing, I'm sometimes 
> astonished by the 
> complexity of a simple two-node cluster: scores, failcounts, constraints, 
> default values for a lot of variables ...
> you have to keep an eye on a lot of stuff.
> 
> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Chair of the Supervisory Board: MinDir'in Baerbel Brumme-Bothe
> Managing Directors: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Commercial Register: Amtsgericht Muenchen HRB 6466
> VAT ID: DE 129521671



Re: [ClusterLabs] clone resource not get restarted on fail

2017-02-13 Thread Ken Gaillot
On 02/13/2017 07:57 AM, he.hailo...@zte.com.cn wrote:
> Pacemaker 1.1.10
> 
> Corosync 2.3.3
> 
> 
> this is a 3 nodes cluster configured with 3 clone resources, each
> attached wih a vip resource of IPAddr2:
> 
> 
> >crm status
> 
> 
> Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> 
> 
>  router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-1 
> 
>  sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-3 
> 
>  apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-2 
> 
>  Clone Set: sdclient_rep [sdclient]
> 
>  Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> 
>  Clone Set: router_rep [router]
> 
>  Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> 
>  Clone Set: apigateway_rep [apigateway]
> 
>  Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> 
> 
> It is observed that sometimes the clone resource gets stuck in monitoring
> when the service fails:
> 
> 
>  router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-1 
> 
>  sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-2 
> 
>  apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-3 
> 
>  Clone Set: sdclient_rep [sdclient]
> 
>  Started: [ paas-controller-1 paas-controller-2 ]
> 
>  Stopped: [ paas-controller-3 ]
> 
>  Clone Set: router_rep [router]
> 
>  router (ocf::heartbeat:router):Started
> paas-controller-3 FAILED 
> 
>  Started: [ paas-controller-1 paas-controller-2 ]
> 
>  Clone Set: apigateway_rep [apigateway]
> 
>  apigateway (ocf::heartbeat:apigateway):Started
> paas-controller-3 FAILED 
> 
>  Started: [ paas-controller-1 paas-controller-2 ]
> 
> 
> In the example above, the sdclient_rep gets restarted on node 3, while
> the other two hang at monitoring on node 3. Here are the ocf logs:
> 
> 
> abnormal (apigateway_rep):
> 
> 2017-02-13 18:27:53 [23586]===print_log test_monitor run_func main===
> Starting health check.
> 
> 2017-02-13 18:27:53 [23586]===print_log test_monitor run_func main===
> health check succeed.
> 
> 2017-02-13 18:27:55 [24010]===print_log test_monitor run_func main===
> Starting health check.
> 
> 2017-02-13 18:27:55 [24010]===print_log test_monitor run_func main===
> Failed: docker daemon is not running.
> 
> 2017-02-13 18:27:57 [24095]===print_log test_monitor run_func main===
> Starting health check.
> 
> 2017-02-13 18:27:57 [24095]===print_log test_monitor run_func main===
> Failed: docker daemon is not running.
> 
> 2017-02-13 18:27:59 [24159]===print_log test_monitor run_func main===
> Starting health check.
> 
> 2017-02-13 18:27:59 [24159]===print_log test_monitor run_func main===
> Failed: docker daemon is not running.
> 
> 
> normal (sdclient_rep):
> 
> 2017-02-13 18:27:52 [23507]===print_log sdclient_monitor run_func
> main=== health check succeed.
> 
> 2017-02-13 18:27:54 [23630]===print_log sdclient_monitor run_func
> main=== Starting health check.
> 
> 2017-02-13 18:27:54 [23630]===print_log sdclient_monitor run_func
> main=== Failed: docker daemon is not running.
> 
> 2017-02-13 18:27:55 [23710]===print_log sdclient_stop run_func main===
> Starting stop the container.
> 
> 2017-02-13 18:27:55 [23710]===print_log sdclient_stop run_func main===
> docker daemon lost, pretend stop succeed.
> 
> 2017-02-13 18:27:55 [23763]===print_log sdclient_start run_func main===
> Starting run the container.
> 
> 2017-02-13 18:27:55 [23763]===print_log sdclient_start run_func main===
> docker daemon lost, try again in 5 secs.
> 
> 2017-02-13 18:28:00 [23763]===print_log sdclient_start run_func main===
> docker daemon lost, try again in 5 secs.
> 
> 2017-02-13 18:28:05 [23763]===print_log sdclient_start run_func main===
> docker daemon lost, try again in 5 secs.
> 
> 
> If I disable 2 clone resources, the switchover test for one clone
> resource works as expected: fail the service -> monitor fails -> stop
> -> start
> 
> 
> Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> 
> 
>  sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-2 
> 
>  Clone Set: sdclient_rep [sdclient]
> 
>  Started: [ paas-controller-1 paas-controller-2 ]
> 
>  Stopped: [ paas-controller-3 ]
> 
> 
> what's the reason behind 

Can you show the configuration of the three clones, their operations,
and any constraints?

Normally, the response is controlled by the monitor operation's on-fail
attribute (which defaults to restart).
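
For example, an explicit equivalent of that default in crm shell syntax would
be something like this (a sketch with a placeholder Dummy resource):

    primitive example_rsc ocf:heartbeat:Dummy \
        op monitor interval=10s timeout=20s on-fail=restart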




Re: [ClusterLabs] Pacemaker kill does not cause node fault ???

2017-02-13 Thread Ken Gaillot
On 02/08/2017 02:45 AM, Ferenc Wágner wrote:
> Ken Gaillot  writes:
> 
>> On 02/03/2017 07:00 AM, RaSca wrote:
>>>
>>> On 03/02/2017 11:06, Ferenc Wágner wrote:
>>>> Ken Gaillot  writes:
>>>>
>>>>> On 01/10/2017 04:24 AM, Stefan Schloesser wrote:
>>>>>
>>>>>> I am currently testing a 2 node cluster under Ubuntu 16.04. The setup
>>>>>> seems to be working ok including the STONITH.
>>>>>> For test purposes I issued a "pkill -f pace" killing all pacemaker
>>>>>> processes on one node.
>>>>>>
>>>>>> Result:
>>>>>> The node is marked as "pending", all resources stay on it. If I
>>>>>> manually kill a resource it is not noticed. On the other node a drbd
>>>>>> "promote" command fails (drbd is still running as master on the first
>>>>>> node).
>>>>>
>>>>> I suspect that, when you kill pacemakerd, systemd respawns it quickly
>>>>> enough that fencing is unnecessary. Try "pkill -f pace; systemctl stop
>>>>> pacemaker".
>>>>
>>>> What exactly is "quickly enough"?
>>>
>>> What Ken is saying is that Pacemaker, as a service managed by systemd,
>>> have in its service definition file
>>> (/usr/lib/systemd/system/pacemaker.service) this option:
>>>
>>> Restart=on-failure
>>>
>>> Looking at [1] it is explained: systemd restarts immediately the process
>>> if it ends for some unexpected reason (like a forced kill).
>>>
>>> [1] https://www.freedesktop.org/software/systemd/man/systemd.service.html
>>
>> And the cluster itself is resilient to some daemon restarts. If only
>> pacemakerd is killed, corosync and pacemaker's crmd can still function
>> without any issues. When pacemakerd respawns, it reestablishes contact
>> with any other cluster daemons still running (and its pacemakerd peers
>> on other cluster nodes).
> 
> KillMode=process looks like a very important component of the service
> file then.  Probably worth commenting, especially its relation to
> Restart=on-failure (it also affects plain stop operations, of course).
> 
> But I still wonder how "quickly enough" could be quantified.  Have we
> got a timeout for this, or are we good while the cluster is quiescent,
> or maybe something else?

pacemakerd's main purpose is to monitor the other daemons and respawn
them if necessary. If systemd asks it to shut down, or if one of the
daemons exits with the "don't respawn" exit code, it will stop all
daemons. So if it's not running, nothing immediately happens that would
lead to fencing. But if another daemon dies, or if systemd is shutting
down the host, it can't do its job, and fencing might result.



Re: [ClusterLabs] Antw: Re: Antw: Re: Pacemaker kill does not cause node fault ???

2017-02-13 Thread Ken Gaillot
On 02/08/2017 02:49 AM, Ferenc Wágner wrote:
> Ken Gaillot  writes:
> 
>> On 02/07/2017 01:11 AM, Ulrich Windl wrote:
>>
>>> Ken Gaillot  writes:
>>>
>>>> On 02/06/2017 03:28 AM, Ulrich Windl wrote:
>>>>
>>>>> Isn't the question: Is crmd a process that is expected to die (and
>>>>> thus need restarting)? Or wouldn't one prefer to debug this
>>>>> situation. I fear that restarting it might just cover some fatal
>>>>> failure...
>>>>
>>>> If crmd or corosync dies, the node will be fenced (if fencing is enabled
>>>> and working). If one of the crmd's persistent connections (such as to
>>>> the cib) fails, it will exit, so it ends up the same.
>>>
>>> But isn't it due to crmd not responding to network packets? So if the
>>> timeout is long enough, and crmd is started fast enough, will the
>>> node really be fenced?
>>
>> If crmd dies, it leaves its corosync process group, and I'm pretty sure
>> the other nodes will fence it for that reason, regardless of the duration.
> 
> See http://lists.clusterlabs.org/pipermail/users/2016-March/002415.html
> for a case when a Pacemaker cluster survived a crmd failure and restart.
> Re-reading the thread, I'm still unsure what saved our ass from
> resources being started in parallel and losing massive data.  I'd fully
> expect fencing in such cases...

Looking at that again, crmd leaving the process group isn't enough to be
fenced -- that should abort the transition and update the node state in
the CIB, but it's up to the (new) DC to determine that fencing is needed.

If crmd respawns quickly enough to join the election for the new DC
(which seemed to be the case here), it should just need to be re-probed.





Re: [ClusterLabs] Reply: Re: clone resource not get restarted on fail

2017-02-14 Thread Ken Gaillot
On 02/13/2017 07:08 PM, he.hailo...@zte.com.cn wrote:
> Hi,
> 
> 
> > crm configure show
> 
> + crm configure show
> 
> node $id="336855579" paas-controller-1
> 
> node $id="336855580" paas-controller-2
> 
> node $id="336855581" paas-controller-3
> 
> primitive apigateway ocf:heartbeat:apigateway \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive apigateway_vip ocf:heartbeat:IPaddr2 \
> 
> params ip="20.20.2.7" cidr_netmask="24" \
> 
> op start interval="0" timeout="20" \
> 
> op stop interval="0" timeout="20" \
> 
> op monitor timeout="20s" interval="2s" depth="0"
> 
> primitive router ocf:heartbeat:router \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive router_vip ocf:heartbeat:IPaddr2 \
> 
> params ip="10.10.1.7" cidr_netmask="24" \
> 
> op start interval="0" timeout="20" \
> 
> op stop interval="0" timeout="20" \
> 
> op monitor timeout="20s" interval="2s" depth="0"
> 
> primitive sdclient ocf:heartbeat:sdclient \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive sdclient_vip ocf:heartbeat:IPaddr2 \
> 
> params ip="10.10.1.8" cidr_netmask="24" \
> 
> op start interval="0" timeout="20" \
> 
> op stop interval="0" timeout="20" \
> 
> op monitor timeout="20s" interval="2s" depth="0"
> 
> clone apigateway_rep apigateway
> 
> clone router_rep router
> 
> clone sdclient_rep sdclient
> 
> location apigateway_loc apigateway_vip \
> 
> rule $id="apigateway_loc-rule" +inf: apigateway_workable eq 1
> 
> location router_loc router_vip \
> 
> rule $id="router_loc-rule" +inf: router_workable eq 1
> 
> location sdclient_loc sdclient_vip \
> 
> rule $id="sdclient_loc-rule" +inf: sdclient_workable eq 1
> 
> property $id="cib-bootstrap-options" \
> 
> dc-version="1.1.10-42f2063" \
> 
> cluster-infrastructure="corosync" \
> 
> stonith-enabled="false" \
> 
> no-quorum-policy="ignore" \
> 
> start-failure-is-fatal="false" \
> 
> last-lrm-refresh="1486981647"
> 
> op_defaults $id="op_defaults-options" \
> 
> on-fail="restart"
> 
> -
> 
> 
> and B.T.W, I am using "crm_attribute -N $HOSTNAME -q -l reboot --name
> <prefix>_workable -v <1 or 0>" in the monitor to update the
> transient attributes, which control the vip location.

Is there a reason not to use a colocation constraint instead? If X_vip
is colocated with X, it will be moved if X fails.
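
As a sketch in crm shell syntax, using the resource names from this
configuration, such colocations could look like:

    colocation sdclient_vip_with_rep inf: sdclient_vip sdclient_rep
    colocation router_vip_with_rep inf: router_vip router_rep
    colocation apigateway_vip_with_rep inf: apigateway_vip apigateway_rep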

I don't see any reason in your configuration why the services wouldn't
be restarted. It's possible the cluster tried to restart the service,
but the stop action failed. Since you have stonith disabled, the cluster
can't recover from a failed stop action.

Is there a reason you disabled quorum? With 3 nodes, if they get split
into groups of 1 node and 2 nodes, quorum is what keeps the groups from
both starting all resources.

> I also found that the vip resource won't get moved if the related clone
> resource fails to restart.
> 
> 
> Original Message
> *From:* <kgail...@redhat.com>;
> *To:* <users@clusterlabs.org>;
> *Date:* 2017-02-13 23:04
> *Subject:* *Re: [ClusterLabs] clone resource not get restarted on fail*
> 
> 
> On 02/13/2017 07:57 AM, he.hailo...@zte.com.cn wrote:
> > Pacemaker 1.1.10
> > 
> > Corosync 2.3.3
> > 
> > 
> > this is a 3 nodes cluster configured with 3 clone resources, each
> > attached wih a vip resource of IPAddr2:
> > 
> > 
> > >crm status
> > 
> > 
> > Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> > 
> > 
> >  router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-1 
> > 
> >  sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-3 
> > 
> >  apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-2 
> > 
> >  Clone Set: sdclient_rep [sdclient]
> > 
> >  Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> > 
> >  Clone Set: router_rep [router]
> > 
> >  Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> > 
> >  Clone Set: apigateway_rep [apigateway]
> > 
> >  Started: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> > 
> > 
> > It is observed that sometimes the clone resource is stuck to monitor
> > when the service fails:
> > 
> > 
> >  router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-1 
> > 
> >  sdclient_vip   (ocf::heartbeat:IPaddr2):   

Re: [ClusterLabs] Reply: Re: Reply: Re: clone resource not get restarted on fail

2017-02-15 Thread Ken Gaillot
On 02/15/2017 03:57 AM, he.hailo...@zte.com.cn wrote:
> I just tried using colocation, it doesn't work.
> 
> 
> I failed the node paas-controller-3, but sdclient_vip didn't get moved:

The colocation would work, but the problem you're having with router and
apigateway is preventing it from getting that far. In other words,
router and apigateway are still running on the node (they have not been
successfully stopped), so the colocation is still valid.

I suspect that the return codes from your custom resource agents may be
the issue. Make sure that your agents conform to these guidelines:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf

In particular, "start" should not return until a monitor operation would
return success, "stop" should not return until a monitor would return
"not running", and "monitor" should return "not running" if called on a
host where the service hasn't started yet. Be sure you are returning the
proper OCF_* codes according to the table in the link above.
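
As a minimal sketch of a monitor action that follows those rules (shell;
check_my_service and the pid path are placeholders, and the OCF_* return-code
variables come from ocf-shellfuncs):

    #!/bin/sh
    : ${OCF_ROOT=/usr/lib/ocf}
    : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
    . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

    my_monitor() {
        # check_my_service is a placeholder for however the agent probes its service
        if check_my_service; then
            return $OCF_SUCCESS        # running and healthy
        elif [ -e "/var/run/myservice.pid" ]; then
            return $OCF_ERR_GENERIC    # pid file present but service not healthy
        else
            return $OCF_NOT_RUNNING    # cleanly stopped (also correct before first start)
        fi
    }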

If the documentation is unclear, please ask here about anything you are
unsure of.

> 
> Online: [ paas-controller-1 paas-controller-2 paas-controller-3 ]
> 
> 
>  router_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-1 
> 
>  sdclient_vip   (ocf::heartbeat:IPaddr2):   Started paas-controller-3 
> 
>  apigateway_vip (ocf::heartbeat:IPaddr2):   Started paas-controller-2 
> 
>  Clone Set: sdclient_rep [sdclient]
> 
>  Started: [ paas-controller-1 paas-controller-2 ]
> 
>  Stopped: [ paas-controller-3 ]
> 
>  Clone Set: router_rep [router]
> 
>  router (ocf::heartbeat:router):Started
> paas-controller-3 FAILED 
> 
>  Started: [ paas-controller-1 paas-controller-2 ]
> 
>  Clone Set: apigateway_rep [apigateway]
> 
>  apigateway (ocf::heartbeat:apigateway):Started
> paas-controller-3 FAILED 
> 
>  Started: [ paas-controller-1 paas-controller-2 ]
> 
> 
> here is the configuration:
> 
> >crm configure show
> 
> node $id="336855579" paas-controller-1
> 
> node $id="336855580" paas-controller-2
> 
> node $id="336855581" paas-controller-3
> 
> primitive apigateway ocf:heartbeat:apigateway \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive apigateway_vip ocf:heartbeat:IPaddr2 \
> 
> params ip="20.20.2.7" cidr_netmask="24" \
> 
> op start interval="0" timeout="20" \
> 
> op stop interval="0" timeout="20" \
> 
> op monitor timeout="20s" interval="2s" depth="0"
> 
> primitive router ocf:heartbeat:router \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive router_vip ocf:heartbeat:IPaddr2 \
> 
> params ip="10.10.1.7" cidr_netmask="24" \
> 
> op start interval="0" timeout="20" \
> 
> op stop interval="0" timeout="20" \
> 
> op monitor timeout="20s" interval="2s" depth="0"
> 
> primitive sdclient ocf:heartbeat:sdclient \
> 
> op monitor interval="2s" timeout="20s" on-fail="restart" \
> 
> op stop interval="0" timeout="200s" on-fail="restart" \
> 
> op start interval="0" timeout="h" on-fail="restart"
> 
> primitive sdclient_vip ocf:heartbeat:IPaddr2 \
> 
> params ip="10.10.1.8" cidr_netmask="24" \
> 
> op start interval="0" timeout="20" \
> 
> op stop interval="0" timeout="20" \
> 
> op monitor timeout="20s" interval="2s" depth="0"
> 
> clone apigateway_rep apigateway
> 
> clone router_rep router
> 
> clone sdclient_rep sdclient
> 
> colocation apigateway_colo +inf: apigateway_vip apigateway_rep:Started
> 
> colocation router_colo +inf: router_vip router_rep:Started
> 
> colocation sdclient_colo +inf: sdclient_vip sdclient_rep:Started
> 
> property $id="cib-bootstrap-options" \
> 
> dc-version="1.1.10-42f2063" \
> 
> cluster-infrastructure="corosync" \
> 
> stonith-enabled="false" \
> 
> no-quorum-policy="stop" \
> 
> start-failure-is-fatal="false" \
> 
> last-lrm-refresh="1486981647"
> 
> op_defaults $id="op_defaults-options" \
> 
> on-fail="restart"
> 
> 
> 
> 
> 
> Original Message
> *From:* He Hailong (10164561)
> *To:* <kgail...@redhat.com>;
> *Cc:* <users@clusterlabs.org>;
> *Date:* 2017-02-15 10:54
> *Subject:* *Reply: Re: Reply: Re: [ClusterLabs] clone resource not get
> restarted on fail*
> 
> 
> Is there a reason not to use a colocation constraint instead? If X_vip
> is colocated with X, it will be moved if X fails.
> 
> [hhl]: the movement should also take place if X is stopped (while the start is
> ongoing). I don't know if the colocation would satisfy this requirement.
> 
> I don't see any reason in your configuration why the services would

Re: [ClusterLabs] I question whether STONITH is working.

2017-02-15 Thread Ken Gaillot
On 02/15/2017 12:17 PM, dur...@mgtsciences.com wrote:
> I have 2 Fedora VMs (node1, and node2) running on a Windows 10 machine
> using Virtualbox.
> 
> I began with this.
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/
> 
> 
> When it came to fencing, I refered to this.
> http://www.linux-ha.org/wiki/SBD_Fencing
> 
> To the file /etc/sysconfig/sbd I added these lines.
> SBD_OPTS="-W"
> SBD_DEVICE="/dev/sdb1"
> I added 'modprobe softdog' to rc.local
> 
> After getting sbd working, I resumed with Clusters from Scratch, chapter
> 8.3.
> I executed these commands *only* on node1.  Am I supposed to run any of
> these commands on other nodes? 'Clusters from Scratch' does not specify.

Configuration commands only need to be run once. The cluster
synchronizes all changes across the cluster.

> pcs cluster cib stonith_cfg
> pcs -f stonith_cfg stonith create sbd-fence fence_sbd
> devices="/dev/sdb1" port="node2"

The above command creates a fence device configured to kill node2 -- but
it doesn't tell the cluster which nodes the device can be used to kill.
Thus, even if you try to fence node1, it will use this device, and node2
will be shot.

The pcmk_host_list parameter specifies which nodes the device can kill.
If not specified, the device will be used to kill any node. So, just add
pcmk_host_list=node2 here.

You'll need to configure a separate device to fence node1.
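
Based on that, a sketch of the two devices with pcs, keeping the parameters
from your original command (I'm not certain whether fence_sbd still needs the
port parameter when pcmk_host_list is set):

    pcs -f stonith_cfg stonith create sbd-fence-node2 fence_sbd \
        devices="/dev/sdb1" port="node2" pcmk_host_list="node2"
    pcs -f stonith_cfg stonith create sbd-fence-node1 fence_sbd \
        devices="/dev/sdb1" port="node1" pcmk_host_list="node1"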

I haven't used fence_sbd, so I don't know if there's a way to configure
it as one device that can kill both nodes.

> pcs -f stonith_cfg property set stonith-enabled=true
> pcs cluster cib-push stonith_cfg
> 
> I then tried this command from node1.
> stonith_admin --reboot node2
> 
> Node2 did not reboot or even shutdown. the command 'sbd -d /dev/sdb1
> list' showed node2 as off, but I was still logged into it (cluster
> status on node2 showed not running).
> 
> I rebooted and ran this command on node 2 and started cluster.
> sbd -d /dev/sdb1 message node2 clear
> 
> If I ran this command on node2, node2 rebooted.
> stonith_admin --reboot node1
> 
> What have I missed or done wrong?
> 
> 
> Thank you,
> 
> Durwin F. De La Rue
> Management Sciences, Inc.
> 6022 Constitution Ave. NE
> Albuquerque, NM  87110
> Phone (505) 255-8611




Re: [ClusterLabs] MySQL Cluster: Strange behaviour when forcing movement of resources

2017-02-16 Thread Ken Gaillot
On 02/16/2017 02:26 AM, Félix Díaz de Rada wrote:
> 
> Hi all,
> 
> We are currently setting up a MySQL cluster (Master-Slave) over this
> platform:
> - Two nodes, on RHEL 7.0
> - pacemaker-1.1.10-29.el7.x86_64
> - corosync-2.3.3-2.el7.x86_64
> - pcs-0.9.115-32.el7.x86_64
> There is a IP address resource to be used as a "virtual IP".
> 
> This is configuration of cluster:
> 
> Cluster Name: webmobbdprep
> Corosync Nodes:
>  webmob1bdprep-ges webmob2bdprep-ges
> Pacemaker Nodes:
>  webmob1bdprep-ges webmob2bdprep-ges
> 
> Resources:
>  Group: G_MySQL_M
>   Meta Attrs: priority=100
>   Resource: MySQL_M (class=ocf provider=heartbeat type=mysql_m)
>Attributes:
> binary=/opt/mysql/mysql-5.7.17-linux-glibc2.5-x86_64/bin/mysqld_safe
> config=/data/webmob_prep/webmob_prep.cnf datadir=/data/webmob_prep
> log=/data/webmob_prep/webmob_prep.err
> pid=/data/webmob_prep/webmob_rep.pid
> socket=/data/webmob_prep/webmob_prep.sock user=mysql group=mysql
> test_table=replica.pacemaker_test test_user=root
>Meta Attrs: resource-stickiness=1000
>Operations: promote interval=0s timeout=120 (MySQL_M-promote-timeout-120)
>demote interval=0s timeout=120 (MySQL_M-demote-timeout-120)
>start interval=0s timeout=120s on-fail=restart
> (MySQL_M-start-timeout-120s-on-fail-restart)
>stop interval=0s timeout=120s (MySQL_M-stop-timeout-120s)
>monitor interval=60s timeout=30s OCF_CHECK_LEVEL=1
> (MySQL_M-monitor-interval-60s-timeout-30s)
>   Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>Attributes: ip=172.18.64.44 nic=ens160:1 cidr_netmask=32
>Meta Attrs: target-role=Started migration-threshold=3
> failure-timeout=60s
>Operations: start interval=0s timeout=20s (ClusterIP-start-timeout-20s)
>stop interval=0s timeout=20s (ClusterIP-stop-timeout-20s)
>monitor interval=60s (ClusterIP-monitor-interval-60s)
>  Resource: MySQL_S (class=ocf provider=heartbeat type=mysql_s)
>   Attributes:
> binary=/opt/mysql/mysql-5.7.17-linux-glibc2.5-x86_64/bin/mysqld_safe
> config=/data/webmob_prep/webmob_prep.cnf datadir=/data/webmob_prep
> log=/data/webmob_prep/webmob_prep.err
> pid=/data/webmob_prep/webmob_rep.pid
> socket=/data/webmob_prep/webmob_prep.sock user=mysql group=mysql
> test_table=replica.pacemaker_test test_user=root
>   Meta Attrs: resource-stickiness=0
>   Operations: promote interval=0s timeout=120 (MySQL_S-promote-timeout-120)
>   demote interval=0s timeout=120 (MySQL_S-demote-timeout-120)
>   start interval=0s timeout=120s on-fail=restart
> (MySQL_S-start-timeout-120s-on-fail-restart)
>   stop interval=0s timeout=120s (MySQL_S-stop-timeout-120s)
>   monitor interval=60s timeout=30s OCF_CHECK_LEVEL=1
> (MySQL_S-monitor-interval-60s-timeout-30s)
> 
> Stonith Devices:
> Fencing Levels:
> 
> Location Constraints:
> Ordering Constraints:
>   start MySQL_M then start ClusterIP (Mandatory)
> (id:order-MySQL_M-ClusterIP-mandatory)
>   start G_MySQL_M then start MySQL_S (Mandatory)
> (id:order-G_MySQL_M-MySQL_S-mandatory)
> Colocation Constraints:
>   G_MySQL_M with MySQL_S (-100) (id:colocation-G_MySQL_M-MySQL_S-INFINITY)
> 
> Cluster Properties:
>  cluster-infrastructure: corosync
>  dc-version: 1.1.10-29.el7-368c726
>  last-lrm-refresh: 1487148812
>  no-quorum-policy: ignore
>  stonith-enabled: false
> 
> Pacemaker works as expected in most situations, but there is one
> scenario that is really not understandable to us. I will try to describe it:
> 
> a - Master resource (and Cluster IP address) are active on node 1 and
> Slave resource is active on node 2.
> b - We force movement of Master resource to node 2.
> c - Pacemaker stops all resources: Master, Slave and Cluster IP.
> d - Master resource and Cluster IP are started on node 2 (this is OK),
> but Slave also tries to start (??). It fails (logically, because Master
> resource has been started on the same node), it logs an "unknown error"
> and its state is marked as "failed". This is a capture of 'pcs status'
> at that point:
> 
> OFFLINE: [ webmob1bdprep-ges ]
> Online: [ webmob2bdprep-ges ]
> 
> Full list of resources:
> 
> Resource Group: G_MySQL_M
> MySQL_M (ocf::heartbeat:mysql_m): Started webmob2bdprep-ges
> ClusterIP (ocf::heartbeat:IPaddr2): Started webmob2bdprep-ges
> MySQL_S (ocf::heartbeat:mysql_s): FAILED webmob2bdprep-ges
> 
> Failed actions:
> MySQL_M_monitor_6 on webmob2bdprep-ges 'master' (8): call=62,
> status=complete, last-rc-change='Wed Feb 15 11:54:08 2017', queued=0ms,
> exec=0ms
> MySQL_S_start_0 on webmob2bdprep-ges 'unknown error' (1): call=78,
> status=complete, last-rc-change='Wed Feb 15 11:54:17 2017', queued=40ms,
> exec=0ms
> 
> PCSD Status:
> webmob1bdprep-ges: Offline
> webmob2bdprep-ges: Online
> 
> e - Pacemaker moves Slave resource to node 1 and starts it. Now we have
> both resources started again, Master on node 2 and Slave on node 1.
> f - One minu

Re: [ClusterLabs] question about equal resource distribution

2017-02-17 Thread Ken Gaillot
On 02/17/2017 08:43 AM, Ilia Sokolinski wrote:
> Thank you!
> 
> What quantity does pacemaker try to equalize - the number of running resources 
> per node or the total stickiness per node?
> 
> Suppose I have a bunch of web server groups each with IPaddr and apache 
> resources, and a fewer number of database groups each with IPaddr, postgres 
> and LVM resources.
> 
> In that case, does it mean that 3 web server groups are weighted the same as 
> 2 database groups in terms of distribution?
> 
> Ilia

By default, pacemaker simply chooses the node with the fewest resources
when placing a resource (subject to your constraints, of course).
However you can have much more control if you want:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm139683960632560
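
For finer control than a simple resource count, one option is to combine
utilization attributes with the placement-strategy cluster property. A rough
sketch with pcs (the node names, resource names and capacity numbers below
are only examples):

    # balance by declared capacities instead of by resource count
    pcs property set placement-strategy=balanced

    # declare what each node can provide
    pcs node utilization node1 cpu=8 memory=16384
    pcs node utilization node2 cpu=8 memory=16384

    # declare what each resource (or group) consumes
    pcs resource utilization web-group cpu=1 memory=512
    pcs resource utilization db-group cpu=2 memory=4096

With something like that in place, a database group "weighs" more than a web
server group when the cluster balances placement.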

> 
> 
> 
>> On Feb 17, 2017, at 2:58 AM, Kristoffer Grönlund  
>> wrote:
>>
>> Ilia Sokolinski  writes:
>>
>>> Suppose I have a N node cluster where N > 2 running m*N resources. 
>>> Resources don’t have preferred nodes, but since resources take RAM and CPU 
>>> it is important to distribute them equally among the nodes.
>>> Will pacemaker do the equal distribution, e.g. m resources per node?
>>> If a node fails, will pacemaker redistribute the resources equally too, 
>>> e.g. m * N/(N-1) per node?
>>>
>>> I don’t see any settings controlling this behavior in the documentation, 
>>> but perhaps, pacemaker tries to be “fair” by default.
>>>
>>
>> Yes, pacemaker tries to allocate resources evenly by default, and will
>> move resources when nodes fail in order to maintain that.
>>
>> There are several different mechanisms that influence this behaviour:
>>
>> * Any placement constraints in general influence where resources are
>>  allocated.
>>
>> * You can set resource-stickiness to a non-zero value which determines
>>  to which degree Pacemaker prefers to leave resources running where
>>  they are. The score is in relation to other placement scores, like
>>  constraint scores etc. This can be set for individual resources or
>>  globally. [1]
>>
>> * If you have an asymmetrical cluster, resources have to be manually
>>  allocated to nodes via constraints, see [2]
>>
>> [1]: 
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#s-resource-options
>> [2]: 
>> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_asymmetrical_opt_in_clusters
>>
>> Cheers,
>> Kristoffer
>>
>>> Thanks 
>>>
>>> Ilia Sokolinski

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Need help in setting up HA cluster for applications/services other than Apache tomcat.

2017-02-20 Thread Ken Gaillot
On 02/18/2017 10:55 AM, Chad Cravens wrote:
> Hello Vijay:
> 
> it seems you may want to consider developing custom Resource Agents.
> Take a look at the following guide:
> http://www.linux-ha.org/doc/dev-guides/ra-dev-guide.html
> 
> I have created several, it is pretty straightforward and has always
> worked as expected. I would say one of the most important parts of
> creating a custom RA script is to make sure you have a good method for
> determining the state of a resource with monitor()
> 
> Good luck!

Agreed, a custom resource agent is the most flexible approach. More
details in addition to above link:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#ap-ocf

If your services already have LSB init scripts or systemd unit files,
Pacemaker can use them directly instead -- just configure the resource
as lsb:myscriptname or systemd:myunitname. Pacemaker will look for those
files in the usual system locations for such things. That doesn't have
as much flexibility as a custom agent, but if you already have them,
it's the easiest approach.

> On Fri, Feb 17, 2017 at 8:22 AM, vijay singh rathore
> mailto:vijayrathore.rji...@gmail.com>>
> wrote:
> 
> Hi Team,
> 
> Good Morning everyone, hope you all are doing great.
> 
> First of all I would like to apologise, if I have created
> inconvenience for team members by sending this mail.  
> 
> I have a question and i have tried almost all possible forums and
> googled a lot before reaching to this group for help.
> 
> I want to create HA cluster for applications/services which are not
> in tomcat or related to Apache or MySQL. Let's say they are written
> in different languages such as java, node js, c++, and deployed in
> certain path i.e. /home/xyz
> 
> How can i add these applications for high availability in HA cluster
> using pcs/pacemaker/corosync.
> 
> If I have to create resource for these applications how to create
> and if i have to use some other way, how can i implement it.
> 
> Requesting you to please provide me some suggestions or reference
> documents or links or anything which can help me in completing this
> task and to test fail over for these applications.
> 
> Thanks a lot in advance, have a great day and time ahead.
> 
> Best Regards
> Vijay Singh
> 
> ___
> Users mailing list: Users@clusterlabs.org 
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> 
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> 
> Bugs: http://bugs.clusterlabs.org
> 
> 
> 
> 
> -- 
> Kindest Regards,
> Chad Cravens
> (843) 291-8340
> 
> http://www.ossys.com 
> http://www.linkedin.com/company/open-source-systems-llc
>   
> https://www.facebook.com/OpenSrcSys
>    https://twitter.com/OpenSrcSys
>   http://www.youtube.com/OpenSrcSys
>    http://www.ossys.com/feed
>    cont...@ossys.com 
> Chad Cravens
> (843) 291-8340
> chad.crav...@ossys.com 
> http://www.ossys.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Adding a node to the cluster deployed with "Unicast configuration"

2017-02-22 Thread Ken Gaillot
On 02/22/2017 08:44 AM, Alejandro Comisario wrote:
> Hi everyone, i have a problem when scaling a corosync/pacemaker
> cluster deployed using unicast.
> 
> eg on corosync.conf.
> 
> nodelist {
> node {
> ring0_addr: 10.10.0.10
> nodeid: 1
> }
> node {
> ring0_addr: 10.10.0.11
> nodeid: 2
> }
> }
> 
> Tried to add the new node with the new config (meaning, adding the new
> node) nad leaving the other two with the same config and started
> services on the third node, but doesnt seem to work until i update the
> config on the server #1 and #2 and restart corosync/pacemaker which
> does the obvious of bringing every resource down.
> 
> There should be a way to "hot add" a new node to the cluster, but i
> dont seem to find one.
> what is the best way to add a node without bothering the rest ? or
> better said, what is the right way to do it ?
> 
> PS: i can't implement multicast on my network.

AFAIK, in this situation, there's no way to avoid restarting corosync
(and thus pacemaker, too). But, you can put your cluster into
maintenance mode beforehand, so pacemaker will leave all your services
running, and re-detect them once it starts again. Once the cluster
status looks good again, you can leave maintenance mode.
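
For example, a minimal sketch with pcs:

    # before touching corosync.conf
    pcs property set maintenance-mode=true

    # edit corosync.conf and restart corosync/pacemaker on each node,
    # then, once the cluster status looks healthy again:
    pcs property set maintenance-mode=false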


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Adding a node to the cluster deployed with "Unicast configuration"

2017-02-22 Thread Ken Gaillot
On 02/22/2017 09:55 AM, Jan Friesse wrote:
> Alejandro Comisario napsal(a):
>> Hi everyone, i have a problem when scaling a corosync/pacemaker
>> cluster deployed using unicast.
>>
>> eg on corosync.conf.
>>
>> nodelist {
>> node {
>> ring0_addr: 10.10.0.10
>> nodeid: 1
>> }
>> node {
>> ring0_addr: 10.10.0.11
>> nodeid: 2
>> }
>> }
>>
>> Tried to add the new node with the new config (meaning, adding the new
>> node) nad leaving the other two with the same config and started
>> services on the third node, but doesnt seem to work until i update the
>> config on the server #1 and #2 and restart corosync/pacemaker which
>> does the obvious of bringing every resource down.
>>
>> There should be a way to "hot add" a new node to the cluster, but i
>> dont seem to find one.
>> what is the best way to add a node without bothering the rest ? or
>> better said, what is the right way to do it ?
> 
> Adding node is possible with following process:
> - Add node to config file on both existing nodes
> - Exec corosync-cfgtool -R (it's enough to exec only on one of nodes)
> - Make sure 3rd (new) node has same config as two existing nodes
> - Start corosync/pcmk on new node
> 
> This is (more or less) how pcs works. No stopping of corosync is needed.
> 
> Regards,
>   Honza

Ah, noted ... I need to update some upstream docs :-)
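
Spelling Honza's steps out as commands (the new node's address and id below
are only examples):

    # on both existing nodes, append the new node to the nodelist in
    # corosync.conf:
    #     node {
    #         ring0_addr: 10.10.0.12
    #         nodeid: 3
    #     }

    # tell the running corosync instances to reload the config
    # (running this on one existing node is enough)
    corosync-cfgtool -R

    # copy the same corosync.conf to the new node, then start it there
    pcs cluster start    # or: systemctl start corosync pacemaker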

>>
>> PS: i can't implement multicast on my network.
>>

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Insert delay between the statup of VirtualDomain

2017-02-23 Thread Ken Gaillot
On 02/23/2017 01:51 PM, Oscar Segarra wrote:
> Hi, 
> 
> In my environment I have 5 guestes that have to be started up in a
> specified order starting for the MySQL database server.
> 
> I have set the order constraints and VirtualDomains start in the right
> order but, the problem I have, is that the second host starts up faster
> than the database server and therefore applications running on the
> second host raise errors due to database connectivity problems.
> 
> I'd like to introduce a delay between the startup of the VirtualDomain
> of the database server and the startup of the second guest.
> 
> ¿Is it any way to get this?
> 
> Thanks a lot.

One option would be to make the first VM a guest node, and make the
database server a cluster resource running on it. Then you could use
regular ordering constraints.
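
A rough sketch of that approach with pcs (all names, paths and agents below
are examples, and the second guest is assumed to already be defined as a
VirtualDomain resource called vm-app):

    # run the database VM as a guest node
    pcs resource create vm-db ocf:heartbeat:VirtualDomain \
        config=/etc/libvirt/qemu/vm-db.xml \
        meta remote-node=vm-db-node

    # manage the database itself as a resource, restricted to that guest node
    pcs resource create db ocf:heartbeat:mysql
    pcs constraint location db prefers vm-db-node

    # order the second VM after the database, not just after the first VM
    pcs constraint order start db then start vm-app

That way the second guest only starts once the database reports it is
actually running.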


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Never join a list without a problem...

2017-02-24 Thread Ken Gaillot
On 02/24/2017 08:36 AM, Jeffrey Westgate wrote:
> Greetings all.
> 
> I have inherited a pair of Scientific Linux 6 boxes used as front-end load 
> balancers for our DNS cluster. (Yes, I inherited that, too.)
> 
> It was time to update them so we pulled snapshots (they are VMWare VMs, very 
> small, 1 cpu, 2G RAM, 10G disk), did a "yum update -y" watched everything 
> update, then rebooted.  Pacemaker kept the system from booting.
> Reverted to the snapshot, ran a "yum update -y --exclude=pacemaker\* " and 
> everything is hunky-dory.
> 
> # yum list pacemaker\*
> Installed Packages
> pacemaker.x86_64 1.1.10-14.el6
> @sl
> pacemaker-cli.x86_64 1.1.10-14.el6
> @sl
> pacemaker-cluster-libs.x86_641.1.10-14.el6
> @sl
> pacemaker-libs.x86_641.1.10-14.el6
> @sl
> Available Packages
> pacemaker.x86_64 1.1.14-8.el6_8.2 
> sl-security
> pacemaker-cli.x86_64 1.1.14-8.el6_8.2 
> sl-security
> pacemaker-cluster-libs.x86_641.1.14-8.el6_8.2 
> sl-security
> pacemaker-libs.x86_641.1.14-8.el6_8.2 
> sl-security
> 
> I searched clusterlabs.org looking for issues with updates, and came up empty.
> 
> # cat /etc/redhat-release
> Scientific Linux release 6.5 (Carbon)
> 
> ... is there something post-install/pre reboot that I need to do?
> 
> 
> --
> Jeff Westgate
> UNIX/Linux System Administrator
> Arkansas Dept. of Information Systems

Welcome! I joined the list with a problem, too, and now I'm technical
lead for the project, so be prepared ... ;-)

I don't know of any issues that would cause problems in that upgrade,
much less prevent a boot. Try disabling pacemaker at boot, doing the
upgrade, and then starting pacemaker, and pastebin any relevant messages
from /var/log/cluster/corosync.log.

If you're on SL 6, you should be using CMAN as the underlying cluster
layer. If you're using the corosync 1 pacemaker plugin, that's not well
tested on that platform.

Some general tips:

* You can run crm_verify (with either --live-check on a running cluster,
or -x /var/lib/pacemaker/cib/cib.xml on a stopped one) before and after
the upgrade to make sure you don't have any unaddressed configuration
issues.

* You can also run cibadmin --upgrade before and after the upgrade, to
make sure your configuration is using the latest schema.

It shouldn't prevent a boot if they're not done, but that may help
uncover any issues.
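
In command form, the checks above look like:

    # validate the configuration
    crm_verify --live-check                         # on a running cluster
    crm_verify -x /var/lib/pacemaker/cib/cib.xml    # on a stopped one

    # make sure the configuration uses the latest schema
    cibadmin --upgrade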

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Ordering Sets of Resources

2017-02-26 Thread Ken Gaillot
On 02/25/2017 03:35 PM, iva...@libero.it wrote:
> Hi all,
> i have configured a two node cluster on redhat 7.
> 
> Because I need to manage resources stopping and starting singularly when
> they are running I have configured cluster using order set constraints.
> 
> Here the example
> 
> Ordering Constraints:
>   Resource Sets:
> set MYIP_1 MYIP_2 MYFTP MYIP_5 action=start sequential=false
> require-all=true set MYIP_3 MYIP_4 MYSMTP action=start sequential=true
> require-all=true setoptions symmetrical=false
> set MYSMTP MYIP_4 MYIP_3 action=stop sequential=true
> require-all=true set MYIP_5 MYFTP MYIP_2 MYIP_1 action=stop
> sequential=true require-all=true setoptions symmetrical=false kind=Mandatory
> 
> The constrait work as expected on start but when stopping the resource
> don't respect the order.
> Any help is appreciated
> 
> Thank and regards
> Ivan

symmetrical=false means the order only applies for starting


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] R: Re: Ordering Sets of Resources

2017-02-27 Thread Ken Gaillot
On 02/26/2017 02:45 PM, iva...@libero.it wrote:
> Hi,
> yes, I want, with active resources, that I can turn them off individually.
> With symmetrical=true when i stop a resource, for example MYIP_4, also MYSMTP 
> will stop.
> 
> Ragards
> Ivan

I thought that was the goal, to ensure that things are stopped in order.

If your goal is to start and stop them in order *if* they're both
starting or stopping, but not *require* it, then you want kind=Optional
instead of Mandatory.

> 
> 
>> ----Messaggio originale
>> Da: "Ken Gaillot" 
>> Data: 26/02/2017 20.04
>> A: 
>> Ogg: Re: [ClusterLabs] Ordering Sets of Resources
>>
>> On 02/25/2017 03:35 PM, iva...@libero.it wrote:
>>> Hi all,
>>> i have configured a two node cluster on redhat 7.
>>>
>>> Because I need to manage resources stopping and starting singularly when
>>> they are running I have configured cluster using order set constraints.
>>>
>>> Here the example
>>>
>>> Ordering Constraints:
>>>   Resource Sets:
>>> set MYIP_1 MYIP_2 MYFTP MYIP_5 action=start sequential=false
>>> require-all=true set MYIP_3 MYIP_4 MYSMTP action=start sequential=true
>>> require-all=true setoptions symmetrical=false
>>> set MYSMTP MYIP_4 MYIP_3 action=stop sequential=true
>>> require-all=true set MYIP_5 MYFTP MYIP_2 MYIP_1 action=stop
>>> sequential=true require-all=true setoptions symmetrical=false 
> kind=Mandatory
>>>
>>> The constrait work as expected on start but when stopping the resource
>>> don't respect the order.
>>> Any help is appreciated
>>>
>>> Thank and regards
>>> Ivan
>>
>> symmetrical=false means the order only applies for starting

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Never join a list without a problem...

2017-02-27 Thread Ken Gaillot
On 02/27/2017 01:48 PM, Jeffrey Westgate wrote:
> I think I may be on to something.  It seems that every time my boxes start 
> showing increased host load, the preceding change that takes place is:
> 
>  crmd: info: throttle_send_command:   New throttle mode: 0100 (was 
> )
> 
> I'm attaching the last 50-odd lines from the corosync.log.  It just happens 
> that  - at the moment - our host load on this box is coming back down.  No 
> host load issue (0.00 load) immediately preceding this part of the log.
> 
> I know the log shows them in reverse order, but it shows them as the same log 
> item, and printed at the same time.  I'm assuming the throttle change takes 
> place and that increases the load, not the other way around
> 
> So - what is the throttle mode?
> 
> --
> Jeff Westgate
> DIS UNIX/Linux System Administrator

Actually it is the other way around. When Pacemaker detects high load on
a node, it "throttles" by reducing the number of operations it will
execute concurrently (to avoid making a bad situation worse).

So, what caused the load to go up is still a mystery.

There have been some cases where corosync started using 100% CPU, but
since you mentioned that processes aren't taking any more CPU, it
doesn't sound like the same issue.

> --
> Message: 3
> Date: Mon, 27 Feb 2017 13:26:30 +
> From: Jeffrey Westgate 
> To: "users@clusterlabs.org" 
> Subject: Re: [ClusterLabs] Never join a list without a problem...
> Message-ID:
> 
> 
> 
> Content-Type: text/plain; charset="us-ascii"
> 
> Thanks, Ken.
> 
> Our late guru was the admin who set all this up, and it's been rock solid 
> until recent oddities started cropping up.  They still function fine - 
> they've just developed some... quirks.
> 
> I found the solution before I got your reply, which was essentially what we 
> did; update all but pacemaker, reboot, stop pacemaker, update pacemaker, 
> reboot.  That process was necessary because they've been running sooo long, 
> pacemaker would not stop.  it would try, then seemingly stall after several 
> minutes.
> 
> We're good now, up-to-date-wise, and stuck only with the initial issue we 
> were hoping to eliminate by updating/patching EVERYthing.  And we honestly 
> don't know what may be causing it.
> 
> We use Nagios to monitor, and once every 20 to 40 hours - sometimes longer, 
> and we cannot set a clock by it - while the machine is 95% idle (or more 
> according to 'top'), the host load shoots up to 50 or 60%.  It takes about 20 
> minutes to peak, and another 30 to 45 minutes to come back down to baseline, 
> which is mostly 0.00.  (attached hostload.pdf)  This happens to both 
> machines, randomly, and is concerning, as we'd like to find what's causing it 
> and resolve it.
> 
> We were hoping "uptime kernel bug", but patching has not helped.  There seems 
> to be no increase in the number of processes running, and the processes 
> running do not take any more cpu time.  They are DNS forwarding resolvers, 
> but there is no correlation between dns requests and load increase - 
> sometimes (like this morning) it rises around 1 AM when the dns load is 
> minimal.
> 
> The oddity is - these are the only two boxes with this issue, and we have a 
> couple dozen at the same OS and level.  Only these two, with this role and 
> this particular package set have the issue.
> 
> --
> Jeff

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Ordering Sets of Resources

2017-03-01 Thread Ken Gaillot
On 03/01/2017 01:36 AM, Ulrich Windl wrote:
>>>> Ken Gaillot  schrieb am 26.02.2017 um 20:04 in 
>>>> Nachricht
> :
>> On 02/25/2017 03:35 PM, iva...@libero.it wrote:
>>> Hi all,
>>> i have configured a two node cluster on redhat 7.
>>>
>>> Because I need to manage resources stopping and starting singularly when
>>> they are running I have configured cluster using order set constraints.
>>>
>>> Here the example
>>>
>>> Ordering Constraints:
>>>   Resource Sets:
>>> set MYIP_1 MYIP_2 MYFTP MYIP_5 action=start sequential=false
>>> require-all=true set MYIP_3 MYIP_4 MYSMTP action=start sequential=true
>>> require-all=true setoptions symmetrical=false
>>> set MYSMTP MYIP_4 MYIP_3 action=stop sequential=true
>>> require-all=true set MYIP_5 MYFTP MYIP_2 MYIP_1 action=stop
>>> sequential=true require-all=true setoptions symmetrical=false kind=Mandatory
>>>
>>> The constrait work as expected on start but when stopping the resource
>>> don't respect the order.
>>> Any help is appreciated
>>>
>>> Thank and regards
>>> Ivan
>>
>> symmetrical=false means the order only applies for starting
> 
> From the name (symmetrical) alone it could also mean that it only applies for 
> stopping ;-)
> (Another example where better names would be nice)

Well, more specifically, it only applies to the action specified in the
constraint. I hadn't noticed before that the second constraint here has
action=stop, so yes, that one would only apply for stopping.

In the above example, the two constraints are identical to a single
constraint with symmetrical=true, since the second constraint is just
the reverse of the first.


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Cannot clone clvmd resource

2017-03-01 Thread Ken Gaillot
On 03/01/2017 03:49 PM, Anne Nicolas wrote:
> Hi there
> 
> 
> I'm testing quite an easy configuration to work on clvm. I'm just
> getting crazy as it seems clmd cannot be cloned on other nodes.
> 
> clvmd start well on node1 but fails on both node2 and node3.

Your config looks fine, so I'm going to guess there's some local
difference on the nodes.

> In pacemaker journalctl I get the following message
> Mar 01 16:34:36 node3 pidofproc[27391]: pidofproc: cannot stat /clvmd:
> No such file or directory
> Mar 01 16:34:36 node3 pidofproc[27392]: pidofproc: cannot stat
> /cmirrord: No such file or directory

I have no idea where the above is coming from. pidofproc is an LSB
function, but (given journalctl) I'm assuming you're using systemd. I
don't think anything in pacemaker or resource-agents uses pidofproc (at
least not currently, not sure about the older version you're using).

> Mar 01 16:34:36 node3 lrmd[2174]: notice: finished - rsc:p-clvmd
> action:stop call_id:233 pid:27384 exit-code:0 exec-time:45ms queue-time:0ms
> Mar 01 16:34:36 node3 crmd[2177]: notice: Operation p-clvmd_stop_0: ok
> (node=node3, call=233, rc=0, cib-update=541, confirmed=true)
> Mar 01 16:34:36 node3 crmd[2177]: notice: Initiating action 72: stop
> p-dlm_stop_0 on node3 (local)
> Mar 01 16:34:36 node3 lrmd[2174]: notice: executing - rsc:p-dlm
> action:stop call_id:235
> Mar 01 16:34:36 node3 crmd[2177]: notice: Initiating action 67: stop
> p-dlm_stop_0 on node2
> 
> Here is my configuration
> 
> node 739312139: node1
> node 739312140: node2
> node 739312141: node3
> primitive admin_addr IPaddr2 \
> params ip=172.17.2.10 \
> op monitor interval=10 timeout=20 \
> meta target-role=Started
> primitive p-clvmd ocf:lvm2:clvmd \
> op start timeout=90 interval=0 \
> op stop timeout=100 interval=0 \
> op monitor interval=30 timeout=90
> primitive p-dlm ocf:pacemaker:controld \
> op start timeout=90 interval=0 \
> op stop timeout=100 interval=0 \
> op monitor interval=60 timeout=90
> primitive stonith-sbd stonith:external/sbd
> group g-clvm p-dlm p-clvmd
> clone c-clvm g-clvm meta interleave=true
> property cib-bootstrap-options: \
> have-watchdog=true \
> dc-version=1.1.13-14.7-6f22ad7 \
> cluster-infrastructure=corosync \
> cluster-name=hacluster \
> stonith-enabled=true \
> placement-strategy=balanced \
> no-quorum-policy=freeze \
> last-lrm-refresh=1488404073
> rsc_defaults rsc-options: \
> resource-stickiness=1 \
> migration-threshold=10
> op_defaults op-options: \
> timeout=600 \
> record-pending=true
> 
> Thanks in advance for your input
> 
> Cheers
> 


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] R: Re: Antw: Re: Ordering Sets of Resources

2017-03-01 Thread Ken Gaillot
On 03/01/2017 03:22 PM, iva...@libero.it wrote:
> You are right, but i had to use option symmetrical=false because i need to 
> stop, when all resources are running, even the single primitive with no 
> impact 
> to others resources.
> 
> I have also used symmetrical=false with kind=Optional.
> The stop of the individual resource does not stop the others resources, but 
> if 
> during the startup or shutdown of the resources is used a list of primitives 
> without any order, the resources will start or stop without respecting the 
> constraint strictly.
> 
> Regards
> Ivan

If I understand, you want to be able to specify resources A B C such
that they always start in that order, but stopping can be in any
combination:
* just A
* just B
* just C
* just A and B (in which case B stops then A)
* just A and C (in which case C stops then A)
* just B and C (in which case C stops then B)
* or all (in which case C stops, then B, then A)

There may be a fancy way to do it with sets, but my first thought is:

* Keep the start constraint you have

* Use individual ordering constraints between each resource pair with
kind=Optional and action=stop
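
For example, with three resources A, B and C (a sketch with pcs; adapt the
names and keep whatever options your existing start set already has):

    # mandatory start order only (no implied stop order)
    pcs constraint order set A B C action=start setoptions symmetrical=false

    # optional stop order, pairwise, so stopping one resource
    # never forces the others to stop
    pcs constraint order stop B then stop A kind=Optional symmetrical=false
    pcs constraint order stop C then stop B kind=Optional symmetrical=false
    pcs constraint order stop C then stop A kind=Optional symmetrical=false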

>> Messaggio originale
>> Da: "Ken Gaillot" 
>> Data: 01/03/2017 15.57
>> A: "Ulrich Windl", 
>> Ogg: Re: [ClusterLabs] Antw: Re:  Ordering Sets of Resources
>>
>> On 03/01/2017 01:36 AM, Ulrich Windl wrote:
>>>>>> Ken Gaillot  schrieb am 26.02.2017 um 20:04 in 
> Nachricht
>>> :
>>>> On 02/25/2017 03:35 PM, iva...@libero.it wrote:
>>>>> Hi all,
>>>>> i have configured a two node cluster on redhat 7.
>>>>>
>>>>> Because I need to manage resources stopping and starting singularly when
>>>>> they are running I have configured cluster using order set constraints.
>>>>>
>>>>> Here the example
>>>>>
>>>>> Ordering Constraints:
>>>>>   Resource Sets:
>>>>> set MYIP_1 MYIP_2 MYFTP MYIP_5 action=start sequential=false
>>>>> require-all=true set MYIP_3 MYIP_4 MYSMTP action=start sequential=true
>>>>> require-all=true setoptions symmetrical=false
>>>>> set MYSMTP MYIP_4 MYIP_3 action=stop sequential=true
>>>>> require-all=true set MYIP_5 MYFTP MYIP_2 MYIP_1 action=stop
>>>>> sequential=true require-all=true setoptions symmetrical=false 
> kind=Mandatory
>>>>>
>>>>> The constrait work as expected on start but when stopping the resource
>>>>> don't respect the order.
>>>>> Any help is appreciated
>>>>>
>>>>> Thank and regards
>>>>> Ivan
>>>>
>>>> symmetrical=false means the order only applies for starting
>>>
>>> From the name (symmetrical) alone it could also mean that it only applies 
> for stopping ;-)
>>> (Another example where better names would be nice)
>>
>> Well, more specifically, it only applies to the action specified in the
>> constraint. I hadn't noticed before that the second constraint here has
>> action=stop, so yes, that one would only apply for stopping.
>>
>> In the above example, the two constraints are identical to a single
>> constraint with symmetrical=true, since the second constraint is just
>> the reverse of the first.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-02 Thread Ken Gaillot
On 03/01/2017 05:28 PM, Andrew Beekhof wrote:
> On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg
>  wrote:
>> When I recently tried to make use of the DEGRADED monitoring results,
>> I found out that it does still not work.
>>
>> Because LRMD choses to filter them in ocf2uniform_rc(),
>> and maps them to PCMK_OCF_UNKNOWN_ERROR.
>>
>> See patch suggestion below.
>>
>> It also filters away the other "special" rc values.
>> Do we really not want to see them in crmd/pengine?
> 
> I would think we do.
> 
>> Why does LRMD think it needs to outsmart the pengine?
> 
> Because the person that implemented the feature incorrectly assumed
> the rc would be passed back unmolested.
> 
>>
>> Note: I did build it, but did not use this yet,
>> so I have no idea if the rest of the implementation of the DEGRADED
>> stuff works as intended or if there are other things missing as well.
> 
> failcount might be the other place that needs some massaging.
> specifically, not incrementing it when a degraded rc comes through

I think that's already taken care of.

>> Thougts?\
> 
> looks good to me
> 
>>
>> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
>> index 724edb7..39a7dd1 100644
>> --- a/lrmd/lrmd.c
>> +++ b/lrmd/lrmd.c
>> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char 
>> *stdout_data)
>>  static int
>>  ocf2uniform_rc(int rc)
>>  {
>> -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
>> -return PCMK_OCF_UNKNOWN_ERROR;

Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to
be the high end.

Lars, do you want to test that?

>> +switch (rc) {
>> +default:
>> +   return PCMK_OCF_UNKNOWN_ERROR;
>> +
>> +case PCMK_OCF_OK:
>> +case PCMK_OCF_UNKNOWN_ERROR:
>> +case PCMK_OCF_INVALID_PARAM:
>> +case PCMK_OCF_UNIMPLEMENT_FEATURE:
>> +case PCMK_OCF_INSUFFICIENT_PRIV:
>> +case PCMK_OCF_NOT_INSTALLED:
>> +case PCMK_OCF_NOT_CONFIGURED:
>> +case PCMK_OCF_NOT_RUNNING:
>> +case PCMK_OCF_RUNNING_MASTER:
>> +case PCMK_OCF_FAILED_MASTER:
>> +
>> +case PCMK_OCF_DEGRADED:
>> +case PCMK_OCF_DEGRADED_MASTER:
>> +   return rc;
>> +
>> +#if 0
>> +   /* What about these?? */
> 
> yes, these should get passed back as-is too
> 
>> +/* 150-199 reserved for application use */
>> +PCMK_OCF_CONNECTION_DIED = 189, /* Operation failure implied by 
>> disconnection of the LRM API to a local or remote node */
>> +
>> +PCMK_OCF_EXEC_ERROR= 192, /* Generic problem invoking the agent */
>> +PCMK_OCF_UNKNOWN   = 193, /* State of the service is unknown - used 
>> for recording in-flight operations */
>> +PCMK_OCF_SIGNAL= 194,
>> +PCMK_OCF_NOT_SUPPORTED = 195,
>> +PCMK_OCF_PENDING   = 196,
>> +PCMK_OCF_CANCELLED = 197,
>> +PCMK_OCF_TIMEOUT   = 198,
>> +PCMK_OCF_OTHER_ERROR   = 199, /* Keep the same codes as PCMK_LSB */
>> +#endif
>>  }
>> -
>> -return rc;
>>  }
>>
>>  static int

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-06 Thread Ken Gaillot
On 03/06/2017 10:55 AM, Lars Ellenberg wrote:
> On Thu, Mar 02, 2017 at 05:31:33PM -0600, Ken Gaillot wrote:
>> On 03/01/2017 05:28 PM, Andrew Beekhof wrote:
>>> On Tue, Feb 28, 2017 at 12:06 AM, Lars Ellenberg
>>>  wrote:
>>>> When I recently tried to make use of the DEGRADED monitoring results,
>>>> I found out that it does still not work.
>>>>
>>>> Because LRMD choses to filter them in ocf2uniform_rc(),
>>>> and maps them to PCMK_OCF_UNKNOWN_ERROR.
>>>>
>>>> See patch suggestion below.
>>>>
>>>> It also filters away the other "special" rc values.
>>>> Do we really not want to see them in crmd/pengine?
>>>
>>> I would think we do.
> 
>>>> Note: I did build it, but did not use this yet,
>>>> so I have no idea if the rest of the implementation of the DEGRADED
>>>> stuff works as intended or if there are other things missing as well.
>>>
>>> failcount might be the other place that needs some massaging.
>>> specifically, not incrementing it when a degraded rc comes through
>>
>> I think that's already taken care of.
>>
>>>> Thougts?\
>>>
>>> looks good to me
>>>
>>>>
>>>> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
>>>> index 724edb7..39a7dd1 100644
>>>> --- a/lrmd/lrmd.c
>>>> +++ b/lrmd/lrmd.c
>>>> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const char 
>>>> *stdout_data)
>>>>  static int
>>>>  ocf2uniform_rc(int rc)
>>>>  {
>>>> -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
>>>> -return PCMK_OCF_UNKNOWN_ERROR;
>>
>> Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to
>> be the high end.
>>
>> Lars, do you want to test that?
> 
> Why would we want to filter at all, then?
> 
> I get it that we may want to map non-ocf agent exit codes
> into the "ocf" range,
> but why mask exit codes from "ocf" agents at all (in lrmd)?
> 
> Lars

It's probably unnecessarily paranoid, but I guess the idea is to check
that the agent at least returns something in the expected range for OCF
(perhaps it's not complying with the spec, or complying with a newer
version of the spec than we can handle).

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] resource was disabled automatically

2017-03-06 Thread Ken Gaillot
On 03/06/2017 03:49 AM, cys wrote:
> Hi,
> 
> Today I found one resource was disabled. I checked that nobody did it.
> The logs showed crmd (or pengine?) stopped it. I don't know why.
> So I want to know: will pacemaker disable a resource automatically?
> If so, when and why?
> 
> Thanks.


Pacemaker will never set the target-role automatically, so if you mean
that something set target-role=Stopped, that happened outside the cluster.

If you just mean stopping, the cluster can stop a resource in response
to the configuration or conditions.

The pengine decides what needs to be done, the crmd coordinates it, and
the lrmd does it (for actions on resources, anyway). So all are involved
to some extent.

To figure out why a resource was stopped, you want to check the logs on
the DC (which will be the node with the most "pengine:" messages around
that time). When the PE decides a resource needs to be stopped, you'll
see a message like

   notice: LogActions:  Stop <resource>   (<node>)

Often, by looking at the messages before that, you can see what led it
to decide that. Shortly after that, you'll see something like

   Calculated transition <n>, saving inputs in <filename>

That file will contain the state of the cluster at that moment. So you
can grab that for some deep diving. One of the things you can do with
that file is run crm_simulate on it, to get detailed info about why each
action was taken. "crm_simulate -Ssx <filename>" will show a somewhat
painful description of everything the cluster would do and the scores
that fed into the decision.
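
For example (the log path and pe-input file name are placeholders; use the
file named in the "saving inputs" message on your DC):

    # find the decision on the DC
    grep -e "pengine:" -e "LogActions" /var/log/cluster/corosync.log

    # replay that transition, showing the scores behind each action
    crm_simulate -Ssx /var/lib/pacemaker/pengine/pe-input-123.bz2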

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] PCMK_OCF_DEGRADED (_MASTER): exit codes are mapped to PCMK_OCF_UNKNOWN_ERROR

2017-03-06 Thread Ken Gaillot
On 03/06/2017 04:15 PM, Lars Ellenberg wrote:
> On Mon, Mar 06, 2017 at 12:35:18PM -0600, Ken Gaillot wrote:
>>>>>> diff --git a/lrmd/lrmd.c b/lrmd/lrmd.c
>>>>>> index 724edb7..39a7dd1 100644
>>>>>> --- a/lrmd/lrmd.c
>>>>>> +++ b/lrmd/lrmd.c
>>>>>> @@ -800,11 +800,40 @@ hb2uniform_rc(const char *action, int rc, const 
>>>>>> char *stdout_data)
>>>>>>  static int
>>>>>>  ocf2uniform_rc(int rc)
>>>>>>  {
>>>>>> -if (rc < 0 || rc > PCMK_OCF_FAILED_MASTER) {
>>>>>> -return PCMK_OCF_UNKNOWN_ERROR;
>>>>
>>>> Let's simply use > PCMK_OCF_OTHER_ERROR here, since that's guaranteed to
>>>> be the high end.
>>>>
>>>> Lars, do you want to test that?
>>>
>>> Why would we want to filter at all, then?
>>>
>>> I get it that we may want to map non-ocf agent exit codes
>>> into the "ocf" range,
>>> but why mask exit codes from "ocf" agents at all (in lrmd)?
>>
>> It's probably unnecessarily paranoid, but I guess the idea is to check
>> that the agent at least returns something in the expected range for OCF
> 
> Well, yes. But, if we are going to allow the range 0 to 199,
> I don't see any reason to hide the range 200 to 255.

Ideally, all non-LSB OCF return codes should be in the LSB spec's
150-199 range reserved for application use, since OCF aims for some
degree of LSB compatibility. RUNNING_MASTER and FAILED_MASTER got into
the wild before that policy was set, so they're exceptions. But it's
reasonable to require that all future OCF codes fall into that range.

>> (perhaps it's not complying with the spec, or complying with a newer
>> version of the spec than we can handle).
> 
> Or, as in the case of PCMK_OCF_DEGRADED, complying with a newer version
> of the spec that probably would have been handled fine, if it wasn't for
> the unneccesary paranoia ;-)
> 
> Lars

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] resource was disabled automatically

2017-03-07 Thread Ken Gaillot
On 03/06/2017 08:29 PM, cys wrote:
> At 2017-03-07 05:47:19, "Ken Gaillot"  wrote:
>> To figure out why a resource was stopped, you want to check the logs on
>> the DC (which will be the node with the most "pengine:" messages around
>> that time). When the PE decides a resource needs to be stopped, you'll
>> see a message like
>>
>>   notice: LogActions:  Stop <resource>   (<node>)
>>
>> Often, by looking at the messages before that, you can see what led it
>> to decide that. Shortly after that, you'll see something like
>>
> 
> Thanks Ken. It's really helpful.
> Finally I found the debug log of pengine(in a separate file). It has this 
> message:
> "All nodes for resource p_vs-scheduler are unavailable, unclean or shutting 
> down..."
> So it seems this caused vs-scheduler disabled.
> 
> If all nodes come back to be in good state, will pengine start the resource 
> automatically?
> I did it manually yesterday.

Yes, whenever a node changes state (such as becoming available), the
pengine will recheck what can be done.


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] VirtualDomain as non-root / encrypted

2017-03-08 Thread Ken Gaillot
On 03/08/2017 04:19 AM, philipp.achmuel...@arz.at wrote:
> hi,
> 
> Any ideas how to run VirtualDomain Resource as non-root user with
> encrypted transport to remote hypervisor(ssh)?
> 
> i'm able to start/stop/migrate vm via libvirt as non-root, but it
> doesn't work with pacemaker - pacemaker runs VirtualDomain as root, also
> there is no option to pass user via parameter
> 
> thank you!

There's no way to do this within Pacemaker currently. The closest
workaround would be to copy the VirtualDomain agent, and edit it to
switch users before doing anything.

Since we added the alerts feature, we've been keeping a future
enhancement in mind to allow selecting the user that alert agents run as
(currently, it's always hacluster). If we do that, the same mechanism
will likely work with resource agents as well. There is a lot of
high-priority work ahead of that, though.

Keep in mind that some agents maintain state data somewhere like
/var/run, and they may break even if they can otherwise run as a
different user. If they offer the state location as an option, that's an
easy workaround.
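
A very rough outline of that workaround (untested; the user name, provider
directory and connection URI below are only examples):

    # copy the agent into a custom provider directory
    mkdir -p /usr/lib/ocf/resource.d/custom
    cp /usr/lib/ocf/resource.d/heartbeat/VirtualDomain \
       /usr/lib/ocf/resource.d/custom/VirtualDomain

    # edit the copy so it re-executes itself as the unprivileged user,
    # e.g. near the top of the script:
    #     [ "$(id -u)" = 0 ] && exec runuser -u virtuser -- "$0" "$@"

    # then configure the resource against the copy
    pcs resource create vm ocf:custom:VirtualDomain \
        config=/etc/libvirt/qemu/vm.xml \
        hypervisor=qemu+ssh://virtuser@remotehost/system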

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: Never join a list without a problem...

2017-03-08 Thread Ken Gaillot
> Message: 1
> Date: Thu, 2 Mar 2017 16:32:02 +
> From: Jeffrey Westgate 
> To: Adam Spiers , "Cluster Labs - All topics related
> to  open-source clustering welcomed" 
> Subject: Re: [ClusterLabs] Never join a list without a problem...
> Message-ID:
> 
> 
> 
> Content-Type: text/plain; charset="iso-8859-1"
> 
> Since we have both pieces of the load-balanced cluster doing the same thing - 
> for still-as-yet unidentified reasons - we've put atop on one and sysdig on 
> the other.  Running atop at 10 second slices, hoping it will catch something. 
>  While configuring it yesterday, that server went into it's 'episode', but 
> there was nothing in the atop log to show anything.  Nothing else changed 
> except the cpu load average.  No increase in any other parameter.
> 
> frustrating.
> 
> 
> 
> From: Adam Spiers [aspi...@suse.com]
> Sent: Wednesday, March 01, 2017 5:33 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Cc: Jeffrey Westgate
> Subject: Re: [ClusterLabs] Never join a list without a problem...
> 
> Ferenc W?gner  wrote:
>> Jeffrey Westgate  writes:
>>
>>> We use Nagios to monitor, and once every 20 to 40 hours - sometimes
>>> longer, and we cannot set a clock by it - while the machine is 95%
>>> idle (or more according to 'top'), the host load shoots up to 50 or
>>> 60%.  It takes about 20 minutes to peak, and another 30 to 45 minutes
>>> to come back down to baseline, which is mostly 0.00.  (attached
>>> hostload.pdf) This happens to both machines, randomly, and is
>>> concerning, as we'd like to find what's causing it and resolve it.
>>
>> Try running atop (http://www.atoptool.nl/).  It collects and logs
>> process accounting info, allowing you to step back in time and check
>> resource usage in the past.
> 
> Nice, I didn't know atop could also log the collected data for future
> analysis.
> 
> If you want to capture even more detail, sysdig is superb:
> 
> http://www.sysdig.org/
> 
> 
> 

Re: [ClusterLabs] Failover question

2017-03-15 Thread Ken Gaillot
Sure, just add a colocation constraint for virtual_ip with proxy.
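
For example (the clone id below assumes pcs named the clone "proxy-clone";
adjust it to whatever "pcs status" shows):

    pcs constraint colocation add virtual_ip with proxy-clone INFINITY

That way the IP is only placed on a node with a working apache instance, and
will move if apache can no longer run there.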

On 03/15/2017 05:06 AM, Frank Fiene wrote:
> Hi,
> 
> Another beginner question:
> 
> I have configured a virtual IP resource on two hosts and an apache resource 
> cloned on both machines like this
> 
> pcs resource create virtual_ip ocf:heartbeat:IPaddr2 params ip= 
> op monitor interval=10s
> pcs resource create proxy lsb:apache2 
> statusurl="http://127.0.0.1/server-status"; op monitor interval=15s clone
> 
> 
> Will the IP failover if the Apache server on the Master has a problem?
> The Apache is just acting as a proxy, so I thought it would be faster to have 
> it already running on both machines.
> 
> 
> Kind Regards! Frank
> — 
> Frank Fiene
> IT-Security Manager VEKA Group
> 
> Fon: +49 2526 29-6200
> Fax: +49 2526 29-16-6200
> mailto: ffi...@veka.com
> http://www.veka.com
> 
> PGP-ID: 62112A51
> PGP-Fingerprint: 7E12 D61B 40F0 212D 5A55 765D 2A3B B29B 6211 2A51
> Threema: VZK5NDWW
> 
> VEKA AG
> Dieselstr. 8
> 48324 Sendenhorst
> Deutschland/Germany
> 
> Vorstand/Executive Board: Andreas Hartleif (Vorsitzender/CEO),
> Dr. Andreas W. Hillebrand, Bonifatius Eichwald, Elke Hartleif, Dr. Werner 
> Schuler,
> Vorsitzender des Aufsichtsrates/Chairman of Supervisory Board: Ulrich Weimer
> HRB 8282 AG Münster/District Court of Münster

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Failover question

2017-03-16 Thread Ken Gaillot
On 03/16/2017 04:01 AM, Frank Fiene wrote:
> OK, but with the parameter INFINITY?

Correct, that makes it mandatory, so virtual_ip can only run where there
is a running instance of proxy.

> I am not sure that this prevents Apache for running on the host without the 
> virtual IP.

You configured apache as a clone, so it will run on all nodes,
regardless of where the IP is -- but the IP would only be placed where
apache is successfully running.

>> Am 15.03.2017 um 15:15 schrieb Ken Gaillot :
>>
>> Sure, just add a colocation constraint for virtual_ip with proxy.
>>
>> On 03/15/2017 05:06 AM, Frank Fiene wrote:
>>> Hi,
>>>
>>> Another beginner question:
>>>
>>> I have configured a virtual IP resource on two hosts and an apache resource 
>>> cloned on both machines like this
>>>
>>> pcs resource create virtual_ip ocf:heartbeat:IPaddr2 params ip= 
>>> op monitor interval=10s
>>> pcs resource create proxy lsb:apache2 
>>> statusurl="http://127.0.0.1/server-status"; op monitor interval=15s clone
>>>
>>>
>>> Will the IP failover if the Apache server on the Master has a problem?
>>> The Apache is just acting as a proxy, so I thought it would be faster to 
>>> have it already running on both machines.
>>>
>>>
>>> Kind Regards! Frank
>>> — 
>>> Frank Fiene
>>> IT-Security Manager VEKA Group
>>>
>>> Fon: +49 2526 29-6200
>>> Fax: +49 2526 29-16-6200
>>> mailto: ffi...@veka.com
>>> http://www.veka.com
>>>
>>> PGP-ID: 62112A51
>>> PGP-Fingerprint: 7E12 D61B 40F0 212D 5A55 765D 2A3B B29B 6211 2A51
>>> Threema: VZK5NDWW
>>>
>>> VEKA AG
>>> Dieselstr. 8
>>> 48324 Sendenhorst
>>> Deutschland/Germany
>>>
>>> Vorstand/Executive Board: Andreas Hartleif (Vorsitzender/CEO),
>>> Dr. Andreas W. Hillebrand, Bonifatius Eichwald, Elke Hartleif, Dr. Werner 
>>> Schuler,
>>> Vorsitzender des Aufsichtsrates/Chairman of Supervisory Board: Ulrich Weimer
>>> HRB 8282 AG Münster/District Court of Münster

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] CIB configuration: role with many expressions - error 203

2017-03-21 Thread Ken Gaillot
On 03/21/2017 11:20 AM, Radoslaw Garbacz wrote:
> Hi,
> 
> I have a problem when creating rules with many expressions:
> 
> <rule id="..." boolean-op="and">
>   <expression attribute="..." operation="..." id="on_nodes_dbx_first_head-expr" value="Active"/>
>   <expression attribute="..." operation="..." id="on_nodes_dbx_first_head-expr" value="AH"/>
> </rule>
> 
> Result:
> Call cib_replace failed (-203): Update does not conform to the
> configured schema
> 
> Everything works when I remove "boolean-op" attribute and leave only one
> expression.
> What do I do wrong when creating rules?

boolean_op

Underbar not dash :-)

> 
> 
> Pacemaker 1.1.16-1.el6
> Written by Andrew Beekhof
> 
> 
> Thank in advance for any help,
> 
> -- 
> Best Regards,
> 
> Radoslaw Garbacz
> XtremeData Incorporated

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Running two independent clusters

2017-03-22 Thread Ken Gaillot
On 03/22/2017 05:23 AM, Nikhil Utane wrote:
> Hi Ulrich,
> 
> It's not an option unfortunately.
> Our product runs on a specialized hardware and provides both the
> services (A & B) that I am referring to. Hence I cannot have service A
> running on some nodes as cluster A and service B running on other nodes
> as cluster B.
> The two services HAVE to run on same node. The catch being service A and
> service B have to be independent of each other.
> 
> Hence looking at Container option since we are using that for some other
> product (but not for Pacemaker/Corosync).
> 
> -Regards
> Nikhil

Instead of containerizing pacemaker, why don't you containerize or
virtualize the services, and have pacemaker manage the containers/VMs?

Coincidentally, I am about to announce enhanced container support in
pacemaker. I should have a post with more details later today or tomorrow.

> 
> On Wed, Mar 22, 2017 at 12:41 PM, Ulrich Windl
>  > wrote:
> 
> >>> Nikhil Utane  > schrieb am 22.03.2017 um 07:48 in
> Nachricht
>  >:
> > Hi All,
> >
> > First of all, let me thank everyone here for providing excellent support
> > from the time I started evaluating this tool about a year ago. It has
> > helped me to make a timely and good quality release of our Redundancy
> > solution using Pacemaker & Corosync. (Three cheers :))
> >
> > Now for our next release we have a slightly different ask.
> > We want to provide Redundancy to two different types of services (we can
> > call them Service A and Service B) such that all cluster communication 
> for
> > Service A happens on one network/interface (say VLAN A) and for service 
> B
> > happens on a different network/interface (say VLAN B). Moreover we do 
> not
> > want the details of Service A (resource attributes etc) to be seen by
> > Service B and vice-versa.
> >
> > So essentially we want to be able to run two independent clusters. From
> > what I gathered, we cannot run multiple instances of Pacemaker and 
> Corosync
> > on same node. I was thinking if we can use Containers and run two 
> isolated
> 
> You conclude from two services that should not see each other that
> you need two instances of pacemaker on one node. Why?
> If you want true separation, drop the VLANs, make real networks and
> two independent clusters.
> Even if two pacemakers on one node would work, you have the problem
> of fencing, where at least one pacemaker instance will always be
> surprised badly if fencing takes place. I cannot imagine you want that!
> 
> > instances of Pacemaker + Corosync on same node.
> > As per https://github.com/davidvossel/pacemaker_docker
>  it looks do-able.
> > I wanted to get an opinion on this forum before I can commit that it 
> can be
> > done.
> 
> Why are you designing it more complicated than necessary?
> 
> >
> > Please share your views if you have already done this and if there are 
> any
> > known challenges that I should be familiar with.
> >
> > -Thanks
> > Nikhil

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Re: CIB configuration: role with many expressions - error 203

2017-03-22 Thread Ken Gaillot
On 03/22/2017 09:26 AM, Radoslaw Garbacz wrote:
> I have tried also as 'boolean_op', sorry did not mention this in the
> original post (just as a remark the documentation for pacemaker has both
> forms).

*smacks forehead*

Yep, the documentation needs to be fixed. You were right the first time,
it's "boolean-op" with a dash.

Looking at your example again, I think the problem is that you're using
the same ID for both expressions. The ID must be unique.

> 
> To make it work I have to remove additional "" and leave
> only one.
> 
> To summarize:
> - having no "boolean..." attribute and a single "expression" - works
> - having "boolean-op" and a single "expression" - works
> 
> - having "boolean_op" and a single "expression" - does not work
> - having either "boolean-op" or "boolean_op" or no such phrase at all
> with more than one "expression" - does not work
> 
> 
> 
> I have found the reason: expressions IDs within a rule is the same, once
> I made it unique it works.
> 
> 
> Thanks,
> 
> 
> On Wed, Mar 22, 2017 at 2:06 AM, Ulrich Windl
>  <mailto:ulrich.wi...@rz.uni-regensburg.de>> wrote:
> 
> >>> Ken Gaillot mailto:kgail...@redhat.com>>
> schrieb am 22.03.2017 um 00:18 in Nachricht
> <94b7e5fd-cb65-4775-71df-ca8983629...@redhat.com
> <mailto:94b7e5fd-cb65-4775-71df-ca8983629...@redhat.com>>:
> > On 03/21/2017 11:20 AM, Radoslaw Garbacz wrote:
> >> Hi,
> >>
> >> I have a problem when creating rules with many expressions:
> >>
> >>  
> >>  >> boolean-op="and">
> >>>> id="on_nodes_dbx_first_head-expr" value="Active"/>
> >>>> id="on_nodes_dbx_first_head-expr" value="AH"/>
> >> 
> >>   
> >>
> >> Result:
> >> Call cib_replace failed (-203): Update does not conform to the
> >> configured schema
> >>
> >> Everything works when I remove "boolean-op" attribute and leave only 
> one
> >> expression.
> >> What do I do wrong when creating rules?
> >
> > boolean_op
> >
> > Underbar not dash :-)
> 
> Good spotting, but I think a more useful error message would be
> desired ;-)
> 
> >
> >>
> >>
> >> Pacemaker 1.1.16-1.el6
> >> Written by Andrew Beekhof
> >>
> >>
> >> Thank in advance for any help,
> >>
> >> --
> >> Best Regards,
> >>
> >> Radoslaw Garbacz
> >> XtremeData Incorporated
> >
> > ___
> > Users mailing list: Users@clusterlabs.org
> <mailto:Users@clusterlabs.org>
> > http://lists.clusterlabs.org/mailman/listinfo/users
> <http://lists.clusterlabs.org/mailman/listinfo/users>
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> <http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
> > Bugs: http://bugs.clusterlabs.org
> 
> 
> 
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org <mailto:Users@clusterlabs.org>
> http://lists.clusterlabs.org/mailman/listinfo/users
> <http://lists.clusterlabs.org/mailman/listinfo/users>
> 
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> <http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf>
> Bugs: http://bugs.clusterlabs.org
> 
> 
> 
> 
> -- 
> Best Regards,
> 
> Radoslaw Garbacz
> XtremeData Incorporated
> 
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
> 


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Running two independent clusters

2017-03-23 Thread Ken Gaillot
On 03/22/2017 11:08 PM, Nikhil Utane wrote:
> I simplified when I called it as a service. Essentially it is a complete
> system.
> It is an LTE eNB solution. It provides LTE service (service A) and now
> we need to provide redundancy for another different but related service
> (service B). The catch being, the LTE redundancy solution will be tied
> to one operator whereas the other service can span across multiple
> operators. Therefore ideally we want two completely independent clusters
> since different set of nodes will form the two clusters.
> Now what I am thinking is, to run additional instance of Pacemaker +
> Corosync in a container which can then notify the service B on host
> machine to start or stop it's service. That way my CIB file will be
> independent and I can run corosync on different interfaces.
> 
> Workable right?
> 
> -Regards
> Nikhil

It's not well-tested, but in theory it should work, as long as the
container is privileged.

I still think virtualizing the services would be more resilient. It
makes sense to have a single determination of quorum and fencing for the
same real hosts. I'd think of it like a cloud provider -- the cloud
instances are segregated by customer, but the underlying hosts are the same.

You could configure your cluster as asymmetric, and enable each VM only
on the nodes it's allowed on, so you get the two separate "clusters"
that way. You could set up the VMs as guest nodes if you want to monitor
and manage multiple services within them. If your services require
hardware access that's not easily passed to a VM, containerizing the
services might be a better option.
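
As a rough sketch of that layout (all names here are made up, and the VM
definition is whatever you'd normally use with VirtualDomain):

  property symmetric-cluster=false
  primitive vm_serviceA ocf:heartbeat:VirtualDomain \
      params config="/etc/libvirt/qemu/serviceA.xml" \
      op monitor interval=30s timeout=90s \
      meta remote-node=serviceA-guest
  location loc_vm_serviceA_node1 vm_serviceA 100: node1
  location loc_vm_serviceA_node2 vm_serviceA 100: node2

With symmetric-cluster=false, the VM runs only where you explicitly allow
it, and the remote-node meta attribute makes it a guest node so the
service A resources can be constrained to run inside it.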

> On Wed, Mar 22, 2017 at 8:06 PM, Ken Gaillot  <mailto:kgail...@redhat.com>> wrote:
> 
> On 03/22/2017 05:23 AM, Nikhil Utane wrote:
> > Hi Ulrich,
> >
> > It's not an option unfortunately.
> > Our product runs on a specialized hardware and provides both the
> > services (A & B) that I am referring to. Hence I cannot have service A
> > running on some nodes as cluster A and service B running on other nodes
> > as cluster B.
> > The two services HAVE to run on same node. The catch being service A and
> > service B have to be independent of each other.
> >
> > Hence looking at Container option since we are using that for some other
> > product (but not for Pacemaker/Corosync).
> >
> > -Regards
> > Nikhil
> 
> Instead of containerizing pacemaker, why don't you containerize or
> virtualize the services, and have pacemaker manage the containers/VMs?
> 
> Coincidentally, I am about to announce enhanced container support in
> pacemaker. I should have a post with more details later today or
> tomorrow.
> 
> >
> > On Wed, Mar 22, 2017 at 12:41 PM, Ulrich Windl
> >  <mailto:ulrich.wi...@rz.uni-regensburg.de>
> > <mailto:ulrich.wi...@rz.uni-regensburg.de
> <mailto:ulrich.wi...@rz.uni-regensburg.de>>> wrote:
> >
> > >>> Nikhil Utane <nikhil.subscri...@gmail.com> wrote on 22.03.2017 at 07:48
> > in message <0g...@mail.gmail.com>:
> > > Hi All,
> > >
> > > First of all, let me thank everyone here for providing
> excellent support
> > > from the time I started evaluating this tool about a year
> ago. It has
> > > helped me to make a timely and good quality release of our
> Redundancy
> > > solution using Pacemaker & Corosync. (Three cheers :))
> > >
> > > Now for our next release we have a slightly different ask.
> > > We want to provide Redundancy to two different types of
> services (we can
> > > call them Service A and Service B) such that all cluster
> communication for
> > > Service A happens on one network/interface (say VLAN A) and
> for service B
> > > happens on a different network/interface (say VLAN B).
> Moreover we do not
> > > want the details of Service A (resource attributes etc) to
> be seen by
> > > Service B and vice-versa.
> > >
> > > So essentially we want to be able to run two independent
> clusters. From
> > > what I gathered, we cannot run mul

Re: [ClusterLabs] error: The cib process (17858) exited: Key has expired (127)

2017-03-24 Thread Ken Gaillot
On 03/24/2017 08:06 AM, Rens Houben wrote:
> I recently upgraded a two-node cluster (named 'castor' and 'pollux'
> because I should not be allowed to think up computer names before I've
> had my morning caffeine) from Debian wheezy to Jessie after the
> backports for corosync and pacemaker finally made it in. However, one of
> the two servers failed to start correctly for no really obvious reason.
> 
> Given as how it'd been years since I last set them up  and had forgotten
> pretty much everything about it in the interim I decided to purge
> corosync and pacemaker on both systems and run with clean installs instead.
> 
> This worked on pollux, but not on castor. Even after going back,
> re-purging, removing everything legacy in /var/lib/heartbeat and
> emptying both directories, castor still refuses to bring up pacemaker.
> 
> 
> I put the full log of a start attempt up at
> http://proteus.systemec.nl/~shadur/pacemaker/pacemaker.log.txt
> , but
> this is the excerpt that I /think/ is causing the failure:
> 
> Mar 24 13:59:05 [25495] castor pacemakerd:error: pcmk_child_exit:The
> cib process (25502) exited: Key has expired (127)
> Mar 24 13:59:05 [25495] castor pacemakerd:   notice:
> pcmk_process_exit:Respawning failed child process: cib
> 
> I don't see any entries from cib in the log that suggest anything's
> going wrong, though, and I'm running out of ideas on where to look next.

The "Key has expired" message is misleading. (Pacemaker really needs an
overhaul of the exit codes it can return, so these messages can be
reliable, but there are always more important things to take care of ...)

Pacemaker is getting 127 as the exit status of cib, and interpreting
that as a standard system error number, but it probably isn't one. I
don't actually see any way that the cib can return 127, so I'm not sure
what that might indicate.

In any case, the cib is mysteriously dying whenever it tries to start,
apparently without logging why or dumping core. (Do you have cores
disabled at the OS level?)

> Does anyone have any suggestions as to how to coax more information out
> of the processes and into the log files so I'll have a clue to work with?

Try it again with PCMK_debug=cib in /etc/default/pacemaker. That should
give more log messages.
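
Something like this (sketch only), plus a quick check that core dumps
aren't disabled at the OS level:

  # on castor
  echo "PCMK_debug=cib" >> /etc/default/pacemaker
  systemctl restart pacemaker    # or the equivalent init script

  # can cores be written at all, and where would they go?
  ulimit -c
  cat /proc/sys/kernel/core_pattern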

> 
> Regards,
> 
> --
> Rens Houben
> Systemec Internet Services
> 
> SYSTEMEC BV
> 
> Marinus Dammeweg 25, 5928 PW Venlo
> Postbus 3290, 5902 RG Venlo
> Industrienummer: 6817
> Nederland
> 
> T: 077-3967572 (Support)
> K.V.K. nummer: 12027782 (Venlo)
> 
> 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] stonith in dual HMC environment

2017-03-24 Thread Ken Gaillot
On 03/22/2017 09:42 AM, Alexander Markov wrote:
> 
>> Please share your config along with the logs from the nodes that were
>> effected.
> 
> I'm starting to think it's not about how to define stonith resources. If
> the whole box is down with all the logical partitions defined, then HMC
> cannot determine if the LPAR (partition) is really dead or just inaccessible.
> This leads to UNCLEAN OFFLINE node status and pacemaker refusal to do
> anything until it's resolved. Am I right? Anyway, the simplest pacemaker
> config from my partitions is below.

Yes, it looks like you are correct. The fence agent is returning an
error when pacemaker tries to use it to reboot crmapp02. From the stderr
in the logs, the message is "ssh: connect to host 10.1.2.9 port 22: No
route to host".

The first thing I'd try is making sure you can fence each node from the
command line by manually running the fence agent. I'm not sure how to do
that for the "stonith:" type agents.

Once that's working, make sure the cluster can do the same, by manually
running "stonith_admin -B $NODE" for each $NODE.

> 
> primitive sap_ASCS SAPInstance \
> params InstanceName=CAP_ASCS01_crmapp \
> op monitor timeout=60 interval=120 depth=0
> primitive sap_D00 SAPInstance \
> params InstanceName=CAP_D00_crmapp \
> op monitor timeout=60 interval=120 depth=0
> primitive sap_ip IPaddr2 \
> params ip=10.1.12.2 nic=eth0 cidr_netmask=24

> primitive st_ch_hmc stonith:ibmhmc \
> params ipaddr=10.1.2.9 \
> op start interval=0 timeout=300
> primitive st_hq_hmc stonith:ibmhmc \
> params ipaddr=10.1.2.8 \
> op start interval=0 timeout=300

I see you have two stonith devices defined, but they don't specify which
nodes they can fence -- pacemaker will assume that either device can be
used to fence either node.

> group g_sap sap_ip sap_ASCS sap_D00 \
> meta target-role=Started

> location l_ch_hq_hmc st_ch_hmc -inf: crmapp01
> location l_st_hq_hmc st_hq_hmc -inf: crmapp02

These constraints restrict which node monitors which device, not which
node the device can fence.

Assuming st_ch_hmc is intended to fence crmapp01, this will make sure
that crmapp02 monitors that device -- but you also want something like
pcmk_host_list=crmapp01 in the device configuration.
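
Roughly like this, assuming st_ch_hmc is meant to fence crmapp01 and
st_hq_hmc is meant to fence crmapp02 (swap them if I have that backwards):

  primitive st_ch_hmc stonith:ibmhmc \
      params ipaddr=10.1.2.9 pcmk_host_list=crmapp01 \
      op start interval=0 timeout=300
  primitive st_hq_hmc stonith:ibmhmc \
      params ipaddr=10.1.2.8 pcmk_host_list=crmapp02 \
      op start interval=0 timeout=300

pcmk_host_list is interpreted by stonith-ng itself, so it should work
regardless of the agent; adding pcmk_host_check=static-list makes the
intent explicit.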

> location prefer_node_1 g_sap 100: crmapp01
> property cib-bootstrap-options: \
> stonith-enabled=true \
> no-quorum-policy=ignore \
> placement-strategy=balanced \
> expected-quorum-votes=2 \
> dc-version=1.1.12-f47ea56 \
> cluster-infrastructure="classic openais (with plugin)" \
> last-lrm-refresh=1490009096 \
> maintenance-mode=false
> rsc_defaults rsc-options: \
> resource-stickiness=200 \
> migration-threshold=3
> op_defaults op-options: \
> timeout=600 \
> record-pending=true
> 
> Logs are pretty much going in circle: stonith cannot reset logical
> partition via HMC, node stays unclean offline, resources are shown to
> stay on node that is down.
> 
> 
> stonith-ng:error: log_operation:Operation 'reboot' [6942] (call
> 6 from crmd.4568) for host 'crmapp02' with device 'st_ch_hmc:0'
> Trying: st_ch_hmc:0
> stonith-ng:  warning: log_operation:st_ch_hmc:0:6942 [ Performing:
> stonith -t ibmhmc -T reset crmapp02 ]
> stonith-ng:  warning: log_operation:st_ch_hmc:0:6942 [ failed:
> crmapp02 3 ]
> stonith-ng: info: internal_stonith_action_execute:  Attempt 2 to
> execute fence_legacy (reboot). remaining timeout is 59
> stonith-ng: info: update_remaining_timeout: Attempted to
> execute agent fence_legacy (reboot) the maximum number of times (2)
> 
> stonith-ng:error: log_operation:Operation 'reboot' [6955] (call
> 6 from crmd.4568) for host 'crmapp02' with device 'st_hq_hmc' re
> Trying: st_hq_hmc
> stonith-ng:  warning: log_operation:st_hq_hmc:6955 [ Performing:
> stonith -t ibmhmc -T reset crmapp02 ]
> stonith-ng:  warning: log_operation:st_hq_hmc:6955 [ failed:
> crmapp02 8 ]
> stonith-ng: info: internal_stonith_action_execute:  Attempt 2 to
> execute fence_legacy (reboot). remaining timeout is 60
> stonith-ng: info: update_remaining_timeout: Attempted to
> execute agent fence_legacy (reboot) the maximum number of times (2)
> 
> stonith-ng:error: log_operation:Operation 'reboot' [6976] (call
> 6 from crmd.4568) for host 'crmapp02' with device 'st_hq_hmc:0'
> 
> stonith-ng:  warning: log_operation:st_hq_hmc:0:6976 [ Performing:
> stonith -t ibmhmc -T reset crmapp02 ]
> stonith-ng:  warning: log_operation:st_hq_hmc:0:6976 [ failed:
> crmapp02 8 ]
> stonith-ng:   notice: stonith_choose_peer:  Couldn't find anyone to
> fence crmapp02 with 
> stonith-ng: info: call_remote_stonith:  None of the 1 peers are
> capable of terminating crmapp02 for crmd.4568 (1)
> stonith-ng:error: remote_op_done:   Operation reboot of crmapp02 by
>  for crmd.4568@crmapp01.6bf66b9c: No route to host
> crmd:   notice: tengine_stonith_callback: Stonith ope

Re: [ClusterLabs] error: The cib process (17858) exited: Key has expired (127)

2017-03-24 Thread Ken Gaillot
On 03/24/2017 11:06 AM, Rens Houben wrote:
> I activated debug=cib, and retried.
> 
> New log file up at
> http://proteus.systemec.nl/~shadur/pacemaker/pacemaker_2.log.txt
> <http://proteus.systemec.nl/%7Eshadur/pacemaker/pacemaker_2.log.txt> ;
> unfortunately, while that *is* more information I'm not seeing anything
> that looks like it could be the cause, although it shouldn't be reading
> any config files yet because there shouldn't be any *to* read...

If there's no config file, pacemaker will create an empty one and use
that, so it still goes through the mechanics of validating it and
writing it out.

Debug doesn't give us much -- just one additional message before it dies:

Mar 24 16:59:27 [20266] castorcib:debug: activateCibXml:
Triggering CIB write for start op

You might want to look at the system log around that time to see if
something else is going wrong. If you have SELinux enabled, check the
audit log for denials.

> As to the misleading error message, it gets weirder: I grabbed a copy of
> the source code via apt-get source, and the phrase 'key has expired'
> does not occur anywhere in any file according to find ./ -type f -exec
> grep -il 'key has expired' {} \; so I have absolutely NO idea where it's
> coming from...

Right, it's not part of pacemaker, it's just the standard system error
message for errno 127. But the exit status isn't an errno, so that's not
the right interpretation. I can't find any code path in the cib that
would return 127, so I don't know what the right interpretation would be.

> 
> --
> Rens Houben
> Systemec Internet Services
> 
> SYSTEMEC BV
> 
> Marinus Dammeweg 25, 5928 PW Venlo
> Postbus 3290, 5902 RG Venlo
> Industrienummer: 6817
> Nederland
> 
> T: 077-3967572 (Support)
> K.V.K. nummer: 12027782 (Venlo)
> 
> 
> 
> 
> From: Ken Gaillot 
> Sent: Friday, 24 March 2017 16:49
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] error: The cib process (17858) exited: Key
> has expired (127)
> 
> On 03/24/2017 08:06 AM, Rens Houben wrote:
>> I recently upgraded a two-node cluster (named 'castor' and 'pollux'
>> because I should not be allowed to think up computer names before I've
>> had my morning caffeine) from Debian wheezy to Jessie after the
>> backports for corosync and pacemaker finally made it in. However, one of
>> the two servers failed to start correctly for no really obvious reason.
>>
>> Given as how it'd been years since I last set them up  and had forgotten
>> pretty much everything about it in the interim I decided to purge
>> corosync and pacemaker on both systems and run with clean installs instead.
>>
>> This worked on pollux, but not on castor. Even after going back,
>> re-purging, removing everything legacy in /var/lib/heartbeat and
>> emptying both directories, castor still refuses to bring up pacemaker.
>>
>>
>> I put the full log of a start attempt up at
>> http://proteus.systemec.nl/~shadur/pacemaker/pacemaker.log.txt
> <http://proteus.systemec.nl/%7Eshadur/pacemaker/pacemaker.log.txt>
>> <http://proteus.systemec.nl/%7Eshadur/pacemaker/pacemaker.log.txt>, but
>> this is the excerpt that I /think/ is causing the failure:
>>
>> Mar 24 13:59:05 [25495] castor pacemakerd:error: pcmk_child_exit:The
>> cib process (25502) exited: Key has expired (127)
>> Mar 24 13:59:05 [25495] castor pacemakerd:   notice:
>> pcmk_process_exit:Respawning failed child process: cib
>>
>> I don't see any entries from cib in the log that suggest anything's
>> going wrong, though, and I'm running out of ideas on where to look next.
> 
> The "Key has expired" message is misleading. (Pacemaker really needs an
> overhaul of the exit codes it can return, so these messages can be
> reliable, but there are always more important things to take care of ...)
> 
> Pacemaker is getting 127 as the exit status of cib, and interpreting
> that a

Re: [ClusterLabs] Three node cluster becomes completely fenced if one node leaves

2017-03-24 Thread Ken Gaillot
On 03/24/2017 03:52 PM, Digimer wrote:
> On 24/03/17 04:44 PM, Seth Reid wrote:
>> I have a three-node Pacemaker/GFS2 cluster on Ubuntu 16.04. It's not in
>> production yet because I'm having a problem during fencing. When I
>> disable the network interface of any one machine, the disabled machine
>> is properly fenced, leaving me, briefly, with a two-node cluster. A
>> second node is then fenced off immediately, and the remaining node
>> appears to try to fence itself off. This leaves two nodes with
>> corosync/pacemaker stopped, and the remaining machine still in the
>> cluster but showing an offline node and an UNCLEAN node. What can be
>> causing this behavior?
> 
> It looks like the fence attempt failed, leaving the cluster hung. When
> you say all nodes were fenced, did all nodes actually reboot? Or did the
> two surviving nodes just lock up? If the later, then that is the proper
> response to a failed fence (DLM stays blocked).

See comments inline ...

> 
>> Each machine has a dedicated network interface for the cluster, and
>> there is a vlan on the switch devoted to just these interfaces.
>> In the following, I disabled the interface on node id 2 (b014). Node 1
>> (b013) is fenced as well. Node 2 (b015) is still up.
>>
>> Logs from b013:
>> Mar 24 16:35:01 b013 CRON[19133]: (root) CMD (command -v debian-sa1 >
>> /dev/null && debian-sa1 1 1)
>> Mar 24 16:35:13 b013 corosync[2134]: notice  [TOTEM ] A processor
>> failed, forming new configuration.
>> Mar 24 16:35:13 b013 corosync[2134]:  [TOTEM ] A processor failed,
>> forming new configuration.
>> Mar 24 16:35:17 b013 corosync[2134]: notice  [TOTEM ] A new membership
>> (192.168.100.13:576 ) was formed. Members left: 2
>> Mar 24 16:35:17 b013 corosync[2134]: notice  [TOTEM ] Failed to receive
>> the leave message. failed: 2
>> Mar 24 16:35:17 b013 corosync[2134]:  [TOTEM ] A new membership
>> (192.168.100.13:576 ) was formed. Members left: 2
>> Mar 24 16:35:17 b013 corosync[2134]:  [TOTEM ] Failed to receive the
>> leave message. failed: 2
>> Mar 24 16:35:17 b013 attrd[2223]:   notice: crm_update_peer_proc: Node
>> b014-cl[2] - state is now lost (was member)
>> Mar 24 16:35:17 b013 cib[2220]:   notice: crm_update_peer_proc: Node
>> b014-cl[2] - state is now lost (was member)
>> Mar 24 16:35:17 b013 cib[2220]:   notice: Removing b014-cl/2 from the
>> membership list
>> Mar 24 16:35:17 b013 cib[2220]:   notice: Purged 1 peers with id=2
>> and/or uname=b014-cl from the membership cache
>> Mar 24 16:35:17 b013 pacemakerd[2187]:   notice: crm_reap_unseen_nodes:
>> Node b014-cl[2] - state is now lost (was member)
>> Mar 24 16:35:17 b013 attrd[2223]:   notice: Removing b014-cl/2 from the
>> membership list
>> Mar 24 16:35:17 b013 attrd[2223]:   notice: Purged 1 peers with id=2
>> and/or uname=b014-cl from the membership cache
>> Mar 24 16:35:17 b013 stonith-ng[2221]:   notice: crm_update_peer_proc:
>> Node b014-cl[2] - state is now lost (was member)
>> Mar 24 16:35:17 b013 stonith-ng[2221]:   notice: Removing b014-cl/2 from
>> the membership list
>> Mar 24 16:35:17 b013 stonith-ng[2221]:   notice: Purged 1 peers with
>> id=2 and/or uname=b014-cl from the membership cache
>> Mar 24 16:35:17 b013 dlm_controld[2727]: 3091 fence request 2 pid 19223
>> nodedown time 1490387717 fence_all dlm_stonith
>> Mar 24 16:35:17 b013 kernel: [ 3091.800118] dlm: closing connection to
>> node 2
>> Mar 24 16:35:17 b013 crmd[2227]:   notice: crm_reap_unseen_nodes: Node
>> b014-cl[2] - state is now lost (was member)
>> Mar 24 16:35:17 b013 dlm_stonith: stonith_api_time: Found 0 entries for
>> 2/(null): 0 in progress, 0 completed
>> Mar 24 16:35:18 b013 stonith-ng[2221]:   notice: Operation reboot of
>> b014-cl by b015-cl for stonith-api.19223@b013-cl.7aeb2ffb: OK
>> Mar 24 16:35:18 b013 stonith-api[19223]: stonith_api_kick: Node 2/(null)
>> kicked: reboot

It looks like the fencing of b014-cl is reported as successful above ...

>> Mar 24 16:35:18 b013 kernel: [ 3092.421495] dlm: closing connection to
>> node 3
>> Mar 24 16:35:18 b013 kernel: [ 3092.422246] dlm: closing connection to
>> node 1
>> Mar 24 16:35:18 b013 dlm_controld[2727]: 3092 abandoned lockspace share_data
>> Mar 24 16:35:18 b013 dlm_controld[2727]: 3092 abandoned lockspace clvmd
>> Mar 24 16:35:18 b013 kernel: [ 3092.426545] dlm: dlm user daemon left 2
>> lockspaces
>> Mar 24 16:35:18 b013 systemd[1]: corosync.service: Main process exited,
>> code=exited, status=255/n/a

... but then DLM and corosync exit on this node. Pacemaker can only
exit, and the node gets fenced.

What does your fencing configuration look like?

>> Mar 24 16:35:18 b013 cib[2220]:error: Connection to the CPG API
>> failed: Library error (2)
>> Mar 24 16:35:18 b013 systemd[1]: corosync.service: Unit entered failed
>> state.
>> Mar 24 16:35:18 b013 attrd[2223]:error: Connection to cib_rw failed
>> Mar 24 16:35:18 b013 systemd[1]: corosync.service: Failed with result
>> 'exit-code'

Re: [ClusterLabs] pending actions

2017-03-24 Thread Ken Gaillot
On 03/07/2017 04:13 PM, Jehan-Guillaume de Rorthais wrote:
> Hi,
> 
> Occasionally, I find my cluster with one pending action not being executed for
> some minutes (I guess until the "PEngine Recheck Timer" elapses).
> 
> Running "crm_simulate -SL" shows the pending actions.
> 
> I'm still confused about how it can happens, why it happens and how to avoid
> this.

It's most likely a bug in the crmd, which schedules PE runs.

> Earlier today, I started my test cluster with 3 nodes and a master/slave
> resource[1], all with positive master score (1001, 1000 and 990), and the
> cluster kept the promote action as a pending action for 15 minutes. 
> 
> You will find in attachment the first 3 pengine inputs executed after the
> cluster startup.
> 
> What are the consequences if I set cluster-recheck-interval to 30s as 
> instance?

The cluster would consume more CPU and I/O continually recalculating the
cluster state.

It would be nice to have some guidelines for cluster-recheck-interval
based on real-world usage, but it's just going by gut feeling at this
point. The cluster automatically recalculates when something
"interesting" happens -- a node comes or goes, a monitor fails, a node
attribute changes, etc. The cluster-recheck-interval is (1) a failsafe
for buggy situations like this, and (2) the maximum granularity of many
time-based checks such as rules. I would personally use at least 5
minutes, though less is probably reasonable, especially with simple
configurations (number of nodes/resources/constraints).
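
If you do want to change it, it's an ordinary cluster property, e.g.:

  crm_attribute --type crm_config --name cluster-recheck-interval --update 5min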

> Thanks in advance for your lights :)
> 
> Regards,
> 
> [1] here is the setup:
> http://dalibo.github.io/PAF/Quick_Start-CentOS-7.html#cluster-resource-creation-and-management

Feel free to open a bug report and include some logs around the time of
the incident (most importantly from the DC).

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Create ressource to monitor each IPSEC VPN

2017-03-24 Thread Ken Gaillot
On 03/09/2017 01:44 AM, Damien Bras wrote:
> Hi,
> 
>  
> 
> We have a 2 nodes cluster with ipsec (libreswan).
> 
> Actually we have a resource to monitor the service ipsec (via system).
> 
>  
> 
> But now I would like to monitor each VPN. Is there a way to do that ?
> Which agent could I use for that ?
> 
>  
> 
> Thanks in advance for your help.
> 
> Damien

I'm not aware of any existing OCF agent for libreswan. You can always
manage any service via its OS launcher (systemd or lsb). If the OS's
status check isn't sufficient, you could additionally use
ocf:pacemaker:ping to monitor an IP address only reachable across the
VPN; that sets a node attribute you could then act on (for example, in a
location rule).
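
A minimal sketch of that idea (the address and names are placeholders for
something reachable only while the tunnel is up):

  primitive p_vpn1_ping ocf:pacemaker:ping \
      params host_list="192.0.2.10" name="pingd-vpn1" multiplier=1000 \
      op monitor interval=30s timeout=60s
  clone cl_vpn1_ping p_vpn1_ping

The agent maintains a node attribute (here "pingd-vpn1") that you could
use in location rules, or just watch for alerting.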

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Three node cluster becomes completely fenced if one node leaves

2017-03-27 Thread Ken Gaillot
On 03/27/2017 03:54 PM, Seth Reid wrote:
> 
> 
> 
> On Fri, Mar 24, 2017 at 2:10 PM, Ken Gaillot  <mailto:kgail...@redhat.com>> wrote:
> 
> On 03/24/2017 03:52 PM, Digimer wrote:
> > On 24/03/17 04:44 PM, Seth Reid wrote:
> >> I have a three node Pacemaker/GFS2 cluster on Ubuntu 16.04. Its not in
> >> production yet because I'm having a problem during fencing. When I
> >> disable the network interface of any one machine, the disabled machines
> >> is properly fenced leaving me, briefly, with a two node cluster. A
> >> second node is then fenced off immediately, and the remaining node
> >> appears to try to fence itself off. This leave two nodes with
> >> corosync/pacemaker stopped, and the remaining machine still in the
> >> cluster but showing an offline node and an UNCLEAN node. What can be
> >> causing this behavior?
> >
> > It looks like the fence attempt failed, leaving the cluster hung. When
> > you say all nodes were fenced, did all nodes actually reboot? Or did the
> > two surviving nodes just lock up? If the later, then that is the proper
> > response to a failed fence (DLM stays blocked).
> 
> See comments inline ...
> 
> >
> >> Each machine has a dedicated network interface for the cluster, and
> >> there is a vlan on the switch devoted to just these interfaces.
> >> In the following, I disabled the interface on node id 2 (b014).
> Node 1
> >> (b013) is fenced as well. Node 2 (b015) is still up.
> >>
> >> Logs from b013:
> >> Mar 24 16:35:01 b013 CRON[19133]: (root) CMD (command -v debian-sa1 >
> >> /dev/null && debian-sa1 1 1)
> >> Mar 24 16:35:13 b013 corosync[2134]: notice  [TOTEM ] A processor
> >> failed, forming new configuration.
> >> Mar 24 16:35:13 b013 corosync[2134]:  [TOTEM ] A processor failed,
> >> forming new configuration.
> >> Mar 24 16:35:17 b013 corosync[2134]: notice  [TOTEM ] A new
> membership
> >> (192.168.100.13:576 <http://192.168.100.13:576>
> <http://192.168.100.13:576>) was formed. Members left: 2
> >> Mar 24 16:35:17 b013 corosync[2134]: notice  [TOTEM ] Failed to
> receive
> >> the leave message. failed: 2
> >> Mar 24 16:35:17 b013 corosync[2134]:  [TOTEM ] A new membership
> >> (192.168.100.13:576 <http://192.168.100.13:576>
> <http://192.168.100.13:576>) was formed. Members left: 2
> >> Mar 24 16:35:17 b013 corosync[2134]:  [TOTEM ] Failed to receive the
> >> leave message. failed: 2
> >> Mar 24 16:35:17 b013 attrd[2223]:   notice: crm_update_peer_proc:
> Node
> >> b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 cib[2220]:   notice: crm_update_peer_proc: Node
> >> b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 cib[2220]:   notice: Removing b014-cl/2 from the
> >> membership list
> >> Mar 24 16:35:17 b013 cib[2220]:   notice: Purged 1 peers with id=2
> >> and/or uname=b014-cl from the membership cache
> >> Mar 24 16:35:17 b013 pacemakerd[2187]:   notice:
> crm_reap_unseen_nodes:
> >> Node b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 attrd[2223]:   notice: Removing b014-cl/2
> from the
> >> membership list
> >> Mar 24 16:35:17 b013 attrd[2223]:   notice: Purged 1 peers with id=2
> >> and/or uname=b014-cl from the membership cache
> >> Mar 24 16:35:17 b013 stonith-ng[2221]:   notice:
> crm_update_peer_proc:
> >> Node b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 stonith-ng[2221]:   notice: Removing
> b014-cl/2 from
> >> the membership list
> >> Mar 24 16:35:17 b013 stonith-ng[2221]:   notice: Purged 1 peers with
> >> id=2 and/or uname=b014-cl from the membership cache
> >> Mar 24 16:35:17 b013 dlm_controld[2727]: 3091 fence request 2 pid
> 19223
> >> nodedown time 1490387717 fence_all dlm_stonith
> >> Mar 24 16:35:17 b013 kernel: [ 3091.800118] dlm: closing
> connection to
> >> node 2
> >> Mar 24 16:35:17 b013 crmd[2227]:   notice: crm_reap_unseen_nodes:
> Node
> >> b014-cl[2] - state is now lost (was member)
> >> Mar 24 16:35:17 b013 dlm_stonith: stonith_api_time: Found 0
> entries for
> >> 2/(nul

Re: [ClusterLabs] stonith in dual HMC environment

2017-03-28 Thread Ken Gaillot
On 03/28/2017 08:20 AM, Alexander Markov wrote:
> Hello, Dejan,
> 
>> Why? I don't have a test system right now, but for instance this
>> should work:
>>
>> $ stonith -t ibmhmc ipaddr=10.1.2.9 -lS
>> $ stonith -t ibmhmc ipaddr=10.1.2.9 -T reset {nodename}
> 
> Ah, I see. Everything (including stonith methods, fencing and failover)
> works just fine under normal circumstances. Sorry if I wasn't clear
> about that. The problem occurs only when I have one datacenter (i.e. one
> IBM machine and one HMC) lost due to power outage.

If the datacenters are completely separate, you might want to take a
look at booth. With booth, you set up a separate cluster at each
datacenter, and booth coordinates which one can host resources. Each
datacenter must have its own self-sufficient cluster with its own
fencing, but one site does not need to be able to fence the other.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm139683855002656
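
A booth setup is mostly one small config file shared by the two sites plus
an arbitrator at a third location, roughly like this (addresses are
placeholders):

  # /etc/booth/booth.conf
  transport = UDP
  port = 9929
  site = 192.0.2.10
  site = 192.0.2.20
  arbitrator = 192.0.2.30
  ticket = "ticket-sap"

Each site's cluster then runs the protected resources only while it holds
the ticket, typically via rsc_ticket constraints.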

> 
> For example:
> test01:~ # stonith -t ibmhmc ipaddr=10.1.2.8 -lS | wc -l
> info: ibmhmc device OK.
> 39
> test01:~ # stonith -t ibmhmc ipaddr=10.1.2.9 -lS | wc -l
> info: ibmhmc device OK.
> 39
> 
> As I said, the stonith device can see and manage all the cluster nodes.
> 
>> If so, then your configuration does not appear to be correct. If
>> both are capable of managing all nodes then you should tell
>> pacemaker about it.
> 
> Thanks for the hint. But if the stonith device returns the node list, isn't it
> obvious to the cluster that it can manage those nodes? Could you please be
> more precise about what you refer to? I currently changed configuration
> to two fencing levels (one per HMC) but still don't think I get an idea
> here.

I believe Dejan is referring to fencing topology (levels). That would be
preferable to booth if the datacenters are physically close, and even if
one fence device fails, the other can still function.

In this case you'd probably want level 1 = the main fence device, and
level 2 = the fence device to use if the main device fails.

A common implementation (which Digimer uses to great effect) is to use
IPMI as level 1 and an intelligent power switch as level 2. If your
second device can function regardless of what hosts are up or down, you
can do something similar.
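
With your existing devices, that could be expressed as fencing levels,
something like this in crm shell syntax (double-check against "crm
configure help fencing_topology", and adjust which HMC is tried first for
each node):

  fencing_topology \
      crmapp01: st_hq_hmc st_ch_hmc \
      crmapp02: st_ch_hmc st_hq_hmc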

> 
>> The surviving node, running the stonith resource for the dead node, tries to
>> contact the IPMI device (which is also dead). How does the cluster understand
>> that
>> the lost node is really dead and it's not just a network issue?
>>
>> It cannot.

And it will be unable to recover resources that were running on the
questionable partition.

> 
> How do people then actually solve the problem of a two-node metro cluster?
> I mean, I know one option: stonith-enabled=false, but it doesn't seem
> right to me.
> 
> Thank you.
> 
> Regards,
> Alexander Markov

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] cloned resource not deployed on all matching nodes

2017-03-28 Thread Ken Gaillot
On 03/28/2017 01:26 PM, Radoslaw Garbacz wrote:
> Hi,
> 
> I have a situation when a cloned resource is being deployed only on some
> of the nodes, even though this resource is similar to others, which are
> being deployed according to location rules properly.
> 
> Please take a look at the configuration below and let me know if there
> is anything to do to make the resource "dbx_nfs_mounts_datas" (which is
> a primitive of "dbx_nfs_mounts_datas-clone") being deployed on all 4
> nodes matching its location rules.

Look in your logs for "pengine:" messages. They will list the decisions
made about where to start resources, then have a message about
"Calculated transition ... saving inputs in ..." with a file name.

You can run crm_simulate on that file to see why the decisions were
made. The output is somewhat difficult to follow, but "crm_simulate -Ssx
$FILENAME" will show every score that went into the decision.
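
For example, if the log ended with something like "saving inputs in
/var/lib/pacemaker/pengine/pe-input-123.bz2" (file name made up), you
could run:

  crm_simulate -Ssx /var/lib/pacemaker/pengine/pe-input-123.bz2 \
      | grep -i dbx_nfs_mounts_datas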

> 
> 
> Thanks in advance,
> 
> 
> 
> * Configuration:
> ** Nodes:
> 
>   ...
> 
> ** Resource in question:
>   
> http://dbx_mounts.ocf.sh>" class="ocf" provider="dbxcl">
>id="dbx_nfs_mounts_datas-instance_attributes">
>  ...
>   
>   
>  ...
>   
> 
> 
>id="dbx_nfs_mounts_datas-meta_attributes-target-role"/>
>id="dbx_nfs_mounts_datas-meta_attributes-clone-max"/>
> 
>   
> 
> 
> 
> ** Resource location
>   <rsc_location id="..." rsc="dbx_nfs_mounts_datas">
>     <rule score="INFINITY" id="on_nodes_dbx_nfs_mounts_datas-INFINITY" boolean-op="and">
>       <expression attribute="..." operation="..." id="on_nodes_dbx_nfs_mounts_datas-INFINITY-0-expr" value="Active"/>
>       <expression attribute="..." operation="..." id="on_nodes_dbx_nfs_mounts_datas-INFINITY-1-expr" value="AD"/>
>     </rule>
>     <rule score="-INFINITY" id="on_nodes_dbx_nfs_mounts_datas--INFINITY" boolean-op="or">
>       <expression attribute="..." operation="..." id="on_nodes_dbx_nfs_mounts_datas--INFINITY-0-expr" value="Active"/>
>       <expression attribute="..." operation="..." id="on_nodes_dbx_nfs_mounts_datas--INFINITY-1-expr" value="AD"/>
>     </rule>
>   </rsc_location>
> 
> 
> 
> ** Status on properly deployed node:
>type="dbx_mounts.ocf.sh " class="ocf"
> provider="dbxcl">
>  operation_key="dbx_nfs_mounts_datas_start_0" operation="start"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
> transition-key="156:0:0:d817e2a2-50fb-4462-bd6b-118d1d7b8ecd"
> transition-magic="0:0;156:0:0:d817e2a2-50fb-4462-bd6b-118d1d7b8ecd"
> on_node="ip-10-180-227-53" call-id="85" rc-code="0" op-status="0"
> interval="0" last-run="1490720995" last-rc-change="1490720995"
> exec-time="733" queue-time="0"
> op-digest="e95785e3e2d043b0bda24c5bd4655317" op-force-restart=""
> op-restart-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>  operation_key="dbx_nfs_mounts_datas_monitor_137000" operation="monitor"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
> transition-key="157:0:0:d817e2a2-50fb-4462-bd6b-118d1d7b8ecd"
> transition-magic="0:0;157:0:0:d817e2a2-50fb-4462-bd6b-118d1d7b8ecd"
> on_node="ip-10-180-227-53" call-id="86" rc-code="0" op-status="0"
> interval="137000" last-rc-change="1490720995" exec-time="172"
> queue-time="0" op-digest="a992d78564e6b3942742da0859d8c734"/>
>   
> 
> 
> 
> ** Status on not properly deployed node:
>type="dbx_mounts.ocf.sh " class="ocf"
> provider="dbxcl">
>  operation_key="dbx_nfs_mounts_datas_monitor_0" operation="monitor"
> crm-debug-origin="do_update_resource" crm_feature_set="3.0.12"
> transition-key="73:0:7:d817e2a2-50fb-4462-bd6b-118d1d7b8ecd"
> transition-magic="0:7;73:0:7:d817e2a2-50fb-4462-bd6b-118d1d7b8ecd"
> on_node="ip-10-183-39-69" call-id="39" rc-code="7" op-status="0"
> interval="0" last-run="1490720950" last-rc-change="1490720950"
> exec-time="172" queue-time="0"
> op-digest="e95785e3e2d043b0bda24c5bd4655317" op-force-restart=""
> op-restart-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
>   
> 
> 
> 
> -- 
> Best Regards,
> 
> Radoslaw Garbacz
> XtremeData Incorporated

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Running two independent clusters

2017-03-30 Thread Ken Gaillot
On 03/30/2017 01:17 AM, Nikhil Utane wrote:
> "/Coincidentally, I am about to announce enhanced container support in/
> /pacemaker. I should have a post with more details later today or
> tomorrow./"
> 
> Ken: Where you able to get to it?
> 
> -Thanks
> Nikhil

Not yet, we've been tweaking the syntax a bit, so I wanted to have
something more final first. But it's very close.

> 
> On Thu, Mar 23, 2017 at 7:35 PM, Ken Gaillot  <mailto:kgail...@redhat.com>> wrote:
> 
> On 03/22/2017 11:08 PM, Nikhil Utane wrote:
> > I simplified when I called it as a service. Essentially it is a complete
> > system.
> > It is an LTE eNB solution. It provides LTE service (service A) and now
> > we need to provide redundancy for another different but related service
> > (service B). The catch being, the LTE redundancy solution will be tied
> > to one operator whereas the other service can span across multiple
> > operators. Therefore ideally we want two completely independent clusters
> > since different set of nodes will form the two clusters.
> > Now what I am thinking is, to run additional instance of Pacemaker +
> > Corosync in a container which can then notify the service B on host
> > machine to start or stop it's service. That way my CIB file will be
> > independent and I can run corosync on different interfaces.
> >
> > Workable right?
> >
> > -Regards
> > Nikhil
> 
> It's not well-tested, but in theory it should work, as long as the
> container is privileged.
> 
> I still think virtualizing the services would be more resilient. It
> makes sense to have a single determination of quorum and fencing for the
> same real hosts. I'd think of it like a cloud provider -- the cloud
> instances are segregated by customer, but the underlying hosts are
> the same.
> 
> You could configure your cluster as asymmetric, and enable each VM only
> on the nodes it's allowed on, so you get the two separate "clusters"
> that way. You could set up the VMs as guest nodes if you want to monitor
> and manage multiple services within them. If your services require
> hardware access that's not easily passed to a VM, containerizing the
> services might be a better option.
> 
> > On Wed, Mar 22, 2017 at 8:06 PM, Ken Gaillot  <mailto:kgail...@redhat.com>
> > <mailto:kgail...@redhat.com <mailto:kgail...@redhat.com>>> wrote:
> >
> > On 03/22/2017 05:23 AM, Nikhil Utane wrote:
> > > Hi Ulrich,
> > >
> > > It's not an option unfortunately.
> > > Our product runs on a specialized hardware and provides both the
> > > services (A & B) that I am referring to. Hence I cannot have 
> service A
> > > running on some nodes as cluster A and service B running on other 
> nodes
> > > as cluster B.
> > > The two services HAVE to run on same node. The catch being 
> service A and
> > > service B have to be independent of each other.
> > >
> > > Hence looking at Container option since we are using that for 
> some other
> > > product (but not for Pacemaker/Corosync).
> > >
> > > -Regards
> > > Nikhil
> >
> > Instead of containerizing pacemaker, why don't you containerize or
> > virtualize the services, and have pacemaker manage the 
> containers/VMs?
> >
> > Coincidentally, I am about to announce enhanced container support in
> > pacemaker. I should have a post with more details later today or
> > tomorrow.
> >
> > >
> > > On Wed, Mar 22, 2017 at 12:41 PM, Ulrich Windl
> > >  <mailto:ulrich.wi...@rz.uni-regensburg.de>
> > <mailto:ulrich.wi...@rz.uni-regensburg.de
> <mailto:ulrich.wi...@rz.uni-regensburg.de>>
> > > <mailto:ulrich.wi...@rz.uni-regensburg.de
> <mailto:ulrich.wi...@rz.uni-regensburg.de>
> > <mailto:ulrich.wi...@rz.uni-regensburg.de
> <mailto:ulrich.wi...@rz.uni-regensburg.de>>>> wrote:
> > >
> > > >>> Nikhil Utane  <mailto:nikhil.subscri...@gmail.com>
> <mailto:nikhil.subscri...@gmail.com
> <mailto:nikhil.subscri...@gmail.com>>
> > > <mailto:nikhil.subscri...@gmail.com
>

Re: [ClusterLabs] Syncing data and reducing CPU utilization of cib process

2017-03-31 Thread Ken Gaillot
On 03/31/2017 06:44 AM, Nikhil Utane wrote:
> We are seeing this log in pacemaker.log continuously.
> 
> Mar 31 17:13:01 [6372] 0005B932ED72cib: info:
> crm_compress_string:  Compressed 436756 bytes into 14635 (ratio 29:1) in
> 284ms
> 
> This looks to be the reason for high CPU. What does this log indicate?

If a cluster message is larger than 128KB, pacemaker will compress it
(using BZ2) before transmitting it across the network to the other
nodes. This can hit the CPU significantly. Having a large resource
definition makes such messages more common.

There are many ways to sync a configuration file between nodes. If the
configuration rarely changes, a simple rsync cron could do it.
Specialized tools like lsyncd are more responsive while still having a
minimal footprint. DRBD or shared storage would be more powerful and
real-time. If it's a custom app, you could even modify it to use
something like etcd or a NoSQL db.
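
As one example of the simple end of that spectrum, a cron-driven rsync
could look like this (paths, peer name and interval are placeholders, and
it assumes passwordless SSH from the active node to the standby):

  # /etc/cron.d/sync-app-config  -- run on the active node only
  */5 * * * * root rsync -a --delete /etc/myapp/ standby1:/etc/myapp/

Only do that if the data can tolerate being up to one cron interval stale.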

> 
> -Regards
> Nikhil
> 
> 
> On Fri, Mar 31, 2017 at 12:08 PM, Nikhil Utane
> mailto:nikhil.subscri...@gmail.com>> wrote:
> 
> Hi,
> 
> In our current design (which we plan to improve upon) we are using
> the CIB file to synchronize information across active and standby nodes.
> Basically we want the standby node to take the configuration that
> was used by the active node so we are adding those as resource
> attributes. This ensures that when the standby node takes over, it
> can read all the configuration which will be passed to it as
> environment variables.
> Initially we thought the list of configuration parameters will be
> less and we did some prototyping and saw that there wasn't much of
> an issue. But now the list has grown it has become close to 300
> attributes. (I know this is like abusing the feature and we are
> looking towards doing it the right way).
> 
> So I have two questions:
> 1) What is the best way to synchronize such kind of information
> across nodes in the cluster? DRBD? Anything else that is simpler?
> For e.g. instead of syncing 300 attributes i could just sync up the
> path to a file.
> 
> 2) In the current design, is there anything that I can do to reduce
> the CPU utilization of cib process? Currently it regularly takes
> 30-50% of the CPU.
> Any quick fix that I can do which will bring it down? For e.g.
> configure how often it synchronizes etc?
> 
> -Thanks
> Nikhil

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Coming in Pacemaker 1.1.17: container bundles

2017-03-31 Thread Ken Gaillot
Hi all,

The release process for Pacemaker 1.1.17 will start soon! The most
significant new feature is container bundles, developed by Andrew Beekhof.

Pacemaker's container story has previously been muddled.

For the simplest case, the ocf:heartbeat:docker agent allows you to
launch a docker instance. This works great, but is limited in what it can do.

It is possible to run Pacemaker Remote inside a container and use it as
a guest node, but that does not model containers well: a container is
not a generic platform for any cluster resource, but typically provides
a single service.

"Isolated resources" were added in Pacemaker 1.1.13 to better represent
containers as a single service, but that feature was never documented or
widely used, and it does not model some common container scenarios. It
should now be considered deprecated.

Pacemaker 1.1.17 introduces a new type of resource: the "bundle". A
bundle is a single resource specifying the Docker settings, networking
requirements, and storage requirements for any number of containers
generated from the same Docker image.

A preliminary implementation of the feature is now available in the
master branch, for anyone who wants to experiment. The documentation
source in the master branch has been updated, though the online
documentation on clusterlabs.org has not been regenerated yet.

Here's an example of the CIB XML syntax (higher-level tools will likely
provide a more convenient interface):

 <bundle id="...">

   <docker image="..." replicas="3"/>

   <network ip-range-start="...">
     <port-mapping id="..." port="80"/>
   </network>

   <storage>

     <storage-mapping id="..."
        source-dir="/srv/www"
        target-dir="/var/www/html"
        options="rw"/>

     <storage-mapping id="..."
        source-dir-root="/var/log/pacemaker/bundles"
        target-dir="/etc/httpd/logs"
        options="rw"/>

   </storage>

   <primitive id="..." class="ocf" provider="heartbeat" type="apache"/>

 </bundle>

With that, Pacemaker would launch 3 instances of the container image,
assign an IP address to each where it could be reached on port 80 from
the host's network, map host directories into the container, and use
Pacemaker Remote to manage the apache resource inside the container.

The feature is currently experimental and will likely get significant
bugfixes throughout the coming release cycle, but the syntax is stable
and likely what will be released.

I intend to add a more detailed walk-through example to the ClusterLabs
wiki.
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] to change resource id - how?

2017-04-03 Thread Ken Gaillot
On 04/03/2017 06:35 AM, lejeczek wrote:
> hi
> I'm sroogling and reading but cannot find any info - how to
> (programmatically) change resource ids? In other words: how to rename
> these entities?
> many thanks
> L

As far as I know, higher-level tools don't support this directly -- you
have to edit the XML. The basic process is:

1. Save a copy of the live CIB to a file.
2. Edit that file, and do a search-and-replace on the desired name (so
you change it in constraints, etc., as well as the resource definition).
3. Push the configuration section of that file to the live CIB.

The higher-level tools have commands to do that, but at the low level,
it would be something like:

1. cibadmin -Q --scope configuration > tmp.cib
2. vim tmp.cib
3. cibadmin -x tmp.cib --replace --scope configuration

The cluster will treat it as a completely new resource, so it will stop
the old one and start the new one.
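
If the old name doesn't appear as a substring of anything unrelated, step
2 can even be scripted; for example (old_rsc/new_rsc are placeholders):

  cibadmin -Q --scope configuration > tmp.cib
  sed -i 's/\bold_rsc\b/new_rsc/g' tmp.cib
  cibadmin -x tmp.cib --replace --scope configuration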

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Coming in Pacemaker 1.1.17: container bundles

2017-04-03 Thread Ken Gaillot
On 04/03/2017 02:12 AM, Ulrich Windl wrote:
>>>> Ken Gaillot wrote on 01.04.2017 at 00:43 in
>>>> message
> <981d420d-73b2-3f24-a67c-e9c66dafb...@redhat.com>:
> 
> [...]
>> Pacemaker 1.1.17 introduces a new type of resource: the "bundle". A
>> bundle is a single resource specifying the Docker settings, networking
>> requirements, and storage requirements for any number of containers
>> generated from the same Docker image.
> 
> I wonder: Is a "bundle" just a kind of special "group template"? It looks as 
> if I could do it with a group also, but would have to write a bit more to
> get it configured. Did I miss something?

With a group, you could reproduce most of this functionality, though it
would be more verbose: you'd need to configure ocf:heartbeat:docker,
ocf:heartbeat:IPaddr2, and ocf:pacemaker:remote resources, plus a
primitive for your service, as well as constraints that restrict the
primitive to the guest node and prevent any other resource from running
on the guest node.

However, this can do something that a group can't: launch multiple
instances of a single container image, while associating floating IPs
and storage mappings specific to each replica. This puts it somewhere
between a group and a specialized form of clone.

Also, it will be shown differently in the cluster status, which is helpful.

> (With my background in HP-UX ServiceGuard, the "bundles" remind me of
> ServiceGuard's "packages": easy to use, but less flexible than
> primitives/groups.)
> 
> Regards,
> Ulrich
> 
> 
>>
>> A preliminary implementation of the feature is now available in the
>> master branch, for anyone who wants to experiment. The documentation
>> source in the master branch has been updated, though the online
>> documentation on clusterlabs.org has not been regenerated yet.
>>
>> Here's an example of the CIB XML syntax (higher-level tools will likely
>> provide a more convenient interface):
>>
>>  <bundle id="...">
>>
>>    <docker image="..." replicas="3"/>
>>
>>    <network ip-range-start="...">
>>      <port-mapping id="..." port="80"/>
>>    </network>
>>
>>    <storage>
>>
>>      <storage-mapping id="..."
>>        source-dir="/srv/www"
>>        target-dir="/var/www/html"
>>        options="rw"/>
>>
>>      <storage-mapping id="..."
>>        source-dir-root="/var/log/pacemaker/bundles"
>>        target-dir="/etc/httpd/logs"
>>        options="rw"/>
>>
>>    </storage>
>>
>>    <primitive id="..." class="ocf" provider="heartbeat" type="apache"/>
>>
>>  </bundle>
>>
>> With that, Pacemaker would launch 3 instances of the container image,
>> assign an IP address to each where it could be reached on port 80 from
>> the host's network, map host directories into the container, and use
>> Pacemaker Remote to manage the apache resource inside the container.
>>
>> The feature is currently experimental and will likely get significant
>> bugfixes throughout the coming release cycle, but the syntax is stable
>> and likely what will be released.
>>
>> I intend to add a more detailed walk-through example to the ClusterLabs
>> wiki.
>> -- 
>> Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Coming in Pacemaker 1.1.17: Per-operation fail counts

2017-04-03 Thread Ken Gaillot
Hi all,

Pacemaker 1.1.17 will have a significant change in how it tracks
resource failures, though the change will be mostly invisible to users.

Previously, Pacemaker tracked a single count of failures per resource --
for example, start failures and monitor failures for a given resource
were added together.

In a thread on this list last year[1], we discussed adding some new
failure handling options that would require tracking failures for each
operation type.

Pacemaker 1.1.17 will include this tracking, in preparation for adding
the new options in a future release.

Whereas previously, failure counts were stored in node attributes like
"fail-count-myrsc", they will now be stored in multiple node attributes
like "fail-count-myrsc#start_0" and "fail-count-myrsc#monitor_1"
(the number distinguishes monitors with different intervals).

Actual cluster behavior will be unchanged in this release (and
backward-compatible); the cluster will sum the per-operation fail counts
when checking against options such as migration-threshold.

The part that will be visible to the user in this release is that the
crm_failcount and crm_resource --cleanup tools will now be able to
handle individual per-operation fail counts if desired, though by
default they will still affect the total fail count for the resource.

As an example, if "myrsc" has one start failure and one monitor failure,
"crm_failcount -r myrsc --query" will still show 2, but now you can also
say "crm_failcount -r myrsc --query --operation start" which will show 1.

Additionally, crm_failcount --delete previously only reset the
resource's fail count, but it now behaves identically to crm_resource
--cleanup (resetting the fail count and clearing the failure history).

Special note for pgsql users: Older versions of common pgsql resource
agents relied on a behavior of crm_failcount that is now rejected. While
the impact is limited, users are recommended to make sure they have the
latest version of their pgsql resource agent before upgrading to
pacemaker 1.1.17.

[1] http://lists.clusterlabs.org/pipermail/users/2016-September/004096.html
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Coming in Pacemaker 1.1.17: Per-operation fail counts

2017-04-04 Thread Ken Gaillot
On 04/04/2017 01:18 AM, Ulrich Windl wrote:
>>>> Ken Gaillot wrote on 03.04.2017 at 17:00 in
>>>> message
> :
>> Hi all,
>>
>> Pacemaker 1.1.17 will have a significant change in how it tracks
>> resource failures, though the change will be mostly invisible to users.
>>
>> Previously, Pacemaker tracked a single count of failures per resource --
>> for example, start failures and monitor failures for a given resource
>> were added together.
> 
> That is "per resource operation", not "per resource" ;-)

I mean that there was only a single number to count failures for a given
resource; before this change, failures were not remembered separately by
operation.

>> In a thread on this list last year[1], we discussed adding some new
>> failure handling options that would require tracking failures for each
>> operation type.
> 
> So the existing set of operations failures was restricted to 
> start/stop/monitor? How about master/slave featuring two monitor operations?

No, both previously and with the new changes, all operation failures are
counted (well, except metadata!). The only change is whether they are
remembered per resource or per operation.

>> Pacemaker 1.1.17 will include this tracking, in preparation for adding
>> the new options in a future release.
>>
>> Whereas previously, failure counts were stored in node attributes like
>> "fail-count-myrsc", they will now be stored in multiple node attributes
>> like "fail-count-myrsc#start_0" and "fail-count-myrsc#monitor_1"
>> (the number distinguishes monitors with different intervals).
> 
> Wouldn't it be thinkable to store it as a (transient) resource attribute,
> either local to a node (LRM) or including the node attribute (CRM)?

Failures are specific to the node the failure occurred on, so it makes
sense to store them as transient node attributes.

So, to be more precise, we previously recorded failures per
node+resource combination, and now we record them per
node+resource+operation+interval combination.

>> Actual cluster behavior will be unchanged in this release (and
>> backward-compatible); the cluster will sum the per-operation fail counts
>> when checking against options such as migration-threshold.
>>
>> The part that will be visible to the user in this release is that the
>> crm_failcount and crm_resource --cleanup tools will now be able to
>> handle individual per-operation fail counts if desired, though by
>> default they will still affect the total fail count for the resource.
> 
> Another thing to think about would be "fail count" vs. "fail rate": Currently 
> there is a fail count, and some reset interval, which allows one to build some
> failure rate from it. Maybe many users just have the requirement that some
> resource shouldn't fail again and again, but with long uptimes (and then the
> operator forgets to reset fail counters), occasional failures (like once in
> two weeks) shouldn't prevent a resource from running.

Yes, we discussed that a bit in the earlier thread. It would be too much
of an incompatible change and add considerable complexity to start
tracking the failure rate.

Failure clearing hasn't changed -- failures can only be cleared by
manual commands, the failure-timeout option, or a restart of cluster
services on a node.

For the example you mentioned, a high failure-timeout is the best answer
we have. You could set a failure-timeout of 24 hours, and if the
resource went 24 hours without any failures, any older failures would be
forgotten.
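
For example, using the same hypothetical resource name as above:

  crm_resource --resource myrsc --meta --set-parameter failure-timeout \
      --parameter-value 24h

or set it cluster-wide in rsc_defaults.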

>> As an example, if "myrsc" has one start failure and one monitor failure,
>> "crm_failcount -r myrsc --query" will still show 2, but now you can also
>> say "crm_failcount -r myrsc --query --operation start" which will show 1.
> 
> Would accumulated monitor failures ever prevent a resource from starting, or 
> will it force a stop of the resource?

As of this release, failure recovery behavior has not changed. All
operation failures are added together to produce a single fail count per
resource, as was recorded before. The only thing that changed is how
they're recorded.

Failure recovery is controlled by the resource's migration-threshold and
the operation's on-fail. By default, on-fail=restart and
migration-threshold=INFINITY, so a monitor failure would result in
1,000,000 restarts before being banned from the failing node.
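
For illustration (hypothetical resource and node names), a failed resource can
still be cleaned up as before, which resets all of its per-operation counts on
that node:

  crm_resource --cleanup --resource myrsc --node node1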

> Regards,
> Ulrich
> 
>>
>> Additionally, crm_failcount --delete previously only reset the
>> resource's fail count, but it now behaves identically to crm_resource
> --cleanup (resetting the fail count and clearing the failure history).

Re: [ClusterLabs] STONITH not communicated back to initiator until token expires

2017-04-04 Thread Ken Gaillot
On 03/13/2017 10:43 PM, Chris Walker wrote:
> Thanks for your reply Digimer.
> 
> On Mon, Mar 13, 2017 at 1:35 PM, Digimer wrote:
> 
> On 13/03/17 12:07 PM, Chris Walker wrote:
> > Hello,
> >
> > On our two-node EL7 cluster (pacemaker: 1.1.15-11.el7_3.4; corosync:
> > 2.4.0-4; libqb: 1.0-1),
> > it looks like successful STONITH operations are not communicated from
> > stonith-ng back to the initiator (in this case, crmd) until the STONITHed
> > node is removed from the cluster when
> > Corosync notices that it's gone (i.e., after the token timeout).
> 
> Others might have more useful info, but my understanding of a lost node
> sequence is this;
> 
> 1. Node stops responding, corosync declares it lost after token timeout
> 2. Corosync reforms the cluster with remaining node(s), checks if it is
> quorate (always true in 2-node)
> 3. Corosync informs Pacemaker of the membership change.
> 4. Pacemaker invokes stonith, waits for the fence agent to return
> "success" (exit code of the agent as per the FenceAgentAPI
> [https://docs.pagure.org/ClusterLabs.fence-agents/FenceAgentAPI.md]
> ).
> If
> the method fails, it moves on to the next method. If all methods fail,
> it goes back to the first method and tries again, looping indefinitely.
> 
> 
> That's roughly my understanding as well for the case when a node
> suddenly leaves the cluster (e.g., poweroff), and this case is working
> as expected for me.  I'm seeing delays when a node is marked for STONITH
> while it's still up (e.g., after a stop operation fails).  In this case,
> what I expect to see is something like:
> 1.  crmd requests that stonith-ng fence the node
> 2.  stonith-ng (might be a different stonith-ng) fences the node and
> sends a message that it has succeeded
> 3.  stonith-ng (the original from step 1) receives this message and
> communicates back to crmd that the node has been fenced
> 
> but what I'm seeing is
> 1.  crmd requests that stonith-ng fence the node
> 2.  stonith-ng fences the node and sends a message saying that it has
> succeeded
> 3.  nobody hears this message
> 4.  Corosync eventually realizes that the fenced node is no longer part
> of the config and broadcasts a config change
> 5.  stonith-ng finishes the STONITH operation that was started earlier
> and communicates back to crmd that the node has been STONITHed

In your attached log, bug1 was DC at the time of the fencing, and bug0
takes over DC after the fencing. This is what I expect is happening
(logs from bug1 would help confirm):

1. crmd on the DC (bug1) runs pengine which sees the stop failure and
schedules fencing (of bug1)

2. stonithd on bug1 sends a query to all nodes asking who can fence bug1

3. Each node replies, and stonithd on bug1 chooses bug0 to execute the
fencing

4. stonithd on bug0 fences bug1. At this point, it would normally report
the result to the DC ... but that happens to be bug1.

5. Once crmd on bug0 takes over DC, it can decide that the fencing
succeeded, but it can't take over DC until it sees that the old DC is
gone, which takes a while because of your long token timeout. So, this
is where the delay is coming in.

I'll have to think about whether we can improve this, but I don't think
it would be easy. There are complications if for example a fencing
topology is used, such that the result being reported in step 4 might
not be the entire result.

> I'm less convinced that the sending of the STONITH notify in step 2 is
> at fault; it also seems possible that a callback is not being run when
> it should be.
> 
> 
> The Pacemaker configuration is not important (I've seen this happen on
> our production clusters and on a small sandbox), but the config is:
> 
> primitive bug0-stonith stonith:fence_ipmilan \
> params pcmk_host_list=bug0 ipaddr=bug0-ipmi action=off
> login=admin passwd=admin \
> meta target-role=Started
> primitive bug1-stonith stonith:fence_ipmilan \
> params pcmk_host_list=bug1 ipaddr=bug1-ipmi action=off
> login=admin passwd=admin \
> meta target-role=Started
> primitive prm-snmp-heartbeat snmptrap_heartbeat \
> params snmphost=bug0 community=public \
> op monitor interval=10 timeout=300 \
> op start timeout=300 interval=0 \
> op stop timeout=300 interval=0
> clone cln-snmp-heartbeat prm-snmp-heartbeat \
> meta interleave=true globally-unique=false ordered=false
> notify=false
> location bug0-stonith-loc bug0-stonith -inf: bug0
> location bug1-stonith-loc bug1-stonith -inf: bug1
> 
> The corosync config might be more interesting:
> 
> totem {
> version: 2
> crypto_cipher: none
> crypto_hash: none
> secauth: off
> rrp_mode: passive
> transport: udpu
> token: 24
> consensus: 1000
> 
> interface {
> ringnumber 0
> bindnetaddr:

Re: [ClusterLabs] how start resources on the last running node

2017-04-05 Thread Ken Gaillot
On 04/04/2017 10:01 AM, Ján Poctavek wrote:
> Hi,
> 
> I came here to ask for some inspiration about my cluster setup.
> 
> I have 3-node pcs+corosync+pacemaker cluster. When majority of nodes
> exist in the cluster, everything is working fine. But what recovery
> options do I have when I lose 2 of 3 nodes? If I know for sure that the
> missing nodes are turned off, is there any command to force start of the
> resources? The idea is to make the resources available (by
> administrative command) even without majority of nodes and when the
> other nodes become reachable again, they will normally join to the
> cluster without any manual intervention.
> 
> All nodes are set with wait_for_all, stonith-enabled=false and
> no-quorum-policy=stop.
> 
> Thank you.
> 
> Jan

In general, no. The cluster must have quorum to serve resources.

However, corosync is versatile in how it can define quorum. See the
votequorum(5) man page regarding last_man_standing, auto_tie_breaker,
and allow_downscale. Also, the newest version of corosync supports
qdevice, which is a special quorum arbitrator.
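
For reference, a minimal sketch of where those options live (not a tested
configuration -- see votequorum(5) for the exact semantics and caveats):

  quorum {
      provider: corosync_votequorum
      last_man_standing: 1
      last_man_standing_window: 20000
      auto_tie_breaker: 1
      # allow_downscale: 1   (documented as experimental)
  }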



Re: [ClusterLabs] cloned resources ordering and remote nodes problem

2017-04-06 Thread Ken Gaillot
On 04/06/2017 09:32 AM, Radoslaw Garbacz wrote:
> Hi,
> 
> 
> I have a question regarding resources order settings.
> 
> Having cloned resources: "res_1-clone", "res_2-clone",
>  and defined order:  first "res_1-clone" then "res_2-clone"
> 
> When I have a monitoring failure on a remote node with "res_1" (an
> instance of "res_1-clone") which causes all dependent resources to be
> restarted, only instances on this remote node are being restarted, not
> the ones on other nodes.
> 
> Is it an intentional behavior and if so, is there a way to make all
> instances of the cloned resource to be restarted in such a case?

That is controlled by a clone's "interleave" meta-attribute. The default
(false) actually gives your desired behavior. I'm guessing you have
interleave=true configured.
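
As a sketch (clone name taken from the question; pcs syntax assumed, the crm
shell works similarly), the meta-attribute can be checked and changed with:

  pcs resource show res_2-clone
  pcs resource update res_2-clone meta interleave=false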

> I can provide more details regarding the CIB configuration when needed.
> 
> Pacemaker 1.1.16-1.el6
> OS: Linux CentOS 6
> 
> 
> Thanks in advance,
> 
> -- 
> Best Regards,
> 
> Radoslaw Garbacz
> XtremeData Incorporated




Re: [ClusterLabs] cluster does not detect kill on pacemaker process ?

2017-04-07 Thread Ken Gaillot
On 04/05/2017 05:16 PM, neeraj ch wrote:
> Hello All, 
> 
> I noticed something on our pacemaker test cluster. The cluster is
> configured to manage an underlying database using master slave primitive. 
> 
> I ran a kill on the pacemaker process, all the other nodes kept showing
> the node online. I went on to kill the underlying database on the same
> node which would have been detected had the pacemaker on the node been
> online. The cluster did not detect that the database on the node has
> failed, the failover never occurred. 
> 
> I went on to kill corosync on the same node and the cluster now marked
> the node as stopped and proceeded to elect a new master. 
> 
> 
> In a separate test. I killed the pacemaker process on the cluster DC,
> the cluster showed no change. I went on to change CIB on a different
> node. The CIB modify command timed out. Once that occurred, the node
> didn't failover even when I turned off corosync on cluster DC. The
> cluster didn't recover after this mishap. 
> 
> Is this expected behavior? Is there a solution for when OOM decides to
> kill the pacemaker process? 
> 
> I run pacemaker 1.1.14, with corosync 1.4. I have stonith disabled and
> quorum enabled. 
> 
> Thank you,
> 
> nwarriorch

What exactly are you doing to kill pacemaker? There are multiple
pacemaker processes, and they have different recovery methods.

Also, what OS/version are you running? If it has systemd, that can play
a role in recovery as well.

Having stonith disabled is a big part of what you're seeing. When a node
fails, stonith is the only way the rest of the cluster can be sure the
node is unable to cause trouble, so it can recover services elsewhere.




Re: [ClusterLabs] Can't See Why This Cluster Failed Over

2017-04-07 Thread Ken Gaillot
On 04/07/2017 12:58 PM, Eric Robinson wrote:
> Somebody want to look at this log and tell me why the cluster failed over? 
> All we did was add a new resource. We've done it many times before without 
> any problems.
> 
> --
> 
> Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request:  
>   Forwarding cib_apply_diff operation for section 'all' to master 
> (origin=local/cibadmin/2)
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> --- 0.605.2 2
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> +++ 0.607.0 65654c97e62cd549f22f777a5290fe3a
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: +  
> /cib:  @epoch=607, @num_updates=0
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/resources:   type="mysql_745"/>
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/resources:   type="mysql_746"/>
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/constraints/rsc_colocation[@id='c_clust19']/resource_set[@id='c_clust19-0']:
>   
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/constraints/rsc_colocation[@id='c_clust19']/resource_set[@id='c_clust19-0']:
>   
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/constraints/rsc_order[@id='o_clust19']/resource_set[@id='o_clust19-3']:
>   
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/configuration/constraints/rsc_order[@id='o_clust19']/resource_set[@id='o_clust19-3']:
>   
> Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request:  
>   Completed cib_apply_diff operation for section 'all': OK (rc=0, 
> origin=ha14a/cibadmin/2, version=0.607.0)
> Apr 03 08:50:30 [22762] ha14acib: info: write_cib_contents:   
>   Archived previous version as /var/lib/pacemaker/cib/cib-36.raw
> Apr 03 08:50:30 [22762] ha14acib: info: write_cib_contents:   
>   Wrote version 0.607.0 of the CIB to disk (digest: 
> 1afdb9e480f870a095aa9e39719d29c4)
> Apr 03 08:50:30 [22762] ha14acib: info: retrieveCib:
> Reading cluster configuration from: /var/lib/pacemaker/cib/cib.DkIgSs 
> (digest: /var/lib/pacemaker/cib/cib.hPwa66)
> Apr 03 08:50:30 [22764] ha14a   lrmd: info: 
> process_lrmd_get_rsc_info:  Resource 'p_mysql_745' not found (17 active 
> resources)
> Apr 03 08:50:30 [22764] ha14a   lrmd: info: 
> process_lrmd_rsc_register:  Added 'p_mysql_745' to the rsc list (18 active 
> resources)
> Apr 03 08:50:30 [22767] ha14a   crmd: info: do_lrm_rsc_op:  
> Performing key=10:7484:7:91ef4b03-8769-47a1-a364-060569c46e52 
> op=p_mysql_745_monitor_0
> Apr 03 08:50:30 [22764] ha14a   lrmd: info: 
> process_lrmd_get_rsc_info:  Resource 'p_mysql_746' not found (18 active 
> resources)
> Apr 03 08:50:30 [22764] ha14a   lrmd: info: 
> process_lrmd_rsc_register:  Added 'p_mysql_746' to the rsc list (19 active 
> resources)
> Apr 03 08:50:30 [22767] ha14a   crmd: info: do_lrm_rsc_op:  
> Performing key=11:7484:7:91ef4b03-8769-47a1-a364-060569c46e52 
> op=p_mysql_746_monitor_0
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> --- 0.607.0 2
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> +++ 0.607.1 (null)
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: +  
> /cib:  @num_updates=1
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/status/node_state[@id='ha14b']/lrm[@id='ha14b']/lrm_resources:  
> 
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++
> 
>  operation="monitor" crm-debug-origin="do_update_resource" 
> crm_feature_set="3.0.9" 
> transition-key="13:7484:7:91ef4b03-8769-47a1-a364-060569c46e52" 
> transition-magic="0:7;13:7484:7:91ef4b03-8769-47a1-a364-060569c46e52" 
> call-id="142" rc-code="7" op-status="0" interval="0" last-run="1491234630" las
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++
>   
> 
> Apr 03 08:50:30 [22762] ha14acib: info: cib_process_request:  
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=ha14b/crmd/7665, version=0.607.1)
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> --- 0.607.1 2
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: Diff: 
> +++ 0.607.2 (null)
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: +  
> /cib:  @num_updates=2
> Apr 03 08:50:30 [22762] ha14acib: info: cib_perform_op: ++ 
> /cib/status/node_s

Re: [ClusterLabs] [Problem] The crmd causes an error of xml.

2017-04-07 Thread Ken Gaillot
On 04/06/2017 08:49 AM, renayama19661...@ybb.ne.jp wrote:
> Hi All,
> 
> I confirmed a development edition of Pacemaker.
>  - 
> https://github.com/ClusterLabs/pacemaker/tree/71dbd128c7b0a923c472c8e564d33a0ba1816cb5
> 
> 
> property no-quorum-policy="ignore" \
> stonith-enabled="true" \
> startup-fencing="false"
> 
> rsc_defaults resource-stickiness="INFINITY" \
> migration-threshold="INFINITY"
> 
> fencing_topology \
> rh73-01-snmp: prmStonith1-1 \
> rh73-02-snmp: prmStonith2-1
> 
> primitive prmDummy ocf:pacemaker:Dummy \
> op start interval="0s" timeout="60s" on-fail="restart" \
> op monitor interval="10s" timeout="60s" on-fail="restart" \
> op stop interval="0s" timeout="60s" on-fail="fence"
> 
> primitive prmStonith1-1 stonith:external/ssh \
> params \
> pcmk_reboot_retries="1" \
> pcmk_reboot_timeout="40s" \
> hostlist="rh73-01-snmp" \
> op start interval="0s" timeout="60s" on-fail="restart" \
> op stop interval="0s" timeout="60s" on-fail="ignore"
> 
> primitive prmStonith2-1 stonith:external/ssh \
> params \
> pcmk_reboot_retries="1" \
> pcmk_reboot_timeout="40s" \
> hostlist="rh73-02-snmp" \
> op start interval="0s" timeout="60s" on-fail="restart" \
> op stop interval="0s" timeout="60s" on-fail="ignore"
> 
> ### Resource Location ###
> location rsc_location-1 prmDummy \
> rule  300: #uname eq rh73-01-snmp \
> rule  200: #uname eq rh73-02-snmp
> 
> 
> 
> I apply the following brief crm configuration.
> I then trigger a resource failure in the cluster.
> Then crmd causes an error.
> 
> 
> (snip)
> Apr  6 18:04:22 rh73-01-snmp pengine[5214]: warning: Calculated transition 4 
> (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-0.bz2
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: Entity: line 1: 
> parser error : Specification mandate value for attribute 
> CRM_meta_fail_count_prmDummy
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: rh73-01-snmp" 
> on_node_uuid="3232238265"> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error:  
>   ^
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: Entity: line 1: 
> parser error : attributes construct error
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: rh73-01-snmp" 
> on_node_uuid="3232238265"> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error:  
>   ^
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: Entity: line 1: 
> parser error : Couldn't find end of Start Tag attributes line 1
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error: rh73-01-snmp" 
> on_node_uuid="3232238265"> Apr  6 18:04:22 rh73-01-snmp crmd[5215]:   error: XML Error:  
>   ^
> Apr  6 18:04:22 rh73-01-snmp crmd[5215]: warning: Parsing failed (domain=1, 
> level=3, code=73): Couldn't find end of Start Tag attributes line 1
> (snip)
> 
> 
> The XML related to the new fail-count attribute seems to have a problem.
> 
> I attach pe-warn-0.bz2.
> 
> Best Regards,
> Hideo Yamauchi.

Hi Hideo,

Thanks for the report!

This appears to be a PE bug when fencing is needed due to stop failure.
It wasn't caught in regression testing because the PE will continue to
use the old-style fail-count attribute if the DC does not support the
new style, and existing tests obviously have older DCs. I definitely
need to add some new tests.

I'm not sure why fail-count and last-failure are being added as
meta-attributes in this case, or why incorrect XML syntax is being
generated, but I'll investigate.



Re: [ClusterLabs] cluster does not detect kill on pacemaker process ?

2017-04-07 Thread Ken Gaillot
On 04/07/2017 05:20 PM, neeraj ch wrote:
> I am running it on centos 6.6. I am killing the "pacemakerd" process
> using kill -9.

pacemakerd is a supervisor process that watches the other processes, and
respawns them if they die. It is not really responsible for anything in
the cluster directly. So, killing it does not disrupt the cluster in any
way, it just prevents automatic recovery if one of the other daemons dies.

When systemd is in use, systemd will restart pacemakerd if it dies, but
CentOS 6 does not have systemd (CentOS 7 does).

> hmm, stonith is used for detection as well? I thought it was used to
> disable malfunctioning nodes. 

If you kill pacemakerd, that doesn't cause any harm to the cluster, so
that would not involve stonith.

If you kill crmd or corosync instead, that would cause the node to leave
the cluster -- it would be considered a malfunctioning node. The rest of
the cluster would then use stonith to disable that node, so it could
safely recover its services elsewhere.
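
As a sketch of what enabling that would look like (device address, credentials
and host names are placeholders; any supported fence agent can be used):

  pcs stonith create fence-node1 fence_ipmilan pcmk_host_list=node1 \
      ipaddr=node1-ipmi login=admin passwd=secret
  pcs property set stonith-enabled=true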

> On Fri, Apr 7, 2017 at 7:58 AM, Ken Gaillot  <mailto:kgail...@redhat.com>> wrote:
> 
> On 04/05/2017 05:16 PM, neeraj ch wrote:
> > Hello All,
> >
> > I noticed something on our pacemaker test cluster. The cluster is
> > configured to manage an underlying database using master slave
> primitive.
> >
> > I ran a kill on the pacemaker process, all the other nodes kept
> showing
> > the node online. I went on to kill the underlying database on the same
> > node which would have been detected had the pacemaker on the node been
> > online. The cluster did not detect that the database on the node has
> > failed, the failover never occurred.
> >
> > I went on to kill corosync on the same node and the cluster now marked
> > the node as stopped and proceeded to elect a new master.
> >
> >
> > In a separate test. I killed the pacemaker process on the cluster DC,
> > the cluster showed no change. I went on to change CIB on a different
> > node. The CIB modify command timed out. Once that occurred, the node
> > didn't failover even when I turned off corosync on cluster DC. The
> > cluster didn't recover after this mishap.
> >
> > Is this expected behavior? Is there a solution for when OOM decides to
> > kill the pacemaker process?
> >
> > I run pacemaker 1.1.14, with corosync 1.4. I have stonith disabled and
> > quorum enabled.
> >
> > Thank you,
> >
> > nwarriorch
> 
> What exactly are you doing to kill pacemaker? There are multiple
> pacemaker processes, and they have different recovery methods.
> 
> Also, what OS/version are you running? If it has systemd, that can play
> a role in recovery as well.
> 
> Having stonith disabled is a big part of what you're seeing. When a node
> fails, stonith is the only way the rest of the cluster can be sure the
> node is unable to cause trouble, so it can recover services elsewhere.



Re: [ClusterLabs] Antw: Re: Rename option group resource id with pcs

2017-04-11 Thread Ken Gaillot
On 04/11/2017 05:48 AM, Ulrich Windl wrote:
Dejan Muhamedagic wrote on 11.04.2017 at 11:43 in
> message <20170411094352.GD8414@tuttle.homenet>:
>> Hi,
>>
>> On Tue, Apr 11, 2017 at 10:50:56AM +0200, Tomas Jelinek wrote:
>>> On 11.4.2017 at 08:53, SAYED, MAJID ALI SYED AMJAD ALI wrote:
 Hello,

 Is there any option in pcs to rename group resource id?

>>>
>>> Hi,
>>>
>>> No, there is not.
>>>
>>> Pacemaker doesn't really cover the concept of renaming a resource.
>>
>> Perhaps you can check how crmsh does resource rename. It's not
>> impossible, but can be rather involved if there are other objects
>> (e.g. constraints) referencing the resource. Also, crmsh will
>> refuse to rename the resource if it's running.
> 
> The real problem in pacemaker (as resources are created now) is that the 
> "IDs" have too much semantic, i.e. most are derived from the resource name 
> (while lacking a name attribute or element), and some required elements are 
> IDs are accessed by ID, and not by name.
> 
> Examples:
> 
>value="1.1.12-f47ea56"/>
> 
> <nvpair>s and <op>s have no name, but only an ID (it seems).
> 
>   
> 
> This is redundant: As the <op> is part of a resource (by XML structure) it's 
> unnecessary to put the name of the resource into the ID of the operation.
> 
> It all looks like a kind of abuse of XML IMHO. I think the next CIB format 
> should be able to handle IDs that are free of semantics other than to denote 
> (relatively unique) identity. That is: It should be OK to assign IDs like 
> "i1", "i2", "i3", ... and besides from an IDREF the elements should be 
> accessed by structure and/or name.
> 
> (If the ID should be the primary identification feature, flatten all 
> structure and drop all (redundant) names.)
> 
> Regards,
> Ulrich

That's how it's always been :-)

Pacemaker doesn't care what IDs are, only that they are unique (though
of course they must meet the XML requirements for an ID type as far as
allowed characters). The various tools (CLI, crm shell, pcs)
auto-generate IDs so the user doesn't have to care about them, and they
create IDs like the ones you mention above, because they're easy to
generate.

>>
>> Thanks,
>>
>> Dejan
>>
>>> From
>>> pacemaker's point of view one resource gets removed and another one gets
>>> created.
>>>
>>> This has been discussed recently:
>>> http://lists.clusterlabs.org/pipermail/users/2017-April/005387.html 
>>>
>>> Regards,
>>> Tomas
>>>






 */MAJID SAYED/*

 /HPC System Administrator./

 /King Abdullah International Medical Research Centre/

 /Phone:+9661801(Ext:40631)/

 /Email:sayed...@ngha.med.sa/



Re: [ClusterLabs] Surprising semantics of location constraints with INFINITY score

2017-04-11 Thread Ken Gaillot
On 04/11/2017 08:30 AM, Kristoffer Grönlund wrote:
> Hi all,
> 
> I discovered today that a location constraint with score=INFINITY
> doesn't actually restrict resources to running only on particular
> nodes. From what I can tell, the constraint assigns the score to that
> node, but doesn't change scores assigned to other nodes. So if the node
> in question happens to be offline, the resource will be started on any
> other node.
> 
> Example:
> 
> 
> 
> If node2 is offline, I see the following:
> 
>  dummy(ocf::heartbeat:Dummy): Started node1
> native_color: dummy allocation score on node1: 1
> native_color: dummy allocation score on node2: -INFINITY
> native_color: dummy allocation score on webui: 0
> 
> It makes some kind of sense, but seems surprising - and the
> documentation is a bit unclear on the topic. In particular, the
> statement that a score = INFINITY means "must" is clearly not correct in
> this case. Maybe the documentation should be clarified for location
> constraints?

Yes, that behavior is intended. I'll make a note to clarify in the
documentation.
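
For anyone wanting the "must" behavior, one sketch (pcs rule syntax assumed;
resource and node names from the example above) is to ban the resource from
every node that is not the intended one:

  pcs constraint location dummy rule score=-INFINITY '#uname' ne node2

Alternatively, an opt-in cluster (symmetric-cluster=false) plus a single
positive constraint achieves the same effect.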



Re: [ClusterLabs] Pacemaker for Embedded Systems

2017-04-11 Thread Ken Gaillot
On 04/10/2017 03:58 PM, Chad Cravens wrote:
> Hello all:
> 
> we have implemented large cluster solutions for complex server
> environments that had databases, application servers, apache web servers
> and implemented fencing with the IPMI fencing agent.
> 
> However, we are considering if pacemaker would be a good solution for
> high availability for an embedded control system that integrates with
> CAN for vehicles? We will also have Ethernet for cluster communication
> between the hardware units.
> 
> My main questions are:
> 1) Is it common use case to use pacemaker to implement high availability
> for embedded control systems?

I know it has been done. I'd love to hear about some specific examples,
but I don't know of any public ones.

> 2) What, if any, special considerations should be taken when it comes to
> fencing in this type of environment?

From pacemaker's point of view, it's not a special environment ...
communication between nodes and some way to request fencing are all
that's needed.

Of course, the physical environment poses many more challenges in this
case, not to mention the safety and regulatory requirements if the
system is in any way important to the operation of the vehicle.

I don't have any experience in the area, but just as a thought
experiment, I'd think the main question would be: what happens in a
split-brain situation? Fencing is important to the same degree as the
consequences of that. If the worst that happens is the music player
skips tracks, it might be acceptable to disable fencing; if the vehicle
could brake inappropriately, then the needs are much larger.

> Thank you for any guidance!
> 
> -- 
> Kindest Regards,
> Chad Cravens
> (800) 214-9146 x700
> 
> http://www.ossys.com 
> http://www.linkedin.com/company/open-source-systems-llc
>   
> https://www.facebook.com/OpenSrcSys
>    https://twitter.com/OpenSrcSys
>   http://www.youtube.com/OpenSrcSys
>    http://www.ossys.com/feed
>    cont...@ossys.com 




Re: [ClusterLabs] Coming in Pacemaker 1.1.17: container bundles

2017-04-17 Thread Ken Gaillot
On 04/13/2017 07:04 AM, Jan Pokorný wrote:
> On 03/04/17 09:47 -0500, Ken Gaillot wrote:
>> On 04/03/2017 02:12 AM, Ulrich Windl wrote:
>>>>>> Ken Gaillot wrote on 01.04.2017 at 00:43 in
>>>>>> message
>>> <981d420d-73b2-3f24-a67c-e9c66dafb...@redhat.com>:
>>>
>>> [...]
>>>> Pacemaker 1.1.17 introduces a new type of resource: the "bundle". A
>>>> bundle is a single resource specifying the Docker settings, networking
>>>> requirements, and storage requirements for any number of containers
>>>> generated from the same Docker image.
>>>
>>> I wonder: Is a "bundle" just a kind of special "group template"? It
>>> looks as if I could do it with a group also, but would have to
>>> write a bite more to get it configured. Did I miss something?
>>
>> With a group, you could reproduce most of this functionality, though it
>> would be more verbose: you'd need to configure ocf:heartbeat:docker,
>> ocf:heartbeat:IPaddr2, and ocf:pacemaker:remote resources, plus a
>> primitive for your service, as well as constraints that restrict the
>> primitive to the guest node and prevent any other resource from running
>> on the guest node.
>>
>> However, this can do something that a group can't: launch multiple
>> instances of a single container image, while associating floating IPs
>> and storage mappings specific to each replica. This puts it somewhere
>> between a group and a specialized form of clone.
> 
> In that case, wouldn't it be more systemic to factor any generic
> clone/master-like controls (replicas, replicas-per-host, masters) out
> of <docker> so it can be reused seamlessly when switching to other
> possible containerization backends in the future?

We did consider that. It's not clear how much of the functionality (or
terminology) will be applicable to other technologies (currently known
and potential future ones). We decided that it would be cleaner to
duplicate those fields in other technology tags than to have them not
work for certain technologies.

> 
>> Also, it will be shown differently in the cluster status, which is
>> helpful.
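
For readers who want to see the shape of the new syntax, here is a rough,
untested sketch of a bundle definition (image name, IPs and mappings are made
up; consult the 1.1.17 documentation for the authoritative schema):

  <bundle id="httpd-bundle">
    <docker image="pcmk:httpd" replicas="2" replicas-per-host="1"/>
    <network ip-range-start="192.168.122.131" host-netmask="24">
      <port-mapping id="httpd-port" port="80"/>
    </network>
    <storage>
      <storage-mapping id="httpd-root" source-dir="/srv/html"
                       target-dir="/var/www/html" options="rw"/>
    </storage>
    <primitive id="httpd" class="ocf" provider="heartbeat" type="apache"/>
  </bundle>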




Re: [ClusterLabs] KVM virtualdomain - stopped

2017-04-17 Thread Ken Gaillot
On 04/13/2017 03:01 AM, Jaco van Niekerk wrote:
> 
> Hi
> 
> I am having endless problems with ocf::heartbeat:VirtualDomain when
> failing over to second node. The virtualdomain goes into a stopped state
> 
> virtdom_compact (ocf::heartbeat:VirtualDomain): Stopped
> 
> * virtdom_compact_start_0 on node2.kvm.bitco.co.za 'unknown error' (1):
> call=93, status=complete, exitreason='Failed to start virtual domain
> compact.',
> last-rc-change='Thu Apr 13 09:11:16 2017', queued=0ms, exec=369ms
> 
> I am then unable to get it started without deleting the resource and
> adding it again:
> 
> pcs resource create virtdom_compact ocf:heartbeat:VirtualDomain
> config=/etc/libvirt/qemu/compact.xml meta allow-migrate=true op monitor
> interval="30"
> 
> Looking at virsh list --all
> virsh list --all
> Id Name State
> 
> 
> It doesn't seem like ocf:heartbeat:VirtualDomain is able to define the
> domain and thus the command can't start the domain:
> 
> virsh start compact
> error: failed to get domain 'compact'
> error: Domain not found: no domain with matching name 'compact'
> 
> Am I missing something in my configuration:
> pcs resource create my_FS ocf:heartbeat:Filesystem params
> device=/dev/sdc1 directory=/images fstype=xfs
> pcs resource create my_VIP ocf:heartbeat:IPaddr2 ip=192.168.99.10
> cidr_netmask=22 op monitor interval=10s
> pcs resource create virtdom_compact ocf:heartbeat:VirtualDomain
> config=/etc/libvirt/qemu/compact.xml meta allow-migrate=true op monitor

^^^ Make sure that config file is available on all nodes. It's a good
idea to try starting the VM outside the cluster (e.g. with virsh) on
each node, before putting it under cluster control.

> interval="30"
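
A sketch of that check (domain name from the thread; paths assumed to match),
run on each node before the resource is created:

  # make sure the definition exists on this node
  ls -l /etc/libvirt/qemu/compact.xml
  # start a transient domain straight from the XML, as the resource agent does
  virsh create /etc/libvirt/qemu/compact.xml
  virsh list --all
  virsh destroy compact
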
> 
> Regards
> 
> 
> * Jaco van Niekerk*
> 
> * Solutions Architect*
> 
>   
>
> 
>  *T:* 087 135  | Ext: 2102
> 
>
> 
>  *E:* j...@bitco.co.za 



Re: [ClusterLabs] nodes ID assignment issue

2017-04-17 Thread Ken Gaillot
On 04/13/2017 10:40 AM, Radoslaw Garbacz wrote:
> Hi,
> 
> I have a question regarding building CIB nodes scope and specifically
> assignment to node IDs.
> It seems like the preexisting scope is not honored and nodes can get
> replaced based on check-in order.
> 
> I pre-create the nodes scope because it is faster, then setting
> parameters for all the nodes later (when the number of nodes is large).
> 
> From the listings below, one can see that node with ID=1 was replaced
> with another node (uname), however not the options. This situation
> causes problems when resource assignment is based on rules involving
> node options.
> 
> Is there a way to prevent this rearrangement of 'uname', if not whether
> there is a way to make the options follow 'uname', or maybe the problem
> is somewhere else - corosync configuration perhaps?
> Is the corosync 'nodeid' enforced to be also CIB node 'id'?

Hi,

Yes, for cluster nodes, pacemaker gets the node id from the messaging
layer (corosync). For remote nodes, id and uname are always identical.
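
For illustration (addresses and ids are made up), that id comes from the
nodelist in corosync.conf, so pinning it there keeps the CIB node ids stable:

  nodelist {
      node {
          ring0_addr: 10.0.0.1
          nodeid: 1
      }
      node {
          ring0_addr: 10.0.0.2
          nodeid: 2
      }
  }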

> 
> 
> Thanks in advance,
> 
> 
> Below is CIB committed before nodes check-in:
> 
> 
>   
> 
>   
>   
>   
> 
>   
>   
> 
>name="STATE"/>
>   
>name="Primary"/>
> 
>   
>   
> 
>name="STATE"/>
>   
>name="Primary"/>
> 
>   
>   
> 
>name="STATE"/>
>   
>name="Primary"/>
> 
>   
>   
> 
>   
>   
>   
> 
>   
> 
> 
> 
> 
> And automatic changes after nodes check-in:
> 
> 
>   
> 
>   
>   
>   
> 
>   
>   
> 
>name="STATE"/>
>   
>name="Primary"/>
> 
>   
>   
> 
>name="STATE"/>
>   
>name="Primary"/>
> 
>   
>   
> 
>name="STATE"/>
>   
>name="Primary"/>
> 
>   
>   
> 
>   
>   
>   
> 
>   
> 
> 
> 
> 
> -- 
> Best Regards,
> 
> Radoslaw Garbacz
> XtremeData Incorporated



Re: [ClusterLabs] Why shouldn't one store resource configuration in the CIB?

2017-04-17 Thread Ken Gaillot
On 04/13/2017 11:11 AM, Ferenc Wágner wrote:
> Hi,
> 
> I encountered several (old) statements on various forums along the lines
> of: "the CIB is not a transactional database and shouldn't be used as
> one" or "resource parameters should only uniquely identify a resource,
> not configure it" and "the CIB was not designed to be a configuration
> database but people still use it that way".  Sorry if I misquote these,
> I go by my memories now, I failed to dig up the links by a quick try.
> 
> Well, I've been feeling guilty in the above offenses for years, but it
> worked out pretty well that way which helped to suppress these warnings
> in the back of my head.  Still, I'm curious: what's the reason for these
> warnings, what are the dangers of "abusing" the CIB this way?
> /var/lib/pacemaker/cib/cib.xml is 336 kB with 6 nodes and 155 resources
> configured.  Old Pacemaker versions required tuning PCMK_ipc_buffer to
> handle this, but even the default is big enough nowadays (128 kB after
> compression, I guess).
> 
> Am I walking on thin ice?  What should I look out for?

That's a good question. Certainly, there is some configuration
information in most resource definitions, so it's more a matter of degree.

The main concerns I can think of are:

1. Size: Increasing the CIB size increases the I/O, CPU and networking
overhead of the cluster (and if it crosses the compression threshold,
significantly). It also marginally increases the time it takes the
policy engine to calculate a new state, which slows recovery.

2. Consistency: Clusters can become partitioned. If changes are made on
one or more partitions during the separation, the changes won't be
reflected on all nodes until the partition heals, at which time the
cluster will reconcile them, potentially losing one side's changes. I
suppose this isn't qualitatively different from using a separate
configuration file, but those tend to be more static, and failure to
modify all copies would be more obvious when doing them individually
rather than issuing a single cluster command.



Re: [ClusterLabs] How to force remove a cluster node?

2017-04-17 Thread Ken Gaillot
On 04/13/2017 01:11 PM, Scott Greenlese wrote:
> Hi,
> 
> I need to remove some nodes from my existing pacemaker cluster which are
> currently unbootable / unreachable.
> 
> Referenced
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-clusternodemanage-HAAR.html#s2-noderemove-HAAR
> 
> *4.4.4. Removing Cluster Nodes*
> The following command shuts down the specified node and removes it from
> the cluster configuration file, corosync.conf, on all of the other nodes
> in the cluster. For information on removing all information about the
> cluster from the cluster nodes entirely, thereby destroying the cluster
> permanently, refer to _Section 4.6, “Removing the Cluster
> Configuration”_
> .
> 
> pcs cluster node remove /node/
> 
> I ran the command with the cluster active on 3 of the 5 available
> cluster nodes (with quorum). The command fails with:
> 
> [root@zs90KP VD]# date;*pcs cluster node remove zs93kjpcs1*
> Thu Apr 13 13:40:59 EDT 2017
> *Error: pcsd is not running on zs93kjpcs1*
> 
> 
> The node was not removed:
> 
> [root@zs90KP VD]# pcs status |less
> Cluster name: test_cluster_2
> Last updated: Thu Apr 13 14:08:15 2017 Last change: Wed Apr 12 16:40:26
> 2017 by root via cibadmin on zs93KLpcs1
> Stack: corosync
> Current DC: zs90kppcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) -
> partition with quorum
> 45 nodes and 180 resources configured
> 
> Node zs95KLpcs1: UNCLEAN (offline)
> Online: [ zs90kppcs1 zs93KLpcs1 zs95kjpcs1 ]
> *OFFLINE: [ zs93kjpcs1 ]*
> 
> 
> Is there a way to force remove a node that's no longer bootable? If not,
> what's the procedure for removing a rogue cluster node?
> 
> Thank you...
> 
> Scott Greenlese ... KVM on System Z - Solutions Test, IBM Poughkeepsie, N.Y.
> INTERNET: swgre...@us.ibm.com

Yes, the pcs command is just a convenient shorthand for a series of
commands. You want to ensure pacemaker and corosync are stopped on the
node to be removed (in the general case, obviously already done in this
case), remove the node from corosync.conf and restart corosync on all
other nodes, then run "crm_node -R <nodename>" on any one active node.
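
A rough outline of those steps (node name from the thread; double-check the
syntax against your pcs/pacemaker versions):

  # on every remaining node: delete the node's entry from
  # /etc/corosync/corosync.conf, then reload corosync
  pcs cluster reload corosync
  # on any one active node, remove it from pacemaker's node list
  crm_node -R zs93kjpcs1 --force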




Re: [ClusterLabs] Antw: Re: 2-Node Cluster Pointless?

2017-04-18 Thread Ken Gaillot
On 04/18/2017 02:47 AM, Ulrich Windl wrote:
Digimer wrote on 16.04.2017 at 20:17 in message
> <12cde13f-8bad-a2f1-6834-960ff3afc...@alteeve.ca>:
>> On 16/04/17 01:53 PM, Eric Robinson wrote:
>>> I was reading in "Clusters from Scratch" where Beekhof states, "Some would
> 
>> argue that two-node clusters are always pointless, but that is an argument 
>> for another time." Is there a page or thread where this argument has been 
>> fleshed out? Most of my dozen clusters are 2 nodes. I hate to think they're
> 
>> pointless.  
>>>
>>> --
>>> Eric Robinson
>>
>> There is a belief that you can't build a reliable cluster without
>> quorum. I am of the mind that you *can* build a very reliable 2-node
>> cluster. In fact, every cluster our company has deployed, going back
>> over five years, has been 2-node and have had exceptional uptimes.
>>
>> The confusion comes from the belief that quorum is required and stonith
>> is optional. The reality is the opposite. I'll come back to this in a minute.
>>
>> In a two-node cluster, you have two concerns;
>>
>> 1. If communication between the nodes fail, but both nodes are alive,
>> how do you avoid a split brain?
> 
> By killing one of the two parties.
> 
>>
>> 2. If you have a two node cluster and enable cluster startup on boot,
>> how do you avoid a fence loop?
> 
> I think the problem in the question is using "you" instead of "it" ;-)
> Pacemaker assumes all problems that cause STONITH will be solved by STONITH.
> That's not always true (e.g. configuration errors). Maybe a node's failcount
> should not be reset if the node was fenced.
> So you'll avoid a fencing loop, but might end in a state where no resources
> are running. IMHO I'd prefer that over a fencing loop.
> 
>>
>> Many answer #1 by saying "you need a quorum node to break the tie". In
>> some cases, this works, but only when all nodes are behaving in a
>> predictable manner.
> 
> All software relies on the fact that it behaves in a predictable manner, BTW.
> The problem is not "the predictable manner for all nodes", but the predictable
> manner for the cluster.
> 
>>
>> Many answer #2 by saying "well, with three nodes, if a node boots and
>> can't talk to either other node, it is inquorate and won't do anything".
> 
> "wan't do anything" is also wrong: I must go offline without killing others,
> preferrably.
> 
>> This is a valid mechanism, but it is not the only one.
>>
>> So let me answer these from a 2-node perspective;
>>
>> 1. You use stonith and the faster node lives, the slower node dies. From
> 
> Isn't there a possibility that both nodes shoot each other? Is there a
> guarantee that there will always be one faster node?
> 
>> the moment of comms failure, the cluster blocks (needed with quorum,
>> too) and doesn't restore operation until the (slower) peer is in a known
>> state; Off. You can bias this by setting a fence delay against your
>> preferred node. So say node 1 is the node that normally hosts your
>> services, then you add 'delay="15"' to node 1's fence method. This tells
>> node 2 to wait 15 seconds before fencing node 1. If both nodes are
>> alive, node 2 will be fenced before the timer expires.
> 
> Can only the DC issue fencing?

No, any cluster node can initiate fencing. Fencing can also be requested
from a remote node (e.g. via stonith_admin), but the remote node will
ask a cluster node to initiate the fencing.

Also, fence device resources do not need to be "running" in order to be
used. If they are intentionally disabled (target-role=Stopped), they
will not be used, but if they are simply not running, the cluster will
still use the device when needed. "Running" is used solely to determine
whether recurring monitor actions are done.

This design ensures that fencing requires a bare minimum to be
functional (stonith daemon running, and fence devices configured), so it
can be used even at startup before resources are running, and even if
the DC is the node that needs to be fenced or a DC has not yet been elected.
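
For instance (node name is a placeholder), a fence request can be issued from
any node with stonith_admin, which will hand it to a cluster node to execute:

  stonith_admin --reboot node1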

>> 2. In Corosync v2+, there is a 'wait_for_all' option that tells a node
>> to not do anything until it is able to talk to the peer node. So in the
>> case of a fence after a comms break, the node that reboots will come up,
>> fail to reach the survivor node and do nothing more. Perfect.
> 
> Does "do nothing more" mean continuously polling for other nodes?
> 
>>
>> Now let me come back to quorum vs. stonith;
>>
>> Said simply; Quorum is a tool for when everything is working. Fencing is
>> a tool for when things go wrong.
> 
> I'd say: Quorum is the tool to decide who'll be alive and who's going to die,
> and STONITH is the tool to make nodes die. If everything is working you need
> neither quorum nor STONITH.
> 
>>
>> Lets assume that your cluster is working find, then for whatever reason,
>> node 1 hangs hard. At the time of the freeze, it was hosting a virtual
>> IP and an NFS service. Node 2 declares node 1 lost after a period of
>> time and decides it needs to take

Re: [ClusterLabs] lvm on shared storage and a lot of...

2017-04-18 Thread Ken Gaillot
On 04/18/2017 09:14 AM, lejeczek wrote:
> 
> 
> On 18/04/17 14:45, Digimer wrote:
>> On 18/04/17 07:31 AM, lejeczek wrote:
>>> .. device_block & device_unblock in dmesg.
>>>
>>> and I see that the LVM resource would fail.
>>> This to me seems to happen randomly, or I fail to spot a pattern.
>>>
>>> Shared storage is a sas3 enclosure.
>>> I believe I follow docs on LVM to the letter. I don't know what could be
>>> the problem.
>>>
>>> would you suggest ways to troubleshoot it? Is it faulty/failing hardware?
>>>
>>> many thanks,
>>> L.
>> LVM or clustered LVM?
>>
> no clvmd
> And inasmuch as the resource would start, fs would mount, if I start
> using it more intensely I'd get more of block/unblock and after a while
> mountpoint resource failes and then LVM resource too.
> It gets only worse after, even after I deleted resourced, I begin to
> see, eg.:
> 
> [ 6242.606870] sd 7:0:32:0: device_unblock and setting to running,
> handle(0x002c)
> [ 6334.248617] sd 7:0:18:0: [sdy] tag#0 FAILED Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [ 6334.248633] sd 7:0:18:0: [sdy] tag#0 Sense Key : Not Ready [current]
> [ 6334.248640] sd 7:0:18:0: [sdy] tag#0 Add. Sense: Logical unit is in
> process of becoming ready

This feels like a hardware issue to me. Have you checked the SMART data
on the drives?
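
As a sketch (device name taken from the log above; smartmontools assumed to be
installed), a quick health check would be:

  smartctl -H /dev/sdy
  smartctl -a /dev/sdy | less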

> [ 6334.248647] sd 7:0:18:0: [sdy] tag#0 CDB: Read(10) 28 00 00 00 00 00
> 00 00 08 00
> [ 6334.248652] blk_update_request: I/O error, dev sdy, sector 0



Re: [ClusterLabs] Why shouldn't one store resource configuration in the CIB?

2017-04-18 Thread Ken Gaillot
On 04/18/2017 11:46 AM, Ferenc Wágner wrote:
> Ken Gaillot  writes:
> 
>> On 04/13/2017 11:11 AM, Ferenc Wágner wrote:
>>
>>> I encountered several (old) statements on various forums along the lines
>>> of: "the CIB is not a transactional database and shouldn't be used as
>>> one" or "resource parameters should only uniquely identify a resource,
>>> not configure it" and "the CIB was not designed to be a configuration
>>> database but people still use it that way".  Sorry if I misquote these,
>>> I go by my memories now, I failed to dig up the links by a quick try.
>>>
>>> Well, I've been feeling guilty in the above offenses for years, but it
>>> worked out pretty well that way which helped to suppress these warnings
>>> in the back of my head.  Still, I'm curious: what's the reason for these
>>> warnings, what are the dangers of "abusing" the CIB this way?
>>> /var/lib/pacemaker/cib/cib.xml is 336 kB with 6 nodes and 155 resources
>>> configured.  Old Pacemaker versions required tuning PCMK_ipc_buffer to
>>> handle this, but even the default is big enough nowadays (128 kB after
>>> compression, I guess).
>>>
>>> Am I walking on thin ice?  What should I look out for?
>>
>> That's a good question. Certainly, there is some configuration
>> information in most resource definitions, so it's more a matter of degree.
>>
>> The main concerns I can think of are:
>>
>> 1. Size: Increasing the CIB size increases the I/O, CPU and networking
>> overhead of the cluster (and if it crosses the compression threshold,
>> significantly). It also marginally increases the time it takes the
>> policy engine to calculate a new state, which slows recovery.
> 
> Thanks for the input, Ken!  Is this what you mean?
> 
> cib: info: crm_compress_string: Compressed 1028972 bytes into 69095 (ratio 
> 14:1) in 138ms

yep

> At the same time /var/lib/pacemaker/cib/cib.xml is 336K, and
> 
> # cibadmin -Q --scope resources | wc -c
> 330951
> # cibadmin -Q --scope status | wc -c
> 732820
> 
> Even though I consume about 2 kB per resource, the status section
> weights 2.2 times the resources section.  Which means shrinking the
> resource size wouldn't change the full size significantly.

good point

> At the same time, we should probably monitor the trends of the cluster
> messaging health as we expand it (with nodes and resources).  What would
> be some useful indicators to graph?

I think the main concern would be CPU spikes when a new state needs to
be calculated (which is at least every cluster-recheck-interval).

Network traffic on the cluster communication link would be interesting,
especially at start-up when everything is happening at once, or after a
global clean-up of all resources.

I/O on whatever holds /var/lib/pacemaker will probably be small, but
wouldn't hurt to check.

>> 2. Consistency: Clusters can become partitioned. If changes are made on
>> one or more partitions during the separation, the changes won't be
>> reflected on all nodes until the partition heals, at which time the
>> cluster will reconcile them, potentially losing one side's changes.
> 
> Ah, that's a very good point, which I neglected totally: even inquorate
> partitions can have configuration changes.  Thanks for bringing this up!
> I wonder if there's any practical workaround for that.
> 
>> I suppose this isn't qualitatively different from using a separate
>> configuration file, but those tend to be more static, and failure to
>> modify all copies would be more obvious when doing them individually
>> rather than issuing a single cluster command.
> 
> From a different angle: if a node is off, you can't modify its
> configuration file.  So you need an independent mechanism to do what the
> CIB synchronization does anyway, or a shared file system with its added
> complexity.  On the other hand, one needn't guess how Pacemaker
> reconciles the conflicting resource configuration changes.  Indeed, how
> does it?

Good question, I haven't delved deeply into that code. It's not merging
diffs or anything like that -- some changes are blessed, and anything
incompatible is discarded.



Re: [ClusterLabs] Wtrlt: Antw: Re: Antw: Re: how important would you consider to have two independent fencing device for each node ?

2017-04-20 Thread Ken Gaillot
On 04/20/2017 01:43 AM, Ulrich Windl wrote:
> Should have gone to the list...
> 
Digimer wrote on 19.04.2017 at 17:20 in message
>> <600637f1-fef8-0a3d-821c-7aecfa398...@alteeve.ca>:
>>> On 19/04/17 02:38 AM, Ulrich Windl wrote:
Digimer wrote on 18.04.2017 at 19:08 in message
 <26e49390-b384-b46e-4965-eba5bfe59...@alteeve.ca>:
> On 18/04/17 11:07 AM, Lentes, Bernd wrote:
>> Hi,
>>
>> i'm currently establishing a two node cluster. Each node is an HP server
>> with an iLO card.
>> I can fence both of them, it's working fine.
>> But what if the iLO does not work correctly? Then fencing is not possible.
>
> Correct. If you only have iLO fencing, then the cluster would hang
> (failed fencing is *not* an indication of node death).
>
>> I also have a switched PDU from APC. Each server has two power supplies.
>> Currently one is connected to the normal power equipment, the other to the
>> UPS.
>> As a sort of redundancy, if the UPS does not work properly.
>
> That's a fine setup.
>
>> When i'd like to use the switched PDU as a fencing device i will lose the
>> redundancy of two independent power sources, because then i have to connect
>> both power supplies together to the UPS.
>> I wouldn't like to do that.
>
> Not if you have two switched PDUs. This is what we do in our Anvil!
> systems... One PDU feeds the first PSU in each node and the second PDU
> feeds the second PSUs. Ideally both PDUs are fed by UPSes, but that's
> not as important. One PDU on a UPS and one PDU directly from mains will
> work.
>
>> How important would you consider it to have two independent fencing devices
>> for each node? I can't buy another PDU, currently we are very poor.
>
> Depends entirely on your tolerance for interruption. *I* answer that
> with "extremely important". However, most clusters out there have only
> IPMI-based fencing, so they would obviously say "not so important".
>
>> Is there another way to create a second fencing device, independent from the
>> iLO card?
>>
>> Thanks.
>
> Sure, SBD would work. I've never seen IPMI not have a watchdog timer
> (and iLO is IPMI++), as one example. It's slow, and needs shared
> storage, but a small box somewhere running a small tgtd or iscsid
> should
> do the trick (note that I have never used SBD myself...).

>>>> Slow is relative: If it takes 3 seconds from issuing the reset command until
>>>> the node is dead, it's fast enough for most cases. Even a switched PDU has some
>>>> delays: The command has to be processed, the relay may "stick" a short moment,
>>>> the power supply's capacitors have to discharge (if you have two power supplies,
>>>> both need to)...  And iLOs don't really like to be powered off.
>>>>
>>>> Ulrich
>>>
>>> The way I understand SBD, and correct me if I am wrong, recovery won't
>>> begin until sometime after the watchdog timer kicks. If the watchdog
>>> timer is 60 seconds, then your cluster will hang for >60 seconds (plus
>>> fence delays, etc).
>>
>> I think it works differently: One task periodically reads its mailbox slot
>> for commands, and once a command was read, it's executed immediately. Only if
>> the read task does hang for a long time, the watchdog itself triggers a reset
>> (as SBD seems dead). So the delay is actually made from the sum of "write
>> delay", "read delay", "command execution".

I think you're right when sbd uses shared-storage, but there is a
watchdog-only configuration that I believe digimer was referring to.

With watchdog-only, the cluster will wait for the value of the
stonith-watchdog-timeout property before considering the fencing successful.

>> The manual page (SLES 11 SP4) states: "Set watchdog timeout to N seconds.
>> This depends mostly on your storage latency; the majority of devices must be
>> successfully read within this time, or else the node will self-fence." and
>> "If a watchdog is used together with the "sbd" as is strongly recommended,
>> the watchdog is activated at initial start of the sbd daemon. The watchdog is
>> refreshed every time the majority of SBD devices has been successfully read.
>> Using a watchdog provides additional protection against "sbd" crashing."
>>
>> Final remark: I think the developers of sbd were under drugs (or never saw a
>> UNIX program before) when designing the options. For example: "-W  Enable or
>> disable use of the system watchdog to protect against the sbd processes
>> failing and the node being left in an undefined state. Specify this once to
>> enable, twice to disable." (MHO)
>>
>> Regards,
>> Ulrich
>>
>>>
>>> IPMI and PDUs can confirm fencing of the peer in ~5 seconds (plus fence delays).
>>>
>>> -- 
>>> Digimer
>>> Papers and Projects: https://alteeve.com/w/ 
>>> "I am,

Re: [ClusterLabs] Colocation of a primitive resource with a clone with limited copies

2017-04-20 Thread Ken Gaillot
On 04/20/2017 10:52 AM, Jan Wrona wrote:
> Hello,
> 
> my problem is closely related to the thread [1], but I didn't find a
> solution there. I have a resource that is set up as a clone C restricted
> to two copies (using the clone-max=2 meta attribute||), because the
> resource takes long time to get ready (it starts immediately though),

A resource agent must not return from "start" until a "monitor"
operation would return success.

Beyond that, the cluster doesn't care what "ready" means, so it's OK if
it's not fully operational by some measure. However, that raises the
question of what you're accomplishing with your monitor.

> and by having it ready as a clone, I can failover in the time it takes
> to move an IP resource. I have a colocation constraint "resource IP with
> clone C", which will make sure IP runs with a working instance of C:
> 
> Configuration:
>  Clone: dummy-clone
>   Meta Attrs: clone-max=2 interleave=true
>   Resource: dummy (class=ocf provider=heartbeat type=Dummy)
>Operations: start interval=0s timeout=20 (dummy-start-interval-0s)
>stop interval=0s timeout=20 (dummy-stop-interval-0s)
>monitor interval=10 timeout=20 (dummy-monitor-interval-10)
>  Resource: ip (class=ocf provider=heartbeat type=Dummy)
>   Operations: start interval=0s timeout=20 (ip-start-interval-0s)
>   stop interval=0s timeout=20 (ip-stop-interval-0s)
>   monitor interval=10 timeout=20 (ip-monitor-interval-10)
> 
> Colocation Constraints:
>   ip with dummy-clone (score:INFINITY)
> 
> State:
>  Clone Set: dummy-clone [dummy]
>  Started: [ sub1.example.org sub3.example.org ]
>  ip (ocf::heartbeat:Dummy): Started sub1.example.org
> 
> 
> This is fine until the the active node (sub1.example.org) fails. Instead
> of moving the IP to the passive node (sub3.example.org) with ready clone
> instance, Pacemaker will move it to the node where it just started a
> fresh instance of the clone (sub2.example.org in my case):
> 
> New state:
>  Clone Set: dummy-clone [dummy]
>  Started: [ sub2.example.org sub3.example.org ]
>  ip (ocf::heartbeat:Dummy): Started sub2.example.org
> 
> 
> Documentation states that the cluster will choose a copy based on where
> the clone is running and the resource's own location preferences, so I
> don't understand why this is happening. Is there a way to tell Pacemaker
> to move the IP to the node where the resource is already running?
> 
> Thanks!
> Jan Wrona
> 
> [1] http://lists.clusterlabs.org/pipermail/users/2016-November/004540.html

The cluster places ip based on where the clone will be running at that
point in the recovery, rather than where it was running before recovery.

Unfortunately I can't think of a way to do exactly what you want,
hopefully someone else has an idea.

One possibility would be to use on-fail=standby on the clone monitor.
That way, instead of recovering the clone when it fails, all resources
on the node would move elsewhere. You'd then have to manually take the
node out of standby for it to be usable again.
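
For illustration, with the Dummy resource from the configuration above, that
could look something like this (pcs syntax; adjust to your own tooling):

   pcs resource update dummy op monitor interval=10 timeout=20 on-fail=standby
   # later, after fixing the underlying problem:
   pcs cluster unstandby <node>    # "pcs node unstandby" on newer pcs versions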

It might be possible to do something more if you convert the clone to a
master/slave resource, and colocate ip with the master role. For
example, you could set the master score based on how long the service
has been running, so the longest-running instance is always master.
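
A very rough sketch of that last idea, inside a (hypothetical) custom
master/slave agent's monitor action -- how the start time is measured here is
purely an assumption:

   # promotion preference grows with how long this instance has been running,
   # so the longest-running copy wins the master role
   started_at=$(stat -c %Y "$PIDFILE")        # $PIDFILE is agent-specific
   score=$(( $(date +%s) - started_at ))
   crm_master -l reboot -v "$score"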

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] starting primitive resources of a group without starting the complete group - unclear behaviour

2017-04-20 Thread Ken Gaillot
On 04/20/2017 02:53 PM, Lentes, Bernd wrote:
> Hi,
> 
> just for the sake of completeness i'd like to figure out what happens if i 
> start one resource, which is a member of a group, but only this resource.
> I'd like to see what the other resources of that group are doing. Also if it 
> maybe does not make much sense. Just for learning and understanding.
> 
> But i'm getting mad about my test results:
> 
> first test:
> 
> crm(live)# status
> Last updated: Thu Apr 20 20:56:08 2017
> Last change: Thu Apr 20 20:46:35 2017 by root via cibadmin on ha-idg-2
> Stack: classic openais (with plugin)
> Current DC: ha-idg-2 - partition with quorum
> Version: 1.1.12-f47ea56
> 2 Nodes configured, 2 expected votes
> 14 Resources configured
> 
> 
> Online: [ ha-idg-1 ha-idg-2 ]
> 
>  Clone Set: clone_group_prim_dlm_clvmd_vg_cluster_01_ocfs2_fs_lv_xml 
> [group_prim_dlm_clvmd_vg_cluster_01_ocfs2_fs_lv_xml]
>  Started: [ ha-idg-1 ha-idg-2 ]
>  prim_stonith_ipmi_ha-idg-1 (stonith:external/ipmi):Started 
> ha-idg-2
>  prim_stonith_ipmi_ha-idg-2 (stonith:external/ipmi):Started 
> ha-idg-1
> 
> crm(live)# resource start prim_vnc_ip_mausdb
> 
> crm(live)# status
> Last updated: Thu Apr 20 20:56:44 2017
> Last change: Thu Apr 20 20:56:44 2017 by root via crm_resource on ha-idg-1
> Stack: classic openais (with plugin)
> Current DC: ha-idg-2 - partition with quorum
> Version: 1.1.12-f47ea56
> 2 Nodes configured, 2 expected votes
> 14 Resources configured
> 
> 
> Online: [ ha-idg-1 ha-idg-2 ]
> 
>  Clone Set: clone_group_prim_dlm_clvmd_vg_cluster_01_ocfs2_fs_lv_xml 
> [group_prim_dlm_clvmd_vg_cluster_01_ocfs2_fs_lv_xml]
>  Started: [ ha-idg-1 ha-idg-2 ]
>  prim_stonith_ipmi_ha-idg-1 (stonith:external/ipmi):Started 
> ha-idg-2
>  prim_stonith_ipmi_ha-idg-2 (stonith:external/ipmi):Started 
> ha-idg-1
>  Resource Group: group_vnc_mausdb
>  prim_vnc_ip_mausdb (ocf::heartbeat:IPaddr):Started ha-idg-1   
> <===
>  prim_vm_mausdb (ocf::heartbeat:VirtualDomain): Started ha-idg-1   
> <===
> 
> 
> 
> second test:
> 
> crm(live)# status
> Last updated: Thu Apr 20 21:24:19 2017
> Last change: Thu Apr 20 21:20:04 2017 by root via cibadmin on ha-idg-2
> Stack: classic openais (with plugin)
> Current DC: ha-idg-2 - partition with quorum
> Version: 1.1.12-f47ea56
> 2 Nodes configured, 2 expected votes
> 14 Resources configured
> 
> 
> Online: [ ha-idg-1 ha-idg-2 ]
> 
>  Clone Set: clone_group_prim_dlm_clvmd_vg_cluster_01_ocfs2_fs_lv_xml 
> [group_prim_dlm_clvmd_vg_cluster_01_ocfs2_fs_lv_xml]
>  Started: [ ha-idg-1 ha-idg-2 ]
>  prim_stonith_ipmi_ha-idg-1 (stonith:external/ipmi):Started 
> ha-idg-2
>  prim_stonith_ipmi_ha-idg-2 (stonith:external/ipmi):Started 
> ha-idg-1
> 
> 
> crm(live)# resource start prim_vnc_ip_mausdb
> 
> 
> crm(live)# status
> Last updated: Thu Apr 20 21:26:05 2017
> Last change: Thu Apr 20 21:25:55 2017 by root via cibadmin on ha-idg-2
> Stack: classic openais (with plugin)
> Current DC: ha-idg-2 - partition with quorum
> Version: 1.1.12-f47ea56
> 2 Nodes configured, 2 expected votes
> 14 Resources configured
> 
> 
> Online: [ ha-idg-1 ha-idg-2 ]
> 
>  Clone Set: clone_group_prim_dlm_clvmd_vg_cluster_01_ocfs2_fs_lv_xml 
> [group_prim_dlm_clvmd_vg_cluster_01_ocfs2_fs_lv_xml]
>  Started: [ ha-idg-1 ha-idg-2 ]
>  prim_stonith_ipmi_ha-idg-1 (stonith:external/ipmi):Started 
> ha-idg-2
>  prim_stonith_ipmi_ha-idg-2 (stonith:external/ipmi):Started 
> ha-idg-1
>  Resource Group: group_vnc_mausdb
>  prim_vnc_ip_mausdb (ocf::heartbeat:IPaddr):Started ha-idg-1   
> <===
>  prim_vm_mausdb (ocf::heartbeat:VirtualDomain): (target-role:Stopped) 
> Stopped   <===

target-role=Stopped prevents a resource from being started.

In a group, each member of the group depends on the previously listed
members, same as if ordering and colocation constraints had been created
between each pair. So, starting a resource in the "middle" of a group
will also start everything before it.
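
In other words, a group like the one quoted later in this message behaves as
if the equivalent constraints had been written out by hand (crm shell
notation; the constraint IDs are made up):

   group group_vnc_mausdb prim_vnc_ip_mausdb prim_vm_mausdb

   # is roughly equivalent to:
   order o_ip_before_vm inf: prim_vnc_ip_mausdb prim_vm_mausdb
   colocation c_vm_with_ip inf: prim_vm_mausdb prim_vnc_ip_mausdb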

> 
> Once the second resource of the group is started with the first resource, the 
> other time not !?!
> Why this unclear behaviour ?
> 
> This is my configuration:
> 
> primitive prim_vm_mausdb VirtualDomain \
> params config="/var/lib/libvirt/images/xml/mausdb_vm.xml" \
> params hypervisor="qemu:///system" \
> params migration_transport=ssh \
> op start interval=0 timeout=120 \
> op stop interval=0 timeout=130 \
> op monitor interval=30 timeout=30 \
> op migrate_from interval=0 timeout=180 \
> op migrate_to interval=0 timeout=190 \
> meta allow-migrate=true is-managed=true \
> utilization cpu=4 hv_memory=8006
> 
> 
> primitive prim_vnc_ip_mausdb IPaddr \
> params ip=146.107.235.161 nic=br0 cidr_netmask=24 \
> meta target-role=Started
> 
> 
> group group_vnc_mausdb prim_vnc_ip_mausdb prim_vm_mausdb \
>  

Re: [ClusterLabs] starting primitive resources of a group without starting the complete group - unclear behaviour

2017-04-21 Thread Ken Gaillot
On 04/21/2017 04:38 AM, Lentes, Bernd wrote:
> 
> 
> - On Apr 21, 2017, at 1:24 AM, Ken Gaillot kgail...@redhat.com wrote:
> 
>> On 04/20/2017 02:53 PM, Lentes, Bernd wrote:
> 
>>
>> target-role=Stopped prevents a resource from being started.
>>
>> In a group, each member of the group depends on the previously listed
>> members, same as if ordering and colocation constraints had been created
>> between each pair. So, starting a resource in the "middle" of a group
>> will also start everything before it.
> 
> What is the other way round ? Starting the first of the group ? Will the 
> subsequent follow ?

Groups are generally intended to start and stop as a whole, so I would
expect starting any member explicitly to lead the cluster to want to
start the entire group, but I could be wrong, because only prior members
are required to be started first.

>> Everything in the group inherits this target-role=Stopped. However,
>> prim_vnc_ip_mausdb has its own target-role=Started, which overrides that.
>>
>> I'm not sure what target-role was on each resource at each step in your
>> tests, but the behavior should match that.
>>
> 
> I have to admit that i'm struggling with the meaning of "target-role".
> What does it really mean ? The current status of the resource ? The status of 
> the resource the cluster should try
> to achieve ? Both ? Nothing of this ? Could you clarify that to me ?

"try to achieve"

The cluster doesn't have any direct concept of intentionally starting or
stopping a resource, only of a desired cluster state, and it figures out
the actions needed to get there. The higher-level tools provide the
start/stop concept by setting target-role.

It's the same in practice, but the cluster "thinks" in terms of being
told what the desired end result is, not what specific actions to perform.
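
That is all the higher-level "start"/"stop" commands do under the hood -- for
example, with the crm shell used in this thread:

   crm resource stop prim_vnc_ip_mausdb     # sets meta target-role=Stopped
   crm resource start prim_vnc_ip_mausdb    # sets meta target-role=Started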

> 
> Thanks.
> 
> 
> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] starting primitive resources of a group without starting the complete group - unclear behaviour

2017-04-21 Thread Ken Gaillot
On 04/21/2017 07:52 AM, Lentes, Bernd wrote:
> 
> 
> - On Apr 21, 2017, at 11:38 AM, Bernd Lentes 
> bernd.len...@helmholtz-muenchen.de wrote:
> 
>> - On Apr 21, 2017, at 1:24 AM, Ken Gaillot kgail...@redhat.com wrote:
>>
>>> On 04/20/2017 02:53 PM, Lentes, Bernd wrote:
>>
>>>
>>> target-role=Stopped prevents a resource from being started.
>>>
>>> In a group, each member of the group depends on the previously listed
>>> members, same as if ordering and colocation constraints had been created
>>> between each pair. So, starting a resource in the "middle" of a group
>>> will also start everything before it.
> 
> Not in each case. 
> 
> 
> I tested a bit:
> target-role of the group: stopped. (This is inherited by the primitives of 
> the group if not declared otherwise.
> If declared for the primitive otherwise this supersedes the target-role of 
> the group.)
> 
> Starting first primitive of the group. Second primitive does not start 
> because target-role is stopped (inherited by the group).
> 
> 
> Next test:
> 
> target-role of the group still "stopped". target-roles of the primitives not 
> decleared otherwise.
> Starting second primitive. First primitive does not start because target-role 
> is stopped, inherited by the group.
> Second primitive does not start because first primitive does not start, 
> although target-role for the second primitive is started.
> Because second primitive needs first one.
> 
> Is my understanding correct ?

Yes

> 
> 
> 
> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Colocation of a primitive resource with a clone with limited copies

2017-04-21 Thread Ken Gaillot
On 04/21/2017 07:14 AM, Vladislav Bogdanov wrote:
> 20.04.2017 23:16, Jan Wrona wrote:
>> On 20.4.2017 19:33, Ken Gaillot wrote:
>>> On 04/20/2017 10:52 AM, Jan Wrona wrote:
>>>> Hello,
>>>>
>>>> my problem is closely related to the thread [1], but I didn't find a
>>>> solution there. I have a resource that is set up as a clone C
>>>> restricted
>>>> to two copies (using the clone-max=2 meta attribute||), because the
>>>> resource takes long time to get ready (it starts immediately though),
>>> A resource agent must not return from "start" until a "monitor"
>>> operation would return success.
>>>
>>> Beyond that, the cluster doesn't care what "ready" means, so it's OK if
>>> it's not fully operational by some measure. However, that raises the
>>> question of what you're accomplishing with your monitor.
>> I know all that and my RA respects that. I didn't want to go into
>> details about the service I'm running, but maybe it will help you
>> understand. Its a data collector which receives and processes data from
>> a UDP stream. To understand these data, it needs templates which
>> periodically occur in the stream (every five minutes or so). After
>> "start" the service is up and running, "monitor" operations are
>> successful, but until the templates arrive the service is not "ready". I
>> basically need to somehow simulate this "ready" state.
> 
> If you are able to detect that your application is ready (it already
> received its templates) in your RA's monitor, you may want to use
> transient node attributes to indicate that to the cluster. And tie your
> vip with such an attribute (with location constraint with rules).

That would be a good approach.

I'd combine it with stickiness so the application doesn't immediately
move when a "not ready" node becomes "ready".

I'd also keep the colocation constraint with the application. That helps
if a "ready" node crashes, because nothing is going to change the
attribute in that case, until the application is started there again.
The colocation constraint guarantees that the attribute is current.
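
A rough sketch of that combination (the attribute name "collector_ready" is
made up, and the pcs rule syntax may need adjusting for your version):

   # in the agent's monitor action, once the templates have been seen:
   attrd_updater -n collector_ready -U 1
   # and cleared again in the stop action:
   attrd_updater -n collector_ready -D

   # keep ip off nodes where the application is not ready yet:
   pcs constraint location ip rule score=-INFINITY \
       not_defined collector_ready or collector_ready ne 1
   # plus some stickiness so ip does not move as soon as another node is ready:
   pcs resource meta ip resource-stickiness=100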

> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_using_rules_to_determine_resource_location.html#_location_rules_based_on_other_node_properties
> 
> 
> Look at pacemaker/ping RA for attr management example.
> 
> [...]

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [ClusterLabs Developers] checking all procs on system enough during stop action?

2017-04-24 Thread Ken Gaillot
On 04/24/2017 10:32 AM, Jehan-Guillaume de Rorthais wrote:
> On Mon, 24 Apr 2017 17:08:15 +0200
> Lars Ellenberg  wrote:
> 
>> On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais wrote:
>>> Hi all,
>>>
>>> In the PostgreSQL Automatic Failover (PAF) project, one of most frequent
>>> negative feedback we got is how difficult it is to experience with it
>>> because of fencing occurring way too frequently. I am currently hunting
>>> this kind of useless fencing to make life easier.
>>>
>>> It occurs to me, a frequent reason of fencing is because during the stop
>>> action, we check the status of the PostgreSQL instance using our monitor
>>> function before trying to stop the resource. If the function does not return
>>> OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we just raise an error,
>>> leading to a fencing. See:
>>> https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301
>>>
>>> I am considering adding a check to define if the instance is stopped even
>>> if the monitor action returns an error. The idea would be to parse **all**
>>> the local processes looking for at least one pair of
>>> "/proc//{comm,cwd}" related to the PostgreSQL instance we want to
>>> stop. If none are found, we consider the instance is not running.
>>> Gracefully or not, we just know it is down and we can return OCF_SUCCESS.
>>>
>>> Just for completeness, the piece of code would be:
>>>
>>>my @pids;
>>>foreach my $f (glob "/proc/[0-9]*") {
>>>push @pids => basename($f)
>>>if -r $f
>>>and basename( readlink( "$f/exe" ) ) eq "postgres"
>>>and readlink( "$f/cwd" ) eq $pgdata;
>>>}
>>>
>>> I feels safe enough to me. The only risk I could think of is in a shared
>>> disk cluster with multiple nodes accessing the same data in RW (such setup
>>> can fail in so many ways :)). However, PAF is not supposed to work in such
>>> context, so I can live with this.
>>>
>>> Do you guys have some advices? Do you see some drawbacks? Hazards?  
>>
>> Isn't that the wrong place to "fix" it?
>> Why did your _monitor  return something "weird"?
> 
> Because this _monitor is the one called by the monitor action. It is able to
> define if an instance is running and if it feels good.
> 
> Take the scenario where the slave instance is crashed:
>   1/ the monitor action raise an OCF_ERR_GENERIC
>   2/ Pacemaker tries a recover of the resource (stop->start)
>   3/ the stop action fails because _monitor says the resource is crashed
>   4/ Pacemaker fence the node.
> 
>> What did it return?
> 
> Either OCF_ERR_GENERIC or OCF_FAILED_MASTER as instance.
> 
>> Should you not fix it there?
> 
> fixing this in the monitor action? This would bloat the code of this function.
> We would have to add a special code path in there to define if it is called
> as a real monitor action or just as a status one for other actions.
> 
> But anyway, here or there, I would have to add this piece of code looking at
> each processes. According to you, is it safe enough? Do you see some hazard
> with it?
> 
>> Just thinking out loud.
> 
> Thank you, it helps :)

It feels odd that there is a situation where monitor should return an
error (instead of "not running"), but stop should return OK.

I think the question is whether the service can be considered cleanly
stopped at that point -- i.e. whether it's safe for another node to
become master, and safe to try starting the crashed service again on the
same node.

If it's cleanly stopped, the monitor should probably return "not
running". Pacemaker will already compare that result against the
expected state, and recover appropriately if needed.

The PID check assumes there can only be one instance of postgresql on
the machine. If there are instances bound to different IPs, or some user
starts a private instance, it could be inaccurate. But that would err on
the side of fencing, so it might still be useful, if you don't have a
way of more narrowly identifying the expected instance.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group

2017-04-24 Thread Ken Gaillot
On 04/24/2017 01:52 PM, Lentes, Bernd wrote:
> 
> 
> - On Apr 24, 2017, at 8:26 PM, Bernd Lentes 
> bernd.len...@helmholtz-muenchen.de wrote:
> 
>> Hi,
>>
>> i have a primitive VirtualDomain resource which i can live migrate without 
>> any
>> problem.
>> Additionally i have an IP as a resource which i can live mirgate easily too.
>> If i combine them in a group, i can't live migrate the VirtualDomain anymore.
>>
>> It is shuted down on one node and rebooted on the other. :-(
>>
>> This is my config:
>>
>> primitive prim_vm_mausdb VirtualDomain \
>>params config="/var/lib/libvirt/images/xml/mausdb_vm.xml" \
>>params hypervisor="qemu:///system" \
>>params migration_transport=ssh \
>>params autoset_utilization_cpu=false \
>>params autoset_utilization_hv_memory=false \
>>op start interval=0 timeout=120 \
>>op stop interval=0 timeout=130 \
>>op monitor interval=30 timeout=30 \
>>op migrate_from interval=0 timeout=180 \
>>op migrate_to interval=0 timeout=190 \
>>meta allow-migrate=true is-managed=true \
>>utilization cpu=4 hv_memory=8005
>>
>>
>> primitive prim_vnc_ip_mausdb IPaddr \
>>params ip=146.107.235.161 nic=br0 cidr_netmask=24 \
>>meta is-managed=true

I don't see allow-migrate on the IP. Is this a modified IPaddr? The
stock resource agent doesn't support migrate_from/migrate_to.

>>
>> group group_vnc_vm_mausdb prim_vnc_ip_mausdb prim_vm_mausdb \
>>meta target-role=Started
>>
>>
>> Why can't i live migrate the VirtualDomain primitive being part of a group ?
>>
>> Thanks.
>>
>>
>> Bernd
>>
>>
> 
> What i found in the net:
> http://lists.clusterlabs.org/pipermail/pacemaker/2011-November/012088.html
> 
> " Yes, migration only works without order-contraints the migrating service
> depends on ... and no way to force it."

I believe this was true in pacemaker 1.1.11 and earlier.

> 
> It's not possible ?
> 
> 
> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Coming in Pacemaker 1.1.17: start a node in standby

2017-04-24 Thread Ken Gaillot
Hi all,

Pacemaker 1.1.17 will have a feature that people have occasionally asked
for in the past: the ability to start a node in standby mode.

It will be controlled by an environment variable (set in
/etc/sysconfig/pacemaker, /etc/default/pacemaker, or wherever your
distro puts them):


# By default, nodes will join the cluster in an online state when they first
# start, unless they were previously put into standby mode. If this
variable is
# set to "standby" or "online", it will force this node to join in the
# specified state when starting.
# (experimental; currently ignored for Pacemaker Remote nodes)
# PCMK_node_start_state=default


As described, it will be considered experimental in this release, mainly
because it doesn't work with Pacemaker Remote nodes yet. However, I
don't expect any problems using it with cluster nodes.

Example use cases:

You want fenced nodes to automatically start the cluster after a
reboot, so they contribute to quorum, but not run any resources, so the
problem can be investigated. You would leave
PCMK_node_start_state=standby permanently.

You want to ensure a newly added node joins the cluster without problems
before allowing it to run resources. You would set this to "standby"
when deploying the node, and remove the setting once you're satisfied
with the node, so it can run resources at future reboots.

You want a standby setting to last only until the next boot. You would
set this permanently to "online", and any manual setting of standby mode
would be overwritten at the next boot.

Many thanks to developers Alexandra Zhuravleva and Sergey Mishin, who
contributed this feature as part of a project with EMC.
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group

2017-04-24 Thread Ken Gaillot
On 04/24/2017 02:33 PM, Lentes, Bernd wrote:
> 
> - On Apr 24, 2017, at 9:11 PM, Ken Gaillot kgail...@redhat.com wrote:
> 
>>>> primitive prim_vnc_ip_mausdb IPaddr \
>>>>params ip=146.107.235.161 nic=br0 cidr_netmask=24 \
>>>>meta is-managed=true
>>
>> I don't see allow-migrate on the IP. Is this a modified IPaddr? The
>> stock resource agent doesn't support migrate_from/migrate_to.
> 
> Not modified. I can migrate the resource without the group easily between the 
> nodes. And also if i try to live-migrate the whole group,
> the IP is migrated.

Unfortunately, migration is not live migration ... a resource (the VM)
can't be live-migrated if it depends on another resource (the IP) that
isn't live-migrateable.

If you modify IPaddr to be live-migrateable, it should work. It has to
support migrate_from and migrate_to actions, and advertise them in the
meta-data. It doesn't necessarily have to do anything different from
stop/start, as long as that meets your needs.
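
A rough sketch of what that edit amounts to (this is not the stock agent;
ip_stop/ip_start below are placeholders for whatever the copied agent calls
internally):

   # 1) advertise the new actions in the copied agent's meta-data:
   #      <action name="migrate_to"   timeout="20s" />
   #      <action name="migrate_from" timeout="20s" />
   # 2) map them to plain stop/start, since an IP has no state to hand over:
   case $__OCF_ACTION in
       migrate_to)   ip_stop  ;;
       migrate_from) ip_start ;;
   esac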

>>> What i found in the net:
>>> http://lists.clusterlabs.org/pipermail/pacemaker/2011-November/012088.html
>>>
>>> " Yes, migration only works without order-contraints the migrating service
>>> depends on ... and no way to force it."
>>
>> I believe this was true in pacemaker 1.1.11 and earlier.
>>
> 
> Then it should be possible:
> 
> ha-idg-2:~ # rpm -q pacemaker
> pacemaker-1.1.12-11.12
> 
> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group

2017-04-25 Thread Ken Gaillot
On 04/25/2017 09:14 AM, Lentes, Bernd wrote:
> 
> 
> - On Apr 24, 2017, at 11:11 PM, Ken Gaillot kgail...@redhat.com wrote:
> 
>> On 04/24/2017 02:33 PM, Lentes, Bernd wrote:
>>>
>>> ----- On Apr 24, 2017, at 9:11 PM, Ken Gaillot kgail...@redhat.com wrote:
>>>
>>>>>> primitive prim_vnc_ip_mausdb IPaddr \
>>>>>>params ip=146.107.235.161 nic=br0 cidr_netmask=24 \
>>>>>>meta is-managed=true
>>>>
>>>> I don't see allow-migrate on the IP. Is this a modified IPaddr? The
>>>> stock resource agent doesn't support migrate_from/migrate_to.
>>>
>>> Not modified. I can migrate the resource without the group easily between 
>>> the
>>> nodes. And also if i try to live-migrate the whole group,
>>> the IP is migrated.
>>
>> Unfortunately, migration is not live migration ... a resource (the VM)
>> can't be live-migrated if it depends on another resource (the IP) that
>> isn't live-migrateable.
>>
>> If you modify IPaddr to be live-migrateable, it should work. It has to
>> support migrate_from and migrate_to actions, and advertise them in the
>> meta-data. It doesn't necessarily have to do anything different from
>> stop/start, as long as that meets your needs.
>>
> 
> Hi Ken,
> 
> that means i have to edit the resource agent ?

Yes, copy it to a new name, and edit that. Best practice is to create
your own subdirectory under /usr/lib/ocf/resource.d and put it there, so
you use it as ocf:<your_provider>:<agent_name>.
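
Something along these lines (the provider name "local" is arbitrary):

   mkdir -p /usr/lib/ocf/resource.d/local
   cp /usr/lib/ocf/resource.d/heartbeat/IPaddr /usr/lib/ocf/resource.d/local/IPaddr
   # edit the copy, then reference it as ocf:local:IPaddr in the resource definition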

> Bernd
>  
> 
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
> 


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Problem with clone ClusterIP

2017-04-25 Thread Ken Gaillot
On 04/25/2017 09:32 AM, Bratislav Petkovic wrote:
> I want to make active/active cluster with two physical servers.
> 
> On the servers are installed: oraclelinux-release-7.2-1.0.5.el7.x86_64,
> 
> Pacemaker 1.1.13-10.el7, Corosync Cluster Engine, version '2.3.4',
> 
> pcs 0.9.143. Cluster starts without a problem and I create a resource
> 
> ClusterIP that is in the same subnet as the IP addresses of the servers.
> 
> After creating I access ClusterIP without problems, but I clone ClusterIP
> 
> I can no longer access the this IP.
> 
> I did everything according to instructions from clusterlab.
> 
> Each server has two network cards that are on the teaming with LACP.
> 
>  
> 
> Best regards,
> 
>  
> 
> Bratislav Petkovic

IPaddr2 cloning depends on a special feature called multicast MAC. On
the host side, this is done via iptables' clusterip capability. However
not all Ethernet switches support multicast MAC (or the administrator
disabled it), so that is a possible cause.
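
For reference, a cloned IP along the lines of the Clusters from Scratch
document looks roughly like this (addresses are placeholders); it is the
globally-unique clone that switches IPaddr2 into its iptables CLUSTERIP mode:

   pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
       ip=192.168.122.120 cidr_netmask=24 clusterip_hash=sourceip \
       op monitor interval=30s
   pcs resource clone ClusterIP clone-max=2 clone-node-max=2 globally-unique=true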


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Problem with clone ClusterIP

2017-04-26 Thread Ken Gaillot
On 04/26/2017 02:45 AM, Bratislav Petkovic wrote:
> Thank you,
> 
> We use Cisco Nexus 7000 switches, and they support multicast MAC.
> It is possible that something is not configured correctly.
> In this environment IBM PowerHA SystemMirror 7.1 (which uses multicast)
> works without problems.
> 
> Regards,
> 
> Bratislav

I believe SystemMirror uses multicast IP, which is at a higher level
than multicast Ethernet. Multicast Ethernet is much less commonly seen,
so it's often disabled.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] IPaddr2 cloning inside containers

2017-04-26 Thread Ken Gaillot
FYI, I stumbled across a report of a suspected kernel issue breaking
iptables clusterip inside containers:

https://github.com/lxc/lxd/issues/2773

ocf:heartbeat:IPaddr2 uses clusterip when cloned. I'm guessing no one's
tried something like that yet, but this is a note of caution to anyone
thinking about it.

Pacemaker's new bundle feature doesn't support cloning the IPs it
creates, but that might be an interesting future feature if this issue
is resolved.
-- 
Ken Gaillot 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] in standby but still running resources..

2017-04-27 Thread Ken Gaillot
On 04/27/2017 08:29 AM, lejeczek wrote:
> .. is this ok?
> 
> hi guys,
> 
> pcs shows no errors after I did standby node, but pcs shows resources
> still are being ran on the node I just stoodby.
> Is this normal?
> 
> 0.9.152 @C7.3
> thanks
> P.

That should happen only for as long as it takes to stop the resources
there. If it's an ongoing condition, something is wrong.

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] resource group vs colocation

2017-04-27 Thread Ken Gaillot
On 04/27/2017 02:02 PM, lejeczek wrote:
> hi everyone
> 
> I have a group and I'm trying to colocate - sounds strange - order with
> the group is not how I want it.
> I was hoping that with colocation set I can reorder the resources - can
> I? Because .. something, or my is not getting there.
> I have within a group:
> 
> IP
> mount
> smb
> IP1
> 
> and I colocated sets:
> 
> set IP IP1 sequential=false set mount smb
> 
> and yet smb would not start on IP1. I see resource are still being order
> as they list.
> 
> Could somebody shed more light on what is wrong and group vs colocation
> subject?
> 
> m. thanks
> L.

A group is a shorthand for colocation and order constraints between its
members. So, you should use either a group, or a colocation set, but not
both with the same members.

If you simply want to reorder the sequence in which the group members
start, just recreate the group, listing them in the order you want. That
is, the first member of the group will be started first, then the second
member, etc.

If you prefer using sets, then don't group the resources -- use separate
colocation and ordering constraints with the sets, as desired.
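
For example, with the resource names from the original post (pcs syntax, as a
sketch; the group name is made up):

   # option 1: just a group, listed in the order you want things started
   pcs resource group add mygroup IP mount smb IP1

   # option 2: no group, explicit set constraints instead
   pcs constraint order set IP IP1 sequential=false set mount smb
   pcs constraint colocation set IP IP1 sequential=false set mount smb \
       setoptions score=INFINITY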

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] should such a resource set work?

2017-04-28 Thread Ken Gaillot
On 04/28/2017 08:17 AM, lejeczek wrote:
> hi everybody
> 
> I have a set:
> 
> set IP2 IP2 IP2 LVM(exclusive) mountpoint smb smartd sequential=true
  ^^^

Is this a typo?

> setoptions score=INFINITY
> 
> it should work, right?
> 
> yet when I standby a node and I see cluster jumps straight to mountpoint
> and fails:
> 
> Failed Actions:
> * aLocalStorage5mnt_start_0 on nodeA 'not installed' (5): call=918,
> status=complete, exitreason='Couldn't find device
> [/dev/mapper/0-raid10.A]. Expected /dev/??? to exist',
> 
> Where am I making a mistake?
> thanks
> L.

Is this in a location, colocation, or order constraint?

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Question about fence_mpath

2017-04-28 Thread Ken Gaillot
On 04/28/2017 03:37 PM, Chris Adams wrote:
> Once upon a time, Seth Reid  said:
>> This confused me too when I set up my cluster. I found that everything
>> worked better if I didn't specify a device path. I think there was
>> documentation on Redhat that led me to try removing the "device" options.
> 
> fence_mpath won't work without device(s).  However, I figured out my
> problem: I needed to set pcmk_host_check=none (both nodes in my cluster
> can handle fencing).  Then everything seems to work.

You only want pcmk_host_check=none if the fence device can fence either
node. If the device can only fence one node, you want pcmk_host_list set
to that node's name.
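
For example (the fence_mpath device options themselves are elided here; only
the pcmk_host_* options matter for this point):

   # device that can fence either node:
   pcs stonith create fence-shared fence_mpath ... pcmk_host_check=none

   # device that can fence only one particular node:
   pcs stonith create fence-node1 fence_mpath ... pcmk_host_list=node1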


___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] How to fence cluster node when SAN filesystem fail

2017-05-02 Thread Ken Gaillot
Hi,

Upstream documentation on fencing in Pacemaker is available at:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#idm139683949958512

Higher-level tools such as crm shell and pcs make it easier; see their
man pages and other documentation for details.


On 05/01/2017 10:35 PM, Albert Weng wrote:
> Hi All,
> 
> My environment :
> (1) two node (active/passive) pacemaker cluster
> (2) SAN storage attached, add resource type "filesystem"
> (3) OS : RHEL 7.2
> 
> In old version of RHEL cluster, when attached SAN storage path lost(ex.
> filesystem fail),
> active node will trigger fence device to reboot itself.
> 
> but when i use pacemaker on RHEL cluster, when i remove fiber cable on
> active node, all resources failover to passive node normally, but active
> node doesn't reboot.
> 
> how to trigger fence reboot action when SAN filesystem lost?
> 
> Thank a lot~~~
> 
> 
> -- 
> Kind regards,
> Albert Weng
> 
> 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Resources still retains in primary node

2017-05-03 Thread Ken Gaillot
On 05/03/2017 02:30 AM, pillai bs wrote:
> Hi Experts!!!
> 
>   Am having two node HA setup (Primary/Secondary) with
> separate resources for Home/data/logs/Virtual IP. The expected
> behavior should be: if the primary node goes down, the secondary has to
> take charge (meaning initially the VIP will point to the primary node, so
> users can access home/data/logs from the primary node; once the primary
> node goes down, the VIP/floating IP will point to the secondary node so
> that the user experiences uninterrupted service).

Yes, that's a common setup for pacemaker clusters. Did you have a
problem with it?

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Resources still retains in Primary Node even though its interface went down

2017-05-03 Thread Ken Gaillot
On 05/03/2017 02:43 AM, pillai bs wrote:
> Hi Experts!!!
> 
>   Am having two node setup for HA (Primary/Secondary) with
> separate resources for Home/data/logs/Virtual IP. The expected
> behavior should be: if the primary node goes down, the secondary has to
> take charge (meaning initially the VIP will point to the primary node, so
> users can access home/data/logs from the primary node; once the primary
> node goes down, the VIP/floating IP will point to the secondary node so
> that the user experiences uninterrupted service).
>  I'm using dual ring support to avoid split brain. I have two
> interfaces (Public & Private). The intention for having the private
> interface is for data sync alone.
> 
> I have tested my setup in two different ways:
> 1. Made primary Interface down (ifdown eth0), as expected VIP and other
> resources moved from primary to secondary node.(VIP will not be
> reachable from primary node)
> 2. Made Primary Interface down (Physically unplugged the Ethernet
> Cable). The primary node still retain the resources, VIP/FloatingIP was
> reachable from primary node.
> 
> Is my testing correct?? how come the VIP will be reachable even though
> eth0 was down. Please advice!!!
> 
> Regards,
> Madhan.B

Sorry, didn't see this message before replying to the other one :)

The IP resource is successful if the IP is up *on that host*. It doesn't
check that the IP is reachable from any other site. Similarly,
filesystem resources just make sure that the filesystem can be mounted
on the host. So, unplugging the Ethernet won't necessarily make those
resources fail.

Take a look at the ocf:pacemaker:ping resource for a way to ensure that
the primary host has connectivity to the outside world. Also, be sure
you have fencing configured, so that the surviving node can kill a node
that is completely cut off or unresponsive.
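
A common sketch of that (the gateway address, scores and resource name are
placeholders):

   pcs resource create ping ocf:pacemaker:ping \
       host_list=192.168.1.1 dampen=5s multiplier=1000 \
       op monitor interval=15s
   pcs resource clone ping
   # keep the VIP (and whatever depends on it) off nodes with no connectivity:
   pcs constraint location <vip-resource> rule score=-INFINITY \
       not_defined pingd or pingd lt 1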

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] stonith device locate on same host in active/passive cluster

2017-05-04 Thread Ken Gaillot
On 05/03/2017 09:04 PM, Albert Weng wrote:
> Hi Marek,
> 
> Thanks your reply.
> 
> On Tue, May 2, 2017 at 5:15 PM, Marek Grac  > wrote:
> 
> 
> 
> On Tue, May 2, 2017 at 11:02 AM, Albert Weng wrote:
> 
> 
> Hi Marek,
> 
> thanks for your quickly responding.
> 
> According to you opinion, when i type "pcs status" then i saw
> the following result of fence :
> ipmi-fence-node1(stonith:fence_ipmilan):Started clusterb
> ipmi-fence-node2(stonith:fence_ipmilan):Started clusterb
> 
> Does it means both ipmi stonith devices are working correctly?
> (rest of resources can failover to another node correctly)
> 
> 
> Yes, they are working correctly. 
> 
> When it becomes important to run fence agents to kill the other
> node. It will be executed from the other node, so the fact where
> fence agent resides currently is not important
> 
> Does "started on node" means which node is controlling fence behavior?
> even all fence agents and resources "started on same node", the cluster
> fence behavior still work correctly?
>  
> 
> Thanks a lot.
> 
> m,

Correct. Fencing is *executed* independently of where or even whether
fence devices are running. The node that is "running" a fence device
performs the recurring monitor on the device; that's the only real effect.

> should i have to use location constraint to avoid stonith device
> running on same node ?
> # pcs constraint location ipmi-fence-node1 prefers clustera
> # pcs constraint location ipmi-fence-node2 prefers clusterb
> 
> thanks a lot

It's a good idea, so that a node isn't monitoring its own fence device,
but that's the only reason -- it doesn't affect whether or how the node
can be fenced. I would configure it as an anti-location, e.g.

   pcs constraint location ipmi-fence-node1 avoids node1=100

In a 2-node cluster, there's no real difference, but in a larger
cluster, it's the simplest config. I wouldn't use INFINITY (there's no
harm in a node monitoring its own fence device if it's the last node
standing), but I would use a score high enough to outweigh any stickiness.

> On Tue, May 2, 2017 at 4:25 PM, Marek Grac wrote:
> 
> Hi,
> 
> 
> 
> On Tue, May 2, 2017 at 3:39 AM, Albert Weng <weng.alb...@gmail.com> wrote:
> 
> Hi All,
> 
> I have created active/passive pacemaker cluster on RHEL 7.
> 
> here is my environment:
> clustera : 192.168.11.1
> clusterb : 192.168.11.2
> clustera-ilo4 : 192.168.11.10
> clusterb-ilo4 : 192.168.11.11
> 
> both nodes are connected SAN storage for shared storage.
> 
> i used the following cmd to create my stonith devices on
> each node :
> # pcs -f stonith_cfg stonith create ipmi-fence-node1
> fence_ipmilan parms lanplus="ture"
> pcmk_host_list="clustera" pcmk_host_check="static-list"
> action="reboot" ipaddr="192.168.11.10"
> login=adminsitrator passwd=1234322 op monitor interval=60s
> 
> # pcs -f stonith_cfg stonith create ipmi-fence-node02
> fence_ipmilan parms lanplus="true"
> pcmk_host_list="clusterb" pcmk_host_check="static-list"
> action="reboot" ipaddr="192.168.11.11" login=USERID
> passwd=password op monitor interval=60s
> 
> # pcs status
> ipmi-fence-node1 clustera
> ipmi-fence-node2 clusterb
> 
> but when i failover to passive node, then i ran
> # pcs status
> 
> ipmi-fence-node1clusterb
> ipmi-fence-node2clusterb
> 
> why both fence device locate on the same node ? 
> 
> 
> When node 'clustera' is down, is there any place where
> ipmi-fence-node* can be executed?
> 
> If you are worrying that node can not self-fence itself you
> are right. But if 'clustera' will become available then
> attempt to fence clusterb will work as expected.
> 
> m, 
> 
> ___
> Users mailing list: Users@clusterlabs.org
> 
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> 
> Project Home: http://www.clusterlabs.org
> Getting started:
> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> 

Re: [ClusterLabs] crm_mon -h (writing to a html-file) not showing all desired information and having trouble with the -d option

2017-05-08 Thread Ken Gaillot
On 05/08/2017 11:13 AM, Lentes, Bernd wrote:
> Hi,
> 
> playing around with my cluster i always have a shell with crm_mon running 
> because it provides me a lot of useful and current information concerning 
> cluster, nodes, resources ...
> Normally i have a "crm_mon -nrfRAL" running.
> I'd like to have that output as a web page too.
> So i tried the option -h.
> I have crm_mon from pacemaker 1.1.12 on a SLES 11 SP4 box. I'm writing the 
> file to /srv/www/hawk/public/crm_mon.html.
> I have hawk running, so i don't need an extra webserver for that.
> 
> First, i was very astonished when i used the option -d (daemonize). Using 
> that hawk does not find the html-file, although i see it in the fs, and it's 
> looking good.
> Hawk (or lighttpd) throws an error 404. Without -d lighttpd finds the files 
> and presents it via browser !?!
> 
> This is the file without -d:
> 
> ha-idg-2:/srv/www/hawk/public # stat crm_mon.html
>   File: `crm_mon.html'
>   Size: 1963Blocks: 8  IO Block: 4096   regular file
> Device: 1fh/31d Inode: 7082Links: 1
> Access: (0644/-rw-r--r--)  Uid: (0/root)   Gid: (0/root)
> Access: 2017-05-08 18:03:25.695754151 +0200
> Modify: 2017-05-08 18:03:20.875680374 +0200
> Change: 2017-05-08 18:03:20.875680374 +0200
> 
> 
> Same file with crm_mon -d:
> 
> ha-idg-2:/srv/www/hawk/public # stat crm_mon.html
>   File: `crm_mon.html'
>   Size: 1963Blocks: 8  IO Block: 4096   regular file
> Device: 1fh/31d Inode: 7084Links: 1
> Access: (0640/-rw-r-----)  Uid: (0/root)   Gid: (0/root)

The "other" bit is gone, is that it?

> Access: 2017-05-08 18:04:16.048524856 +0200
> Modify: 2017-05-08 18:04:16.048524856 +0200
> Change: 2017-05-08 18:04:16.048524856 +0200
>  Birth: -
> 
> I see no important difference, just the different inode.
> 
> This is the access.log from lighttpd:
> 
> 10.35.34.70 ha-idg-2:7630 - [08/May/2017:18:04:10 +0200] "GET /crm_mon.html 
> HTTP/1.1" 200 563 "https://ha-idg-2:7630/crm_mon.html"; "Mozilla/5.0 (Windows 
> NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) 
> Chrome/57.0.2987.133 Safa
> ri/537.36"
> 10.35.34.70 ha-idg-2:7630 - [08/May/2017:18:04:15 +0200] "GET /crm_mon.html 
> HTTP/1.1" 200 563 "https://ha-idg-2:7630/crm_mon.html"; "Mozilla/5.0 (Windows 
> NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) 
> Chrome/57.0.2987.133 Safa
> ri/537.36"
> 10.35.34.70 ha-idg-2:7630 - [08/May/2017:18:04:20 +0200] "GET /crm_mon.html 
> HTTP/1.1" 404 1163 "https://ha-idg-2:7630/crm_mon.html"; "Mozilla/5.0 (Windows 
> NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) 
> Chrome/57.0.2987.133 Saf
> ari/537.36"
> 
> It simply changes from http status code 200 to 404. Why ? 
> 
> And using "crm_mon -nfrotRALV -h /srv/www/hawk/public/crm_mon.html" i get the 
> following output:
> 
> 
> Cluster summary
> 
> Last updated: Mon May 8 18:08:58 2017
> Current DC: ha-idg-2
> 2 Nodes configured.
> 14 Resources configured.
> Config Options
> 
> STONITH of failed nodes   :   enabled
> Cluster is:   symmetric
> No Quorum Policy  :   Ignore
> Node List
> 
> Node: ha-idg-1: online
> prim_clvmd(ocf::lvm2:clvmd):  Started 
> prim_stonith_ipmi_ha-idg-2(stonith:external/ipmi):Started 
> prim_ocfs2(ocf::ocfs2:o2cb):  Started 
> prim_vm_mausdb(ocf::heartbeat:VirtualDomain): Started 
> prim_vg_cluster_01(ocf::heartbeat:LVM):   Started 
> prim_fs_lv_xml_vm (ocf::heartbeat:Filesystem):Started 
> prim_dlm  (ocf::pacemaker:controld):  Started 
> prim_vnc_ip_mausdb(ocf::lentes:IPaddr):   Started 
> Node: ha-idg-2: online
> prim_clvmd(ocf::lvm2:clvmd):  Started 
> prim_stonith_ipmi_ha-idg-1(stonith:external/ipmi):Started 
> prim_ocfs2(ocf::ocfs2:o2cb):  Started 
> prim_vg_cluster_01(ocf::heartbeat:LVM):   Started 
> prim_fs_lv_xml_vm (ocf::heartbeat:Filesystem):Started 
> prim_dlm  (ocf::pacemaker:controld):  Started 
> Inactive Resources
> 
> I'm missing the constraints, operations and timing details. How can i get 
> them ?
> 
> 
> Bernd

The crm_mon HTML code doesn't get many reports/requests/submissions from
users, so it doesn't get a lot of attention. I wouldn't be too surprised
if there are some loose ends.

I'm not sure why those sections wouldn't appear. The code for it seems
to be there.



___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Instant service restart during failback

2017-05-08 Thread Ken Gaillot
If you look in the logs when the node comes back, there should be some
"pengine:" messages noting that the restarts will be done, and then a
"saving inputs in " message. If you can attach that file (both
with and without the constraint changes would be ideal), I'll take a
look at it.

On 04/21/2017 05:26 AM, Euronas Support wrote:
> Seems that replacing inf: with 0: in some colocation constraints fixes the 
> problem, but still cannot understand why it worked for one node and not for 
> the other.
> 
> On 20.4.2017 12:16:02 Klechomir wrote:
>> Hi Klaus,
>> It would have been too easy if it was interleave.
>> All my cloned resoures have interlave=true, of course.
>> What bothers me more is that the behaviour is asymmetrical.
>>
>> Regards,
>> Klecho
>>
>> On 20.4.2017 10:43:29 Klaus Wenninger wrote:
>>> On 04/20/2017 10:30 AM, Klechomir wrote:
 Hi List,
 Been investigating the following problem recently:

 Have two node cluster with 4 cloned (2 on top of 2) + 1 master/slave
 services on it (corosync+pacemaker 1.1.15)
 The failover works properly for both nodes, i.e. when one node is
 restarted/turned in standby, the other properly takes over, but:

 Every time when node2 has been in standby/turned off and comes back,
 everything recovers propery.
 Every time when node1 has been in standby/turned off and comes back,
 part
 of the cloned services on node2 are getting instantly restarted, at the
 same second when node1 re-appeares, without any apparent reason (only
 the
 stop/start messages in the debug).

 Is there some known possible reason for this?
>>>
>>> That triggers some deja-vu feeling...
>>> Did you have a similar issue a couple of weeks ago?
>>> I remember in that particular case 'interleave=true' was not the
>>> solution to the problem but maybe here ...
>>>
>>> Regards,
>>> Klaus
>>>
 Best regards,
 Klecho

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemaker daemon shutdown time with lost remote node

2017-05-08 Thread Ken Gaillot
On 04/28/2017 02:22 PM, Radoslaw Garbacz wrote:
> Hi,
> 
> I have a question regarding pacemaker daemon shutdown
> procedure/configuration.
> 
In my case, when a remote node is lost pacemaker needs exactly 10 minutes
to shut down, during which there is nothing logged.
> So my questions:
> 1. What is pacemaker doing at this time?
> 2. How to make it shorter?

The logs from the other nodes will be helpful. One of the nodes will be
the DC, and will have all the scheduled commands.

Generally, in a shutdown, pacemaker first tries to stop all resources.
If one of those stops is either taking a long time or timing out, that
might explain it.

> Changed Pacemaker Configuration:
> - cluster-delay
> - dc-deadtime
> 
> 
> Pacemaker Logs:
> Apr 28 17:38:08 [17689] ip-10-41-177-183 pacemakerd:   notice:
> crm_signal_dispatch: Caught 'Terminated' signal | 15 (invoking handler)
> Apr 28 17:38:08 [17689] ip-10-41-177-183 pacemakerd:   notice:
> pcmk_shutdown_worker:Shutting down Pacemaker
> Apr 28 17:38:08 [17689] ip-10-41-177-183 pacemakerd:   notice:
> stop_child:  Stopping crmd | sent signal 15 to process 17698
> Apr 28 17:48:07 [17695] ip-10-41-177-183   lrmd: info:
> cancel_recurring_action: Cancelling ocf operation
> monitor_head_monitor_191000
> Apr 28 17:48:07 [17695] ip-10-41-177-183   lrmd: info:
> log_execute: executing - rsc:monitor_head action:stop call_id:130
> [...]
> Apr 28 17:48:07 [17689] ip-10-41-177-183 pacemakerd: info: main:   
> Exiting pacemakerd
> Apr 28 17:48:07 [17689] ip-10-41-177-183 pacemakerd: info:
> crm_xml_cleanup: Cleaning up memory from libxml2
> 
> 
> Pacemaker built from github: 1.16
> 
> 
> Help greatly appreciated.
> 
> -- 
> Best Regards,
> 
> Radoslaw Garbacz
> XtremeData Incorporated

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antw: Antw: notice: throttle_handle_load: High CPU load detected

2017-05-08 Thread Ken Gaillot
On 05/05/2017 12:37 AM, jitendra.jaga...@dell.com wrote:
>  
> 
> Hello All,
> 
> Sorry for resurrecting an old thread.
> 
> I am also observing "High CPU load detected" messages in the logs.
> 
> In this email chain, I see everyone is suggesting to change
> "load-threshold" settings.
> 
> But I am not able to find any good information about "load-threshold"
> except this: https://www.mankier.com/7/crmd
> 
> Even in the Pacemaker document
> "http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/pdf/Pacemaker_Explained/Pacemaker-1.1-Pacemaker_Explained-en-US.pdf"
> there is not much detail about "load-threshold".
> 
> Please can someone share steps or any commands to modify "load-threshold"?
> 
> Thanks
> 
> Jitendra

Hi Jitendra,

Those messages indicate there is a real issue with the CPU load. When
the cluster notices high load, it reduces the number of actions it will
execute at the same time. This is generally a good idea, to avoid making
the load worse.

The messages don't hurt anything, they just let you know that there is
something worth investigating.

If you've investigated the load and it's not something to be concerned
about, you can change load-threshold to adjust what the cluster
considers "high". The load-threshold works like this:

* It defaults to 0.8 (which means pacemaker should try to avoid
consuming more than 80% of the system's resources).

* On a single-core machine, load-threshold is multiplied by 0.6 (because
with only one core you *really* don't want to consume too many
resources); on a multi-core machine, load-threshold is multiplied by the
number of cores (to normalize the system load per core).

* That number is then multiplied by 1.2 to get the "Noticeable CPU load
detected" message (debug level), by 1.6 to get the "Moderate CPU load"
message, and 2.0 to get the "High CPU load" message. These are measured
against the 1-minute system load average (the same number you would get
with top, uptime, etc.).

So, if you raise load-threshold above 0.8, you won't see the log
messages until the load gets even higher. But, that doesn't do anything
about the actual load problem.
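
As a worked example with the defaults on a 24-core machine like the one
discussed below (numbers rounded), plus one way to change the option:

   # default load-threshold=0.8, 24 cores, compared against the 1-minute load average:
   #   "Noticeable CPU load": 1.2 * 0.8 * 24 ~= 23
   #   "Moderate CPU load"  : 1.6 * 0.8 * 24 ~= 31
   #   "High CPU load"      : 2.0 * 0.8 * 24 ~= 38
   # load-threshold is a crmd cluster option, given as a percentage:
   crm_attribute --type crm_config --name load-threshold --update 200%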

> *From:*Kostiantyn Ponomarenko [mailto:konstantin.ponomare...@gmail.com]
> *Sent:* Tuesday, April 5, 2016 8:37 AM
> *To:* kgail...@redhat.com
> *Cc:* Cluster Labs - All topics related to open-source clustering
> welcomed 
> *Subject:* Re: [ClusterLabs] Antw: Antw: notice: throttle_handle_load:
> High CPU load detected
> 
>  
> 
> Thank you, Ken.
> 
> This helps a lot.
> 
> Now I am sure that my current approach fits best for me =)
> 
> 
> Thank you,
> 
> Kostia
> 
>  
> 
> On Wed, Mar 30, 2016 at 11:10 PM, Ken Gaillot <kgail...@redhat.com> wrote:
> 
> On 03/29/2016 08:22 AM, Kostiantyn Ponomarenko wrote:
> > Ken, thank you for the answer.
> >
> > Every node in my cluster under normal conditions has "load average" of
> > about 420. It is mainly connected to the high disk IO on the system.
> > My system is designed to use almost 100% of its hardware
> (CPU/RAM/disks),
> > so the situation when the system consumes almost all HW resources is
> > normal.
> 
> 420 suggests that HW resources are outstripped -- anything above the
> system's number of cores means processes are waiting for some resource.
> (Although with an I/O-bound workload like this, the number of cores
> isn't very important -- most will be sitting idle despite the high
> load.) And if that's during normal conditions, what will happen during a
> usage spike? It sounds like a recipe for less-than-HA.
> 
> Under high load, there's a risk of negative feedback, where monitors
> time out, causing pacemaker to schedule recovery actions, which cause
> load to go higher and more monitors to time out, etc. That's why
> throttling is there.
> 
> > I would like to get rid of "High CPU load detected" messages in the
> > log, because
> > they flood corosync.log as well as system journal.
> >
> > Maybe you can give an advice what would be the best way do to it?
> >
> > So far I came up with the idea of setting "load-threshold" to 1000% ,
> > because of:
> > 420(load average) / 24 (cores) = 17.5 (adjusted_load);
> > 2 (THROTLE_FACTOR_HIGH) * 10 (throttle_load_target) = 20
> >
> > if(adjusted_load > THROTTLE_FACTOR_HIGH * throttle_load_target) {
> > crm_no

<    1   2   3   4   5   6   7   8   9   10   >