Re: [ClusterLabs] ocf scripts shell and local variables

2016-09-01 Thread Dimitri Maziuk
On 09/01/2016 10:12 AM, Dejan Muhamedagic wrote:

> There is no other way to tell the UNIX/Linux system which
> interpreter to use. Or am I missing something?

Yes: I'm not talking about how it works, I'm talking about how it
doesn't. You can tell the system which interpreter to use, but the
system doesn't have to listen.

So e.g. whoever suggested (Lars?) that on non-Linux platforms you sed
all the shebang lines to /usr/bin/bash or whatever -- that's not
guaranteed to work.

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [ClusterLabs] ocf scripts shell and local variables

2016-09-01 Thread Dejan Muhamedagic
On Wed, Aug 31, 2016 at 10:39:22AM -0500, Dmitri Maziuk wrote:
> On 2016-08-31 03:59, Dejan Muhamedagic wrote:
> >On Tue, Aug 30, 2016 at 12:32:36PM -0500, Dimitri Maziuk wrote:
> 
> >>I expect you're being deliberately obtuse.
> >
> >Not sure why you think that
> 
> Because the point I was trying to make was that having the shebang line say
> #!/opt/swf/bin/bash
> does not guarantee the script will actually be interpreted by
> /opt/swf/bin/bash. For example
> 
> >When a file is sourced, the "#!" line has no special meaning
> >(apart from documenting purposes).
> 
> (sic) Or when
> 
> >I haven't read the code either, but it must be some of the
> >exec(2) system calls.
> 
> it's execl("/bin/sh", "/bin/sh", "/script/file") instead of
> execl("/script/file", ...) directly.
> 
> (As an aside, I suspect the feature where exec(2) will run the loader which
> will read the magic and load an appropriate binfmt* kernel module may well
> also be portable between "most" systems, just like "local" is portable to
> "most" shells. I don't think POSIX specifies anything more than "executable
> image", and that on a strictly POSIX-compliant system execl("/my/script.sh",
> ...) will fail. I am so old that I have a vague recollection it *had to be*
> execl("/bin/sh", "/bin/sh", "/script/file") back when I learned it. But this
> is going even further OT.)
> 
> My point, again, was that solutions involving shebang lines are great as
> long as you can guarantee those shebang lines are being used on all
> supported platforms at all times.

There is no other way to tell the UNIX/Linux system which
interpreter to use. Or am I missing something?

> Sourcing findif.sh from IPAddr2 is proof
> by counter-example that they aren't and you can't.

findif.sh or any other file which is to be sourced therefore
must be compatible with the lowest common denominator
(shell-wise).
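A minimal sketch of why that is (the file names here are made up; the point
carries over to findif.sh): when a file is sourced, its "#!" line is just a
comment, and the code runs under whatever shell the caller happens to be.
Explicitly invoking an interpreter, e.g. "sh ./helper.sh", bypasses the
shebang in the same way.

  $ cat helper.sh           # hypothetical stand-in for findif.sh
  #!/opt/swf/bin/bash
  # "local" happens to work in "most" shells, but it is not POSIX,
  # so a strict /bin/sh may reject it regardless of the shebang above.
  netmask_bits() { local n=24; echo "$n"; }

  $ cat agent.sh            # hypothetical stand-in for IPaddr2
  #!/bin/sh
  . ./helper.sh             # helper.sh's shebang is never consulted here
  netmask_bits

  $ ./agent.sh              # everything runs under /bin/sh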

Thanks,

Dejan

> Dima
> 
> 



[ClusterLabs] When to restart the pacemakerd

2016-09-01 Thread Julien Pivotto
Hi

We are using Corosync and Pacemaker in RHEL7.2.

When we change an option in the file /etc/corosync/corosync.conf (or add
nodes), do we need to restart the pacemakerd service? Or is one of the
following commands enough?

- /usr/sbin/corosync-cfgtool -R
- pcs cluster reload corosync

Thanks

-- 
 (o-    Julien Pivotto
 //\    Open-Source Consultant
 V_/_   Inuits - https://www.inuits.eu




Re: [ClusterLabs] [Linux-ha-dev] Announcing crmsh release 2.1.7

2016-09-01 Thread Kristoffer Grönlund
Darren Thompson  writes:

> Just a quick question:
>
> If "scripts: no-quorum-policy=ignore" is becoming depreciated, how are we
> to manage two node (e.g. test) clusters that require this work around since
> quorum state on a single node is an odd state.
>

Hi Darren,

There are better mechanisms in corosync and Pacemaker for handling
two-node clusters now while still maintaining quorum.

In corosync 2, we have the two_node: 1 setting for votequorum, which
ensures that a two-node cluster doesn't suffer split-brain (fencing is
required for this to work properly).

There is an explanation for how this works here:

http://people.redhat.com/ccaulfie/docs/Votequorum_Intro.pdf
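For reference, the votequorum part of corosync.conf would look roughly like
this (a sketch; the surrounding sections and exact layout depend on your
distribution):

  # /etc/corosync/corosync.conf (excerpt)
  quorum {
      provider: corosync_votequorum
      two_node: 1
  }

Note that setting two_node implicitly enables wait_for_all, so after a full
cluster shutdown both nodes have to be seen once before quorum is granted.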

Somewhat related, there used to be the start-delay meta parameter which
could be set for example for sbd stonith resources, to make a
double-fencing scenario less likely. This has now been replaced by the
pcmk_delay_max parameter. For an example of how to use this, see this
pull request for sbd:

https://github.com/ClusterLabs/sbd/pull/15/commits/ca2fba836eab169f0c8cacf7f3757c0485bcfef8
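A rough crmsh illustration of the replacement (the resource name and delay
value are arbitrary, and the remaining sbd parameters depend on your setup):

  # instead of start-delay on the stonith operation, let Pacemaker add a
  # random delay of up to 30s before this device fences:
  crm configure primitive stonith-sbd stonith:external/sbd \
      params pcmk_delay_max=30s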

Cheers,
Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com



Re: [ClusterLabs] Service pacemaker start kills my cluster and other NFS HA issues

2016-09-01 Thread Pablo Pines Leon
Dear Ken,

Thanks for your reply. That configuration works perfectly fine on Ubuntu; the
problem is that on CentOS 7, for some reason, I am not even able to do a "service
pacemaker stop" on the node that is running as master (with the slave off too),
because it will have some failed actions that don't make any sense:

Migration Summary:
* Node nfsha1:
   res_exportfs_root: migration-threshold=100 fail-count=1 last-failure='Thu Sep  1 09:42:43 2016'
   res_exportfs_export1: migration-threshold=100 fail-count=100 last-failure='Thu Sep  1 09:42:38 2016'

Failed Actions:
* res_exportfs_root_monitor_3 on nfsha1 'not running' (7): call=79, status=complete, exitreason='none',
    last-rc-change='Thu Sep  1 09:42:43 2016', queued=0ms, exec=0ms
* res_exportfs_export1_stop_0 on nfsha1 'unknown error' (1): call=88, status=Timed Out, exitreason='none',
    last-rc-change='Thu Sep  1 09:42:18 2016', queued=0ms, exec=20001ms

So I am wondering what is different between the two OSes that causes this
different outcome.
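(Purely as an illustration of reading the output above: the second entry is
the stop action timing out after 20s. If that turns out to be the actual
difference, giving the stop action more time and then clearing the fail
counts would look roughly like this with pcs; the timeout value is arbitrary:)

  # allow the exportfs stop action more than the current 20s
  pcs resource update res_exportfs_export1 op stop timeout=60s

  # then clear the accumulated fail counts so the resource may run again
  pcs resource cleanup res_exportfs_export1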

Kind regards


From: Ken Gaillot [kgail...@redhat.com]
Sent: 31 August 2016 17:31
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] Service pacemaker start kills my cluster and other 
NFS HA issues

On 08/30/2016 10:49 AM, Pablo Pines Leon wrote:
> Hello,
>
> I have set up a DRBD-Corosync-Pacemaker cluster following the
> instructions from https://wiki.ubuntu.com/ClusterStack/Natty adapting
> them to CentOS 7 (e.g: using systemd). After testing it in Virtual

There is a similar how-to specifically for CentOS 7:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Clusters_from_Scratch/index.html

I think if you compare your configs to that, you'll probably find the
cause. I'm guessing the most important missing pieces are "two_node: 1"
in corosync.conf, and fencing.


> Machines it seemed to be working fine, so it is now implemented in
> physical machines, and I have noticed that the failover works fine as
> long as I kill the master by pulling the AC cable, but not if I issue
> the halt, reboot or shutdown commands; that makes the cluster get into a
> situation like this:
>
> Last updated: Tue Aug 30 16:55:58 2016  Last change: Tue Aug 23
> 11:49:43 2016 by hacluster via crmd on nfsha2
> Stack: corosync
> Current DC: nfsha2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with
> quorum
> 2 nodes and 9 resources configured
>
> Online: [ nfsha1 nfsha2 ]
>
>  Master/Slave Set: ms_drbd_export [res_drbd_export]
>  Masters: [ nfsha2 ]
>  Slaves: [ nfsha1 ]
>  Resource Group: rg_export
>  res_fs (ocf::heartbeat:Filesystem): Started nfsha2
>  res_exportfs_export1 (ocf::heartbeat:exportfs): FAILED nfsha2
> (unmanaged)
>  res_ip (ocf::heartbeat:IPaddr2): Stopped
>  Clone Set: cl_nfsserver [res_nfsserver]
>  Started: [ nfsha1 ]
>  Clone Set: cl_exportfs_root [res_exportfs_root]
>  res_exportfs_root (ocf::heartbeat:exportfs): FAILED nfsha2
>  Started: [ nfsha1 ]
>
> Migration Summary:
> * Node 2:
>    res_exportfs_export1: migration-threshold=100 fail-count=100 last-failure='Tue Aug 30 16:55:50 2016'
>    res_exportfs_root: migration-threshold=100 fail-count=1 last-failure='Tue Aug 30 16:55:48 2016'
> * Node 1:
>
> Failed Actions:
> * res_exportfs_export1_stop_0 on nfsha2 'unknown error' (1): call=134,
> status=Timed Out, exitreason='none',
> last-rc-change='Tue Aug 30 16:55:30 2016', queued=0ms, exec=20001ms
> * res_exportfs_root_monitor_3 on nfsha2 'not running' (7): call=126,
> status=complete, exitreason='none',
> last-rc-change='Tue Aug 30 16:55:48 2016', queued=0ms, exec=0ms
>
> This of course blocks it, because the IP and the NFS exports are down.
> It doesn't even recognize that the other node is down. I am then forced
> to do "crm_resource -P" to get it back to a working state.
>
> Even when unplugging the master, and booting it up again, trying to get
> it back in the cluster executing "service pacemaker start" on the node
> that was unplugged will sometimes just cause the exportfs_root resource
> on the slave to fail (but the service is still up):
>
>  Master/Slave Set: ms_drbd_export [res_drbd_export]
>  Masters: [ nfsha1 ]
>  Slaves: [ nfsha2 ]
>  Resource Group: rg_export
>  res_fs (ocf::heartbeat:Filesystem): Started nfsha1
>  res_exportfs_export1 (ocf::heartbeat:exportfs): Started nfsha1
>  res_ip (ocf::heartbeat:IPaddr2): Started nfsha1
>  Clone Set: cl_nfsserver [res_nfsserver]
>  Started: [ nfsha1 nfsha2 ]
>  Clone Set: cl_exportfs_root [res_exportfs_root]
>  Started: [ nfsha1 nfsha2 ]
>
> Migration Summary:
> * Node nfsha2:
>    res_exportfs_root: migration-threshold=100 fail-count=1 last-failure='Tue Aug 30 17:18:17 2016'
> * Node nfsha1:
>
> Failed Actions:
> * res_exportfs_root_monitor_3 on nfsha2 'not running' (7): call=34,
> status=complete, exitreason='none',

Re: [ClusterLabs] [Linux-ha-dev] Announcing crmsh release 2.1.7

2016-09-01 Thread Darren Thompson
Team

Good work on the new version, appreciated.

Just a quick question:

If "scripts: no-quorum-policy=ignore" is becoming depreciated, how are we
to manage two node (e.g. test) clusters that require this work around since
quorum state on a single node is an odd state.

Regards





Darren Thompson

Professional Services Engineer / Consultant



Level 3, 60 City Road

Southgate, VIC 3006

Mb: 0400 640 414

Mail: darr...@akurit.com.au 
Web: www.akurit.com.au

On 1 September 2016 at 17:01, Kristoffer Grönlund wrote:

> Hello everyone!
>
> Today I am proud to announce the release of `crmsh` version 2.1.7!
> The major new thing in this release is a backport of the event-driven
> alerts support from the 2.3 branch.
>
> Big thanks to Hideo Yamauchi for his patience and testing of the
> alerts backport.
>
> This time, the list of changes is small enough that I can add it right
> here:
>
> - high: parse: Backport of event-driven alerts parser (#150)
> - high: hb_report: Don't collect logs from journalctl if -M is set
> (bsc#990025)
> - high: hb_report: Skip lines without timestamps in log correctly
> (bsc#989810)
> - high: constants: Add maintenance to set of known attributes (bsc#981659)
> - high: utils: Avoid deadlock if DC changes during idle wait (bsc#978480)
> - medium: scripts: no-quorum-policy=ignore is deprecated (bsc#981056)
> - low: cibconfig: Don't mix up CLI name with XML tag
>
> You can also get the list of changes from the changelog:
>
> * https://github.com/ClusterLabs/crmsh/blob/2.1.7/ChangeLog
>
> Right now, I don't have a set of pre-built rpm packages for Linux
> distributions ready, but I am going to make this available soon. This
> is in particular for CentOS 6.x, which still relies on Python 2.6,
> which makes running the later releases there more
> difficult. These packages will most likely appear as a subrepository
> here (more details coming soon):
>
> * http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/
>
> Archives of the tagged release:
>
> * https://github.com/ClusterLabs/crmsh/archive/2.1.7.tar.gz
> * https://github.com/ClusterLabs/crmsh/archive/2.1.7.zip
>
>
> Thank you,
>
> Kristoffer
>
> --
> // Kristoffer Grönlund
> // kgronl...@suse.com


Re: [ClusterLabs] ip clustering strange behaviour

2016-09-01 Thread Klaus Wenninger
On 09/01/2016 06:50 AM, Gabriele Bulfon wrote:
> Thanks, got it.
> So, is it better to use "two_node: 1" or, as suggested elsewhere,
> "no-quorum-policy=stop"?
As Ken said, with Corosync 2 "two_node: 1" is definitely the way to go.
"no-quorum-policy=stop" was what I suspected to be the cause
of your undesired behavior. The deprecated alternative would be
"no-quorum-policy=ignore" for a two-node cluster (with a corosync version
that doesn't support two_node, back when quorum was still
handled by Pacemaker directly). Once you have two_node set, it
shouldn't matter anymore which no-quorum-policy you are using, as
corosync will always behave as if you had quorum.
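A quick way to check that two_node is really in effect is to look at the
votequorum flags (the output format varies a bit between corosync versions):

  corosync-quorumtool -s
  # expect something like:
  #   Flags:            2Node Quorate WaitForAll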
>
> About fencing: the machine where I'm going to implement the 2-node cluster
> is a dual machine with a shared-disk backend.
> Each node has two 10Gb ethernets dedicated to the public IP and the
> admin console.
> Then there is a third 100Mb ethernet connecting the two machines
> internally.
> I was going to use this last one for fencing via ssh, but it looks like
> this way I won't get ip/pool/zone movements if one of the nodes
> freezes or halts without shutting down pacemaker cleanly.
> What should I use instead?
Don't they have any remote-management interface like IPMI?

Otherwise you could add sbd, with a watchdog configured, to your ssh fencing.
With sbd you can additionally use your shared disks, which is what gives
sbd its name ("storage-based death").

Instead of fencing that powers off / reboots the unresponsive machine, you
can as well isolate it using the network infrastructure. Whether this is
desirable might depend on how you are accessing the shared disks. The
fencing mechanism should ensure that the unresponsive node doesn't
mess around with them anymore.
>
> Thanks for your help,
> Gabriele
>
> 
> *Sonicle S.r.l. *: http://www.sonicle.com 
> *Music: *http://www.gabrielebulfon.com 
> *Quantum Mechanics : *http://www.cdbaby.com/cd/gabrielebulfon
>
>
>
> --
>
> From: Ken Gaillot 
> To: users@clusterlabs.org
> Date: 31 August 2016 17.25.05 CEST
> Subject: Re: [ClusterLabs] ip clustering strange behaviour
>
> On 08/30/2016 01:52 AM, Gabriele Bulfon wrote:
> > Sorry for reiterating, but my main question was:
> >
> > why does node 1 removes its own IP if I shut down node 2 abruptly?
> > I understand that it does not take the node 2 IP (because the
> > ssh-fencing has no clue about what happened on the 2nd node), but I
> > wouldn't expect it to shut down its own IP... this would kill any
> > service on both nodes... what am I doing wrong?
>
> Assuming you're using corosync 2, be sure you have "two_node: 1" in
> corosync.conf. That will tell corosync to pretend there is always
> quorum, so pacemaker doesn't need any special quorum settings. See the
> votequorum(5) man page for details. Of course, you need fencing in this
> setup, to handle when communication between the nodes is broken but both
> are still up.
>
> >
> 
> 
> > *Sonicle S.r.l. *: http://www.sonicle.com 
> > *Music: *http://www.gabrielebulfon.com
> 
> > *Quantum Mechanics : *http://www.cdbaby.com/cd/gabrielebulfon
> >
> >
> 
> >
> >
> > *From:* Gabriele Bulfon 
> > *To:* kwenn...@redhat.com Cluster Labs - All topics related to
> > open-source clustering welcomed 
> > *Date:* 29 August 2016 17.37.36 CEST
> > *Subject:* Re: [ClusterLabs] ip clustering strange behaviour
> >
> >
> > Ok, got it, I hadn't gracefully shut down pacemaker on node2.
> > Now I restarted, everything was up; I stopped the pacemaker service on
> > host2 and I got host1 with both IPs configured. ;)
> >
> > But, though I understand that if I halt host2 without a graceful shutdown
> > of pacemaker it will not move the IP2 to host1, I don't expect host1
> > to lose its own IP! Why?
> >
> > Gabriele
> >
> >
> 
> 
> > *Sonicle S.r.l. *: http://www.sonicle.com 
> > *Music: *http://www.gabrielebulfon.com
> 
> > *Quantum Mechanics : *http://www.cdbaby.com/cd/gabrielebulfon
> >
> >
> >
> >
> 
> --
> >
> > From: Klaus Wenninger 
> > To: users@clusterlabs.org
> > Date: 29 

[ClusterLabs] Announcing crmsh release 2.1.7

2016-09-01 Thread Kristoffer Grönlund
Hello everyone!

Today I am proud to announce the release of `crmsh` version 2.1.7!
The major new thing in this release is a backport of the event-driven
alerts support from the 2.3 branch.

Big thanks to Hideo Yamauchi for his patience and testing of the
alerts backport.

This time, the list of changes is small enough that I can add it right
here:

- high: parse: Backport of event-driven alerts parser (#150)
- high: hb_report: Don't collect logs from journalctl if -M is set (bsc#990025)
- high: hb_report: Skip lines without timestamps in log correctly (bsc#989810)
- high: constants: Add maintenance to set of known attributes (bsc#981659)
- high: utils: Avoid deadlock if DC changes during idle wait (bsc#978480)
- medium: scripts: no-quorum-policy=ignore is deprecated (bsc#981056)
- low: cibconfig: Don't mix up CLI name with XML tag

You can also get the list of changes from the changelog:

* https://github.com/ClusterLabs/crmsh/blob/2.1.7/ChangeLog

Right now, I don't have a set of pre-built rpm packages for Linux
distributions ready, but I am going to make this available soon. This
is in particular for CentOS 6.x, which still relies on Python 2.6,
which makes running the later releases there more
difficult. These packages will most likely appear as a subrepository
here (more details coming soon):

* http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/

Archives of the tagged release:

* https://github.com/ClusterLabs/crmsh/archive/2.1.7.tar.gz
* https://github.com/ClusterLabs/crmsh/archive/2.1.7.zip


Thank you,

Kristoffer

-- 
// Kristoffer Grönlund
// kgronl...@suse.com

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org