Re: [ClusterLabs] pcs cluster auth returns authentication error

2016-09-05 Thread Jan Pokorný
On 26/08/16 02:14 +, Jason A Ramsey wrote:
> Well, I got around the problem, but I don’t understand the solution…
> 
> I edited /etc/pam.d/password-auth and commented out the following line:
> 
> auth        required      pam_tally2.so onerr=fail audit silent deny=5 unlock_time=900
> 
> Anyone have any idea why this was interfering?

No clear idea, but...

> On 08/25/2016 03:04 PM, Jason A Ramsey wrote:
>> type=USER_AUTH msg=audit(1472154922.415:69): user pid=1138 uid=0
>> auid=4294967295 ses=4294967295 subj=system_u:system_r:initrc_t:s0
>> msg='op=PAM:authentication acct="hacluster" exe="/usr/bin/ruby"
>> hostname=? addr=? terminal=? res=failed'

First, this definitely has nothing to do with SELinux (as opposed to
"AVC" type of audit record).

As a wild guess, if you want to continue using the pam_tally2 module
(which seems like a good idea), I'd suggest giving the magic_root option
a try (and perhaps evaluating whether that would be an acceptable compromise).
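
For illustration only (see pam_tally2(8) for the exact option
semantics), the line you commented out could instead be kept with
magic_root added:

  auth        required      pam_tally2.so magic_root onerr=fail audit silent deny=5 unlock_time=900

which, if I read your audit record right (uid=0, exe="/usr/bin/ruby",
i.e. presumably pcsd), should keep those root-initiated authentication
attempts from bumping the failure counter.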

-- 
Jan (Poki)




Re: [ClusterLabs] What cib_stats line means in logfile

2016-09-05 Thread Jan Pokorný
On 05/09/16 21:26 +0200, Jan Pokorný wrote:
> On 25/08/16 17:55 +0200, Sébastien Emeriau wrote:
>> When i check my corosync.log i see this line :
>> 
>> info: cib_stats: Processed 1 operations (1.00us average, 0%
>> utilization) in the last 10min
>> 
>> What does it mean (cpu load or just information) ?
> 
> These are just periodically (10 minutes by default, if any
> operations observed at all) emitted diagnostic summaries that
> were once considered useful, which was later reconsidered
> leading to their complete removal:
> 
> https://github.com/ClusterLabs/pacemaker/commit/73e8c89#diff-37b681fa792dfc09ec67bb0d64eb55feL306
> 
> Honestly, using a Pacemaker as old as 1.1.8 (released 4 years ago)

actually, it must have been even older than that (I'm afraid to ask).

> would be a bigger concern for me.  Plenty of important fixes
> (as well as enhancements) have been added since then...

P.S. I checked my mailbox, which aggregates plentiful sources such as
this list and various GitHub notifications, and found one other trace of
such an outdated version within this year, plus two more last year(!).

-- 
Jan (Poki)




Re: [ClusterLabs] What cib_stats line means in logfile

2016-09-05 Thread Jan Pokorný
On 25/08/16 17:55 +0200, Sébastien Emeriau wrote:
> When i check my corosync.log i see this line :
> 
> info: cib_stats: Processed 1 operations (1.00us average, 0%
> utilization) in the last 10min
> 
> What does it mean (cpu load or just information) ?

These are just periodically (10 minutes by default, if any
operations observed at all) emitted diagnostic summaries that
were once considered useful, which was later reconsidered
leading to their complete removal:

https://github.com/ClusterLabs/pacemaker/commit/73e8c89#diff-37b681fa792dfc09ec67bb0d64eb55feL306

Honestly, using a Pacemaker as old as 1.1.8 (released 4 years ago)
would be a bigger concern for me.  Plenty of important fixes
(as well as enhancements) have been added since then...
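
Just to double-check what is actually running there, one of these
should do (the first assumes an RPM-based system):

  rpm -q pacemaker
  pacemakerd --version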

-- 
Jan (Poki)




Re: [ClusterLabs] converting configuration

2016-08-24 Thread Jan Pokorný
Hello,

[First, please note that your email sending application does not
treat the plaintext part of what is sent correctly -- formatting,
at least newlines, is lost, which impacts readability in both
text-only email clients and on the ClusterLabs users ML archive pages:
http://oss.clusterlabs.org/pipermail/users/2016-August/003784.html
Perhaps it could be remediated by instructing your preferred app
to compose emails in plaintext right away (or to send plaintext
only).

Below, I reformatted your question manually...]

On 24/08/16 18:18 +0200, Gabriele Bulfon wrote:
> In my previous tests I used a prebuilt older pacemaker/heartbeat
> package with a configuration like:
> 
> primitive xstor2-stonith stonith:external/ssh-sonicle \
>   op monitor interval="25" timeout="25" start-delay="25" \
>   params hostlist="xstor2"
> primitive xstor3-stonith stonith:external/ssh-sonicle \
>   op monitor interval="25" timeout="25" start-delay="25" \
>   params hostlist="xstor3"
> location xstor2-stonith-pref xstor2-stonith -inf: xstor2
> location xstor3-stonith-pref xstor3-stonith -inf: xstor3
> property stonith-action=poweroffcommit
> 
> Now that I upgraded everything from sources and moved over to
> corosync 2, these commands are not recognized, refused with
> "primitive not supported by the RNG schema".

Element "primitive" is really long with us and definitely not going
away any time soon.  There's likely an issue elsewhere, just resulting
with such a misleading message.
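
As a quick first check (assuming cibadmin is at hand), you can see
which schema version the CIB itself declares:

  cibadmin --query | grep -o 'validate-with="[^"]*"'

and compare that against the schemas your freshly built pacemaker
actually ships (typically under /usr/share/pacemaker).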

> Is there any way I can easily convert my old commands into the new
> ones?

Others can advise on crm usage, or help you troubleshoot that message
first.

-- 
Jan (Poki)




Re: [ClusterLabs] pacemakerd quits after few seconds with some errors

2016-08-22 Thread Jan Pokorný
On 23/08/16 07:23 +0200, Gabriele Bulfon wrote:
> Thanks! I am using Corosync 2.3.6 and Pacemaker 1.1.4 using the 
> "--with-corosync".
> How is Corosync looking for his own version?

The situation may be as simple as having built corosync from a
GitHub-provided automatic tarball, which is never a good idea when
upstream has its own way of delivering proper releases:
http://build.clusterlabs.org/corosync/releases/
(specific URLs are also part of the corosync announcements
on this list)

The issue with automatic tarballs has already been reported:
https://github.com/corosync/corosync/issues/116
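
For illustration (the file name below is only an assumed example;
pick whichever release was announced on this list):

  curl -O http://build.clusterlabs.org/corosync/releases/corosync-2.3.6.tar.gz
  tar xf corosync-2.3.6.tar.gz && cd corosync-2.3.6
  ./configure && make && make install

as opposed to building from an auto-generated
github.com/corosync/corosync/archive/... snapshot, which, per the
issue above, lacks proper release/version information.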

-- 
Jan (Poki)




Re: [ClusterLabs] Entire Group stop on stopping of single Resource

2016-08-22 Thread Jan Pokorný
On 19/08/16 23:09 +0530, jaspal singla wrote:
> I have an resource group (ctm_service) comprise of various resources. Now
> the requirement is when one of its resource stops for soem time (10-20)
> seconds, I want entire group will be stopped.

Note that if the resource is stopped _just_ for this period (a matter
of seconds) while the monitor interval is set to a larger value (30 s),
Pacemaker may miss the resource being intermittently stopped.
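
(Purely as a hypothetical illustration using one of your resource
names, tightening the monitor could look like:

  pcs -f cib.xml.geo resource update FSCheck op monitor interval=10s

whether such an interval is sensible depends on how expensive the
agent's status check is.)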

> Is it possible to achieve this in pacemaker. Please help!

Just for clarification, do you mean stopped completely within the
cluster, and not just on the node the group was running on when one of
its resources stopped?

>  Resource Group: ctm_service
>  FSCheck
> (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/FsCheckAgent.py):
>(target-role:Stopped) Stopped
>  NTW_IF (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/NtwIFAgent.py):
>  (target-role:Stopped) Stopped
>  CTM_RSYNC  (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/RsyncAgent.py):
>  (target-role:Stopped) Stopped
>  REPL_IF
> (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_IFAgent.py):
> (target-role:Stopped) Stopped
>  ORACLE_REPLICATOR
> (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_ReplicatorAgent.py):
> (target-role:Stopped) Stopped
>  CTM_SID
> (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/OracleAgent.py):
> (target-role:Stopped) Stopped
>  CTM_SRV(lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py):
>(target-role:Stopped) Stopped
>  CTM_APACHE 
> (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ApacheAgent.py):
> (target-role:Stopped) Stopped
> 
> _
> 
> 
> This is resource and resource group properties:
> 
> 
> ___
> 
> pcs -f cib.xml.geo resource create FSCheck lsb:../../..//cisco/
> PrimeOpticalServer/HA/bin/FsCheckAgent.py op monitor id=FSCheck-OP-monitor
> name=monitor interval=30s
> pcs -f cib.xml.geo resource create NTW_IF lsb:../../..//cisco/
> PrimeOpticalServer/HA/bin/NtwIFAgent.py op monitor id=NtwIFAgent-OP-monitor
> name=monitor interval=30s
> pcs -f cib.xml.geo resource create CTM_RSYNC lsb:../../..//cisco/
> PrimeOpticalServer/HA/bin/RsyncAgent.py op monitor id=CTM_RSYNC-OP-monitor
> name=monitor interval=30s on-fail=ignore stop id=CTM_RSYNC-OP-stop
> interval=0 on-fail=stop
> pcs -f cib.xml.geo resource create REPL_IF lsb:../../..//cisco/
> PrimeOpticalServer/HA/bin/ODG_IFAgent.py op monitor id=REPL_IF-OP-monitor
> name=monitor interval=30 on-fail=ignore stop id=REPL_IF-OP-stop interval=0
> on-fail=stop
> pcs -f cib.xml.geo resource create ORACLE_REPLICATOR lsb:../../..//cisco/
> PrimeOpticalServer/HA/bin/ODG_ReplicatorAgent.py op monitor
> id=ORACLE_REPLICATOR-OP-monitor name=monitor interval=30s on-fail=ignore
> stop id=ORACLE_REPLICATOR-OP-stop interval=0 on-fail=stop
> pcs -f cib.xml.geo resource create CTM_SID lsb:../../..//cisco/
> PrimeOpticalServer/HA/bin/OracleAgent.py op monitor id=CTM_SID-OP-monitor
> name=monitor interval=30s
> pcs -f cib.xml.geo resource create CTM_SRV lsb:../../..//cisco/
> PrimeOpticalServer/HA/bin/CtmAgent.py op monitor id=CTM_SRV-OP-monitor
> name=monitor interval=30s
> pcs -f cib.xml.geo resource create CTM_APACHE lsb:../../..//cisco/
> PrimeOpticalServer/HA/bin/ApacheAgent.py op monitor
> id=CTM_APACHE-OP-monitor name=monitor interval=30s
> pcs -f cib.xml.geo resource create CTM_HEARTBEAT lsb:../../..//cisco/
> PrimeOpticalServer/HA/bin/HeartBeat.py op monitor
> id=CTM_HEARTBEAT-OP-monitor name=monitor interval=30s
> pcs -f cib.xml.geo resource create FLASHBACK  lsb:../../..//cisco/
> PrimeOpticalServer/HA/bin/FlashBackMonitor.py op monitor
> id=FLASHBACK-OP-monitor name=monitor interval=30s
> 
> 
> pcs -f cib.xml.geo resource group add ctm_service FSCheck NTW_IF CTM_RSYNC
> REPL_IF ORACLE_REPLICATOR CTM_SID CTM_SRV CTM_APACHE
> 
> pcs -f cib.xml.geo resource meta ctm_service migration-threshold=1
> failure-timeout=10 target-role=stopped

Why do you have target-role=stopped (preferably title-cased "Stopped")
here?  Is that only for test purposes?  I ask as it may interfere with
any subsequent modifications.
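
(If it is meant to stay, the properly cased variant, following your
own command style, would be something like:

  pcs -f cib.xml.geo resource meta ctm_service target-role=Stopped

and setting the attribute to an empty value should remove it again.)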


P.S. The presented configuration resembles the output of clufter, so any
feedback that could be turned into improvements is welcome.

-- 
Jan (Poki)




Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-15 Thread Jan Pokorný
On 15/08/16 14:48 +0200, Jan Pokorný wrote:
>> On 04/08/16 07:21 PM, Dan Swartzendruber wrote:
>>> On 2016-08-04 19:03, Digimer wrote:
>>>> As for DRAC vs IPMI, no, they are not two things. In fact, I am pretty
>>>> certain that fence_drac is a symlink to fence_ipmilan. All DRAC is (same
>>>> with iRMC, iLO, RSA, etc) is "IPMI + features". Fundamentally, the fence
>>>> action; rebooting the node, works via the basic IPMI standard using the
>>>> DRAC's BMC.
>>>> 
>>>> [...]
>>> 
>>> At least on CentOS7, fence_ipmilan and fence_drac are not the same.
>>> e.g. they are both python scripts that are totally different.
>> 
>> [...] 
>> 
>> As for the two agents not being symlinked, OK. It still doesn't change
>> the core point through that both fence_ipmilan and fence_drac would be
>> acting on the same target.
> 
> Just thought I'd add some clarifications:
> 
> - in fact fence-agents upstream seems to have thrown the idea of
>   proper symlinks away before functionality to that effect was added,
>   eventually using file copies instead of symlinks, with the rationale
>   "this approach is not recommended so they regular files"

Reference needed (accidentally omitted):
https://github.com/ClusterLabs/fence-agents/commit/87266bc

>   [Marx&Oyvind, I cannot really imagine what issues this was meant to
>   solve nor why it would be not recommended (in Pacemaker, stat calls
>   are used that work with symlink targets, not the immediate link
>   files, ditto other standard file handling functions), but it seems
>   pretty non-systemic compared to, e.g., fence_xvm -> fence_virt:
>   
> https://github.com/ClusterLabs/fence-virt/blob/f1f1a2437c5b0811269b5859a5ef646f44105a88/client/Makefile.in#L39
>   and this also makes resulting packages inflated with redundant
>   scripts + man pages needlessly;  I'd make a PR for that but it
>   seems premature until the recursive make/install issue with
>   "symlinked" agents has a definitive conclusion (PR 81+82), but
>   basically, you just want 'ln -s SRC DST' instead of 'cp SRC DST']
> 
> - fence_ipmilan and fence_drac are indeed not even virtually
>   symlinked; quick and dirty way to receive this information, see
>   https://bugzilla.redhat.com/show_bug.cgi?id=1210679#c12
>   (you may need ' | tr -s " "' just after 'ls -l' command)
>   from where you can see that it is fence_idrac which is a virtual
>   symlink (same implementation) as fence_ipmilan, while fence_drac
>   is an agent on its own
> 
> 
> Hope this helps.

-- 
Jan (Poki)




Re: [ClusterLabs] Fencing with a 3-node (1 for quorum only) cluster

2016-08-15 Thread Jan Pokorný
> On 04/08/16 07:21 PM, Dan Swartzendruber wrote:
>> On 2016-08-04 19:03, Digimer wrote:
>>> As for DRAC vs IPMI, no, they are not two things. In fact, I am pretty
>>> certain that fence_drac is a symlink to fence_ipmilan. All DRAC is (same
>>> with iRMC, iLO, RSA, etc) is "IPMI + features". Fundamentally, the fence
>>> action; rebooting the node, works via the basic IPMI standard using the
>>> DRAC's BMC.
>>> 
>>> [...]
>> 
>> At least on CentOS7, fence_ipmilan and fence_drac are not the same.
>> e.g. they are both python scripts that are totally different.
> 
> [...] 
> 
> As for the two agents not being symlinked, OK. It still doesn't change
> the core point through that both fence_ipmilan and fence_drac would be
> acting on the same target.

Just thought I'd add some clarifications:

- in fact fence-agents upstream seems to have thrown the idea of
  proper symlinks away before functionality to that effect was added,
  eventually using file copies instead of symlinks, with the rationale
  "this approach is not recommended so they regular files"

  [Marx&Oyvind, I cannot really imagine what issues this was meant to
  solve nor why it would be not recommended (in Pacemaker, stat calls
  are used that work with symlink targets, not the immediate link
  files, ditto other standard file handling functions), but it seems
  pretty non-systemic compared to, e.g., fence_xvm -> fence_virt:
  
https://github.com/ClusterLabs/fence-virt/blob/f1f1a2437c5b0811269b5859a5ef646f44105a88/client/Makefile.in#L39
  and this also makes resulting packages inflated with redundant
  scripts + man pages needlessly;  I'd make a PR for that but it
  seems premature until the recursive make/install issue with
  "symlinked" agents has a definitive conclusion (PR 81+82), but
  basically, you just want 'ln -s SRC DST' instead of 'cp SRC DST']

- fence_ipmilan and fence_drac are indeed not even virtually
  symlinked; quick and dirty way to receive this information, see
  https://bugzilla.redhat.com/show_bug.cgi?id=1210679#c12
  (you may need ' | tr -s " "' just after 'ls -l' command)
  from where you can see that it is fence_idrac which is a virtual
  symlink (same implementation) as fence_ipmilan, while fence_drac
  is an agent on its own
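
(Another quick'n'dirty local check, assuming the agents are installed
under /usr/sbin:

  md5sum /usr/sbin/fence_ipmilan /usr/sbin/fence_idrac /usr/sbin/fence_drac

should yield identical checksums for the first two and a different
one for fence_drac.)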


Hope this helps.

-- 
Jan (Poki)




Re: [ClusterLabs] unable to start mysql as a clustered service, OK stand-alone

2016-08-11 Thread Jan Pokorný
On 09/08/16 16:20 -0400, berg...@merctech.com wrote:
> I've got a 3-node CentOS6 cluster and I'm trying to add mysql 5.1 as
> a new service. Other cluster services (IP addresses, Postgresql,
> applications) work fine.
> 
> The mysql config file and data files are located on shared,
> cluster-wide storage (GPFS), as are config/data files for other
> services which work correctly.
> 
> On each node, I can successfully start mysql locally via:
>   service mysqld start
> and via:
>   rg_test test /etc/cluster/cluster.conf start service mysql
> 
> (in each case, the corresponding command with the 'stop' option will
> also successfully shut down mysql).
> 
> However, attempting to start the mysql service with clusvcadm
> results in the service failing over from one node to the next, and
> being marked as "stopped" after the last node.
> 
> Each failover happens very quickly, in about 5 seconds. I suspect
> that rgmanager isn't waiting long enough for mysql to start before
> checking if it is running and I have added startup delays in
> cluster.conf, but they don't seem to be honored. Nothing is written
> into the mysql log file at this time -- no startup or failure
> messages, which implies that the mysqld never begins to run. The
> only log entries (/var/log/messages, /var/log/cluster/*, etc)
> reference rgmanager, not the mysql process itself.
> 
> Any suggestions?

see inline below...

> RHCS components:
>   cman-3.0.12.1-78.el6.x86_64
>   luci-0.26.0-78.el6.centos.x86_64
>   rgmanager-3.0.12.1-26.el6_8.3.x86_64
>   ricci-0.16.2-86.el6.x86_64
>   corosync-1.4.7-5.el6.x86_64
> 
> 
> - /etc/cluster/cluster.conf (edited subset) 
> -
> 
> 
> 
>  config_file="/var/lib/pgsql/data/postgresql.conf" name="PostgreSQL8" 
> postmaster_user="postgres" startup_wait="25"/>
> 
>  config_file="/cluster_shared/mysql_centos6/etc/my.cnf" 
> listen_address="192.168.169.173" name="mysql" shutdown_wait="10" 
> startup_wait="30"/>
> 
>  restart_expire_time="180">
> 
> 
> 
> 
> 
> 
> --
> 
> 
> - /var/log/cluster/rgmanager.log from attempt to start 
> mysql with clusvcadm ---
> Aug 08 11:58:16 rgmanager Recovering failed service service:mysql
> Aug 08 11:58:16 rgmanager [ip] Link for eth2: Detected
> Aug 08 11:58:16 rgmanager [ip] Adding IPv4 address 192.168.169.173/24 to eth2
> Aug 08 11:58:16 rgmanager [ip] Pinging addr 192.168.169.173 from dev eth2
> Aug 08 11:58:18 rgmanager [ip] Sending gratuitous ARP: 192.168.169.173 
> c8:1f:66:e8:bb:34 brd ff:ff:ff:ff:ff:ff
> Aug 08 11:58:19 rgmanager [mysql] Verifying Configuration Of mysql:mysql
> Aug 08 11:58:19 rgmanager [mysql] Verifying Configuration Of mysql:mysql > 
> Succeed
> Aug 08 11:58:19 rgmanager [mysql] Monitoring Service mysql:mysql
> Aug 08 11:58:19 rgmanager [mysql] Checking Existence Of File 
> /var/run/cluster/mysql/mysql:mysql.pid [mysql:mysql] > Failed
> Aug 08 11:58:19 rgmanager [mysql] Monitoring Service mysql:mysql > Service Is 
> Not Running
> Aug 08 11:58:19 rgmanager [mysql] Starting Service mysql:mysql
> Aug 08 11:58:19 rgmanager [mysql] Looking For IP Address > Succeed -  IP 
> Address Found
> Aug 08 11:58:20 rgmanager [mysql] Starting Service mysql:mysql > Succeed
> Aug 08 11:58:21 rgmanager [mysql] Monitoring Service mysql:mysql
> Aug 08 11:58:21 rgmanager 1 events processed
> Aug 08 11:58:21 rgmanager [mysql] Checking Existence Of File 
> /var/run/cluster/mysql/mysql:mysql.pid [mysql:mysql] > Failed

The business of launching services used to be (and often still is)
incredibly racy: the launching script finishes, presumably denoting
that the service is ready, while in reality it is still "just warming
up" (perhaps the PID file has not even been created by then).  So
I can imagine that the hackish workaround

> 127 # Sleep 1 sec before checking status so mysqld can start
> 128 sleep 1

may not be enough in your deployment (large DB, high load due to other
[clustered or not] services, unlike in the rg_test scenario...), so I'd
start by tweaking that value in /usr/share/cluster/mysql.sh to some
higher figure to see whether it helps.
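
For instance (the value is picked arbitrarily, just to illustrate),
the snippet quoted above in /usr/share/cluster/mysql.sh would become:

  # Sleep 10 sec before checking status so mysqld can start
  sleep 10

and if that makes the start succeed, you at least know where the race
is.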

> Aug 08 11:58:21 rgmanager [mysql] Monitoring Service mysql:mysql > Service Is 
> Not Running
> Aug 08 11:58:21 rgmanager start on mysql "mysql" returned 7 (unspecified)
> Aug 08 11:58:21 rgmanager #68: Failed to start service:mysql; return value: 1
> Aug 08 11:58:21 rgmanager Stopping service service:mysql
> Aug 08 11:58:21 rgmanager [mysql] Verifying Configuration Of mysql:mysql
> Aug 08 11:58:21 rgmanager [mysql] Verifying Configuration Of mysql:mysql > 
> Succeed
> Aug 08 11:58:21 rgmanage

Re: [ClusterLabs] Pacemaker newbie needs advice for adding a new cluster node

2016-08-11 Thread Jan Pokorný
On 11/08/16 13:20 +0200, t...@it-hluchnik.de wrote:
> maybe someone can help me adding a node in my test cluster.
> 
> I have a working three-node test Cluster in my VirtualBox, using
> OEL7.2 and now I try to add another node but I have no plan and I
> did some kind of try & error which does not work.
> 
> [...]
> 
> I setup another host with name knoten04-hb, pcsd is started by
> systemd. My first idea was modifying /etc/corosync/corosync.conf on
> all four hosts, adding the new host.

This "let's do it manually" is more often than not shortsighted, as
there is some expert knowledge wired into the high level management
tools like pcs that is hard to mimic in full scope and in such
a coordinated way if there's not enough experience.

Also note that the tools like pcs may not be bulletproof against
necessarily all modifications behind their backs, which is exactly
what you are doing with your procedure.  On the other hand, it
brings feedback on where they could get more robust in such
unfinished/temporary configuration stages.

> When done, without any stop/start action, pcs shows me this:
> 
> # pcs status
> 
> [...]
> 
> Online: [ knoten01-hb knoten02-hb knoten03-hb ]
> 
> [...]
> 
> PCSD Status shows the new node, everything else is unchanged. I
> guess the corosync part is OK.
> 
> Next step is getting pacemaker configured. I tried:
> 
> # cibadmin --query > add_knoten04-hb_cfg
> # vi add_knoten04-hb_cfg
> # cibadmin --replace --xml-file add_knoten04-hb_cfg
> 
> The only entry I changed was this:
> 
> 
>   
>   
>   
>   <==
> 
> 
> 
> Again, this looks good:
> 
> # pcs status
> 
> [...]
> 
> Online: [ knoten01-hb knoten02-hb knoten03-hb ]
> OFFLINE: [ knoten04-hb ]
> 
> [...]
> 
> 
> knoten04-hb is known but offline, for sure. OK, let's start it:
> 
> 
> # pcs cluster start knoten04-hb
> knoten04-hb: Starting Cluster...
> 
> The funny thing: pcs status on the old nodes tell me:
> 
> [root@knoten01 ~]# pcs status
> 
> [...]
> 
> Online: [ knoten01-hb knoten02-hb knoten03-hb ]
> OFFLINE: [ knoten04-hb ]
> 
> [...]
> 
> 
> And pcs on the new node tells me that:
> 
> [root@knoten04 ~]# pcs status
> 
> [...]
> 
> Online: [ knoten04-hb ]
> OFFLINE: [ knoten01-hb knoten02-hb knoten03-hb ]
>   pcsd: active/enabled
> 
> [...]
> 
> This is obviously no valid cluster. So what am I doing wrong? How to
> add the node, getting a working four-node cluster?

# pcs cluster node add --help

It should take care of everything you really need, incl. corosync
configuration reload, etc.
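
Roughly (the exact invocation may differ with your pcs version, and
you may be asked for the hacluster credentials along the way):

  pcs cluster auth knoten04-hb
  pcs cluster node add knoten04-hb
  pcs cluster start knoten04-hb

run from one of the already configured nodes.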

-- 
Jan (Poki)




Re: [ClusterLabs] Unable to Build fence-agents from Source on RHEL6

2016-08-10 Thread Jan Pokorný
On 10/08/16 10:12 -0500, Dmitri Maziuk wrote:
> On 2016-08-10 10:04, Jason A Ramsey wrote:
> 
>> Traceback (most recent call last):
>> 
>> File "eps/fence_eps", line 14, in 
>> 
>> if sys.version_info.major > 2:
>> 
>> AttributeError: 'tuple' object has no attribute 'major'
> 
> 
> Replace with sys.version_info[0]

Actually, this and another Python 2.6 compatibility fix, both
originating from Lars (@lge), are already pending in the queue:
https://github.com/ClusterLabs/fence-agents/pull/82
and hence are expected to eventually hit the tree (possibly
through PR 81).
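
(A quick way to see the difference on the RHEL 6 host, where Python
is 2.6:

  python -c 'import sys; print(sys.version_info[0])'
  python -c 'import sys; print(sys.version_info.major)'

the first prints 2, the second raises AttributeError, since
sys.version_info only gained named fields with Python 2.7.)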

-- 
Jan (Poki)




Re: [ClusterLabs] Unable to Build fence-agents from Source on RHEL6

2016-08-10 Thread Jan Pokorný
On 10/08/16 20:50 +0200, Jan Pokorný wrote:
> On 10/08/16 16:52 +, Jason A Ramsey wrote:
>> Installing the openwsman-python package doesn’t work. Configure’ing
>> the fence-agents source tree fails because it still can’t find the
>> pywsman module. I thought that it might be because it’s looking in
>> /usr/lib/python-x/site-packages rather than
>> /usr/lib64/python-x/site-packages (or vice versa…I can’t remember…)
>> but when I looked at the output, it was definitely looking in the
>> directory that had pywsman.py/pyc/pyo or whatever it was.
> 
> Hmm, here's why:
> 
> # python -c 'import pywsman'  
>> Traceback (most recent call last):   
>>  
>>   File "", line 1, in
>>  
>>   File "/usr/lib/python2.6/site-packages/pywsman.py", line 25, in
>>  
>> _pywsman = swig_import_helper()  
>>  
>>   File "/usr/lib/python2.6/site-packages/pywsman.py", line 17, in 
>> swig_import_helper  
>> import _pywsman  
>>  
>> ImportError: /usr/lib64/python2.6/site-packages/_pywsman.so: undefined 
>> symbol: SWIG_exception 
> 
> This should also answer the "it’s not in the default yum repos" point
> you've raised: indeed, both libwsman-devel and openwsman-python
> are in the "optional" repository with RHEL 6, meaning the packages
> are provided as-is, without liabilities (mostly to enable the
> supported ones to be built; openwsman-python in particular may be
> just a never-triggered byproduct of building the important sibling
> packages, perhaps as build prerequisites).

FYI, this was already reported some years ago:
https://bugzilla.redhat.com/824277

> So either stick with the upstream-provided version for binaries +
> bindings, or you may have better luck with RHEL 7 (or derivatives).
> 
> Anyway, you got me (indirectly) to make this PR against FAs:
> https://github.com/ClusterLabs/fence-agents/pull/84

-- 
Jan (Poki)




Re: [ClusterLabs] Unable to Build fence-agents from Source on RHEL6

2016-08-10 Thread Jan Pokorný
On 10/08/16 16:52 +, Jason A Ramsey wrote:
> Installing the openwsman-python package doesn’t work. Configure’ing
> the fence-agents source tree fails because it still can’t find the
> pywsman module. I thought that it might be because it’s looking in
> /usr/lib/python-x/site-packages rather than
> /usr/lib64/python-x/site-packages (or vice versa…I can’t remember…)
> but when I looked at the output, it was definitely looking in the
> directory that had pywsman.py/pyc/pyo or whatever it was.

Hmm, here's why:

# python -c 'import pywsman'  
> Traceback (most recent call last):
> 
>   File "", line 1, in 
> 
>   File "/usr/lib/python2.6/site-packages/pywsman.py", line 25, in 
> 
> _pywsman = swig_import_helper()   
> 
>   File "/usr/lib/python2.6/site-packages/pywsman.py", line 17, in 
> swig_import_helper  
> import _pywsman   
> 
> ImportError: /usr/lib64/python2.6/site-packages/_pywsman.so: undefined 
> symbol: SWIG_exception 

This should also answer the "it’s not in the default yum repos" point
you've raised: indeed, both libwsman-devel and openwsman-python
are in the "optional" repository with RHEL 6, meaning the packages
are provided as-is, without liabilities (mostly to enable the
supported ones to be built; openwsman-python in particular may be
just a never-triggered byproduct of building the important sibling
packages, perhaps as build prerequisites).

So either stick with the upstream-provided version for binaries +
bindings, or you may have better luck with RHEL 7 (or derivatives).

Anyway, you got me (indirectly) to make this PR against FAs:
https://github.com/ClusterLabs/fence-agents/pull/84

-- 
Jan (Poki)




Re: [ClusterLabs] Unable to Build fence-agents from Source on RHEL6

2016-08-10 Thread Jan Pokorný
On 09/08/16 20:20 +, Jason A Ramsey wrote:
> Here’s the output I now get out of pip install pywsman:
> 
> < stupiderrormessage >
> 
> # pip install pywsman
> DEPRECATION: Python 2.6 is no longer supported by the Python core team, 
> please upgrade your Python. A future version of pip will drop support for 
> Python 2.6
> Collecting pywsman
> /usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:318:
>  SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name 
> Indication) extension to TLS is not available on this platform. This may 
> cause the server to present an incorrect TLS certificate, which can cause 
> validation failures. You can upgrade to a newer version of Python to solve 
> this. For more information, see 
> https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
>   SNIMissingWarning
> /usr/lib/python2.6/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:122:
>  InsecurePlatformWarning: A true SSLContext object is not available. This 
> prevents urllib3 from configuring SSL appropriately and may cause certain SSL 
> connections to fail. You can upgrade to a newer version of Python to solve 
> this. For more information, see 
> https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
>   InsecurePlatformWarning
>   Using cached pywsman-2.5.2-1.tar.gz
> Building wheels for collected packages: pywsman
>   Running setup.py bdist_wheel for pywsman ... error
>   Complete output from command /usr/bin/python -u -c "import setuptools, 
> tokenize;__file__='/tmp/pip-build-bvG1Jf/pywsman/setup.py';exec(compile(getattr(tokenize,
>  'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" 
> bdist_wheel -d /tmp/tmp3R7Zz5pip-wheel- --python-tag cp26:
>   No version.i.in file found -- Building from sdist.
>   /usr/lib/python2.6/site-packages/setuptools/dist.py:364: UserWarning: 
> Normalizing '2.5.2-1' to '2.5.2.post1'
> normalized_version,
>   running bdist_wheel
>   running build
>   running build_ext
>   building '_pywsman' extension
>   swigging openwsman.i to openwsman_wrap.c
>   swig -python -I/tmp/pip-build-bvG1Jf/pywsman -I/usr/include/openwsman 
> -features autodoc -o openwsman_wrap.c openwsman.i
>   wsman-client.i:44: Warning(504): Function _WsManClient must have a return 
> type.
>   wsman-client.i:61: Warning(504): Function _WsManClient must have a return 
> type.
>   creating build
>   creating build/temp.linux-x86_64-2.6
>   gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall 
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
> --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv 
> -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
> -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE 
> -fPIC -fwrapv -fPIC -I/tmp/pip-build-bvG1Jf/pywsman -I/usr/include/openwsman 
> -I/usr/include/python2.6 -c openwsman.c -o 
> build/temp.linux-x86_64-2.6/openwsman.o
>   gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall 
> -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector 
> --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv 
> -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
> -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE 
> -fPIC -fwrapv -fPIC -I/tmp/pip-build-bvG1Jf/pywsman -I/usr/include/openwsman 
> -I/usr/include/python2.6 -c openwsman_wrap.c -o 
> build/temp.linux-x86_64-2.6/openwsman_wrap.o
>   openwsman_wrap.c: In function ‘_WsXmlDoc_string’:
>   openwsman_wrap.c:3225: warning: implicit declaration of function 
> ‘ws_xml_dump_memory_node_tree_enc’
>   openwsman_wrap.c: In function ‘__WsXmlNode_size’:
>   openwsman_wrap.c:3487: warning: implicit declaration of function 
> ‘ws_xml_get_child_count_by_qname’
>   openwsman_wrap.c: In function ‘epr_t_cmp’:
>   openwsman_wrap.c:3550: warning: passing argument 2 of ‘epr_cmp’ discards 
> qualifiers from pointer target type
>   /usr/include/openwsman/wsman-epr.h:163: note: expected ‘struct epr_t *’ but 
> argument is of type ‘const struct epr_t *’
>   openwsman_wrap.c: In function ‘epr_t_string’:
>   openwsman_wrap.c:3556: warning: implicit declaration of function 
> ‘epr_to_string’
>   openwsman_wrap.c:3556: warning: return makes pointer from integer without a 
> cast
>   openwsman_wrap.c: In function ‘epr_t_selector_names’:
>   openwsman_wrap.c:3575: error: ‘key_value_t’ undeclared (first use in this 
> function)
>   openwsman_wrap.c:3575: error: (Each undeclared identifier is reported only 
> once
>   openwsman_wrap.c:3575: error: for each function it appears in.)
>   openwsman_wrap.c:3575: error: ‘p’ undeclared (first use in this function)
>   openwsman_wrap.c: In function ‘_WS_CONTEXT_parse_enum_request’:
>   openwsman_wrap.c:3821: error: too many arguments to function 
> ‘wsman_parse_enum_request’
>   openwsman_wrap.c: In function ‘WsManTransport_get_username’:
>   openwsman_

[ClusterLabs] [Announce] clufter v0.59.5 released

2016-08-08 Thread Jan Pokorný
I am happy to announce that clufter, a tool/library for transforming
and analyzing cluster configuration formats, got its version 0.59.5
tagged and released (incl. signature using my 60BCBB4F5CD7F9EF key):


or alternative (original) location:



The test suite is the same as for 0.59.1 as nothing changed there:

or alternatively:


Changelog highlights for v0.59.5 (also available as a tag message):

- bug fix release straightening cmd-wrap filter/command format up
- feature extensions:
  . when *2pcscmd or cmd-wrap commands colorize the output, "meta"
words for the shell are distinguished as well (related to the
first bug that this very handling also helped in nailing down)
- bug fixes:
  . {ccs,pcs}2pcscmd* commands would previously emit an incorrectly
quoted command (in the self-check the following sequence of
commands is indeed being run on the to-be-clustered machine)
unless -g or --noguidance option was used;
in turn the respective internals received a considerable
overhaul to be able to cope with nested commands
(command/process substitution) better;
for users of previous releases, the remedy is to use this -g switch
or to pass --noop=cmd-wrap to suppress the faulty filter from
the pipeline (at the expense of not-so-easy-to-consume output)
  . *2pcscmd* commands would previously omit colorizing some parts
of pcs syntax, contrary to the predestined expectations; e.g.:

pcs -f tmp-cib.xml constraint colocation SERVICE-foo rule ...

* * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic git archives preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli).

Issues & suggestions can be reported at either of (regardless if Fedora)
,
.


Happy clustering/high-availing :)

-- 
Jan (Poki)




[ClusterLabs] [Announce] clufter v0.59.4 released

2016-08-02 Thread Jan Pokorný
I am happy to announce that clufter, a tool/library for transforming
and analyzing cluster configuration formats, got its version 0.59.4
tagged and released (incl. signature using my 60BCBB4F5CD7F9EF key):


or alternative (original) location:



The test suite is the same as for 0.59.1 as nothing changed there:

or alternatively:


Changelog highlights for v0.59.4 (also available as a tag message):

- bug fix release straightening recent changes up
- feature extensions:
  . when *2pcscmd commands colorize the output, upcoming booth-related
pcs commands are considered as well
- bug fixes:
  . {ccs,pcs}2pcscmd* would previously exceed a recursion limit due
to not catching the bottom of the recursion properly
  . with *2pcscmd* commands, --dump=cmd-annotate switch would
previously cause troubles as newly introduced Nothing format
lacked "hash" property (which is used to construct reasonably
unique file name to store the intermediate result at individual
phases of the filter-piping process)

* * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic git archives preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli).

Issues & suggestions can be reported at either of (regardless if Fedora)
,
.


Happy clustering/high-availing :)

-- 
Jan (Poki)




[ClusterLabs] [Announce] clufter v0.59.3 released

2016-07-29 Thread Jan Pokorný
I am happy to announce that clufter, a tool/library for transforming
and analyzing cluster configuration formats, got its version 0.59.3
tagged and released (incl. signature using my 60BCBB4F5CD7F9EF key):


or alternative (original) location:



The test suite is the same as for 0.59.1 as nothing changed there:

or alternatively:


Changelog highlights for v0.59.3 (also available as a tag message):

- release enriching "pcs commands" output with colors at terminal output
- feature extensions:
  . *2pcscmd commands will now colorize the output (very plain support,
more to be expected) if either a terminal is used as a sink (and
colors not explicitly forbidden) or if this is enforced; so far
only shell comments and some parts of pcs syntax are supported,
but even in this form, it should help users to wrap their heads
(eyes) around what can be considered quite a complex output
at first sight

* * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic git archives preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli).

Issues & suggestions can be reported at either of (regardless if Fedora)
,
.


Happy clustering/high-availing :)

-- 
Jan (Poki)




[ClusterLabs] [Announce] clufter v0.59.2 released

2016-07-28 Thread Jan Pokorný
Hello,

I am happy to announce that clufter, a tool/library for transforming
and analyzing cluster configuration formats, got its version 0.59.2
tagged and released (incl. signature using my 60BCBB4F5CD7F9EF key):


or alternative (original) location:



The test suite is the same as for 0.59.1 as nothing changed there:

or alternatively:


Changelog highlights for v0.59.2 (also available as a tag message):

- release enriching "pcs commands" output with some context of use
- feature extensions:
  . *2pcscmd commands now first emit a comment block containing key
pieces of information about the run, such as a current date,
library version, the overall command that was executed, and
importantly (more and more), the target system specification
(this utilizes a new, dedicated cmd-annotate filter)
- internal enhancements:
  . so far, all formats used to represent concrete information
representable in various pertaining forms;  generator type
of filters (such as mentioned cmd-annotate) imposed the
existence of a special "empty" format (analogous to "void"
in C) for generators to map from into something useful,
so this release introduces "Nothing" format and makes sure
it's generally usable throughout the internals just as well

* * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic git archives preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli).

Issues & suggestions can be reported at either of (regardless if Fedora)
,
.


Happy clustering/high-availing :)

-- 
Jan (Poki)




[ClusterLabs] [Announce] clufter v0.59.1 released

2016-07-26 Thread Jan Pokorný
I am happy to announce that clufter, a tool/library for transforming
and analyzing cluster configuration formats, got its version 0.59.1
tagged and released (incl. signature using my 60BCBB4F5CD7F9EF key):


or alternative (original) location:



The test suite for this version is also provided:

or alternatively:


Changelog highlights for v0.59.1 (also available as a tag message):

- mostly a maintenance release with a small new feature and some fixes
- bug fixes:
  . internal: facts.infer_dist('*') results would previously diverge
from results for a specific query
  . X.04 Ubuntu's versioning fanciness is now ditched in favor of
canonical (sic) X.4 versioning; this was not a buggy behavior
per se, but internally, 04 used to be parsed as octal literal
which would go bananas with expressions like '08';
to maintain compatibility, respective string-held X.04 aliases
were introduced, though
- feature extensions:
  . there is now --list-dists option to clufter that is intended
mainly to suggest as to which --dist option values (note this
has an increased importance, at least as of the previous
release) are supported
- internal enhancements:
  . at various places, in-XSLT branching was turned into external
decisions about the parameters to feed XSLT processing proper

* * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic git archives preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli).

Issues & suggestions can be reported at either of (regardless if Fedora)
,
.


Happy clustering/high-availing :)

-- 
Jan (Poki)




[ClusterLabs] [Announce] clufter v0.59.0 released

2016-07-22 Thread Jan Pokorný
I am happy to announce that clufter, a tool/library for transforming
and analyzing cluster configuration formats, got its version 0.59.0
released and published (incl. signature using my 60BCBB4F5CD7F9EF key):


or alternative (original) location:



The test suite for this version is also provided:

or alternatively:


Changelog highlights for v0.59.0:
- this is a feature extension and bug fix release
- bug fixes:
  . previously, pcs2pcscmd* commands would attempt to have quorum
device configured using "pcs quorum add" whereas the correct syntax
is "pcs quorum device add"
  . with {cib,pcs}2pcscmd* commands, clufter no longer chokes on
validation failures (unless --nocheck provided) due to source CIB
file using newer "validate-with" validation version specification
than supported so far, such as with pacemaker-2.5 introducing the
alert handlers stanza in CIB, because the support has been extended
up to that very version (only affects deployments that do not borrow
the schemas from the installed pacemaker on-the-fly during a build
stage, which is not the case when building RPMs using the upstream
specfile)
- feature extensions:
  . {cib,pcs}2pcscmd* commands are now aware of configured alert
handlers in CIB and able to emit respective configuration
commands using pcs tool
- functional changes:
  . due to too many moving targets (corosync, pacemaker, pcs) with
features being gradually added, clufter as of this release
relies on the specified distribution target (which basically boils
down to snapshot of the supported features, as opposed to passing
zillion extra parameters expressing the same) stronger than ever;
this has several implications: do not expect that one sequence
of pcs commands at the clufter's output is portable to completely
different environment, and your distribution/setup may not be
supported (I try to cover Fedora, RHEL+derivatives, Debian and Ubuntu
directly) in which case facts.py (where everything is tracked)
needs to be patched

* * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic git archives preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli).

Issues & suggestions can be reported at either of (regardless if Fedora)
,
.


Happy clustering/high-availing :)

-- 
Jan (Poki)




Re: [ClusterLabs] Pacemaker in puppet with cib.xml?

2016-07-22 Thread Jan Pokorný
On 21/07/16 21:51 +0200, Jan Pokorný wrote:
> Yes, it's counterintuitive to have this asymmetry and it could be
> made to work with some added effort at the side of pcs with
> the original, disapproved, sequence as-is, but that's perhaps
> sound of the future per the referenced pcs bug.
> So take this idiom as a rule of thumb not to be questioned
> any time soon.

...at least until something better is around:
https://bugzilla.redhat.com/1359057 (open for comments)

-- 
Jan (Poki)




Re: [ClusterLabs] Pacemaker in puppet with cib.xml?

2016-07-21 Thread Jan Pokorný
On 21/07/16 16:02 -0400, Stephano-Shachter, Dylan wrote:
> So I should be using "pcs cluster cib > file" to get the config and then
> "pcs cluster cib-push --config file" to push it?

If you are going to change the file using "pcs -f " in the
interim, definitely.

It's perhaps more intuitive to use "pcs cluster cib " form,
but whatever you like.

> Also I shouldn't have to add --config to the pcs -f commands right?

True, --config only applies to those cib/cib-push commands
(and should be avoided/used, respectively as explained).

> On Thu, Jul 21, 2016 at 3:51 PM, Jan Pokorný  wrote:
> 
>> On 21/07/16 13:52 -0500, Ken Gaillot wrote:
>>> On 07/21/2016 01:35 PM, Stephano-Shachter, Dylan wrote:
>>>> I want to put the pacemaker config for my two node cluster in puppet
>>>> but, since it is just one cluster, it seems overkill to use the corosync
>>>> module. If I just have puppet push cib.xml to each machine, will that
>>>> work? To make changes, I would just use pcs to update things and then
>>>> copy cib.xml back to puppet. I am not sure what happens when you change
>>>> cib.xml while the cluster is running. Is it safe?
>>> 
>>> No, pacemaker checksums the CIB and won't accept a file that isn't
>>> properly signed. Also, the cluster automatically synchronizes changes
>>> made to the CIB across all nodes, so there is no need to push changes
>>> more than once.
>>> 
>>> Since you're using pcs, the update process could go like this:
>>> 
>>>   # Get the current configuration:
>>>   pcs cluster cib --config > cib-new.xml
>> 
>> As I feel guilty for contributing to this misconception with clufter
>> "pcs commands" output at one point (also see
>> https://bugzilla.redhat.com/1328078; still part of the blame
>> is in pcs I believe: https://bugzilla.redhat.com/1328066),
>> something has just started screaming in me:
>> 
>> DO NOT USE pcs cluster cib WITH --config LIKE SUGGESTED, BUT RATHER:
>> 
>> pcs cluster cib > cib-new.xml
>> 
>>>   # Make changes:
>>>   pcs -f cib-new.xml 
>>>   
>> 
>> ...as otherwise the modifications like this ^ would fail.
>> 
>>>   # Upload the configuration changes to the cluster:
>>>   pcs cluster cib-push --config cib-new.xml
>> 
>> Note that with cib-push, --config is OK, moreover it's vital as you
>> really don't want to propagate stale status section and what not
>> when modifying the configuration.
>> 
>> Yes, it's counterintuitive to have this asymmetry and it could be
>> made to work with some added effort at the side of pcs with
>> the original, disapproved, sequence as-is, but that's perhaps
>> sound of the future per the referenced pcs bug.
>> So take this idiom as a rule of thumb not to be questioned
>> any time soon.
>> 
>>> Using "--config" is important so you only work with the configuration
>>> section of the CIB, and not the dynamically determined cluster
>>> properties and status.
>> 
>> (This, apparently, justifies just the cib-push use.)
>> 
>>> 
>>> The first and last commands can be done on any one node, with the
>>> cluster running. The "pcs -f" commands can be done anywhere/anytime.

-- 
Jan (Poki)




Re: [ClusterLabs] Pacemaker in puppet with cib.xml?

2016-07-21 Thread Jan Pokorný
On 21/07/16 13:52 -0500, Ken Gaillot wrote:
> On 07/21/2016 01:35 PM, Stephano-Shachter, Dylan wrote:
>> I want to put the pacemaker config for my two node cluster in puppet
>> but, since it is just one cluster, it seems overkill to use the corosync
>> module. If I just have puppet push cib.xml to each machine, will that
>> work? To make changes, I would just use pcs to update things and then
>> copy cib.xml back to puppet. I am not sure what happens when you change
>> cib.xml while the cluster is running. Is it safe?
> 
> No, pacemaker checksums the CIB and won't accept a file that isn't
> properly signed. Also, the cluster automatically synchronizes changes
> made to the CIB across all nodes, so there is no need to push changes
> more than once.
> 
> Since you're using pcs, the update process could go like this:
> 
>   # Get the current configuration:
>   pcs cluster cib --config > cib-new.xml

As I feel guilty for contributing to this misconception with clufter
"pcs commands" output at one point (also see
https://bugzilla.redhat.com/1328078; still part of the blame
is in pcs I believe: https://bugzilla.redhat.com/1328066),
something has just started screaming in me:

DO NOT USE pcs cluster cib WITH --config LIKE SUGGESTED, BUT RATHER:

pcs cluster cib > cib-new.xml

>   # Make changes:
>   pcs -f cib-new.xml 
>   

...as otherwise the modifications like this ^ would fail.

>   # Upload the configuration changes to the cluster:
>   pcs cluster cib-push --config cib-new.xml

Note that with cib-push, --config is OK, moreover it's vital as you
really don't want to propagate stale status section and what not
when modifying the configuration.

Yes, it's counterintuitive to have this asymmetry and it could be
made to work with some added effort at the side of pcs with
the original, disapproved, sequence as-is, but that's perhaps
sound of the future per the referenced pcs bug.
So take this idiom as a rule of thumb not to be questioned
any time soon.

> Using "--config" is important so you only work with the configuration
> section of the CIB, and not the dynamically determined cluster
> properties and status.

(This, apparently, justifies just the cib-push use.)

> 
> The first and last commands can be done on any one node, with the
> cluster running. The "pcs -f" commands can be done anywhere/anytime.

-- 
Jan (Poki)




Re: [ClusterLabs] agent ocf:pacemaker:controld

2016-07-18 Thread Jan Pokorný
> On 18/07/16 07:59, Da Shi Cao wrote:
>> dlm_controld is very tightly coupled with cman.

Wrong assumption.

In fact, support for shipping ocf:pacemaker:controld has been
explicitly restricted to cases when CMAN logic (specifically the
respective handle-all initscript that is in turn, in that limited use
case, triggered from pacemaker's proper one and, moreover, takes
care of dlm_controld management on its own so any subsequent attempts
to do the same would be ineffective) is _not_ around:

https://github.com/ClusterLabs/pacemaker/commit/6a11d2069dcaa57b445f73b52f642f694e55caf3
(accidental syntactical typos were fixed later on:
https://github.com/ClusterLabs/pacemaker/commit/aa5509df412cb9ea39ae3d3918e0c66c326cda77)

>> I have built a cluster purely with
>> pacemaker+corosync+fence_sanlock. But if agent
>> ocf:pacemaker:controld is desired, dlm_controld must exist! I can
>> only find it in cman.
>> Can the command dlm_controld be obtained without bringing in cman?

To recap what others have suggested:

On 18/07/16 08:57 +0100, Christine Caulfield wrote:
> There should be a package called 'dlm' that has a dlm_controld suitable
> for use with pacemaker.

On 18/07/16 17:26 +0800, Eric Ren wrote:
> DLM upstream hosted here:
>   https://git.fedorahosted.org/cgit/dlm.git/log/
> 
> The name of DLM on openSUSE is libdlm.
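
(Once dlm_controld from such a package is in place, the agent is
typically run as a clone; purely as an illustration to be adjusted
for your setup, the command commonly takes a shape like:

  pcs resource create dlm ocf:pacemaker:controld op monitor interval=30s clone interleave=true ordered=true

possibly with on-fail=fence on the monitor once fencing is configured.)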

-- 
Jan (Poki)




[ClusterLabs] [Announce] clufter v0.58.0 released

2016-07-15 Thread Jan Pokorný
I am happy to announce that clufter, a tool/library for transforming
and analyzing cluster configuration formats, got its version 0.58.0
released and published (incl. signature using my 60BCBB4F5CD7F9EF key):


or alternative (original) location:



The test suite for this version is also provided:

or alternatively:


Changelog highlights for v0.58.0:
- this is a feature extension and possibly a bug fix release
- bug fixes:
  . the upstream-suggested (meta) specfile and its form of advising
how to run Python intepreter (added with previous release)
turned to cause issues if setuptools are not recent enough,
so double-check that no extra double-quoting is injected
in any case
[https://github.com/pypa/setuptools/issues/188]
  . some internal-only negligence was fixed to match the design
intentions (may affect too relaxed 3rd party plugins)
- feature extensions:
  . pcs2pcscmd* commands are now aware of quorum device configured
in corosync.conf and are able to emit respective configuration
commands using pcs tool

* * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic git archives preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli).

Issues & suggestions can be reported at either of (regardless if Fedora)
,
.


Happy clustering/high-availing :)

-- 
Jan (Poki)




Re: [ClusterLabs] Clusvcadm -Z substitute in Pacemaker

2016-07-13 Thread Jan Pokorný
On 13/07/16 12:50 +0200, emmanuel segura wrote:
> using pcs resource unmanage leaves the resource's monitoring active, so I
> usually set the monitor interval=0 :)

Some time ago, I filed a bug against pcs for it to perform these
two steps in one go: https://bugzilla.redhat.com/1303969
This is slowly becoming a recurring topic.
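
Until then, the two manual steps might look along these lines (the
resource name is made up, and the exact pcs syntax for silencing the
monitor can differ between versions):

$ pcs resource unmanage my-db
$ pcs resource update my-db op monitor interval=0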

-- 
Jan (Poki)




Re: [ClusterLabs] [Announce] clufter-0.57.0 (+0.56.3) released

2016-07-07 Thread Jan Pokorný
On 07/07/16 15:16 +0200, Jan Pokorný wrote:
> (somehow I managed to mangle the changelog, also in the tag message
> where I just copied the lines from, so to reconcile that,
> modifications inline below)

(gotta switch from holiday mode again)

> On 01/07/16 22:57 +0200, Jan Pokorný wrote:
>> Changelog highlights for v0.56.3:
>> - this is a bug fix release
>> - bug fixes:
>>   . with *2pcscmd* commands, clufter no longer suggests
>> "pcs cluster cib  --config" that doesn't currently work for
>> subsequent local-modification pcs commands (which is the purpose
>> together with sequence-crowning cib-push in this context), so
>> rather use mere "pcs cluster cib "
> 
>  [resolves: rhbz#1328078]

v-- actually this was a mis-paste from v0.56.1

>  . with *2pcscmd* commands, clufter no longer chokes on
> 
>> validation failures (unless --nocheck provided) due to source CIB
>> file using newer "validate-with" validation version specification
>> than the only supported so far (pacemaker-1.2.rng) or possibly
>> using a syntax not compatible with that; now also 2.0, 2.3 and 2.4
>> versions are supported, and the specfile is ready to borrow the
>> schemas from the installed pacemaker on-the-fly during a build stage

^-- actually this was a mis-paste from v0.56.1

>>   . with [cp]cs2pcscmd commands, clufter no longer suggests
>> "pcs cluster start --all --wait=-1"  as part of the emitted command
>> sequence  (last option decides, through a failure, whether pcs accepts
>> a numeric argument there, which would then make the rest of sequence
>> use this recent, more elegant provision of pcs instead of "sleep")
>> without suppressing both standard and error outputs so as to prevent
>> unnecessary clutter with newer, compatible versions of pcs

-- 
Jan (Poki)




Re: [ClusterLabs] Antw: Doing reload right

2016-07-07 Thread Jan Pokorný
On 05/07/16 10:07 -0500, Ken Gaillot wrote:
> On 07/04/2016 03:52 AM, Vladislav Bogdanov wrote:
>> 01.07.2016 18:26, Ken Gaillot wrote:
>> 
>> [...]
>> 
>>> You're right, "parameters" or "params" would be more consistent with
>>> existing usage. "Instance attributes" is probably the most technically
>>> correct term. I'll vote for "reload-params"
>> 
>> May be "reconfigure" fits better? This would at least introduce an
>> action name which does not intersect with LSB/systemd/etc.
>> 
>> "reload" is for service itself as admin would expect, "reconfigure" is
>> for its controlling resource.
>> 
>> [...]
> 
> I like "reconfigure", but then the new parameter attribute should be
> "reconfigurable", which could be confusing. "All my parameters are
> reconfigurable!" I suppose we could call the attribute
> "change_triggers_reconfigure".

# old contestants

reload-attributes
reload-parameters
reload-params   reloadable

reconfigure reconfigurable

# new contestants
# (in bigger distance from "reload" and in some cases with a slight
# personification of "agent")

reflect reflectable
reconcile   reconcilable
consult consultable # Prolog influence
pullpullable# git/pcs/... influence
sinksinkable# as in "sink in"
comprehend  comprehensible

[add yours for this next round of brainstorming]

-- 
Jan (Poki)




Re: [ClusterLabs] [Announce] clufter-0.57.0 (+0.56.3) released

2016-07-07 Thread Jan Pokorný
(somehow I managed to mangle the changelog, also in the tag message
where I just copied the lines from, so to reconcile that,
modifications inline below)

On 01/07/16 22:57 +0200, Jan Pokorný wrote:
> Changelog highlights for v0.56.3:
> - this is a bug fix release
> - bug fixes:
>   . with *2pcscmd* commands, clufter no longer suggests
> "pcs cluster cib  --config" that doesn't currently work for
> subsequent local-modification pcs commands (which is the purpose
> together with sequence-crowning cib-push in this context), so
> rather use mere "pcs cluster cib "

 [resolves: rhbz#1328078]
 . with *2pcscmd* commands, clufter no longer chokes on

> validation failures (unless --nocheck provided) due to source CIB
> file using newer "validate-with" validation version specification
> than the only supported so far (pacemaker-1.2.rng) or possibly
> using a syntax not compatible with that; now also 2.0, 2.3 and 2.4
> versions are supported, and the specfile is ready to borrow the
> schemas from the installed pacemaker on-the-fly during a build stage

 [resolves: rhbz#1300014]

>   . with [cp]cs2pcscmd commands, clufter no longer suggests
> "pcs cluster start --all --wait=-1"  as part of the emitted command
> sequence  (last option decides, through a failure, whether pcs accepts
> a numeric argument there, which would then make the rest of sequence
> use this recent, more elegant provision of pcs instead of "sleep")
> without suppressing both standard and error outputs so as to prevent
> unnecessary clutter with newer, compatible versions of pcs

-- 
Jan (Poki)




[ClusterLabs] [Announce] clufter-0.57.0 (+0.56.3) released

2016-07-01 Thread Jan Pokorný
I am happy to announce that clufter-0.57.0, a tool/library for
transforming/analyzing cluster configuration formats, has been
released and published (incl. signature using my 60BCBB4F5CD7F9EF key,
expiration of which was prolonged just a few days back so you may
want to consult key servers first):


or alternative (original) location:



The test suite for this version is also provided:

or alternatively:


[interpolate the same for v0.56.3 that arrived just a little bit earlier
and is detailed below as well for completeness]

Changelog highlights for v0.56.3:
- this is a bug fix release
- bug fixes:
  . with *2pcscmd* commands, clufter no longer suggests
"pcs cluster cib  --config" that doesn't currently work for
subsequent local-modification pcs commands (which is the purpose
together with sequence-crowning cib-push in this context), so
rather use mere "pcs cluster cib "
validation failures (unless --nocheck provided) due to source CIB
file using newer "validate-with" validation version specification
than the only supported so far (pacemaker-1.2.rng) or possibly
using a syntax not compatible with that; now also 2.0, 2.3 and 2.4
versions are supported, and the specfile is ready to borrow the
schemas from the installed pacemaker on-the-fly during a build stage
[resolves: rhbz#1328078]
  . with [cp]cs2pcscmd commands, clufter no longer suggests
"pcs cluster start --all --wait=-1"  as part of the emitted command
sequence  (last option decides, through a failure, whether pcs accepts
a numeric argument there, which would then make the rest of sequence
use this recent, more elegant provision of pcs instead of "sleep")
without suppressing both standard and error outputs so as to prevent
unnecessary clutter with newer, compatible versions of pcs

Changelog highlights for v0.57.0:
- this is a feature extension and bug fix release
- bug fixes:
  . with *2pcscmd* commands, clufter would previously emit doubled
"pcs" at the beginning for the command defining simple order
constraint
  . with *2pcscmd* commands, clufter would previously omit and/or
logic operators between each pair of atomic expressions
forming a rule for location constraint
  . with  *2pcscmd* commands, clufter would previously disregard
master/slave roles correctly encoded with a capitalized first
letter in CIB for colocation and location constraints
- feature extensions:
  . with *2pcscmd* commands, clufter now supports resource sets
for colocation and order constraints
  . with *2pcscmd* commands, clufter now supports ticket contraints
(incl. resource sets)

* * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic archives by GitHub preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli).

Issues & suggestions can be reported at either of (regardless if Fedora)
,

(rather than ).


Happy clustering/high-availing :)

-- 
Jan (Poki)




Re: [ClusterLabs] Antw: Doing reload right

2016-07-01 Thread Jan Pokorný
On 01/07/16 09:23 +0200, Ulrich Windl wrote:
 Ken Gaillot  schrieb am 30.06.2016 um 18:58 in 
 Nachricht
> <57754f9f.8070...@redhat.com>:
>> I've been meaning to address the implementation of "reload" in Pacemaker
>> for a while now, and I think the next release will be a good time, as it
>> seems to be coming up more frequently.
>> 
>> In the current implementation, Pacemaker considers a resource parameter
>> "reloadable" if the resource agent supports the "reload" action, and the
>> agent's metadata marks the parameter with "unique=0". If (only) such
>> parameters get changed in the resource's pacemaker configuration,
>> pacemaker will call the agent's reload action rather than the
>> stop-then-start it usually does for parameter changes.
>> 
>> This is completely broken for two reasons:
> 
> I agree ;-)
> 
>> 
>> 1. It relies on "unique=0" to determine reloadability. "unique" was
>> originally intended (and is widely used by existing resource agents) as
>> a hint to UIs to indicate which parameters uniquely determine a resource
>> instance. That is, two resource instances should never have the same
>> value of a "unique" parameter. For this purpose, it makes perfect sense
>> that (for example) the path to a binary command would have unique=0 --
>> multiple resource instances could (and likely would) use the same
>> binary. However, such a parameter could never be reloadable.
> 
> I thought unique=0 were reloadable (unique=1 were not)...

I see a doubly-distorted picture here:
- actually, "unique=1" on an RA parameter (together with that RA supporting
  "reload") is what currently leads to reload-on-change
- also, the provided example shows why reload for "unique=0" would be wrong,
  but as the opposite applies in the current state, it's not an argument
  for why something is broken

See also:
https://github.com/ClusterLabs/pacemaker/commit/2f5d44d4406e9a8fb5b380cb56ab8a70d7ad9c23

>> 2. Every known resource agent that implements a reload action does so
>> incorrectly. Pacemaker uses reload for changes in the resource's
>> *pacemaker* configuration, while all known RAs use reload for a
>> service's native reload capability of its own configuration file. As an
>> example, the ocf:heartbeat:named RA calls "rndc reload" for its reload
>> action, which will have zero effect on any pacemaker-configured
>> parameters -- and on top of that, the RA uses "unique=0" in its correct
>> UI sense, and none of those parameters are actually reloadable.

(per the last subclause, this applies also, after the mentioned inversion,
to "unique=1" parameters such as a pid file path, which cannot be
reloadable for obvious reasons)

> Maybe LSB confusion...

That's not an entirely fair vindication: when you have to perform some
extra actions with parameters in the LSB-aliased "start" action of the
RA, you should reflect them in "reload" as well.

>> My proposed solution is:
>> 
>> * Add a new "reloadable" attribute for resource agent metadata, to
>> indicate reloadable parameters. Pacemaker would use this instead of
>> "unique".
> 
> No objections if you change the XML metadata version number this time ;-)

Good point, but I guess everyone's a bit scared to open this Pandora's
box, as there's so much technical debt connected to it (unifying FA/RA
metadata if possible, adding new UI-oriented annotations, pacemaker's
silent additions like the "private" parameter).
I'd imagine an established authority for OCF matters (and maintaining
https://github.com/ClusterLabs/OCF-spec) and an at least partly formalized
process inspired by Python PEPs for coordinated development:
https://www.python.org/dev/peps/pep-0001/



>> * Add a new "reload-options" RA action for the ability to reload
>> Pacemaker-configured options. Pacemaker would call this instead if "reload".
> 
> Why not "reload-parameters"?

That came to my mind as well.  Or, to avoid wasting time/space on too many
letters, just "reload-params", perhaps.



>> * Formalize that "reload" means reload the service's own configuration,
>> legitimizing the most common existing RA implementations. (Pacemaker
>> itself will not use this, but tools such as crm_resource might.)
> 
> Maybe be precise what your "reload-options" is expected to do,
> compared to the "reload" action.  I'm still a bit confused. Maybe a
> working example...

IIUIC, reload-options should first reflect the parameters just as when
"start" is invoked, then delegate the responsibility to something
that triggers as native a reload as possible (which, as mentioned,
is commonly [and problematically] implemented directly in the current
"reload" actions of common RAs).

>> * Review all ocf:pacemaker and ocf:heartbeat agents to make sure they
>> use unique, reloadable, reload, and reload-options properly.
>> 
>> The downside is that this breaks backward compatibility. Any RA that
>> actually implements unique and reload so that reload works will lose
>> reload capability until it is updated to the new style.
> 
> Maybe there's a solution that is even simpler: Keep the action name,
> but 

Re: [ClusterLabs] [corosync][pacemaker] runtime issues, while it builds OK

2016-07-01 Thread Jan Pokorný
On 30/06/16 11:26 +0200, Bogdan Dobrelya wrote:
> On 06/30/2016 11:16 AM, Bogdan Dobrelya wrote:
>> I'm building the libqb -> corosync -> pacemaker and commit each as a
>> separate docker container layer. Then I run the latter and spawn a
>> corosync with a simple config, then a pacemaker instance.
>> 
>> The issue is that when I just run the processes, pacemaker complains
>> it has ipc issues (strace dump):
>> 
>> connect(6, {sa_family=AF_LOCAL, sun_path=@"corosync.ipc"}, 110) = -1
>> ECONNREFUSED (Connection refused)
>> 
>> According to that I've found on that topic, it seems related to the
>> wrong libqb/corosync versions the pacemaker app was build for.
>> 
>> BUT, if I first *rebuild* libqb -> corosync -> pacemaker, then
>> *relaunch* corosync -> pacemaker, it works like a charm! I have no
>> explanation for that strange magic. Any ideas why it is so?
>> 
>> Ofc, I can w/a that by a containers entry point like:
>> a corosync: build libqb -> corosync, then launch
>> a pacemaker: build libqb -> corosync -> pacemaker, then launch
>> 
>> But that'd be really ugly band-aiding and very slow to start as well.
>> 
>> Note, the libqb/corosync builds are from signed tarballs, the
>> pacemaker - from its github tag checked out, as I want to "play" with the
>> latest builds from trunk.
>> 
> 
> Hm, it seems the issue is with the artifacts make produces when I
> execute a libqb->corosync->pacemaker chain. If I build containers from
> mounted repos, throwing them out at the commit stage and leaving only
> the resulting artifacts, I have that strange behavior I've described
> above. But if I commit the build dirs into containers as well, it starts
> working w/o additional runtime rebuilds!
> 
> I thought the artifacts make produces are self contained. It seems they
> are not :(

Are you launching corosync + pacemaker directly from their build
directories (with some LD_LIBRARY_PATH magic so that the necessary
libraries are found) or do you "make install" for all three and go
from here?
Is there anything else that might be considered nonstandard?

The only idea at the moment would be to compare ./configure outputs
in a lucky and unlucky cases.

-- 
Jan (Poki)




Re: [ClusterLabs] Pacemaker 1.1.15 released

2016-06-22 Thread Jan Pokorný
On 21/06/16 18:20 -0500, Ken Gaillot wrote:
> ClusterLabs is proud to announce the latest release of the Pacemaker
> cluster resource manager, version 1.1.15. The source code is available at:
> 
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.15

Once again, to check this final release out using Fedora/EPEL builds,
there's a COPR link for your convenience (you can also stick with
repo file downloaded from Overview page and install the packages
in a common way):

https://copr.fedorainfracloud.org/coprs/jpokorny/pacemaker/
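
On Fedora with the dnf copr plugin at hand, enabling it can be as
simple as this (just a sketch; on EPEL you'd rather go the downloaded
repo file route):

$ dnf copr enable jpokorny/pacemaker
$ dnf install pacemaker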

Fedora rawhide will be updated shortly, F23 + F24 will follow.


I'd also like to point out that there's a build hardening PR in the
pipeline (only last two patches are interesting):
https://github.com/ClusterLabs/pacemaker/pull/1081
(COPR build under jpokorny/pacemaker-testing:
https://copr.fedorainfracloud.org/coprs/jpokorny/pacemaker-testing/build/360934/)
so any feedback on that, especially if you package Pacemaker for
distro not covered in COPR, is welcome.

-- 
Jan (Poki)




Re: [ClusterLabs] Alert notes

2016-06-17 Thread Jan Pokorný
On 15/06/16 18:45 +0200, Klaus Wenninger wrote:
> On 06/15/2016 06:11 PM, Ferenc Wágner wrote:
>> Did you think about filtering the environment variables passed to the
>> alert scripts?  NOTIFY_SOCKET probably shouldn't be present, and PATH
>> probably shouldn't contain sbin directories; I guess all these are
>> inherited from systemd in my case.
> 
> It is just what crmd comes along with ... but interesting point ...

... and having the Shellshock vulnerability in mind, also a little bit
worrying (yes, even nowadays).

(that being said, I've already presented my subversive opinion that
shell introduces more headaches than is reasonable: using it may be
the most natural choice with almost no barriers to entry, but it's actually
quite hard to make scripts bullet-proof; say, the chances that a script will
be derailed just by a space-containing [not talking about quotes] parameter
are quite high: http://clusterlabs.org/pipermail/users/2015-May/000403.html)
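
A contrived illustration of that quoting point, nothing alert-specific
about it (the logger tag is made up):

  # fragile: unquoted expansion undergoes word splitting and globbing
  logger -t cluster-alert $1
  # robust: quoting keeps the parameter intact even with embedded spaces
  logger -t cluster-alert "$1"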

-- 
Jan (Poki)




Re: [ClusterLabs] dovecot RA

2016-06-08 Thread Jan Pokorný
On 07/06/16 14:48 -0500, Dimitri Maziuk wrote:
> next question: I'm on centos 7 and there's no more /etc/init.d/<anything>.
> With lennartware spreading, is there a coherent plan to deal
> with former LSB agents?

Pacemaker has been able to drive systemd-managed services for quite some time.

Provided that the project/daemon you care about carries the unit
file, you can use that unless there are distinguished roles for the
provided service within the cluster (like primary+replicas), there's
a need to run multiple varying instances of the same service,
or other cluster-specific features are desired.

For dovecot, I can see:
# rpm -ql dovecot | grep \.service
/usr/lib/systemd/system/dovecot.service 
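
so the generic approach could boil down to something like this (just
a sketch; the monitor interval is an arbitrary choice of mine):

$ pcs resource create dovecot systemd:dovecot op monitor interval=30s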

> Specifically, should I roll my own RA for dovecot or is there one in the
> works somewhere?

If you miss something with the generic approach per above and there's
no fitting open-source RA around, then rolling your own is probably
your last resort.

For instance, there was once an agent written in C (highly unusual),
but seems abandoned a long time ago:
https://github.com/perrit/dovecot-ocf-resource-agent

-- 
Jan (Poki)




Re: [ClusterLabs] Error "xml does not conform to the schema" upon "pcs cluster standby" command

2016-06-03 Thread Jan Pokorný
Hello Nikhil,

On 03/06/16 16:33 +0530, Nikhil Utane wrote:
> The node is up alright.
> 
> [root@airv_cu pcs]# pcs cluster status
> Cluster Status:
>  Stack: corosync
>  Current DC: airv_cu (version 1.1.14-5a6cdd1) - partition WITHOUT quorum
>  Last updated: Fri Jun  3 11:01:32 2016 Last change: Fri Jun  3
> 09:57:52 2016 by hacluster via crmd on airv_cu
>  2 nodes and 0 resources configured
> 
> Upon entering command "pcs cluster standby airv_cu" getting below error.
> Error: cannot load cluster status, xml does not conform to the schema.
> 
> What could be wrong?

if you have decently recent versions of both pacemaker and pcs (ca. 3
months old or newer), it's entirely possible that this commit will
resolve it for you on the pacemaker side:

https://github.com/ClusterLabs/pacemaker/pull/1040/commits/87a82a165ccacaf1a0c48b5e1fad684a8dd2d8c9

I'm just about to provide an update to the expected test results, and then
it (the whole pull request) is expected to land soon after that.

-- 
Jan (Poki)




Re: [ClusterLabs] Can't get nfs4 to work.

2016-06-02 Thread Jan Pokorný
On 02/06/16 02:35 +0200, Dennis Jacobfeuerborn wrote:
> On 01.06.2016 20:25, Stephano-Shachter, Dylan wrote:
>> I have just finished setting up my HA nfs cluster and I am having a small
>> problem. I would like to have nfs4 working but whenever I try to mount I
>> get the following message,
>> 
>> mount: no type was given - I'll assume nfs because of the colon
> 
> I'm not sure if the type "nfs" is supposed to work with v4 as well but
> on my systems the mounts use the explicit type "nfs4" so you can try
> mounting with "-t nfs4".

$ rpm -qf $(man -w mount.nfs)
> nfs-utils-1.3.3-7.rc4.fc22.x86_64

$ man mount.nfs | fmt -w70 | grep -A2 Under
>   Under Linux 2.6.32 and later kernel versions, mount.nfs can
>   mount all NFS file system versions.  Under earlier Linux
>   kernel versions, mount.nfs4 must be used  for mounting NFSv4
>   file systems while mount.nfs must be used for NFSv3 and v2.
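
For instance, on such a kernel either of these should do when you want
to explicitly ask for v4 (server and paths are placeholders only):

$ mount -t nfs -o vers=4 server:/export /mnt
$ mount -t nfs4 server:/export /mnt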

-- 
Jan (Poki)




Re: [ClusterLabs] fence_sanlock and pacemaker

2016-05-30 Thread Jan Pokorný
[adding sanlock-devel ML into the loop as deemed appropriate]

On 18/05/16 12:49 +, Da Shi Cao wrote:
> After some trial and error, fence_sanlock can be used as a stonith
> resource in pacemaker+corosync.
> 1. Add a "monitor" action, which is exactly the same action as
> "status".

Per documented API (already pacemaker-aware) for fence agents:
https://fedorahosted.org/cluster/wiki/FenceAgentAPI#agent_ops

Also it looks like the resource relying on this fence agent should
declare "provides='unfencing'" as its meta-attribute(?),
but that falls into the configuration domain.
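
With pcs, such a declaration might look roughly like this (the resource
name, the agent options placeholder, and whether the meta-attribute is
actually honored here are all assumptions on my side):

$ pcs stonith create sanlock-fence fence_sanlock [agent options...] \
      meta provides=unfencing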

> 2. Make "status" action return "false" if a resource belongs to a
> host is acquired and owned by another host. It returned "true"
> erroneously since it didn't make a test on the owner id of a
> resource in version 3.3.0.

Might be a bug?

(I only have a very vague idea how sanlock works)

> 3. Make fence_sanlockd try several times before it fails if the
> resource for a host is owned by another host. This gives a time
> window for the resource to be released manually at the other host.
> 
> Sometimes a resource of a host gets locked permanently by another
> host if the "off" action failed, often due to a timeout.

Can you remove the ambiguities in the above sentence (which of the
machines did the "off" action fail on, "another host"?), please?

(again, not that I will be very helpful afterwards, but someone could)

-- 
Jan (Poki)




Re: [ClusterLabs] Pacemaker 1.1.15 - Release Candidate 3

2016-05-27 Thread Jan Pokorný
On 27/05/16 16:21 -0500, Ken Gaillot wrote:
> The third release candidate for Pacemaker version 1.1.15 is now
> available at:
> 
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.15-rc3

Once again, to check this release candidate out using Fedora/EPEL builds,
there's a COPR link for your convenience (you can also stick with
repo file downloaded from Overview page and install the packages
in a common way):

https://copr.fedorainfracloud.org/coprs/jpokorny/pacemaker/

Fedora rawhide will be updated shortly and the -rc3 build may also
take a few minutes (from now) to finish.

> Perhaps the most visible change since 1.1.15-rc2 is that many log
> messages have been made more user-friendly. Partly this is due to taking
> advantage of the "extended information logging" feature of libqb 0.17.2
> and greater (if installed on the build machine); the same message may be
> logged to the system log without developer-oriented details, and to the
> pacemaker detail log with the extra detail.
> 
> This release candidate includes multiple bugfixes since 1.1.15-rc2, most
> importantly:
> 
> * In 1.1.14, the controld resource agent was modified to return a
> monitor error when DLM is in the "wait fencing" state. This turned out
> to be too aggressive, resulting in fencing the monitored node
> unnecessarily if a slow fencing operation against another node was in
> progress. The agent now does additional checking to determine whether to
> return an error or not.
> 
> * A bug introduced in 1.1.14, resulting in the have-watchdog property
> always being set to true, has been fixed. The cluster now properly
> checks for a running sbd process.
> 
> * A regression introduced in 1.1.15-rc1 has been fixed. When a node ID
> is reused, attrd would have problems setting attributes for the new node.
> 
> Everyone is encouraged to download, compile and test the new release.
> Your feedback is important and appreciated. I am aiming for one more
> release candidate, with the final release in mid- to late June.

-- 
Jan (Poki)




Re: [ClusterLabs] cluster stops randomly

2016-05-23 Thread Jan Pokorný
On 21/05/16 04:46 +, H Yavari wrote:
> I have a cluster and it works well, but sometimes I see that the cluster
> is stopped on all nodes and I have to start it manually. The pcsd service
> is running but the cluster is stopped. I checked the pacemaker log but I
> couldn't find any warning or error. What is the issue?
> (stonith is disabled.)

- stonith/fencing disabled or not set up means high risk rather than high
  availability in the majority of cases

- is "cluster was started and then stopped inadvertently" what you mean?

- please provide the part of the log around the moment the cluster ceased
  to work properly, plus the cluster's configuration (we are not good at
  telepathic remote access yet)

-- 
Jan (Poki)




Re: [ClusterLabs] Pacemaker 1.1.15 - Release Candidate 2

2016-05-16 Thread Jan Pokorný
On 16/05/16 10:48 -0500, Ken Gaillot wrote:
> The second release candidate for Pacemaker version 1.1.15 is now
> available at:
> 
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.15-rc2
> 
> The most interesting changes since 1.1.15-rc1 are:
> 
> * With the new "alerts" feature, the "tstamp_format" attribute has been
> renamed to "timestamp-format" and properly defaults to "%H:%M:%S.%06N".
> 
> * A regression introduced in 1.1.15-rc1 has been fixed. After a cluster
> partition, node attribute values might not be properly re-synchronized
> among nodes.
> 
> * The SysInfo resource now automatically sets the #health_disk node
> attribute back to "green" if free disk space recovers after becoming too
> low.
> 
> * Other minor bug fixes.

Once again, to check this release candidate out using Fedora/EPEL builds,
there's a COPR link[*] for your convenience (you can also stick with
repo file downloaded from Overview page and install the packages
in a common way):

https://copr.fedorainfracloud.org/coprs/jpokorny/pacemaker/

Fedora rawhide will be updated shortly.

> Everyone is encouraged to download, compile and test the new release.
> Your feedback is important and appreciated. I am aiming for one or two
> more release candidates, with the final released in mid- to late June.

-- 
Jan (Poki)




Re: [ClusterLabs] Trouble with deb packaging from 1.12 to 1.15

2016-05-16 Thread Jan Pokorný
Hello Andrey,

On 16/05/16 18:50 +0300, Andrey Rogovsky wrote:
> I have deb rules coming from 1.12 and am trying to apply them to the
> current release.  During the build I get an error:
> dh_testroot -a
> rm -rf `pwd`/debian/tmp/usr/lib/service_crm.so
> rm -rf `pwd`/debian/tmp/usr/lib/service_crm.la
> rm -rf `pwd`/debian/tmp/usr/lib/service_crm.a
> dh_install --sourcedir=debian/tmp --list-missing
> dh_install: pacemaker missing files (usr/lib*/heartbeat/attrd), aborting
> 
> I checked the buildroot - this directory and the symlinks are missing.
> Is this correct? Maybe I need to add them manually?

I think your best bet is joining forces with Debian HA maintainers:
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/debian-ha-maintainers

(I know some of them follow this very list as well, but your question
seems too specific to a particular packaging method)

-- 
Jan (Poki)




Re: [ClusterLabs] help compiling pacemaker 1.1 on Amazon Linux

2016-05-06 Thread Jan Pokorný
On 06/05/16 16:06 +0200, Jan Pokorný wrote:
> On 06/05/16 14:52 +0100, Jim Rippon wrote:
>> Thank you guys,
>> 
>> I've taken your advice onboard and checked out 1.1.15-rc1 including the
>> patches Lars mentions as Jan suggested - I see another warning with fatal
>> warnings enabled:
>> 
>> gmake[2]: Entering directory `/home/jrippon/pacemaker/lib/cluster'
>>   CC   election.lo
>>   CC   cluster.lo
>>   CC   membership.lo
>>   CC   cpg.lo
>>   CC   legacy.lo
>> legacy.c:55:13: error: 'valid_cman_name' defined but not used
>> [-Werror=unused-function]
>>  static bool valid_cman_name(const char *name, uint32_t nodeid)
>>  ^
>> cc1: all warnings being treated as errors
>> gmake[2]: *** [legacy.lo] Error 1
>> gmake[2]: Leaving directory `/home/jrippon/pacemaker/lib/cluster'
>> gmake[1]: *** [all-recursive] Error 1
>> gmake[1]: Leaving directory `/home/jrippon/pacemaker/lib'
>> make: *** [core] Error 1
>> 
>> This is, of course, only a warning with fatal warnings disabled as Jan
>> suggested - I can build successfully with that function commented without
>> other errors.
>> 
>> I'm happy to trial 1.1.15-rc1 built with fatal warnings disabled for now -
>> my C foo isn't good enough to debug the problem further, but if I can be of
>> any help getting that warning fixed for rc2 I'd be happy to assist :)
> 
> http://clusterlabs.org/pipermail/users/2016-May/002902.html

Damn clipboards:
https://github.com/ClusterLabs/pacemaker/pull/994

Happy Friday!

-- 
Jan (Poki)




Re: [ClusterLabs] help compiling pacemaker 1.1 on Amazon Linux

2016-05-06 Thread Jan Pokorný
On 06/05/16 14:52 +0100, Jim Rippon wrote:
> Thank you guys,
> 
> I've taken your advice onboard and checked out 1.1.15-rc1 including the
> patches Lars mentions as Jan suggested - I see another warning with fatal
> warnings enabled:
> 
> gmake[2]: Entering directory `/home/jrippon/pacemaker/lib/cluster'
>   CC   election.lo
>   CC   cluster.lo
>   CC   membership.lo
>   CC   cpg.lo
>   CC   legacy.lo
> legacy.c:55:13: error: 'valid_cman_name' defined but not used
> [-Werror=unused-function]
>  static bool valid_cman_name(const char *name, uint32_t nodeid)
>  ^
> cc1: all warnings being treated as errors
> gmake[2]: *** [legacy.lo] Error 1
> gmake[2]: Leaving directory `/home/jrippon/pacemaker/lib/cluster'
> gmake[1]: *** [all-recursive] Error 1
> gmake[1]: Leaving directory `/home/jrippon/pacemaker/lib'
> make: *** [core] Error 1
> 
> This is, of course, only a warning with fatal warnings disabled as Jan
> suggested - I can build successfully with that function commented without
> other errors.
> 
> I'm happy to trial 1.1.15-rc1 built with fatal warnings disabled for now -
> my C foo isn't good enough to debug the problem further, but if I can be of
> any help getting that warning fixed for rc2 I'd be happy to assist :)

http://clusterlabs.org/pipermail/users/2016-May/002902.html

-- 
Jan (Poki)




Re: [ClusterLabs] help compiling pacemaker 1.1 on Amazon Linux

2016-05-06 Thread Jan Pokorný
On 06/05/16 13:31 +0200, Lars Ellenberg wrote:
> On Fri, May 06, 2016 at 11:51:25AM +0100, Jim Rippon wrote:
>> I'm new to the list so apologies if I'm way off base, but I wonder
>> if someone can help me please?
>> 
>> I'm looking to build private RPMs for pacemaker and any dependencies
>> for Amazon Linux because I have as yet been unable to get the CentOS
>> binaries to install successfully.
>> 
>> I've got libqb and libqb-devel RPMs built successfully (also not
>> available in amazon linux repos) and installed on my build instance,
>> but I am getting errors when making pacemaker from tag
>> Pacemaker-1.1.14
>> 
>> I am doing the following:
>> 
>>  git checkout --force Pacemaker-1.1.14
>>  git clean -fdx
>>  ./autogen.sh
>>  ./configure # with options suggested by autogen.sh
>>  make
>> 
>> I then receive the following errors from lib/common and the make fails:
>> 
>>  In file included from ../../include/crm_internal.h:33:0,
>>  from ipc.c:19:
>>  ipc.c: In function 'crm_ipcs_flush_events':
>>  ../../include/crm/common/logging.h:140:23: error: format '%d'
>> expects argument of type 'int', but argument 10 has type 'ssize_t'
>> [-Werror=format=]
> 
> 
> That's "just" a format error about ssize_t != int.
> See also
> https://github.com/ClusterLabs/pacemaker/commit/fc87717
> where I already fixed this (and other) format errors.
> 
> Of course you could also drop the -Werror,
> and hope that most of the time, for your platform,
> gcc will do something useful still.

This would be done by passing "--disable-fatal-warnings" to the
./configure invocation.
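
That is, roughly (any other configure options per your environment):

$ ./autogen.sh
$ ./configure --disable-fatal-warnings
$ make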

Note that such build failures will also go away once you try
with libqb <= v1.0rc1, i.e., anything that doesn't include
https://github.com/ClusterLabs/libqb/pull/175 (736e2c8).

But both will just fool yourself into thinking that Pacemaker
always passes logging parameters per the format strings (and
hence segfaults due to such mishandlings are not possible),
which is not the case until the changeset Lars referenced.

So actually using recently released Pacemaker-1.1.15-rc1
(which includes that) might be a better idea.

Hope this helps.

-- 
Jan (Poki)




Re: [ClusterLabs] Unable to run Pacemaker: pcmk_child_exit

2016-05-06 Thread Jan Pokorný
On 06/05/16 16:59 +0530, Nikhil Utane wrote:
> 
> [...]
> 
>> On 05/06/2016 12:40 PM, Nikhil Utane wrote:
>>> As I am cross-compiling pacemaker on a build machine and later moving
>>> the binaries to the target, a few binaries were missing. After fixing
>>> that and a bunch of other errors/warnings, I am able to get pacemaker
>>> started, though it is not completely running fine.

> As I mentioned, I am cross-compiling and copying the relevant files
> onto the target platform.

I am afraid you are doing the "install" step of deploying from sources
across the machines utterly wrong.

> In one of the earlier runs pacemaker complained about not finding
> /usr/share/pacemaker/pacemaker-1.0.rng.

What else would you expect if you believe you can get by with moving
binaries and sorting the relevant files out by hand?  That doesn't really
scale and is error-prone, leading to more time spent on guesstimating
the authoritative installation recipe that's already there (see below).

> I found this file under xml folder in the build folder, so I copied all the
> files under xml folder onto the target.
> Did that screw it up?
> 
> This is the content of the folder:
> [root@airv_cu pacemaker]# ls /usr/share/pacemaker/
> Makefile  constraints-2.1.rng   nodes-1.0.rng
> pacemaker-2.1.rng rule.rng
> Makefile.am   constraints-2.2.rng   nodes-1.2.rng
> pacemaker-2.2.rng score.rng
> Makefile.in   constraints-2.3.rng   nodes-1.3.rng
> pacemaker-2.3.rng status-1.0.rng
> Readme.md constraints-next.rng  nvset-1.3.rng
> pacemaker-2.4.rng tags-1.3.rng
> acls-1.2.rng  context-of.xslnvset.rng
> pacemaker-next.rngupgrade-1.3.xsl
> acls-2.0.rng  crm-transitional.dtd  ocf-meta2man.xsl
>  pacemaker.rng upgrade06.xsl
> best-match.sh crm.dtd   options-1.0.rng
> regression.core.shversions.rng
> cib-1.0.rng   crm.xsl   pacemaker-1.0.rng
> regression.sh
> cib-1.2.rng   crm_mon.rng   pacemaker-1.2.rng
> resources-1.0.rng
> constraints-1.0.rng   fencing-1.2.rng   pacemaker-1.3.rng
> resources-1.2.rng
> constraints-1.2.rng   fencing-2.4.rng   pacemaker-2.0.rng
> resources-1.3.rng

Now, you've got an overapproximation of what you really need
(e.g., context-of.xsl and best-match.sh are just helpers for developers
and make sense only from within the source tree, just as Makefile etc.
do), which is what you want to avoid, especially in the case of an
embedded board.

So now, what you should do instead is along these lines:

$ mkdir pcmk-tree
$ export CFLAGS=... CC=... # what you need for cross-compilation
$ ./configure ...
$ make && make install DESTDIR=$(pwd)/pcmk-tree
$ tar czpf pcmk-tree.tar.gz pcmk-tree

and now, distribute pcmk-tree.tar.gz to your target and untar it with
something like "-k --strip-components=1" in the / dir.

Or better yet, go the proper package management route, ideally using
the "make rpm" target (you'll have to edit pacemaker.spec or the RPM
macros on your system so as to pass the cross-compilation flags across)
and then just install the package at the target, if that's doable
in your environment.

Hope this helps.

-- 
Jan (Poki)




Re: [ClusterLabs] libqb 0.17.1 - segfault at 1b8

2016-05-02 Thread Jan Pokorný
Hello Radoslaw,

On 02/05/16 11:47 -0500, Radoslaw Garbacz wrote:
> When testing pacemaker I encountered a start error, which seems to be
> related to a reported libqb segmentation fault.
> - cluster started and acquired quorum
> - some nodes failed to connect to CIB, and lost membership as a result
> - restart solved the problem
> 
> The segmentation fault points at the libqb library in version 0.17.1,
> a standard package provided for CentOS 6.

Chances are that you are running into this nasty bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1114852

> Please let me know if the problem is known, and if  there is a remedy (e.g.
> using the latest libqb).

Try libqb >= 0.17.2.

[...]

> Logs from /var/log/messages:
> 
> Apr 22 15:46:41 (...) pacemakerd[90]:   notice: Additional logging
> available in /var/log/pacemaker.log
> Apr 22 15:46:41 (...) pacemakerd[90]:   notice: Configured corosync to
> accept connections from group 498: Library error (2)

IIRC, that last line ^ was one of the symptoms.

-- 
Jan (Poki)




Re: [ClusterLabs] Pacemaker 1.1.15 - Release Candidate 1

2016-04-26 Thread Jan Pokorný
On 22/04/16 17:54 -0500, Ken Gaillot wrote:
> ClusterLabs is happy to announce the first release candidate for
> Pacemaker version 1.1.15. Source code is available at:
> 
> https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.15-rc1

To check this release candidate out using Fedora/EPEL builds,
there's a COPR link[*] for your convenience (you can also stick with
repo file downloaded from Overview page and install the packages
in a common way):

https://copr.fedorainfracloud.org/coprs/jpokorny/pacemaker/

Fedora rawhide will be updated shortly.


[*] fedora-22-ppc64le build failed due to possibly intermittent issue
with dependencies

-- 
Jan (Poki)




Re: [ClusterLabs] getting "Totem is unable to form a cluster" error

2016-04-12 Thread Jan Pokorný
On 12/04/16 15:03 +0200, Lars Ellenberg wrote:
> On Mon, Apr 11, 2016 at 08:23:03AM +0200, Jan Friesse wrote:
> 
> ...
 bond0:  mtu 1500 qdisc noqueue state UP
   link/ether 74:e6:e2:73:e5:61 brd ff:ff:ff:ff:ff:ff
   inet 10.150.20.91/24 brd 10.150.20.55 scope global bond0
   inet 192.168.150.12/22 brd 192.168.151.255 scope global bond0:cluster
   inet6 fe80::76e6:e2ff:fe73:e561/64 scope link
  valid_lft forever preferred_lft forever
>>> 
>>> This is ifconfig output? I'm just wondering how you were able to set
>>> two ipv4 addresses (in this format, I would expect another interface
>>> like bond0:1 or nothing at all)?
> ...
> 
> No, it is "ip addr show" output.
> 
>> RHEL 6:
>> 
>> # tunctl -p
>> Set 'tap0' persistent and owned by uid 0
>> 
>> # ip addr add 192.168.7.1/24 dev tap0
>> # ip addr add 192.168.8.1/24 dev tap0
>> # ifconfig tap0
>> tap0  Link encap:Ethernet  HWaddr 22:95:B1:85:67:3F
>>   inet addr:192.168.7.1  Bcast:0.0.0.0  Mask:255.255.255.0
>>   BROADCAST MULTICAST  MTU:1500  Metric:1
>>   RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>>   TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>   collisions:0 txqueuelen:500
>>   RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
> 
> 
> # ip addr add 192.168.7.1/24 dev tap0
> # ip addr add 192.168.8.1/24 dev tap0 label tap0:jan
> # ip addr show dev tap0
> 
> And as long as you actually use those "label"s,
> you then can even see these with "ifconfig tap0:jan".

Further reading: http://inai.de/2008/02/19

-- 
Jan (Poki)




Re: [ClusterLabs] Freezing/Unfreezing in Pacemaker ?

2016-04-08 Thread Jan Pokorný
On 07/04/16 09:12 -0500, Ken Gaillot wrote:
> On 04/07/2016 06:40 AM, jaspal singla wrote:
>> As we have clusvcadm -U  and clusvcadm -Z 
>>  to freeze and unfreeze resource in CMAN. Would really appreciate if
>> someone please give some pointers for freezing/unfreezing a resource in
>> Pacemaker (pcs) as well.
> 
> The equivalent in pacemaker is "managed" and "unmanaged" resources.
> 
> The usage depends on what tools you are using. For pcs, it's "pcs
> resource unmanage " to freeze, and "manage" to unfreeze.
> At a lower level, it's setting the is-managed meta-attribute of the
> resource.

Depending on the context, such as when you don't want the resource
manager to just stop taking care of the resource/group, but rather
want it to be kept down (preventing its comeback), you can also
disable/enable instead of freeze/unfreeze.
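
For a quick illustration with pcs (the resource name is made up):

$ pcs resource unmanage my-svc    # "freeze" (note: monitoring still runs)
$ pcs resource manage my-svc      # "unfreeze"
$ pcs resource disable my-svc     # keep it down instead
$ pcs resource enable my-svc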

Admittedly, this semantic difference is somewhat lost in my artificial
cluster properties taxonomy mapped to both the RGManager and Pacemaker
approaches:

https://pagure.io/clufter/blob/master/f/__root__/doc/rgmanager-pacemaker/03.groups.txt#_219

> It's also possible to set the maintenance-mode cluster property to
> "freeze" all resources.

(Hmm, this is not covered from this perspective well either:
https://pagure.io/clufter/blob/master/f/__root__/doc/rgmanager-pacemaker/01.cluster.txt)

-- 
Jan (Poki)




Re: [ClusterLabs] [ClusterLabsu] [patch][crmsh] Rework function next_nodeid.

2016-04-06 Thread Jan Pokorný
Andrei,

On 06/04/16 14:39 +0300, Andrei Maruha wrote:
> The attached patch contains a slightly reworked next_nodeid function.
> 
> [...]

there are two better aligned channels to propose patches (ordered
by preference, at least judging based on
https://github.com/ClusterLabs/crmsh/pulls?q=is%3Apr+is%3Aclosed):

1. pull request against https://github.com/ClusterLabs/crmsh

2. patch (per git conventions, which are abided by here) sent to
   the developers@c.o ML, with an appropriately labeled topic so that
   the respective upstream folks can hook on easily (in this case,
   the prefix should contain "crmsh"; see how I modified the topic,
   along with cross-posting to the correct list)

There is a reason behind having two mailing lists, a different audience
being the most prominent one (true, devels will likely read both,
but the rest of the "users" would be better off without such traffic).

Thanks for understanding.

-- Jan

> From 56d99aa764abb2af8d638425b10a1e493d935e4b Mon Sep 17 00:00:00 2001
> From: Andrei Maruha 
> Date: Wed, 6 Apr 2016 12:33:27 +0300
> Subject: low: corosync: Don't take next node id based on max value, if some
>  smaller node id is free.
> 
> Do not assign node id equals to 'maxid + 1' in case if some node was
> removed and free node id can be reused.
> 
> diff --git a/crmsh/corosync.py b/crmsh/corosync.py
> index e9950b8..6401f52 100644
> --- a/crmsh/corosync.py
> +++ b/crmsh/corosync.py
> @@ -327,11 +327,16 @@ def diff_configuration(nodes, checksum=False):
>  utils.remote_diff(local_path, nodes)
>  
>  
> -def next_nodeid(parser):
> +def get_free_nodeid(parser):
>  ids = parser.get_all('nodelist.node.nodeid')
>  if not ids:
>  return 1
> -return max([int(i) for i in ids]) + 1
> +ids = [int(i) for i in ids]
> +max_id = max(ids) + 1
> +for i in xrange(1, max_id):
> +if i not in ids:
> +return i
> +return max_id
>  
>  
>  def get_ip(node):
> @@ -399,7 +404,7 @@ def add_node(addr, name=None):
>  p = Parser(f)
>  
>  node_addr = addr
> -node_id = next_nodeid(p)
> +node_id = get_free_nodeid(p)
>  node_name = name
>  node_value = (make_value('nodelist.node.ring0_addr', node_addr) +
>make_value('nodelist.node.nodeid', str(node_id)))
> diff --git a/test/unittests/test_corosync.py b/test/unittests/test_corosync.py
> index db8dd8c..d2a25b6 100644
> --- a/test/unittests/test_corosync.py
> +++ b/test/unittests/test_corosync.py
> @@ -5,6 +5,7 @@
>  
>  import os
>  import unittest
> +import mock
>  from crmsh import corosync
>  from crmsh.corosync import Parser, make_section, make_value
>  
> @@ -67,7 +68,7 @@ class TestCorosyncParser(unittest.TestCase):
>  p.add('nodelist',
>make_section('nodelist.node',
> make_value('nodelist.node.ring0_addr', 
> '10.10.10.10') +
> -   make_value('nodelist.node.nodeid', 
> str(corosync.next_nodeid(p)
> +   make_value('nodelist.node.nodeid', 
> str(corosync.get_free_nodeid(p)
>  _valid(p)
>  self.assertEqual(p.count('nodelist.node'), 6)
>  self.assertEqual(p.get_all('nodelist.node.nodeid'),
> @@ -75,11 +76,11 @@ class TestCorosyncParser(unittest.TestCase):
>  
>  def test_add_node_no_nodelist(self):
>  "test checks that if there is no nodelist, no node is added"
> -from crmsh.corosync import make_section, make_value, next_nodeid
> +from crmsh.corosync import make_section, make_value, get_free_nodeid
>  
>  p = Parser(F1)
>  _valid(p)
> -nid = next_nodeid(p)
> +nid = get_free_nodeid(p)
>  self.assertEqual(p.count('nodelist.node'), nid - 1)
>  p.add('nodelist',
>make_section('nodelist.node',
> @@ -89,11 +90,11 @@ class TestCorosyncParser(unittest.TestCase):
>  self.assertEqual(p.count('nodelist.node'), nid - 1)
>  
>  def test_add_node_nodelist(self):
> -from crmsh.corosync import make_section, make_value, next_nodeid
> +from crmsh.corosync import make_section, make_value, get_free_nodeid
>  
>  p = Parser(F2)
>  _valid(p)
> -nid = next_nodeid(p)
> +nid = get_free_nodeid(p)
>  c = p.count('nodelist.node')
>  p.add('nodelist',
>make_section('nodelist.node',
> @@ -101,7 +102,7 @@ class TestCorosyncParser(unittest.TestCase):
> make_value('nodelist.node.nodeid', str(nid
>  _valid(p)
>  self.assertEqual(p.count('nodelist.node'), c + 1)
> -self.assertEqual(next_nodeid(p), nid + 1)
> +self.assertEqual(get_free_nodeid(p), nid + 1)
>  
>  def test_remove_node(self):
>  p = Parser(F2)
> @@ -118,5 +119,14 @@ class TestCorosyncParser(unittest.TestCase):
>  _valid(p)
>  self.assertEqual(p.count('service.ver'), 1)
>  
> +def test_get_free_nodeid(self):
> +mock_parser = mo

Re: [ClusterLabs] Corosync do not send traffic

2016-04-04 Thread Jan Pokorný
On 30/03/16 17:09 +0200, Roberto Munoz Gomez wrote:
> I am trying to migrate the bindnetaddr option from corosync.conf to
> cluster.conf but couldn't make it work. I want to use this interface for
> corosync traffic.
> 
> inet addr:10.76.125.236  Bcast:10.76.125.239  Mask:255.255.255.240

Looking at cluster.rng schema (/var/lib/cluster/cluster.rng or the
respective counterpart tracked statically in the repository[1]),
you'll want to configure what can be described with XPath
"/cluster/totem/interface/@bindnetaddr", for instance using ccs:

# ccs --sync --exp interface 'cluster:totem' bindnetaddr=10.76.125.224
(value obtained with: ipcalc -n 10.76.125.236 255.255.255.240)

You will likely have to restart the cluster software for this change
to be taken into account.

I haven't tested this will work, it also depends on if you have
pre-existing "interface" element in that place.

> The man page of cluster.conf says the last octet must be 0.

Here, you are likely confusing cluster.conf with corosync.conf.
Or perhaps with something else entirely, as even that man page states
that the network address should be used (a zeroed last octet is only
the specific case of a /24 netmask, i.e., 255.255.255.0).

> But despite ccs_config validate saying it is ok, cman_tool fails.

ccs_config is a very weak form of validation and only performs
superficial data type checking, if that (in most of the cases).
It's more of a syntactic sanity check plus an "are only defined
resource/fence agents referred to?" detector.  Keep that in mind.

[1]
https://git.fedorahosted.org/cgit/cluster.git/tree/config/tools/xml/cluster.rng.in.head?h=cluster-3.2.0#n310

-- 
Jan (Poki)




Re: [ClusterLabs] [Announce] libqb 1.0 release

2016-04-04 Thread Jan Pokorný
On 01/04/16 13:59 +0100, Christine Caulfield wrote:
> I am very pleased to announce the 1.0 release of libqb
> 
> [...]
> 
> The current release tarball is here:
> https://github.com/ClusterLabs/libqb/releases/download/v1.0/libqb-1.0.tar.gz

One should preferably start at
https://github.com/ClusterLabs/libqb/releases/tag/v1.0
so as to observe that also XZ format tarball is available,
plus the signed checksums for both:

$ gpg --verify-files libqb-1.0.sha256.asc
$ sha256sum -c libqb-1.0.sha256


As a reminder, libqb releases used to be (primarily) located at
https://fedorahosted.org/releases/q/u/quarterback/
but the current stance is that solely the GitHub releases/downloads
section of the project serves this purpose
(https://github.com/ClusterLabs/libqb/issues/151#issuecomment-138277436).
Consequently, v1.0 distribution files are missing at the former place.

Chrissie, can you provide a definitive answer as to whether this is indeed intended?


For developers employing libqb: there's now a stable URL for the
documentation generated for the newest (pre)release at the moment:
https://clusterlabs.github.io/libqb/CURRENT/doxygen/

Also, as the tradition goes, there are COPR builds for Fedora/EPEL:
https://copr.fedorainfracloud.org/coprs/jpokorny/libqb/build/173280/
(this time followed by proper releases for Fedora that should be
available sometime next week [unless there is a blocking issue]
or substantially earlier for F23+F24 and rawhide, respectively).

-- 
Jan (Poki)


pgpdFibeIT9cM.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Coming in 1.1.15: recurring stop operations

2016-04-02 Thread Jan Pokorný
On 01/04/16 11:41 -0500, Ken Gaillot wrote:
> In current versions of Pacemaker, trying to configure a stop operation
> with a time interval results in a warning about an "invalid configuration."
> 
> A popular request from the community has been to enable this feature,
> and I am proud to announce recurring stop operations will be part of the
> future 1.1.15 release.
> 
> A common situation where a recurring stop operation is required is when
> a particular provided service is funded in the budget for only part of
> the time. Now, configuring a stop operation with interval=36h allows you
> to stop providing the service every day and a half.
> 
> Another common use case requested by users is to more evenly distribute
> the staff utilization at 24-hour NOC facilities. With a interval=8h stop
> operation, you can be sure that you will get your salaries' worth from
> every NOC shift.
> 
> Lastly, some users have requested sysadmin training exercises. With this
> new feature, it is possible to use rules to apply the interval only
> during conditions of your choosing. For example, you can set a 2-hour
> stop interval to apply only during full moons that occur on a Friday.
> That will give a thorough disaster training workout to your new sysadmins.
> -- 
> Ken Gaillot 
> 
> P.S. Please check the date of this post before replying. ;)

The disclaimer might have come later on...

I was going to suggest that high unavailability is not that easy
if you don't have access (physical or remote) to power switches
(that's why fencing devices are even more vital for this use case!
RFC 3251 hasn't made it to production yet, sadly) or enough
administrative privileges.  You can still try DoS-ing the machine
as a borderline resort to maximize the downtime (other means
are left as an exercise for the criminal circles).

-- 
Jan (Poki)


pgpQ93x8qNSnz.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antwort: Re: Antwort: Re: pacemakerd: undefined symbol: crm_procfs_process_info

2016-03-24 Thread Jan Pokorný
On 24/03/16 14:38 +0100, philipp.achmuel...@arz.at wrote:
> Jan Pokorný  schrieb am 24.03.2016 12:48:44:
> 
>> Von: Jan Pokorný 
>> An: Cluster Labs - All topics related to open-source clustering 
>> welcomed 
>> Datum: 24.03.2016 12:50
>> Betreff: Re: [ClusterLabs] Antwort: Re: pacemakerd: undefined 
>> symbol: crm_procfs_process_info
>> 
>> On 24/03/16 08:44 +0100, philipp.achmuel...@arz.at wrote:
>>> Jan Pokorný  schrieb am 23.03.2016 19:22:13:
>>> 
>>>> Von: Jan Pokorný 
>>>> An: users@clusterlabs.org
>>>> Datum: 23.03.2016 19:23
>>>> Betreff: Re: [ClusterLabs] pacemakerd: undefined symbol: 
>>>> crm_procfs_process_info
>>>> 
>>>> On 23/03/16 18:40 +0100, philipp.achmuel...@arz.at wrote:
>>>>> $ sudo pacemakerd -V
>>>>> pacemakerd: symbol lookup error: pacemakerd: undefined symbol: 
>>>>> crm_procfs_process_info
>>>> 
>>>> For a start, please provide output of:
>>>> 
>>>> ls -l $(rpm -E %{_libdir})/libcrmcommon.so*
>>>> ldd $(rpm -E %{_sbindir})/pacemakerd
>>>> 
>>>> Adjust the path per your actual installation, also depending
>>>> how you got the pacemaker installed: from RPMs (assumed),
>>>> by starting with the sources and compiling by hand, etc.
>>> 
>>> i got sources from github and compiled by hand. 
>>> 
>>>> Note that if RPMs were indeed used, you should rather make sure
>>>> that the same version of the packages arising from single
>>>> SRPM is installed (pacemaker, pacemaker-libs, ...).
>>> 
>>> on that hint - i removed all old source directories and startet new 
>>> download/compilation today.
>>> after that everything works like expected - may i messed up some old 
> files 
>>> in working directory.
>> 
>> Do you use "make install" as part of your procedure?
>> Where I was headed is that either "ldconfig" invocation might be
>> missing once the libraries are at place, or that /usr/lib* remnants
>> take precedence over /usr/local/lib* files in run-time linking
>> (provided that use use default installation prefix).
> 
> Yes, i use "make install" with default parameters to install to my 
> environment. still not sure what happened yesterday - may some file 
> permission issues during sync files in my environments.

An additional syncing step might add this sort of fragility.
Anyway, please keep an eye on this should it ever be reproduced.
It's hard to claim the native build/install arrangement is flawless
in any case.

-- 
Jan (Poki)


pgphysUMtsqqr.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Antwort: Re: pacemakerd: undefined symbol: crm_procfs_process_info

2016-03-24 Thread Jan Pokorný
On 24/03/16 08:44 +0100, philipp.achmuel...@arz.at wrote:
> Jan Pokorný  schrieb am 23.03.2016 19:22:13:
> 
>> Von: Jan Pokorný 
>> An: users@clusterlabs.org
>> Datum: 23.03.2016 19:23
>> Betreff: Re: [ClusterLabs] pacemakerd: undefined symbol: 
>> crm_procfs_process_info
>> 
>> On 23/03/16 18:40 +0100, philipp.achmuel...@arz.at wrote:
>>> $ sudo pacemakerd -V
>>> pacemakerd: symbol lookup error: pacemakerd: undefined symbol: 
>>> crm_procfs_process_info
>> 
>> For a start, please provide output of:
>> 
>> ls -l $(rpm -E %{_libdir})/libcrmcommon.so*
>> ldd $(rpm -E %{_sbindir})/pacemakerd
>> 
>> Adjust the path per your actual installation, also depending
>> how you got the pacemaker installed: from RPMs (assumed),
>> by starting with the sources and compiling by hand, etc.
> 
> i got sources from github and compiled by hand. 
> 
>> Note that if RPMs were indeed used, you should rather make sure
>> that the same version of the packages arising from single
>> SRPM is installed (pacemaker, pacemaker-libs, ...).
> 
> on that hint - i removed all old source directories and startet new 
> download/compilation today.
> after that everything works like expected - may i messed up some old files 
> in working directory.

Do you use "make install" as part of your procedure?
Where I was headed is that either "ldconfig" invocation might be
missing once the libraries are at place, or that /usr/lib* remnants
take precedence over /usr/local/lib* files in run-time linking
(provided that use use default installation prefix).
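
If in doubt, something like the following should reveal which copy gets
picked up at run time (assuming the default /usr/local prefix for the
hand-built installation):

# ldconfig
# ldconfig -p | grep libcrmcommon
# ldd $(which pacemakerd) | grep libcrmcommon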

-- 
Jan (Poki)


pgpw8zpCTd5Wm.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemakerd: undefined symbol: crm_procfs_process_info

2016-03-23 Thread Jan Pokorný
On 23/03/16 18:40 +0100, philipp.achmuel...@arz.at wrote:
> $ sudo pacemakerd -V
> pacemakerd: symbol lookup error: pacemakerd: undefined symbol: 
> crm_procfs_process_info

For a start, please provide output of:

ls -l $(rpm -E %{_libdir})/libcrmcommon.so*
ldd $(rpm -E %{_sbindir})/pacemakerd

Adjust the path per your actual installation, also depending on
how you got pacemaker installed: from RPMs (assumed),
by starting with the sources and compiling by hand, etc.

Note that if RPMs were indeed used, you should rather make sure
that the same version of the packages arising from a single
SRPM is installed (pacemaker, pacemaker-libs, ...).

-- 
Jan (Poki)


pgpWx1j5ONxik.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Error creating a fence mechanism

2016-03-22 Thread Jan Pokorný
On 22/03/16 09:31 +0100, Tomas Jelinek wrote:
> Dne 21.3.2016 v 21:17 Douglas Restrepo napsal(a):
>> The problem what Im having is that I don't have a physical PDU, so I
>> have to simulate one.

You don't have to; there are fencing mechanisms beyond power switching
(fence_ifmib for controlling data switches, fence_scsi and fence_mpath
for restricting data access to shared storage, etc.).  Admittedly,
their eligibility depends heavily on the particular environment.
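
For illustration only, a rough sketch with fence_scsi (the resource id,
device path and host list below are placeholders to adjust; see the
agent's man page for the full parameter set):

# pcs stonith create fence-scsi fence_scsi \
    devices=/dev/disk/by-id/wwn-0x5000c50015ea71ac \
    pcmk_host_list="node1 node2" meta provides=unfencing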

>> So to remove this error and go back to the previous status I executed
>> the command
>> 
>> pcs resource cleanup fence_node_01
>> 
>> but now, when I execute the command
>> pcs stonith list
>> 
>> Im getting the error
>> Error: unable to locate command: /usr/sbin/fence_manual
>> 
>> Can someone guide me with this process?
>> I don´t know why im getting this error configuring the fence mechanism.

Unfortunately, pcs oversimplifies the execution failure.
My guess is that you are missing execution permissions:

# chmod +x /usr/sbin/fence_manual

If the problem persists, please provide output of

# ls -l /usr/sbin/fence_manual

-- 
Jan (Poki)


pgpQuj424YN6k.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] fence_scsi no such device

2016-03-21 Thread Jan Pokorný
On 21/03/16 09:46 -0500, Ken Gaillot wrote:
> You need more attributes, such as "devices" to specify which SCSI
> devices to cut off, and either "key" or "nodename" to specify the
> node key for SCSI reservations.

Hmm, I keep lamenting that by extending agents' metadata with an inline
RelaxNG grammar to express co-occurrence/mutual exclusion of
particular parameters and/or their datatypes in detail, and by using
that information in the configuration front-ends, we would push
the overall user experience to a new level
(https://bugzilla.redhat.com/show_bug.cgi?id=1281463#c4).

-- 
Jan (Poki)


pgpF0e6JAZcEK.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] [Announce] clufter-0.56.2 released

2016-03-19 Thread Jan Pokorný
I am happy to announce that clufter-0.56.2, a tool/library for
transforming/analyzing cluster configuration formats, has been
released and published (incl. signature using my 60BCBB4F5CD7F9EF key,
expiration of which was prolonged just a few days back so you may
want to consult key servers first):


or alternative (original) location:



The test suite for this version is also provided:

or alternatively:


Changelog highlights (also available as a tag message):
- this is a bug fix release with one minor enhancement
- bug fixes:
  . with {cib,pcs}2pcscmd* commands, clufter no longer chokes on
    validation failures (unless --nocheck is provided) due to the source CIB
    not containing a "status" section (which is normally the case with
    the implicit input located in /var/lib/pacemaker/cib/cib.xml);
    now the bundled, compacted schemas mark this section as optional,
    and the recipe that distills this format from the pacemaker native
    schemas ensures the same assumption holds even if it did not pre-exist
    [resolves: rhbz#1269964, comment 9]
    [see also: https://github.com/ClusterLabs/pacemaker/pull/957]
  . the internal representation of commands + options/arguments was fixed
    in several ways so as to provide correct outcomes in both the general
    case (previously, some options could be duplicated while overwriting
    other options/arguments, and standalone negative numbers were
    treated as options) and the pcs case (--wait=X cannot be decoupled
    the way option parsers usually allow, as the pcs built-in parser
    treats it specially)
- enhancements:
  . [cp]cs2pcscmd* commands now support the "--wait" parameter of the pcs
    command for starting the cluster and prefer it to a static "sleep"
    when possible (i.e., with a recent enough pcs version)
    [see also: rhbz#1229822]

* * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic archives by GitHub preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli).

Issues & suggestions can be reported at either of (regardless if Fedora)
,

(rather than ).


Happy clustering/high-availing :)

-- 
Jan (Poki)


pgpco2EKbLp2N.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [Announce] libqb 10.rc4 release

2016-03-19 Thread Jan Pokorný
On 17/03/16 16:37 +, Christine Caulfield wrote:
> This is a bugfix release and a potential 1.0 candidate.

Since libqb primarily serves as a building block for some of the components
in the common cluster stack nowadays, its releases should likely be announced
(also) on the developers ML, release candidates in particular (hence
taking the liberty to cross-post).

As usual, there are COPR builds for Fedora/EPEL for convenient tryout:
https://copr.fedorainfracloud.org/coprs/jpokorny/libqb/build/168979/
This time around, I made the builds fully self-source-hosted
(SRPM obtained with "make srpm").

> There are no actual code changes in this release, most of the patches
> are to the build system. Thanks to Jan Pokorný for, er, all of them.
> I've bumped the library soname to 0.18.0 which should really have
> happened last time.
> 
> Changes from 1.0rc3
> 
> build: fix tests/_syslog_override.h not being distributed
> build: enable syslog tests when configuring in spec
> build: do not install syslog_override for the RPM packaging
> build: update library soname to 0.18.0
> build: do not try to second-guess "distdir" Automake variable
> build: switch to XZ tarball format for {,s}rpm packaging
> CI: make sure RPM can be built all the time
> Generating the man pages definitely doesn't depend on existence of
> (possibly generated) header files that we omit anyway.
> build: drop extra qbconfig.h rule for auto_check_header self-test
> build: extra clean-local rule instead of overriding clean-generic
> build: docs: {dependent -> public}_headers + more robust obtaining
> build: tests: grab "public_headers" akin to docs precedent
> build: fix preposterous usage of $(AM_V_GEN)
> build: tests: add intermediate check-headers target
> CI: "make check" already included in "make distcheck"
> build: fix out-of-tree build broken with 0b04ed5 (#184)
> docs: enhance qb_log_ctl* description + fix doxygen warning
> docs: apply "doxygen -u" on {html,man}.dox.in, fix deprecations
> docs: {html,man}.dox.in: strip options for unused outputs
> docs: {html,man}.dox.in: unify where reasonable
> docs: make README.markdown always point to "CURRENT" docs
> build: reorder LINT_FLAGS in a more logical way
> build: make the code splint-friendly where not already
> build: better support for splint checker
> build: make splint check tolerant of existing defects

-- 
Jan (Poki)


pgp_wFHW0Neni.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Avoid HTML-only please (Was: crm_mon change in behaviour PM 1.1.12 -> 1.1.14: crm_mon -XA filters #health.* node attributes)

2016-03-04 Thread Jan Pokorný
On 03/03/16 17:07 +0100, Martin Schlegel wrote:
> Hello everybody

Welcome Martin,

> This is my first post on this mailing list and I am only using Pacemaker since
> fall 2015 ... please be gentle :-) and I will do the same.

the list would really appreciate it if you could make your email client
(be it SW running on your machine or a web-based one) send plain-text
format when addressing it (mixed plain-text + HTML is fine).

For instance, see how your post looks like in the archives:
http://oss.clusterlabs.org/pipermail/users/2016-March/002398.html

Thanks for understanding.

-- 
Jan (Poki)


pgpjpUT6z4rKX.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [Announce] libqb 10.rc3 release

2016-02-26 Thread Jan Pokorný
On 26/02/16 08:58 +0100, Jan Pokorný wrote:
> On 26/02/16 14:27 +0900, Keisuke MORI wrote:
>> As of libqb-1.0rc3, Pacemaker fails to build upon it with the gcc
>> warnings as below.
>> There was no such a problem until 1.0rc2, and it seems that the
>> changes in the pull request #175 is related.
>> 
>> https://github.com/ClusterLabs/libqb/pull/175
>> 
>> {{{
>> [root@build-centos71 pacemaker ((Pacemaker-1.1.14))]# rpm -qa | grep libqb
>> libqb-1.0rc3-1.el7.x86_64
>> libqb-devel-1.0rc3-1.el7.x86_64
>> 
>> [root@build-centos71 pacemaker ((Pacemaker-1.1.14))]# git checkout
>> Pacemaker-1.1.14
>> HEAD is now at 70404b0... Merge pull request #892 from kgaillot/1.1
>> [root@build-centos71 pacemaker ((Pacemaker-1.1.14))]# make release
>> (snip)
>> In file included from ../../include/crm_internal.h:33:0,
>>  from ipc.c:19:
>> ipc.c: In function 'crm_ipcs_flush_events':
>> ../../include/crm/common/logging.h:140:23: error: format '%d' expects
>> argument of type 'int', but argument 10 has type 'ssize_t'
>> [-Werror=format=]
>>  static struct qb_log_callsite *trace_cs = NULL; \
>>^
>> ../../include/crm/common/logging.h:254:37: note: in expansion of macro
>> 'do_crm_log_unlikely'
>>  #  define crm_trace(fmt, args...)   do_crm_log_unlikely(LOG_TRACE,
>> fmt , ##args)
>>  ^
>> [...]
> 
> Looks like that PR of mine turned out to be generally a good thing as it
> effectively discovers wrong printf-format usage compared to the actual
> arguments, exactly according to what you report :-)
> 
> Will take care of the problematic code in Pacemaker.

https://github.com/ClusterLabs/pacemaker/pull/939

-- 
Jan (Poki)


pgpaNFhZ0yryR.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [Announce] libqb 10.rc3 release

2016-02-26 Thread Jan Pokorný
On 25/02/16 13:24 +, Christine Caulfield wrote:
> I am pleased to announce the third 1.0 release candidate release of
> libqb. Huge thanks to all those who have contributed to this release.

Once again, there are EPEL/Fedora builds for convenient testing:  
https://copr.fedorainfracloud.org/coprs/jpokorny/libqb/build/163316/

Note that the upstream spec file was used (make libqb.spec), only with
a slight adjustment of the Version & Release tags + pointing Source0 to
https://github.com/ClusterLabs/libqb/releases/download/v1.0rc3/libqb-1.0rc3.tar.xz


This time around, all users of libqb are strongly advised to try
building with this version of libqb to discover potential misuse of
printf-like format strings vs. the actual arguments serving for
substitution within them.  This is due to the newly introduced annotation
that gives the compiler a hint about the printf-like semantics of the
arguments to particular functions:
https://github.com/ClusterLabs/libqb/pull/175

Warm thanks to Keisuke-san for pointing out the possible issues arising
from this defensive coding measure, as he was the first to report a failure
to build Pacemaker as a direct consequence (I am fixing that right now):

-- 
Jan (Poki)


pgpQS2GXKuRde.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [Announce] libqb 10.rc3 release

2016-02-26 Thread Jan Pokorný
On 26/02/16 14:27 +0900, Keisuke MORI wrote:
> As of libqb-1.0rc3, Pacemaker fails to build upon it with the gcc
> warnings as below.
> There was no such a problem until 1.0rc2, and it seems that the
> changes in the pull request #175 is related.
> 
> https://github.com/ClusterLabs/libqb/pull/175
> 
> {{{
> [root@build-centos71 pacemaker ((Pacemaker-1.1.14))]# rpm -qa | grep libqb
> libqb-1.0rc3-1.el7.x86_64
> libqb-devel-1.0rc3-1.el7.x86_64
> 
> [root@build-centos71 pacemaker ((Pacemaker-1.1.14))]# git checkout
> Pacemaker-1.1.14
> HEAD is now at 70404b0... Merge pull request #892 from kgaillot/1.1
> [root@build-centos71 pacemaker ((Pacemaker-1.1.14))]# make release
> (snip)
> In file included from ../../include/crm_internal.h:33:0,
>  from ipc.c:19:
> ipc.c: In function 'crm_ipcs_flush_events':
> ../../include/crm/common/logging.h:140:23: error: format '%d' expects
> argument of type 'int', but argument 10 has type 'ssize_t'
> [-Werror=format=]
>  static struct qb_log_callsite *trace_cs = NULL; \
>^
> ../../include/crm/common/logging.h:254:37: note: in expansion of macro
> 'do_crm_log_unlikely'
>  #  define crm_trace(fmt, args...)   do_crm_log_unlikely(LOG_TRACE,
> fmt , ##args)
>  ^
> [...]

Looks like that PR of mine turned out to be a generally good thing, as it
effectively catches printf-format usage that does not match the actual
arguments, exactly as you report :-)

Will take care of the problematic code in Pacemaker.

-- 
Jan (Poki)


pgp2pRrOiPRpq.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] pacemakerd --version display bug?

2016-02-24 Thread Jan Pokorný
On 23/02/16 22:07 +, Jeremy Matthews wrote:
> I recently installed pacemaker on a system and found it interesting
> that pacemaker -version and yum info pacemaker give different
> versions, 1.1.11 and 1.1.12, respectively:
> 
> 
> [root@g5se-f5c56a rc3.d]# pacemakerd --version
> Pacemaker 1.1.11
> Written by Andrew Beekhof
> 
> [root@g5se-f5c56a rc3.d]# yum info pacemaker
> [...]
> Version : 1.1.12
> Release : 8.el6_7.2
> [...]
> 
> They should be the same, right?  Maybe pacemaker -version should return 
> 1.1.12?

The explanation is rather simple and boils down to upstream vs. downstream
versioning in edge cases such as this one -- maybe I could stop at
clarifying that the RPM version you mention is based on Pacemaker-1.1.12-rc3,
but here we go a little bit deeper:

- upstream view: a new version is not pronounced until the real release is
  cut, i.e., until what gets tagged is no longer a release candidate (rc);
  the procedure to get things ready for that includes bumping the version
  in the version.m4 source file, which is exactly what determines the
  response of --version for the various Pacemaker binaries

- downstream view: a pre-release of version X -> the Version tag says X
  (info about the package being based on a pre-release SHOULD be
  encoded in the Release tag, but well, that doesn't always happen)


TL;DR this is an edge case involving an about-to-be-released version

-- 
Jan (Poki)


pgpNtGhq9nM1v.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Sudden stop of pacemaker functions

2016-02-17 Thread Jan Pokorný
On 17/02/16 15:15 +0200, Klechomir wrote:
> Here is the output from your command:
> 
> attrd: 609413
> cib: 609409
> corosync: 608778
> crmd: 609415
> lrmd: 609412
> pengine: 609414
> pacemakerd: 609407
> stonithd: 609411

This may mean that you are triggering this nasty bug in libqb:
https://github.com/ClusterLabs/libqb/pull/162
(fixed in libqb-0.17.2)
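
To confirm what is actually installed and linked in, something like:

# rpm -q libqb
# ldd /usr/sbin/pacemakerd | grep libqb

(adjust the pacemakerd path per your installation) should do.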

> Regarding using a newer version, that's what I've been thinking about, but
> I've been using this combination of corosync/pacemaker for many years on a
> different hardware and hever had similar problem.
> The main difference is that I have stonith enabled only the problematic
> cluster, but I also suspect that the node, which causes this problem may
> have some hardware issues.

Stonith/fencing should be configured on any cluster to fully satisfy
what HA clusters are for, full stop.

> BTW my last few tests with the newest corosync/pacemaker gave me very
> annoying delay, when commiting configuration changes (maybe it's a known
> problem?).

Cannot comment on this, but it is definitely good to be aware of possible
performance regressions.

-- 
Jan (Poki)


pgpZjq8RurGU_.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Sudden stop of pacemaker functions

2016-02-17 Thread Jan Pokorný
On 17/02/16 14:10 +0200, Klechomir wrote:
> Having strange issue lately.
> I have two node cluster with some cloned resources on it.
> One of my nodes suddenly starts reporting all its resources down (some of
> them are actually running), stops logging and reminds in this this state
> forever, while still responding to crm commands.
> 
> The curious thing is that restarting corosync/pacemaker doesn't change
> anything.
> 
> Here are the last lines in the log after restart:
> 
> [...]
> Feb 17 12:55:19 [609409] CLUSTER-1cib: info:
> cib_process_replace:   Replaced 0.238.40 with 0.238.40 from CLUSTER-2
> Feb 17 12:55:21 [609413] CLUSTER-1  attrd:  warning: attrd_cib_callback:
> Update shutdown=(null) failed: No such device or address
> Feb 17 12:55:22 [609413] CLUSTER-1  attrd:  warning: attrd_cib_callback:
> Update terminate=(null) failed: No such device or address
> Feb 17 12:55:25 [609413] CLUSTER-1  attrd:  warning: attrd_cib_callback:
> Update pingd=(null) failed: No such device or address
> Feb 17 12:55:26 [609413] CLUSTER-1  attrd:  warning: attrd_cib_callback:
> Update fail-count-p_Samba_Server=(null) failed: No such device or address
> Feb 17 12:55:26 [609413] CLUSTER-1  attrd:  warning: attrd_cib_callback:
> Update master-p_Device_drbddrv1=(null) failed: No such device or address
> Feb 17 12:55:27 [609413] CLUSTER-1  attrd:  warning: attrd_cib_callback:
> Update last-failure-p_Samba_Server=(null) failed: No such device or address
> Feb 17 12:55:27 [609413] CLUSTER-1  attrd:  warning: attrd_cib_callback:
> Update probe_complete=(null) failed: No such device or address
> 
> After that the logging on the problematic node stops.

Not sure I follow; what does the following command produce:

for i in attrd cib corosync crmd lrmd pengine pacemakerd stonithd; do \
echo "${i}: $(pgrep ${i})"; done

?

> Corosync is v2.1.0.26; Pacemaker v1.1.8

Definitely try the most recent version of Pacemaker; what you are using
is 3.5 years old and plenty of fixes have landed since then.

-- 
Jan (Poki)


pgp2CA_kgDbwA.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Cluster resources migration from CMAN to Pacemaker

2016-02-10 Thread Jan Pokorný
On 10/02/16 19:00 +0100, Jan Pokorný wrote:
> This actually looks like a bug in the clufter's answer for the
> suggested conversion path as it, so far, assumed migration-threshold
> and friends will work without tweaking any cluster-wide options.

This might help with the understandability:
https://github.com/ClusterLabs/pacemaker/pull/916

-- 
Jan (Poki)


pgpYFEGZNTubD.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Cluster resources migration from CMAN to Pacemaker

2016-02-10 Thread Jan Pokorný
On 09/02/16 15:34 +0530, jaspal singla wrote:
> Hi Jan/Digiman,

(as a matter of fact, Digimer, from Digital Mermaid :-)

> Thanks for your replies. Based on your inputs, I managed to configure these
> values and results were fine but still have some doubts for which I would
> seek your help. I also tried to dig some of issues on internet but seems
> due to lack of cman -> pacemaker documentation, I couldn't find any.

That's not exactly CMAN -> Pacemaker; a better conceptual expression is
  (CMAN,rgmanager) -> (Corosync v2,Pacemaker)
or
  (CMAN,rgmanager) -> (Corosync/CMAN,Pacemaker)
depending on what the exact target is (these expressions are what
"clufter -h" uses to provide a hint about the facilitated conversions).

And yes, it's so non-existent that I determined to put some bits of
non-code knowledge into the docs accompanying clufter:
https://pagure.io/clufter/blob/master/f/__root__/doc/rgmanager-pacemaker
thus at least partially filling the vacuum (+ laying some common ground
for talking about cluster properties in a way as implementation-agnostic
as possible <-- I am not aware of a similar effort, but I didn't search
extensively).

Any help with extending/refining it is welcome.

> I have configured 8 scripts under one resource as you recommended. But out
> of which 2 scripts are not being executed by cluster by cluster itself.
> When I tried to execute the same script manually, I am able to do it but
> through pacemaker command I don't.
> 
> For example:
> 
> This is the output of crm_mon command:
> 
> ###
> Last updated: Mon Feb  8 17:30:57 2016  Last change: Mon Feb  8
> 17:03:29 2016 by hacluster via crmd on ha1-103.cisco.com
> Stack: corosync
> Current DC: ha1-103.cisco.com (version 1.1.13-10.el7-44eb2dd) - partition
> with quorum
> 1 node and 10 resources configured
> 
> Online: [ ha1-103.cisco.com ]
> 
>  Resource Group: ctm_service
>  FSCheck
>  (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/FsCheckAgent.py):
>  Started ha1-103.cisco.com
>  NTW_IF
> (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/NtwIFAgent.py):  Started
> ha1-103.cisco.com
>  CTM_RSYNC
>  (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/RsyncAgent.py):  Started
> ha1-103.cisco.com
>  REPL_IF
>  (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_IFAgent.py): Started
> ha1-103.cisco.com
>  ORACLE_REPLICATOR
>  (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ODG_ReplicatorAgent.py):
> Started ha1-103.cisco.com
>  CTM_SID
>  (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/OracleAgent.py): Started
> ha1-103.cisco.com
>  CTM_SRV
>  (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/CtmAgent.py):Stopped
>  CTM_APACHE
> (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/ApacheAgent.py): Stopped
>  Resource Group: ctm_heartbeat
>  CTM_HEARTBEAT
>  (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/HeartBeat.py):   Started
> ha1-103.cisco.com
>  Resource Group: ctm_monitoring
>  FLASHBACK
>  (lsb:../../..//cisco/PrimeOpticalServer/HA/bin/FlashBackMonitor.py):
>  Started ha1-103.cisco.com
> 
> Failed Actions:
> * CTM_SRV_start_0 on ha1-103.cisco.com 'unknown error' (1): call=577,
> status=complete, exitreason='none',
> last-rc-change='Mon Feb  8 17:12:33 2016', queued=0ms, exec=74ms
> 
> #
> 
> 
> CTM_SRV && CTM_APACHE are in stopped state. These services are not being
> executed by cluster OR it is being failed somehow by cluster, not sure
> why?  When I manually execute CTM_SRV script, the script gets executed
> without issues.
> 
> -> For manually execution of this script I ran the below command:
> 
> # /cisco/PrimeOpticalServer/HA/bin/OracleAgent.py status
> 
> Output:
> 
> _
> 2016-02-08 17:48:41,888 INFO MainThread CtmAgent
> =
> Executing preliminary checks...
>  Check Oracle and Listener availability
>   => Oracle and listener are up.
>  Migration check
>   => Migration check completed successfully.
>  Check the status of the DB archivelog
>   => DB archivelog check completed successfully.
>  Check of Oracle scheduler...
>   => Check of Oracle scheduler completed successfully
>  Initializing database tables
>   => Database tables initialized successfully.
>  Install in cache the store procedure
>   => Installing store procedures completed successfully
>  Gather the oracle system stats
>   => Oracle stats completed successfully
> Preliminary checks completed.
> =
> Starting base services...
> Starting Zookeeper...
> JMX enabled by default
> Using config: /opt/CiscoTransportManagerServer/zookeeper/bin/../conf/zoo.cfg
> Starting zookeeper ... STARTED
>  Retrieving name serv

Re: [ClusterLabs] fence agent and using it with pacemaker

2016-02-10 Thread Jan Pokorný
On 10/02/16 15:20 +0100, Stanislav Kopp wrote:
> I have general, clarification question about how fence agents work
> with pacemaker (crmsh in particular). As far I understood STDIN
> arguments can be used within pacemaker resources and command line
> arguments in terminal (for testing and scripting?).

Fencing scripts from the fence-agents package support both kinds of input;
Pacemaker will pass the arguments (de facto the attributes/parameters of
the particular stonith resource as specified in the CIB via tools like
crmsh, plus some fence-agents API specific parameters like "action",
though user-provided values always take precedence when configured) by
piping them into the running script, but there is no reason you could not
do the same from the terminal, e.g.:
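
(a minimal sketch: parameter names are assumed from the fence_pve man page
referenced below, while the host, credentials and port value are mere
placeholders to adjust)

# /usr/sbin/fence_pve <<EOF
action=status
ipaddr=pve.example.com
login=root@pam
passwd=secret
port=101
EOF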

> I have "fence_pve" [1] agent which works fine with command line
> arguments, but not with pacemaker, it says some parameters like
> "passwd" or "login" does not exist,

Can you fully specify "it" in the previous sentence, please?
Or even better, can you mimic what Pacemaker pumps into the agent
per the example above?

There may be a bug in the interaction between the fence_pve implementation
and the fencing library, which does the heavy lifting behind the scenes.

> although STDIN parameters are supported [2]
> 
> [1] 
> https://github.com/ClusterLabs/fence-agents/blob/master/fence/agents/pve/fence_pve.py
> [2] https://www.mankier.com/8/fence_pve#Stdin_Parameters

-- 
Jan (Poki)


pgpS3cj3_7KA7.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] [Announce] clufter-0.56.1 released

2016-02-09 Thread Jan Pokorný
I am happy to announce that clufter-0.56.1, a tool/library for
transforming/analyzing cluster configuration formats, has been
released and published (incl. signature using my 60BCBB4F5CD7F9EF key,
expiration of which was prolonged just a few days back so you may
want to consult key servers first):


or alternative (original) location:



The test suite is the same as for 0.56.0 as nothing changed there:

or alternatively:


Changelog highlights (also available as a tag message):
- this is a bug fix release
- bug fixes:
  . with {cib,pcs}2pcscmd* commands, clufter no longer chokes on
    validation failures (unless --nocheck is provided) due to the source CIB
    file using a newer "validate-with" validation version specification
    than the only one supported so far (pacemaker-1.2.rng), or possibly
    using a syntax not compatible with that; now the 2.0, 2.3 and 2.4
    versions are supported as well, and the specfile is ready to borrow the
    schemas from the installed pacemaker on-the-fly during a build stage
    [resolves: rhbz#1300014]

 * * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic archives by GitHub preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli).

Issues & suggestions can be reported at either of (regardless if Fedora)
,

(rather than ).


Happy clustering/high-availing :)

-- 
Jan (Poki)


pgpEITQWBuhr6.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Fwd: dlm not starting

2016-02-05 Thread Jan Pokorný
On 05/02/16 14:22 -0500, nameless wrote:
> Hi

Hello >nameless<,

technical aspect aside, it goes without saying that engaging in
a community assumes some level of cultural and social compatibility.
Otherwise there is a danger the cluster will partition, and
that would certainly be unhelpful.

Maybe this is a misunderstanding on my side, but so far, you don't
appear compatible, maturity-wise.

Happy Friday, !

-- 
Jan (Poki)


pgpuDRk12fVfi.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] [Announce] libqb 1.0rc2 release (fixed subject)

2016-02-03 Thread Jan Pokorný
On 02/02/16 11:05 +, Christine Caulfield wrote:
> I am pleased to announce the second 1.0 release candidate release of
> libqb. Huge thanks to all those who have contributed to this release.

IIUIC, the good news is that, so far, 1.0.0 is a drop-in replacement for
0.17.2.

For convenience, there are EPEL/Fedora builds for testing:
https://copr.fedorainfracloud.org/coprs/jpokorny/libqb/build/157747/

Specfile has already been changed akin to:
https://github.com/ClusterLabs/libqb/pull/174

EPEL5 builds omitted as they fail due to autoconf being too ancient
(i.e., not because of the changes in the mentioned PR).

-- 
Jan (Poki)


pgpidN4jJI6_f.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] [Announce] clufter-0.56.0 released

2016-02-01 Thread Jan Pokorný
I am happy to announce that clufter-0.56.0, a tool/library for
transforming/analyzing cluster configuration formats, has been
released and published (incl. signature using my 60BCBB4F5CD7F9EF key,
expiration of which was prolonged just a few days back so you may
want to consult key servers first):


or alternative (original) location:



A separate tarball containing the test suite is also available:

or alternatively:


Changelog highlights (also available as a tag message):
- this is a release delivering both bug fixes and enhancements
- bug fixes:
  . "assisted recovery" now works on systems without /dev/tty as well
as on systems for which open-modify[open+close] (final close yet
to come) of particular file won't reliably discover mtime change;
now strict open-close-modify is used instead (and mtime check in
a was-file-changed test is preceded with a file size comparison for
good measure), making intermittent failures in test runs disappear
  . clufter is now capable of handling command options as unicode
(relates to the usage as a library, original discovery thanks
to pcs) and "the magic interpolation" of the command inputs now
works at places where it was supposed to but unfortunately did not
  . ccs2pcs* commands no longer generate accidentally broken values
of attributes marked as having an ID type in the schema
[resolves: rhbz#1300050]
  . ccs2pcs* commands now translate the notion of the recovery/relocate
    recovery policy of a resource group as supported by RGManager into the
    parallel expression in the Pacemaker universe;  the same applies to
    __independent_subtree=2 at the resource level and to an empty restricted
    failover domain (that is referred to from an existing resource group)
  . ccs2pcs* commands now propagate the stop timeout of the original vm
    resource agent
  . *2pcscmd* commands now support group meta attributes properly
  . *2pcscmd* commands no longer emit bogus properties of the operations
(id, name, interval) as these are position-fixed values in the
respective pcs syntax, hence not requiring explicit key=value
treatment
- new behaviour and features:
  . help screens and man pages now specify where to report bugs
(configurable at build time)
  . help screens and man pages for *2pcscmd* commands now warn
    against using --tmp-cib '' (empty string) as it means resorting
    to shot-by-shot semantics, as opposed to accumulate-and-push
    (desirable), and this can lead to unexpected inconsistencies
- miscellaneous
  . the run-tests helper script now offers better support for nosetests
    as an alternative to unittest(2)
- and a bunch of minor fixes, sanitizations, etc. as usual

 * * *

The public repository (notably master and next branches) is currently at

(rather than ).

Official, signed releases can be found at
 or, alternatively, at

(also beware, automatic archives by GitHub preserve a "dev structure").

Natively packaged in Fedora (python-clufter, clufter-cli).

Issues & suggestions can be reported at either of (regardless if Fedora)
,

(rather than ).


Happy clustering/high-availing :)

-- 
Jan (Poki)


pgpRaZKILEs1A.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Cluster resources migration from CMAN to Pacemaker

2016-01-29 Thread Jan Pokorný
On 27/01/16 19:41 +0100, Jan Pokorný wrote:
> On 27/01/16 11:04 -0600, Ken Gaillot wrote:
>> On 01/27/2016 02:34 AM, jaspal singla wrote:
>>> 1) In CMAN, there was meta attribute - autostart=0 (This parameter disables
>>> the start of all services when RGManager starts). Is there any way for such
>>> behavior in Pacemaker?
> 
> Please be more careful about the descriptions; autostart=0 specified
> at the given resource group ("service" or "vm" tag) means just not to
> start anything contained in this very one automatically (also upon
> new resources being defined, IIUIC), definitely not "all services".
> 
> [...]
> 
>> I don't think there's any exact replacement for autostart in pacemaker.
>> Probably the closest is to set target-role=Stopped before stopping the
>> cluster, and set target-role=Started when services are desired to be
>> started.

Besides is-managed=false (as currently used in clufter), I also looked
at outright disabling the "start" action, but this turned out to be a naive
approach caused by unclear documentation.

Pushing for a bit more clarity (hopefully):
https://github.com/ClusterLabs/pacemaker/pull/905

>>> 2) Please put some alternatives to exclusive=0 and __independent_subtree?
>>> what we have in Pacemaker instead of these?

(the exclusive property is discussed in the other subthread; as a recap,
no extra effort is needed to achieve exclusive=0, while exclusive=1 is
currently a show stopper in clufter as neither approach is versatile
enough)

> For __independent_subtree, each component must be a separate pacemaker
> resource, and the constraints between them would depend on exactly what
> you were trying to accomplish. The key concepts here are ordering
> constraints, colocation constraints, kind=Mandatory/Optional (for
> ordering constraints), and ordered sets.

Current approach in clufter as of the next branch:
- __independent_subtree=1 -> do nothing special (hardly can be
 improved?)
- __independent_subtree=2 -> for that very resource, set operations
 as follows:
 monitor (interval=60s) on-fail=ignore
 stop interval=0 on-fail=stop

Groups carrying such resources are not unrolled into primitives plus
constraints, as the above might suggest (also the default kind=Mandatory
for the underlying order constraints should fit well).

Please holler if this is not sound.


So, when put together with some other changes/fixes, the currently
suggested/informative sequence of pcs commands goes like this:

pcs cluster auth ha1-105.test.com
pcs cluster setup --start --name HA1-105_CLUSTER ha1-105.test.com \
  --consensus 12000 --token 1 --join 60
sleep 60
pcs cluster cib tmp-cib.xml --config
pcs -f tmp-cib.xml property set stonith-enabled=false
pcs -f tmp-cib.xml \
  resource create RESOURCE-script-FSCheck \
  lsb:../../..//data/Product/HA/bin/FsCheckAgent.py \
  op monitor interval=30s
pcs -f tmp-cib.xml \
  resource create RESOURCE-script-NTW_IF \
  lsb:../../..//data/Product/HA/bin/NtwIFAgent.py \
  op monitor interval=30s
pcs -f tmp-cib.xml \
  resource create RESOURCE-script-CTM_RSYNC \
  lsb:../../..//data/Product/HA/bin/RsyncAgent.py \
  op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop
pcs -f tmp-cib.xml \
  resource create RESOURCE-script-REPL_IF \
  lsb:../../..//data/Product/HA/bin/ODG_IFAgent.py \
  op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop
pcs -f tmp-cib.xml \
  resource create RESOURCE-script-ORACLE_REPLICATOR \
  lsb:../../..//data/Product/HA/bin/ODG_ReplicatorAgent.py \
  op monitor interval=30s on-fail=ignore stop interval=0 on-fail=stop
pcs -f tmp-cib.xml \
  resource create RESOURCE-script-CTM_SID \
  lsb:../../..//data/Product/HA/bin/OracleAgent.py \
  op monitor interval=30s
pcs -f tmp-cib.xml \
  resource create RESOURCE-script-CTM_SRV \
  lsb:../../..//data/Product/HA/bin/CtmAgent.py \
  op monitor interval=30s
pcs -f tmp-cib.xml \
  resource create RESOURCE-script-CTM_APACHE \
  lsb:../../..//data/Product/HA/bin/ApacheAgent.py \
  op monitor interval=30s
pcs -f tmp-cib.xml \
  resource create RESOURCE-script-CTM_HEARTBEAT \
  lsb:../../..//data/Product/HA/bin/HeartBeat.py \
  op monitor interval=30s
pcs -f tmp-cib.xml \
  resource create RESOURCE-script-FLASHBACK \
  lsb:../../..//data/Product/HA/bin/FlashBackMonitor.py \
  op monitor interval=30s
pcs -f tmp-cib.xml \
  resource group add SERVICE-ctm_service-GROUP RESOURCE-script-FSCheck \
  RESOURCE-script-NTW_IF RESOURCE-script-CTM_RSYNC \
  RESOURCE-script-REPL_IF RESOURCE-script-ORACLE_REPLICATOR \
  RESOURCE-script-CTM_SID RESOURCE-script-CTM_SRV \
  RESOURCE-script-CTM_APACHE
pcs -f tmp-cib.xml resource \
  meta SERVICE-ctm_service-GROUP is-managed=false
pcs -f tmp-cib.xml \
  resource group add SERVICE-ctm_heartb

Re: [ClusterLabs] Cluster resources migration from CMAN to Pacemaker

2016-01-27 Thread Jan Pokorný
On 27/01/16 11:04 -0600, Ken Gaillot wrote:
> On 01/27/2016 02:34 AM, jaspal singla wrote:
>> I have couple of concerns more to answer, please help!
> 
> I'm not familiar with rgmanager, so there may be better ways that
> hopefully someone else can suggest, but here are some ideas off the top
> of my head:
> 
>> 1) In CMAN, there was meta attribute - autostart=0 (This parameter disables
>> the start of all services when RGManager starts). Is there any way for such
>> behavior in Pacemaker?

Please be more careful about the descriptions; autostart=0 specified
on the given resource group ("service" or "vm" tag) means just that nothing
contained in this very one gets started automatically (also upon
new resources being defined, IIUIC), definitely not "all services".

Note that "service" in the RGManager's view != service as perceived by
a common *nix user (that would be just a "resource", or a "script resource"
in particular, there).  I know it's confusing, hence context and
clarity are important.

>> I tried to explore is-manged=0 but when I start the cluster through pcs
>> cluster start OR pcs cluster start --all, my all of the resources gets
>> started (even the one which has meta attribute configured as
>> is-managed=false). Any clue to achieve  such behavior?
>> 
>> What does is-manged=false do?
> 
> Any service with is-managed=false should not be started or stopped by
> the cluster. If they are already running, they should be left running;
> if they are stopped, they should be left stopped.
> 
> I'm not sure why it didn't work in your test; maybe attach the output of
> "pcs cluster cib" and "pcs status" with the setting in effect.

I was looking at this and resources/groups with is-managed=false
should not be started automatically by Pacemaker.

Requested details would help us identify the issue.

However, there is an apparent problem in case you run the suggested
recipe command by command without accumulating the changes before the
final configuration handover to the cluster
(pcs cluster cib tmp.xml; pcs -f tmp.xml ... [; ...]; pcs cluster cib-push tmp.xml),
instead performing direct changes on the live cluster.

Following the output of "clufter ccs2pcscmd --tmp-cib '' ... cluster.conf"
is therefore strongly discouraged, as it is akin to the latter approach
(it would define -- and start -- the resource, which only later would
get marked as unmanaged, but it was started in the interim ... oops).

> I don't think there's any exact replacement for autostart in pacemaker.
> Probably the closest is to set target-role=Stopped before stopping the
> cluster, and set target-role=Started when services are desired to be
> started.

For groups, the same problem arises with each pcs command committed
separately, but the target state is, at least, what was originally
intended (the resource [eventually] stopped until told otherwise).
In that light, I will likely exchange is-managed=false for
target-role=Stopped (pcs resource unmanage -> disable).
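
Concretely, that boils down to something like the following (a sketch, with
SERVICE-ctm_service-GROUP standing for the converted group's name; both
forms set target-role=Stopped):

# pcs resource disable SERVICE-ctm_service-GROUP
# pcs resource meta SERVICE-ctm_service-GROUP target-role=Stopped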

>> 2) Please put some alternatives to exclusive=0 and __independent_subtree?
>> what we have in Pacemaker instead of these?
> 
> My first though for exclusive=0 would be to configure negative
> colocation constraints between that resource and all other resources.
> For example:
> 
>   pcs constraint colocation add A with B "-INFINITY"
> 
> says that A should never run on a node where B is running (B is unaffected).

First, we are talking about exclusive=1 (again, clarity is damn
important).

Yes, this is one of the approaches.  It doesn't really scale: you, and
possibly Pacemaker, will spend plenty of cycles grasping what's going on
in the config, and most importantly, you will have problems managing
these wires by hand.

The other approach would be to rely on utilization: "exclusive"
resources are declared to eat up all the capacity of a single node.
But then you cannot group/colocate such a resource (note that
you cannot set utilization for the whole group collectively),
so again, not a versatile solution.
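
A rough sketch of that utilization-based exclusivity (the "capacity"
attribute name is arbitrary, node names are placeholders, and this assumes
a pcs recent enough to offer the utilization subcommands; otherwise the
same values can be set by editing the utilization sections of the CIB
directly):

# pcs property set placement-strategy=utilization
# pcs node utilization node1 capacity=1
# pcs node utilization node2 capacity=1
# pcs resource utilization RESOURCE-script-CTM_SRV capacity=1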

--> clufter currently refuses to convert configs when it spots
this "exclusive" property.

If there's anything better please let me know.

-- 
Jan (Poki)


pgppNE59txRsH.pgp
Description: PGP signature
___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Cluster resources migration from CMAN to Pacemaker

2016-01-25 Thread Jan Pokorný
On 24/01/16 16:54 +0530, jaspal singla wrote:
> Thanks Digimer for letting me about the tool. I was unaware of any such
> tools!! And because of that I was able to search clufter-cli tool for the
> migration.
> 
> Thanks a lot John for explaining each and everything in detailed manner. I
> am really admired the knowledge you guys have!!
> 
> I also noticed that clufter tool is written by you :). I am very thankful
> to you as it would save the ass of millions people like me who may have had
> difficulties in migration of their legacy programs from CMAN to Pacemaker.

Glad to hear this, indeed :)

> As suggested I tried to migrate my existing cluster.conf file from CMAN to
> Pacemaker through the use of clufter. But have couple of queries going
> forward, would appreciate if you could answer these.
> 
> Please find In-line queries:

Answers ditto...

>> Date: Fri, 22 Jan 2016 21:52:17 +0100
>> From: Jan Pokorný 
>> Subject: Re: [ClusterLabs] Cluster resources migration from CMAN to
>> Pacemaker
>> Message-ID: <20160122205217.ge28...@redhat.com>
>> 
>> yes, as Digimer mentioned, clufter is the tool you may want to look
>> at.  Do not expect fully automatic miracles from it, though.
>> It's meant to show the conversion path, but one has to be walk it
>> very carefully and make adjustments every here and there.
>> In part because there is not a large overlap between resource agents
>> of both kinds.
>> 
>> On 22/01/16 17:32 +0530, jaspal singla wrote:
>>> I desperately need some help in order to migrate my cluster configuration
>>> from CMAN (RHEL-6.5) to PACEMAKER (RHEL-7.1).
>>> 
>>> I have tried to explore a lot but couldn't find similarities configuring
>>> same resources (created in CMAN's cluster.conf file) to Pacemaker.
>>> 
>>> I'd like to share cluster.conf of RHEL-6.5 and want to achieve the same
>>> thing through Pacemaker. Any help would be greatly appreciable!!
>>> 
>>> *Cluster.conf file*
>>> 
>>> ##
>>> 
>> 
>> [reformatted configuration file below for better readability and added
>> some comment in-line]
>> 
>>> 
>>  ^^^
>>  no, this is not the way to increase config version
>> 
>> This seems to be quite frequented mistake; looks like configuration
>> tools should have strictly refrained from using this XML declaration
>> in the first place.
>> 
>>> 
>>>   
>>>   
>>> 
>>>   
>>> 
>> 
>> (I suppose that other nodes were omitted)
>> 
> 
> No, its Single-Node Cluster Geographical Redundancy Configuration.
> 
> The geographical redundancy configuration allows us to locate two Prime
> Optical instances at geographically remote sites. One server instance is
> active; the other server instance is standby. The HA agent switches to the
> standby Element Management System (EMS) instance if an unrecoverable
> failure occurs on the active EMS instance. In a single-node cluster
> geographical redundancy configuration, there are two clusters with
> different names (one on each node), each containing a server.

Ah, it's more that I am not familiar with rather exotic use cases, and
I definitely include the degenerate single-node-on-purpose case here.

>>>   
>>>   
>>>   
>>>   
>>> 
>>>   >> restricted="0"/>
>> 
>> TODO: have to check what does it mean when FOD is not saturated
>>   with any cluster node references
> 
> No worries of using FOD as I don't think, it will be in use as we have
> groups in pacemaker.

FYI, I checked the code and it seems utterly useless to define an
empty failover domain and to refer to it from the resource group.
Based on its properties, just the logged warnings may vary.
Furthermore, enabling the restricted property may prevent associated
groups from starting at all.

Added a warning and corrected the conversion accordingly:
https://pagure.io/clufter/92dbe66b4eebb2b935c49bd4295b96c7954451c2

>>> 
>>> 
>>>   
>> 
>> General LSB-compliance-assumed commands are currently using a path hack
>> with lsb:XYZ resource specification.  In this very case, it means
>> the result after the conversion refers to
>> "lsb:../../..//data/Product/HA/bin/FsCheckAgent.py".
>> 
>> Agreed, there should be a better way to support arbitrary locations
>> beside /etc/init.d/XYZ.
>> 
> 
> Configured resources as LSB as you suggested.
>> 
>>>   

Re: [ClusterLabs] Cluster resources migration from CMAN to Pacemaker

2016-01-22 Thread Jan Pokorný
Hello,

yes, as Digimer mentioned, clufter is the tool you may want to look
at.  Do not expect fully automatic miracles from it, though.
It's meant to show the conversion path, but one has to walk it
very carefully and make adjustments here and there,
in part because there is not a large overlap between the resource agents
of the two kinds.
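
For a quick orientation, the conversion itself can be started with something
along these lines (a sketch; the exact options available differ between
versions, see "clufter ccs2pcscmd -h"):

# clufter ccs2pcscmd cluster.conf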

On 22/01/16 17:32 +0530, jaspal singla wrote:
> I desperately need some help in order to migrate my cluster configuration
> from CMAN (RHEL-6.5) to PACEMAKER (RHEL-7.1).
> 
> I have tried to explore a lot but couldn't find similarities configuring
> same resources (created in CMAN's cluster.conf file) to Pacemaker.
> 
> I'd like to share cluster.conf of RHEL-6.5 and want to achieve the same
> thing through Pacemaker. Any help would be greatly appreciable!!
> 
> *Cluster.conf file*
> 
> ##
> 

[reformatted configuration file below for better readability and added
some comment in-line]

> 
 ^^^
 no, this is not the way to increase config version

This seems to be quite a frequent mistake; it looks like configuration
tools should have strictly refrained from using this XML declaration
in the first place.

> 
>   
>   
> 
>   
> 

(I suppose that other nodes were omitted)

>   
>   
>   
>   
> 
>restricted="0"/>

TODO: have to check what it means when the FOD is not saturated
  with any cluster node references

> 
> 
>   

Commands for which general LSB compliance is assumed currently use a path
hack with the lsb:XYZ resource specification.  In this very case, it means
the result after the conversion refers to
"lsb:../../..//data/Product/HA/bin/FsCheckAgent.py".

Agreed, there should be a better way to support arbitrary locations
besides /etc/init.d/XYZ.

>name="ORACLE_REPLICATOR"/>
>